Robust IR @ SIGIR 2025

The First Workshop on Robust Information Retrieval

July 17th 2025, Padua, Italy

Padova Congress center, Meeting room PETRARCA A

Held in conjunction with SIGIR 2025

About

With the advancement of information retrieval (IR) technologies, robustness is attracting increasing attention. When deploying a technology in practice, we consider not only its average performance under normal conditions but, more importantly, its ability to maintain functionality across a variety of exceptional situations. In recent years, research on IR robustness has spanned theory, evaluation, methodology, and application, with growing activity in each. The purpose of this workshop is to systematize the latest results in each of these areas, to foster communication within this niche domain while bridging robust IR research with the broader community, and to promote the further development of robust IR.





Invited speakers

Speaker: Guido Zuccon
Title: IR Robustness in the Era of LLM
  • Abstract
    This talk explores the fragility of modern LLM-based IR systems, focusing on their sensitivity to both the system prompts used to instruct them and the various ways users formulate their information needs. I show that minor variations in prompt wording or user question phrasing can lead to significant fluctuations in system effectiveness. Ultimately, this talk argues for a more rigorous evaluation practice that systematically accounts for robustness against these variations, ensuring that progress in IR is driven by genuine algorithmic intelligence rather than by sensitivity to superficial phrasing, and that the systems we develop are robust to how our users interact with them.

  • Bio
    Prof. Guido Zuccon is a Professor of IR at the School of Electrical Engineering and Computer Science at The University of Queensland, Australia, and a Visiting Researcher at Google Research Australia.
    His research focuses on the creation, understanding and development of new information access models and paradigms – the most recent being the investigation of the use of LLMs within information seeking and retrieval systems. He has also contributed to the evaluation of information access systems, especially for tasks involving health data, and the development of sustainable practices for research and deployment of information access systems.


Speaker: Omer Ben-Porat
Title: Strategic Content Creation: From Human-created Content to Generated Content
  • Abstract
    Search engines serve as mechanisms that match users (content consumers) with web pages created by publishers (content creators). Over the past decade, the strategic behavior of content creators in response to ranking algorithms has been both observed and rigorously analyzed. However, the recent rise of Generative AI (GenAI) tools introduces a new wave of challenges and opportunities, calling for fresh theoretical models and practical approaches. In this talk, I will present several recent directions that address this paradigm shift.

  • Bio
    Omer Ben-Porat is an Assistant Professor at the Faculty of Data and Decision Sciences at the Technion, where he leads research at the intersection of machine learning and algorithmic game theory. His current research focuses on incentive-aware recommender systems and strategic behavior in the presence of LLMs, developing both theory and practical tools. Omer earned his Ph.D. from the Technion and completed his postdoctoral studies at the Blavatnik School of Computer Science at Tel-Aviv University. He has received several awards, including the J.P. Morgan AI Ph.D. Fellowship, the Israeli Association for Artificial Intelligence Ph.D. Dissertation Award, and the Rothschild Postdoctoral Fellowship. Omer's research has appeared in leading international venues including NeurIPS, ICML, AAAI, and EC, and he serves the community as an Area Chair for major AI conferences.

Call for papers

We invite submissions related to Robust IR, including (but not limited to):
  • Theory
    • Game theory: Modeling strategic interactions as games, designing mechanisms to mitigate adversarial behaviors, and understanding the implications of these strategies for robust system design.
    • Competitive search: Analyzing how competition influences the ecosystem and exploring mechanisms to promote desired properties, such as robustness and fairness, in these contexts.
    • Probability ranking principle (PRP): Investigating the assumptions of the PRP and their adaptation to different adversarial scenarios.

  • Evaluation method
    • Specific evaluation: Robustness evaluation for specific robustness types (e.g., adversarial and out-of-distribution (OOD) robustness), scenarios (e.g., corpus updating, queries with typos), and IR models (e.g., sparse, dense, and generative retrieval models).
    • General evaluation: Designing evaluation metrics that cover as many robustness scenarios as possible.
    • Diverse evaluation tools: Developing new evaluation forms, including new evaluation functions, LLMs, and human evaluations, for comprehensive robustness comparisons.
    • Benchmarks of robustness: Discussing existing robustness datasets and proposing new benchmark tasks to address diverse robustness categories and requirements.

  • Method
    • Adversarial attack & defense: Investigating adversarial vulnerabilities in IR models and developing defenses against malicious attacks like data poisoning and adversarial document generation.
    • Zero/few-shot IR: Using zero-shot or few-shot learning, transfer learning, and large-scale pre-trained models to improve cross-domain and cross-task generalization.
    • Balancing robustness and effectiveness: Exploring ways to enhance robustness without compromising effectiveness.
    • Long-term learning: Exploring continual learning in IR to enhance stability.
    • Noise resistance: Making IR systems resistant to noise in queries, documents, or training data, such as typo handling, semantic noise filtering, and processing incomplete or corrupted inputs.
    • Enhancing RAG robustness: Improving the robustness of retrieval-augmented generation (RAG) pipelines by reducing error propagation, ensuring consistency between retrieved documents and generated content, and enhancing output reliability under uncertain conditions.

  • Application
    • Robust search engines: Deploying robustness enhancement methods in resource- and condition-constrained search engines.
    • Robust recommendation systems: Robustness for recommendation systems, addressing sparse user data, cold-start issues, adversarial manipulation, and dynamic user preferences.
    • Data-specific scenarios: Robustness in specialized data retrieval, including scientific literature, medical documents, legal data, and long-form or multi-modal documents.
    • Federated and distributed IR systems: Robustness for distributed IR using federated learning, including tackling inconsistent local data, implementing privacy-preserving strategies, and improving communication efficiency across distributed nodes.

  • Societal impact
    • Human behaviors that affect robustness: Understanding how user behaviors like query biases, click feedback loops, and manipulated web content affect IR system robustness.
    • Robustness and ethics: Addressing ethical concerns by ensuring fairness, minimizing algorithmic biases, and maintaining transparency, and ensuring that robust models do not disproportionately affect certain user groups or perpetuate societal inequalities.
    • Explainability and truthfulness: Discussing the impact of IR model explainability and information truthfulness on users.

Submission Site: Robust IR @ SIGIR 2025.
Author Kit: Overleaf, LaTeX, Word.

All submissions will be peer reviewed (double-blind) by the program committee and judged by their relevance to the workshop, especially to the main themes identified above, and their potential to generate discussion. All submissions must be written in English and formatted according to the latest ACM SIG proceedings template.
We accept submissions that were previously posted on arXiv or that were rejected from the main SIGIR conference.
At least one author of each accepted paper must register for the workshop and present the paper, either on location (strongly preferred) or remotely.
We invite research contributions as well as position, demo, and opinion papers. Submissions must be either short papers (at most 4 pages) or full papers (at most 9 pages). References do not count against the page limit, and appendices of unlimited length may be included in the same PDF.
We encourage but do not require authors to release any code and/or datasets associated with their paper.


Dates and Deadlines
All deadlines are at 11:59 PM UTC-12:00 (“anywhere on Earth”).
Workshop paper submission: 15 May 2025
Workshop paper notification: 31 May 2025
Workshop paper camera-ready: 15 June 2025
Workshop: 17 July 2025

Schedule

📅 July 17th, 2025



9:00 AM - 9:15 AM Welcome and opening remarks
9:15 AM - 10:00 AM Keynote: Guido Zuccon
10:00 AM - 10:10 AM Oral 1: Sneha Singhania, Neon: News Entity-Interaction Extraction for Enhanced Question Answering
10:10 AM - 10:30 AM Panel discussion: Xu Chen, Yongkang Li, Tommy Mordo, Guoxuan Chen
10:30 AM - 11:00 AM Coffee Break
11:00 AM - 11:40 AM Keynote: Omer Ben-Porat
11:40 AM - 11:50 AM Oral 2: Harshvardhan Pande, GrocerySearch: Utilizing User Preferences to recommend Suitable Packaged Food Products in the Marketplace
11:50 AM - 12:00 PM Invited Talk 1: Yuqi Zhou, Length-Induced Embedding Collapse in Transformer-based Models
12:00 PM - 12:10 PM Oral 3: Alisa Rieger, A Research Vision for Web Search on Emerging Topics
12:10 PM - 12:20 PM Invited Talk 2: Zechun Niu, Distributionally Robust Optimization for Unbiased Learning to Rank
12:20 PM - 12:30 PM Oral 4: Michael Günther, Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
12:30 PM - 12:40 PM Invited Talk 3: Guoxuan Chen, Pre-training for Unlearning: A Model-agnostic Paradigm for Recommendation Unlearning
12:40 PM Closing remarks

1:00 PM - 2:30 PM Poster session

Accepted Papers

  • GrocerySearch: Utilizing User Preferences to recommend Suitable Packaged Food Products in the Marketplace
    Harshvardhan Pande, Shrikant Kapse, Shankar Kausley and Beena Rai
  • A Research Vision for Web Search on Emerging Topics
    Alisa Rieger, Stefan Dietze and Ran Yu
  • Overcoming Ambiguity-Induced RAG Hallucination with Reflection: A Case in the Semiconductor Industry
    Zhiyu An, Xianzhong Ding, Yen-Chun Fu, Cheng-Chung Chu, Yan Li and Wan Du
  • Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
    Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang and Han Xiao
  • Neon: News Entity-Interaction Extraction for Enhanced Question Answering
    Sneha Singhania, Silviu Cucerzan, Allen Herring and Sujay Kumar Jauhar
  • Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems
    Hongru Song, Yuan Liu, Ruqing Zhang, Yixing Fan and Jiafeng Guo

Organizers

Yu-An Liu (ICT, CAS)
Haya Nachimovsky (Technion, IIT)
Ruqing Zhang (ICT, CAS)
Oren Kurland (Technion, IIT)
Jiafeng Guo (ICT, CAS)
Moshe Tennenholtz (Technion, IIT)