As information retrieval (IR) technologies advance, robustness is attracting increasing attention. When deploying a technology in practice, we consider not only its average performance under normal conditions but, more importantly, its ability to maintain functionality across a variety of exceptional situations. In recent years, research on IR robustness has grown across theory, evaluation, methodology, and application. The purpose of this workshop is to systematize the latest results in each of these areas, to foster communication within this niche domain while bridging robust IR research with the broader community, and to promote the further development of robust IR.
Theory
• Game theory: Modeling strategic interactions as games, designing mechanisms to mitigate adversarial behaviors, and understanding the implications of these strategies for robust system design.
• Competitive search: Analyzing how competition influences the ecosystem and exploring mechanisms to promote desired properties in these contexts such as robustness and fairness.
• Probability ranking principle (PRP): Investigating the assumptions of the PRP and their adaptation to different adversarial scenarios.
Evaluation method
• Specific evaluation: Robustness evaluation for specific robustness types (e.g., adversarial, OOD robustness), scenarios (e.g., corpus updating, queries with typos), and IR models (e.g., sparse, dense, and generative retrieval models).
• General evaluation: Using an evaluation metric to comprehensively cover as many robustness scenarios as possible.
• Diverse evaluation tools: Developing new evaluation forms, including new evaluation functions, LLMs, and human evaluations, for comprehensive robustness comparisons.
• Benchmarks of robustness: Discussing existing robustness datasets and proposing new benchmark tasks to address diverse robustness categories and requirements.
Method
• Adversarial attack & defense: Investigating adversarial vulnerabilities in IR models and developing defenses against malicious attacks like data poisoning and adversarial document generation.
• Zero/few-shot IR: Using zero-shot or few-shot learning, transfer learning, and large-scale pre-trained models to improve cross-domain and cross-task generalization.
• Balancing robustness and effectiveness: Exploring ways to enhance robustness without compromising effectiveness.
• Long-term learning: Exploring continual learning in IR to enhance stability.
• Noise Resistance: Making IR systems resistant to noise in queries, documents, or training data, such as typo handling, semantic noise filtering, and processing incomplete or corrupted inputs.
• Enhancing RAG Robustness: Improving RAG pipeline robustness by reducing error propagation, ensuring consistency between retrieved documents and generated content, and enhancing output reliability under uncertain conditions.
Application
• Robust search engines: Deploying robustness enhancement methods in resource- and condition-constrained search engines.
• Robust recommendation systems: Robustness for recommendation systems, addressing sparse user data, cold-start issues, adversarial manipulation, and dynamic user preferences.
• Data-specific scenarios: Robustness in specialized data retrieval, including scientific literature, medical documents, legal data, and long-form or multi-modal documents.
• Federated and distributed IR systems: Robustness for distributed IR using federated learning, including tackling inconsistent local data, implementing privacy-preserving strategies, and improving communication efficiency across distributed nodes.
Societal Impact
• Human behaviors that affect robustness: Understanding how user behaviors like query biases, click feedback loops, and manipulated web content affect IR system robustness.
• Robustness and Ethics: Addressing ethical concerns by ensuring fairness, minimizing algorithmic biases, and maintaining transparency. Ensuring robust models do not disproportionately affect certain user groups or perpetuate societal inequalities.
• Explainability and truthfulness: Discussing the impact of IR model explainability and information truthfulness on users.
Dates and Deadlines | |
---|---|
Workshop paper submission | 15 May, 2025 |
Workshop paper notification | 31 May, 2025 |
Workshop paper camera-ready | 15 June, 2025 |
Workshops | 17 July, 2025 |
Time | Session |
---|---|
9:00 AM - 9:15 AM | Welcome and opening remarks |
9:15 AM - 10:00 AM | Keynote: Guido Zuccon |
10:00 AM - 10:10 AM | Oral 1: Sneha Singhania, Neon: News Entity-Interaction Extraction for Enhanced Question Answering |
10:10 AM - 10:30 AM | Panel discussion: Xu Chen, Yongkang Li, Tommy Mordo, Guoxuan Chen |
10:30 AM - 11:00 AM | Coffee Break |
11:00 AM - 11:40 AM | Keynote: Omer Ben-Porat |
11:40 AM - 11:50 AM | Oral 2: Harshvardhan Pande, GrocerySearch: Utilizing User Preferences to recommend Suitable Packaged Food Products in the Marketplace |
11:50 AM - 12:00 PM | Invited Talk 1: Yuqi Zhou, Length-Induced Embedding Collapse in Transformer-based Models |
12:00 PM - 12:10 PM | Oral 3: Alisa Rieger, A Research Vision for Web Search on Emerging Topics |
12:10 PM - 12:20 PM | Invited Talk 2: Zechun Niu, Distributionally Robust Optimization for Unbiased Learning to Rank |
12:20 PM - 12:30 PM | Oral 4: Michael Günther, Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models |
12:30 PM - 12:40 PM | Invited Talk 3: Guoxuan Chen, Pre-training for Unlearning: A Model-agnostic Paradigm for Recommendation Unlearning |
12:40 PM | Closing remarks |
1:00 PM - 2:30 PM | Poster session |
Title | Authors |
---|---|
GrocerySearch: Utilizing User Preferences to recommend Suitable Packaged Food Products in the Marketplace | Harshvardhan Pande, Shrikant Kapse, Shankar Kausley and Beena Rai |
A Research Vision for Web Search on Emerging Topics | Alisa Rieger, Stefan Dietze and Ran Yu |
Overcoming Ambiguity-Induced RAG Hallucination with Reflection: A Case in the Semiconductor Industry | Zhiyu An, Xianzhong Ding, Yen-Chun Fu, Cheng-Chung Chu, Yan Li and Wan Du |
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang and Han Xiao |
Neon: News Entity-Interaction Extraction for Enhanced Question Answering | Sneha Singhania, Silviu Cucerzan, Allen Herring and Sujay Kumar Jauhar |
Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems | Hongru Song, Yuan Liu, Ruqing Zhang, Yixing Fan and Jiafeng Guo |
ICT, CAS |
Technion, IIT |
ICT, CAS |
Technion, IIT |
ICT, CAS |
Technion, IIT |