Robust-IR @ SIGIR 2025:
The First Workshop on Robust Information Retrieval

1 ICT, CAS, China; 2 Technion, Israel

Thursday, July 17, 9:00 AM - 5:00 PM

About this workshop

As information retrieval (IR) technologies advance, robustness is attracting increasing attention. When deploying a technology in practice, we care not only about its average performance under normal conditions but, more importantly, about its ability to keep functioning across a variety of exceptional situations. In recent years, research on IR robustness has spanned theory, evaluation, methodology, and application, and activity in each of these areas is growing.

The purpose of this workshop is to systematize the latest results in each of these research areas, to foster communication within this niche domain while bridging robust IR research with the broader community, and to promote the further development of robust IR. To avoid the one-way presentation style of a mini-conference, the workshop adopts a highly interactive format, including round-table and panel discussion sessions, to encourage active participation and meaningful exchange among attendees.

Schedule

Time           Section                                      Presenter
13:30 - 13:50  Section 1: Introduction                      Maarten de Rijke
13:50 - 14:10  Section 2: Preliminaries                     Yu-An Liu
14:10 - 15:00  Section 3: Adversarial robustness            Yu-An Liu
15:00 - 15:30  Coffee break (30 min)
15:30 - 16:20  Section 4: Out-of-distribution robustness    Yu-An Liu
16:20 - 16:30  Section 5: Robust IR in the age of LLMs      Yu-An Liu
16:30 - 16:50  Section 6: Challenges and future directions  Maarten de Rijke
16:50 - 17:00  Q&A                                          All

Benchmark


Perspective Papers


Reading List

A curated list of papers related to robustness in IR can be found at Awesome Robustness in Information Retrieval.

The tutorial extensively covers papers highlighted in bold.


Section 3: Adversarial robustness

3.1 Adversarial attacks

3.1.0 Classification of adversarial attack tasks

Adversarial retrieval attack


Adversarial ranking attack


Topic-oriented adversarial retrieval/ranking attack


3.1.1 Steal knowledge from black-box models

Surrogate model training


3.1.2 Identify vulnerable positions in documents

Pre-defined position


Output-guided position


Gradient-guided position


3.1.3 Add perturbations to identified positions

3.1.3.1 Perturbation type

Word substitution


Trigger sentence


Multi-granular


Encoding error


Grammatical error


3.1.3.2 Perturbation strategy

Static: greedy search


Dynamic: reinforcement learning


3.2 Adversarial defenses

3.2.1 Empirical defense

Data augmentation


Traditional adversarial training


Theory-guided adversarial training


3.2.2 Certified defense

Certified robustness


3.2.3 Attack detection

Perplexity-based detection


Language-based detection


Learning-based detection


Section 4: Out-of-distribution robustness

4.1 OOD generalizability on unforeseen documents

4.1.1 Adaptation to new corpus

Data augmentation


Domain modeling


Architectural modifications


Scaling up the model capacity


4.1.2 Updates to a corpus

Continual learning for dense retrieval


Continual learning for generative retrieval


4.2 OOD generalizability on unforeseen queries

4.2.1 Query variation

Self-teaching


Contrastive learning


Hybrid training


4.2.2 Unseen query type

BibTeX

@inproceedings{liu2024robust,
author = {Liu, Yu-An and Zhang, Ruqing and Guo, Jiafeng and de Rijke, Maarten},
title = {Robust Information Retrieval},
year = {2024},
booktitle = {SIGIR},
}