Understanding the Overview of the TREC 2023 NeuCLIR Track

The TREC 2023 NeuCLIR track studied the impact of neural approaches on cross-language information retrieval (CLIR). It used four document collections: large Chinese, Persian, and Russian newswire collections and a smaller collection of Chinese scientific abstracts. The track featured five tasks: ranked CLIR over news in each of the three languages using English topics (three tasks), multilingual information retrieval (MLIR), and a new technical document CLIR pilot task. Six teams submitted 220 runs, and the track coordinators also submitted baseline runs. Five teams participated in the CLIR task and three in the MLIR task.

In the CLIR task, systems retrieved news documents in Chinese, Persian, or Russian using English topics. In the MLIR task, systems produced a single ranked list for each topic that included documents in all three languages.

The track reused the document collections from 2022, but new topics were developed for 2023 to optimize their utility for MLIR evaluation. Topics were created by NIST assessors and by the track coordinators; the coordinators used a modified process because of their limited language skills. Relevance judgments were made using predefined categories, and some topics were dropped because their relevant documents were unevenly distributed across languages. Additional resources included machine-translated versions of the queries and document collections, translations of the MS MARCO dataset, and multilingual CLIR datasets.

The technical document pilot task involved retrieving Chinese academic documents (dissertation abstracts) using English queries. It used the Chinese Scientific Literature dataset and involved seven graduate students in various scientific fields. Relevance was judged on whether a document contained central information and on how valuable its most important information was.

The track will continue in 2024, with plans to expand the technical document task to a full task and to introduce a new pilot task for automatic cross-language report generation. The submission deadline will be pushed back to August, and the track will continue to develop new topic sets in the three languages. The track also aims to broaden the range of tasks for which the collections will be useful and to create a repository for storing topic translations.
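The MLIR task described above asks for one ranked list that mixes documents from all three languages. The summary does not say how participants built such lists, so the following Python sketch is only one simple possibility: run retrieval separately per language, rescale each list's scores with min-max normalization, and sort the union. The function names, language codes, document IDs, and scores are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Ranking = List[Tuple[str, float]]  # (doc_id, score) pairs, in any order


def min_max_normalize(ranking: Ranking) -> Ranking:
    """Rescale the scores of one ranked list to the range [0, 1]."""
    if not ranking:
        return []
    scores = [score for _, score in ranking]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores identical: treat every document as equally good.
        return [(doc_id, 1.0) for doc_id, _ in ranking]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in ranking]


def merge_multilingual(runs: Dict[str, Ranking], depth: int = 1000) -> Ranking:
    """Fuse per-language rankings into one list sorted by normalized score."""
    merged: Ranking = []
    for language in sorted(runs):
        merged.extend(min_max_normalize(runs[language]))
    # Sort by descending score; break ties by doc_id for deterministic output.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged[:depth]


# Hypothetical document IDs and scores, for illustration only.
example_runs = {
    "zho": [("zho-001", 12.0), ("zho-002", 8.0), ("zho-003", 4.0)],
    "fas": [("fas-001", 0.9), ("fas-002", 0.6)],
    "rus": [("rus-001", 30.0), ("rus-002", 15.0), ("rus-003", 0.0)],
}
merged_list = merge_multilingual(example_runs, depth=5)
for rank, (doc_id, score) in enumerate(merged_list, start=1):
    print(rank, doc_id, round(score, 3))
```

Normalizing before merging matters because raw scores from separate per-language retrieval runs are generally not comparable. Min-max scaling is a crude fix, and it assumes each language has relevant documents for the topic, which is the kind of imbalance that led the track to drop some topics.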

Overview of the TREC 2023 NeuCLIR Track

2024 | Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang