This is an application-driven task, where the application is a fixed cross-lingual information retrieval system. Participants disambiguate text by assigning WordNet synsets; the system then expands the text to other languages, indexes the expanded documents and runs the retrieval for all the languages in batch. The retrieval results will be taken as a measure of the fitness of the disambiguation. The modules and rules for the expansion and the retrieval will be exactly the same for all participants.
We propose two specific tasks:
1. Participants disambiguate the corpus. The corpus is then expanded to synonyms and translations, and we measure the effects on cross-lingual retrieval. Queries are not processed.
2. Participants disambiguate the queries per language. We then expand the queries to synonyms and translations and measure the effects on cross-lingual retrieval. Documents are not processed.
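The expansion step shared by both tasks can be pictured with a minimal sketch. It is not the TwentyOne or MCR implementation: the function name `expand`, the toy repository and its synset identifiers are all hypothetical, and keeping untagged words unchanged in every language is an assumption made only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy sense repository: synset id -> language -> lemmas.
# The ids and entries are invented for illustration; they are not MCR data.
TOY_REPOSITORY: Dict[str, Dict[str, List[str]]] = {
    "bank-financial": {"en": ["bank", "depository"], "es": ["banco"]},
    "bank-river": {"en": ["bank", "riverbank"], "es": ["orilla", "ribera"]},
}


def expand(
    tagged_tokens: List[Tuple[str, Optional[str]]],
    languages: List[str],
    repository: Dict[str, Dict[str, List[str]]] = TOY_REPOSITORY,
) -> Dict[str, List[str]]:
    """Expand sense-tagged tokens into one term list per target language.

    Each token is a (word, synset_id) pair; synset_id is None when the
    participant system left the word untagged.
    """
    expanded: Dict[str, List[str]] = {lang: [] for lang in languages}
    for word, synset_id in tagged_tokens:
        entry = repository.get(synset_id) if synset_id else None
        for lang in languages:
            if entry is None:
                # Assumption: untagged or unknown words are kept as they are.
                expanded[lang].append(word)
            else:
                # Tagged words become the synonyms or translations
                # that the chosen synset has in this language.
                expanded[lang].extend(entry.get(lang, []))
    return expanded


tagged = [("bank", "bank-river"), ("Thames", None)]
result = expand(tagged, ["en", "es"])
print(result["en"])
print(result["es"])
```

In this sketch a wrong sense choice (for example "bank-financial" for a river bank) would add "banco" instead of "orilla" to the Spanish terms, which is the kind of effect the retrieval results are meant to expose.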
- Supported languages for queries: English and Spanish
- Supported languages in documents: English
- Document collection: CLEF ad-hoc English document collection 2000-2003 (169,477 documents, 579 MB).
- Queries: CLEF Queries in supported languages linked to best documents according to the pooling strategy
- Application: the cross-lingual retrieval system TwentyOne, developed during the 1990s by the TNO research institute in Delft (The Netherlands) and now further developed by Irion Technologies.
- Evaluation: standard IR/CLIR measures will be used. The relevance judgements will be taken from CLEF.
- Sense repository and expansions: we will use the publicly available Multilingual Central Repository (MCR) from the MEANING project. The MCR follows the EuroWordNet design, and currently includes English, Spanish, Italian, Basque and Catalan wordnets tightly connected through the Interlingual Index (based on WordNet 1.6, but linked to all other WordNet versions).
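The description above does not say which IR/CLIR measures will be reported. As an illustration only, the sketch below computes mean average precision, one commonly used measure, from ranked result lists and relevance judgements. The function names and the document and query identifiers are hypothetical, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant documents."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank where this relevant document was found.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical run and judgements for two queries.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d9"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
print(mean_average_precision(runs, qrels))
```

Because the expansion and retrieval modules are fixed, comparing such a score across participants would isolate the contribution of the disambiguation itself.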
Eneko Agirre (email@example.com)
Bernardo Magnini (firstname.lastname@example.org)
German Rigau (email@example.com)
Piek Vossen (Piek.Vossen@irion.nl)