Task #1: Evaluating WSD on Cross-Language Information Retrieval
A joint SemEval-2007 / CLEF (Cross-Language Evaluation Forum) task
This is an application-driven task, where the application is a fixed
cross-lingual information retrieval system. Participants disambiguate
text by assigning WordNet synsets, and the system then expands the text
to other languages, indexes the expanded documents and runs the retrieval
for all the languages in batch. The retrieval results will be taken as
a measure of the fitness of the disambiguation. The modules and rules for
the expansion and the retrieval will be exactly the same for all
participants.
We propose two specific tasks:
1. Participants disambiguate the corpus; the corpus is expanded with
synonyms and translations, and we measure the effects on cross-lingual
retrieval. Queries are not processed.
2. Participants disambiguate the queries per language; we expand the
queries with synonyms and translations, and we measure the effects on
cross-lingual retrieval. Documents are not processed.
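The expansion step shared by both tasks can be illustrated with a minimal sketch. It assumes each token arrives with an optional synset label and looks up synonyms and translations in a small hand-made table. The table TOY_SYNSETS, its sense identifiers and the function expand are hypothetical names invented for this illustration; they are not the organizers' expansion module or the actual sense repository.

```python
# Toy sense inventory: sense ID -> lexicalizations per language.
# IDs and word lists are invented for illustration only.
TOY_SYNSETS = {
    "bank-finance": {"en": ["bank", "depository"], "es": ["banco"]},
    "bank-river": {"en": ["bank", "riverside"], "es": ["orilla", "ribera"]},
}


def expand(tagged_tokens, languages=("en", "es")):
    """Return the token stream with synonyms and translations appended.

    tagged_tokens is a list of (word, sense_id) pairs; sense_id is None
    for tokens that were not disambiguated.
    """
    expanded = []
    for word, sense_id in tagged_tokens:
        expanded.append(word)
        if sense_id is None:
            continue  # nothing to expand for untagged tokens
        lexicalizations = TOY_SYNSETS.get(sense_id, {})
        for lang in languages:
            for variant in lexicalizations.get(lang, []):
                if variant != word:  # do not repeat the original word
                    expanded.append(variant)
    return expanded


doc = [("the", None), ("bank", "bank-river"), ("flooded", None)]
print(expand(doc))
# ['the', 'bank', 'riverside', 'orilla', 'ribera', 'flooded']
```

In this sketch a wrong sense label (for example "bank-finance" for the sentence above) would add "banco" and "depository" to the index instead, which is how disambiguation errors would surface in the retrieval results.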
- Supported languages for queries: English and Spanish
- Supported languages in documents: English
- Document collection: CLEF ad-hoc English document collection
2000-2003 (169,477 documents, 579 MB).
- Queries: CLEF queries in the supported languages, linked to the best
documents according to the pooling strategy.
- Application: the cross-lingual retrieval system TwentyOne, developed
during the 1990s by the TNO research institute in Delft (The
Netherlands) and now further developed by Irion Technologies.
- Evaluation: standard IR/CLIR measures will be used. The relevance
judgements will be taken from CLEF.
- Sense repository and expansions: we will use the publicly available
Multilingual Central Repository (MCR) from the MEANING project. The
MCR follows the EuroWordNet design, and currently includes English,
Spanish, Italian, Basque and Catalan wordnets tightly connected
through the Interlingual Index (based on WordNet 1.6, but linked to
all other WordNet versions).
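As an illustration of the evaluation item above, the following sketch computes average precision per query and its mean over queries (MAP), a measure commonly reported in IR evaluations. The task description does not say which measures will be reported, so the choice of MAP is an assumption, and the function names and document identifiers are hypothetical.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of average precision over all queries that have judgements.

    runs maps a query ID to its ranked list of document IDs;
    qrels maps a query ID to the set of documents judged relevant.
    """
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d9"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"d9"}}
print(mean_average_precision(runs, qrels))  # 0.75
```

Here q1 finds its two relevant documents at ranks 2 and 4 (average precision 0.5) and q2 finds its single relevant document at rank 1 (average precision 1.0).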
Organizers
Eneko Agirre (e.agirre@ehu.es)
Bernardo Magnini (magnini@itc.it)
German Rigau (rigau@lsi.upc.edu)
Piek Vossen (Piek.Vossen@irion.nl)