Organizers
Eneko Agirre (e.agirre@ehu.es)
Bernardo Magnini (magnini@itc.it)
German Rigau (g.rigau@si.ehu.es)
Piek Vossen (Piek.Vossen@irion.nl)
The community has long noted the need to evaluate WSD within an application, both to determine which WSD strategy is best and, more importantly, to show that WSD can make a difference in applications. The use of WSD in Machine Translation has been the subject of some recent papers, but less attention has been paid to Information Retrieval (IR).
With this proposal we want to make a first attempt at defining a task where WSD is evaluated with respect to an Information Retrieval and Cross-Lingual Information Retrieval (CLIR) exercise. From the WSD perspective, this task will evaluate all-words WSD systems indirectly on a real task. From the CLIR perspective, this task will evaluate which WSD systems and strategies work best.
We are conscious that the number of possible configurations for such an exercise is very large (including sense inventory choice, using word sense induction instead of disambiguation, query expansion, WSD strategies, IR strategies, etc.), so we wanted to focus this first edition in the following way:
1) the IR/CLIR system is fixed
2) the expansion/translation strategy is fixed
3) the participants need to choose the best WSD strategy
4) the IR system is used as the upper bound for the CLIR systems
We think it is important to start doing this kind of application-driven evaluation, which may shed light on the intricacies of the interaction between WSD and IR strategies. We see this as the first of a series of exercises, and one outcome of this task should be that the WSD and CLIR communities discuss future evaluation possibilities together.
In fact, CLEF-2008 will have a special track with the
same data, where CLIR systems will have the opportunity of using the
annotated data produced as a result of the Semeval-2007 task.
The results of the task will be presented both in the Semeval workshop
(ACL 2007) and the CLEF conference.
Description of the task
This is an application-driven task, where the application is a fixed
cross-lingual information retrieval system. Participants disambiguate
text by assigning WordNet synsets and the system will do the expansion
to other languages, index the expanded documents and run the retrieval
for all the languages in batch. The retrieval results will be taken as
a measure for fitness of the disambiguation. The modules and rules for
the expansion and the retrieval will be exactly the same for all
participants.
We propose two specific tasks:
1. participants disambiguate the corpus, the corpus is expanded to synonyms and translations and we measure the effects on cross-lingual retrieval. Queries are not processed.
2. participants disambiguate the queries per language, we expand the queries to synonyms and translations, and we measure the effects on cross-lingual retrieval. Documents are not processed.
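As a concrete sketch, the expansion step shared by both tasks can be illustrated as follows. The toy lexicon and synset IDs below are invented for illustration and merely stand in for the real MCR data:

```python
# Minimal sketch of synset-based expansion. TOY_MCR stands in for the
# Multilingual Central Repository; the synset IDs and translations
# below are illustrative, not real MCR entries.

TOY_MCR = {
    # synset id -> {language: [synonyms]}
    "06646239-n": {"en": ["bank", "depository"], "es": ["banco"]},
    "09213565-n": {"en": ["bank", "riverbank"], "es": ["orilla", "ribera"]},
}

def expand(tokens_with_senses, target_langs=("en", "es")):
    """Replace each (word, synset) pair with the original word plus all
    synonyms and translations of the chosen synset."""
    expanded = []
    for word, synset in tokens_with_senses:
        terms = {word}
        entry = TOY_MCR.get(synset, {})
        for lang in target_langs:
            terms.update(entry.get(lang, []))
        expanded.append(sorted(terms))
    return expanded

# A token disambiguated to the "financial institution" sense is
# expanded to its English synonyms and Spanish translations:
print(expand([("bank", "06646239-n")]))
```

The point the task turns on is visible even in this toy version: a wrong sense choice (here, the "riverbank" synset) would inject unrelated expansion terms into the index, so retrieval quality becomes an indirect measure of WSD quality.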
The corpora and queries will be obtained from the ad-hoc CLEF tasks. The scores can be compared not only among the SemEval participants but also with past CLEF participants.
Supported languages for queries: English, Spanish
Supported languages in documents: English
The English CLEF data from the years 2000-2003 covers 250 topics and 169,477 documents (579 MB). The relevance judgments will be taken from CLEF. This has the disadvantage of having been produced by pooling the results of CLEF participants, and might bias the results towards systems not using WSD, especially for monolingual English retrieval. A post-hoc analysis of the participants' results will examine these effects.
Expansion and retrieval
The retrieval engine will be an adaptation of the TwentyOne search
system, which was developed during the 1990s by the TNO research
institute in Delft (The Netherlands). It is now further developed by
Irion Technologies as a cross-lingual retrieval system (Hiemstra &
Kraaij, 1999; Vossen et al., 2006). It can be stripped down to a basic
vector-space retrieval system.
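As an illustration of what such a stripped-down core does, here is a minimal TF-IDF vector-space retrieval sketch. This is not the actual TwentyOne/Irion implementation, only the textbook principle it reduces to:

```python
# Toy vector-space retrieval: TF-IDF weighting plus cosine similarity.
# Purely illustrative; the real system adds much more machinery.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (term -> weight) for each tokenized doc."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["bank", "loan", "money"], ["river", "bank", "water"]]
vecs = tfidf_vectors(docs)
query = {"loan": 1.0, "money": 1.0}
scores = [cosine(query, v) for v in vecs]
# The financial document scores highest for this query.
```

In the task setup, the expanded synonyms and translations simply become extra index terms in such vectors, which is why the expansion and retrieval modules can stay identical for all participants.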
For expansion and translation we will use the publicly available Multilingual Central Repository (MCR, Atserias et al. 2004) from the MEANING project. The MCR follows the EuroWordNet design, and currently includes English, Spanish, Italian, Basque and Catalan wordnets tightly connected through the Interlingual Index (based on WordNet 1.6, but linked to all other WordNet versions).
The expansion, indexing and retrieval work will be carried out by the organizers.
Instructions for participation
Please note the steps in order to participate:
- By participating in SemEval-2007, participants grant future CLEF-2008 participants permission to use their automatically annotated data for research purposes.
- The results returned by participants need to conform to the DTDs provided by the organizers; otherwise we cannot guarantee being able to score them. Software to validate the results is provided on the task website.
- Given the high overhead of expanding and scoring the systems, the deadline for uploading results is tighter than for other SemEval tasks (March 19).
- Given the amount of text to be tagged, participants have two weeks to submit results, starting from the test data download time.
General design
The participants will be provided with (see Full
Description for more details):
1. the document collections (.nam+id files)
2. the topics (.nam+id files)
The participants need to return a single compressed file with the input
files enriched with WordNet 1.6 sense tags:
1. for all the documents in the collection (.wsd files)
2. for all the topics (.wsd files)
All files are in XML. Input documents and topics (.nam+id files) will
follow the "docs.nam+id.dtd" and "topics.nam+id.dtd" DTDs
respectively. Output documents and topics (.wsd files) will follow the
"docs.wsd.dtd" and "topics.wsd.dtd" DTDs respectively.
The result files will be organized into a rich directory structure
which follows the directory structure of the input.
See the trial data release for further details and examples of files.
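Before uploading, it is worth at least checking that the result files parse as XML. The standard-library sketch below checks well-formedness only, not validity against the actual DTDs, and the element and attribute names in the sample are invented for illustration; the official validation software on the task website remains the authoritative check:

```python
# Quick sanity check for a .wsd result file: is it well-formed XML?
# This does NOT validate against docs.wsd.dtd; the sample's element
# and attribute names are hypothetical (see the trial data for the
# real format).
import xml.etree.ElementTree as ET

sample = """<doc id="EN-2000-001">
  <term id="t1" lemma="bank" pos="n">
    <sense synset="06646239-n" weight="1.0"/>
  </term>
</doc>"""

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(sample))               # True
print(is_well_formed("<doc><term></doc>"))  # False: mismatched tags
```

A malformed file would fail the organizers' pipeline outright, so even this cheap check is worth running before submission.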
Note that all senses returned by participants will be expanded,
regardless of their weight. The current CLIR system does not use the
weight information.
Additional data
We will also provide some of the widely used WSD features in a
word-to-word fashion (Agirre et al., 2006) in order to make
participation easier. These features will be available for both topics
and documents (test data), as well as for all the words with frequency
above 10 in SemCor 1.6 (which can be taken as the training data for
supervised WSD systems). They will be posted on the task webpage soon.
Evaluation
The organizers will run an internal IR and CLIR system based on (Vossen
et al. 2006) on each of the participants results as follows:
1. expand the returned sense tags to all synonyms in the target
languages (Spanish and English) for both the documents and
queries
2. index both original and expanded documents
3. perform the queries in two fashions:
a. original queries on expanded document collection
b. expanded queries on original document collection
4. compare the returned documents with relevance judgements
The participant systems will be scored according to standard IR/CLIR measures as implemented in the TREC evaluation package.
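For instance, one of the central measures reported by the TREC evaluation package is (mean) average precision. A minimal sketch of non-interpolated average precision for a single query, given a ranking and a relevance-judgement set:

```python
# Non-interpolated average precision for one query, as computed by
# standard IR evaluation tools. Doc ids below are illustrative.
def average_precision(ranking, relevant):
    """ranking: doc ids in retrieved order; relevant: set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Relevant docs retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision(["d1", "d2", "d3"], {"d1", "d3"}), 3))
```

Averaging this value over all topics gives mean average precision (MAP), which allows the participant runs to be compared both with each other and with past CLEF results.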
Schedule
Jan. 3 Trial data available
Jan. 15 (approx.) End user agreement (EUA) available on website
Feb. 26 Test data available for download (signed EUA required)
Mar. 26 Deadline for uploading results
June SemEval Workshop
September CLEF conference
References
Agirre, E., O. Lopez de Lacalle, and D. Martinez. "Exploring feature
set combinations for WSD". In Proceedings of the Annual Meeting of the
SEPLN, Spain, 2006.
Jordi Atserias, Luis Villarejo, German Rigau, Eneko Agirre, John
Carroll, Bernardo Magnini, and Piek Vossen. "The MEANING Multilingual
Central Repository". In Proceedings of the Second International WordNet
Conference (GWC 2004), pp. 23-30, January 2004, Brno, Czech Republic.
ISBN 80-210-3302-9.
Djoerd Hiemstra and Wessel Kraaij. "Twenty-One at TREC-7: Ad Hoc and Cross Language Track". In Ellen M. Voorhees and Donna K. Harman, editors, The Seventh Text Retrieval Conference (TREC-7), NIST Special Publication 500-242, National Institute of Standards and Technology, 1999.
Piek Vossen, German Rigau, Iñaki Alegria, Eneko Agirre, David Farwell, and Manuel Fuentes. "Meaningful results for Information Retrieval in the MEANING project". In Proceedings of the 3rd Global WordNet Conference, South Jeju, Jeju Island, Korea, January 22-26, 2006.