Task #7: Coarse-grained English all-words (Coarse AW)

Roberto Navigli and Ken Litkowski

One of the major obstacles to effective WSD is the fine granularity of the adopted computational lexicon. Specifically, WordNet, by far the most commonly used dictionary within the NLP community, encodes sense distinctions that are too subtle even for human annotators (Edmonds and Kilgarriff, 2002). Nonetheless, many annotated resources, as well as the vast majority of disambiguation systems, rely on WordNet as a sense inventory: as a result, choosing a different sense inventory would make it hard to retrain supervised systems and could even pose copyright problems. To address these issues, we propose a coarse-grained English all-words task for SemEval-2007.

Description of the task
We will tag approximately 6,000 words of three running texts (analogous to the previous Senseval all-words tasks) with coarse senses. Coarse senses will be based on a clustering of the WordNet sense inventory obtained via a mapping to the Oxford Dictionary of English (ODE), a long-established dictionary that encodes coarse sense distinctions (the Macquarie Dictionary is also being considered as an option; contacts are ongoing). We will prepare the coarse-grained sense inventory semi-automatically: starting from the automatic clustering of senses produced by Navigli (2006) with the Structural Semantic Interconnections (SSI) algorithm, we will manually validate the clustering for the words occurring in the texts. Two annotators will tag the texts with coarse senses using a dedicated web interface. A judge will adjudicate disputed cases (though we expect that, given the coarse nature of the task, such cases will be rare). For each content word we will provide participants with its lemma and part of speech. As a second stage, we plan to associate fine-grained senses with those words in the test set that have clear-cut distinctions in the WordNet inventory. This second set of annotations would allow for both a coarse-grained and a fine-grained assessment of WSD systems.
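The coarse inventory described above amounts to a many-to-one mapping from fine-grained WordNet senses to ODE-derived clusters. A minimal sketch (the sense keys and cluster IDs below are invented for illustration, not part of the actual mapping):

```python
# Hypothetical coarse-grained sense inventory: each fine-grained
# WordNet sense key maps to a cluster ID derived from an ODE entry.
# The cluster is the unit of annotation and scoring in this task.
wn_to_cluster = {
    "bank%1:14:00::": "bank.ode.1",  # financial institution
    "bank%1:14:01::": "bank.ode.1",  # the building housing such an institution
    "bank%1:17:01::": "bank.ode.2",  # sloping land beside a body of water
}

def coarse_sense(fine_sense):
    """Return the coarse cluster for a fine-grained WordNet sense."""
    return wn_to_cluster[fine_sense]
```

Because the mapping is many-to-one, two WordNet senses that human annotators rarely distinguish (the institution vs. the building) collapse into a single annotation target.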

For disambiguation purposes, participating systems can exploit knowledge of the coarse distinctions as well as of each fine-grained WordNet sense belonging to a sense cluster. Thus, supervised systems can be retrained on the usual data sets (e.g. SemCor), with a sense cluster replacing each fine-grained sense choice. Each system will provide a single coarse (and possibly fine-grained) answer for each content word in the test set. We will provide an example data set beforehand (in the tradition of the previous Senseval exercises) as well as a test set.
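The retraining step above reduces to relabeling each training instance with its cluster. A hedged sketch, reusing the same hypothetical `wn_to_cluster` mapping and invented SemCor-style examples:

```python
# Hypothetical WordNet sense -> ODE cluster mapping (invented keys).
wn_to_cluster = {
    "bank%1:14:00::": "bank.ode.1",
    "bank%1:14:01::": "bank.ode.1",
    "bank%1:17:01::": "bank.ode.2",
}

# Fine-grained training instances in the style of SemCor:
# (context, fine-grained sense label).
training_data = [
    ("deposited the check at the bank", "bank%1:14:00::"),
    ("walked along the river bank", "bank%1:17:01::"),
]

# Retraining on coarse senses: replace each fine label with its cluster;
# the contexts and the learning algorithm are otherwise unchanged.
coarse_training_data = [
    (context, wn_to_cluster[sense]) for context, sense in training_data
]
```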

Evaluation will be performed in terms of standard precision, recall and F1 scores. We will avoid words with untagged senses, i.e. the "U" cases present in the Senseval-3 all-words test set.
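For concreteness, the standard all-words scoring scheme can be sketched as follows (instance IDs and cluster labels are invented; precision is computed over attempted instances, recall over all gold instances):

```python
def score(gold, answers):
    """Standard precision/recall/F1 for an all-words task.

    `gold` maps instance IDs to gold-standard clusters; `answers` maps
    the IDs a system attempted to its chosen cluster (systems may skip
    instances, which lowers recall but not precision).
    """
    attempted = len(answers)
    correct = sum(1 for i, a in answers.items() if gold.get(i) == a)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented example: two gold instances, one attempted and correct.
gold = {"d001.s01.t01": "bank.ode.1", "d001.s02.t04": "bank.ode.2"}
answers = {"d001.s01.t01": "bank.ode.1"}
p, r, f = score(gold, answers)  # p = 1.0, r = 0.5, f ~ 0.667
```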

Resources required to prepare the task
The following steps will be carried out by the task organizers:

1. Obtaining the necessary resources to be provided to the participants (copyright, etc.)

2. Computation and annotation necessary to prepare the task resources (estimated at 40-50 hours, though this is difficult to estimate precisely)

3. Validating the mapping WordNet -> ODE for the content words (estimated at 40-50 hours)

4. Processing the text to be disambiguated (POS tagging, parsing, etc.)

5. Annotating the text with coarse (and possibly fine) senses (estimated at 40 hours), using professional lexicographers

Provisional schedule
- Competition: March 2007
- SemEval-2007 Workshop: June 2007

Pending issues
- Data format
- Availability of the original sense inventory from the Oxford Dictionary of English (this would allow the participants to work on both WordNet and the ODE sense distinctions)

P. Edmonds and A. Kilgarriff. Introduction to the Special Issue on Evaluating Word Sense Disambiguation Systems. Natural Language Engineering, 8(4), Cambridge University Press, 2002.

R. Navigli. Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance. In Proc. of COLING-ACL 2006, Sydney, Australia, July 17-21, 2006.