Roberto Navigli and Ken Litkowski
One of the major obstacles to effective WSD is the fine granularity of the adopted computational lexicon. Specifically, WordNet, by far the most commonly used dictionary within the NLP community, encodes sense distinctions which are too subtle even for human annotators (Edmonds and Kilgarriff, 1998). Nonetheless, many annotated resources, as well as the vast majority of disambiguation systems, rely on WordNet as a sense inventory: as a result, choosing a different sense inventory would make it hard to retrain supervised systems and would even pose copyright problems.
Following these observations, we will organize a coarse-grained English all-words task for SemEval-2007. We will tag approximately 6,000 words of three running texts (in line with the previous Senseval all-words tasks) with coarse senses. Coarse senses will be based on a clustering of the WordNet sense inventory obtained via a mapping to the Oxford Dictionary of English (ODE), a long-established dictionary which encodes coarse sense distinctions (the Macquarie Dictionary is also being considered as an option; contacts are ongoing). We will prepare the coarse-grained sense inventory semi-automatically: starting from an automatic clustering of senses produced by Navigli (2006) with the Structural Semantic Interconnections (SSI) algorithm, we will manually validate the clustering for the words occurring in the texts. Two annotators will tag the texts with the coarse senses using a dedicated web interface. A judge will resolve disputed cases (we expect that, given the coarse nature of the task, there will be very few of them). For each content word we will provide the participants with its lemma and part of speech. As a second stage, we plan to associate fine-grained senses with those words in the test set which have clear-cut distinctions in the WordNet inventory. This second set of annotations would allow for both a coarse and a fine-grained assessment of WSD systems.
For disambiguation purposes, participating systems can exploit the knowledge of coarse distinctions as well as each fine-grained WordNet sense belonging to a sense cluster. Thus, supervised systems can be retrained on the usual data sets (e.g. SemCor) where a sense cluster replaces the fine-grained sense choice. Each system will provide a single coarse (and possibly fine) answer for each content word in the test set. We will provide an example data set beforehand (as in the tradition of the previous Senseval exercises) and a test set.
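As an illustration of how a supervised system might be retrained on cluster labels, the following Python sketch replaces each fine-grained sense in a training set with the coarse cluster that contains it. The cluster identifiers, sense labels and instance identifiers are hypothetical placeholders, not the format of the released inventory; the sketch also assumes that a sense absent from the clustering is kept as a cluster of its own.

```python
from typing import Dict, List, Tuple

# Hypothetical clustering: each coarse cluster id lists the fine-grained
# sense labels it groups. These labels are made up for illustration.
CLUSTERS: Dict[str, List[str]] = {
    "bank.n.c1": ["bank%1", "bank%2"],
    "bank.n.c2": ["bank%3"],
}


def invert_clusters(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Build a fine-sense -> cluster-id lookup, rejecting overlapping clusters."""
    fine_to_coarse: Dict[str, str] = {}
    for cluster_id, senses in clusters.items():
        for sense in senses:
            if sense in fine_to_coarse:
                raise ValueError(f"sense {sense!r} appears in two clusters")
            fine_to_coarse[sense] = cluster_id
    return fine_to_coarse


def relabel(
    instances: List[Tuple[str, str]], fine_to_coarse: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Replace the fine-grained label of each (instance id, sense) pair.

    Assumption: a sense missing from the clustering is left unchanged,
    i.e. it is treated as a singleton cluster.
    """
    return [(iid, fine_to_coarse.get(sense, sense)) for iid, sense in instances]


training = [("d1.t1", "bank%2"), ("d1.t2", "bank%3"), ("d1.t3", "river%1")]
coarse_training = relabel(training, invert_clusters(CLUSTERS))
```

Because only the labels change, the same features and learning algorithm can be reused; the classifier simply chooses among fewer, broader classes.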
Evaluation will be performed in terms of the usual precision, recall and F1 scores. We will avoid words with untagged senses, i.e. the "U" cases present in the Senseval-3 all-words test set.
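The scores can be sketched as follows, assuming the convention of previous Senseval exercises: precision is the fraction of attempted instances answered correctly, recall is the fraction of all gold instances answered correctly, and F1 is their harmonic mean. The function name and the dictionary-based data layout are hypothetical and are not the official scorer; answers for instances missing from the gold standard are ignored here.

```python
from typing import Dict, Tuple


def score(gold: Dict[str, str], answers: Dict[str, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one coarse answer per instance.

    gold    maps instance id -> gold coarse cluster id
    answers maps instance id -> system coarse cluster id (may skip instances)
    """
    # Only answers for instances that exist in the gold standard are counted.
    attempted = [iid for iid in answers if iid in gold]
    correct = sum(1 for iid in attempted if answers[iid] == gold[iid])

    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: four gold instances, three attempted, two of them correct.
gold = {"t1": "c1", "t2": "c2", "t3": "c1", "t4": "c3"}
answers = {"t1": "c1", "t2": "c1", "t3": "c1"}
precision, recall, f1 = score(gold, answers)
```

On this toy data precision is 2/3 and recall is 2/4. A system that answers every instance has identical precision, recall and F1.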
P. Edmonds and A. Kilgarriff. Introduction to the special issue on evaluating word sense disambiguation systems, Journal of Natural Language Engineering, 8(4), Cambridge University Press, 1998.
R. Navigli. Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance. To appear in Proc. of COLING-ACL 2006, Sydney, Australia, July 17-21, 2006.