Task #2: Evaluating Word Sense Induction and Discrimination Systems
The goal of this task is to allow for comparison across
sense-induction and discrimination systems, and also to compare these
systems to other supervised and knowledge-based systems. With this
goal in mind, the following two evaluation types are proposed:
1) evaluating the induced senses as clusters of examples. The induced
clusters will be compared to the sets of examples tagged with the
given gold standard word senses (classes), and evaluated using the
standard purity, entropy and F-score measures for clusters.
2) mapping induced senses to gold standard senses, and using the
mapping to tag the test corpus with gold standard tags. The mapping
will be automatically produced by the task organizers, and the
resulting system evaluated according to the usual precision and recall
measures for supervised word sense disambiguation systems.
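
As an illustration of evaluation type 1, the following minimal sketch computes purity and entropy for a set of induced clusters against gold standard classes. It is not the organizers' scorer: the function name `purity_and_entropy`, the input format (two parallel lists, one label per example) and the choice of unnormalized, size-weighted entropy in bits are all assumptions made for this example. The F-score measure is left out.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, gold_senses):
    """Return (purity, entropy) of induced clusters against gold classes.

    Hypothetical helper: both arguments are parallel lists holding one
    label per example. Entropy is the size-weighted average of the
    per-cluster entropies, in bits, without normalization.
    """
    n = len(cluster_ids)
    # Count how often each gold sense occurs inside each induced cluster.
    by_cluster = defaultdict(Counter)
    for cluster, sense in zip(cluster_ids, gold_senses):
        by_cluster[cluster][sense] += 1

    majority_total = 0
    entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Purity credits each cluster with its most frequent gold sense.
        majority_total += max(counts.values())
        # Entropy of the gold sense distribution inside this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return majority_total / n, entropy


# Toy data: six examples of one target word, two induced clusters.
clusters = ["c1", "c1", "c1", "c2", "c2", "c2"]
gold = ["s1", "s1", "s2", "s2", "s2", "s2"]
purity, entropy = purity_and_entropy(clusters, gold)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the gold standard senses.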
This double evaluation methodology was previously applied by Agirre
et al. (2006).
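
Evaluation type 2 can be sketched in a similar way. The code below is only an assumption about how such a mapping might work, not the procedure the organizers will use: each induced sense is mapped to the gold standard sense it co-occurs with most often in the mapping data, and test examples whose induced sense has no mapping are left untagged. The names `learn_mapping` and `precision_recall` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_mapping(train_clusters, train_gold):
    """Map each induced sense to its most frequent gold sense.

    Simplifying assumption: a hard majority mapping learned from
    parallel lists of induced and gold labels.
    """
    counts = defaultdict(Counter)
    for cluster, sense in zip(train_clusters, train_gold):
        counts[cluster][sense] += 1
    return {c: senses.most_common(1)[0][0] for c, senses in counts.items()}


def precision_recall(mapping, test_clusters, test_gold):
    """Score the mapped tags as in supervised WSD evaluation.

    Precision = correct / attempted, recall = correct / all test examples.
    Examples whose induced sense is missing from the mapping are skipped.
    """
    attempted = 0
    correct = 0
    for cluster, sense in zip(test_clusters, test_gold):
        if cluster in mapping:
            attempted += 1
            if mapping[cluster] == sense:
                correct += 1
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(test_gold) if test_gold else 0.0
    return precision, recall


# Toy data: induced senses c1..c3, gold senses s1 and s2.
mapping = learn_mapping(
    ["c1", "c1", "c2", "c2", "c2"],
    ["s1", "s1", "s2", "s2", "s1"],
)
precision, recall = precision_recall(
    mapping,
    ["c1", "c2", "c3", "c2"],
    ["s1", "s2", "s1", "s1"],
)
print(mapping, precision, recall)
```

In the toy data the induced sense `c3` never appears in the mapping data, so that test example is not tagged and lowers recall without affecting precision.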
In particular, we propose to use the data from the English lexical-sample
task in SemEval-2007, with the usual training + test split. The sense
inventory will be that of a coarse-grained version of WordNet. Please
refer to that task for details.
Eneko Agirre and Aitor Soroa
University of the Basque Country
Agirre E., Lopez de Lacalle Lekuona O., Martinez D., Soroa A. 2006. Two graph-based algorithms for state-of-the-art WSD. Proceedings of EMNLP.