This section will describe our dataset, XML specification, features, etc.
In order to measure the relative quality of the resources submitted for the task, we will perform an indirect evaluation using all the resources delivered as Topic Signatures (TS), that is, weighted word vectors associated with a particular WordNet synset. Words delivered without weights will be given a weight of 1 in the TS. This simple representation tries to be as neutral as possible with respect to the resources submitted.
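As an illustration only, a Topic Signature can be held as a mapping from words (or multiword terms) to weights. The sketch below assumes a hypothetical delivery format in which each entry is either a bare word or a (word, weight) pair; the synset labels `party#1` and `party#2`, the function `build_topic_signature`, and all weights are made up for the example.

```python
from typing import Dict, Iterable, Tuple, Union

# A Topic Signature maps words (or multiword terms) to weights.
TopicSignature = Dict[str, float]

# Hypothetical raw entry: either a bare word or a (word, weight) pair.
RawEntry = Union[str, Tuple[str, float]]


def build_topic_signature(entries: Iterable[RawEntry]) -> TopicSignature:
    """Build a TS, giving weight 1.0 to any word delivered without a weight."""
    ts: TopicSignature = {}
    for entry in entries:
        if isinstance(entry, tuple):
            word, weight = entry
        else:
            word, weight = entry, 1.0  # unweighted words get weight 1
        ts[word] = float(weight)  # if a word repeats, the last entry wins
    return ts


# Hypothetical resource: synset label -> list of raw entries.
resource = {
    "party#1": [("political", 0.8), ("election", 0.6), "member"],
    "party#2": ["celebration", ("birthday party", 0.9), ("dance", 0.5)],
}
signatures = {syn: build_topic_signature(ents) for syn, ents in resource.items()}
print(signatures["party#1"])  # {'political': 0.8, 'election': 0.6, 'member': 1.0}
```

Because unweighted words simply default to 1.0, resources that provide weights and resources that do not can be stored and compared in the same structure.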
All knowledge resources submitted for the task will be indirectly evaluated on a common Word Sense Disambiguation task. In particular, we will use the Lexical Sample framework of SemEval-07, which provides a limited number of target words and test examples annotated with their word senses. Performance will be evaluated on the test data using the scoring system provided by the Lexical Sample task organizers.
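The official figures will come from the organizers' scorer. Purely as a sketch of what such scoring typically reports, the code below assumes the usual Senseval-style definitions (precision = correct / attempted, recall = correct / total, coverage = attempted / total) with exact sense matching and no partial credit; the function `score` and the instance ids are hypothetical.

```python
from typing import Dict, Optional


def score(gold: Dict[str, str], answers: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Precision, recall and coverage with exact sense matching (assumed definitions).

    gold maps test-instance ids to the correct sense; answers maps instance ids
    to the predicted sense, or None when the system abstains.
    """
    attempted = [i for i, sense in answers.items() if sense is not None and i in gold]
    correct = sum(1 for i in attempted if answers[i] == gold[i])
    total = len(gold)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / total if total else 0.0
    coverage = len(attempted) / total if total else 0.0
    return {"precision": precision, "recall": recall, "coverage": coverage}


gold = {"t1": "party#1", "t2": "party#2", "t3": "party#1", "t4": "party#2"}
answers = {"t1": "party#1", "t2": "party#1", "t3": "party#1", "t4": None}
print(score(gold, answers))
```

Here three of four instances are attempted and two are correct, giving precision 2/3, recall 0.5 and coverage 0.75.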
Furthermore, to remain as neutral as possible with respect to the semantic resources submitted, we will systematically apply the same disambiguation method to all of them. Recall that our main goal is to establish a fair comparison of the knowledge resources, not to provide the best disambiguation technique for a particular semantic knowledge base.
A simple word-overlap count (or weighting) will be computed between each Topic Signature and the test example, also taking multiword terms into account. The occurrence measure counts the number of overlapping words, while the weight measure adds up the weights of the overlapping words. For each test example, the synset with the highest overlap count (or weight) is selected.
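A minimal sketch of the two overlap measures follows. It rests on several assumptions not fixed by the task description: text is lowercased and split on non-alphanumeric characters (so underscores in WordNet-style multiwords act as separators), each TS term counts at most once per example, a multiword term matches only as a contiguous token sequence, ties go to the first synset listed, and the system abstains (returns `None`) when no synset scores above zero. The names `tokenize`, `overlap_scores` and `disambiguate`, and the example signatures, are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

TopicSignature = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (underscores act as separators)."""
    return re.findall(r"[^\W_]+", text.lower())


def contains(tokens: List[str], term: Tuple[str, ...]) -> bool:
    """True if the (possibly multiword) term occurs as a contiguous token run."""
    n = len(term)
    return any(tuple(tokens[i:i + n]) == term for i in range(len(tokens) - n + 1))


def overlap_scores(tokens: List[str], ts: TopicSignature) -> Tuple[int, float]:
    """Return (occurrence count, summed weight) of TS terms found in the example."""
    count, weight = 0, 0.0
    for term, w in ts.items():
        term_tokens = tuple(tokenize(term))
        if term_tokens and contains(tokens, term_tokens):
            count += 1   # occurrence measure: each matching term counts once
            weight += w  # weight measure: add the term's weight
    return count, weight


def disambiguate(example: str, signatures: Dict[str, TopicSignature],
                 use_weights: bool = False) -> Optional[str]:
    """Pick the synset with the highest overlap count (or weight); None if all are zero."""
    tokens = tokenize(example)
    best_sense, best_score = None, 0.0
    for sense, ts in signatures.items():
        count, weight = overlap_scores(tokens, ts)
        score = weight if use_weights else count
        if score > best_score:  # strict '>' keeps the first synset on ties
            best_sense, best_score = sense, score
    return best_sense


signatures = {
    "party#1": {"political": 0.8, "election": 0.6, "member": 1.0},
    "party#2": {"celebration": 1.0, "birthday_party": 0.9, "dance": 0.5},
}
print(disambiguate("The political party won the election.", signatures))
print(disambiguate("We danced at her birthday party until midnight.", signatures, use_weights=True))
```

The first call selects `party#1` (two overlapping words) and the second selects `party#2` through the multiword term `birthday_party`; note that "danced" does not match "dance", since no lemmatization is assumed.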
Participants with wide-coverage semantic resources linked to WordNet will receive a list of word senses. Within a very short time, they should submit the Topic Signatures corresponding to those word senses.
We will also establish a set of baselines using existing wide-coverage semantic resources such as WordNet, MCR, etc.
This section will contain evaluation software, useful scripts, complementary materials, baseline systems, etc.
This section will be completed after the competition.
Montse Cuadros and German Rigau. Quality Assessment of Large-scale Knowledge Resources. EMNLP'2006.