Task 16: Evaluation of Wide Coverage Knowledge Resources

Datasets and Formats

This section will describe our dataset, XML specification, features, etc.

Evaluation

To measure the relative quality of the knowledge resources submitted for the task, we will perform an indirect evaluation in which all resources are delivered as Topic Signatures (TS), that is, vectors of weighted words associated with a particular WordNet synset. Any word in a TS that is submitted without a weight will be given a weight of 1. This simple representation tries to be as neutral as possible with respect to the evaluation framework.
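As an illustration only, a Topic Signature could be held in memory as a mapping from words to weights. The sketch below assumes a hypothetical input of (word, weight) pairs in which a missing weight is given as None; the function name and the sample entries are invented for this example, and the actual submission format is the one to be given in the Datasets and Formats section.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_topic_signature(
    entries: Iterable[Tuple[str, Optional[float]]]
) -> Dict[str, float]:
    """Map each word to its weight, using 1.0 when no weight is supplied."""
    signature: Dict[str, float] = {}
    for word, weight in entries:
        # Words submitted without a weight receive the default weight of 1.
        signature[word] = 1.0 if weight is None else float(weight)
    return signature


# Hypothetical entries for a single synset; multiword terms are written
# here with an underscore, which is an assumption of this sketch.
example_ts = build_topic_signature(
    [("river", 2.5), ("water", None), ("river_bank", 0.8)]
)
```

Here "water" has no weight in the input, so it ends up with weight 1.0 in the resulting signature.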

All knowledge resources submitted for the task will be indirectly evaluated on a common Word Sense Disambiguation task. In particular, we will use the Lexical Sample framework of SemEval-07, that is, a limited number of target words together with test examples annotated with their word senses. All results will be computed on the test data using the scoring system provided by the Lexical Sample task organizers.

Furthermore, to remain as neutral as possible with respect to the semantic resources submitted, we will systematically apply the same disambiguation method to all of them. Recall that our main goal is to establish a fair comparison of the knowledge resources rather than to provide the best disambiguation technique for a particular semantic knowledge base.

A simple word overlap count (or weighting) will be computed between each Topic Signature and the test example. Multiword terms will also be considered. Thus, the occurrence evaluation measure will count the number of overlapping words, and the weight evaluation measure will add up the weights of the overlapping words. The synset with the highest overlap count (or weight) will be selected for a particular test example.
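The two measures can be sketched roughly as follows. This is not the official evaluation software, only a minimal illustration under several assumptions: the test example is already tokenized, multiword terms are matched as underscore-joined n-grams of up to three tokens, each overlapping word is counted once, and ties are broken by whichever synset comes first. All function names and the sense labels in the sample data are hypothetical.

```python
from typing import Dict, List, Set

TopicSignature = Dict[str, float]


def context_terms(tokens: List[str], max_len: int = 3) -> Set[str]:
    """Collect single words and underscore-joined n-grams from the example."""
    lowered = [t.lower() for t in tokens]
    terms: Set[str] = set()
    for n in range(1, max_len + 1):
        for i in range(len(lowered) - n + 1):
            terms.add("_".join(lowered[i:i + n]))
    return terms


def occurrence_score(signature: TopicSignature, terms: Set[str]) -> int:
    """Occurrence measure: number of signature words found in the example."""
    return sum(1 for word in signature if word in terms)


def weight_score(signature: TopicSignature, terms: Set[str]) -> float:
    """Weight measure: sum of the weights of the overlapping words."""
    return sum(w for word, w in signature.items() if word in terms)


def disambiguate(
    signatures: Dict[str, TopicSignature],
    tokens: List[str],
    use_weights: bool = False,
) -> str:
    """Return the synset whose signature overlaps most with the example."""
    terms = context_terms(tokens)
    score = weight_score if use_weights else occurrence_score
    # max() keeps the first synset on ties; tie-breaking is an assumption.
    return max(signatures, key=lambda s: score(signatures[s], terms))


# Hypothetical signatures for two senses of "bank".
signatures = {
    "bank#river": {"river": 2.0, "water": 1.0, "river_bank": 3.0},
    "bank#finance": {"money": 2.0, "loan": 1.5, "account": 1.0},
}
tokens = "money by the river bank".split()
chosen = disambiguate(signatures, tokens)
```

For this example the river sense overlaps on "river" and "river_bank", while the finance sense overlaps only on "money", so the river sense is selected under both measures.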

Participants with wide-coverage semantic resources linked to WordNet will receive a list of word senses. Within a very short time, they should submit the appropriate Topic Signatures for the corresponding word senses.

We will also establish a set of baselines using existing wide-coverage semantic resources such as WordNet, MCR, etc.

Download area

This section will contain evaluation software, useful scripts, complementary materials, baseline systems, etc.

System and Results

This section will be completed after the competition.

References

Montse Cuadros and German Rigau. Quality Assessment of Large-scale Knowledge Resources. EMNLP'2006.
Sydney, Australia.


Last updated: December 12, 2006