This section will describe our dataset, XML specification, features, etc.
In order to measure the relative quality of the knowledge resources
submitted for the task, we will perform an indirect evaluation by
converting all the delivered resources into Topic Signatures (TS):
word vectors with weights, each associated with a particular WordNet
synset. For resources delivered without weights, each word in the TS
will be given a weight of 1. This simple representation aims to be as
neutral as possible with respect to the evaluation framework.
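As a hypothetical illustration (the exact TS exchange format is defined elsewhere in the task specification), a Topic Signature can be modeled as a mapping from words to weights, with missing weights defaulting to 1; the words and weights below are invented for the example:

```python
# Hypothetical sketch of a Topic Signature (TS): a weighted word vector
# attached to a WordNet synset. Words submitted without a weight get
# weight 1, as described above.

def make_topic_signature(entries):
    """Build a TS dict from (word, weight-or-None) pairs.

    A missing weight defaults to 1, keeping the representation neutral
    across resources that do and do not provide weights.
    """
    return {word: (1.0 if weight is None else float(weight))
            for word, weight in entries}

# Illustrative TS for a hypothetical financial sense of "bank".
ts_bank = make_topic_signature([
    ("money", 2.5),
    ("account", 1.8),
    ("loan", None),   # no weight supplied -> defaults to 1
])
```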
All knowledge resources submitted for the task will be indirectly
evaluated on a common Word Sense Disambiguation task. In particular, we
will use the Lexical Sample framework of SemEval-07, that is, a limited
number of words with test examples annotated with their word senses.
All performances will be evaluated on the test data using the scoring
system provided by the Lexical Sample task organizers.
Furthermore, to remain as neutral as possible with respect to the
semantic resources submitted, we will systematically apply the same
disambiguation method to all of them. Recall that our main goal is to
establish a fair comparison of the knowledge resources rather than to
provide the best disambiguation technique for a particular semantic
knowledge base.
A simple word-overlap count (or weighting) will be computed between the
Topic Signature and the test example; multiword terms will also be
considered. The occurrence evaluation measure will count the number of
overlapping words, while the weight evaluation measure will add up the
weights of the overlapping words. For each test example, the synset
with the highest overlapping word count (or weight) will be selected.
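The two overlap measures and the selection rule can be sketched as follows; this is a simplified illustration, not the official scoring code, and it matches single tokens only (the actual method also handles multiword terms). All sense labels, signatures, and the example context are invented:

```python
# Sketch of the two overlap measures between a test example and each
# candidate synset's Topic Signature (single-token matching only).

def occurrence_score(ts, context_words):
    """Occurrence measure: count context words that appear in the TS."""
    return sum(1 for w in set(context_words) if w in ts)

def weight_score(ts, context_words):
    """Weight measure: add up the TS weights of the overlapping words."""
    return sum(ts[w] for w in set(context_words) if w in ts)

def disambiguate(signatures, context_words, score=weight_score):
    """Select the synset whose TS scores highest against the context."""
    return max(signatures, key=lambda s: score(signatures[s], context_words))

# Illustrative signatures for two hypothetical senses of "bank".
signatures = {
    "bank#1": {"money": 2.5, "account": 1.8, "loan": 1.0},
    "bank#2": {"river": 2.0, "shore": 1.5, "water": 1.0},
}
context = ["i", "opened", "an", "account", "to", "get", "a", "loan"]
sense = disambiguate(signatures, context)   # -> "bank#1"
```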
Participants with wide-coverage semantic resources associated with
WordNet will receive a list of word senses. Within a short time frame,
they should submit the corresponding Topic Signatures for those word
senses.
We will also establish a set of baselines using existing wide-coverage
semantic resources such as WordNet, the MCR, etc.
This section will contain evaluation software, useful scripts, complementary materials, baseline systems, etc.
This section will be completed after the competition.