Task #5: Multilingual Chinese-English Lexical Sample Task
The goal of this task is to create a framework for the evaluation of
word sense disambiguation in Chinese-English machine translation
systems. We will provide 40 polysemous Chinese words (20 nouns and 20
verbs), with at least 15 instances for each sense of each word; roughly
2/3 of the instances will serve as training data and the remaining 1/3
as test data. The "sense tags" for the ambiguous Chinese target words are
given in the form of their English translations. The translations come
from the Chinese Semantic Dictionary (CSD) developed by the Institute
of Computational Linguistics, Peking University (ICL/PKU). The texts
will be extracted from the People's Daily News corpus, which has
been word-segmented and POS-tagged. The semantically ambiguous target
words will be manually sense-tagged with their English equivalents.
The sense-tagged training data will be distributed to all
participants. The test data, in which the ambiguous target words are
explicitly marked, will be used to evaluate the participating systems;
participants are required to assign exactly one translation to each
instance. The answer key will be provided as a separate file.
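The data split and scoring described above can be illustrated with a
minimal sketch. It is not the official procedure or scorer: it assumes
the 2/3 vs. 1/3 split is made separately within each sense, that a
system is scored by plain accuracy against the answer key, and that
instances are (instance id, English translation) pairs. The function
names, the instance ids, and the sense labels are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_sense(instances, train_fraction=2 / 3, seed=0):
    """Split (instance_id, sense) pairs into train and test sets.

    Assumption: the split is applied within each sense, so every
    sense appears in both portions.
    """
    by_sense = defaultdict(list)
    for inst_id, sense in instances:
        by_sense[sense].append(inst_id)

    rng = random.Random(seed)
    train, test = [], []
    for sense, ids in by_sense.items():
        ids = list(ids)
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend((i, sense) for i in ids[:cut])
        test.extend((i, sense) for i in ids[cut:])
    return train, test


def accuracy(answer_key, predictions):
    """Fraction of test instances whose single predicted translation
    matches the answer key. Unanswered instances count as wrong."""
    correct = sum(
        1 for inst_id, gold in answer_key.items()
        if predictions.get(inst_id) == gold
    )
    return correct / len(answer_key)


# Hypothetical target word with two senses, 15 instances each.
instances = [(f"w.{k}", "sense_a") for k in range(15)]
instances += [(f"w.{k}", "sense_b") for k in range(15, 30)]
train, test = split_by_sense(instances)

# A trivial baseline: always predict the first training sense seen.
answer_key = dict(test)
baseline = {inst_id: train[0][1] for inst_id in answer_key}
print(len(train), len(test), accuracy(answer_key, baseline))
```

Splitting within each sense keeps rare senses represented in the test
data; a split over all instances of a word would be an equally valid
reading of the description.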
Coordinators
Peng Jin (jandp@pku.edu.cn)
Yunfang Wu (wuyf@pku.edu.cn)
Shiwen Yu (yusw@pku.edu.cn)