The goal of this task is to create a framework for evaluating word sense disambiguation in Chinese-English machine translation systems. We will provide 40 polysemous Chinese words: 20 nouns and 20 verbs. Each sense of each word will have at least 15 instances, of which around 2/3 are used as training data and 1/3 as test data. The "sense tags" for the ambiguous Chinese target words are given in the form of their English translations. The translations come from the Chinese Semantic Dictionary (CSD) developed by the Institute of Computational Linguistics, Peking University (ICL/PKU). The texts will be extracted from the People's Daily News corpus, which has been word-segmented and POS-tagged. The semantically ambiguous target words will be manually sense-tagged with their English equivalents.
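The per-sense split into roughly 2/3 training and 1/3 test data can be illustrated with a short sketch. This is not the organizers' procedure: the function name `split_by_sense`, the `(instance_id, sense_tag)` input shape, the random shuffling, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_sense(instances, train_fraction=2 / 3, seed=0):
    """Split one target word's instances into train/test, sense by sense.

    `instances` is a list of (instance_id, sense_tag) pairs (hypothetical
    format). Splitting within each sense keeps every sense represented in
    both portions at roughly the requested ratio.
    """
    by_sense = defaultdict(list)
    for instance_id, sense in instances:
        by_sense[sense].append(instance_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for sense, ids in by_sense.items():
        ids = sorted(ids)
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend((i, sense) for i in ids[:cut])
        test.extend((i, sense) for i in ids[cut:])
    return train, test


# Two placeholder senses with 15 instances each (the stated minimum).
demo = [(f"a{n}", "sense_a") for n in range(15)]
demo += [(f"b{n}", "sense_b") for n in range(15)]
train, test = split_by_sense(demo)
print(len(train), len(test))
```

With the minimum of 15 instances per sense, this yields 10 training and 5 test instances for each sense.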
The sense-tagged training data will be distributed to all participants. The test data, in which the ambiguous target words are explicitly marked, will be used to evaluate the participating systems; participants are required to assign exactly one translation to each instance. An answer key will be provided as a separate file.
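Because each test instance receives exactly one translation, a system's output can be compared against the answer key instance by instance. The sketch below computes plain accuracy; the official scoring measure and the file formats are not specified here, so the dictionary representation, the function name `accuracy`, and the example instance ids and translations are all hypothetical.

```python
def accuracy(predictions, answer_key):
    """Fraction of answer-key instances whose predicted translation matches.

    Both arguments map an instance id to a single English translation
    (an assumed in-memory format, not the task's file format). Instances
    missing from `predictions` count as wrong.
    """
    if not answer_key:
        return 0.0
    correct = sum(
        1 for instance_id, gold in answer_key.items()
        if predictions.get(instance_id) == gold
    )
    return correct / len(answer_key)


# Placeholder ids and translations, for illustration only.
answer_key = {"w1.001": "translation_a", "w1.002": "translation_b",
              "w1.003": "translation_a", "w1.004": "translation_b"}
predictions = {"w1.001": "translation_a", "w1.002": "translation_a",
               "w1.003": "translation_a"}  # w1.004 left unanswered
result = accuracy(predictions, answer_key)
print(result)
```

Counting unanswered instances as errors is a design choice of this sketch; a scorer that separates precision from recall would treat them differently.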
Coordinators
Peng Jin (jandp@pku.edu.cn)
Yunfang Wu (wuyf@pku.edu.cn)
Shiwen Yu (yusw@pku.edu.cn)