Task #6: Word-Sense Disambiguation of Prepositions

Ken Litkowski

One of the current major research topics in computational linguistics is semantic role labeling. This topic has been the subject of a previous Senseval task and of a Conference on Natural Language Learning (CoNLL) shared task, and a special issue of Computational Linguistics on semantic role labeling has been announced. The behavior of prepositions has likewise been the subject of considerable research, with two recent ACL workshops and a planned special issue of Computational Linguistics on prepositions.

To a large extent, however, prepositions have received little attention in previous research. They have been relegated to minor roles, with little variation in their treatment in resources such as the Penn Treebank. Similarly, within the lexicographic community (and in dictionaries), prepositions are rarely given the full corpus analysis accorded to other parts of speech, particularly verbs. Where prepositions have been treated computationally, they have been closely tied to verbs, as markers of internal arguments. Although often regarded as "mere" function words, prepositions exhibit polysemy comparable to that of other parts of speech. Fortunately, prepositions form a largely closed class, so their number is relatively small. Because prepositions carry much semantic information, techniques for disambiguating them would be of great benefit to the computational linguistics community, particularly if those techniques were developed through a concentrated community effort.

The publicly available Preposition Project has been developed to provide a comprehensive treatment of preposition behavior. As part of the project, Oxford University Press has made its preposition sense inventory publicly available. A professional lexicographer has used this sense inventory to tag large numbers of preposition instances from FrameNet. As a result, tagged instances are available for more than 50 prepositions, with counts ranging from 100 instances to over 4,000 for the preposition "of". In addition, these tagged instances have been prepared in the format used in previous Sensevals, so they can be used in the task immediately.

The task will be carried out in the same manner as previous Senseval lexical sample tasks and will follow the same evaluation methodology, including the same evaluation scripts, with scoring available for both fine-grained and coarse-grained disambiguation. All the necessary resources are already available to potential participants.
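Senseval lexical sample scoring reports precision (correct answers divided by instances attempted) and recall (correct answers divided by all instances). As a rough illustration of how fine-grained and coarse-grained scores can differ, the following is a minimal Python sketch, not the official Senseval scoring script. It assumes one answer per instance (the official scorer also accepts multiple weighted answers, which this sketch omits), and the instance IDs, sense labels, and the fine-to-coarse mapping `coarse_map` are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def score(
    gold: Dict[str, str],
    answers: Dict[str, str],
    coarse_map: Optional[Dict[str, str]] = None,
) -> Tuple[float, float]:
    """Return (precision, recall) for single-answer sense tagging.

    If coarse_map is given, fine-grained senses are collapsed to their
    (hypothetical) coarse-grained groups before comparison.
    """
    total = len(gold)
    attempted = 0
    correct = 0
    for instance_id, gold_sense in gold.items():
        answer = answers.get(instance_id)
        if answer is None:
            continue  # unanswered instances lower recall but not precision
        attempted += 1
        if coarse_map is not None:
            # Senses missing from the map are treated as their own group.
            match = coarse_map.get(answer, answer) == coarse_map.get(gold_sense, gold_sense)
        else:
            match = answer == gold_sense
        if match:
            correct += 1
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# Hypothetical data for the preposition "of": four instances, one unanswered.
gold = {"of.1": "s1a", "of.2": "s1b", "of.3": "s2", "of.4": "s3"}
answers = {"of.1": "s1a", "of.2": "s1a", "of.3": "s3"}
coarse_map = {"s1a": "s1", "s1b": "s1", "s2": "s2", "s3": "s3"}

fine_p, fine_r = score(gold, answers)
coarse_p, coarse_r = score(gold, answers, coarse_map)
print(f"fine:   P={fine_p:.3f} R={fine_r:.3f}")
print(f"coarse: P={coarse_p:.3f} R={coarse_r:.3f}")
```

In this toy example, answering "s1a" where the gold sense is "s1b" counts as wrong at the fine-grained level but correct at the coarse-grained level, so the coarse scores are higher.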

Task Details (to be finalized)
1. All prepositions currently tagged in the Preposition Project (the 34 most common English prepositions as of August 1, 2006) will be included in the task. This task is expected to provide a definitive characterization of the closed class of prepositions, which will then be publicly available to all members of the computational linguistics community.

2. Participants may use other data available from the Preposition Project. Each instance includes an identifying number from the FrameNet project, so all information from the FrameNet tagging is available, including a syntactic characterization of the sentence elements as well as the FrameNet frames and frame elements.

Individuals interested in participating in the task are encouraged to discuss their concerns with the task organizer, Ken Litkowski.