Task #9: Multilevel Semantic Annotation of Catalan and Spanish

Web page: http://www.lsi.upc.edu/~nlp/semeval/msacs.html

Email address: semeval-msacs@lsi.upc.edu

In this task, we aim to evaluate and compare automatic systems for semantic annotation at several levels for the Catalan and Spanish languages. The three semantic levels considered are: semantic roles and verb disambiguation, disambiguation of all nouns, and named entity recognition.

[1, Semantic Role Labeling, SRL] The annotation of semantic roles of verb predicates will follow the PropBank style (Palmer et al. 2005; Taulé et al. 2005; Taulé et al. 2006), and the task setting will be similar to that of the CoNLL-2005 shared task (http://www.lsi.upc.edu/~srlconll/). Verb disambiguation refers to assigning the proper role-set tag to each verb, which is a much coarser-grained distinction than the usual sense disambiguation. This tag is composed of the thematic structure number (as indexed in the role-set file for the verb predicate) and the lexico-semantic class, which is used to map the numbered arguments onto semantic roles.
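The CoNLL-2005 shared task, whose setting this subtask follows, encodes the arguments of each predicate as a column of start-end tags, one tag per token: "(A0*" opens an A0 argument, "*" marks a token with no boundary, "*)" closes the open argument, and "(V*)" marks a one-token verb. Assuming the Catalan and Spanish data use the same encoding (the actual distribution format and argument labels of this task may differ), a minimal sketch that converts one such column into labelled token spans could look like this. The function name `parse_props_column` and the example sentence are hypothetical.

```python
from typing import List, Tuple

# (label, first token index, last token index); both indices inclusive
Span = Tuple[str, int, int]


def parse_props_column(tags: List[str]) -> List[Span]:
    """Turn one CoNLL-2005-style start-end column into labelled spans.

    Tags look like "(A0*" (open A0), "*" (no boundary), "*)" (close the
    most recently opened span) or "(V*)" (open and close on one token).
    Spans are returned in the order in which they close.
    """
    spans: List[Span] = []
    open_spans: List[Tuple[str, int]] = []  # stack of (label, start index)
    for i, tag in enumerate(tags):
        star = tag.find("*")
        if star < 0:
            raise ValueError(f"token {i}: tag {tag!r} has no '*'")
        # Text before '*' is zero or more "(LABEL" openings.
        for label in tag[:star].split("(")[1:]:
            open_spans.append((label, i))
        # Each ')' after '*' closes the innermost open span on this token.
        for _ in range(tag[star + 1:].count(")")):
            if not open_spans:
                raise ValueError(f"token {i}: ')' without a matching opening")
            label, start = open_spans.pop()
            spans.append((label, start, i))
    if open_spans:
        raise ValueError(f"unclosed spans: {open_spans}")
    return spans


if __name__ == "__main__":
    # Hypothetical example: "El presidente inauguró el puente ."
    column = ["(A0*", "*)", "(V*)", "(A1*", "*)", "*"]
    print(parse_props_column(column))
```

For the hypothetical sentence above, the column yields `[("A0", 0, 1), ("V", 2, 2), ("A1", 3, 4)]`. Arguments of a single predicate normally do not overlap, so the stack rarely holds more than one entry, but using a stack keeps the parser correct if openings and closings ever nest.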

[2, Noun Sense Disambiguation, NSD] The disambiguation of nouns will be set up like an "all-words" disambiguation task. The sense repository used for the annotation will consist of the current versions of the Catalan and Spanish WordNets (see resources below).

[3, Named Entity Recognition, NER] The annotation of named entities will include the recognition and classification of simple entity types (person, location, organization, etc.), while also covering entities embedded within other entities. We will consider core "strong" entities (e.g., [US]_loc) and "weak" entities, which by definition include some strong entities (e.g., The [president of [US]_loc]_per) (Arévalo, Civit & Martí 2004; Arévalo et al. 2002).
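The bracket notation in the examples above can be read as a tree: each "[...]_type" span is an entity, and a span that contains another span embeds it. The sketch below parses that inline notation into nested entities and labels an entity "weak" when it embeds at least one other entity and "strong" otherwise. That rule is a simplification of the strong/weak distinction, assumed here only for illustration; the notation is the one used in the examples of this description, not necessarily the format of the released corpora, and the names `Entity`, `parse_nested_entities`, and `walk` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Entity type label written right after a closing bracket, e.g. "_loc".
LABEL = re.compile(r"_([A-Za-z]+)")


@dataclass
class Entity:
    label: str
    text: str
    children: List["Entity"] = field(default_factory=list)

    @property
    def kind(self) -> str:
        # Simplifying assumption: embedding another entity makes it "weak".
        return "weak" if self.children else "strong"


def parse_nested_entities(s: str) -> Tuple[str, List[Entity]]:
    """Parse e.g. 'The [president of [US]_loc]_per'.

    Returns the text with markup removed and the top-level entities;
    embedded entities are reachable through each entity's children.
    """
    plain: List[str] = []                         # output characters
    roots: List[Entity] = []                      # outermost entities
    stack: List[Tuple[int, List[Entity]]] = []    # (start offset, children)
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "[":
            stack.append((len(plain), []))
            i += 1
        elif ch == "]":
            m = LABEL.match(s, i + 1)
            if m is None or not stack:
                raise ValueError(f"malformed entity close at offset {i}")
            start, children = stack.pop()
            entity = Entity(m.group(1), "".join(plain[start:]), children)
            # Attach to the enclosing entity, or to the top level.
            (stack[-1][1] if stack else roots).append(entity)
            i = m.end()
        else:
            plain.append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed entity bracket")
    return "".join(plain), roots


def walk(entities: List[Entity]) -> Iterator[Entity]:
    """Yield every entity, outer ones before the ones they embed."""
    for e in entities:
        yield e
        yield from walk(e.children)


if __name__ == "__main__":
    text, roots = parse_nested_entities("The [president of [US]_loc]_per")
    print(text)
    print([(e.label, e.text, e.kind) for e in walk(roots)])
```

For the weak-entity example above, the plain text is "The president of US" and `walk` yields `('per', 'president of US', 'weak')` followed by `('loc', 'US', 'strong')`.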

All semantic annotation tasks will be performed on exactly the same corpora for each language. We present all the annotation levels together as a single complex global task, since we are interested in approaches that address these problems jointly, possibly taking into account cross-dependencies among them. However, we will also accept systems that perform the annotation in a pipeline, or that address any of the particular subtasks in any of the languages (3 levels x 2 languages = 6 subtasks).

Lluís Màrquez (TALP, Universitat Politècnica de Catalunya)
Maria Antònia Martí (CLiC, Universitat de Barcelona)
Mariona Taulé (CLiC, Universitat de Barcelona)
Luis Villarejo (TALP, Universitat Politècnica de Catalunya)