Task #9: Multilevel Semantic Annotation of Catalan and Spanish
Web page: http://www.lsi.upc.edu/~nlp/semeval/msacs.html
Email address: semeval-msacs@lsi.upc.edu
In this task, we aim to evaluate and compare automatic systems for
semantic annotation at several levels for Catalan and Spanish. The
three semantic levels considered are: semantic roles and verb
disambiguation, disambiguation of all nouns, and named entity
recognition.
[1, Semantic Role Labeling, SRL] The annotation of semantic roles of
verb predicates will be in PropBank style (Palmer et al. 2005; Taulé
et al. 2005; Taulé et al. 2006), and the task setting will be similar
to that of the CoNLL-2005 shared task
(http://www.lsi.upc.edu/~srlconll/). Verb disambiguation refers to
the assignment of the proper role-set tag to the verb, which is a much
coarser-grained level than the usual sense disambiguation. This tag is
composed of the thematic structure number (as indexed in the role set
file for the verb predicate) and the lexico-semantic class, which is
used to map the numbered arguments into semantic roles.
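As an illustration only, the two-part role-set tag can be pictured as a
small record whose class component selects a mapping from numbered
arguments to semantic roles. The class names, role labels, and the
ROLE_MAP table below are hypothetical placeholders, not the inventory
or file format used in the task.

```python
from dataclasses import dataclass

# Hypothetical mapping: the class names and role labels are placeholders
# chosen for illustration, not the task's actual inventory.
ROLE_MAP = {
    "classA": {"Arg0": "Agent", "Arg1": "Patient"},
    "classB": {"Arg0": "Experiencer", "Arg1": "Theme"},
}


@dataclass(frozen=True)
class RoleSetTag:
    """A role-set tag: thematic structure number plus lexico-semantic class."""
    structure_number: int   # index of the thematic structure in the role set file
    semantic_class: str     # lexico-semantic class of the verb predicate

    def roles(self, numbered_args):
        """Map numbered arguments to semantic roles via the class.

        Arguments without an entry for this class are returned unchanged.
        """
        mapping = ROLE_MAP[self.semantic_class]
        return [mapping.get(arg, arg) for arg in numbered_args]


tag = RoleSetTag(structure_number=1, semantic_class="classA")
mapped = tag.roles(["Arg0", "Arg1", "ArgM"])
```

Under these assumptions, the same numbered argument receives a
different semantic role depending on the class carried by the tag.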
[2, Noun Sense Disambiguation, NSD] The disambiguation of nouns will
take the form of an "all-words" disambiguation task. The sense
repository used for the annotation will consist of the current
versions of the Catalan and Spanish WordNets (see resources below).
[3, Named Entity Recognition, NER] The annotation of named entities
will include recognition and classification of simple entity types
(person, location, organization, etc.) and will allow entities to be
embedded in one another. We will consider core "strong" entities
(e.g., [US]_loc) and "weak" entities, which, by definition, contain
one or more strong entities (e.g., The [president of [US]_loc]_per)
(Arévalo, Civit & Martí 2004; Arévalo et al. 2002).
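The bracketed notation used in the examples above can be read
mechanically. The following is a minimal sketch, assuming that every
closing bracket is immediately followed by an underscore and a label,
and treating an entity as "weak" simply when another entity is nested
inside it. The function name parse_entities and the output layout are
assumptions made for illustration, not the official data format.

```python
import re

# A closing bracket must be followed by "_label", as in "[US]_loc".
LABEL = re.compile(r"_([A-Za-z]+)")


def parse_entities(annotated):
    """Return (plain_text, entities) for a bracket-annotated string."""
    plain = []      # characters of the text with the markup removed
    stack = []      # start offsets (in the plain text) of open brackets
    entities = []
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            stack.append(len(plain))
            i += 1
        elif ch == "]":
            match = LABEL.match(annotated, i + 1)
            if not stack or match is None:
                raise ValueError(f"malformed annotation at offset {i}")
            start = stack.pop()
            entities.append(
                {"start": start, "end": len(plain), "label": match.group(1)}
            )
            i = match.end()
        else:
            plain.append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed bracket")

    text = "".join(plain)
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
        # Simplifying assumption: an entity is weak if it contains another one.
        contains_other = any(
            other is not ent
            and ent["start"] <= other["start"]
            and other["end"] <= ent["end"]
            for other in entities
        )
        ent["kind"] = "weak" if contains_other else "strong"
    return text, entities


text, ents = parse_entities("The [president of [US]_loc]_per")
```

Inner entities are closed first, so they appear before the entities
that contain them in the returned list.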
All semantic annotation tasks will be performed on exactly the same
corpora for each language. We present all the annotation levels
together as a complex global task, since we are interested in
approaches which address these problems jointly, possibly taking into
account cross-dependencies among them. However, we will also accept
systems that approach the annotation in a pipeline style, or that
address any of the particular subtasks in any of the languages (3
levels x 2 languages = 6 subtasks).
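The six subtasks are simply every pairing of an annotation level with
a language. A short sketch of that enumeration follows; the
"LEVEL-Language" naming scheme is an assumption made for illustration
and is not an official identifier.

```python
from itertools import product

LEVELS = ("SRL", "NSD", "NER")        # the three annotation levels
LANGUAGES = ("Catalan", "Spanish")    # the two languages

# Hypothetical naming scheme: one identifier per (level, language) pair.
SUBTASKS = [f"{level}-{language}" for level, language in product(LEVELS, LANGUAGES)]
```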
Organizers
Lluís Màrquez (TALP, Universitat Politècnica de Catalunya)
Maria Antònia Martí (CLiC, Universitat de Barcelona)
Mariona Taulé (CLiC, Universitat de Barcelona)
Luis Villarejo (TALP, Universitat Politècnica de Catalunya)