Task #9: Multilevel Semantic Annotation of Catalan and Spanish

Task Organizers
Lluís Màrquez (TALP, Universitat Politècnica de Catalunya)
Maria Antònia Martí (CLiC, Universitat de Barcelona)
Mariona Taulé (CLiC, Universitat de Barcelona)
Luis Villarejo (TALP, Universitat Politècnica de Catalunya)

Web page: http://www.lsi.upc.edu/~nlp/semeval/msacs.html
Email: semeval-msacs@lsi.upc.edu

Task description
This task aims to evaluate and compare automatic systems for semantic annotation at several levels for Catalan and Spanish. Three semantic levels are considered: semantic role labeling with verb disambiguation, disambiguation of all nouns, and named entity recognition.

[1, Semantic Role Labeling, SRL] The semantic roles of verb predicates will be annotated in PropBank style (Palmer et al. 2005; Taulé et al. 2005; Taulé et al. 2006), and the task setting will be similar to that of the CoNLL-2005 shared task (http://www.lsi.upc.edu/~srlconll/). Verb disambiguation refers to the assignment of the proper role-set tag to the verb, which is a much coarser-grained distinction than the usual sense disambiguation. This tag is composed of the thematic structure number (as indexed in the role-set file for the verb predicate) and the lexico-semantic class, which is used to map the numbered arguments into semantic roles.

[2, Noun Sense Disambiguation, NSD] The disambiguation of nouns will be set up like an "all-words" disambiguation task. The sense repository used for the annotation will consist of the current versions of the Catalan and Spanish WordNets (see resources below).

[3, Named Entity Recognition, NER] The annotation of named entities will include recognition and classification of simple entity types (person, location, organization, etc.) and will also cover embedded entities. We will consider core "strong" entities (e.g., [US]_loc) and "weak" entities, which, by definition, include some strong entities (e.g., The [president of [US]_loc]_per) (Arévalo, Civit & Martí 2004; Arévalo et al. 2002).
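The embedding of entities described above can be pictured as nested token spans. The following is only a minimal sketch: the `Entity` class, the token-offset representation, and the rule "an entity is weak if it contains another entity" are illustrative assumptions, not the official annotation format or definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical span representation (not the official format)."""
    start: int   # index of the first token (inclusive)
    end: int     # index one past the last token (exclusive)
    label: str   # e.g. "per", "loc", "org"


def contains(outer: Entity, inner: Entity) -> bool:
    """True if `inner` lies inside `outer` and is not the same span."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def split_strong_weak(entities):
    """Assumption: an entity is 'weak' if it contains another entity."""
    weak = [e for e in entities if any(contains(e, o) for o in entities)]
    strong = [e for e in entities if e not in weak]
    return strong, weak


# The example from the text: The [president of [US]_loc]_per
tokens = ["The", "president", "of", "US"]
entities = [Entity(1, 4, "per"), Entity(3, 4, "loc")]
strong, weak = split_strong_weak(entities)
```

Under these assumptions, `strong` holds the `loc` span over "US" and `weak` holds the `per` span over "president of US".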

All semantic annotation tasks will be performed on exactly the same corpora for each language. We present all the annotation levels together as a complex global task, since we are interested in approaches that address these problems jointly, possibly taking into account cross-dependencies among them. However, we will also accept systems that approach the annotation in a pipeline style, or that address any of the particular subtasks in any of the languages (3 levels x 2 languages = 6 subtasks). See the evaluation section for details.

More specifically, the training input will consist of a medium-size set of sentences (100-200Kwords per language) with gold-standard full syntactic annotation (including function tags) and the semantic annotations of SRL, NSD, and NER, which constitute the target knowledge to be learned. The full parse trees are provided only to ease the learning process; participants are not required to use them. The test corpus will be about 10 times smaller than the training corpus and will include the full syntactic annotation without the semantic levels, which have to be predicted. To make the evaluation scenario realistic, parse trees for the test material will be automatically generated by state-of-the-art parsers, while for training both the gold-standard (hand-corrected) and the automatic parse trees will be provided.

Formats will be formally described later on, but they will be highly similar to those of the CoNLL-2005 shared task (column-style presentation of the levels of annotation), so that evaluation tools and already developed format-conversion scripts can be shared.
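Since the exact format has not been published yet, the following is only a sketch of how column-style data of this kind is typically read. It assumes one token per line, whitespace-separated columns, and a blank line between sentences. The function name, the sample rows, and the column contents are all hypothetical.

```python
def read_column_sentences(lines):
    """Group column-style rows into sentences.

    Assumes one token per line, whitespace-separated columns, and a
    blank line between sentences (an assumption, not the official spec).
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


# Hypothetical rows: word, part-of-speech tag, entity tag.
sample = [
    "Barcelona  np  (loc)",
    "creix      vm  -",
    "",
    "Madrid     np  (loc)",
]
sentences = read_column_sentences(sample)
# One column of the first sentence, e.g. the words:
words = [row[0] for row in sentences[0]]
```

Reading into a list of rows per sentence keeps every annotation level aligned with its token, which is the main convenience of the column layout.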

Evaluation
We will use standard evaluation metrics for each of the defined subtasks (SRL, NSD, NER), presumably based on precision/recall/F1 measures, since they are basically recognition tasks. Classification accuracy will also be calculated for verb disambiguation and NSD. Special metrics that relax the need for perfect matching of arguments/entities will also be studied for the NER and SRL subtasks.
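As an illustration of the measures named above, here is a minimal sketch of exact-match precision/recall/F1 over sets of labeled items, plus plain classification accuracy. The tuple layout `(sentence, start, end, label)` and the sample data are assumptions made for this example. The official scorer may differ.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over two collections of items."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(gold_labels, predicted_labels):
    """Fraction of aligned positions where the predicted label is correct."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return hits / len(gold_labels)


# Hypothetical items: (sentence id, start token, end token, label).
gold = {(0, 1, 4, "per"), (0, 3, 4, "loc"), (1, 0, 1, "org")}
pred = {(0, 1, 4, "per"), (0, 3, 4, "loc"), (1, 0, 1, "loc")}
p, r, f1 = precision_recall_f1(gold, pred)

# Hypothetical sense labels for four nouns.
acc = accuracy(["s1", "s2", "s1", "s3"], ["s1", "s2", "s2", "s3"])
```

Here an item counts as correct only if both its boundaries and its label match the gold standard. A relaxed variant would replace the set intersection with an overlap test.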

All systems will be ranked and studied according to the official evaluation metrics in each of the six subtasks (SRL-cat, NSD-cat, NER-cat, SRL-sp, NSD-sp, NER-sp). Additionally, global measures will be derived as a combination of all partial evaluations to rank systems' performance per language and for the complete global task (language independent).

The organizers will prepare a simple baseline processor for each of the subtasks. Teams that do not submit results for some of the subtasks will be scored with the baseline processors on those subtasks, so that global performance scores can still be computed.
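The baseline fallback can be sketched as follows. The way partial scores are combined has not been specified, so the unweighted mean used here is purely an assumption, as are the function names and all the numbers.

```python
from itertools import product
from statistics import mean

LEVELS = ("SRL", "NSD", "NER")
LANGUAGES = ("cat", "sp")
# The six subtasks: SRL-cat, NSD-cat, NER-cat, SRL-sp, NSD-sp, NER-sp.
SUBTASKS = [f"{level}-{lang}" for lang, level in product(LANGUAGES, LEVELS)]


def fill_with_baseline(system_scores, baseline_scores):
    """Use the baseline score for every subtask the system did not submit."""
    return {task: system_scores.get(task, baseline_scores[task])
            for task in SUBTASKS}


def global_scores(system_scores, baseline_scores):
    """Per-language and overall scores (assumption: unweighted means)."""
    filled = fill_with_baseline(system_scores, baseline_scores)
    per_language = {
        lang: mean(filled[f"{level}-{lang}"] for level in LEVELS)
        for lang in LANGUAGES
    }
    overall = mean(filled.values())
    return per_language, overall


# Hypothetical scores: a team that only submitted the two SRL subtasks.
baseline = {task: 0.5 for task in SUBTASKS}
system = {"SRL-cat": 0.8, "SRL-sp": 0.7}
per_language, overall = global_scores(system, baseline)
```

With this fallback, every team gets a comparable global score even when it addresses only some of the six subtasks.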

The evaluation on the test set will be carried out by the organizers based on the outputs submitted by participant systems.

The official evaluation software will be available to participants as soon as the training datasets are released.

Resources provided to the participants
To ease the participation of teams with few resources, tools, or experience with Spanish and Catalan, we will provide participants with as many resources and tools as possible. So far, we plan to provide the following:

* The full syntactic annotation level of training and test files, which can be very useful for feature extraction.

* Updated Catalan and Spanish WordNets, which are linked to English WordNet 1.6 for all noun synsets, some of them enriched with glosses, examples, collocations, etc.

* Role-set descriptions for all verbs in the training/test corpora.

* General scripts for format conversion, which are very useful for converting CoNLL-style files into representations better suited to automatic processing.

Development of resources
All the resources will be provided by the organizers. They are free for research use, so participants will face no special requirements to obtain and use them (signing a simple license agreement covering all the distributed materials will suffice). All these resources and tools are being developed in a joint effort by several NLP research groups, partially funded by the Spanish government under several projects: 3LB (FIT-150500-2002-244), responsible in 2003-2004 for the syntactic annotation of 100Kw Catalan and Spanish corpora together with noun/verb sense annotation; CESS-ECE (HUM-2004-21127-E), which is currently extending the 3LB corpora to 500Kw, including a first annotation of semantic roles; and a probable follow-up project, PRAXEM, which will provide extra resources to complete the SRL annotation and to include the labeling of named entities.

By the time of the SemEval-2007 exercise, we can guarantee that a portion of the corpus of between 100Kw and 200Kw will be completely annotated with semantic information.

Contact address
Please direct all your questions regarding the SemEval-2007 task on Multilevel Semantic Annotation of Catalan and Spanish to the following email address: semeval-msacs@lsi.upc.edu

Task URL: http://www.lsi.upc.edu/~nlp/semeval/msacs.html

References
Arévalo, M., M. Civit and M.A. Martí (2004) MICE: a Module for Named-Entities Recognition and Classification, in International Journal of Corpus Linguistics, vol. 9 num. 1. John Benjamins, Amsterdam.

Arévalo, M., X. Carreras, L. Màrquez, M.A. Martí, L. Padró, M.J. Simón (2002) A proposal for Wide-Coverage Spanish Named Entity Recognition, in Procesamiento del Lenguaje Natural, revista 28. SEPLN, Alicante.

Palmer, M., D. Gildea, P. Kingsbury (2005) The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics, 31 (1), MIT Press, USA.

Taulé, M., J. Aparicio, J. Castellví, M.A. Martí (2005) 'Mapping syntactic functions into semantic roles', Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005). Barcelona: Universitat de Barcelona.

Taulé, M., J. Castellví, M.A. Martí, J. Aparicio (2006) 'Fundamentos teóricos y metodológicos para el etiquetado semántico de CESS-CAT y CESS-ESP', Procesamiento del Lenguaje Natural, SEPLN, Zaragoza.