Organizers

Sameer Pradhan, Martha Palmer, and Edward Loper


Subtask 1: Coarse-grained English Lexical Sample WSD

This task provides lexical-sample-style training and test data for 35 nouns and 65 verbs drawn from the WSJ Penn Treebank II and the Brown corpus. For each target item, the data includes OntoNotes sense tags (groupings of WordNet senses that are coarser-grained than traditional WordNet entries and that have achieved 90% inter-tagger agreement on average), as well as the sense inventory for these lemmas.

  • Track 1: Closed track -- participants can use only the training data supplied, plus the Charniak parses provided with the data, any features that can be extracted from WordNet 2.1, and any unsupervised techniques (a sketch of such WordNet feature extraction follows this list).
  • Track 2: Open track -- participants can use any additional data, including the entire training portions (sections 02-21) of the Penn Treebank and PropBank (if they are LDC members) or tools trained on these data, other potential knowledge sources for WSD, and any unsupervised techniques.
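
For concreteness, below is a minimal sketch of the kind of WordNet-derived features the closed track permits: it scores each candidate sense of a target by gloss/context overlap (a simple Lesk-style measure) and collects immediate hypernyms. It uses NLTK's WordNet interface, which bundles WordNet 3.0 rather than the 2.1 release specified above, and the helper name wordnet_features is our own illustration, not part of any released task tooling.

    # Sketch only: NLTK bundles WordNet 3.0, not the WordNet 2.1 release
    # specified for the closed track, so sense numbering can differ.
    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def wordnet_features(target_lemma, context_words, pos=wn.VERB):
        """Hypothetical helper: WordNet-derived features for one instance."""
        context = {w.lower() for w in context_words}
        features = {}
        for synset in wn.synsets(target_lemma, pos=pos):
            # Lesk-style signal: overlap between the sense gloss and context.
            gloss = set(synset.definition().lower().split())
            features[('gloss_overlap', synset.name())] = len(gloss & context)
            # Structural signal: immediate hypernyms of each candidate sense.
            for hyper in synset.hypernyms():
                features[('hypernym', synset.name(), hyper.name())] = 1
        return features

    # Example: features for the verb "call" in a short context window.
    feats = wordnet_features('call', 'the chairman will call a meeting'.split())

Features like these would typically be fed, together with features from the supplied training data and parses, into any supervised classifier of the participant's choosing.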

This data has been made available to Eneko Agirre and Aitor Soroa for use in the Word Sense Induction task, and also to German Rigau and Montse Cuadros for their evaluation of lexical resources. As described above, the OntoNotes senses have links to WordNet senses.

Subtask 2: Coarse-grained English Lexical Sample SRL

For the same lemmas (but not necessarily exactly the same training and testing instances), we will also supply:

  • PropBank annotation
  • VerbNet class membership and VerbNet thematic role labels for the same targets
  • Charniak parses

This will support a second subtask for SRL, in both PropBank style and VerbNet style; an illustration of the two annotation styles follows the track list below. We propose that the SRL subtask have two evaluation tracks.

  • Track 1: Closed track -- participants can use only the training data supplied, plus the downloadable VerbNet, as well as any unsupervised techniques.
  • Track 2: Open track -- participants can use any additional data, including the entire training portions (sections 02-21) of the Penn Treebank and PropBank (if they are LDC members), or tools trained on those data.
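
To make the two annotation styles concrete, here is a hypothetical representation of a single annotated instance. The roleset and class identifiers and the span encoding are illustrative placeholders, not the actual release format.

    # Illustrative only: roleset, class, and span encodings are made up to
    # show how PropBank-style and VerbNet-style labels cover one predicate.
    instance = {
        'sentence': ['The', 'chairman', 'called', 'a', 'meeting', 'yesterday'],
        'target': 2,                        # token index of the predicate
        'propbank': {
            'roleset': 'call.XX',           # placeholder PropBank roleset id
            'args': {(0, 1): 'ARG0',        # "The chairman"
                     (3, 4): 'ARG1',        # "a meeting"
                     (5, 5): 'ARGM-TMP'},   # "yesterday"
        },
        'verbnet': {
            'class': 'XX.Y',                # placeholder VerbNet class id
            'roles': {(0, 1): 'Agent',
                      (3, 4): 'Theme'},
        },
    }

The point of the parallel layers is that the same argument spans carry both PropBank argument labels and VerbNet thematic roles, so systems can be trained and scored in either style.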

Subtask 3: Fine-grained English All-Words

We have supplied a 5,000-word chunk of the WSJ in which all verbs and the head words of the verbs' arguments carry WordNet 2.1 sense tags. This data is for testing only; it has no associated training annotation and no PropBank or VerbNet labels. Participants may of course use SemCor and the previous Senseval data as training data if they choose (a sketch of extracting such training examples follows the track listing below). We have coordinated with Roberto Navigli and Ken Litkowski so that their Coarse-grained All-Words task annotates the same data. Since there is no training data for this task, there is no closed track.

  • Track: Open track -- participants can use any additional data, including the entire Penn Treebank and PropBank (if they are LDC members).
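
Since SemCor is the natural training source here, the following is a minimal sketch of pulling sense-tagged examples from it with NLTK. Note that NLTK's SemCor is tagged against its bundled WordNet (3.0), not WordNet 2.1, so a sense mapping would be needed in practice; semcor_examples is our own illustrative helper.

    # Sketch only: requires nltk.download('semcor'); NLTK's SemCor uses the
    # bundled WordNet, so senses may need mapping to WordNet 2.1.
    from nltk.corpus import semcor
    from nltk.tree import Tree

    def semcor_examples(max_sents=100):
        """Yield (lemma, sense, sentence_tokens) triples from SemCor."""
        for sent in semcor.tagged_sents(tag='sem')[:max_sents]:
            tokens = [w for chunk in sent
                      for w in (chunk.leaves() if isinstance(chunk, Tree)
                                else chunk)]
            for chunk in sent:
                # Sense-tagged chunks are Trees whose label is a WordNet
                # Lemma object; untagged tokens are plain lists of strings.
                if isinstance(chunk, Tree) and not isinstance(chunk.label(), str):
                    lemma = chunk.label()
                    yield lemma.name(), lemma.synset().name(), tokens

    for lemma, sense, tokens in semcor_examples(5):
        print(lemma, sense)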

The data tagged with OntoNotes sense tags was prepared under DARPA GALE funding at the University of Colorado (verbs) and ISI (nouns). The VerbNet labels were attached to the PropBank data at the University of Colorado using a semi-automatic process that included a hand-correction step. This work was funded by AQUAINT and is described in a NAACL-07 paper (Yi, Loper, and Palmer).

The thematic role labels will be evaluated using precision and recall against the gold-standard test data, in the same way PropBank was evaluated at CoNLL. The sense tags will be evaluated using precision and recall in the same way Senseval English sense tags have been evaluated.
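
The sketch below (with hypothetical instance ids and labels) makes the computation explicit: precision counts correct answers over answers attempted, while recall counts correct answers over all gold-standard instances, so leaving items unanswered lowers recall but not precision.

    # Sketch of Senseval/CoNLL-style scoring; ids and labels are made up.
    def precision_recall(system, gold):
        """system, gold: dicts mapping instance id -> sense (or role) label."""
        attempted = [i for i in system if system[i] is not None]
        correct = sum(1 for i in attempted if gold.get(i) == system[i])
        precision = correct / len(attempted) if attempted else 0.0
        recall = correct / len(gold) if gold else 0.0
        return precision, recall

    gold   = {'call.1': 'sense-a', 'call.2': 'sense-b', 'call.3': 'sense-a'}
    system = {'call.1': 'sense-a', 'call.2': 'sense-a'}  # one item skipped
    p, r = precision_recall(system, gold)  # p = 1/2, r = 1/3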

LDC has agreed to let SemEval distribute the raw WSJ text for the data.


Deadline:

We have extended the deadline for submitting results to midnight on April 4th.
