TempEval Final Training Data

February 26th, 2007

This document describes the TempEval data, how they were created, and the validation and scoring scripts that are bundled with the data. If needed, updates to this document will be posted on the TempEval website and on the TempEval Google group and mailing list (see the TempEval website for instructions on joining the mailing list). This document does not replace the task description on the SemEval and TempEval websites, but complements it.

Data Description

The TempEval annotation language is a simplified version of TimeML. The TimeML specifications, annotation guidelines and document type definition (all for TimeML version 1.2.1) are included here for easy reference. For TempEval, we use the following five tags:
<TempEval>
The document root.
<s>
The sentence tag. All sentence tags in the TempEval data were created automatically with the Alembic natural language processing tools. A sentence tag can contain TIMEX3 tags and EVENT tags, but no TLINK tags.
<TIMEX3>
Tags the time expressions in the text. It is identical to the TIMEX3 tag in TimeML. See the TimeML specifications and guidelines for further details on this tag and its attributes. Each document has one special TIMEX3 tag, the Document Creation Time, which is interpreted as an interval that spans the whole day.
<EVENT>
Tags the events in the text. The TempEval EVENT tag merges the information of two TimeML tags: EVENT and MAKEINSTANCE. TimeML uses these two tags to refer to the two instances of the event in sentences like "He taught on Wednesday and Friday". This complication was not necessary for the TempEval data. Both tags and their attributes are described in the TimeML specifications and guidelines. For TempEval task C, one extra attribute is added: mainevent, with possible values YES and NO.
<TLINK>
A simplified version of the TimeML TLINK tag. The relation types in the TimeML version form a fine-grained set based on James Allen's interval logic (James Allen, "Maintaining Knowledge about Temporal Intervals." Communications of the ACM 26, 11, 832-843, November 1983). For TempEval, we use only three relations, plus three disjunctions over them: BEFORE, OVERLAP, AFTER, BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER, and VAGUE. Here, OVERLAP refers to two events (or an event and a time interval) that have a non-empty overlap, and VAGUE is used for those cases where no particular relation can be established. An illustrative fragment showing all five tags follows below.
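
To make the format concrete, here is a small constructed fragment using all five tags. The attribute names follow the TimeML conventions (tid, value, eid, stem, relType, and so on); the bundled DTD and the distributed files are authoritative, and the mainevent attribute appears only in the task C data:

  <TempEval>
  <TIMEX3 tid="t0" type="DATE" value="1998-02-06"
          functionInDocument="CREATION_TIME">02/06/1998</TIMEX3>
  <s>The company <EVENT eid="e1" class="OCCURRENCE" stem="announce"
  tense="PAST" mainevent="YES">announced</EVENT> the deal
  <TIMEX3 tid="t1" type="DATE" value="1998-02-05">yesterday</TIMEX3>.</s>
  <TLINK lid="l1" relType="BEFORE" eventID="e1" relatedToTime="t0"/>
  </TempEval>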
The training data contain all TLINKs required by tasks A, B and C. In addition, the training data contain all event and timex information, including, for task C, markers indicating the main events of each sentence. Recall that tasks A and B are constrained to linking events from the event target list, which consists of those events that occur 20 times or more in the corpus. A complete list of stems, ordered by frequency, is included in the docs directory (only stems occurring more than once are included in the list).

The data directory has two subdirectories: one with the data for tasks A and B, containing 162 documents, and one with the data for task C, containing 163 documents. The discrepancy is due to one document whose Document Creation Time lies in the future, which makes task B rather hard to do; that document was removed from the task A and B training set.

Test Data

The test data are distributed separately from the training data. The format of the test data is identical to that of the training data, but there are two differences in content:
  1. the test data comprise a different set of documents
  2. the relation types of all TLINKs in the test documents are set to UNKNOWN
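
For example, a link that carries BEFORE in the training data would appear in the test data as follows (attributes illustrative, as in the fragment above):

  <TLINK lid="l1" relType="UNKNOWN" eventID="e1" relatedToTime="t0"/>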

Annotation Procedure

The EVENT and TIMEX3 annotations were taken from TimeBank (http://timeml.org/site/timebank/timebank.html). The annotation procedure for TLINKs included dual annotation by seven annotators using a web-based annotation interface (see the screen shot page for more details). After this phase, two experienced annotators adjudicated all cases where the two annotators had chosen different relation types. For task C, there was an extra annotation phase in which the main events were selected. Annotation guidelines for main event annotation are included in this distribution.

Validation

Included with the training data are a Perl validation script and a Document Type Definition (DTD) for TempEval annotation. All files in the training set have been validated. To validate TempEval files against the DTD, open a terminal window (Linux/Unix/Mac OS X) or a command prompt (Windows) and type the following:
% perl validate.pl ../data/taskAB
% perl validate.pl ../data/taskC
This will write validation errors and warnings to the standard output. All lines marked INFO-300 can be ignored; in general, they report reference counts. On Unix/Linux systems, these lines can be filtered out with:
% perl validate.pl ../data/taskAB | grep -v INFO-300
% perl validate.pl ../data/taskC | grep -v INFO-300

The script requires the Perl modules XML::Checker and XML::RegExp, both available from CPAN (http://www.cpan.org).
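
If the modules are not yet installed, they can be fetched with the standard CPAN shell, for example:
% perl -MCPAN -e 'install XML::Checker'
% perl -MCPAN -e 'install XML::RegExp'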

Evaluation

Also included with the training data is a Perl scoring script. It measures precision and recall under both a strict and a relaxed scoring scheme. See the evaluation document in the docs directory for more details.
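
As a rough illustration of the difference between the two schemes (the evaluation document is authoritative; in particular, the partial-credit fraction used below is an assumption made for this sketch), each TempEval relation can be viewed as a disjunction over the three basic relations:

  # Sketch only -- see the evaluation document for the real definitions.
  # Each TempEval label is viewed as a disjunction over basic relations.
  use strict;

  my %basic = (
      'BEFORE'            => ['BEFORE'],
      'OVERLAP'           => ['OVERLAP'],
      'AFTER'             => ['AFTER'],
      'BEFORE-OR-OVERLAP' => ['BEFORE', 'OVERLAP'],
      'OVERLAP-OR-AFTER'  => ['OVERLAP', 'AFTER'],
      'VAGUE'             => ['BEFORE', 'OVERLAP', 'AFTER'],
  );

  # Strict scheme: full credit only for an exact label match.
  sub strict_score {
      my ($key, $response) = @_;
      return $key eq $response ? 1 : 0;
  }

  # Relaxed scheme: partial credit when the two disjunctions overlap.
  # The fraction used here (shared basic relations divided by the size
  # of the key's disjunction) is an assumption, not the official rule.
  sub relaxed_score {
      my ($key, $response) = @_;
      my %in_key = map { $_ => 1 } @{ $basic{$key} };
      my @shared = grep { $in_key{$_} } @{ $basic{$response} };
      return @shared ? @shared / @{ $basic{$key} } : 0;
  }

Under this sketch, a response of BEFORE against a key of BEFORE-OR-OVERLAP scores 0 under the strict scheme but 0.5 under the relaxed one.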