Task 15: TempEval Temporal Relation Identification

Organized by

Short Task Description

We specify three separate tasks that involve identifying event-time and event-event temporal relations. A restricted set of temporal relations will be used, which includes only the relations: BEFORE, AFTER, and OVERLAP (defined to encompass all cases where event intervals have non-empty overlap).

TASK A: For a restricted set of event terms, identify temporal relations between events and all time expressions appearing in the same sentence.
(NOTE: The restricted set of event terms is to be specified by providing a list of root forms. Time expressions will be annotated in the source, in accordance with TIMEX3.)
TASK B: For a restricted set of event terms, identify temporal relations between events and the Document Creation Time (DCT).
(NOTE: The restricted set of events will be the same as for Task A. DCTs will be explicitly annotated in the source.)
TASK C: Identify the temporal relations between contiguous pairs of matrix verbs.
(NOTE: Matrix verbs, i.e. the main verbs of the matrix clause in each sentence, will be explicitly annotated in the source.)
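As an illustration only (the submission format is not specified here), the output of all three tasks can be thought of as a set of labelled temporal links. A minimal sketch, in which the identifiers (e1, t1, dct) are hypothetical:

```python
# The restricted relation set from the short task description.
RELATIONS = {"BEFORE", "AFTER", "OVERLAP"}

def make_link(source_id, target_id, relation):
    """Build one labelled temporal link, validating the relation label."""
    assert relation in RELATIONS, f"unknown relation: {relation}"
    return (source_id, target_id, relation)

# Task A: event -> time expression in the same sentence
link_a = make_link("e1", "t1", "OVERLAP")
# Task B: event -> document creation time
link_b = make_link("e1", "dct", "BEFORE")
# Task C: matrix verb -> matrix verb of the following sentence
link_c = make_link("e3", "e7", "AFTER")
```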

Long Task Description

Newspaper texts, narratives and other such texts describe events which occur in time and specify the temporal location and order of these events. Text comprehension, even at the most general level, involves the capability to identify the events described in a text and locate them in time. This capability is crucial to a wide range of NLP applications, from document summarization and question answering to machine translation. Furthermore, recent work on the annotation of events and temporal relations has resulted in both a de facto standard for expressing these relations (TimeML) and a hand-built gold standard of annotated texts (TimeBank). These have already been used as the basis for automatic time and event annotation tasks in a number of research projects in recent years.

As in many areas of NLP, an open evaluation challenge in the area of temporal annotation will serve to drive research forward. The automatic identification of all temporal referring expressions, events and temporal relations within a text is the ultimate aim of research in this area. However, addressing this aim in a first evaluation challenge is likely to be too difficult, and a staged approach more effective. We therefore propose an initial evaluation exercise based on three limited tasks that we believe are realistic both from the perspective of assembling resources for development and testing and from the perspective of developing systems capable of addressing the tasks.

Task Definitions

Given a set of test texts (DataSet1) for which (1) sentence boundaries are annotated, (2) all temporal expressions are annotated in accordance with TIMEX3, (3) the document creation time (DCT) is specially annotated, and (4) a list of root forms of event-identifying terms (the Event Target List or ETL) is supplied, complete tasks A, B and C as described above, subject to the following notes:


  1. The restricted set of temporal relations contains: BEFORE, AFTER, and OVERLAP (defined to encompass all cases where event intervals have non-empty overlap). In addition, we allow three disjunctive relations: BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER and VAGUE (for completely underspecified relations).
  2. By "as appropriate" here is meant as indicated in the TimeML standard.
  3. For tasks A and B, in cases where there are multiple time expressions in the sentence, the event should be linked to all TIMEXs where appropriate.
  4. For the ETL we propose to use those terms which, counting all inflected variants, occur as events in TimeBank 20 times or more; this yields a list of around 63 root forms.
  5. Task C is the most ambitious of the three tasks proposed, one which we view as exploratory in nature. Given the challenges it presents we would not expect all participants to attempt it.


Participants will be supplied with a version of TimeBank (183 documents, approx. 2500 sentences) from which TimeML annotations have been removed or modified so that it contains only the information to be supplied in the test corpus, plus the TLINK annotations specified in the task definitions.

The test corpus will consist of a number of articles not currently included in TimeBank, annotated in accordance with the schemes outlined above. For tasks A and B, it is intended that this corpus should include at least 5 occurrences of each item in the ETL. For task C, we propose to annotate around 20-25 news articles (comprising on the order of 200-250 sentences) drawn from sources similar to those used for TimeBank.

Evaluation Methodology

Tasks A, B and C can all be seen as classification tasks, where a given temporal link is assigned a relation type from the set BEFORE, AFTER, OVERLAP, BEFORE-OR-OVERLAP, OVERLAP-OR-AFTER or VAGUE. Precision and recall over these relation types are used as evaluation metrics.
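For instance, per-relation-type precision and recall can be computed by comparing system links against gold links. A sketch under the assumption that links are keyed by (source, target) pairs; the official scorer may differ, e.g. in how disjunctive relations are credited:

```python
def precision_recall(gold, system, relation):
    """Precision and recall for one relation type.
    gold/system: dicts mapping a (source, target) pair to its relation label."""
    sys_hits = {pair for pair, rel in system.items() if rel == relation}
    gold_hits = {pair for pair, rel in gold.items() if rel == relation}
    correct = len(sys_hits & gold_hits)
    precision = correct / len(sys_hits) if sys_hits else 0.0
    recall = correct / len(gold_hits) if gold_hits else 0.0
    return precision, recall

# Hypothetical gold and system annotations for two links:
gold = {("e1", "t1"): "BEFORE", ("e2", "t1"): "OVERLAP"}
system = {("e1", "t1"): "BEFORE", ("e2", "t1"): "AFTER"}
print(precision_recall(gold, system, "BEFORE"))  # (1.0, 1.0)
```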