Online TPP
CL Research, 2006
Please note that the online version of The Preposition Project (TPP) is
intended for quick reference of the data available in the project. The data is
in the process of active development and is not yet complete. Currently (12/9/06),
semantic relation or role names are available for 558 senses of prepositions
and prepositional phrases through the word off, and for major
prepositions thereafter (on, over, round,
through, to, toward(s), and with). There are
approximately 847 senses of 374 prepositions and prepositional phrases in
total. Online TPP will be updated monthly.
Online TPP is based on data generated in TPP (the link to the project provides
an in-depth discussion of how the data is being developed). TPP combines data
from two major sources: the
Oxford Dictionary of English (Oxford University Press,
2003) and sentence instances in the FrameNet project for individual
prepositions. It includes all the data from ODE that appears in the printed
text (primarily definitions and usage examples), plus a considerable range of
additional data describing the behavior of each preposition sense. The intent
is that such data will be useful in text processing applications that can use
information about preposition behavior in understanding the semantic relations
of textual units (the objects of the prepositions).
The display routines for Online TPP are adapted from various CGI scripts
provided by James McCracken of Oxford University Press. These scripts were
developed as a demonstration project used to display the full contents of ODE.
Online TPP does not employ the full range of capabilities used on Online ODE,
such as full disambiguation of all content words in definitions and the display
of noun and domain hierarchies.
In addition to all the components which appear in the printed text of the
Oxford Dictionary of English, the Online TPP data has a number of
special features which distinguish it from typical dictionary databases. These
features make it particularly suitable for computational applications, both as
an electronic dictionary and as a database for natural language processing
applications.
Some aspects of these features are described below:
Data structure and lexical objects — Fundamental to all the
functionality of Online ODE is the fact that all the data is structured as a
series of discrete lexical objects. These function as small packets of data
which exist independently of each other, each containing all relevant
information about meaning, morphology, lexical function, semantic category,
etc. Crucially, each lexical object corresponds to a sense
rather than to a dictionary entry. Hence every sense may be queried, extracted,
or manipulated as an independent packet of data without any dependencies on the
entry in which it appears. Although this can seem counterintuitive to human
readers used to treating the entry as the basic object in a dictionary, from a
computational point of view it allows a much more detailed and exhaustive
specification of the way the language functions on a sense-by-sense basis.
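As a minimal sketch of the sense-as-object idea described above, each sense can be modeled as a self-contained record that is queried without reference to the entry it appears in. This is not the actual TPP or ODE schema: the class name `SenseObject`, its fields, the sense identifiers, and the placeholder values are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SenseObject:
    """One lexical object: a single sense, independent of its entry.

    Hypothetical layout for illustration only; not the real TPP schema.
    """
    sense_id: str      # hypothetical identifier
    headword: str      # entry in which the sense happens to appear
    definition: str
    properties: dict = field(default_factory=dict)


# Invented sample data; the definitions are placeholders, not ODE text.
SENSES = [
    SenseObject("over.1", "over", "placeholder definition A",
                {"semantic relation": "placeholder relation A"}),
    SenseObject("over.2", "over", "placeholder definition B"),
    SenseObject("with.1", "with", "placeholder definition C",
                {"semantic relation": "placeholder relation C"}),
]


def senses_with_property(senses, name):
    """Return every sense carrying the named property, across all entries."""
    return [s for s in senses if name in s.properties]


matches = senses_with_property(SENSES, "semantic relation")
print([s.sense_id for s in matches])
```

Because the query runs over senses rather than entries, the second sense of "over" is skipped while senses from two different entries are returned together.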
Lexical and phrasal morphology — Every lexical object in Online
TPP provides a complete and discrete specification of all the lexical forms
relevant to its sense. This includes not only the morphology of the word forms
themselves, but also structured data about their syntactic roles, variant
spellings, British and US spellings, alternative forms, and the correspondence
relationships between them (the source data also includes phonetics). This is
true not only for single-word lexemes but also for multi-word phrases. Online
TPP thus provides a facility for robust and positive lookup of real-world
lexical forms, including permutation of phrases.
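The lookup of real-world forms described above can be pictured as an index from every listed form of a sense to that sense. The following is only a sketch under stated assumptions: the sense identifiers, the sample phrase, and the helper names `normalize`, `build_form_index`, and `lookup` are hypothetical, and the sketch covers variant forms only, not the permutation of phrases.

```python
from collections import defaultdict


def normalize(form):
    """Lower-case and collapse whitespace so lookups tolerate case and spacing."""
    return " ".join(form.lower().split())


def build_form_index(sense_forms):
    """Map each normalized surface form to the set of sense ids that list it."""
    index = defaultdict(set)
    for sense_id, forms in sense_forms.items():
        for form in forms:
            index[normalize(form)].add(sense_id)
    return index


def lookup(index, query):
    """Return the sorted sense ids whose forms match the query, if any."""
    return sorted(index.get(normalize(query), ()))


# Invented sample data: one sense with a variant form, one multi-word phrase.
SENSE_FORMS = {
    "toward.1": ["toward", "towards"],
    "in_front_of.1": ["in front of"],
}

INDEX = build_form_index(SENSE_FORMS)
print(lookup(INDEX, "Towards"))
print(lookup(INDEX, "in  front of"))
```

Indexing every variant at build time keeps the lookup itself to a single dictionary access, so a variant and its main form resolve to the same sense.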
The top bar includes a quick search box, which searches for the query
term, either a one-word preposition or a prepositional phrase. To run the
search, click the Go button or press the Return key. The top bar
also includes a link to the main site for The Preposition Project, a link to a
feedback page (at CL Research) for making comments or asking questions, and a
link to this help page.
Note that if a simple search returns no results, Online TPP may report that no
match was found and suggest ways to revise the search term. Alternatively, it
may display a set of candidate entries in a table with four columns (the matched
lemma, the part of speech of the matched lemma, the definition for the first
matching sense in the entry, and the headword of the entry), in which case
clicking on the matched lemma will take you to the desired entry.
The entry display includes a top bar with «
and » buttons linking to the previous and
next entries in the dictionary. The definitions for an entry appear next in
the left column of a two-column display. For entries that have a preposition
part of speech in ODE, this part of speech is shown. For phrases, no part of
speech is shown. If an entry contains more than one core sense, the core
senses are numbered. Subsenses are displayed in bullets underneath a core
sense.
At the bottom of each entry are controls for modifying the information that is
displayed in the right column of the entry display. This column displays one of
the twelve properties associated with each lexical object (i.e., sense
or subsense). Clicking on a control displays the selected property for all
senses and subsenses. However, it is possible that a given sense or subsense
does not have the particular property, in which case clicking on a property
type produces an empty right column. The following properties
are provided, with the primary source of the property in parentheses (see note
above for the proportion of entries and senses for which these properties are
available):
- labels mode shows classification or domain labels (relatively few
preposition senses have such domain-specific labels) (ODE)
- word forms mode shows word forms and inflectional morphology (for
simple prepositions, this will be only the preposition itself, while for
phrases, some variant forms may be listed) (ODE)
- semantic relations mode shows the semantic relation or semantic role
name that has been assigned by the lexicographer (TPP)
- complement properties mode shows properties of the object of the
preposition or phrasal preposition (TPP)
- attachment properties mode shows properties of the linguistic entity
to which the prepositional phrase (i.e., the preposition plus the complement)
attaches, usually a noun, a verb, or an adjective (TPP)
- Quirk syntax mode shows the syntactic position where a prepositional
phrase headed by the preposition or phrasal preposition may occur (noun
postmodifier (1); adverbial adjunct (2a), subjunct (2b),
disjunct (2c), or conjunct (2d); and/or verb (3a) or
adjective (3b) complement, as described in paragraph 9.1, p. 657
of Quirk et al.) (TPP)
- Quirk paragraph mode shows the paragraph(s) in Quirk et al. where an
in-depth discussion of the sense may be found, with an asterisk (*) indicating
that the sense is not discussed (TPP)
- Frame::Element mode shows FrameNet (FN) Frame::FrameElement pairs in
which prepositional phrases headed by the given preposition appear in the set
of instance sentences tagged by the lexicographer (note that instance sets are
available in FN for only about major, single-word prepositions) (FN, TPP)
- other prepositions (short) mode shows other prepositions that have a
highly similar sense, as judged by the lexicographer (TPP)
- other prepositions (long) mode shows other prepositions that have
been found in the FN sentences expressing the same Frame::Element pair (TPP)
- sense relations mode shows the relation of the subsenses to the core
sense, usually either specific (a narrower sense) or extension
(a broader sense)
- comments mode shows any notes the lexicographer may have made in
analyzing the sense
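The behavior of the right column can be sketched as follows: selecting a mode yields one cell per sense, and a sense that lacks the property yields an empty cell. The code-to-label table follows the Quirk syntax description above. Everything else is an assumption made for illustration: the function name `right_column`, the representation of senses as dictionaries, and the storage of Quirk codes as lists are hypothetical and do not describe the actual Online TPP scripts.

```python
# Code-to-label table taken from the Quirk syntax description above.
QUIRK_SYNTAX = {
    "1": "noun postmodifier",
    "2a": "adverbial adjunct",
    "2b": "adverbial subjunct",
    "2c": "adverbial disjunct",
    "2d": "adverbial conjunct",
    "3a": "verb complement",
    "3b": "adjective complement",
}


def right_column(senses, mode):
    """Return one display cell per sense for the selected property mode.

    A sense lacking the property yields an empty string, mirroring the
    empty right column described above.
    """
    cells = []
    for sense in senses:
        value = sense.get(mode)
        if value is None:
            cells.append("")
        elif mode == "Quirk syntax":
            # Expand each stored code; unknown codes are shown unchanged.
            cells.append("; ".join(QUIRK_SYNTAX.get(c, c) for c in value))
        else:
            cells.append(str(value))
    return cells


# Invented senses; storing Quirk codes as lists is an assumption.
SAMPLE = [
    {"Quirk syntax": ["1", "2a"], "comments": "placeholder note"},
    {"Quirk syntax": ["3b"]},
]

print(right_column(SAMPLE, "Quirk syntax"))
print(right_column(SAMPLE, "comments"))
```

Returning an empty string rather than omitting the sense keeps the cells aligned with the definitions in the left column.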
Ken Litkowski
CL Research
2006