There is growing interest in the task of classifying semantic relations between pairs of words. However, many different classification schemes have been used, which makes it difficult to compare the various classification algorithms. We will create a benchmark dataset and evaluation task that will enable researchers to compare their algorithms. To constrain the scope of the task, we have chosen a specific application for semantic relation classification: relational search. The application we envision is a kind of search engine that can answer queries such as "list all X such that X causes asthma." Given this application, we have decided to focus on semantic relations between nominals (i.e., nouns and base noun phrases, excluding named entities). The dataset for the task will consist of annotated sentences. We will select a sample of relation classes from several different classification schemes and then gather sentences from the Web using a search engine. We will manually mark up the sentences, indicating the nominals and their relations. Algorithms will be evaluated by their average classification performance over all of the sampled relations, but we will also be able to see whether some relations are more difficult to classify than others, and whether some algorithms are better suited to certain types of relations.
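The evaluation described above, a score per relation followed by an average over all sampled relations, can be illustrated with a minimal sketch. Several things here are assumptions made for illustration only: the text speaks of "classification performance" without naming a metric, so accuracy is used as a stand-in; the relation names, the binary labels, and the function names `per_relation_accuracy` and `macro_average` are hypothetical and are not part of the task definition.

```python
from collections import defaultdict


def per_relation_accuracy(examples):
    """Return accuracy for each relation.

    `examples` is an iterable of (relation, gold_label, predicted_label)
    tuples. Accuracy is an assumed stand-in for "classification performance".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for relation, gold, predicted in examples:
        total[relation] += 1
        if gold == predicted:
            correct[relation] += 1
    return {relation: correct[relation] / total[relation] for relation in total}


def macro_average(scores):
    """Average the per-relation scores, weighting every relation equally."""
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: relation names and labels are illustrative only.
examples = [
    ("Cause-Effect", True, True),
    ("Cause-Effect", False, True),
    ("Part-Whole", True, True),
    ("Part-Whole", False, False),
]

scores = per_relation_accuracy(examples)
average = macro_average(scores)
print(scores)
print(average)
```

Keeping the per-relation scores alongside the overall average is what makes it possible to see which relations are harder to classify, and which algorithms do better on particular relations.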