Entity-Ranking Track Guidelines

Introduction

Many user tasks would be simplified if search engines supported typed search and returned entities instead of 'just' web pages. In 2007, INEX started the XML Entity Ranking track (INEX-XER) to provide a forum where researchers can compare and evaluate techniques for engines that return lists of entities. In entity ranking and entity list completion, the goal is to evaluate how well systems can rank entities in response to a query; the set of entities to be ranked is assumed to be loosely defined by a generic category, implied by the query itself, or by some example entities. We will continue to run both the entity ranking and list completion tasks this year. This year we adopt a new document collection containing semantic annotations, with the general goal of understanding how such annotations can be exploited to improve entity ranking.

Entity ranking concerns triples of the form <query, category, entity>. The category (that is, the entity type) specifies the type of 'objects' to be retrieved. The query is a free-text description that attempts to capture the information need. The entity field specifies example instances of the entity type. The usual information retrieval tasks of document and element retrieval can be viewed as special instances of this more general retrieval problem, where category membership relates to a syntactic (layout) notion of 'text document' or 'XML element'. Expert finding uses the semantic notion of 'people' as its category, where the query would specify 'expertise on T' for expert finding topic T. Our goal is not to evaluate how well systems identify instances of entities within text (to some extent this is part of the goal of the Link-the-Wiki track).

Data

The track uses the new Wikipedia 2009 XML data. Available annotations can be exploited to find relevant entities to return. Category information about the pages loosely defines the entity sets.

The entities in such a set are assumed to loosely correspond to those Wikipedia pages that are labeled with the given category (or perhaps with a sub-category of it). Obviously, this correspondence is not perfect, as many Wikipedia articles are assigned to categories in an inconsistent fashion. Your retrieval method should handle the situation in which the category assignments to Wikipedia pages are not always consistent, and also far from complete. The human assessor will not be constrained by the category assignments made in the corpus when making his or her relevance assessments!

We expect the data set to provide a sufficiently useful collection as a starting point for the track. The challenge for participants is to exploit the rich information in text, structure, links and annotations to perform the search tasks.

Tasks

Entity Ranking

This year's entity ranking task consists of two sub-tasks: entity ranking (without examples) and entity list completion (with examples). Entity list completion is a special case of entity ranking in which a few examples of relevant entities are provided as relevance feedback information.

The motivation for the entity ranking task is to return entities that satisfy a topic described in natural-language text. Given preferred categories, relevant entities are assumed to loosely correspond to those Wikipedia pages that are labeled with these preferred categories (or perhaps with their sub-categories). Retrieval methods need to handle the situation where the category assignments to Wikipedia pages are not always consistent, and also far from complete. For example, given the preferred category 'art museums and galleries', an article about a particular museum such as the 'Van Gogh Museum' (155508) may not be labeled with 'art museums and galleries' but instead with a sub-category of it, such as 'art museums and galleries in the Netherlands'. Therefore, when searching for "art museums in Amsterdam", correct answers may belong to other categories close to this category in the Wikipedia category graph, or may not have been categorized at all by the Wikipedia contributors. The category 'art museums and galleries' is only an indication of what is expected, not a strict constraint (as with the CAS title in the Ad Hoc track).
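As a rough, non-prescriptive sketch of how this loose category constraint might be handled, the fragment below expands the preferred categories with their descendants in the Wikipedia category graph and uses category membership only as a soft bonus on top of a text-based retrieval score. The subcategories mapping, the page representation and the text_score function are assumptions made purely for illustration; they are not part of the track infrastructure.

from collections import deque

def expand_categories(preferred, subcategories, max_depth=2):
    # Collect the preferred categories together with their descendants
    # (up to max_depth levels) in the Wikipedia category graph.
    # `subcategories` is an assumed dict: category name -> list of
    # direct sub-category names, built from the collection.
    expanded = set(preferred)
    frontier = deque((cat, 0) for cat in preferred)
    while frontier:
        cat, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for sub in subcategories.get(cat, []):
            if sub not in expanded:
                expanded.add(sub)
                frontier.append((sub, depth + 1))
    return expanded

def score_page(page, query, expanded, text_score, bonus=0.5):
    # Category membership is a soft preference: pages labeled with any
    # expanded category get a boost; uncategorized pages are still kept.
    base = text_score(page, query)  # assumed text retrieval score
    in_category = expanded & set(page.get("categories", []))
    return base + (bonus if in_category else 0.0)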

An example entity ranking topic is the following:
<inex_topic topic_id="9999">
<title>Impressionist art in the Netherlands</title>
<description>
I want a list of art galleries and museums in the Netherlands that have impressionist art.
</description>
<narrative>Each answer should be the article about a specific art gallery or museum that contains impressionist or post-impressionist art works.
</narrative>
<categories>
<category>art museums and galleries</category>
</categories>
</inex_topic>
The category name(s) should be the exact names used in the INEX version of Wikipedia, but they should be interpreted loosely.

Entity List Completion

List completion (LC) is a sub-task of entity ranking that incorporates relevance feedback information. Instead of giving the desired category (entity type), the topic specifies a number of correct entities (instances) together with the free-text topic description. Results again consist of a list of entities (Wikipedia pages).

Given the topic text and a number of example entities, the list completion task is to complete this partial list of answers. As an example, when ranking 'Countries' with the topic text 'European countries where I can pay with Euros' and example entities such as 'France', 'Germany' and 'Spain', the 'Netherlands' would be a correct completion, but the 'United Kingdom' would not. An example topic in the entity list completion task is:
<inex_topic topic_id="9999">
<title>European countries where I can pay with Euros</title>
<description>
I want a list of European countries where I can pay with Euros.
</description>
<narrative>
Each answer should be the article about a specific European country that uses the Euro as currency.
</narrative>
<entities>
<entity id=" 5843419">France</entity>
<entity id="11867">Germany</entity>
<entity id="26667">Spain</entity>
</entities>
</inex_topic>
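One plausible way to exploit the example entities, sketched below purely for illustration, is to treat the categories shared by the examples as an inferred category constraint and then rank candidates as in the entity ranking task. The page_categories lookup (Wikipedia page id -> set of category names) is an assumption, not a resource provided by the track.

from collections import Counter

def infer_target_categories(example_ids, page_categories, min_overlap=2):
    # Count how many example entities carry each category and keep the
    # categories shared by at least `min_overlap` of the examples.
    counts = Counter()
    for page_id in example_ids:
        counts.update(page_categories.get(page_id, set()))
    return {cat for cat, n in counts.items() if n >= min_overlap}

# For the topic above one might call (ids taken from the topic):
# targets = infer_target_categories(["5843419", "11867", "26667"], page_categories)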

Topics

Based on the topics from the previous years, we have set up a collection of 60 entity ranking topics: 25 from 2007 and 35 from 2008. The <categories> part is to be used exclusively for the entity ranking task; the <entities> part is to be used exclusively for the list completion task.

Topic format:
<inex_topic topic_id="9999">
<title>Impressionist art in the Netherlands</title>
<description>
I want a list of art galleries and museums in the Netherlands that have impressionist art.
</description>
<narrative>Each answer should be the article about a specific art gallery or museum that contains impressionist or post-impressionist art works.
</narrative>
<categories>
<category>art museums and galleries</category>
</categories>
<entities>
<entity id="155508">Van Gogh Museum</entity>
<entity id="892971">Kröller-Müller Museum</entity>
</entities>
</inex_topic>
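For convenience, a minimal parsing sketch for this topic format is given below. It assumes a single well-formed <inex_topic> element as shown above; the actual topic file distributed by the track may wrap several topics in a root element, which this sketch does not handle.

import xml.etree.ElementTree as ET

def parse_topic(xml_string):
    # Parse one <inex_topic> element into a plain dictionary.
    root = ET.fromstring(xml_string)
    return {
        "topic_id": root.get("topic_id"),
        "title": (root.findtext("title") or "").strip(),
        "description": (root.findtext("description") or "").strip(),
        "narrative": (root.findtext("narrative") or "").strip(),
        "categories": [c.text.strip() for c in root.findall("categories/category")],
        "entities": [{"id": e.get("id").strip(), "name": e.text.strip()}
                     for e in root.findall("entities/entity")],
    }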

Runs

For entity ranking run submissions, the title, description, categories, and example entities can all be used; the narrative field should not be used. Each group can submit up to 6 runs to the entity ranking task. However, there are two mandatory runs. The first mandatory run is an automatic entity ranking run that may only use the title and the category information. The second mandatory run is an automatic LC run that must use the example entities along with the title. Neither mandatory run may use the topic description. We also encourage participants to submit runs with other combinations of query parts, for example for the scenario where the user does not know the category information and would search only with the title of the topic.

Participants should indicate which parts of the topic have been used in each submitted run, for both tasks, as detailed in the Result Submission Specification document.

Evaluation

Participants judge the assigned topics. Because we make the assumption that an entity corresponds to a Wikipedia page, the answer pool corresponds to a list of links into the collection. We assume that the nature of the task is such that answer correctness can be assessed quickly: for the entity ranking task, the title of a page is quite often enough for the topic author to judge its relevance, so judging can be fast.

In the 2007 edition a pool depth of 50 was used. In 2008, the pool was created with a stratified sampling approach. This year we will use the same pooling strategy as in 2008.
As last year, evaluation measures that support stratified sampling (such as xInfAP) will be used to measure the performance of systems on both tasks.
For both tasks, if a run uses the example entities as relevance feedback information, there is no need to return the entity instances given as examples in the topic description.
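As a small illustration of this last point (helper names are hypothetical; the actual run format is specified in the Result Submission Specification document), the example entities can simply be filtered out of the ranked list before a list completion run is written out:

def strip_examples(ranked_page_ids, example_ids):
    # Drop the topic's example entities from a ranked result list while
    # preserving the order of the remaining entities.
    examples = set(example_ids)
    return [pid for pid in ranked_page_ids if pid not in examples]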

Training topics

For the entity ranking task, the 2007 and 2008 test collections can be used as training data.

Topics and relevance judgments are available at:

Future tasks

Ranking Entities as Passages

A different approach to entity ranking is to identify entities within the content of Wikipedia articles. We will explore this approach in a future edition of the track by exploiting the pre-annotated Wikipedia collection. The aim is to find entities in the text content and to retrieve passages containing supporting evidence for each entity.

Ranking passages would require retrieving "supporting passages" for the relevant entities, to be used for relevance assessment. Moreover, such passages might belong to pages other than the relevant entities' own Wikipedia pages (i.e., the evidence the assessor needs to judge relevance might be found in a different page).