Ad Hoc Retrieval Task and Result Submission Specification

Abstract

What's New in 2010?   The INEX 2010 Adhoc track continues with the same Wikipedia collection and similar topics/judgments, but with variants of the tasks that address the impact of result length/reading effort -- think of focused retrieval as a form of "snippet" retrieval. In addition, we combined Ad hoc and Efficiency. We envision four tasks: 1) Relevant in Context, but evaluated with an effort-based measure. 2) Restricted Relevant in Context, with max. 500 chars per article. 3) Restricted Focused, with max. 1,000 chars per topic. 4) Efficiency, a Thorough run with 15, 150, or 1,500 results per topic.

Retrieval Tasks

The retrieval task to be performed by the participating groups of INEX 2010 is defined as the adhoc retrieval of XML elements or passages. In information retrieval (IR) literature, adhoc retrieval is described as a simulation of how a library might be used, and it involves the searching of a static set of documents using a new set of topics. While the principle is the same, the difference for INEX is that the library consists of XML documents, the queries may contain both content and structural conditions and, in response to a query, arbitrary XML elements or passages may be retrieved from the library.

The general aim of an IR system is to find relevant information for a given topic of request. In the case of XML retrieval there is, for each article containing relevant information, a choice from a whole hierarchy of different elements or passages to return. Hence, within XML retrieval, we regard as relevant results those results that both contain the relevant information and contain as little non-relevant information as possible. For example, if an XML element contains another element but they have the same amount of relevant text, the shorter element is strictly more specific and is the preferred result. The same holds for different passages covering the same amount of relevant text.

At INEX 2010, we study resource-restricted conditions, such as a small-screen mobile device or a document summary on a hit-list, where retrieving full articles is not an option and we need to find the elements/passages that best convey the relevant information in the Wikipedia page(s). The retrieved elements/passages can thus be viewed as extensive result snippets, or as an on-the-fly document summary, that allow searchers to jump directly to the relevant parts of a document.

This leads to the definition of new variants of the four types of adhoc XML retrieval tasks used until INEX 2009:

1) Relevant in Context Task

Motivation for the Task

The scenario underlying the Relevant in Context Task is to return the relevant information (captured by a set of elements or passages) within the context of the full article. The task makes a number of assumptions: displayed results are grouped per article, in their original document order, providing access through further navigational means; users consider the article the most natural unit, and prefer an overview of relevance within its context.

Evaluation

The task is as before, but viewed as a form of snippet retrieval. As main measure we use the proposal from the University of Tampere that takes reading length into account -- probably the T2I(300) measure from http://dx.doi.org/10.1007/s10791-010-9133-9 -- which strongly penalizes the retrieval of non-relevant text. The new measure takes the suggested reading order within each article into account, but almost all earlier Relevant in Context submissions already contained this ranking. In addition, we will report the earlier measures, as well as the induced Best in Context measures (interpreting the first retrieved character per article as the suggested entry point).

Results to Return

The aim of the Relevant in Context Task is to first identify relevant articles (the fetching phase), and then to identify the relevant results within the fetched articles (the browsing phase). The /article[1] element itself need not be returned, as it is implied by any result from that article. Since the content of an element is fully contained in its parent element and its ancestors, the result set may not contain overlapping elements. Passage results may not overlap either.

Summarizing: the Relevant in Context Task requires a ranked list of articles and, for each article, a ranked list of results covering the relevant material in that article. Overlap is not permitted.

2) Restricted Relevant in Context Task

Motivation for the Task

This is a variant of Task 1 in which only 500 characters per article may be retrieved. This simulates the small-screen and hit-list summary conditions directly.

Evaluation

We will use the same measure as in Task 1, but here we expect the new measure to agree with the old MAgP ranking -- hence Task 2 with MAgP should be a good predictor of performance on both Task 1 and Task 2!

Results to return

The Restricted Relevant in Context Task requires a ranked list of articles and, for each article, a ranked list of results covering the relevant material in that article. At most 500 characters may be retrieved per article. Overlapping results are not permitted.
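
As an illustration, here is a minimal sketch in Python of trimming a per-article result list to the 500-character budget; the function name and the (offset, length) data layout are hypothetical, following the FOL passage format described later in this document.

def trim_to_budget(passages, budget=500):
    """Trim a ranked list of (offset, length) passages for one article so that
    the total number of retrieved characters stays within the budget.
    Hypothetical helper; offsets/lengths as in the FOL submission format."""
    trimmed, used = [], 0
    for offset, length in passages:
        if used >= budget:
            break
        keep = min(length, budget - used)   # shorten the last passage if needed
        trimmed.append((offset, keep))
        used += keep
    return trimmed

# Example: three passages totalling 700 characters are cut back to 500.
print(trim_to_budget([(465, 300), (3892, 250), (4865, 150)]))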

3) Restricted Focused Task

Motivation for the Task

We are interested in giving a quick overview of the relevant information in the whole Wikipedia. This is a variant of the Focused Task in which the results are restricted to at most 1,000 characters per topic.

The scenario underlying the Focused Task is to return to the user a ranked list of elements or passages for her topic of request. The task makes a number of assumptions: results are presented to the user as a ranked list; users view the result list top-down, one by one; they do not want overlapping results in the result list, and, for equally relevant results, prefer shorter results over longer ones.

Evaluation

Evaluation will be in terms of set-based precision over the retrieved characters.

Results to return

The Restricted Focused Task requires results (elements or passages) ranked in relevance order up to a maximal length of 1,000 characters per topic. Overlap is not permitted in the submitted run.
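
For intuition, a minimal sketch in Python of the set-based precision over retrieved characters used for evaluation, treating both the retrieved passages and the relevant text of an article as sets of character positions; the function name and (offset, length) layout are hypothetical.

def char_precision(retrieved, relevant):
    """Set-based precision over retrieved characters.
    retrieved, relevant: iterables of (offset, length) character ranges
    within the same article (FOL-style)."""
    retrieved_chars = set()
    for off, length in retrieved:
        retrieved_chars.update(range(off, off + length))
    relevant_chars = set()
    for off, length in relevant:
        relevant_chars.update(range(off, off + length))
    if not retrieved_chars:
        return 0.0
    return len(retrieved_chars & relevant_chars) / len(retrieved_chars)

# Example: 1,000 retrieved characters of which 600 fall inside relevant text.
print(char_precision([(0, 1000)], [(400, 600)]))   # -> 0.6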

4) Efficiency Task

Motivation for the Task

Here the focus is on efficiency rather than effectiveness, and especially the trade-off between efficiency and effectiveness.

The task is the unrestricted retrieval of elements or passages -- a Thorough run estimating the relevance of document components -- but restricted to at most 15, 150, or 1,500 elements or passages per topic.

Evaluation

Here, we will look at the usual interpolated precision (iP) and overall (MAiP) scores in relation to run-time efficiency.

Results to return

The Efficiency Task requires a ranked list of elements or passages, where each topic contains at most 15 results (top-15 submission), 150 results (top-150 submission), or 1,500 results (top-1,500 submission).
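
As an illustration, a minimal sketch in Python of deriving the top-15, top-150, and top-1,500 variants from a single Thorough run file in the TREC-style submission format described below; the file names are hypothetical, and the sketch assumes the results for each topic are already in rank order.

from collections import defaultdict

def truncate_run(in_path, out_path, k):
    """Keep only the first k results per topic from a TREC-style run file.
    Assumes one result per line, with the topic id in the first column and
    the results of each topic already sorted by rank."""
    per_topic = defaultdict(int)
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            qid = line.split()[0]
            if per_topic[qid] < k:
                dst.write(line)
                per_topic[qid] += 1

# Hypothetical file names; derive the three Efficiency Task submissions.
for k in (15, 150, 1500):
    truncate_run("thorough_run.txt", "thorough_top%d.txt" % k, k)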

Note that we would like to have a number of statistics: hardware details, indexing time, query run time (also per-topic run times), etc., in order to relate the effectiveness to the efficiency performance.

Wrap Up

In 2010, we will extend the INEX test collection on the 2009 Wikipedia collection: Task 1 also allows you to submit Best in Context runs (one result per article), and Task 4 allows you to submit Thorough (potentially overlapping) and Focused (non-overlapping) runs. We will provide the 2009 INEX measures, including derived article retrieval scores. However, the focus in 2010 has shifted considerably towards penalizing the retrieval of non-relevant text, making strategies like plain article retrieval less attractive. Of course, finding the right articles remains a prerequisite for performing well on any of the tasks!

Passage Retrieval, Structured Queries, and Phrases

Within the Adhoc retrieval tasks, we invite participants to experiment with three sets of different retrieval approaches:

Elements or arbitrary passages

There is no separate passage retrieval task, but for all tasks arbitrary passages may be returned instead of elements. Although both types of retrieval may be used for each task, the best performing element runs and the best performing passage runs will also be reported separately. The type of result, XML elements or arbitrary passages, is recorded in the submission format.

Structured Queries

At INEX 2010 there is no separate CAS task, but all topics have both a keyword CO query and a structured CAS query. Note that, for topics lacking a CAS query, the CO query has been rephrased as a CAS query of the form //*[about(., CO query)], using the tag wildcard * that matches any element. As noted above, for all the tasks, we want to find out if, when, and how the structural constraints in the query have an impact on retrieval effectiveness. Although both types of queries may be used for each task, including runs mixing both query types, the best performing CAS query runs (restricted to topics containing a non-default CAS query) and the best performing CO query runs will also be reported. The use of CO/CAS query fields is recorded in the submission format.

Phrase Queries

At INEX 2010 there is no separate Phrase task, but most topics have a phrase query. Note that, for topics lacking a phrase query formulated by the topic author, the CO query can be reused. As noted above, for all the tasks, we want to find out if, when, and how the verbose queries have an impact on retrieval effectiveness. Although all types of queries may be used for each task, including runs mixing query types, the best performing Phrase query runs (restricted to topics containing a non-default phrase query) will also be reported. The use of CO/CAS/Phrase query fields is recorded in the submission format.

Reference Article Ranking

One way of viewing focused retrieval is as the combination of i) locating relevant articles (as in standard document retrieval), and ii) locating the relevant information inside those articles (focused retrieval proper). Each submission also imposes an article-level ranking, when regarding the results from articles on a first-come-first-served basis. The quality of this imposed article-level ranking varies considerably, which makes it difficult to compare across runs. Hence, in 2010, we will distribute a reference run containing an article-level ranking made with the effective BM25 model. We invite participants to submit a run using the article-level ranking of this reference run. Specifically, we suggest reordering the original submission such that results from higher ranked articles in the reference run appear first.
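
A minimal sketch in Python of the suggested reordering; the data structures and function name are hypothetical. Results keep their original within-article order, but articles are emitted in the order of the reference run.

def reorder_by_reference(results, reference_ranking):
    """results: ranked list of (article_id, payload) tuples from the original run.
    reference_ranking: list of article ids in the order of the reference run.
    Returns the results regrouped so that articles appear in reference order,
    keeping the original within-article order; articles missing from the
    reference ranking are placed last."""
    rank = {art: i for i, art in enumerate(reference_ranking)}
    by_article = {}
    for art, payload in results:
        by_article.setdefault(art, []).append(payload)
    reordered = []
    for art in sorted(by_article, key=lambda a: rank.get(a, len(rank))):
        reordered.extend((art, p) for p in by_article[art])
    return reordered

# Example with hypothetical article ids and result payloads.
print(reorder_by_reference([("12", "secA"), ("9996", "secB")], ["9996", "12"]))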

Result Submission

INEX 2010 Topics

There is only one set of topics to be used for all adhoc retrieval tasks at INEX 2010. The format of the topics is defined in the following DTD:
<!ELEMENT inex-topic-file (topic+)>
<!ELEMENT topic (title,castitle,phrasetitle,description,narrative)>
<!ATTLIST topic id CDATA #REQUIRED>
<!ATTLIST topic ct_no CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT castitle (#PCDATA)>
<!ELEMENT phrasetitle (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT narrative (#PCDATA)>
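
For illustration, a minimal sketch in Python of reading the topic fields from a file in this format; the file name is hypothetical, and the element structure follows the DTD above.

import xml.etree.ElementTree as ET

# Hypothetical file name; the structure follows the DTD above.
for topic in ET.parse("inex2010-topics.xml").getroot().findall("topic"):
    topic_id = topic.get("id")
    title = topic.findtext("title")              # CO query
    castitle = topic.findtext("castitle")        # CAS query
    phrasetitle = topic.findtext("phrasetitle")  # phrase query
    print(topic_id, title)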

The submission format will record the precise topic fields that are used in a run. Participants are allowed to use all fields, but only runs using either the <title>, <castitle>, or <phrasetitle> fields, or a combination of these, will be regarded as truly automatic, since the additional fields will not be available in operational settings.

The <title> part of the topics should be used as the query for CO submissions. The <castitle> part of the topics should be used as the query for CAS submissions. The <phrasetitle> part of the topics should be used as the query for Phrase submissions. For the purposes of the number of runs allowed to be submitted, runs using more fields than the <title> (or <castitle>) will still be regarded as a CO (or CAS) submission.

Since the comparative analysis of CO and CAS queries is a main research question, we encourage participants to submit runs using only the <title> field (CO query) or only the <castitle> field (CAS query). Similarly, the comparative analysis of CO and Phrase queries is a research question, and we encourage participants to submit runs using only the <title> field (CO query) or only the <phrasetitle> field (Phrase query). We do not outlaw the use of the other topic fields, to allow participants to conduct their own experiments involving them, and since such deviating runs may in fact improve the quality of the assessment pool.

Runs

For each of the four tasks, we allow up to 2 XML element submissions, and up to 2 passage submissions. In addition, for each task except for the efficiency task, we allow for 1 extra submission using the article-level ranking of the distributed reference run. The results of one run must be contained in one submission file (i.e. up to 19 files can be submitted in total). A submission may contain up to 1,500 retrieval results for each of the INEX topics (except for the top 15 and top 150 runs for the Efficiency Task).

There are however a number of additional task-specific requirements.

For the (Restricted) Focused Task, it is not allowed to retrieve elements or passages that contain text already retrieved by another result. For example, within the same article, the element /article[1]/sec[1] is disjoint from /article[1]/sec[2], but overlapping with all ancestors (e.g., /article[1]) and all descendants (e.g., /article[1]/sec[1]/p[1]).
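
A minimal sketch in Python of this containment test for fully specified element paths: two results overlap exactly when one path is a prefix of (or equal to) the other, i.e. an ancestor/descendant relation; the helper name is hypothetical.

def overlaps(path_a, path_b):
    """True if two fully specified element paths overlap, i.e. one is an
    ancestor of (or equal to) the other."""
    steps_a = path_a.strip("/").split("/")
    steps_b = path_b.strip("/").split("/")
    shorter = min(len(steps_a), len(steps_b))
    return steps_a[:shorter] == steps_b[:shorter]

print(overlaps("/article[1]/sec[1]", "/article[1]/sec[2]"))   # False: disjoint
print(overlaps("/article[1]", "/article[1]/sec[1]/p[1]"))     # True: ancestor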

For the (Restricted) Relevant in Context Task, articles may not be interleaved. That is, if a result from article a is retrieved, and then a result from a different article b, then it is not allowed to retrieve further results from article a. Additionally, it is not allowed to retrieve results that contain text already retrieved by another result (as in the Focused Task). Note also that for this task the /article[1] result is implied by any result from the article, and need not be returned.
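
And a minimal sketch in Python of checking that articles are not interleaved in the ranked result list of one topic; the helper name is hypothetical.

def articles_not_interleaved(article_ids):
    """article_ids: the article id of each result for one topic, in rank order.
    True if, once the run moves on to a new article, no earlier article reappears."""
    seen, current = set(), None
    for art in article_ids:
        if art == current:
            continue
        if art in seen:
            return False   # an earlier article reappears: interleaved
        seen.add(art)
        current = art
    return True

print(articles_not_interleaved(["9996", "9996", "12"]))   # True
print(articles_not_interleaved(["9996", "12", "9996"]))   # False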

Submission format

For relevance assessments and the evaluation of the results, we require submission files to be in the format described in this section. The submission format for all tasks is a variant of the familiar TREC format. The submission system will have a form requesting further information about each run. For each topic, a maximum of 1,500 results may be included per task. The standard TREC format with six fields is extended with two additional fields:
<qid> Q0 <file> <rank> <rsv> <run_id> <column_7> <column_8>
Here, <qid> is the topic number, Q0 is a literal, <file> is the file name (or <id>) of the article, <rank> is the rank of the result, <rsv> is the retrieval status value (score), and <run_id> is a unique identifier of the run. The remaining two columns depend on the submission type (XML elements, FOL passages, or ranges of elements):

(1) Elements: Element paths are given in XPath syntax. To be more precise, only fully specified paths are allowed, as described by the following grammar:
Path  ::=  '/' ElementNode Path | '/' ElementNode | '/' AttributeNode
ElementNode  ::=  ElementName Index
AttributeNode  ::=  '@' AttributeName
Index  ::=  '[' integer ']'
Example:
/article[1]/bdy[1]/sec[1]/p[1]
This path identifies the element which can be found if we start at the document root, select the first "article" element, then within that, select the first "body" element, within which we select the first "section" element, and finally within that element we select the first "p" element. Important: XPath counts elements starting with 1 and takes into account the element type, e.g. if a section had a title and two paragraphs then their paths would be given as: title[1], p[1] and p[2].
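
A minimal sketch in Python of checking a column-7 value against this grammar with a regular expression; the exact character set allowed in element and attribute names is an assumption, not part of the grammar.

import re

# One element step: '/' ElementName '[' integer ']'; one attribute step: '/@' AttributeName.
ELEMENT_STEP = r"/[^/@\[\]]+\[[0-9]+\]"
ATTRIBUTE_STEP = r"/@[^/@\[\]]+"
PATH_RE = re.compile("^(?:(?:%s)+(?:%s)?|%s)$" % (ELEMENT_STEP, ATTRIBUTE_STEP, ATTRIBUTE_STEP))

def is_valid_path(path):
    # True if path matches the fully specified path grammar above;
    # attribute nodes are allowed only as the final step.
    return PATH_RE.match(path) is not None

print(is_valid_path("/article[1]/bdy[1]/sec[1]/p[1]"))   # True
print(is_valid_path("/article/bdy[1]"))                  # False: missing index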

A result element may then be identified unambiguously using the combination of its file name (or <id>) in column 3 and element path in column 7. Column 8 will not be used. Example:
1 Q0 9996 1 0.9999 I09UniXRun1 /article[1]/bdy[1]/sec[1]
1 Q0 9996 2 0.9998 I09UniXRun1 /article[1]/bdy[1]/sec[2]
1 Q0 9996 3 0.9997 I09UniXRun1 /article[1]/bdy[1]/sec[3]/p[1]
Here the results are from article 9996 and select the first section, the second section, and the first paragraph of the third section.

(2) Passages: Passage results can be given in File-Offset-Length (FOL) format, where offset and length are calculated in characters with respect to the textual content (ignoring all tags) of the XML file. A special text-only version of the collection is provided to facilitate the use of passage retrieval systems. File offsets start counting at 0 (zero).

A result passage may then be identified unambiguously using the combination of its file name (or <id>) in column 3, an offset in column 7, and a length in column 8. The following example is effectively equivalent to the element result example above:
1 Q0 9996 1 0.9999 I09UniXRun1 465 3426
1 Q0 9996 2 0.9998 I09UniXRun1 3892 960
1 Q0 9996 3 0.9997 I09UniXRun1 4865 496
The results are from article 9996; the first section starts at offset 465 (i.e., 465 characters beyond the first character) and has a length of 3,426 characters.
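
A minimal sketch in Python of recovering the text of a FOL result from the text-only version of the collection, using zero-based offsets; the file name and the UTF-8 encoding are assumptions.

def read_passage(textfile, offset, length):
    """Return the passage of a text-only article file given a zero-based
    character offset and a length in characters (FOL format)."""
    with open(textfile, encoding="utf-8") as f:   # encoding is an assumption
        text = f.read()
    return text[offset:offset + length]

# Hypothetical file name for article 9996 in the text-only collection.
print(read_passage("9996.txt", 465, 3426))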

(3) Ranges of Elements: The XML passage notation of INEX 2007 is partly admissible, to ensure backward compatibility and to allow for ranges of elements. In the INEX 2007 notation, passage paths are given in the same XPath syntax as elements, but allow for an optional character offset. We only allow elemental paths (ending in an element, not a text node in the DOM tree) plus an optional offset:
PassagePath  ::=  Path | Path '.' Offset
Offset  ::=  integer
A result element may then be identified unambiguously using the combination of its file name (or <id>) in column 3, its start at the element path in column 7, and its end at the element path in column 8. Example:
1 Q0 9996 1 0.9999 I09UniXRun1 /article[1]/bdy[1]/sec[1] /article[1]/bdy[1]/sec[1]
Here the result is again the first section from article 9996. Note that the seventh column refers to the beginning of an element (or its first content), and the eighth column refers to the ending of an element (or its last content). Note that this format is very convenient for specifying ranges of elements, e.g., the first three sections:
1 Q0 9996 1 0.9999 I09UniXRun1 /article[1]/bdy[1]/sec[1] /article[1]/bdy[1]/sec[3]

Result Submission Procedure

The online submission tool will be available soon.