INEX 2009 Guidelines for Topic Development

New in 2009

In 2009 we use the INEX 2009 Collection consisting of a dump of the English Wikipedia with semantic annotations from YAGO. The markup allows for interesting CO+S topics using the semantic tags of YAGO, and we will devote special attention to the use of YAGO tags as structural hints. We will also collect verbose queries having phrases explicitly marked up.

Aims

The aim of the INEX initiative is to provide the means, in the form of a large test collection and appropriate measures, for the evaluation of content-oriented XML element retrieval. Within the INEX initiative it is the task of the participating organizations to provide the topics and relevance assessments that will contribute to the test collection. Each participating organization, therefore, plays a vital role in this collaborative effort.

Introduction

Test collections, as traditionally used in information retrieval (IR), consist of three parts: a set of documents, a set of information needs called topics, and a set of relevance assessments listing (for each topic) the set of relevant documents.

A test collection for XML retrieval differs from traditional IR test collections in many respects. Although it still consists of the same three parts, the nature of these parts is fundamentally different. In IR test collections, documents are considered units of unstructured text, queries are generally treated as collections of terms and / or phrases, and relevance assessments provide judgments whether a document as a whole is relevant to a query or not. XML documents, on the other hand, organize their content into smaller, nested structural elements. Each of these elements in the document's hierarchy, along with the document itself (the root), is a retrievable unit. In addition, with the use of XML query languages, users of an XML retrieval system can express their information need as a combination of content and structural conditions: they can restrict their search to specific structural elements within the collection. Consequently the relevance assessments for an XML collection must also consider the structural nature of a document and provide assessments at different levels of the document hierarchy.

This guide deals only with topics. Each group participating in INEX will have to submit several CO+S topics. This guide provides detailed guidelines for creating these topics.

Topic Creation Criteria

Creating a set of topics for a test collection requires a balance between competing interests. The performance of retrieval systems varies widely across topics. This variation is usually greater than the performance variation of different retrieval methods on the same topic. Thus, to judge whether one retrieval strategy is (in general) more effective than another, the retrieval performance must be averaged over a large and diverse set of topics. In addition, to be a useful diagnostic tool, the average performance of the retrieval systems on the topics can be neither too good nor too bad, as little can be learned about retrieval strategies if systems retrieve no relevant documents, or only relevant documents.

When creating topics, a number of factors should be taken into consideration. Topics should:

• be authored by an expert in (or someone familiar with) the subject areas covered by the collection,
• reflect real needs of operational systems,
• represent the type of service an operational system might provide,
• be diverse,
• differ in their coverage, e.g. broad or narrow topic queries,
• be assessed by the topic author.

Topic Format

In previous years, different topic types have been used for the two main ad hoc retrieval tasks at INEX (i.e., a distinction was made between Content Only (CO) and Content And Structure (CAS) topics). In addition, different parts of the topics were also designed to be used in other tracks (e.g., the topic <description> was tuned to the needs of the Natural Language Processing (NLP) track). These topic types have now been merged into one type: Content Only + Structure (CO+S) topics. Likewise, all the information needed by the different ad hoc tasks and tracks is now expressed in the individual topic parts, and only one topic type is needed. The CO+S topics consist of the following parts, which are explained in detail below:

<title> in which Content Only (CO) queries are given
<castitle> in which Content And Structure (CAS) queries are given
<phrasetitle> in which phrase queries are given
<description> a one or two sentence natural language definition of the information need
<narrative> in which the definitive definition of relevance and irrelevance is given

General considerations

A clear and precise description of the information need is required in order to unambiguously determine whether or not a given element fulfills the given need. In a test collection this description is known as the narrative. It is the only true and accurate interpretation of a user's needs. Precise recording of the narrative is important for scientific repeatability - there must exist, somewhere, a definitive description of what is and is not relevant to the user. To aid this, the <narrative> should explain not only what information is being sought, but also the context and motivation of the information need, i.e., why the information is being sought and what work-task it might help to solve.

Many different queries could be drawn from the <narrative>, and some are better than others. For example, some might contain phrases; some might contain ambiguous words; while some might even contain domain specific terms or structural constraints. Regardless of the query, the search engine results are not necessarily relevant. Even though a result might contain search terms from the query, it might not match the explanation given in the <narrative>. Equally, some relevant documents might not be found, but they remain relevant because they are described as such by the <narrative>.

The different CO+S topic parts relate to different scenarios that lead to different types of queries.

The topic <title> simulates a user who does not know (or does not want to use) the actual structure of the XML documents in a query. The query expressed in the topic <title> is therefore a Content Only (CO) query. This profile is likely to fit most users searching XML digital libraries.

Upon discovering their <title> query returned many irrelevant hits, a user might decide to add structural hints (to rewrite as a CAS query). This is similar to a user adding + and - to a web query when too many irrelevant pages are found. At INEX, these added structural constraints (+S) are specified using the formal syntax called NEXI [1] - and recorded in the topic <castitle>.

Example

Suppose a user wants to find pictures of the Apple II computer. They enter the CO query:

Apple II figure

but discover that most results are figures of products for the Apple II. They decide to add structural hints:

//figure[about(., Apple II)]

restricting the results to figure elements, which are known to contain the captions of figures.

The collection has been enriched with semantic tags from YAGO (itself based on WordNet and Wikipedia). The whole article has been encapsulated with tags, such as the <group> tag added to the Queen page:

<article xmlns:xlink="http://www.w3.org/1999/xlink">
<holder confidence="0.9511911446218017" wordnetid="103525454">
<entity confidence="0.9511911446218017" wordnetid="100001740">
<musical_organization confidence="0.8" wordnetid="108246613">
<artist confidence="0.9511911446218017" wordnetid="109812338">
<group confidence="0.8" wordnetid="100031264">
<header>
<title>Queen (band)</title>
<id>42010</id>
...
</header>
<bdy>
...
<songwriter wordnetid="110624540" confidence="0.9173553029164789">
<person wordnetid="100007846" confidence="0.9508927676800064">
<manufacturer wordnetid="110292316" confidence="0.9173553029164789">
<musician wordnetid="110340312" confidence="0.9173553029164789">
<singer wordnetid="110599806" confidence="0.9173553029164789">
<artist wordnetid="109812338" confidence="0.9508927676800064">
<link xlink:type="simple" xlink:href="../068/42068.xml">
Freddie Mercury</link></artist>
</singer>
</musician>
</manufacturer>
</person>
</songwriter>
...
</bdy>
</group>
</artist>
</musical_organization>
</entity>
</holder>
</article>

This allows us to find particular article types easily, e.g., instead of a query requesting articles about Freddie Mercury:

//article[about(., Freddie Mercury)]

we can ask specifically for groups about Freddie Mercury:

//group[about(., Freddie Mercury)]

which will return pages of (pop) groups mentioning Freddie Mercury. In addition, all internal Wikipedia links have been annotated with the tags assigned to the page they link to, e.g., in the example above the link to Freddie Mercury gets the <singer> tag assigned. We can also use these tags to identify pages where certain types of links occur, and further refine the query as:

//group[about(.//singer, Freddie Mercury)]

The exact NEXI query format used to express the structural hints will be explained below.
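
As a rough illustration of how such a structural hint can be read, the sketch below loads a cut-down version of the Queen article with Python's standard xml.etree.ElementTree module and selects group elements that have a descendant singer element mentioning the query terms. The helper names naive_about and groups_with_singer are hypothetical, and the "every term must occur" matching is an assumption made only for this example: at INEX the about function is a vague relevance hint, and real systems rank elements rather than filter them.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the Queen article shown above
# (most tags and all attributes are omitted for brevity).
ARTICLE = """
<article>
  <artist>
    <group>
      <header><title>Queen (band)</title></header>
      <bdy>
        <songwriter><singer><artist><link>Freddie Mercury</link></artist></singer></songwriter>
      </bdy>
    </group>
  </artist>
</article>
"""


def naive_about(element, query):
    """Hypothetical stand-in for about(): true if every query term occurs in the element text."""
    text = " ".join(element.itertext()).lower()
    return all(term.lower() in text for term in query.split())


def groups_with_singer(root, query):
    """Rough reading of //group[about(.//singer, query)]."""
    return [
        group
        for group in root.iter("group")
        if any(naive_about(singer, query) for singer in group.iter("singer"))
    ]


root = ET.fromstring(ARTICLE)
hits = groups_with_singer(root, "Freddie Mercury")
titles = [group.findtext("header/title") for group in hits]
print(titles)
```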

Topic parts

Topics are made up of several parts. These parts explain the same information need, but for different purposes. An example of a full topic combining all these is given in the Appendix. Those parts are:

<narrative> A detailed explanation of the information need and the description of what makes an element relevant or not. The <narrative> should explain not only what information is being sought, but also the context and motivation of the information need, i.e., why the information is being sought and what work-task it might help to solve. Assessments will be made on compliance to the narrative alone; it is therefore important that this description is clear and precise.
<title> A short explanation of the information need. It serves as a summary of the content of the user's information need. The exact format of the topic title is discussed in more detail below.
<castitle> A short explanation of the information need, specifying any structural requirements. The exact format of the castitle is discussed in more detail below. The castitle is optional but the majority of topics should include one.
<phrasetitle> A short explanation of the information need given as a series of phrases, just as the <title> is given as a series of keywords. It is described in more detail below.
<description> A brief description of the information need written in natural language, typically one or two sentences. It is also described below.

Any ambiguity or disagreement is resolved by reference to the <narrative>, the only accurate definition of the information need.
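
The following sketch shows one way a topic file could be read and checked for its parts, using Python's standard xml.etree.ElementTree module. It is not part of the INEX tooling. The function name read_topic is hypothetical, the sample topic is abbreviated, and treating <title>, <description> and <narrative> as required (with <castitle> and <phrasetitle> optional) is an assumption based on the procedure described in this guide.

```python
import xml.etree.ElementTree as ET

# Assumption: these three parts are always expected; the other two are optional.
REQUIRED_PARTS = ("title", "description", "narrative")
OPTIONAL_PARTS = ("castitle", "phrasetitle")

# Abbreviated topic for illustration only; the narrative is shortened.
TOPIC = """
<inex_topic query_type="CO+S">
<title>Tolkien languages "lord of the rings"</title>
<castitle>//article[about(., Tolkien)]//sec[about(., language)]</castitle>
<description>Find information about Tolkien languages from the Lord of the Rings.</description>
<narrative>To be relevant an element should discuss Tolkien's artificial languages.</narrative>
</inex_topic>
"""


def read_topic(xml_text):
    """Return (parts, missing): the non-empty topic parts and any required parts not found."""
    root = ET.fromstring(xml_text)
    parts = {}
    for name in REQUIRED_PARTS + OPTIONAL_PARTS:
        text = root.findtext(name)
        if text is not None and text.strip():
            parts[name] = text.strip()
    missing = [name for name in REQUIRED_PARTS if name not in parts]
    return parts, missing


parts, missing = read_topic(TOPIC)
print(sorted(parts), missing)
```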

Topic <title>

To ensure topics are syntactically correct, a parser has been implemented in Flex and Bison (the GNU tools compatible with LEX and YACC) and the source code is available for download or online use.

The topic title is a short representation of the information need. Each term is either a word or a phrase. Phrases are encapsulated in double quotes. Furthermore the terms can have either the prefix + or -, where + is used to emphasize an important concept, and - is used to denote an unwanted concept.

Example

A user wants to retrieve information about computer science degrees that are not master degrees:

"computer science" +degree -master

The + and - signs are used as hints to the search engine and do not have strict semantics. As an example, the following text might be judged relevant to the information need, even though it contains the word master.

The university offers a program leading to a PhD degree in computer science. Applicants must have a master degree...

Example

A user wants to retrieve information about information retrieval from semi-structured documents:

"information retrieval" +semi-structured documents

As in the previous example, the following text might be judged relevant, even though it contains neither the word semi-structured nor the phrase "information retrieval".

The main goal of INEX is to promote the evaluation of content-oriented XML retrieval by providing a large test collection of XML documents, uniform scoring procedures, and a forum for organizations to compare their results...

Although the semantics of phrases and the + / - tokens is not strict, they may be of use to the search engine.
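
To make the title syntax concrete, here is a minimal sketch of a tokenizer for CO titles written with Python's re module. It is not the Flex and Bison parser mentioned above, and the function name parse_title and the dictionary layout are hypothetical. It only recognizes the three constructs described in this section: bare words, phrases in double quotes, and an optional + or - prefix.

```python
import re

# A term is an optional + or - prefix followed by a quoted phrase or a bare word.
TERM = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_title(title):
    """Split a CO title into terms, keeping the prefix and whether the term is a phrase."""
    terms = []
    for prefix, phrase, word in TERM.findall(title):
        terms.append({
            "text": phrase or word,
            "is_phrase": bool(phrase),
            "prefix": prefix,
        })
    return terms


for term in parse_title('"computer science" +degree -master'):
    print(term)
```

Because the prefixes are only hints, a retrieval system would typically turn them into term weights rather than Boolean filters.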

Topic <castitle>

As structural constraints are not an inherent part of all information needs the <castitle> is optional. However, we aim to have a castitle for most topics. This is needed in order to facilitate the evaluation of structural hints, which is a central concern at INEX.

Only a high level description is included here. For a more formal specification of the topic description language (NEXI), see the paper in the proceedings of INEX 2004 [1].

To make sure that topics are syntactically correct, parsers have been implemented in Flex and Bison (the GNU tools compatible with LEX and YACC) and are available for download. An online parser is also available.

Castitles are XPath expressions of the form:

A[B]

or

A[B]C[D]

where A and C are navigational XPath expressions using only the descendant axis. B and D are predicates using about functions for text; the arithmetic operators <, <=, >, and >= for numbers; and the connectives and and or. The about function has (nearly) the same syntax as the XPath function contains. Usage is restricted to the form:

about(.path, query)

where path is empty or contains only tag names and the descendant axis; and query is an IR query having the same syntax as the CO titles (i.e. query terms). The about function denotes that the content of the element located by the path is about the information need expressed in the query. As with the title, the castitle is only a hint to the search engine and does not have definite semantics.

Example

A user wants to know about Tolkien's languages and assumes an article on Tolkien will have a section discussing these languages:

//article[about(., Tolkien)]//sec[about(., language)]

But the user might be happy with retrieving whole articles. In the formalism expressed above,

A = //article
B = about(., Tolkien)
C = //sec
D = about(., language)
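
The decomposition into A, B, C and D can be sketched in a few lines of Python by tracking bracket depth. This is an illustration only, not the official NEXI parser: the function name split_castitle is hypothetical, and the sketch ignores details such as square brackets inside quoted phrases.

```python
def split_castitle(castitle):
    """Split 'A[B]' or 'A[B]C[D]' into a list of (path, predicate) pairs."""
    steps = []
    path, predicate = [], []
    depth = 0
    for ch in castitle.strip():
        if ch == "[":
            depth += 1
            if depth == 1:
                continue  # opening bracket of a top-level predicate
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']' in castitle")
            if depth == 0:
                # closing bracket of a top-level predicate: one step is complete
                steps.append(("".join(path).strip(), "".join(predicate).strip()))
                path, predicate = [], []
                continue
        (predicate if depth > 0 else path).append(ch)
    if depth != 0 or "".join(path).strip():
        raise ValueError("castitle must end with a closed predicate")
    if len(steps) not in (1, 2):
        raise ValueError("expected the form A[B] or A[B]C[D]")
    return steps


steps = split_castitle("//article[about(., Tolkien)]//sec[about(., language)]")
for path, predicate in steps:
    print(path, "|", predicate)
```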


A CAS query contains two kinds of structural hints: where to look (support elements; in this case //article and //article//sec), and which elements to return (target elements; in this case //article//sec). In prior INEX workshops the target element hint has been interpreted either strictly or loosely (vaguely). Where to look has always been interpreted loosely. This created considerable debate over how to interpret where to look. There is the database view: all structural constraints must be followed strictly (by exact match). Then there is the information retrieval view: an element is relevant if it satisfies the information need, irrespective of the structural constraints.

The main purpose of the INEX initiative is to build a test collection for the evaluation of content oriented XML retrieval. The most valuable part of the collection is the human made relevance assessments. Thus, each structured query must have at least one about function in the rightmost predicate.
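
The requirement on the rightmost predicate can be checked mechanically. The sketch below is a simple string-based check in Python, offered under stated assumptions rather than as the behavior of the INEX parser: the function names are hypothetical, the test is a plain substring search for "about(", and the yr tag in the usage example is invented for illustration.

```python
def rightmost_predicate(castitle):
    """Return the text inside the last top-level [...] of a castitle."""
    text = castitle.strip()
    if not text.endswith("]"):
        raise ValueError("castitle must end with a predicate")
    depth = 0
    # Walk backwards from the final ']' to its matching '['.
    for i in range(len(text) - 1, -1, -1):
        if text[i] == "]":
            depth += 1
        elif text[i] == "[":
            depth -= 1
            if depth == 0:
                return text[i + 1:-1]
    raise ValueError("unbalanced brackets in castitle")


def has_about_in_rightmost_predicate(castitle):
    """True if the rightmost predicate contains at least one about function."""
    return "about(" in rightmost_predicate(castitle).replace(" ", "")


print(has_about_in_rightmost_predicate(
    "//article[about(., Tolkien)]//sec[about(., language)]"))
# The yr tag below is hypothetical; the target predicate has no about function.
print(has_about_in_rightmost_predicate(
    "//article[about(., Tolkien)]//sec[.//yr > 2000]"))
```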

Topic <phrasetitle>

The topic phrasetitle is a verbose representation of the information need focusing on multiword phrases, i.e., it consists of one or more phrases encapsulated in double quotes. The title prefixes are permitted: + is used to emphasize an important concept, and - is used to denote an unwanted concept.

Topic <description>

The <description> is a short natural language statement of the information need. The <description> should be precise and concise and would typically consist of one or two sentences explaining what the user is looking for.

Procedure for Topic Development

Submission is done by filling in the Candidate Topic Submission Form on the INEX web site. The topic creation process is divided into several steps.

Step 1: Initial Topic Statement

Create a one or two sentence description of the information you are seeking. This should be a simple description of the information need without regard to retrieval system capabilities or document collection peculiarities. Record also the context and motivation of the information need, i.e. why the information is being sought. Add to this a description of the work-task, that is, with what task it is to help (e.g. writing an essay on a given topic).

Step 2: Exploration Phase

In this step the initial topic statement is used to explore the collection. Obtain an estimate of the number of relevant elements, then evaluate whether this topic can be judged consistently. You may use any retrieval engine for this task, including your own or the TopX system.

Step 3: Assess Top 25 Results

Judge the top 25 retrieved documents. To assess the relevance of a retrieved document, use the following working definition: mark it relevant if it would be useful if you were writing a report on the subject of the topic, or if it contributes toward satisfying your information need. Each result should be judged on its own merits. That is, information is still relevant even if it is the thirtieth time you have seen the same information. It is important that your judgment of relevance is consistent throughout this task. If there are:

• fewer than 2 or more than 20 relevant documents within the top 25, abandon the topic and use a new one,
• at least 2 and at most 20 relevant documents within the top 25, continue with this query.
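
The rule above amounts to a simple threshold check, sketched here in Python. The function name top25_decision is hypothetical. Counts of exactly 2 and exactly 20 are treated as "continue" here, on the reading that a topic is abandoned only when there are fewer than 2 or more than 20 relevant documents.

```python
def top25_decision(relevant_in_top25):
    """Decide whether to keep a candidate topic after judging its top 25 results."""
    if not 0 <= relevant_in_top25 <= 25:
        raise ValueError("the count must be between 0 and 25")
    if relevant_in_top25 < 2 or relevant_in_top25 > 20:
        return "abandon"
    return "continue"


for count in (0, 1, 2, 12, 20, 21):
    print(count, top25_decision(count))
```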

Step 4: Assess Top 100 Results

Judge the top 100 results (25 are already judged), and record the number of relevant results in the Candidate Topic Submission Form. Record the query in the title field of the Candidate Topic Submission Form.

Step 5: Optionally write the CO+S <castitle>

Inspect the sources of some of the relevant articles to acquaint yourself with the collection's markup. Can you formulate structural hints that guide the search engine to the right results? If so, then re-write the title by adding structural constraints and target elements. Record this as the <castitle> on the Candidate Topic Submission Form. Please note that we aim at having castitles in most topics.

Step 6: Optionally write the <phrasetitle>

Re-write the title to include only phrases. Record this as the <phrasetitle> on the Candidate Topic Submission Form. Add to the <narrative> a description of why you think phrases may help. We aim to have phrases for most topics.

Step 7: Write the <description>

Write the <description>, the natural language interpretation of the information need.

Step 8: Write the <narrative>

Having judged the top 100 results you should have a clear idea of what makes a component relevant or not. It is important to record this in minute detail as the <narrative> of the topic. The <narrative> is the definitive instruction used to determine relevance during the assessment phase (after runs have been submitted). Record not only what information is being sought, but also what makes it relevant or irrelevant. If a CAS query was formulated, also record why you think the structural hints might help. Also record the context and motivation of the information need. Include the work-task, i.e., the form the information will take after having been found (e.g. a written report), or record a use-case, i.e., the reason the user needs XML-IR to solve their problem. Make sure your description is exhaustive, as there will be several months between topic development and topic assessment.

Step 9: Refining Topic Statements

Finalize the topic <title>, <castitle>, <phrasetitle>, <description>, and <narrative>. It is important that these parts all express the same information need; it should be possible to use each part of a topic in a stand-alone fashion (e.g. title for retrieval, description for experiments with long topic statements, etc.). In case of dispute, the <narrative> is the definitive definition of the information need - all assessments are made relative to the <narrative> and the <narrative> alone.

Step 10: Topic Submission

Once you are finished, submit the Candidate Topic Submission Form. After submitting a topic you will be asked to fill out an online questionnaire (this should take no longer than 5-10 minutes). It is important that this is done as part of the topic submission as the questions relate to the individual topic just submitted and the submission process. This is part of an effort to collect more context for the INEX topics, thereby increasing the reusability of the test collection. Initial results demonstrating the applicability of this can be found in [2].

Topic Selection

From the received candidate topics, the INEX organizers will decide which topics to include in the final set. This is done to ensure inclusion of a broad set of topics. The data obtained from the collection exploration phase is used as part of the topic selection process. The final set of topics will be distributed for use in retrieval and evaluation.

Acknowledgments

The topic format proposed in this document is based on the outcome of working groups set up during previous INEX workshops along with the online discussions they created. We are very grateful for this contribution. This document is a modified version of the topic development guides from previous INEX workshops authored by Shlomo Geva, Jaap Kamps, Mounia Lalmas, Birger Larsen, Saadia Malik, Börkur Sigurbjörnsson, and Andrew Trotman.

References

[1] Trotman, A., & Sigurbjörnsson, B. (2004). Narrowed Extended XPath I (NEXI). In Proceedings of the INEX 2004 Workshop (pp. 16-40).

[2] Kamps, J., & Larsen, B. (2006). Understanding Differences between Search Requests in XML Element Retrieval. In Proceedings of the SIGIR 2006 Workshop on XML Element Retrieval Methodology (pp. 13-19).

Appendix: Example CO+S Topics

<inex_topic query_type="CO+S">
<title>Tolkien languages "lord of the rings"</title>
<castitle>//article[about(., Tolkien) or about(., "lord of the rings")]//sec[about(., Tolkien languages)]</castitle>
<phrasetitle>"Tolkien languages" "lord of the rings"</phrasetitle>
<description>Find information about Tolkien languages from the Lord of the Rings.</description>
<narrative>
The "Lord of the Rings" movie trilogy fascinates me. I have learned from other fans that the languages spoken by, e.g., elves and dwarfs in the screen version are not just the usual effects. Apparently, these languages were invented by Tolkien himself and are central to his work with the original books.

For my own personal interest, I would like to learn more background about Tolkien's artificial languages, and how they have affected the world portrayed in the Lord of the Rings universe. Later I may want to add a section on the influence of these languages to my Lord of the Rings fan web page. As Tolkien's languages seem to be a rather specialized topic, I expect to find relevant information as elements in larger documents that deal with Tolkien or Lord of the Rings, e.g., as sections in documents about Tolkien or the Lord of the Rings (although I would be pleasantly surprised to see whole documents on the topic of Tolkien's languages).

To be relevant an element should discuss Tolkien's artificial languages and their influence on the Lord of the Rings books or movies. Information on the languages alone without explicit discussion of their impact on the books or movies is not relevant; nor is general information on Tolkien or the Lord of the Rings.
</narrative>
</inex_topic>


<inex_topic query_type="CO+S">
<title>band freddie mercury</title>
<castitle>//group[about(.//singer, Freddie Mercury)]</castitle>
<phrasetitle>"Freddie Mercury"</phrasetitle>
<description>Find information about the bands in which Freddie Mercury played.</description>
<narrative>
I am a huge fan of Freddie Mercury and all his work, and was wondering in how many different bands, groups, or line-ups he performed. I am aware of some of them (such as Queen, obviously) but I am sure there must be more.

I conduct this search for my own personal interest of completing my record collection. I am trying to get hold of a complete discography of Freddie Mercury that would shed light on his evolution as an artist during his different career stages.

To be relevant a result should discuss a band, group, or line-up that included Freddie Mercury. Not relevant is information about other occurrences of Freddie Mercury in the media.
</narrative>
</inex_topic>