For these analyses, 250 results were used for the 6600 orphan set and 50 for the 50 orphan set. Automatic assessment was against the ground truth contained in the Wikipedia articles themselves; manual assessment used those targets judged relevant by a human assessor. BEP proximity scoring followed the ad hoc track, with the score decreasing from 1 (an exact hit) to 0 (1000 characters or more from the assessor's BEP, even if in the correct file).
•6600 F2F evaluated by F2F against Wikipedia Ground-Truth
This is a simple evaluation of file-to-file links, treating the submission as 250 ranked links in the order in which they appear and using the existing Wikipedia links as qrels.
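The ranked-list evaluation above can be sketched as a standard average-precision computation over one topic's submitted links. This is a minimal illustration, not the track's official evaluation code; the function name and the file IDs are made up for the example.

```python
def average_precision(ranked_targets, qrels):
    """Score one topic's ranked file-to-file links against a qrels set.

    ranked_targets: submitted target-file IDs, in submission order.
    qrels: set of target-file IDs considered relevant (e.g. the
           existing Wikipedia links for this topic).
    """
    hits = 0
    precision_sum = 0.0
    for rank, target in enumerate(ranked_targets, start=1):
        if target in qrels:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(qrels) if qrels else 0.0

# Toy example: hits at ranks 1 and 3, one relevant file never found.
ap = average_precision(["fileA", "fileX", "fileB"], {"fileA", "fileB", "fileC"})
print(ap)  # (1/1 + 2/3) / 3, about 0.556
```

Per-topic scores like this would then be averaged over all topics to give a single run-level figure.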
•50 A2B evaluated by F2F against Wikipedia Ground-Truth
This is the same as the previous evaluation, using the Wikipedia ground-truth links as qrels, but over the 50 manually assessed topics.
•50 A2B evaluated by F2F against Manual Assessment Result
This is as before, but the qrels are derived from the manual assessment results.
•50 A2B evaluated by F2B against Manual Assessment Result
This is as before, but using BEP proximity in scoring links.
The distance between the submitted BEP and the assessor's BEP is used to compute a relevance score in [0..1]:
an exact match scores 1, and the score drops linearly to zero over a distance of 1000 characters. Since there may be more than one assessor BEP within this distance in the same document, the best (highest) score is used.
Beyond 1000 characters the BEP score is zero, even if the files are linked.
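The BEP proximity scoring described above reduces to a short function: a linear decay from 1 to 0 over 1000 characters, taking the best score over all assessor BEPs in the same document. A minimal sketch (character offsets and the helper name are illustrative, not from the track software):

```python
def bep_score(submitted_bep, assessor_beps, window=1000):
    """Proximity score in [0..1] for a submitted best-entry point.

    submitted_bep: character offset of the submitted BEP.
    assessor_beps: character offsets of the assessor's BEPs in the
                   same target document.
    Score decays linearly from 1 (exact hit) to 0 at `window` chars;
    the best score over all assessor BEPs is kept.
    """
    best = 0.0
    for bep in assessor_beps:
        dist = abs(submitted_bep - bep)
        best = max(best, max(0.0, 1.0 - dist / window))
    return best

print(bep_score(1200, [1200]))  # exact hit -> 1.0
print(bep_score(1700, [1200]))  # 500 chars away -> 0.5
print(bep_score(2500, [1200]))  # beyond 1000 chars -> 0.0
```

Note that a score of 0 here does not change the file-to-file judgement; it only zeroes the proximity credit for that link.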
•50 A2B evaluated by A2F against Manual Assessment Result (only 50 anchors, 1st bep per anchor)
This uses only the first link specified for each anchor, so 50 links in total (one per anchor) are evaluated; the remaining links for each anchor are not taken into account.
Links are evaluated file-to-file.
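The first-link-per-anchor restriction above amounts to a simple filter over the submitted run. A hedged sketch, assuming the run can be represented as ordered (anchor, target) pairs:

```python
def first_link_per_anchor(links):
    """Keep only the first submitted target for each anchor.

    links: list of (anchor_id, target) pairs in submission order.
    Returns the reduced run: one link per anchor, order preserved.
    """
    seen = set()
    kept = []
    for anchor, target in links:
        if anchor not in seen:  # later links for this anchor are dropped
            seen.add(anchor)
            kept.append((anchor, target))
    return kept

# Toy run with two links on anchor a1; only the first survives.
run = [("a1", "docA"), ("a1", "docB"), ("a2", "docC")]
print(first_link_per_anchor(run))  # [('a1', 'docA'), ('a2', 'docC')]
```

With 50 anchors per topic, this leaves exactly 50 links to evaluate.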
•50 A2B evaluated by A2B against Manual Assessment Result (only 50 anchors, 1st bep per anchor)
As the previous evaluation, but scored with BEP proximity.