Comparing Information Retrieval Evaluation Measures

This Demonstration compares some common evaluation measures for information retrieval on the results returned by two systems for the same query. It is assumed that both systems retrieve the same number of results for that query; this number can be treated as a cutoff value.
Mark a retrieved item as relevant or nonrelevant by clicking a number on either side and watch how the measures change. Mouse over the measures to see their definitions.
To evaluate and compare information retrieval systems, one traditionally uses a "test collection" composed of three entities:
• a set of queries (or "information needs")
• a target dataset (where to look for items that will satisfy the information needs)
• a set of relevance assessments (the items that satisfy the information needs)
This test collection is generated artificially and is assumed to be available.
Based on this test collection, one can compare the results of a query run through two different information retrieval systems. The items retrieved by the two systems are represented by the two sequences of circles on the left and right.
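The three entities above can be sketched as a small data structure. This is an illustrative example only, with made-up field names, query, and document identifiers; it is not a standard test-collection format:

```python
# A minimal, made-up sketch of the three test-collection entities:
# queries, a target dataset, and relevance assessments ("qrels").
test_collection = {
    # a set of queries (information needs)
    "queries": {"q1": "effects of caffeine on sleep"},
    # the target dataset: document id -> document text
    "documents": {
        "d1": "Caffeine delays sleep onset in most adults.",
        "d2": "A history of coffee cultivation in Ethiopia.",
        "d3": "Melatonin and circadian rhythm regulation.",
    },
    # relevance assessments: query id -> judged documents (1 = relevant,
    # 0 = nonrelevant). Note that d3 is unjudged: assessments need not
    # cover the whole collection.
    "qrels": {"q1": {"d1": 1, "d2": 0}},
}

judged = test_collection["qrels"]["q1"]
print(sorted(judged))  # the items whose relevance to q1 has been assessed
```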
There are two ways to interact with this Demonstration:
1. by using the controls at the top
2. by marking items as relevant/nonrelevant in the graphics
Use the controls to set the numbers of relevant and nonrelevant items for the query being considered. These numbers are characteristics of the test collection for each given query.
Relevant and nonrelevant items do not necessarily account for all the items in the collection; they are only those items whose relevance to a given query has been assessed (or inferred).
The main concern of information retrieval evaluation is to decide which retrieval system is better, based on a test collection. Is the system that retrieves the most relevant items the better one? Or the one with the higher average precision? Or the one with the higher normalized cumulative gain?
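The trade-off above can be made concrete with a short sketch. The following is not the Demonstration's own code: it computes three of the measures mentioned (precision at a cutoff, average precision, and nDCG with binary gains) for two hypothetical ranked result lists; the relevance judgments are made-up example data.

```python
import math

def precision_at_k(rels, k):
    """Fraction of the top-k retrieved items that are relevant (rels is a 0/1 list)."""
    return sum(rels[:k]) / k

def average_precision(rels, num_relevant):
    """Mean of the precision values at the ranks where relevant items occur,
    normalized by the total number of known relevant items for the query."""
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0

def ndcg(rels, k, num_relevant):
    """Normalized discounted cumulative gain with binary gains and a
    log2(rank + 1) discount; the ideal DCG places one relevant item at
    each of the top min(num_relevant, k) ranks."""
    dcg = sum(r / math.log2(rank + 1) for rank, r in enumerate(rels[:k], start=1))
    idcg = sum(1 / math.log2(rank + 1)
               for rank in range(1, min(num_relevant, k) + 1))
    return dcg / idcg if idcg else 0.0

# Two hypothetical systems, same cutoff of 10; 1 = relevant, 0 = nonrelevant.
# The test collection is assumed to contain 5 known relevant items for the query.
system_a = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
system_b = [0, 1, 1, 0, 1, 1, 0, 0, 0, 0]
num_relevant = 5

for name, rels in [("A", system_a), ("B", system_b)]:
    print(name,
          precision_at_k(rels, 10),
          round(average_precision(rels, num_relevant), 3),
          round(ndcg(rels, 10, num_relevant), 3))
```

Both systems retrieve the same number of relevant items (so their precision at the cutoff is equal), yet system A ranks them earlier and therefore scores higher on average precision and nDCG, which is exactly the kind of disagreement between measures the Demonstration lets you explore.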



