Term Weighting with TF-IDF

TF-IDF (term frequency-inverse document frequency) is a way of determining which terms in a document should be weighted most heavily when trying to understand what the document is about. The term frequency counts how often a given term appears in the document of interest. The document frequency is measured with respect to a corpus of other documents: it counts how many documents in the corpus contain the term. The terms that are most informative about a particular text have a high term frequency and a low document frequency. The TF-IDF for a term is the product of its term frequency and the scaled (typically logarithmic) inverse of its document frequency, so a term that appears in every document of the corpus receives little or no weight. Stopwords are words that occur so frequently in the language that they rarely convey information about the meaning of a particular document. In this Demonstration, stopwords can be turned on and off, and the font size of each term can be scaled by various weighting factors.
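The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the Demonstration's own code: the function names, the tiny stopword list, the whitespace tokenizer, and the three-sentence toy corpus are all assumptions made for the example, and the inverse document frequency is scaled as log10(N/df), one common choice (the one used in [1]), where N is the number of documents in the corpus.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# the Demonstration builds its list with Mathematica's WordData.
STOPWORDS = {"the", "of", "a", "and", "was", "to", "in", "for"}

def tokenize(text):
    """Lowercase the text, split on whitespace, strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def tf_idf(document, corpus, remove_stopwords=True):
    """Return {term: tf * log10(N / df)} for every term in `document`.

    `corpus` is a list of document strings and is assumed to include
    `document` itself, so every term has a document frequency of at least 1.
    """
    doc_term_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(doc_term_sets)

    # Term frequency: raw count of each term in the document of interest.
    tf = Counter(tokenize(document))
    if remove_stopwords:
        tf = Counter({t: c for t, c in tf.items() if t not in STOPWORDS})

    weights = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for terms in doc_term_sets if term in terms)
        df = max(df, 1)  # guard in case the document is not in the corpus
        weights[term] = count * math.log10(n_docs / df)
    return weights

# Toy corpus invented for this example (not taken from the Old Bailey text).
corpus = [
    "The prisoner stole a silver watch",
    "The prisoner was indicted for theft",
    "The watch was found in the street",
]
weights = tf_idf(corpus[0], corpus)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {w:.3f}")
```

In this toy corpus, "stole" and "silver" appear in only one document and so outweigh "prisoner" and "watch", which each appear in two. Calling tf_idf with remove_stopwords=False keeps "the" in the result, but because it occurs in all three documents its weight is zero. A display like the one in the Demonstration could then scale each term's font size in proportion to its weight.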

DETAILS

This Demonstration uses the text of Old Bailey Online item t18100110-41. The built-in Mathematica function WordData was used to generate the list of stopwords.
The Old Bailey Online contains digital versions of almost 200,000 criminal trials held at London's central criminal court between 1674 and 1913.
This work is part of the Criminal Intent project, funded by a 2009 Digging into Data challenge grant.
For more about TF-IDF, see [1].
Reference
[1] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge: Cambridge University Press, 2008.
© 2014 Wolfram Demonstrations Project & Contributors