
Collocation by Symmetric Conditional Probability

Roughly speaking, a "collocation" is an n-gram (subsequence) that appears more frequently in some sequence than would be expected if the constituent parts of the n-gram were drawn at random. One way of making this definition more precise is to use the notion of "symmetric conditional probability". For a bigram (i.e., an n-gram with two parts), the symmetric conditional probability is the product of (1) the conditional probability that the second half of the bigram appears given the first half and (2) the conditional probability that the first half of the bigram appears given the second half. Bigrams with high symmetric conditional probability may be thought of as collocations. This concept can be extended to n-grams by cutting the n-gram at each of its n − 1 interior positions (where n is the length of the n-gram), finding the symmetric conditional probability of each resulting "pseudo-bigram", and then taking the mean of the results.
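As a rough sketch of the definition above, the following Python computes the symmetric conditional probability (SCP) of an n-gram from plain n-gram counts. It assumes, and this is not taken from the Demonstration, that P(right | left) is the count of the n-gram divided by the total count of same-length n-grams that begin with the left part, and that P(left | right) is the count divided by the total count of same-length n-grams that end with the right part. The names `ngram_counts`, `scp_at_slice`, and `scp` are hypothetical.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count every contiguous n-gram inside each sentence (n-grams never span two sentences)."""
    counts = Counter()
    for s in sentences:
        for i in range(len(s) - n + 1):
            counts[tuple(s[i:i + n])] += 1
    return counts


def scp_at_slice(gram, k, counts):
    """SCP of the pseudo-bigram (gram[:k], gram[k:]); `counts` must hold len(gram)-grams."""
    left, right = gram[:k], gram[k:]
    joint = counts[gram]
    if joint == 0:
        return 0.0
    # Hypothetical normalization: same-length n-grams that begin with `left`
    # (for P(right | left)) or end with `right` (for P(left | right)).
    starts_with_left = sum(c for g, c in counts.items() if g[:k] == left)
    ends_with_right = sum(c for g, c in counts.items() if g[k:] == right)
    return (joint / starts_with_left) * (joint / ends_with_right)


def scp(gram, sentences):
    """Mean SCP over all n - 1 slice positions; for a bigram this is the single product."""
    gram = tuple(gram)
    n = len(gram)
    if n < 2:
        raise ValueError("need an n-gram of length 2 or more")
    counts = ngram_counts(sentences, n)
    return sum(scp_at_slice(gram, k, counts) for k in range(1, n)) / (n - 1)


sentences = [("a", "b", "a", "b", "c")]
print(scp(("a", "b"), sentences))  # 1.0: every "a" is followed by "b", and only "a" precedes "b"
print(scp(("b", "c"), sentences))  # 0.5: "b" is followed by "c" only half the time
```

Because each conditional probability is at most 1, the SCP lies in [0, 1], and it approaches 1 only when the two parts almost always occur together.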
This Demonstration shows how this concept can be built into an algorithm when the sequence is stored as a "trie", that is, a data structure that converts "sentences" of symbols into a tree by making every n-gram (up to a certain length) that appears in a sentence a child of "most" of that n-gram, namely all but its rightmost element. You select the elementary cellular automaton that generates the sentences, whether 3 or 4 is the maximum length of n-grams examined, which n-gram of length 2 or more you wish to consider, and where you want to slice the n-gram to convert it into a pseudo-bigram. The system responds with one figure showing the conditional probability of the second part of the pseudo-bigram given the first part, and another showing the conditional probability of the first part given the second part. The green circle shows the n-gram you selected. The purple dashed n-gram shows the relevant parent of that n-gram, and the blue dashed n-grams show all appropriate children of that parent. The bottom row of the output shows how the symmetric conditional probability (the mean of the products of the conditional probabilities over all possible slicings of the n-gram) is computed.
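The trie described above can be sketched in Python as follows. This is an illustrative reconstruction, not the Demonstration's Mathematica code: every n-gram of length up to `max_len` becomes a node whose parent is the same n-gram minus its rightmost symbol, and each node stores how often its n-gram occurs. P(right | left) is read off the trie as the count of the full n-gram divided by the summed counts of all same-length n-grams below the node for the left part. As an assumption of this sketch, P(left | right) comes from a second trie built over the reversed sentences. `TrieNode`, `build_trie`, `find`, `nodes_at_depth`, `p_right_given_left`, and `trie_scp` are hypothetical names.

```python
class TrieNode:
    """One n-gram; `children` maps a next symbol to the (n+1)-gram extending this one."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(sentences, max_len):
    """Insert every n-gram of length <= max_len; a node's parent is the n-gram minus its last symbol."""
    root = TrieNode()
    for s in sentences:
        for i in range(len(s)):
            node = root
            # Walking s[i:i+1], s[i:i+2], ... counts each n-gram starting at i exactly once.
            for sym in s[i:i + max_len]:
                node = node.children.setdefault(sym, TrieNode())
                node.count += 1
    return root


def find(root, gram):
    """Return the node for `gram`, or None if it never occurs."""
    node = root
    for sym in gram:
        node = node.children.get(sym)
        if node is None:
            return None
    return node


def nodes_at_depth(node, depth):
    """All descendants exactly `depth` levels below `node` (depth 1 = its children)."""
    if depth == 0:
        return [node]
    return [d for child in node.children.values() for d in nodes_at_depth(child, depth - 1)]


def p_right_given_left(root, gram, k):
    """P(gram[k:] | gram[:k]); with k = len(gram) - 1 this normalizes over the parent's children."""
    left_node, gram_node = find(root, gram[:k]), find(root, gram)
    if left_node is None or gram_node is None:
        return 0.0
    total = sum(d.count for d in nodes_at_depth(left_node, len(gram) - k))
    return gram_node.count / total


def trie_scp(sentences, gram, max_len):
    """Mean over slice positions of P(right | left) * P(left | right)."""
    gram = tuple(gram)
    n = len(gram)
    if not 2 <= n <= max_len:
        raise ValueError("gram length must be between 2 and max_len")
    forward = build_trie(sentences, max_len)
    backward = build_trie([tuple(reversed(s)) for s in sentences], max_len)
    rev = gram[::-1]
    # In the reversed trie, slicing rev at n - k puts reversed(right) on the left,
    # so p_right_given_left there equals P(left | right) in the original order.
    products = [
        p_right_given_left(forward, gram, k) * p_right_given_left(backward, rev, n - k)
        for k in range(1, n)
    ]
    return sum(products) / len(products)


sentences = [("a", "b", "a", "b", "c")]
print(trie_scp(sentences, ("b", "c"), max_len=3))  # 0.5
```

For a slice just before the last symbol, `nodes_at_depth(left_node, 1)` is exactly the set of children of the parent n-gram, which appears to correspond to the purple parent and blue children in the Demonstration's figures.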

SNAPSHOTS

  • [Snapshot]
  • [Snapshot]
  • [Snapshot]

DETAILS

The Demonstration executes faster when the maximum length of n-grams examined is set to 3.
An application of the concept behind this Demonstration will be presented at the 2008 International Mathematica Symposium, where it will be used as part of an effort to find critical concepts in the leading international treaty governing sales of goods.
Snapshot 1: Examines the n-gram {0,0,0} in sentences generated by Rule 153. Slice position for creating the pseudo-bigram is 1.
Snapshot 2: Examines the n-gram {0,0,0} in sentences generated by Rule 153. Slice position for creating the pseudo-bigram is 2.
Snapshot 3: Examines the n-gram {0,0,0} in sentences generated by Rule 130. Slice position for creating the pseudo-bigram is 1. It has a high symmetric conditional probability and could be a collocation.
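To experiment with settings like those in the snapshots, sentences can be generated from an elementary cellular automaton. The Demonstration does not state the lattice width, number of steps, boundary condition, or initial condition it uses, so the sketch below assumes a cyclic row of `width` cells starting from a single 1 in the middle, treats each row of the evolution as one sentence, and reads "slice position k" as "the first k symbols form the left part". The SCP helper uses a hypothetical count-based normalization: P(right | left) divides the n-gram's count by the count of same-length n-grams that start with the left part, and P(left | right) by the count of those that end with the right part. The names `eca_rows`, `scp_at_slice`, and `scp` are hypothetical, and the printed values are not expected to match the snapshots.

```python
from collections import Counter


def eca_rows(rule, width, steps):
    """Evolve elementary CA `rule` on a cyclic row from a single 1; each row is one sentence."""
    row = [0] * width
    row[width // 2] = 1
    rows = [tuple(row)]
    for _ in range(steps):
        # New cell = bit (4*left + 2*center + right) of the rule number (Wolfram numbering);
        # row[i - 1] wraps to the last cell when i == 0.
        row = [
            (rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) & 1
            for i in range(width)
        ]
        rows.append(tuple(row))
    return rows


def scp_at_slice(sentences, gram, k):
    """Hypothetical count-based SCP of the pseudo-bigram (gram[:k], gram[k:])."""
    n = len(gram)
    counts = Counter(tuple(s[i:i + n]) for s in sentences for i in range(len(s) - n + 1))
    joint = counts[gram]
    if joint == 0:
        return 0.0
    starts = sum(c for g, c in counts.items() if g[:k] == gram[:k])
    ends = sum(c for g, c in counts.items() if g[k:] == gram[k:])
    return (joint / starts) * (joint / ends)


def scp(sentences, gram):
    """Mean of the per-slice SCP values over slice positions 1 .. n - 1."""
    gram = tuple(gram)
    n = len(gram)
    return sum(scp_at_slice(sentences, gram, k) for k in range(1, n)) / (n - 1)


for rule in (153, 130):
    sentences = eca_rows(rule, width=31, steps=30)
    per_slice = [scp_at_slice(sentences, (0, 0, 0), k) for k in (1, 2)]
    print(rule, per_slice, scp(sentences, (0, 0, 0)))
```

Changing `width`, `steps`, or the initial row changes the counts, so any conclusion about whether {0,0,0} is a collocation depends on these assumed settings.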
Powered by Wolfram Mathematica © 2014 Wolfram Demonstrations Project & Contributors
Note: To run this Demonstration you need Mathematica 7+ or the free Mathematica Player 7EX