# Collocation by Symmetric Conditional Probability

Roughly speaking, a "collocation" is an n-gram (subsequence) that appears more frequently in some sequence than would be expected if the constituent parts of the n-gram were drawn at random. One way of making this definition more precise is to use the notion of "symmetric conditional probability". For a bigram (i.e., an n-gram with two parts), the symmetric conditional probability is the product of (1) the conditional probability that the second half of the bigram appears given the first half and (2) the conditional probability that the first half of the bigram appears given the second half. Bigrams with high symmetric conditional probability may be thought of as collocations. This concept can be extended to n-grams by cutting the n-gram at all n-1 positions (where n is the length of the n-gram), finding the symmetric conditional probability of each resulting "pseudo-bigram", and then taking the mean of the results.
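
The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the Demonstration's own code: the helper names `ngram_counts` and `scp` are hypothetical, and each conditional probability is estimated, as an assumption, by the ratio of raw occurrence counts (count of the whole n-gram divided by count of the conditioning part).

```python
from collections import Counter


def ngram_counts(seq, max_len):
    """Count every contiguous n-gram of length 1..max_len in seq."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def scp(ngram, counts):
    """Mean, over all cut positions, of P(second | first) * P(first | second).

    Assumption: each conditional probability is estimated as
    count(whole n-gram) / count(conditioning part).
    """
    ngram = tuple(ngram)
    if len(ngram) < 2:
        raise ValueError("need an n-gram of length 2 or more")
    whole = counts[ngram]
    if whole == 0:
        return 0.0  # an unseen n-gram cannot be a collocation
    products = []
    for cut in range(1, len(ngram)):  # the n-1 cut positions
        first, second = ngram[:cut], ngram[cut:]
        p_second_given_first = whole / counts[first]
        p_first_given_second = whole / counts[second]
        products.append(p_second_given_first * p_first_given_second)
    return sum(products) / len(products)


# Toy sequence: 0 is always followed by 1, but the final 1 has no successor.
counts = ngram_counts([0, 1, 0, 1, 0, 1, 0, 1], max_len=3)
print(scp((0, 1), counts))     # bigram: a single cut position
print(scp((0, 1, 0), counts))  # trigram: mean over two cut positions
```

For a bigram there is only one cut position, so the mean reduces to the single product described above.
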
This Demonstration shows how this concept can be built into an algorithm when the sequence is stored as a "trie": a data structure that converts "sentences" of symbols into a tree by requiring that every n-gram up to a certain length that appears in a sentence has as its parent "most" of that n-gram, that is, all but its rightmost element. You select the elementary cellular automaton that generates the sentences, whether 3 or 4 is the maximum length of n-grams examined, which n-gram of length 2 or more to consider, and where to slice that n-gram to convert it into a pseudo-bigram. The system responds with one figure showing the conditional probability of the second part of the pseudo-bigram given the first part and another showing the conditional probability of the first part given the second part. The green circle marks the n-gram you selected. The purple dashed n-gram is the relevant parent of that n-gram, and the blue dashed n-grams are all the appropriate children of that parent. The bottom row of the output shows how the symmetric conditional probability (the mean of the products of the conditional probabilities over all possible slicings of the n-gram) is computed.
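
A trie-based version of this calculation might look like the following Python sketch. Everything named here (`TrieNode`, `build_trie`, `conditional`, `scp_from_tries`) is hypothetical and is not taken from the Demonstration's source. Two assumptions are made: the probability of the second part given the first is estimated as the count of the n-gram divided by the total count of all same-length n-grams under the same parent, and the reverse conditional probability is read from a second trie built on reversed sentences. The hand-written sentences stand in for cellular automaton output.

```python
class TrieNode:
    """One n-gram in the trie; its parent is the n-gram minus its last symbol."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(sentences, max_len):
    """Insert every n-gram of length 1..max_len from every sentence."""
    root = TrieNode()
    for sentence in sentences:
        for start in range(len(sentence)):
            node = root
            for symbol in sentence[start:start + max_len]:
                node = node.children.setdefault(symbol, TrieNode())
                node.count += 1
    return root


def find(root, ngram):
    """Return the node for ngram, or None if it never occurred."""
    node = root
    for symbol in ngram:
        node = node.children.get(symbol)
        if node is None:
            return None
    return node


def count_at_depth(node, depth):
    """Total count of the descendants exactly `depth` levels below node."""
    if depth == 0:
        return node.count
    return sum(count_at_depth(c, depth - 1) for c in node.children.values())


def conditional(root, prefix, ngram):
    """Estimate P(rest of ngram | prefix) from the trie (assumed estimator)."""
    parent, target = find(root, prefix), find(root, ngram)
    if parent is None or target is None:
        return 0.0
    return target.count / count_at_depth(parent, len(ngram) - len(prefix))


def scp_from_tries(ngram, forward, backward):
    """Mean over all cuts of P(second | first) * P(first | second)."""
    ngram = tuple(ngram)
    if len(ngram) < 2:
        raise ValueError("need an n-gram of length 2 or more")
    rev = ngram[::-1]
    products = []
    for cut in range(1, len(ngram)):
        p_fwd = conditional(forward, ngram[:cut], ngram)
        # In the reversed trie, the reversed second part is the prefix.
        p_bwd = conditional(backward, rev[:len(ngram) - cut], rev)
        products.append(p_fwd * p_bwd)
    return sum(products) / len(products)


sentences = [(0, 0, 1), (0, 1, 1), (0, 0, 0)]
forward = build_trie(sentences, max_len=3)
backward = build_trie([s[::-1] for s in sentences], max_len=3)
print(scp_from_tries((0, 0), forward, backward))
print(scp_from_tries((0, 0, 1), forward, backward))
```

Because the denominator sums only over n-grams of the same length under the parent, occurrences of the parent that sit at the end of a sentence and cannot be extended are left out. A simple count-ratio estimate would include them, so the two approaches can give slightly different values.
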

### DETAILS

The Demonstration executes faster when the maximum length of n-grams examined is set to 3.
An application of the concept behind this Demonstration will be presented at the 2008 International Mathematica Symposium, where it will be used as part of an effort to find critical concepts in the leading international treaty governing sales of goods.
Snapshot 1: Examines the n-gram {0,0,0} in sentences generated by Rule 153. The slice position for creating the pseudo-bigram is 1.
Snapshot 2: Examines the n-gram {0,0,0} in sentences generated by Rule 153. The slice position for creating the pseudo-bigram is 2.
Snapshot 3: Examines the n-gram {0,0,0} in sentences generated by Rule 130. The slice position for creating the pseudo-bigram is 1. This n-gram has a high symmetric conditional probability and could be a collocation.
