Generating Random DNA Sequences


This Demonstration generates pseudorandom sequences over the four letters that represent the DNA nucleotides (A, C, G, and T), taking into account GC content, which is the proportion of Gs and Cs in the sequence. Known GC content values can also be chosen for popular organisms, including many mammals and a few bacteria, ranging from 20% to almost 52%. Although human GC content varies from 35% to 60% from chromosome to chromosome, the average GC content of the human genome is 46.1%.
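
The generation step can be sketched in a few lines. This is a minimal Python illustration, not the Demonstration's own Wolfram Language code. It assumes that each base is drawn independently, that G and C are equally likely, and that A and T are equally likely; the names `random_dna` and `gc_fraction` are hypothetical.

```python
import random


def random_dna(length, gc_content, seed=None):
    """Return a pseudorandom DNA string whose expected GC fraction is gc_content.

    Assumption (not confirmed by the source): bases are independent draws,
    with G and C sharing the GC probability equally and A and T sharing the rest.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)
    at_weight = (1.0 - gc_content) / 2.0
    gc_weight = gc_content / 2.0
    # Weights are listed in the same order as the letters in "ACGT".
    weights = [at_weight, gc_weight, gc_weight, at_weight]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def gc_fraction(seq):
    """Observed proportion of G and C in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    seq = random_dna(1000, 0.461, seed=1)  # 46.1% is the human average quoted above
    print(seq[:60])
    print(f"observed GC fraction: {gc_fraction(seq):.3f}")
```

Because the draws are independent, the observed GC fraction of any one sequence only approximates the requested value; the deviation shrinks as the sequence gets longer.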

Various statistics and complexity estimates are provided: the Shannon entropy of the pseudorandomly generated sequence, an estimate of its Kolmogorov–Chaitin complexity obtained with Mathematica's Compress[] function (which implements the Deflate algorithm), the lossless compression ratio (also computed with Compress[]), and a histogram of the distribution of nucleotides in the generated sequence. The RNA option simply substitutes uracil (U) for thymine (T). The program generates sequences of up to 1 kbp (1,000 base pairs). You can select the full sequence even if not all of it is displayed in the Demonstration window.
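
These statistics can be approximated outside Mathematica. The Python sketch below uses `zlib.compress`, which also produces Deflate-compressed data, as a stand-in for Compress[], so its byte counts will not match the Demonstration's. The function names are hypothetical, and the ratio is defined here as compressed size divided by original size, which is an assumption about how the Demonstration reports it.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy of the symbol frequencies in seq, in bits per symbol."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def compression_stats(seq, level=9):
    """Return (compressed size in bytes, compressed size / original size).

    The compressed size is a rough upper-bound proxy for Kolmogorov-Chaitin
    complexity; zlib stands in for Mathematica's Compress[].
    """
    raw = seq.encode("ascii")
    compressed = zlib.compress(raw, level)
    return len(compressed), len(compressed) / len(raw)


def nucleotide_counts(seq):
    """Counts per letter, i.e. the data behind the histogram."""
    return dict(Counter(seq))


def to_rna(seq):
    """Replace every thymine (T) with uracil (U)."""
    return seq.replace("T", "U")


if __name__ == "__main__":
    seq = "ACGT" * 250  # a highly regular 1 kbp sequence
    print(shannon_entropy(seq))
    print(compression_stats(seq))
    print(nucleotide_counts(seq))
    print(to_rna("GATTACA"))
```

The regular sequence in the usage example has the maximum single-letter entropy of 2 bits per symbol yet compresses very well, which illustrates why the two measures are reported separately.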

Contributed by: Hector Zenil (July 2013)
Open content licensed under CC BY-NC-SA



Details

Reference

[1] J. Romiguier, V. Ranwez, E. J. P. Douzery, and N. Galtier, "Contrasting GC-Content Dynamics across 33 Mammalian Genomes: Relationship with Life-History Traits and Chromosome Sizes," Genome Research, 20(8), 2010 pp. 1001–1009. www.ncbi.nlm.nih.gov/pmc/articles/PMC2909565.


