Generating Random DNA Sequences

This Demonstration generates pseudorandom sequences over the four letters that represent the DNA nucleotides (A, C, G, and T), taking into account GC content, the percentage of bases in the sequence that are G or C. You can also choose the known GC content of several well-studied organisms, mostly mammals but also a few bacteria, whose GC content ranges from 20% to almost 52%. Although human GC content can vary from 35% to 60% from chromosome to chromosome, the genome-wide average is 46.1%.
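
The sampling idea can be sketched in Python. This is a minimal illustration, not the Demonstration's own Mathematica code: the function names random_dna and gc_fraction are hypothetical, and it assumes that G and C are equally likely to each other, as are A and T, which the text does not state.

```python
import random


def random_dna(length, gc_content, seed=None):
    """Return a pseudorandom DNA string with an expected GC fraction.

    gc_content is a fraction in [0, 1] (for example 0.461 for 46.1%).
    Assumption: G and C each get half of the GC probability mass,
    and A and T each get half of the remainder.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible output
    at_weight = (1.0 - gc_content) / 2.0
    gc_weight = gc_content / 2.0
    bases = "ATGC"
    weights = [at_weight, at_weight, gc_weight, gc_weight]
    # Each position is drawn independently from the weighted alphabet.
    return "".join(rng.choices(bases, weights=weights, k=length))


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    seq = random_dna(1000, 0.461, seed=1)
    print(seq[:60])
    print(f"observed GC fraction: {gc_fraction(seq):.3f}")
```

Because every position is drawn independently, the observed GC fraction of a single sequence only approximates the requested value, and the deviation is larger for short sequences.
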
Several statistics and complexity estimates are provided: the Shannon entropy of the pseudorandomly generated sequence, an estimate of its Kolmogorov–Chaitin complexity obtained with Mathematica's Compress[] function (which applies the Deflate algorithm), the lossless compression ratio (also computed with Compress[]), and a histogram of the nucleotide distribution in the generated sequence. The RNA option simply replaces thymine (T) with uracil (U). The program generates sequences of up to 1 kbp (1,000 base pairs). You can select the full sequence even if not all of it is displayed in the Demonstration window.
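
These statistics can be approximated outside Mathematica. The Python sketch below is only an illustration under stated assumptions: it uses the standard zlib module (a Deflate implementation) as a stand-in for Compress[], so its byte counts will not match the Demonstration's values, and the function names are hypothetical. The compressed length is an upper-bound style estimate of Kolmogorov–Chaitin complexity, not the complexity itself.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol of the letter frequencies in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)  # also serves as the nucleotide histogram
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_length(seq):
    """Length in bytes of the zlib (Deflate) compressed sequence.

    Assumption: zlib stands in for Mathematica's Compress[].
    """
    return len(zlib.compress(seq.encode("ascii"), 9))


def compression_ratio(seq):
    """Compressed size divided by original size (one byte per base)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return compressed_length(seq) / len(seq)


def to_rna(seq):
    """Replace thymine (T) with uracil (U), as the RNA option does."""
    return seq.replace("T", "U")


if __name__ == "__main__":
    example = "ACGT" * 250  # a highly regular 1 kbp sequence
    print("histogram:", dict(Counter(example)))
    print("entropy (bits/symbol):", shannon_entropy(example))
    print("compression ratio:", compression_ratio(example))
    print("RNA:", to_rna(example[:12]))
```

The repeated example shows why both measures are reported: its letter frequencies are uniform, so its Shannon entropy is the maximum for a four-letter alphabet, yet it compresses very well because of its regular structure.
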


[1] J. Romiguier, V. Ranwez, E. J. P. Douzery, and N. Galtier, "Contrasting GC-Content Dynamics across 33 Mammalian Genomes: Relationship with Life-History Traits and Chromosome Sizes," Genome Research, 20(8), 2010 pp.1001–1009.