In the protein-coding regions of DNA molecules, genetic information is encoded in groups of three bases called triplets or codons. Every codon encodes the information for one amino acid, and every amino acid can be encoded by one or more codons. The genetic code is the biochemical system that establishes the rules by which the nucleotide sequence of a gene is transcribed into the mRNA codon sequence and the mRNA is then translated into the amino acid sequence of the corresponding protein. The genetic code is an extension of the four-letter alphabet found in DNA molecules. These "letters" are the DNA bases adenine, guanine, cytosine, and thymine, usually denoted A, G, C, and T, respectively (in RNA, T is changed to U, uracil). They are paired according to the following rule (Watson–Crick base pairings): G≡C, A=T. The base G is the complementary base of C, and A is the complementary base of T (or U) in the DNA (or RNA) molecule, and vice versa. The standard genetic code table assigns each of the 64 codons to an amino acid or a stop signal.

In 1968, Nobel Prize winner Francis Crick pointed out the principal partitions of the genetic code [2], showing that the 20 amino acids are not distributed at random among the 64 codons. He summarized the following rules:

• XYU and XYC always code the same amino acid.

• XYA and XYG often code the same amino acid. The rare amino acids, methionine and tryptophan, which have only one codon each, appear to be exceptions to this rule.

• In 8 out of 16 cases, the four codons that share the same first two bases (XYA, XYC, XYG, and XYU) all code the same amino acid.

• In most cases the codons representing a single amino acid start with the same pair of bases. Thus the two codons for histidine both start with CA. There are three exceptions to this: leucine has CUA, CUC, CUG, CUU, UUA, and UUG; serine has UCA, UCC, UCG, UCU, AGC, and AGU; arginine has CGA, CGC, CGG, CGU, AGA, and AGG.

• If the first two bases consist only of Gs and Cs, then the four codons sharing the same initial doublet all code the same amino acid. That is, the meaning of these codons is independent of the third base. This is in fact true for all codons having C in the second position.
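The rules listed above can be checked mechanically against the standard genetic code. The following is a minimal sketch, not part of the Demonstration: it writes the standard code as a 64-character string in the conventional UCAG ordering (with `*` for stop codons, which are treated here like any other meaning) and counts how often each rule holds. The names `CODE`, `DOUBLETS`, and `box` are illustrative choices.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated as UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest, third base fastest); "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DOUBLETS = ["".join(d) for d in product(BASES, repeat=2)]


def box(xy):
    """Return the set of meanings of the four codons that start with xy."""
    return {CODE[xy + z] for z in BASES}


# Rule 1: XYU and XYC always code the same amino acid.
uc_always_agree = all(CODE[xy + "U"] == CODE[xy + "C"] for xy in DOUBLETS)

# Rule 2: XYA and XYG often agree; collect the doublets where they do not.
ag_exceptions = [xy for xy in DOUBLETS if CODE[xy + "A"] != CODE[xy + "G"]]

# Rule 3: doublets whose four codons all have the same meaning.
family_boxes = [xy for xy in DOUBLETS if len(box(xy)) == 1]

# Rule 5: doublets made only of G and C, and doublets with C in second place.
gc_doublets = [xy for xy in DOUBLETS if set(xy) <= set("GC")]
second_c = [xy for xy in DOUBLETS if xy[1] == "C"]

print("XYU/XYC always agree:", uc_always_agree)
print("XYA/XYG disagree for:", ag_exceptions)
print("doublets fixing the amino acid:", len(family_boxes), "of", len(DOUBLETS))
print("GC-only doublets all fixed:", all(d in family_boxes for d in gc_doublets))
print("second-base-C doublets all fixed:", all(d in family_boxes for d in second_c))
```

The two doublets reported for the second rule are the ones containing the single codons of tryptophan and methionine, which matches the exceptions Crick names.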

If the Watson–Crick base pairings are expressed symbolically by a sum operation "+" such that G + C = D and A + U = D (i.e., the complementary RNA (or DNA) bases G and C, and A and U (or T), are algebraic complements), then this requirement leads us to define an additive group on the set of five RNA (DNA) bases {D, A, C, G, U} (see the "sum and times operation tables" in Snapshot 1). Explicitly, it was required that bases with the same number of hydrogen bonds in the DNA molecule and of different chemical types be algebraic inverses in the additive group defined on this set. This definition also reflects the non-specific pairings of the ancient hypothetical base(s) D, which is taken as the neutral element of the sum operation. Next, there is only one possible definition for the product operation ("·") (with

as the neutral element for this operation) in such a way that it completes a finite (Galois) field structure isomorphic to the field defined over the set of integers modulo 5,

(see "sum and times operation tables" in Snapshot 2). The sum and times operations over the sets

and

define two isomorphic finite fields that imply the bijections:

,

,

,

,

.
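To make the field structure concrete, here is a minimal sketch of base arithmetic carried out through integers modulo 5. The labeling in `TO_INT` is an assumption made for illustration only: it sends the hypothetical base D to 0 and gives complementary bases labels that sum to 0 modulo 5, but it is not necessarily the bijection used in the Demonstration. The base labeled 1 acts as the neutral element of the product, so which base plays that role depends on the labeling.

```python
# Hypothetical labeling (assumption): D -> 0, and complementary bases
# receive labels that sum to 0 modulo 5 (A + U = 1 + 4, C + G = 2 + 3).
TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
TO_BASE = {v: k for k, v in TO_INT.items()}
BASES = "DACGU"


def add(x, y):
    """Sum of two bases, computed as addition modulo 5 on their labels."""
    return TO_BASE[(TO_INT[x] + TO_INT[y]) % 5]


def mul(x, y):
    """Product of two bases, computed as multiplication modulo 5."""
    return TO_BASE[(TO_INT[x] * TO_INT[y]) % 5]


ZERO = TO_BASE[0]  # neutral element of the sum (D under this labeling)
ONE = TO_BASE[1]   # neutral element of the product under this labeling

# Print both operation tables, one row per left operand.
for name, op in (("+", add), ("*", mul)):
    print(name, " ".join(BASES))
    for x in BASES:
        print(x, " ".join(op(x, y) for y in BASES))
    print()

# Complementary bases are additive inverses of each other.
print("G + C =", add("G", "C"), "| A + U =", add("A", "U"))
```

Because 5 is prime, every base other than D has a multiplicative inverse under this arithmetic, which is what makes the structure a field rather than only a ring.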

The plausible ancient genetic code comprises the standard genetic code. In Snapshot 1, all triplets are highlighted because, in "sum and times operation tables", the "Ancestral Triplet Subset" was selected in member 1 and the "Standard Genetic Code" in member 2. The triplets that belong to the standard genetic code (also called codons) are yellow because yellow was chosen in "mcolor 1". Black triplets (extended triplets) correspond to the extinct plausible codons. According to Crick's rules, amino acids with similar physicochemical properties are encoded by extended codons belonging to the same vertical plane in the cube (the same affine subspace), and extended codons encoding the same or very similar amino acids are located on the same vertical line. For instance, codons XAZ, which encode hydrophilic amino acids, are found in the blue vertical plane (see Snapshot 1). The colors of the five principal vertical planes XDZ, XAZ, XCZ, XGZ, and XUZ can be changed in "vertical planes handling". Note that, biologically speaking, neither the ancient nor the present genetic code is a cube, but both can be represented, according to their biological and algebraic features, as three-dimensional cubes.

The principal partitions of the standard genetic code pointed out by Crick can be derived algebraically by applying the sum and times operations defined on the cube. The cube presented in this Demonstration can be seen as a building or a hotel whose rooms (placed at the cube's nodes) are filled according to biological and algebraic criteria. Thus, as Crick pointed out (see above), the base triplets are not placed at random in the cube but satisfy algebraic relationships. The empty hotel rooms are then filled starting from some occupied rooms and using the sum and times operations. In the vector space of the extended triplets, the vertical plane

forms a two-dimensional vector subspace, which is contained in the

plane (see Snapshot 2), and the rest of the vertical planes are affine subspaces. This is the reason why, for example, the plane

can be obtained from the plane

by adding to the latter any codon of plane

. In Snapshot 2, for instance, the sum

is derived by selecting "Vertical Plane XDZ" in member 1, selecting

in member 2, and clicking the sum checkbox of "sum and times operations". Codons belonging to the plane

are yellow and codons from the plane

are white. Likewise, it is possible to obtain the vertical plane XGZ from the plane XDZ by adding to the latter any of the codons that belong to plane XGZ, for instance, codon CGA. In Snapshot 3, the sum XDZ + CGA = XGZ is derived by selecting "Vertical Plane XDZ" in member 1 and CGA in member 2 of "sum and times operations". In particular, in Snapshot 4, we can see codon

.
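The relation between the vertical planes can be checked numerically. The sketch below rests on two assumptions that are not confirmed by the text: the sum of two triplets is taken coordinate by coordinate, as is natural in a vector space, and the bases are labeled D=0, A=1, C=2, G=3, U=4 (any labeling with D as 0 gives the same planes). It adds the codon CGA to every triplet of the plane XDZ, the two operands named for Snapshot 3, and compares the result with the plane whose middle base is G.

```python
from itertools import product

# Hypothetical labeling (assumption); only D -> 0 matters for this check.
TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
TO_BASE = {v: k for k, v in TO_INT.items()}
BASES = "DACGU"


def add_triplets(s, t):
    """Coordinate-wise sum of two triplets over the integers modulo 5."""
    return "".join(TO_BASE[(TO_INT[a] + TO_INT[b]) % 5] for a, b in zip(s, t))


def vertical_plane(middle):
    """All 25 extended triplets whose second base is `middle`."""
    return {x + middle + z for x, z in product(BASES, repeat=2)}


plane_XDZ = vertical_plane("D")
plane_XGZ = vertical_plane("G")

# Translate the subspace XDZ by the single codon CGA.
shifted = {add_triplets(t, "CGA") for t in plane_XDZ}
print("XDZ + CGA equals XGZ:", shifted == plane_XGZ)

# Any other triplet of XGZ produces the same translated plane.
same_for_all = all(
    {add_triplets(t, v) for t in plane_XDZ} == plane_XGZ for v in plane_XGZ
)
print("independent of the chosen codon:", same_for_all)
```

The second check illustrates why the choice of codon does not matter: two triplets of the same plane differ by an element of XDZ, so they translate the subspace onto the same coset.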

The cube edge inserted in the coordinate

axis (the vertical line

) is a one-dimensional vector subspace (see Snapshot 5). In the cube there are another 24 vertical lines of four segments each, which are affine subspaces (also called cosets), every one of which can be obtained from the vectorial line

. For instance, the vertical line

can be derived from the line

by adding to the latter any of the

codons with

. In Snapshot 5, we can see the sum

. In most cases, codons encoding the same or very similar amino acids (with similar physicochemical properties) belong to the same vertical line [1]. This can be easily observed using, for instance, "sum and times operations" and selecting any of the amino acids that appear in the popup menu of member 1 and "None" in member 2.
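The count of vertical lines can be reproduced with a short computation. This sketch assumes that the vertical direction corresponds to the third base position, an assumption suggested by the fact that synonymous codons usually share their first two bases, and it reuses a hypothetical labeling D=0, A=1, C=2, G=3, U=4 with coordinate-wise addition. The name `line_DDZ` for the line through the origin is likewise an illustrative choice.

```python
from itertools import product

# Hypothetical labeling (assumption); only D -> 0 matters for this check.
TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
TO_BASE = {v: k for k, v in TO_INT.items()}
BASES = "DACGU"


def add_triplets(s, t):
    """Coordinate-wise sum of two triplets over the integers modulo 5."""
    return "".join(TO_BASE[(TO_INT[a] + TO_INT[b]) % 5] for a, b in zip(s, t))


ALL_TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

# Line through the origin DDD along the (assumed) vertical direction.
line_DDZ = ["DD" + z for z in BASES]


def coset(t):
    """Vertical line through triplet t, obtained by translating line_DDZ."""
    return frozenset(add_triplets(t, v) for v in line_DDZ)


lines = {coset(t) for t in ALL_TRIPLETS}
print("number of vertical lines:", len(lines))
print("nodes per line:", sorted({len(line) for line in lines}))

# Four of the leucine codons listed above lie on one vertical line.
leucine_box = {"CUA", "CUC", "CUG", "CUU"}
print("CUA, CUC, CUG, CUU on one line:", leucine_box <= coset("CUA"))
```

Each line has five nodes, hence four segments, and the lines partition the 125 extended triplets, so one of them is the subspace itself and the remaining ones are its proper cosets.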

The product operation between two triplets

and

is defined by

, according to the rules:

,

for all

,

in

,

,

,

,

,

, and

.

The subset of all codons

(the subset of codons that form the standard genetic code) is closed under products; that is, the standard genetic code with this product operation is a multiplicative group. In other words, the product of any two codons is always a codon and never an extended triplet. For this reason, in Snapshot 6, the products of the codon that encodes the amino acid Trp (UGG, yellow) and the codons that encode the amino acid Ala (green) are, according to the standard genetic code table, the codons that encode the amino acids Glu (GAA and GAG) and Asp (GAC and GAU) (white).

The subset of codons

(inserted in the vertical plane

) is a subgroup of the multiplicative group defined over the standard genetic code. This means that any of the codon subsets:

,

, and

(with

) inserted, respectively, in the vertical planes

,

, and

, can be obtained from the subset

. In simpler terms, for instance, in Snapshot 7, the subset of codons

(white) can be obtained by selecting the "Code Plane XAZ" (white, this is a plane of the standard genetic code) in the popup menu of "member 1" and the amino acid Gly (glycine) in "member 2" of "sum and times operations" and by clicking the checkbox "times". Note that the same result can be obtained just by using one of the codons that encode amino acid Gly or any of the codons from the subset

.

In Snapshot 8, the "Vertical Plane XDZ" (yellow triplets) and the "GC Random Sample" (white codons) have been selected in the popup menus of "member 1" and "member 2", respectively. Every time "GC Random Sample" is selected (it must first be deselected) or the corresponding triplet color is changed, a new sample of four codons is drawn. Next, in Snapshot 9, the whole ancient genetic code is obtained algebraically by keeping the triplet subsets selected in Snapshot 8 and clicking the sum checkbox of "sum and times operations". The result is the same no matter which codons are drawn with "GC Random Sample". Likewise, in Snapshot 10, the standard genetic code is derived algebraically by choosing "Code Plane XAZ" in the popup menu of "member 1" and "GC Random Sample" in "member 2", and clicking the times checkbox of "sum and times operations".

By means of this Demonstration the biological and abstract algebraic features of the genetic code described in [1] are visualized. In general, users can operate over different subsets of codons and derive new subsets by clicking the sum or times checkboxes of "sum and times operations". Mathematically speaking, this means that if the genetic code evolves so as to minimize the transcription and replication errors, then both the ancient and the present standard genetic codes are mathematically determined.

[1] R. Sánchez and R. Grau, "An Algebraic Hypothesis about the Primeval Genetic Code Architecture," *Mathematical Biosciences* **221**(1), 2009 pp. 60–76.

[2] F. H. C. Crick, "The Origin of the Genetic Code," *Journal of Molecular Biology* **38**(3), 1968 pp. 367–379.