One of the most fascinating topics in biology is the study of genetics and how traits are passed down through generations.
What are Genetic Codons? – Composition, Codon Usage Pattern, Protein Properties
A sequence of three consecutive nucleotides on the mRNA that directs the incorporation of a specific amino acid into the polypeptide chain is called a codon (code word); the set of all codons and their meanings is the genetic code. The codon is therefore the symbolic unit of the information stored in a gene, each codon standing for a particular amino acid (or, in three cases, for a stop signal).
The information of a gene is stored in DNA, and that information is ultimately translated into a polypeptide chain. The specific order of amino acids in the polypeptide is made possible by the genetic code: the information in DNA is first transcribed into mRNA, and the mRNA is then translated codon by codon. The genetic code is thus the way in which the nucleotide sequence of a nucleic acid specifies the amino acid sequence of a protein. It is a triplet code, in which each codon (a group of three nucleotides) specifies a particular amino acid.
The mRNA carries a sequence of codons that directs the synthesis of a polypeptide. Each codon codes for a particular amino acid, so the codon sequence of the mRNA determines the amino acid sequence of the polypeptide chain. An mRNA molecule is a linear chain of nucleotides in which four types of nucleotides, designated A, G, C and U, occur in a definite sequence, and every three consecutive nucleotides in this polynucleotide constitute one codon.
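As a minimal sketch of what reading an mRNA as consecutive triplets means, the hypothetical helper `split_into_codons` below cuts a 5′→3′ base sequence into codons starting from the first base. It assumes the reading frame begins at the first base and simply ignores any leftover bases at the 3′ end.

```python
def split_into_codons(mrna: str) -> list[str]:
    """Cut an mRNA sequence (written 5'->3') into consecutive triplets.

    Reading starts at the first base; any 1-2 leftover bases at the
    3' end that do not make a full codon are ignored.
    """
    mrna = mrna.upper()
    usable = len(mrna) - len(mrna) % 3  # length that divides evenly into codons
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


print(split_into_codons("AUGUUUAAAUAG"))  # ['AUG', 'UUU', 'AAA', 'UAG']
```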
Nature and Properties of Codon
1. Codon is Triplet in Nature:
This means that three nucleotides together constitute one codon. mRNA contains four types of nucleotides, carrying the nitrogenous bases A, G, C and U. If any three of these nucleotides (repeats allowed) form one codon, 4³ = 64 codon combinations are possible. Indeed, 64 codons are found in nature: 61 of them are sense codons that specify amino acids, and the remaining three do not specify any amino acid and are therefore called nonsense codons. The code word dictionary in the following table shows the different codons and their meanings. Francis Crick and co-workers confirmed the triplet nature of the genetic code in 1961.
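The count of 4³ = 64 can be checked by enumerating every ordered combination of the four bases. The sketch below (the function name `possible_words` is illustrative) also prints the counts for one- and two-base words, 4 and 16, both too few to give each of the 20 amino acids its own code word, which is the usual argument for why the code must be at least a triplet.

```python
from itertools import product

BASES = "ACGU"  # the four bases found in mRNA


def possible_words(length: int) -> list[str]:
    """All ordered base combinations of the given length (repeats allowed)."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


for length in (1, 2, 3):
    print(length, len(possible_words(length)))
# 1 4
# 2 16
# 3 64
```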
Code Word Dictionary:
Deciphering the genetic code was one of the most important scientific achievements in molecular biology, and artificial mRNAs were used to do it. In 1961, Nirenberg and Matthaei found that synthetic RNAs of known base sequence, such as poly-U, poly-A and poly-C, directed the synthesis of specific polypeptides: poly-U produced polyphenylalanine, so the triplet UUU codes for phenylalanine; poly-A (AAA) was found to code for lysine, and poly-C (CCC) for proline. Nirenberg thus cracked the first codon of the genetic code in 1961. In 1965, H. G. Khorana succeeded in synthesizing mRNAs of known sequence.
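The code word dictionary can be represented as a lookup table. The sketch below is an illustration rather than the original table: it builds the standard genetic code in Python from a compact 64-letter string (the ordering used by NCBI's translation table 1, first base varying slowest) and then translates homopolymer messages like those used by Nirenberg and Matthaei. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one amino-acid letter per codon, codons ordered
# UUU, UUC, UUA, UUG, UCU, ... (first base slowest, third base fastest).
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
# Code word dictionary: codon -> three-letter amino acid (or "Stop")
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}


def translate(mrna: str) -> list[str]:
    """Translate codon by codon from the first base (no start/stop handling)."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


# Homopolymer messages like those used by Nirenberg and Matthaei
for name, message in [("poly-U", "U" * 9), ("poly-A", "A" * 9), ("poly-C", "C" * 9)]:
    print(name, translate(message))
# poly-U ['Phe', 'Phe', 'Phe']
# poly-A ['Lys', 'Lys', 'Lys']
# poly-C ['Pro', 'Pro', 'Pro']

sense = [c for c, aa in CODON_TABLE.items() if aa != "Stop"]
stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "Stop")
print(len(sense), stops)  # 61 ['UAA', 'UAG', 'UGA']
```

The same table also confirms the 61 sense and 3 nonsense codons described under the triplet property.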
2. Non-overlapping Codon:
A code would be overlapping if adjacent codons shared one or more bases. If the triplet code were overlapping, two situations would be possible: codons overlapping by one base or codons overlapping by two bases. In reality, codons are non-overlapping: each triplet codon is distinct, with its own individuality and identity, and no base is shared between adjacent codons.
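Orientation of non-overlapping codons: the nucleic acid sequence GCACAGGCA… is read as successive codons GCA (1, X), CAG (2, Y) and GCA (3, Z), with no base shared between neighbours.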
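To contrast the non-overlapping reading with the two hypothetical overlapping alternatives, the sketch below reads the sequence GCACAGGCA from the text with zero, one or two bases shared between neighbouring triplets. The function name `read_codons` and its `overlap` parameter are illustrative only.

```python
def read_codons(seq: str, overlap: int = 0) -> list[str]:
    """Read triplets from seq, with `overlap` bases shared between neighbours.

    overlap=0 -> non-overlapping (step 3), the real situation
    overlap=1 -> one-base overlap (step 2), hypothetical
    overlap=2 -> two-base overlap (step 1), hypothetical
    """
    step = 3 - overlap
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


seq = "GCACAGGCA"
for overlap in (0, 1, 2):
    print(overlap, read_codons(seq, overlap))
# 0 ['GCA', 'CAG', 'GCA']
# 1 ['GCA', 'ACA', 'AGG', 'GCA']
# 2 ['GCA', 'CAC', 'ACA', 'CAG', 'AGG', 'GGC', 'GCA']
```

An overlapping code would pack more codons into the same bases, but each codon would then constrain its neighbours, which is the nearest-neighbour relationship discussed next.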
If codons were overlapping, there would be a nearest-neighbour relationship between the amino acids incorporated into a polypeptide chain, because adjacent codons would share bases. From an analysis of amino acid sequences in polypeptide chains, Brenner (1957) concluded that neighbouring amino acids are coded by unrelated groups of nucleotides: amino acid sequences appear to be random, with no nearest-neighbour relationship. Studies of single-site mutations also support the non-overlapping nature of codons, because a base-substitution mutation replaces at most a single amino acid.
Studies of normal haemoglobin (HbA) and sickle-cell haemoglobin (HbS) showed that a single base substitution changes the mRNA codon GAA to GUA, which replaces glutamic acid with valine at a particular site of the β-polypeptide of haemoglobin. If adjacent codons shared bases, the substitution would have affected more than one codon, replacing more than one amino acid. A similar single-amino-acid substitution is observed in haemoglobin C (HbC), where the same glutamic acid is replaced by lysine.
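The argument can be sketched numerically. The example below applies the single-base change GAA → GUA from the text, and models the HbC change as G → A at the first base (an assumption consistent with glutamic acid being replaced by lysine). It then counts how many triplets a substitution at one position would touch in a non-overlapping code versus a hypothetical two-base-overlap code. The helper names and the 30-base message length are illustrative.

```python
# Only the three codons needed for this example (standard code)
CODON_MEANING = {"GAA": "Glu", "GUA": "Val", "AAA": "Lys"}


def substitute(seq: str, position: int, new_base: str) -> str:
    """Return seq with the base at `position` (0-based) replaced."""
    return seq[:position] + new_base + seq[position + 1:]


def codons_touched(position: int, seq_len: int, overlap: int) -> int:
    """How many triplets contain `position` when neighbours share `overlap` bases."""
    step = 3 - overlap
    starts = range(0, seq_len - 2, step)
    return sum(1 for start in starts if start <= position < start + 3)


normal = "GAA"
hbs = substitute(normal, 1, "U")  # middle base A -> U
hbc = substitute(normal, 0, "A")  # first base G -> A (assumed model of HbC)
print(CODON_MEANING[normal], "->", CODON_MEANING[hbs])  # Glu -> Val
print(CODON_MEANING[normal], "->", CODON_MEANING[hbc])  # Glu -> Lys

# A single substitution at base 13 of a 30-base message
print(codons_touched(13, 30, overlap=0))  # 1 (non-overlapping)
print(codons_touched(13, 30, overlap=2))  # 3 (two-base overlap)
```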
The amino acid analysis by Yanofsky et al. (1967) of different site-specific mutations of tryptophan synthetase A convincingly supports the non-overlapping nature of the codon. However, some exceptions are found in nature. For example, Barrell and co-workers (1976) and Sanger (1977) observed overlapping genes in bacteriophage φX174. This phage contains nine genes, A, B, C, D, E, J, F, G and H, located in its single-stranded circular DNA of 5386 nucleotides. Given the triplet nature of the codon, this DNA should code for a total of about 1800 amino acids in the proteins formed by the virus. In fact, the proteins produced by this virus contain more amino acids, because the B gene (360 nucleotides) lies completely within the A gene (1536 nucleotides), and the E gene (270 nucleotides) lies within the D gene (456 nucleotides).
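The arithmetic behind the φX174 argument, and one way the same nucleotides can encode two different polypeptides (by being read in two different frames), can be sketched as follows. The short sequence `toy` and the helper `read_frame` are hypothetical and are not taken from the φX174 genome.

```python
GENOME_LENGTH = 5386  # nucleotides in phiX174 DNA, as given in the text

# If every base were read once in a single frame, the upper limit would be:
print(GENOME_LENGTH // 3)  # 1795 codons, i.e. roughly 1800 amino acids


def read_frame(seq: str, offset: int) -> list[str]:
    """Codons read from `offset` (0, 1 or 2) in steps of three."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


toy = "AUGGCAUGCCGAUAA"  # hypothetical sequence, not from phiX174
print(read_frame(toy, 0))  # ['AUG', 'GCA', 'UGC', 'CGA', 'UAA']
print(read_frame(toy, 1))  # ['UGG', 'CAU', 'GCC', 'GAU']
```

Shifting the starting point by one base yields an entirely different set of codons from the same stretch of nucleotides.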
3. Commaless Codon:
This property states that codons are arranged contiguously on the mRNA and that successive codons are not separated by any punctuation. The commaless nature of the code can be argued by simple logic: if one of the four bases acted as a comma, only three bases would remain to form codons, and three bases can form only 3³ = 27 triplet codons, which contradicts the 64 codons actually observed.
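The counting argument can be checked directly. The sketch below (the helper name `triplet_count` is illustrative) reserves each base in turn as a hypothetical comma and counts the triplets that the remaining three bases could form.

```python
from itertools import product

ALL_BASES = "ACGU"


def triplet_count(alphabet: str) -> int:
    """Number of distinct three-letter words over the given alphabet."""
    return sum(1 for _ in product(alphabet, repeat=3))


print("no comma:", triplet_count(ALL_BASES))  # 64
for comma in ALL_BASES:
    remaining = ALL_BASES.replace(comma, "")
    # e.g. if A were reserved as a comma, codons could use only C, G and U
    print(f"{comma} as comma:", triplet_count(remaining))  # 27 each time
```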
The generation of frameshift mutations with acridine dyes in phage T4 by Crick et al. (1961), and the recovery of revertants of these rII mutants after treatment with proflavin, support the commaless code. Sanger et al. (1969) determined the nucleotide sequence of a segment of RNA from bacteriophage R17 corresponding to amino acids 81 to 99 of the viral coat protein, and these 19 amino acids were found to be encoded by consecutive triplets with no intervening bases. When Khorana and colleagues used a repeating CU sequence as messenger RNA, leucine (codon CUC) and serine (codon UCU) were incorporated in alternating fashion into the polypeptide. They also observed that when only one of these two amino acids was present, the polypeptide could not be completed. All these observations support the commaless nature of the codon.
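The sketch below illustrates why a commaless triplet code behaves this way. Using the standard code (rebuilt here as a hypothetical `CODON_TABLE` so the block runs on its own), it translates a repeating CU message and then a hypothetical 21-base message before and after a one-base deletion, and after a compensating one-base insertion further downstream. The specific message and mutation positions are invented for illustration and do not come from the T4 experiments.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}


def translate(mrna: str) -> list[str]:
    """Translate codon by codon from the first base, with no punctuation."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


# Repeating CU message: CUC UCU CUC UCU ...
print(translate("CU" * 9))  # ['Leu', 'Ser', 'Leu', 'Ser', 'Leu', 'Ser']

# Hypothetical message used to illustrate frameshifts
original = "AUGGCUUGGCAUAAAGACUGU"
minus_one = original[:4] + original[5:]          # delete the base at index 4
restored = minus_one[:9] + "C" + minus_one[9:]   # insert one base further on

print(translate(original))   # ['Met', 'Ala', 'Trp', 'His', 'Lys', 'Asp', 'Cys']
print(translate(minus_one))  # ['Met', 'Val', 'Gly', 'Ile', 'Lys', 'Thr']
print(translate(restored))   # ['Met', 'Val', 'Gly', 'His', 'Lys', 'Asp', 'Cys']
```

Because nothing marks codon boundaries, the deletion garbles every codon after it, and the compensating insertion restores the reading frame so that only the codons between the two changes remain altered.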
4. Degeneracy and Non-ambiguity:
Degeneracy, in which one amino acid is specified by more than one codon, is one of the most striking features of the genetic code. Because several codons can specify one particular amino acid, such codons are considered synonymous. Although the code is degenerate, it is non-ambiguous: a codon that specifies a particular amino acid never specifies a second amino acid. Only two amino acids, methionine and tryptophan, are each specified by a single codon; all other amino acids show some level of degeneracy, with two, three, four or six codons per amino acid, as indicated in the following table.
Level of Degeneracy of the Genetic Codons:
|Number of Codons||Amino Acids||Total Codon Number|
|6||Leu, Ser, Arg||18|
|4||Gly, Pro, Ala, Val, Thr||20|
|3||Ile||3|
|2||Phe, Tyr, Cys, His, Gln, Glu, Asn, Asp, Lys||18|
|1||Met, Trp||2|
Degeneracy may be classified into two types: partial degeneracy and complete degeneracy. Partial degeneracy occurs when the codons sharing the same first two bases are divided between two amino acids, which are distinguished by the base at the third position. For example, the codons for histidine and glutamine share C and A at the first two positions but differ in their third base (CAU and CAC for histidine, CAA and CAG for glutamine). In complete degeneracy, by contrast, all four codons that differ only in the third base specify the same amino acid, as with the serine codons UCU, UCC, UCA and UCG.
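Both the level of degeneracy and the partial/complete distinction can be read off the standard code. The sketch below rebuilds the standard code as a hypothetical `CODON_TABLE`, groups amino acids by their number of codons, and then examines each block of four codons that share their first two bases. Note that its "split" group is slightly broader than partial degeneracy in the strict sense above, because it also includes blocks involving stop codons (UA, UG) and the Ile/Met block (AU).

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}

# Level of degeneracy: how many codons each amino acid has
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "Stop")
by_level = defaultdict(list)
for aa, count in codons_per_aa.items():
    by_level[count].append(aa)
for count in sorted(by_level, reverse=True):
    names = sorted(by_level[count])
    print(count, ", ".join(names), count * len(names))

# Complete vs split blocks of four codons sharing the first two bases
complete, split = [], []
for first, second in product(BASES, repeat=2):
    prefix = first + second
    meanings = {CODON_TABLE[prefix + third] for third in BASES}
    if len(meanings) == 1:
        complete.append(prefix)  # e.g. UC: Ser, Ser, Ser, Ser
    else:
        split.append(prefix)     # e.g. CA: His, His, Gln, Gln
print("complete:", complete)
print("split:", split)
```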
The existence of degenerate codons is supported by the existence of isoacceptor tRNAs, that is, tRNAs that differ structurally but carry the same amino acid. Isoacceptor tRNAs have different anticodon sequences, which pair with different codons in the mRNA. For example, there are two such tRNAs for leucine, designated tRNA₁ᴸᵉᵘ (anticodon GAC) and tRNA₂ᴸᵉᵘ (anticodon GAG).
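A short check shows how two different anticodons can both deliver leucine. The sketch assumes that the anticodons GAC and GAG are written 3′→5′, the same convention used in the codon-anticodon pairing table of this section; under that assumption they pair with CUG and CUC, both leucine codons in the standard code. The helper name `codon_paired_with` is hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
LEUCINE_CODONS = {"UUA", "UUG", "CUU", "CUC", "CUA", "CUG"}  # standard code


def codon_paired_with(anticodon_3_to_5: str) -> str:
    """Codon (5'->3') that pairs with an anticodon written 3'->5'.

    Written this way the two strands line up base by base, so each
    anticodon base is simply replaced by its Watson-Crick partner.
    """
    return "".join(PAIR[base] for base in anticodon_3_to_5)


for trna, anticodon in [("tRNA1-Leu", "GAC"), ("tRNA2-Leu", "GAG")]:
    codon = codon_paired_with(anticodon)
    print(trna, anticodon, "->", codon, codon in LEUCINE_CODONS)
# tRNA1-Leu GAC -> CUG True
# tRNA2-Leu GAG -> CUC True
```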
Significance of Degeneracy:
Degeneracy confers several biological advantages on living organisms, including the following:
- Microorganisms that vary widely in the base composition of their DNA can still synthesize the same complement of enzymes and other proteins.
- Degeneracy significantly reduces the lethality of mutations. Because synonymous codons often differ in only one base, many base substitutions do not change the encoded amino acid and therefore do not alter protein structure. Degeneracy thus helps conserve protein sequences in living organisms.
- As degeneracy helps conserve protein structure, it contributes favourably to genetic stability.
- Degeneracy confers some plasticity on the genetic makeup of living organisms and hence contributes to the molecular adaptation of a species in a changing environment with a variety of stresses.
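The buffering effect of degeneracy can be estimated roughly from the standard code. The sketch below (with a hypothetical `CODON_TABLE` rebuilt so it runs on its own) considers every possible single-base substitution in the 61 sense codons and reports, for each codon position, the fraction that leaves the amino acid unchanged. It is a simplification that treats all substitutions as equally likely.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}

synonymous = {1: 0, 2: 0, 3: 0}  # substitutions that keep the amino acid
total = {1: 0, 2: 0, 3: 0}       # all substitutions tried at each position
for codon, aa in CODON_TABLE.items():
    if aa == "Stop":
        continue  # only sense codons are mutated
    for pos in (1, 2, 3):
        for base in BASES:
            if base == codon[pos - 1]:
                continue
            mutant = codon[:pos - 1] + base + codon[pos:]
            total[pos] += 1
            if CODON_TABLE[mutant] == aa:
                synonymous[pos] += 1

fractions = {pos: synonymous[pos] / total[pos] for pos in (1, 2, 3)}
for pos, frac in fractions.items():
    print(f"position {pos}: {frac:.2f} of substitutions are synonymous")
```

Most of the protection comes from the third position, which matches the observation that synonymous codons usually differ only in their third base.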
5. Polarity of Codon:
A codon is part of an mRNA, which is read from its 5′ end to its 3′ end; hence a codon has the same polarity as the mRNA. During translation, each codon is also read from the 5′ end to the 3′ end, so a codon has two poles, a 5′ end and a 3′ end, and its three bases are designated the 1st, 2nd and 3rd letters from the 5′ end to the 3′ end. The polarities of the mRNA, the codons on it and the synthesized polypeptide therefore correspond as follows: mRNA 5′ → 3′, codon 1st → 3rd letter, polypeptide amino (N) terminus → carboxyl (C) terminus.
The meaning of a code word depends on its being read in the 5′→3′ direction; read 3′→5′, its meaning changes. For example, 5′ GAU 3′ means aspartic acid, but if the same codon is read in reverse, from the 3′ end to the 5′ end, it becomes UAG, a stop (nonsense) codon. The polarity of the code word is therefore important for its recognition, and it is also taken into account in codon-anticodon pairing. The anticodon of a tRNA pairs with the codon in an antiparallel manner: the 3′ base of the anticodon pairs with the 5′ base of the codon, and the 5′ base of the anticodon pairs with the 3′ base of the codon. The three anticodon bases are nevertheless also read in the 5′→3′ direction, so the 5′ base of the anticodon is its 1st letter and the 3′ base its 3rd letter, just as for codon letters.
|The pattern of codon-anticodon pairing|
|Codon (mRNA)||5′ AUG 3′|
|Anticodon (tRNA)||3′ UAC 5′|
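Antiparallel pairing can be sketched in a few lines: reverse the codon, then replace each base by its Watson-Crick partner to get the anticodon written 5′→3′. The helper `anticodon_for` and the two-entry `MEANING` dictionary are illustrative; the example reproduces the AUG/UAC pair in the table and the GAU/UAG reversal described above.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Two meanings from the standard code, used only for this illustration
MEANING = {"GAU": "Asp", "UAG": "Stop"}


def anticodon_for(codon_5_to_3: str) -> str:
    """Anticodon written 5'->3' for a codon written 5'->3' (antiparallel pairing).

    The codon's 3' base pairs with the anticodon's 5' (first) base, so the
    codon is reversed before each base is replaced by its partner.
    """
    return "".join(PAIR[base] for base in reversed(codon_5_to_3))


anti = anticodon_for("AUG")
print("5'", anti, "3'")        # 5' CAU 3'
print("3'", anti[::-1], "5'")  # 3' UAC 5', the form shown in the table

# Reading the same three bases in the wrong direction changes the meaning
codon = "GAU"
print(MEANING[codon], MEANING[codon[::-1]])  # Asp Stop
```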
The Wobble Hypothesis
The wobble hypothesis is concerned with codon-anticodon pairing. Studies of codon degeneracy show that, in most cases, the synonymous code words for a given amino acid differ only in the third base (the base at the 3′ end), so such codons can be represented as XY(G/A) or XY(C/U). On this basis, the specificity of a codon depends mainly on its first two bases. Because a codon is read by an aminoacyl-tRNA whose anticodon pairs with it on the mRNA, the relationship between codons and tRNAs can be expressed in the following points.
1. The first two bases of the code word confer the specificity of a codon, and they pair with the corresponding bases of the anticodon according to the Watson-Crick base-pairing rules.
2. Codons for the same amino acid that differ in either of the first two bases require different tRNAs for their reading.
3. The first base of the anticodon (the base at its 5′ end) in many cases pairs loosely with the corresponding base of the codon (the base at its 3′ end). This pairing usually follows these rules:
- if the 1st base of the anticodon is C or A, the tRNA may pair with only one codon,
- if the 1st base of the anticodon is U or G, the tRNA can pair alternatively with two code words,
- but if the 1st base of the anticodon is I (inosine) or another modified base, the tRNA can pair with three codons.
When the tRNA pairs with more than one codon, the first two bases of the codon show firm pairing with the corresponding bases of the anticodon, and wobble pairing occurs between the 3rd base of the codon and the 1st base of the anticodon.
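The pairing rules above can be written as a small lookup. In the sketch below, anticodons are written 5′→3′, the first two codon bases are fixed by strict Watson-Crick pairing, and the anticodon's first base "wobbles" against the codon's third base. Among modified bases only inosine (I) is modelled; the names `WOBBLE` and `codons_read_by` are hypothetical.

```python
# Watson-Crick partners, used for codon positions 1 and 2
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Codon 3rd bases that can pair with the anticodon's 1st (5') base,
# following the rules listed above; I = inosine
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"U", "C", "A"},
}


def codons_read_by(anticodon_5_to_3: str) -> set[str]:
    """Codons (5'->3') that a tRNA with this anticodon (5'->3') can read."""
    first, second, third = anticodon_5_to_3
    # anticodon 3rd base pairs firmly with codon 1st base, 2nd with 2nd
    fixed = WATSON_CRICK[third] + WATSON_CRICK[second]
    # anticodon 1st base wobbles against codon 3rd base
    return {fixed + base for base in WOBBLE[first]}


print(sorted(codons_read_by("CAU")))  # ['AUG']
print(sorted(codons_read_by("GAA")))  # ['UUC', 'UUU']
print(sorted(codons_read_by("IAU")))  # ['AUA', 'AUC', 'AUU']
```

With a C in the wobble position the tRNA reads one codon, with G it reads two, and with inosine it reads three, matching the rules listed above.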
Codon-anticodon base pairing according to the wobble hypothesis:
|Anticodon Sequence||Codon Sequence Preferred for Pairing (5′-3′)||3rd Base Pairing||Pairing Nature|
|XYU||YXA or YXG||U–A or U–G||Strong or wobble|
4. According to the wobble hypothesis, a minimum of 32 tRNA molecules is needed to read all 61 sense codons in prokaryotes.
Start & Stop Codon:
In most organisms, AUG acts as the start codon, and it codes for the amino acid methionine. Three codons, UAA (ochre), UAG (amber) and UGA (opal), act as stop or nonsense codons.
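The roles of the start and stop codons can be sketched as a simple reading loop. The example below builds the standard code as a hypothetical `CODON_TABLE`, starts translating at the first AUG, and stops at the first UAA, UAG or UGA. It is a simplification: real initiation also depends on signals around the start codon, which are not modelled here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}


def translate_from_start(mrna: str) -> list[str]:
    """Translate from the first AUG until the first stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break  # UAA, UAG or UGA ends the chain
        protein.append(amino_acid)
    return protein


print(translate_from_start("GGAUGUUUAAAUAGCC"))  # ['Met', 'Phe', 'Lys']
```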
Nearly Universal Nature of Codon
A notable feature of the genetic code is that, apart from a few exceptions, it is universal; because of those exceptions it is described as largely or nearly universal. Because the genetic material, DNA, is chemically similar in all living organisms, the same codons specify the same amino acids in all types of organisms. For example, AUG specifies methionine in all organisms, from viruses to humans. Had no exceptions been found, the code could safely be called universal. The nearly universal nature of the code is supported by several lines of evidence.
During infection, a virus introduces only its genetic material into the host cell, and that genetic material uses the host cell's resources to produce viral proteins. This means that the genetic language of the virus can be read by the host, whether the host is a prokaryote or a eukaryote. In laboratory experiments, the protein-synthesizing machinery of one species can be used to translate mRNA from another species; for instance, rabbit reticulocyte mRNA was translated in vitro using charged tRNA from E. coli. In addition, the great uniformity of the amino acid sequences of homologous proteins from widely divergent species supports the universality of the code: cytochrome c from humans, horses, chickens, tuna, yeast and several bacteria shows significant similarity in amino acid sequence despite the diversity of these organisms.
In some situations, exceptions to the universality of the genetic code are prominent. Barrell et al. (1979) reported that in vertebrate mitochondria AUA specifies methionine and UGA specifies tryptophan, whereas in the standard code they specify isoleucine and chain termination, respectively. Exceptions have also been found in several microorganisms. Yamao et al. (1985) showed that in Mycoplasma capricolum UGA codes for tryptophan, and in the ciliated protozoan Tetrahymena two of the standard stop codons, UAA and UAG, code for glutamine (Kuchino et al., 1985).
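These exceptions can be modelled as small overrides on the standard code. The sketch below builds the standard code as a hypothetical `CODON_TABLE` and applies only the reassignments named in this section; other known differences in these organisms' codes are deliberately left out, so the variant tables are partial illustrations, not complete genetic codes. The names `REASSIGNMENTS` and `variant_code` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), AA_LETTERS)
}

# Only the reassignments named in the text; other differences are not modelled
REASSIGNMENTS = {
    "vertebrate mitochondria": {"AUA": "Met", "UGA": "Trp"},
    "Mycoplasma capricolum": {"UGA": "Trp"},
    "Tetrahymena": {"UAA": "Gln", "UAG": "Gln"},
}


def variant_code(name: str) -> dict[str, str]:
    """Copy of the standard table with the named reassignments applied."""
    table = dict(CODON_TABLE)
    table.update(REASSIGNMENTS[name])
    return table


for name in REASSIGNMENTS:
    table = variant_code(name)
    changes = {c: (CODON_TABLE[c], table[c]) for c in table if table[c] != CODON_TABLE[c]}
    print(name, changes)
```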