
The Mechanism of Translation

Translation involves three steps:

1. Initiation
2. Elongation
3. Termination

Initiation

Translation begins with the binding of the small ribosomal subunit to a specific sequence on the
mRNA chain. The small subunit binds via complementary base pairing between its rRNA
component and the ribosome binding site, a sequence of about ten nucleotides on the mRNA
located between 5 and 11 nucleotides upstream of the initiating codon, AUG.

Figure: Initiation

Once the small subunit has bound, a special initiator tRNA carrying N-formylmethionine
(fMet-tRNA) recognizes and binds to the initiator codon. Next, the large subunit binds, forming
what is known as the initiation complex. With the formation of the initiation complex, the
fMet-tRNA occupies the P site of the ribosome and the A site is left empty. This entire initiation
process is facilitated by extra proteins, called initiation factors, that help with the binding of
ribosomal subunits and tRNA to the mRNA chain.

Elongation

With the formation of the complex containing fMet-tRNA in the peptidyl (P) site, an aminoacyl tRNA
with the complementary anticodon sequence can bind to the mRNA passing through the acceptor
(A) site. This binding is aided by elongation factors that depend upon the energy from the
hydrolysis of GTP. After hydrolysis, the elongation factors go through a cycle in which the bound
GDP is exchanged for GTP.

Now, with tRNA bearing a chain of amino acids in the P site and tRNA containing a single amino
acid in the A site, the addition of a link to the chain can be made. This addition occurs through the
formation of a peptide bond, the nitrogen-carbon bond that forms between amino acid subunits to
form a polypeptide chain. Formation of this bond is catalyzed by the enzyme peptidyl transferase.

Figure: Peptide Formation

The peptide bond forms between the carboxyl group on the lowest link in the peptide chain
located at the P site and the amine group on the amino acid in the A site. As a result, the
peptide chain shifts over to the A site, with the original amino acid on the A site as the lowest link
in the chain. The tRNA in the A site becomes peptidyl-tRNA, and shifts over to the P site.
Meanwhile, the ribosome engages in a process called translocation: spurred by elongation
factors, the ribosome moves three nucleotides in the 3' direction along the mRNA. In other
words, the ribosome moves so that a new mRNA codon is accessible in the A site.
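
The cycle described above (aminoacyl-tRNA entering the A site, peptide bond formation, translocation by three nucleotides) can be sketched as a simple loop. This is a minimal illustration, not a model of the ribosome: the names `CODON_TABLE` and `elongate` are hypothetical, the standard genetic code is assumed, the reading frame is taken from the first AUG, and the initiator residue is shown as a plain `M`.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def elongate(mrna):
    """Walk along the mRNA 5'->3', one codon (three nucleotides) per cycle."""
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no initiator codon found")
    peptide = ["M"]          # initiator tRNA occupies the P site, A site is empty
    pos = start + 3          # first codon exposed in the A site
    while pos + 3 <= len(mrna):
        residue = CODON_TABLE[mrna[pos:pos + 3]]  # aminoacyl-tRNA enters the A site
        if residue == "*":   # termination codon: stop elongating
            break
        peptide.append(residue)  # peptide bond formation extends the chain
        pos += 3                 # translocation: move three nucleotides toward the 3' end
    return "".join(peptide)


print(elongate("GGAUGUUUGGCUAAUU"))  # MFG
```

The chain grows at its C-terminal end because each new residue is appended after the ones already joined, which mirrors the 5' to 3' reading direction of the mRNA.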

Introduction
Translation is the RNA-directed synthesis of polypeptides. This process requires all three
classes of RNA. Although the chemistry of peptide bond formation is relatively simple, the
processes leading to the ability to form a peptide bond are exceedingly complex. The template for
correct addition of individual amino acids is the mRNA, yet both tRNAs and rRNAs are involved in
the process. The tRNAs carry activated amino acids into the ribosome which is composed of
rRNA and ribosomal proteins. The ribosome is associated with the mRNA ensuring correct
access of activated tRNAs and containing the necessary enzymatic activities to catalyze peptide
bond formation.


Historical Perspectives

Early genetic experiments demonstrated:

1. The co-linearity between the DNA and protein encoded by the DNA. Yanofsky
showed that the order of observed mutations in the E. coli tryptophan synthetase
gene was the same as the corresponding amino acid changes in the protein.
2. Crick and Brenner demonstrated, from a large series of double mutants of the
bacteriophage T4, that the genetic code is read in a sequential manner starting from a
fixed point in the gene, that the code was most likely a triplet and that most of the 64 possible
combinations of the 4 nucleotides code for amino acids, i.e. the code is degenerate
since there are only 20 amino acids.

The above-mentioned experiments only indicated deductive correlations regarding the genetic
code. The precise dictionary of the genetic code was originally determined by the use of in vitro
translation systems derived from E. coli cells. Synthetic polyribonucleotides were added to these
translation systems along with all twenty amino acids. One amino acid at a time was radiolabeled.
The first demonstration of the dictionary of the genetic code was with the use of poly(U). This
synthetic polyribonucleotide encoded the amino acid phenylalanine, i.e. the resulting polypeptide
was poly(F).

The utilization of a variety of repeating di-, tri- and tetranucleotide polyribonucleotides established
the entire genetic code. The results of these experiments confirmed that some amino acids are
encoded by more than one triplet codon, hence the degeneracy of the genetic code. These
experiments also established the identity of translational termination codons.
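
The logic of the synthetic polyribonucleotide experiments can be reproduced in a few lines. The sketch below is illustrative only: `translate_polymer` is a hypothetical helper, the standard genetic code is assumed, and it assumes (as a simplification of the in vitro system) that reading can begin in any frame without a start codon.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_polymer(repeat_unit, n_codons=6, frame=0):
    """Translate a repeating synthetic polymer, reading n_codons triplets from `frame`."""
    length = frame + 3 * n_codons
    rna = (repeat_unit * length)[:length]      # build a long enough repeat, then trim
    codons = [rna[i:i + 3] for i in range(frame, length, 3)]
    return "".join(CODON_TABLE[c] for c in codons)


print(translate_polymer("U"))    # FFFFFF, i.e. poly(U) gives poly(F)
print(translate_polymer("UC"))   # SLSLSL, a repeating dinucleotide gives an alternating copolymer
for frame in range(3):           # a repeating trinucleotide gives one homopolymer per reading frame
    print(frame, translate_polymer("UUC", frame=frame))
```

A repeating dinucleotide alternates between two codons, while a repeating trinucleotide yields a different homopolymer in each of the three frames, which is why such polymers were informative for assigning codons.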

An additional important point to come from these early experiments was that the 5' end of the
RNA corresponded to the amino terminus of the polypeptide. This was important since previous
labeling experiments had demonstrated that the N-terminus is the beginning of the elongating
polypeptide. Therefore, in vitro translation experiments established that the RNA is read in the 5'
to 3' direction.

Crick first postulated that translation of the genetic code would be carried out through
mediation of adapter molecules. Each adapter was postulated to carry a specific amino acid and
to recognize the corresponding codon. He suggested that the adapters contain RNA because
codon recognition could then occur by complementarity to the sequences of the codons in the
mRNA.

During the course of in vitro protein synthesis and labeling experiments it was shown that the
amino acids became transiently bound to a low molecular weight fraction of RNA. This
fraction of RNAs has been termed transfer RNAs (tRNAs) since they transfer amino acids to the
elongating polypeptide. These results indicate that accurate translation requires two equally
important recognition steps:

1. The correct choice of amino acid needs to be made for attachment to the
correspondingly correct tRNA.
2. Selection of the correct amino acid-charged tRNA by the mRNA. This process is
facilitated by the ribosomes which we will discuss below.

Summary of Experiments to Determine the Genetic Code

1. The genetic code is read in a sequential manner starting near the 5' end of the
mRNA. This means that translation proceeds along the mRNA in the 5' -> 3'
direction which corresponds to the N-terminal to C-terminal direction of the amino acid
sequences within proteins.
2. The code is composed of a triplet of nucleotides.
3. Most of the 64 possible combinations of the 4 nucleotides code for amino acids, i.e. the
code is degenerate since there are only 20 amino acids.

The precise dictionary of the genetic code was determined with the use of in vitro translation
systems and polyribonucleotides. The results of these experiments confirmed that some amino
acids are encoded by more than one triplet codon, hence the degeneracy of the genetic code.
These experiments also established the identity of translational termination codons.


The Genetic Code

Shown below are the triplets that are used for each of the 20 amino acids found in eukaryotic
proteins. The column on the left side indicates the first nucleotide of each triplet and the row across
the top represents the second nucleotide. The wobble position nucleotides are indicated in blue.
The three stop codons are highlighted in red.
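
The codon table and its degeneracy can be regenerated programmatically. The following is a minimal sketch assuming the standard genetic code; the variable and function names are illustrative, and the layout only approximates the table described above (first nucleotide down the side, second nucleotide across).

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in one-letter amino acid codes; '*' marks a stop codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Why a triplet code: 4**2 = 16 doublets cannot cover 20 amino acids, 4**3 = 64 triplets can.
stops = sorted(c for c, a in CODON_TABLE.items() if a == "*")
degeneracy = Counter(a for a in CODON_TABLE.values() if a != "*")


def print_table():
    """Print the code with the first base per block, second base per column, third base per row."""
    for first in BASES:
        for third in BASES:
            print("  ".join(f"{first}{second}{third} {CODON_TABLE[first + second + third]}"
                            for second in BASES))
        print()


print_table()
print("stop codons:", stops)
print("codons per amino acid:", dict(degeneracy))
```

Counting the entries shows 61 sense codons shared among 20 amino acids, with between one (Met, Trp) and six (Leu, Ser, Arg) codons each.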


Characteristics of tRNAs

More than 300 different tRNAs have been sequenced, either directly or from their
corresponding DNA sequences. tRNAs vary in length from 60–95 nucleotides (18–28 kD). The
majority contain 76 nucleotides. Evidence has shown that the role of tRNAs in translation is to
carry activated amino acids to the elongating polypeptide chain. All tRNAs:

1. Exhibit a cloverleaf-like secondary structure.
2. Have a 5'-terminal phosphate.
3. Have a 7 bp stem that includes the 5'-terminal nucleotide and may contain
non-Watson-Crick base pairs, e.g. GU. This portion of the tRNA is called the acceptor
stem since the amino acid is carried by the tRNA while attached to the 3'-terminal OH
group.
4. Have a D loop and a TΨC loop (D = dihydrouridine, Ψ = pseudouridine).
5. Have an anti-codon loop.
6. Terminate at the 3'-end with the sequence 5'–CCA–3'.
7. Contain 13 invariant positions and 8 semi-variant positions.
8. Contain numerous modified nucleotide bases (see Biochemistry of Nucleic Acids
for structures of several modified nucleotides in tRNAs).

Activation of Amino Acids

Activation of amino acids is carried out by a two step process catalyzed by aminoacyl-tRNA
synthetases. Each tRNA, and the amino acid it carries, is recognized by an individual
aminoacyl-tRNA synthetase. This means there exist at least 20 different aminoacyl-tRNA
synthetases; there are actually at least 21, since the initiator met-tRNA of both prokaryotes and
eukaryotes is distinct from non-initiator met-tRNAs.

Activation of amino acids requires energy in the form of ATP and occurs in a two step reaction
catalyzed by the aminoacyl-tRNA synthetases. First the enzyme attaches the amino acid to the α-
phosphate of ATP with the concomitant release of pyrophosphate. This is termed an aminoacyl-
adenylate intermediate. In the second step the enzyme catalyzes transfer of the amino acid to
either the 2'– or 3'–OH of the ribose portion of the 3'-terminal adenosine residue of the tRNA
generating the activated aminoacyl-tRNA. Although these reactions are freely reversible, the
forward reaction is favored by the coupled hydrolysis of PPi.

Accurate recognition of the correct amino acid as well as the correct tRNA is different for each
aminoacyl-tRNA synthetase. Since the different amino acids have different R groups, the enzyme
for each amino acid has a different binding pocket for its specific amino acid. It is not the
anticodon that determines the tRNA utilized by the synthetases. Although the exact mechanism is
not known for all synthetases, it is likely to be a combination of the presence of specific modified
bases and the secondary structure of the tRNA that is correctly recognized by the synthetases.

It is absolutely necessary that the discrimination of correct amino acid and correct tRNA be
made by a given synthetase prior to release of the aminoacyl-tRNA from the enzyme. Once the
product is released there is no further way to proof-read whether a given amino acid is coupled
to its corresponding tRNA. Erroneous coupling would lead to the wrong amino acid being
incorporated into the polypeptide since the discrimination of amino acid during protein synthesis
comes from the recognition of the anticodon of a tRNA by the codon of the mRNA and not by
recognition of the amino acid. This was demonstrated by reductive desulfuration of cys-tRNAcys
with Raney nickel, generating ala-tRNAcys. Alanine was then incorporated into an elongating
polypeptide where cysteine should have been.
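
The point of the Raney nickel experiment, that decoding inspects only the anticodon, can be illustrated with a small sketch. Everything here is hypothetical scaffolding (`ChargedTRNA`, `decode`), and it assumes strict Watson-Crick pairing with no wobble, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargedTRNA:
    anticodon: str    # written 5'->3'
    amino_acid: str   # whatever is actually attached to the 3' end


def reverse_complement(rna):
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def decode(codon, pool):
    """Select a tRNA purely by codon-anticodon pairing; the amino acid is never inspected."""
    for trna in pool:
        if trna.anticodon == reverse_complement(codon):
            return trna.amino_acid
    raise LookupError(f"no tRNA pairs with codon {codon}")


normal = ChargedTRNA(anticodon="GCA", amino_acid="Cys")        # cys-tRNAcys
desulfurized = ChargedTRNA(anticodon="GCA", amino_acid="Ala")  # ala-tRNAcys after Raney nickel

print(decode("UGC", [normal]))        # Cys
print(decode("UGC", [desulfurized]))  # Ala
```

Because `decode` never looks at `amino_acid`, a mischarged tRNA delivers the wrong residue at a cysteine codon, which is why the synthetase must get the pairing right before release.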


The Wobble Hypothesis

As discussed above, 3 of the possible 64 triplet codons are recognized as translational
termination codons. The remaining 61 codons might be considered as being recognized by
individual tRNAs. Most cells contain isoaccepting tRNAs, different tRNAs that are specific for the
same amino acid; however, many tRNAs bind to two or three codons specifying their cognate
amino acids. As an example, yeast tRNAphe has the anticodon 5'–GmAA–3' and can recognize
the codons 5'–UUC–3' and 5'–UUU–3'. It is, therefore, possible for non-Watson-Crick base
pairing to occur at the third codon position, i.e. the 3' nucleotide of the mRNA codon and the 5'
nucleotide of the tRNA anticodon. This phenomenon has been termed the wobble hypothesis.
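
The tRNAphe example can be checked with a short sketch. The pairing table below uses Crick's classical wobble rules, which are not listed in the text above and are included here as an assumption; the modified base Gm is treated as G, and `codons_read` is a hypothetical helper.

```python
# Base at the 5' (wobble) position of the anticodon -> bases it can pair with
# at the 3' (third) position of the codon. I = inosine.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Return every codon (5'->3') that an anticodon (5'->3') can recognize."""
    wobble_base, middle, last = anticodon
    first = WATSON_CRICK[last]      # anticodon 3' base pairs with the codon 5' base
    second = WATSON_CRICK[middle]   # middle positions pair strictly
    return sorted(first + second + third for third in WOBBLE[wobble_base])


print(codons_read("GAA"))  # ['UUC', 'UUU']
print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']
```

Only the first anticodon position is allowed to pair loosely, so a single tRNA reads two or three codons that differ in their third base.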

Diagram showing the various modified nucleotides of tRNAs that are found in the wobble
position in the anticodon. The top half shows the wobble nucleotides of the anticodon in blue and
the various nucleotides (in red) of the wobble position of the codon that can be found in non-
Watson-Crick base-pairs. The lower panel illustrates the opposite showing the wobble
nucleotides of the codon in blue and the associated wobble nucleotides of the anticodon in red.

Now that we have charged aminoacyl-tRNAs and the mRNAs to convert nucleotide sequences
to amino acid sequences, we need to bring the two together accurately and efficiently. This is the
job of the ribosomes. Ribosomes are composed of proteins and rRNAs.

All living organisms need to synthesize proteins and all cells of an organism need to synthesize
proteins; therefore, it is not hard to imagine that ribosomes are a major constituent of all cells of
all organisms. The makeup of the ribosomes, both rRNA and associated proteins, is slightly
different between prokaryotes and eukaryotes.



Order of Events in Translation

The ability to begin to identify the roles of the various ribosomal proteins in the processes of
ribosome assembly and translation was aided by the discovery that the ribosomal subunits will
self assemble in vitro from their constituent parts.

Following assembly of both the small and large subunits onto the mRNA, and given the
presence of charged tRNAs, protein synthesis can take place. To reiterate the process of protein
synthesis:

1. Synthesis proceeds from the N-terminus to the C-terminus of the protein.
2. The ribosomes "read" the mRNA in the 5' to 3' direction.
3. Active translation occurs on polyribosomes (also termed polysomes). This means
that more than one ribosome can be bound to and translate a given mRNA at any one
time.
4. Chain elongation occurs by sequential addition of amino acids to the C-terminal
end of the ribosome bound polypeptide.

Translation proceeds in an ordered process. First accurate and efficient initiation occurs, then
chain elongation and finally accurate and efficient termination must occur. All three of these
processes require specific proteins, some of which are ribosome associated and some of which
are separate from the ribosome, but may be temporarily associated with it.


Initiation

Initiation of translation in both prokaryotes and eukaryotes requires a specific initiator tRNA,
tRNAmeti, that is used to incorporate the initial methionine residue into all proteins. In E. coli a
specific version of tRNAmeti is required to initiate translation, tRNAfmeti. The methionine attached
to this initiator tRNA is formylated. Formylation requires N10-formyl-THF and is carried out after the
methionine is attached to the tRNA. The fmet-tRNAfmeti still recognizes the same codon, AUG, as
regular tRNAmet. Although tRNAmeti is specific for initiation in eukaryotes, the methionine it
carries is not formylated.

The initiation of translation requires recognition of an AUG codon. In the polycistronic
prokaryotic RNAs this AUG codon is located adjacent to a Shine-Dalgarno element in the mRNA.
The Shine-Dalgarno element is recognized by complementary sequences in the small subunit
rRNA (16S in E. coli). In eukaryotes initiator AUGs are generally, but not always, the first
encountered by the ribosome. A specific sequence context, surrounding the initiator AUG, aids
ribosomal discrimination. This context is A/GCCA/GCCAUGA/G in most mRNAs.

The Shine-Dalgarno element is found at the 5' side of each initiator AUG codon in prokaryotic
polycistronic mRNAs. This element is complementary to sequences present near the 3'-end of the
16S rRNA of the prokaryotic ribosome.
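
Both kinds of start-site recognition can be approximated by simple sequence searches. The sketch below is a rough illustration only: the eukaryotic pattern is the context quoted above, while the Shine-Dalgarno core `AGGAGG`, the 5 to 11 nucleotide spacing window, the 4-nucleotide minimum match and both function names are assumptions made for this example.

```python
import re

# Eukaryotic context quoted in the text: A/GCCA/GCCAUGA/G.
KOZAK = re.compile(r"[AG]CC[AG]CC(AUG)[AG]")
# Assumption: core Shine-Dalgarno consensus, written 5'->3' on the mRNA.
SD_CORE = "AGGAGG"


def find_kozak_start(mrna):
    """Return the index of the first AUG in the quoted context, or None."""
    match = KOZAK.search(mrna)
    return match.start(1) if match else None


def find_sd_starts(mrna, min_gap=5, max_gap=11, min_len=4):
    """Return indices of AUGs preceded, within the spacing window, by part of the SD core."""
    motifs = {SD_CORE[i:i + min_len] for i in range(len(SD_CORE) - min_len + 1)}
    hits = []
    for match in re.finditer("AUG", mrna):
        aug = match.start()
        for gap in range(min_gap, max_gap + 1):
            end = aug - gap   # index just past the candidate motif
            if end >= min_len and mrna[end - min_len:end] in motifs:
                hits.append(aug)
                break
    return hits


print(find_kozak_start("GGACCACCAUGGCG"))      # 8
print(find_sd_starts("UUAGGAGGUAACAUAUGGCU"))  # [14]
```

Real ribosome binding depends on base-pairing strength and mRNA structure, so a string match like this only flags candidates.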


Eukaryotic Initiation Factors and Their Functions

The specific non-ribosomally associated proteins required for accurate translational initiation
are termed initiation factors. In E. coli they are IFs; in eukaryotes they are eIFs. Numerous eIFs
have been identified:

Initiation Factor | Activity
eIF-1 | repositioning of met-tRNA to facilitate mRNA binding
eIF-2 | ternary complex formation
eIF-2A | AUG-dependent met-tRNAmeti binding to 40S ribosome
eIF-2B (also called GEF, guanine nucleotide exchange factor) | GTP/GDP exchange during eIF-2 recycling
eIF-3, composed of 13 subunits (see below) | ribosome subunit antiassociation by binding to 40S subunit; eIF-3e and eIF-3i subunits transform normal cells when overexpressed, eIF-3A (also called eIF3 p170) overexpression has been shown to be associated with several human cancers
Initiation factor complex often referred to as eIF-4F, composed of 3 primary subunits: eIF-4E, eIF-4A, eIF-4G and at least 2 additional factors: PABP, Mnk1 (or Mnk2) | mRNA binding to 40S subunit, ATPase-dependent RNA helicase activity, interaction between polyA tail and cap structure
PABP: polyA-binding protein | binds to the polyA tail of mRNAs and provides a link to eIF-4G
Mnk1 and Mnk2, eIF-4E kinases | phosphorylate eIF-4E increasing association with cap structure
eIF-4A | ATPase-dependent RNA helicase
eIF-4E (see below) | 5' cap recognition; frequently found overexpressed in human cancers, inhibition of eIF4E is currently a target for anti-cancer therapies
4E-BP (also called PHAS), 3 known forms | when de-phosphorylated 4E-BP binds eIF-4E and represses its activity, phosphorylation of 4E-BP occurs in response to many growth stimuli leading to release of eIF-4E and increased translational initiation
eIF-4G | acts as a scaffold for the assembly of eIF-4E and -4A in the eIF-4F complex, interaction with PABP allows 5'-end and 3'-ends of mRNAs to interact
eIF-4B | stimulates helicase, binds simultaneously with eIF-4F
eIF-5 | release of eIF-2 and eIF-3, ribosome-dependent GTPase
eIF-6 | ribosome subunit antiassociation


Activities of eIF-3

The eIF-3 complex is composed of 13 different subunits whose sizes, nomenclature and
functions are described in the Table below. The importance of the eIF-3 complex in translation
initiation is demonstrated by the fact that assembly of the eIF-2-GTP-met-tRNAmeti (the ternary
complex), binding of the ternary complex and other components of the 43S pre-initiation complex
(PIC) to the ribosome 40S subunit, recruitment of the mRNA to the 43S PIC, and scanning of the
mRNA for the initiator AUG codon recognition are all dependent on eIF-3 complex activity.
Therefore, the primary function of the components of eIF-3 is to act as a scaffold for the assembly
of the PIC, and this assembled complex is referred to as the multi-initiation factor complex (MFC).

Nomenclature | Human subunit designation | Function(s)
eIF3A | p170 | binds 40S subunit, binds eIF-4B, involved in formation of MFC, recruitment of mRNA and the ternary complex
eIF3B | p116 | binds 40S subunit, involved in formation of MFC, recruitment and scanning of mRNA, recruitment of ternary complex
eIF3C | p110 | binds 40S subunit, involved in formation of MFC, recruitment and scanning of mRNA, recruitment of ternary complex, recognition of the initiator AUG
eIF3D | p66 |
eIF3E | p48 |
eIF3F | p47 | proposed to be the binding site for mTOR and p70S6K (see regulation of eIF-4E activity below)
eIF3G | p44 | binding of eIF-4B
eIF3H | p40 |
eIF3I | p36 |
eIF3J | p35 | binds 40S subunit, involved in formation of the MFC
eIF3K | p28 |
eIF3L | p67 |
eIF3M | GA17 |


Specific Steps in Translational Initiation

Initiation of translation requires 4 specific steps:

1. A ribosome must dissociate into its 40S and 60S subunits.
2. A ternary complex consisting of the initiator met-tRNA, GTP and eIF-2 is formed and
binds to the 40S subunit, generating the preinitiation complex.
3. The mRNA is bound to the preinitiation complex.
4. The 60S subunit associates with the preinitiation complex to form the 80S initiation
complex.

The initiation factors eIF-1 and eIF-3 bind to the 40S ribosomal subunit favoring
antiassociation to the 60S subunit. The prevention of subunit reassociation allows the preinitiation
complex to form.

The first step in the formation of the preinitiation complex is the binding of GTP to eIF-2 to
form a binary complex. eIF-2 is composed of three subunits, α, β and γ. The binary complex then
binds to the activated initiator tRNA, met-tRNAmet forming a ternary complex that then binds to the
40S subunit forming the 43S preinitiation complex. The preinitiation complex is stabilized by the
earlier association of eIF-3 and eIF-1 to the 40S subunit.

The cap structure of eukaryotic mRNAs is bound by specific eIFs prior to association with the
preinitiation complex. Cap binding is accomplished by the initiation factor eIF-4F. This factor is
actually a complex of 3 proteins: eIF-4E, eIF-4A and eIF-4G. The protein eIF-4E is a 24 kDa protein which
physically recognizes and binds to the cap structure. eIF-4A is a 46 kDa protein which binds and
hydrolyzes ATP and exhibits RNA helicase activity. Unwinding of mRNA secondary structure is
necessary to allow access of the ribosomal subunits. eIF-4G aids in binding of the mRNA to the
43S preinitiation complex.

Once the mRNA is properly aligned onto the preinitiation complex and the initiator met-tRNAmet
is bound to the initiator AUG codon (a process facilitated by eIF-1) the 60S subunit associates
with the complex. The association of the 60S subunit requires the activity of eIF-5 which has first
bound to the preinitiation complex. The energy needed to stimulate the formation of the 80S
initiation complex comes from the hydrolysis of the GTP bound to eIF-2. The GDP bound form of
eIF-2 then binds to eIF-2B which stimulates the exchange of GTP for GDP on eIF-2. When GTP
is exchanged eIF-2B dissociates from eIF-2. This is termed the eIF-2 cycle (see diagram below).
This cycle is absolutely required in order for eukaryotic translational initiation to occur. The GTP
exchange reaction can be affected by phosphorylation of the α-subunit of eIF-2.

At this stage the initiator met-tRNAmet is bound to the mRNA within a site of the ribosome
termed the P-site, for peptidyl site. The other site within the ribosome to which incoming charged
tRNAs bind is termed the A-site, for aminoacyl site.

The eIF-2 cycle involves the regeneration of GTP-bound eIF-2 following the hydrolysis of GTP
during translational initiation. When the 43S preinitiation complex is engaged with the 60S
subunit to form the 80S initiation complex, the GTP bound to eIF-2 is hydrolyzed, providing
energy for the process. In order for additional rounds of translational initiation to occur, the GDP
bound to eIF-2 must be exchanged for GTP. This is the function of eIF-2B which is also called
guanine nucleotide exchange factor (GEF).
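
The eIF-2 cycle just described can be viewed as a two-state switch. The class below is a toy illustration under that assumption; `EIF2`, `initiate` and `exchange` are hypothetical names, and no kinetics or concentrations are modeled.

```python
class EIF2:
    """Toy two-state model of eIF-2: GTP-bound (active) or GDP-bound (spent)."""

    def __init__(self):
        self.nucleotide = "GTP"
        self.rounds = 0

    def initiate(self):
        """One round of initiation: GTP is hydrolyzed when the 80S complex forms."""
        if self.nucleotide != "GTP":
            raise RuntimeError("eIF-2-GDP must be recycled by eIF-2B first")
        self.nucleotide = "GDP"
        self.rounds += 1

    def exchange(self, eif2b_available=True):
        """eIF-2B (GEF) swaps the bound GDP for GTP; without it eIF-2 stays spent."""
        if eif2b_available and self.nucleotide == "GDP":
            self.nucleotide = "GTP"


factor = EIF2()
factor.initiate()                        # first round uses up the bound GTP
factor.exchange(eif2b_available=False)   # no exchange factor: still GDP-bound
print(factor.nucleotide)                 # GDP
factor.exchange()                        # eIF-2B restores the GTP-bound form
factor.initiate()
print(factor.rounds)                     # 2
```

Anything that blocks the exchange step stalls all further rounds of initiation, which is why the cycle is a natural control point.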


Elongation

The process of elongation, like that of initiation, requires specific non-ribosomal proteins. In E.
coli these are EFs and in eukaryotes eEFs. Elongation of polypeptides occurs in a cyclic manner
such that at the end of one complete round of amino acid addition the A site will be empty and
ready to accept the incoming aminoacyl-tRNA dictated by the next codon of the mRNA. This
means that not only does the incoming amino acid need to be attached to the peptide chain but
the ribosome must move down the mRNA to the next codon. Each incoming aminoacyl-tRNA is
brought to the ribosome by an eEF-1α-GTP complex. When the correct tRNA is deposited into the
A site the GTP is hydrolyzed and the eEF-1α-GDP complex dissociates. In order for additional
rounds of aminoacyl-tRNA delivery to occur the GDP must be exchanged for GTP. This is carried
out by eEF-1βγ similarly to the GTP exchange that occurs with eIF-2 catalyzed by eIF-2B.

The peptide attached to the tRNA in the P site is transferred to the amino group at the
aminoacyl-tRNA in the A site. This reaction is catalyzed by peptidyltransferase. This process is
termed transpeptidation. The elongated peptide now resides on a tRNA in the A site. The A site
needs to be freed in order to accept the next aminoacyl-tRNA. The process of moving the
peptidyl-tRNA from the A site to the P site is termed translocation. Translocation is catalyzed by
eEF-2 coupled to GTP hydrolysis. In the process of translocation the ribosome is moved along
the mRNA such that the next codon of the mRNA resides under the A site. Following
translocation eEF-2 is released from the ribosome. The cycle can now begin again. The ability of
eEF-2 to carry out translocation is regulated by the state of phosphorylation of the enzyme; when
phosphorylated the enzyme is inhibited. Phosphorylation of eEF-2 is catalyzed by the enzyme
eEF2 kinase (eEF2K). Regulation of eEF2K activity is normally under the control of insulin and
Ca2+ fluxes. The Ca2+-mediated effects are the result of calmodulin interaction with eEF2K.
Activation of eEF2K in skeletal muscle by Ca2+ is important to reduce consumption of ATP in the
process of protein synthesis during periods of exertion which will lead to release of intracellular
Ca2+ stores. eEF2K itself is also regulated by phosphorylation and one of the kinases that
phosphorylates the enzyme is regulated by mTOR (see Regulation of eIF-4E below). In addition,
the master metabolic regulatory kinase, AMP-activated protein kinase (AMPK) will phosphorylate
and activate eEF2K leading to inhibition of eEF-2 activity.


Termination

Like initiation and elongation, translational termination requires specific protein factors
identified as releasing factors, RFs in E. coli and eRFs in eukaryotes. There are 2 RFs in E. coli
and one in eukaryotes. The signals for termination are the same in both prokaryotes and
eukaryotes. These signals are termination codons present in the mRNA. There are 3 termination
codons, UAG, UAA and UGA.

In E. coli the termination codons UAA and UAG are recognized by RF-1, whereas RF-2
recognizes the termination codons UAA and UGA. The eRF binds to the A site of the ribosome in
conjunction with GTP. The binding of eRF to the ribosome stimulates the peptidyltransferase
activity to transfer the peptidyl group to water instead of an aminoacyl-tRNA. The resulting
uncharged tRNA left in the P site is expelled with concomitant hydrolysis of GTP. The inactive
ribosome then releases its mRNA and the 80S complex dissociates into the 40S and 60S
subunits ready for another round of translation.
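
The codon specificity of the two E. coli release factors stated above amounts to a small lookup table. The sketch below simply encodes that statement; `RELEASE_FACTORS` and `factors_for` are illustrative names.

```python
# Stop codons recognized by each E. coli release factor, as stated in the text.
RELEASE_FACTORS = {"RF-1": {"UAA", "UAG"}, "RF-2": {"UAA", "UGA"}}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def factors_for(stop_codon):
    """Return the release factors that recognize a given termination codon."""
    if stop_codon not in STOP_CODONS:
        raise ValueError(f"{stop_codon} is not a termination codon")
    return sorted(rf for rf, codons in RELEASE_FACTORS.items() if stop_codon in codons)


for codon in sorted(STOP_CODONS):
    print(codon, factors_for(codon))
```

UAA is covered by both factors, while UAG and UGA each depend on a single one.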


Selenoproteins

Selenium is a trace element and is found as a component of several prokaryotic and
eukaryotic enzymes that are involved in redox reactions. The selenium in these selenoproteins is
incorporated as a unique amino acid, selenocysteine, during translation. A particularly important
eukaryotic selenoenzyme is glutathione peroxidase. This enzyme is required during the oxidation
of glutathione by hydrogen peroxide (H2O2) and organic hydroperoxides.

Structure of the Selenocysteine Residue

Incorporation of selenocysteine by the translational machinery occurs via an interesting and
unique mechanism. The tRNA for selenocysteine is charged with serine and then enzymatically
selenylated to produce the selenocysteinyl-tRNA. The anticodon of selenocysteinyl-tRNA
interacts with a stop codon in the mRNA (UGA) instead of a serine codon. The
selenocysteinyl-tRNA has a unique structure that is not recognized by the termination machinery
and is brought into the ribosome by a dedicated specific elongation factor. An element in the 3'
non-translated region (UTR) of selenoprotein mRNAs determines whether UGA is read as a stop
codon or as a selenocysteine codon.
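
The dual meaning of UGA can be sketched as a flag on an otherwise ordinary translation routine. This is a deliberately crude illustration: the `has_secis` parameter stands in for the 3' UTR element (commonly called the SECIS element, a name not used in the text above), every in-frame UGA is recoded when it is set, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(orf, has_secis=False):
    """Translate an open reading frame; with the 3' UTR element present, UGA is read as Sec."""
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon == "UGA" and has_secis:
            peptide.append("U")   # one-letter code for selenocysteine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":        # ordinary termination
            break
        peptide.append(residue)
    return "".join(peptide)


orf = "AUGGCUUGAGGAUAA"
print(translate(orf))                  # MA
print(translate(orf, has_secis=True))  # MAUG
```

The same coding sequence therefore yields a truncated product or a full-length selenoprotein depending on information that lies outside the codon itself.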


Regulation of eIF-4E Activity

The cellular levels of eIF-4E are the lowest of all eukaryotic initiation factors which makes this
factor a prime target for regulation. Indeed, at least 3 distinct mechanisms are known to exist that
regulate the level and activity of eIF-4E. These include regulation of the level of transcription of
the eIF-4E gene, post-translational modification via phosphorylation and inhibition by interaction
with binding proteins.

Although the exact mechanisms used to upregulate the transcription of the eIF-4E gene are
not yet well understood, it is known that exposure of cells to growth factors as well as activation of
T cells leads to increased expression of eIF-4E. The proto-oncogene MYC is believed to play a
role in the transcriptional activation of eIF-4E as 2 functional MYC-binding sites have been found
in the promoter region of the eIF-4E gene. Of significant note is the finding that cells that are
stably over-expressing the MYC gene also have enhanced levels of eIF-4E. Quite strikingly it has
been shown that promiscuous elevation in the levels of eIF-4E leads to tumorigenesis, placing this
translation factor in the category of proto-oncogene.

Numerous extracellular stimuli (e.g. insulin, EGF, angiotensin II and gastrin) that exert a
portion of their effects at the level of enhanced translation do so by affecting the state of eIF-4E
phosphorylation. However, it should be noted that not all signals that lead to increased eIF-4E
phosphorylation lead to increased rates of translation. Changes in eIF-4E phosphorylation
correlate well with progression through the cell cycle. In resting (G0) cells eIF-4E phosphorylation
is low; it increases during G1 and S phase and then declines again in M phase. Phosphorylation
of eIF-4E occurs at one major site which is Ser209 (in the human and mouse proteins).

The primary signal transduction pathway leading to eIF-4E phosphorylation is that involving
the RAS gene. Many growth factors stimulate activation of RAS in response to binding their
cognate receptors. Subsequently, RAS activation leads to the phosphorylation and activation of
MAP-interacting kinase-1 (Mnk1) which in turn phosphorylates eIF-4E. Although the exact effect
of eIF-4E phosphorylation is not clearly defined, it may be necessary to increase affinity of eIF-4E
for the mRNA cap structure and for eIF-4G.

The principal mechanism utilized in the regulation of eIF-4E activity is through its interaction
with a family of binding/repressor proteins termed 4EBPs (4E binding proteins) which are widely
distributed in numerous vertebrate and invertebrate organisms. In mammalian cells 3 related
4EBPs have been found where 4EBP1 and 4EBP2 are also identified as PHAS-I and PHAS-II
(PHAS refers to properties of heat and acid stability).

Binding of 4E-BPs to eIF-4E does not alter the affinity of eIF-4E for the cap structure but
prevents the interaction of eIF-4E with eIF-4G which in turn suppresses the formation of the eIF-
4F complex (see Table of Initiation Factors above). The ability of 4EBPs to interact with eIF-4E is
controlled via the phosphorylation of specific Ser and Thr residues in 4EBP. When
hypophosphorylated, 4EBPs bind with high efficiency to eIF-4E but lose their binding capacity
when phosphorylated. Numerous growth and signal transduction stimulating effectors lead to
phosphorylation of 4E-BPs just as these same responses can lead to phosphorylation of eIF-4E.

There are several signal transduction pathways whose activations lead to phosphorylation of
4E-BPs. These include pathways that lead to activation of phosphatidylinositol 3-kinase (PI3K),
the Akt Ser/Thr kinase which is also called protein kinase B (PKB) and the FKBP12-rapamycin-
associated protein/mammalian target of rapamycin (FRAP/mTOR) family of proteins. Akt was
originally identified as a virally encoded oncogene and there are now at least three members of
the PKB/Akt family identified as Akt1, Akt2, and Akt3. The mammalian TOR proteins are
homologs of the yeast TOR proteins that were identified in a screen for yeast mutants resistant to
rapamycin. Rapamycin is an immunosuppressant used primarily in the prevention of tissue
rejection following organ transplantation. Rapamycin functions within cells by binding the
immunophilin FK506-binding protein 12 (FKBP12). Immunophilins are intracellular proteins that
bind to immunosuppressive drugs such as FK506 and rapamycin. When rapamycin inhibits the
kinase activity of FRAP/mTOR it can no longer phosphorylate 4EBPs. One of the major effects of
insulin is increased protein synthesis and this effect is elicited, in part, via activation of mTOR
function. For more information on the regulation of protein synthesis by insulin see the Insulin
Action page.

Targets for mTOR regulation of translational initiation and elongation. AMPK = AMP-activated
kinase. TSC1 and TSC2 = Tuberous sclerosis tumor suppressors 1 (hamartin) and 2 (tuberin);
Rheb = Ras homolog enriched in brain; PKB/Akt = protein kinase B; 4EBP1 = eIF-4E binding
protein; p70S6K = 70kDa ribosomal protein S6 kinase, also called S6K; eEF2K = eukaryotic
elongation factor 2 kinase.

Regulation of mTOR activity is effected via several mechanisms. Activation of AMPK results in
phosphorylation and activation of the TSC1/TSC2 complex which results in inhibition of mTOR.
AMPK can also phosphorylate and inhibit mTOR. Conversely, activation of PKB (as in the case of
insulin receptor activation) leads to activation of mTOR either by inhibition of the TSC1/TSC2
complex or by phosphorylation and activation of mTOR directly. Activation of mTOR leads to
phosphorylation of p70S6K and 4EBP1. The net effect of phosphorylation of 4EBP1 is that it is
released from eIF-4E allowing eIF-4E to actively bind eIF-4G and recognize the cap structure of
mRNAs. Activated p70S6K phosphorylates and inhibits eEF2K. If eEF2K does not phosphorylate
eEF2 then translation elongation proceeds uninhibited.
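
The pathway logic in the paragraph above can be summarized as a simple on/off sketch. This is only a qualitative illustration, not a quantitative model: the function name and dictionary keys are invented for this example, and it assumes that mTOR is off without a PKB signal and that AMPK inhibition overrides PKB activation when both are present.

```python
def mtor_signaling(ampk_active: bool, pkb_active: bool) -> dict:
    """Qualitative on/off sketch of the mTOR pathway described in the text."""
    # Assumption: PKB switches mTOR on, and AMPK inhibition wins if both are active.
    mtor_active = pkb_active and not ampk_active

    # Active mTOR phosphorylates both p70S6K and 4EBP1.
    p70s6k_active = mtor_active
    bp1_phosphorylated = mtor_active

    # Activated p70S6K phosphorylates and inhibits eEF2K.
    eef2k_active = not p70s6k_active

    return {
        "mTOR_active": mtor_active,
        "4EBP1_phosphorylated": bp1_phosphorylated,
        # Phosphorylated 4EBP1 releases eIF-4E, which can then bind eIF-4G.
        "eIF4E_free_to_bind_eIF4G": bp1_phosphorylated,
        "eEF2K_active": eef2k_active,
        # If eEF2K does not phosphorylate eEF2, elongation proceeds uninhibited.
        "elongation_uninhibited": not eef2k_active,
    }


# Insulin-like signal: PKB on, AMPK off.
print(mtor_signaling(ampk_active=False, pkb_active=True))
```

Calling the function with `ampk_active=True` shows the opposite state: 4EBP1 stays bound to eIF-4E and eEF2K remains active.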


Heme Control of Translation

Regulation of initiation in eukaryotes is effected by phosphorylation of a Ser (S) residue in the
α-subunit of eIF-2. Phosphorylated eIF-2 in the absence of eIF-2B is just as active an initiator as
non-phosphorylated eIF-2. However, when eIF-2 is phosphorylated the GDP-bound complex is
stabilized and exchange for GTP is inhibited. The exchange of GDP for GTP is mediated by eIF-
2B (also called guanine nucleotide exchange factor, GEF). When eIF-2 is phosphorylated it binds
eIF-2B more tightly thus slowing the rate of exchange. It is this inhibited exchange that affects the
rate of initiation.

The phosphorylation of eIF-2 is the result of an activity called heme-controlled inhibitor (HCI)
which functions as diagrammed below. HCI is generated in the absence of heme, a mitochondrial
product. Removal of phosphate is catalyzed by a specific eIF-2 phosphatase which is unaffected
by heme. The presence of HCI was first seen in an in vitro translation system derived from lysates of
reticulocytes. Reticulocytes synthesize almost exclusively hemoglobin at an extremely high rate.
In an intact reticulocyte eIF-2 is protected from phosphorylation by a specific 67 kDa protein.

The regulation of translation by heme-controlled inhibitor (HCI). Control of translation by heme
is clinically important only in erythrocytes. Erythrocytes are enucleate and contain primarily globin
mRNA. When the level of heme (required for the synthesis of biologically active hemoglobin) is
low it would be inefficient for erythrocytes to synthesize globin protein. As the level of heme falls
the activity of HCI increases. HCI is a kinase which phosphorylates eIF-2. When phosphorylated,
eIF-2 still hydrolyzes bound GTP to GDP and still interacts with eIF-2B (GEF). However, the rate
of eIF-2B-mediated GTP exchange is greatly reduced. This renders eIF-2 incapable of being
used to form a new ternary initiation complex and translational initiation is reduced. When the
level of heme again rises the activity of HCI is reduced and translational initiation is once again
active.


Interferon Control of Translation

Regulation of translation can also be induced in virally infected cells. It would benefit a virally
infected cell to turn off protein synthesis to prevent propagation of the viruses. This is
accomplished by the induced synthesis of interferons (IFs). There are three classes of IFs: the
leukocyte or α-IFs, the fibroblast or β-IFs and the lymphocyte or γ-IFs. IFs are induced by
dsRNAs and themselves induce a specific kinase termed RNA-dependent protein kinase (PKR)
that phosphorylates eIF-2 thereby shutting off translation in a similar manner to that of heme
control of translation. Additionally, IFs induce the synthesis of 2'-5'-oligoadenylate, pppA(2'p5'A)n,
that activates a pre-existing ribonuclease, RNase L. RNase L degrades all classes of mRNAs
thereby shutting off translation.


Iron Control of Translation

Regulation of the translation of certain mRNAs occurs through the action of specific
RNA-binding proteins. Proteins of this class have been identified that bind to sequences in either the 5'
non-translated region (5'-UTR) or 3'-UTR. Two particularly interesting and important regulatory
schemes related to iron metabolism encompass RNA binding proteins that bind to either the 5'-
UTR of one mRNA or the 3'-UTR of another.

The transferrin receptor is a protein located in the plasma membrane that binds the protein
transferrin. Transferrin is the major iron transport protein in the plasma. When iron levels are low
the level of the transferrin receptor mRNA increases so that cells can take up more
iron. This regulation occurs through the action of an iron response element binding protein (IRBP)
that binds to specific iron response elements (IREs) in the 3'-UTR of the transferrin receptor
mRNA. These IREs form hair-pin loop structures that are recognized by IRBP. This IRBP is an
iron-deficient form of aconitase, the iron-requiring enzyme of the TCA cycle. When iron levels are
low, IRBP is free of iron and can therefore interact with the IREs in the 3'-UTR of the transferrin
receptor mRNA. Transferrin receptor mRNA with IRBP bound is protected from degradation.
Conversely, when iron levels are high, IRBP binds iron and can no longer interact with the IREs in the
transferrin receptor mRNA. The effect is an increase in degradation of the transferrin receptor
mRNA.

A related, but opposite, phenomenon controls the translation of the ferritin mRNA. Ferritin is
an iron-binding protein that prevents toxic levels of ionized iron (Fe2+) from building up in cells.
The ferritin mRNA has an IRE in its 5'-UTR. As with the transferrin receptor story, when iron
levels are high, IRBP cannot bind to the IRE in the 5'-UTR of the ferritin mRNA. This allows the
ferritin mRNA to be translated. Conversely, when iron levels are low, the IRBP binds to the IRE in
the ferritin mRNA, preventing its translation.
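
The two opposite outcomes of IRBP binding can be sketched in a few lines. This is a minimal illustration of the logic described above, with a hypothetical function name and keys; iron status is reduced to a simple high/low flag.

```python
def iron_response(iron_high: bool) -> dict:
    """Qualitative outcome of IRBP regulation for the two mRNAs discussed above."""
    # IRBP binds IREs only when it is free of iron, i.e. when iron is low.
    irbp_binds_ire = not iron_high
    return {
        "IRBP_binds_IRE": irbp_binds_ire,
        # IRBP bound in the 3'-UTR protects transferrin receptor mRNA from degradation.
        "transferrin_receptor_mRNA_stable": irbp_binds_ire,
        # IRBP bound in the 5'-UTR blocks translation of ferritin mRNA.
        "ferritin_mRNA_translated": not irbp_binds_ire,
    }


for level in (False, True):
    print("iron high" if level else "iron low", iron_response(level))
```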


Protein Synthesis Inhibitors

Many of the antibiotics utilized for the treatment of bacterial infections as well as certain toxins
function through the inhibition of translation. Inhibition can be effected at all stages of translation
from initiation to elongation to termination.

Several Antibiotic and Toxin Inhibitors of Translation

Inhibitor          Comments
Chloramphenicol    inhibits prokaryotic peptidyl transferase
Streptomycin       inhibits prokaryotic peptide chain initiation, also induces mRNA misreading
Tetracycline       inhibits prokaryotic aminoacyl-tRNA binding to the ribosome small subunit
Neomycin           similar in activity to streptomycin
Erythromycin       inhibits prokaryotic translocation through the ribosome large subunit
Fusidic acid       similar to erythromycin only by preventing EF-G from dissociating from the
                   large subunit
Puromycin          resembles an aminoacyl-tRNA, interferes with peptide transfer resulting in
                   premature termination in both prokaryotes and eukaryotes
Diphtheria toxin   catalyzes ADP-ribosylation of and inactivation of eEF-2; eEF-2 contains a
                   modified His residue known as diphthamide, and it is this residue that is the
                   target of diphtheria toxin (figure: ADP-ribosylated diphthamide residue)
Ricin              found in castor beans, catalyzes cleavage of the eukaryotic large subunit rRNA
Cycloheximide      inhibits eukaryotic peptidyltransferase

Figure %: Translocation
With the A site open again, the next appropriate aminoacyl tRNA can bind there and the same
reaction takes place, yielding a three-amino acid peptide chain. This process repeats, creating a
polypeptide chain in the P site of the ribosome. A single ribosome can translate 60 nucleotides
per second. This speed can be vastly augmented when ribosomes link up to form polyribosomes.
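
As a rough worked example of the rate quoted above, the sketch below converts 60 nucleotides per second into codons per second and estimates how long one ribosome needs for a protein of a given length. The function name and the 300-residue example are illustrative assumptions; initiation and termination time are ignored.

```python
NUCLEOTIDES_PER_SECOND = 60   # rate quoted in the text for a single ribosome
NUCLEOTIDES_PER_CODON = 3


def translation_time_seconds(n_codons: int) -> float:
    """Elongation time for one ribosome to read n_codons at the quoted rate."""
    return n_codons * NUCLEOTIDES_PER_CODON / NUCLEOTIDES_PER_SECOND


codons_per_second = NUCLEOTIDES_PER_SECOND / NUCLEOTIDES_PER_CODON
print(codons_per_second)               # amino acids added per second
print(translation_time_seconds(300))   # seconds for a hypothetical 300-residue protein
```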

Termination

Translation ends when one of three stop codons, UAA, UAG, or UGA, enters the A site of
the ribosome. There are no aminoacyl tRNA molecules that recognize these sequences. Instead,
release factors bind to the A site, catalyzing the release of the completed polypeptide chain and
separating the ribosome into its original small and large subunits.

Fig. 1: Transfer RNA (tRNA)

Translation of mRNA by tRNA: Formation of the Initiation Complex

Codon Sheet

To initiate translation, a 30S ribosomal subunit binds to a short nucleotide sequence on the mRNA
called the ribosome binding site. However, translation doesn't usually begin until the 30S
ribosomal subunit reaches the first AUG sequence in the mRNA. For this reason, AUG is known
as the start codon. At this point, an initiation complex composed of the 30S subunit, a tRNA
having the anticodon UAC and carrying an altered form of the amino acid methionine
(N-formylmethionine or f-Met), and proteins called initiation factors is formed.

Fig. 4: Translation of mRNA by tRNA: 50S Ribosomal Subunit Attaches to the Initiation Complex.

Codon Sheet (Fig. 2)

A 50S ribosomal subunit then attaches to the initiation complex and the initiation factors leave.
This forms the 70S ribosome.

Fig. 5A: Translation of mRNA by tRNA.


Now an aminoacyl-tRNA with an anticodon complementary to the third codon, GGA, comes into
the "A" site of the ribosome.
Translation of mRNA by tRNA.


Once the anticodon of the tRNA at the "A" site forms hydrogen bonds with the second codon along the
mRNA, the amino acid being held by the tRNA at the "P" site of the ribosome is enzymatically removed
and forms a peptide bond with the amino acid carried by the tRNA at the "A" site.

Termination: In comparison to initiation and elongation, termination is a relatively simple process.

Multiple cycles of elongation occur, culminating in polymerization of the specific amino acids into a
protein molecule, until a termination (stop) codon on the mRNA enters the A site. There is no tRNA
with an anticodon capable of recognizing such a termination signal.

Release factors (eRF) are capable of recognizing termination signals in the A site. The release
factor, in conjunction with GTP and peptidyl transferase, promotes the hydrolysis of the bond between
the peptide and the tRNA occupying the P site. The ribosome dissociates into 40S and 60S subunits.

Prokaryotic translation

Initiation
The process of initiation of translation in prokaryotes.

Initiation of translation in prokaryotes involves the assembly of the components of the
translation system, which are: the two ribosomal subunits (50S and 30S), the mRNA to be
translated, the first (formyl) aminoacyl tRNA (the tRNA charged with the first amino acid), GTP
(as a source of energy), and three initiation factors (IF1, IF2, and IF3) which help the assembly of
the initiation complex.[1]

The ribosome has three sites: the A site, the P site, and the E site. The A site is the point of
entry for the aminoacyl tRNA (except for the first aminoacyl tRNA, fMet-tRNAfMet, which enters at
the P site). The P site is where the peptidyl tRNA is formed in the ribosome. The E site
is the exit site of the now uncharged tRNA after it gives its amino acid to the growing peptide
chain.

Elongation

Elongation of the polypeptide chain involves addition of amino acids to the carboxyl end of the
growing chain. The growing protein exits the ribosome through the polypeptide exit tunnel in the
large subunit[2].

Elongation starts when the fMet-tRNA enters the P site, causing a conformational change
which opens the A site for the new aminoacyl-tRNA to bind. This binding is facilitated by
elongation factor-Tu (EF-Tu), a small GTPase. Now the P site contains the beginning of the
peptide chain of the protein to be encoded and the A site has the next amino acid to be added to
the peptide chain. The growing polypeptide is detached from the tRNA in the P site and a
peptide bond is formed between the last amino acid of the polypeptide and the amino acid still
attached to the tRNA in the A site. This process, known as
peptide bond formation, is catalyzed by a ribozyme (the 23S ribosomal RNA in the 50S ribosomal
subunit). Now, the A site has the newly formed peptide, while the P site has an uncharged tRNA
(tRNA with no amino acids). In the final stage of elongation, translocation, the ribosome moves 3
nucleotides towards the 3' end of the mRNA. Since tRNAs are linked to mRNA by codon-anticodon
base-pairing, tRNAs move relative to the ribosome, taking the nascent polypeptide from the A site
to the P site and moving the uncharged tRNA to the E (exit) site. This process is catalyzed by
elongation factor G (EF-G).

The ribosome continues to translate the remaining codons on the mRNA as more
aminoacyl-tRNAs bind to the A site, until the ribosome reaches a stop codon on the mRNA (UAA, UGA, or UAG).

Termination

Termination occurs when one of the three termination codons moves into the A site. These
codons are not recognized by any tRNAs. Instead, they are recognized by proteins called release
factors, namely RF1 (recognizing the UAA and UAG stop codons) or RF2 (recognizing the UAA
and UGA stop codons). These factors trigger the hydrolysis of the ester bond in peptidyl-tRNA
and the release of the newly synthesized protein from the ribosome. A third release factor, RF3,
catalyzes the release of RF1 and RF2 at the end of the termination process.
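
The codon specificity of the two prokaryotic release factors described above can be written as a small lookup. This is only an illustrative sketch; the names `RELEASE_FACTORS` and `factors_for` are made up for this example.

```python
# Stop codons recognized by each prokaryotic release factor, as stated in the text.
RELEASE_FACTORS = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}


def factors_for(codon: str) -> list:
    """Return the release factors that recognize the given codon (empty if none)."""
    codon = codon.upper()
    return sorted(name for name, codons in RELEASE_FACTORS.items() if codon in codons)


for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, factors_for(codon))
```

UAA is the only stop codon recognized by both factors, and a sense codon such as AUG returns an empty list.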

Polysomes

Translation is carried out by more than one ribosome simultaneously. Because of the relatively
large size of ribosomes, they can only attach to sites on mRNA 35 nucleotides apart. The
complex of one mRNA and a number of ribosomes is called a polysome or polyribosome.

Effect of antibiotics

Several antibiotics exert their action by targeting the translation process
in bacteria. They exploit the differences between prokaryotic and eukaryotic
translation mechanisms to selectively inhibit protein synthesis in bacteria without
affecting the host.

Secreted and Membrane-Associated Proteins

Proteins that are membrane bound or are destined for secretion are synthesized by ribosomes
associated with the membranes of the endoplasmic reticulum (ER). The ER associated with
ribosomes is termed rough ER (RER). This class of proteins all contain an N-terminal sequence
termed a signal sequence or signal peptide. The signal peptide is usually 13-36 predominantly
hydrophobic residues. The signal peptide is recognized by a multi-protein complex termed the
signal recognition particle (SRP). This signal peptide is removed following passage through the
endoplasmic reticulum membrane. The removal of the signal peptide is catalyzed by signal
peptidase. Proteins that contain a signal peptide are called preproteins to distinguish them from
proproteins. However, some proteins that are destined for secretion are also further proteolyzed
following secretion and therefore contain pro sequences. This class of proteins is termed
preproproteins.

Mechanism of synthesis of membrane bound or secreted proteins. Ribosomes engage the ER
membrane through interaction of the signal recognition particle, SRP in the ribosome with the
SRP receptor in the ER membrane. As the protein is synthesized the signal sequence is passed
through the ER membrane into the lumen of the ER. After sufficient synthesis the signal peptide
is removed by the action of signal peptidase. Synthesis will continue and if the protein is secreted
it will end up completely in the lumen of the ER. If the protein is membrane associated a stop
transfer motif in the protein will stop the transfer of the protein through the ER membrane. This
will become the membrane spanning domain of the protein.


Proteolytic Cleavage

Most proteins undergo proteolytic cleavage following translation. The simplest form of this is
the removal of the initiation methionine. Many proteins are synthesized as inactive precursors that
are activated under proper physiological conditions by limited proteolysis. Pancreatic enzymes
and enzymes involved in clotting are examples of the latter. Inactive precursor proteins that are
activated by removal of polypeptides are termed proproteins.

A complex example of post-translational processing of a preproprotein is the cleavage of
prepro-opiomelanocortin (POMC) synthesized in the pituitary (see the Peptide Hormones page
for discussion of POMC). This preproprotein undergoes complex cleavages, the pathway of which
differs depending upon the cellular location of POMC synthesis.

Another example of a preproprotein is insulin. Since insulin is secreted from the pancreas it
has a prepeptide. Following cleavage of the 24 amino acid signal peptide the protein folds into
proinsulin. Proinsulin is further cleaved yielding active insulin which is composed of two peptide
chains linked together through disulfide bonds.

Still other proteins (of the enzyme class) are synthesized as inactive precursors called
zymogens. Zymogens are activated by proteolytic cleavage such as is the situation for several
proteins of the blood clotting cascade.


Acylation

Many proteins are modified at their N-termini following synthesis. In most cases the initiator
methionine is hydrolyzed and an acetyl group is added to the new N-terminal amino acid. Acetyl-
CoA is the acetyl donor for these reactions. Some proteins have the 14 carbon myristoyl group
added to their N-termini. The donor for this modification is myristoyl-CoA. This latter modification
allows association of the modified protein with membranes. The catalytic subunit of cyclic
AMP-dependent protein kinase (PKA) is myristoylated.


Methylation

Post-translational methylation of proteins occurs on nitrogens and oxygens. The activated
methyl donor is S-adenosylmethionine (SAM). The most common methylations are on the
ε-amine of lysine residues. Methylation of lysine residues in histones is an important
regulator of chromatin structure and consequently of transcriptional activity. Lysine methylation
was originally thought to be a permanent covalent mark, providing long-term signaling, including
the histone-dependent mechanism for transcriptional memory. However, recent evidence has
shown that lysine methylation, similar to other covalent modifications, can be transient and
dynamically regulated by an opposing de-methylation activity. Recent findings indicate that
methylation of lysine residues affects gene expression not only at the level of chromatin, but also
by modifying transcription factors.

Additional nitrogen methylations are found on the imidazole ring of histidine, the guanidino
moiety of arginine and the R-group amides of glutamine and asparagine. Methylation of the oxygen
of the R-group carboxylates of glutamate and aspartate also takes place and forms methyl esters.
Proteins can also be methylated on the thiol R-group of cysteine.

As indicated below, many proteins are modified at their C-terminus by prenylation near a
cysteine residue in the consensus CAAX. Following the prenylation reaction the protein is cleaved
at the peptide bond of the cysteine and the carboxylate residue is methylated by a prenylated
protein methyltransferase. One such protein that undergoes this type of modification is the proto-
oncogene RAS.


Phosphorylation

Post-translational phosphorylation is one of the most common protein modifications that
occur in animal cells. The vast majority of phosphorylations occur as a mechanism to regulate
the biological activity of a protein and as such are transient. In other words a phosphate (or more
than one in many cases) is added and later removed.

Physiologically relevant examples are the phosphorylations that occur in glycogen synthase
and glycogen phosphorylase in hepatocytes in response to glucagon release from the pancreas.
Phosphorylation of synthase inhibits its activity, whereas the activity of phosphorylase is
increased. These two events lead to increased hepatic glucose delivery to the blood.

The enzymes that phosphorylate proteins are termed kinases and those that remove
phosphates are termed phosphatases. Protein kinases catalyze reactions of the following type:

ATP + protein <——> phosphoprotein + ADP

In animal cells serine, threonine and tyrosine are the amino acids subject to phosphorylation.
The largest group of kinases are those that phosphorylate either serines or threonines and as
such are termed serine/threonine kinases. The ratio of phosphorylation of the three different
amino acids is approximately 1000/100/1 for serine/threonine/tyrosine.

Although the level of tyrosine phosphorylation is minor, the importance of phosphorylation of
this amino acid is profound. As an example, the activity of numerous growth factor receptors is
controlled by tyrosine phosphorylation.


Sulfation

Sulfate modification of proteins occurs at tyrosine residues such as in fibrinogen and in some
secreted proteins (e.g., gastrin). The universal sulfate donor is
3'-phosphoadenosyl-5'-phosphosulphate (PAPS).

Since sulfate is added permanently it is necessary for the biological activity and not used as a
regulatory modification like that of tyrosine phosphorylation.



Prenylation

Prenylation refers to the addition of the 15 carbon farnesyl group or the 20 carbon
geranylgeranyl group to acceptor proteins, both of which are isoprenoid compounds derived from
the cholesterol biosynthetic pathway. The isoprenoid groups are attached to cysteine residues at
the carboxy terminus of proteins in a thioether linkage (C-S-C). A common consensus sequence
at the C-terminus of prenylated proteins has been identified and is composed of CAAX, where C
is cysteine, A is any aliphatic amino acid (except alanine) and X is the C-terminal amino acid.
Following attachment of the prenyl group, the three C-terminal amino acids (AAX) are removed
and the carboxylate of the now C-terminal cysteine is methylated
in a reaction utilizing S-adenosylmethionine as the methyl donor.

In addition to numerous prenylated proteins that contain the CAAX consensus, prenylation is
known to occur on proteins of the RAB family of RAS-related G-proteins. There are at least 60
proteins in this family that are prenylated at either a CC or CXC element in their C-termini. The
RAB family of proteins are involved in signaling pathways that control intracellular membrane
trafficking.

Some of the most important proteins whose functions depend upon prenylation are those that
modulate immune responses. These include proteins involved in leukocyte motility, activation,
and proliferation and endothelial cell immune functions. It is these immune modulatory roles of
many prenylated proteins that are the basis for a portion of the anti-inflammatory actions of the
statin class of cholesterol synthesis-inhibiting drugs due to a reduction in the synthesis of
farnesylpyrophosphate and geranylpyrophosphate and thus reduced extent of inflammatory
events. Other important examples of prenylated proteins include the oncogenic GTP-binding and
hydrolyzing protein RAS and the γ-subunit of the visual protein transducin, both of which are
farnesylated. In addition, numerous GTP-binding and hydrolyzing proteins (termed G-proteins) of
signal transduction cascades have γ-subunits modified by geranylgeranylation.

Genetic code
From Wikipedia, the free encyclopedia

A series of codons in part of an mRNA molecule. Each codon consists of three
nucleotides, usually representing a single amino acid.

The genetic code is the set of rules by which information encoded in genetic
material (DNA or mRNA sequences) is translated into proteins (amino acid
sequences) by living cells. The code defines a mapping between tri-nucleotide
sequences, called codons, and amino acids. With some exceptions,[1] a triplet
codon in a nucleic acid sequence specifies a single amino acid. Because the vast
majority of genes are encoded with exactly the same code (see the RNA codon
table), this particular code is often referred to as the canonical or standard
genetic code, or simply the genetic code, though in fact there are many variant
codes. For example, protein synthesis in human mitochondria relies on a genetic
code that differs from the standard genetic code.
Not all genetic information is stored using the genetic code. All organisms'
DNA contains regulatory sequences, intergenic segments, and chromosomal
structural areas that can contribute greatly to phenotype. Those elements
operate under sets of rules that are distinct from the codon-to-amino acid
paradigm underlying the genetic code.

The genetic code

After the structure of DNA was deciphered by James Watson, Francis Crick,
Maurice Wilkins and Rosalind Franklin, serious efforts
to understand the nature of the encoding of proteins began. George Gamow
postulated that a three-letter code must be employed to encode the 20 standard
amino acids used by living cells to build proteins, because 3 is the smallest
integer n such that 4^n is at least 20.[2]
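
Gamow's counting argument can be checked directly. The short sketch below (function name chosen for this example) finds the smallest codon length n for which the number of possible codons, 4^n, covers the 20 standard amino acids.

```python
def min_codon_length(n_bases: int = 4, n_amino_acids: int = 20) -> int:
    """Smallest n such that n_bases ** n is at least n_amino_acids."""
    n = 1
    while n_bases ** n < n_amino_acids:
        n += 1
    return n


# 4^1 = 4 and 4^2 = 16 are too few; 4^3 = 64 is enough.
print([4 ** n for n in (1, 2, 3)])
print(min_codon_length())
```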

The fact that codons consist of three DNA bases was first demonstrated in the
Crick, Brenner et al. experiment. The first elucidation of a codon was done by
Marshall Nirenberg and Heinrich J. Matthaei in 1961 at the National Institutes of
Health. They used a cell-free system to translate a poly-uracil RNA sequence
(i.e., UUUUU...) and discovered that the polypeptide that they had synthesized
consisted of only the amino acid phenylalanine. They thereby deduced that the
codon UUU specified the amino acid phenylalanine. This was followed by
experiments in the laboratory of Severo Ochoa demonstrating that the
poly-adenine RNA sequence (AAAAA...) coded for the polypeptide poly-lysine,[3] and
that the poly-cytosine RNA sequence (CCCCC...) coded for the polypeptide
poly-proline.[4] Therefore the codon AAA specified the amino acid lysine, and the
codon CCC specified the amino acid proline. Using different copolymers most of
the remaining codons were then determined. Extending this work, Nirenberg and
Philip Leder revealed the triplet nature of the genetic code and allowed the
codons of the standard genetic code to be deciphered. In these experiments
various combinations of mRNA were passed through a filter which contained
ribosomes, the components of cells that translate RNA into protein. Unique
triplets promoted the binding of specific tRNAs to the ribosome. Leder and
Nirenberg were able to determine the sequences of 54 out of 64 codons in their
experiments.[5]

Subsequent work by Har Gobind Khorana identified the rest of the genetic
code. Shortly thereafter, Robert W. Holley determined the structure of transfer
RNA (tRNA), the adapter molecule that facilitates the process of translating RNA
into protein. This work was based upon earlier studies by Severo Ochoa, who
received the Nobel prize in 1959 for his work on the enzymology of RNA
synthesis.[6] In 1968, Khorana, Holley and Nirenberg received the Nobel Prize in
Physiology or Medicine for their work.[7]

Transfer of information via the genetic code

The genome of an organism is inscribed in DNA, or in the case of some
viruses, RNA. The portion of the genome that codes for a protein or an RNA is
referred to as a gene. Those genes that code for proteins are composed of
tri-nucleotide units called codons, each coding for a single amino acid. Each
nucleotide sub-unit consists of a phosphate, deoxyribose sugar and one of the 4
nitrogenous nucleobases. The purine bases adenine (A) and guanine (G) are
larger and consist of two aromatic rings. The pyrimidine bases cytosine (C) and
thymine (T) are smaller and consist of only one aromatic ring. In the double-helix
configuration, two strands of DNA are joined to each other by hydrogen bonds in
an arrangement known as base pairing. These bonds almost always form
between an adenine base on one strand and a thymine on the other strand and
between a cytosine base on one strand and a guanine base on the other. This
means that the number of A and T residues will be the same in a given double
helix, as will the number of G and C residues.[8]:102–117 In RNA, thymine (T) is
replaced by uracil (U), and the deoxyribose is substituted by ribose.[8]:127
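
The pairing rules above can be illustrated with a short sketch. The helper names and the example strand are arbitrary choices for this illustration; the code builds the complementary strand, confirms that A and T (and G and C) occur equally often in the resulting duplex, and shows the T-to-U substitution for RNA.

```python
from collections import Counter

# Watson-Crick pairing rules described in the text.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Complementary DNA strand, written 5' to 3' (hence the reversal)."""
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Base counts over both strands of the double helix."""
    return Counter(strand + complementary_strand(strand))


def as_rna(dna: str) -> str:
    """Same sequence written with RNA letters: thymine replaced by uracil."""
    return dna.replace("T", "U")


strand = "ATGGCA"  # arbitrary example sequence
print(complementary_strand(strand))
print(duplex_base_counts(strand))
print(as_rna(strand))
```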

Each protein-coding gene is transcribed into a template molecule of the
related polymer RNA, known as messenger RNA or mRNA. This, in turn, is
translated on the ribosome into an amino acid chain or polypeptide.[8]:Chp 12 The
process of translation requires transfer RNAs specific for individual amino acids
with the amino acids covalently attached to them, guanosine triphosphate as an
energy source, and a number of translation factors. tRNAs have anticodons
complementary to the codons in mRNA and can be "charged" covalently with
amino acids at their 3' terminal CCA ends. Individual tRNAs are charged with
specific amino acids by enzymes known as aminoacyl tRNA synthetases, which
have high specificity for both their cognate amino acids and tRNAs. The high
specificity of these enzymes is a major reason why the fidelity of protein
translation is maintained.[8]:464–469

There are 4³ = 64 different codon combinations possible with a triplet codon of
three nucleotides; all 64 codons are assigned for either amino acids or stop
signals during translation. If, for example, an RNA sequence, UUUAAACCC is
considered and the reading frame starts with the first U (by convention, 5' to 3'),
there are three codons, namely, UUU, AAA and CCC, each of which specifies
one amino acid. This RNA sequence will be translated into an amino acid
sequence, three amino acids long.[8]:521–539 A comparison may be made with
computer science, where the codon is similar to a word, which is the standard
"chunk" for handling data (like one amino acid of a protein), and a nucleotide is
similar to a bit, in that it is the smallest unit.
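
The UUUAAACCC example above can be reproduced with a few lines of code. This is a minimal sketch: the lookup table holds only the three codons used in the example (not the full 64-codon table), and the function names are chosen for illustration.

```python
# Only the three codons from the example; a full table would have 64 entries.
CODON_TABLE = {"UUU": "Phe", "AAA": "Lys", "CCC": "Pro"}


def split_codons(rna: str) -> list:
    """Split an RNA string into complete triplets, reading 5' to 3' from the first base."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def translate(rna: str) -> list:
    """Translate each complete codon using the small example table."""
    return [CODON_TABLE[codon] for codon in split_codons(rna)]


print(split_codons("UUUAAACCC"))
print(translate("UUUAAACCC"))
```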

The standard genetic code is shown in the following tables. Table 1 shows
what amino acid each of the 64 codons specifies. Table 2 shows what codons
specify each of the 20 standard amino acids involved in translation. These are
called forward and reverse codon tables, respectively. For example, the codon
AAU represents the amino acid asparagine, and UGU and UGC represent
cysteine (standard three-letter designations, Asn and Cys, respectively).[8]:522

RNA codon table

polar basic acidic (stop codon)


nonpolar
2nd base
U C A G
1st (Phe/F) (Ser/S) (Tyr/Y) (Cys/C)
UUU UCU UAU UGU
base Phenylalanine Serine Tyrosine Cysteine
(Phe/F) (Ser/S) (Tyr/Y) (Cys/C)
UUC UCC UAC UGC
Phenylalanine Serine Tyrosine Cysteine
U
(Leu/L) (Ser/S)
UUA UCA UAA Ochre (Stop) UGA Opal (Stop)
Leucine Serine
(Leu/L) (Ser/S) Amber (Trp/W)
UUG UCG UAG UGG
Leucine Serine (Stop) Tryptophan
(Leu/L) (Pro/P) (His/H) (Arg/R)
CUU CCU CAU CGU
Leucine Proline Histidine Arginine
(Leu/L) (Pro/P) (His/H) (Arg/R)
CUC CCC CAC CGC
Leucine Proline Histidine Arginine
C
(Leu/L) (Pro/P) (Gln/Q) (Arg/R)
CUA CCA CAA CGA
Leucine Proline Glutamine Arginine
(Leu/L) (Pro/P) (Gln/Q) (Arg/R)
CUG CCG CAG CGG
Leucine Proline Glutamine Arginine
A (Ile/I) (Thr/T) (Asn/N) (Ser/S)
AUU ACU AAU AGU
Isoleucine Threonine Asparagine Serine
(Ile/I) (Thr/T) (Asn/N) (Ser/S)
AUC ACC AAC AGC
Isoleucine Threonine Asparagine Serine
AUA (Ile/I) ACA (Thr/T) AAA (Lys/K) AGA (Arg/R)
Isoleucine Threonine Lysine Arginine
(Met/M) (Thr/T) (Lys/K) (Arg/R)
AUG[A] ACG AAG AGG
Methionine Threonine Lysine Arginine
(Ala/A) (Asp/D) (Gly/G)
GUU (Val/V) Valine GCU GAU GGU
Alanine Aspartic acid Glycine
(Ala/A) (Asp/D) (Gly/G)
GUC (Val/V) Valine GCC GAC GGC
Alanine Aspartic acid Glycine
(Glu/E)
G (Ala/A) (Gly/G)
GUA (Val/V) Valine GCA GAA Glutamic GGA
Alanine Glycine
acid
(Glu/E)
(Ala/A) (Gly/G)
GUG (Val/V) Valine GCG GAG Glutamic GGG
Alanine Glycine
acid
A
The codon AUG both codes for methionine and serves as an initiation site: the
first AUG in an mRNA's coding region is where translation into protein begins.[9]
Inverse table

Ala/A  GCU, GCC, GCA, GCG            Leu/L  UUA, UUG, CUU, CUC, CUA, CUG
Arg/R  CGU, CGC, CGA, CGG, AGA, AGG  Lys/K  AAA, AAG
Asn/N  AAU, AAC                      Met/M  AUG
Asp/D  GAU, GAC                      Phe/F  UUU, UUC
Cys/C  UGU, UGC                      Pro/P  CCU, CCC, CCA, CCG
Gln/Q  CAA, CAG                      Ser/S  UCU, UCC, UCA, UCG, AGU, AGC
Glu/E  GAA, GAG                      Thr/T  ACU, ACC, ACA, ACG
Gly/G  GGU, GGC, GGA, GGG            Trp/W  UGG
His/H  CAU, CAC                      Tyr/Y  UAU, UAC
Ile/I  AUU, AUC, AUA                 Val/V  GUU, GUC, GUA, GUG
START  AUG                           STOP   UAA, UGA, UAG
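
The relationship between the forward and inverse tables can be sketched in a few lines of Python. This is an illustrative sketch, not part of the source: the names FORWARD, REVERSE and invert are hypothetical, and the 64-letter string is assumed to list the standard code's one-letter amino acids (with * for stop) in U, C, A, G order for each codon position.

```python
from itertools import product
from collections import defaultdict

BASES = "UCAG"
# One-letter amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Forward table: codon -> amino acid.
FORWARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def invert(forward):
    """Build the inverse table: amino acid -> list of codons."""
    reverse = defaultdict(list)
    for codon, aa in forward.items():
        reverse[aa].append(codon)
    return dict(reverse)


REVERSE = invert(FORWARD)

print(FORWARD["AAU"])         # asparagine, one-letter code N
print(sorted(REVERSE["C"]))   # the two cysteine codons
```

Inverting the dictionary rather than typing the second table by hand keeps the two views consistent with each other.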

DNA codon table

The DNA codon table is essentially identical to that for RNA, but with U
replaced by T.

Salient features

Sequence reading frame

A codon is defined by the initial nucleotide from which translation starts. For
example, the string GGGAAACCC, if read from the first position, contains the
codons GGG, AAA and CCC; and, if read from the second position, it contains
the codons GGA and AAC; if read starting from the third position, GAA and ACC.
Every sequence can thus be read in three reading frames, each of which will
produce a different amino acid sequence (in the given example, Gly-Lys-Pro,
Gly-Asn, or Glu-Thr, respectively). With double-stranded DNA there are six
possible reading frames, three in the forward orientation on one strand and three
reverse on the opposite strand.[10]:330 The actual frame in which a protein
sequence is translated is defined by a start codon, usually the first AUG codon in
the mRNA sequence.
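
The three forward reading frames of the example string can be reproduced with a short Python sketch. The helper name translate_frame and the table construction are assumptions made for illustration; only the sequence GGGAAACCC and its expected translations come from the text.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}


def translate_frame(rna, offset):
    """Translate every complete codon of `rna`, starting at position `offset`."""
    codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
    return "-".join(ONE_TO_THREE[CODON_TABLE[c]] for c in codons)


for offset in range(3):
    print(offset + 1, translate_frame("GGGAAACCC", offset))
# 1 Gly-Lys-Pro
# 2 Gly-Asn
# 3 Glu-Thr
```

The remaining three frames of double-stranded DNA would come from applying the same function to the reverse complement.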

Start/stop codons

Translation starts with a chain initiation codon (start codon). Unlike stop
codons, the codon alone is not sufficient to begin the process. Nearby sequences
(such as the Shine-Dalgarno sequence in E. coli) and initiation factors are also
required to start translation. The most common start codon is AUG, which is read
as methionine or, in bacteria, as formylmethionine. Alternative start codons,
depending on the organism, include GUG and UUG, which normally code for
valine and leucine, respectively. However, when used as a start codon, these
alternative start codons are translated as methionine or formylmethionine.[11]

The three stop codons have been given names: UAG is amber, UGA is opal
(sometimes also called umber), and UAA is ochre. "Amber" was named by
discoverers Richard Epstein and Charles Steinberg after their friend Harris
Bernstein, whose last name means "amber" in German. The other two stop
codons were named "ochre" and "opal" in order to keep the "color names" theme.
Stop codons are also called "termination" or "nonsense" codons and they signal
release of the nascent polypeptide from the ribosome due to binding of release
factors in the absence of cognate tRNAs with anticodons complementary to
these stop signals.[12]
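
As a rough illustration of how start and stop codons delimit a protein, the following Python sketch translates from the first AUG to the first in-frame stop codon. It is a deliberate simplification: the function name translate_orf and the example mRNA are hypothetical, and the sketch ignores the nearby sequences and initiation factors that the text says are also required.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(mrna):
    """Translate from the first AUG until an in-frame stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG or UGA: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: GG | AUG UUU GGC UAA | GG
print(translate_orf("GGAUGUUUGGCUAAGG"))   # MFG
```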

Effect of mutations

Examples of notable mutations that can occur in humans.[13]

During the process of DNA replication, errors occasionally occur in the
polymerization of the second strand. These errors, called mutations, can have an
impact on the phenotype of an organism, especially if they occur within the
protein coding sequence of a gene. Error rates are usually very low—1 error in
every 10–100 million bases—due to the "proofreading" ability of DNA
polymerases.[14][15]

Missense mutations and nonsense mutations are examples of point mutations,
which can cause genetic diseases such as sickle-cell disease and thalassemia,
respectively.[16][17][18] Clinically important missense mutations generally change the
properties of the coded amino acid residue between being basic, acidic, polar or
non-polar, whereas nonsense mutations result in a stop codon.[10]:266
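
The distinction between these kinds of point mutation can be expressed as a small Python sketch. The function name classify_point_mutation is hypothetical, and the sketch only compares the codon before and after the change; it says nothing about clinical importance.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon_before, codon_after):
    """Label a single-codon change as silent, missense or nonsense."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "silent"        # same amino acid (synonymous change)
    if after == "*":
        return "nonsense"      # sense codon turned into a stop codon
    return "missense"          # a different amino acid is encoded


print(classify_point_mutation("GAG", "GUG"))   # missense (Glu -> Val)
print(classify_point_mutation("UGG", "UGA"))   # nonsense (Trp -> stop)
print(classify_point_mutation("GAA", "GAG"))   # silent   (Glu -> Glu)
```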

Mutations that disrupt the reading frame sequence by indels (insertions or
deletions) of a non-multiple of 3 nucleotide bases are known as frameshift
mutations. These mutations usually result in a completely different translation
from the original, and are also very likely to cause a stop codon to be read, which
truncates the creation of the protein.[19] These mutations may impair the function
of the resulting protein, and are thus rare in in vivo protein-coding sequences.
One reason inheritance of frameshift mutations is rare is that if the protein being
translated is essential for growth under the selective pressures the organism
faces, absence of a functional protein may cause death before the organism is
viable.[20] Frameshift mutations may result in severe genetic diseases such as
Tay-Sachs disease.[21]
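
A frameshift is easy to demonstrate on a toy sequence. In this Python sketch the mRNA and the helper translate are invented for illustration; deleting a single base changes every downstream codon and, in this example, brings a stop codon into frame early.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from position 0 until a stop codon or the last full codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "AUGGCCUUGAAAGGUUAA"            # AUG GCC UUG AAA GGU UAA (hypothetical)
shifted = original[:3] + original[4:]      # delete the single base at index 3

print(translate(original))   # MALKG
print(translate(shifted))    # MP  (AUG CCU UGA: premature stop)
```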

Although most mutations that change protein sequences are harmful or
neutral, some mutations have a positive effect on an organism.[22] These
mutations may enable the mutant organism to withstand particular environmental
stresses better than wild-type organisms, or reproduce more quickly. In these
cases a mutation will tend to become more common in a population through
natural selection.[23] Viruses that use RNA as their genetic material have rapid
mutation rates,[24] which can be an advantage since these viruses will evolve
constantly and rapidly, and thus evade the defensive responses of e.g. the
human immune system.[25] In large populations of asexually reproducing
organisms, for example E. coli, multiple beneficial mutations may co-occur,
causing competition among them; this phenomenon is called clonal
interference.[26]

Degeneracy of the genetic code

The genetic code has redundancy but no ambiguity (see the codon tables
above for the full correlation). For example, although codons GAA and GAG both
specify glutamic acid (redundancy), neither of them specifies any other amino
acid (no ambiguity). The codons encoding one amino acid may differ in any of
their three positions. For example, the amino acid glutamic acid is specified by
GAA and GAG codons (difference in the third position), the amino acid leucine is
specified by UUA, UUG, CUU, CUC, CUA, CUG codons (difference in the first or
third position), while the amino acid serine is specified by UCA, UCG, UCC,
UCU, AGU, AGC (difference in the first, second or third position).[8]:521–522
A position of a codon is said to be a fourfold degenerate site if any nucleotide
at this position specifies the same amino acid. For example, the third position of
the glycine codons (GGA, GGG, GGC, GGU) is a fourfold degenerate site,
because all nucleotide substitutions at this site are synonymous; i.e., they do not
change the amino acid. Only the third positions of some codons may be fourfold
degenerate.[8]:521–522 A position of a codon is said to be a twofold degenerate site if
only two of four possible nucleotides at this position specify the same amino acid.
For example, the third position of the glutamic acid codons (GAA, GAG) is a
twofold degenerate site. In twofold degenerate sites, the equivalent nucleotides
are always either two purines (A/G) or two pyrimidines (C/U), so only
transversional substitutions (purine to pyrimidine or pyrimidine to purine) in
twofold degenerate sites are nonsynonymous.[8]:521–522 A position of a codon is
said to be a non-degenerate site if any mutation at this position results in amino
acid substitution. There is only one threefold degenerate site, at which three of
the four possible nucleotides specify the same amino acid while the fourth
specifies a different one. This is the third position of an isoleucine
codon: AUU, AUC, or AUA all encode isoleucine, but AUG encodes methionine.
In computation this position is often treated as a twofold degenerate site.[8]:521–522
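
The degeneracy classes described above can be checked mechanically. The Python sketch below counts, for a given codon, how many of the four possible third-position nucleotides keep the same amino acid; the function name third_position_degeneracy is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def third_position_degeneracy(codon):
    """Number of third-position nucleotides that encode the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == amino_acid for base in BASES)


print(third_position_degeneracy("GGA"))   # 4: glycine, fourfold degenerate
print(third_position_degeneracy("GAA"))   # 2: glutamic acid, twofold degenerate
print(third_position_degeneracy("AUU"))   # 3: isoleucine, the threefold case
print(third_position_degeneracy("AUG"))   # 1: methionine, non-degenerate
```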

There are three amino acids encoded by six different codons: serine, leucine,
and arginine. Only two amino acids are specified by a single codon. One of these
is the amino-acid methionine, specified by the codon AUG, which also specifies
the start of translation; the other is tryptophan, specified by the codon UGG. The
degeneracy of the genetic code is what accounts for the existence of
synonymous mutations.[8]:Chp 15

Degeneracy results because there are more codons than encodable amino
acids. For example, if there were two bases per codon, then only 16 amino acids
could be coded for (4²=16). Because at least 21 codes are required (20 amino
acids plus stop), and the next largest number of bases is three, then 4³ gives 64
possible codons, meaning that some degeneracy must exist.[8]:521–522
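
Both the codon counts per amino acid and the counting argument for triplets can be verified with a short Python sketch. The names counts and min_codon_length are hypothetical; the figure of 21 required meanings (20 amino acids plus stop) is taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons specify each amino acid (or stop, written "*")?
counts = Counter(CODON_TABLE.values())
print(sorted(aa for aa, n in counts.items() if n == 6))   # ['L', 'R', 'S']
print(sorted(aa for aa, n in counts.items() if n == 1))   # ['M', 'W']


def min_codon_length(meanings, alphabet_size=4):
    """Smallest codon length n such that alphabet_size ** n >= meanings."""
    n = 1
    while alphabet_size ** n < meanings:
        n += 1
    return n


print(4 ** 2, 4 ** 3)           # 16 64
print(min_codon_length(21))     # 3
```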

These properties of the genetic code make it more fault-tolerant for point
mutations. For example, in theory, fourfold degenerate codons can tolerate any
point mutation at the third position, although codon usage bias restricts this in
practice in many organisms; twofold degenerate codons can tolerate one out of
the three possible point mutations at the third position. Since transition mutations
(purine to purine or pyrimidine to pyrimidine mutations) are more likely than
transversion (purine to pyrimidine or vice-versa) mutations, the equivalence of
purines or that of pyrimidines at twofold degenerate sites adds a further fault-
tolerance.[8]:531–532
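
The extra tolerance toward transitions can be quantified with a small Python sketch that tallies third-position substitutions across the whole table. Two choices here are assumptions made for illustration: the function name third_position_counts, and the decision to treat "stop" as just another meaning, so that a change from one stop codon to another counts as synonymous.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Transition partners: pyrimidine <-> pyrimidine, purine <-> purine.
TRANSITION = {"U": "C", "C": "U", "A": "G", "G": "A"}


def third_position_counts():
    """Return ((synonymous, total) for transitions, (synonymous, total) for transversions)."""
    ts_syn = ts_total = tv_syn = tv_total = 0
    for codon, meaning in CODON_TABLE.items():
        for base in BASES:
            if base == codon[2]:
                continue                       # not a substitution
            same = CODON_TABLE[codon[:2] + base] == meaning
            if TRANSITION[codon[2]] == base:
                ts_total += 1
                ts_syn += same
            else:
                tv_total += 1
                tv_syn += same
    return (ts_syn, ts_total), (tv_syn, tv_total)


transitions, transversions = third_position_counts()
print("transitions  :", transitions)
print("transversions:", transversions)
```

Under these assumptions, nearly all third-position transitions are synonymous, while only about half of the transversions are.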
Grouping of codons by amino acid residue molar volume and hydropathy.

A practical consequence of redundancy is that some errors in the genetic code
only cause a silent mutation or an error that would not affect the protein because
the hydrophilicity or hydrophobicity is maintained by equivalent substitution of
amino acids; for example, a codon of NUN (where N = any nucleotide) tends to
code for hydrophobic amino acids. NCN yields amino acid residues that are small
in size and moderate in hydropathy; NAN encodes average size hydrophilic
residues.[27][28] These tendencies may result from the shared ancestry of the
aminoacyl tRNA synthetases related to these codons.

Even so, single point mutations can still cause dysfunctional proteins. For
example, a mutated hemoglobin gene causes sickle-cell disease. In the mutant
hemoglobin a hydrophilic glutamate (Glu) is substituted by the hydrophobic valine
(Val), that is, GAA or GAG becomes GUA or GUG. The substitution of glutamate
by valine reduces the solubility of β-globin which causes hemoglobin to form
linear polymers linked by the hydrophobic interaction between the valine groups
causing sickle-cell deformation of erythrocytes. Sickle-cell disease is generally
not caused by a de novo mutation. Rather it is selected for in malarial regions (in
a way similar to thalassemia), as heterozygous people have some resistance to
the malarial Plasmodium parasite (heterozygote advantage).[29]

This variability in codon-anticodon pairing is possible because of modified bases
at the first position of the tRNA anticodon; the base pair formed there is called
a wobble base pair. Wobble pairing involves modified bases such as inosine, as
well as the non-Watson-Crick U-G base pair.[30]

Variations to the standard genetic code

While slight variations on the standard code had been predicted earlier,[31]
none were discovered until 1979, when researchers studying human
mitochondrial genes discovered they used an alternative code. Many slight
variants have been discovered since,[32] including various alternative
mitochondrial codes,[33] as well as small variants such as Mycoplasma translating
the codon UGA as tryptophan and Candida species translating CUG as a serine
rather than a leucine.[34][35] In bacteria and archaea, GUG and UUG are common
start codons. However, in rare cases, certain specific proteins may use
alternative initiation (start) codons not normally used by that species.[32]

In certain proteins, non-standard amino acids are substituted for standard stop
codons, depending upon associated signal sequences in the messenger RNA:
UGA can code for selenocysteine and UAG can code for pyrrolysine as
discussed in the relevant articles. Selenocysteine is now viewed as the 21st
amino acid, and pyrrolysine is viewed as the 22nd.[32]

Notwithstanding these differences, all known codes have strong similarities to
each other, and the coding mechanism is the same for all organisms: three-base
codons, tRNA, ribosomes, reading the code in the same direction and translating
the code three letters at a time into sequences of amino acids.

Expanded genetic code

Since 2001, 40 non-natural amino acids have been added into proteins by
creating a unique codon (recoding) and a corresponding transfer-RNA:aminoacyl-
tRNA-synthetase pair to encode each one. Their diverse physicochemical and
biological properties make them useful as tools for exploring protein structure
and function or for creating novel or enhanced proteins.[36][37]

Theories on the origin of the genetic code

Despite the minor variations that exist, the genetic code used by all known
forms of life is nearly universal. However, there are a huge number of possible
genetic codes. If amino acids are randomly associated with triplet codons, there
will be 1.5 x 10^84 possible genetic codes.[38]

Phylogenetic analysis of transfer RNA suggests that tRNA molecules evolved
before the present set of aminoacyl-tRNA synthetases.[39]

Theoretically the genetic code could be completely random (a "frozen
accident"), completely non-random (optimal) or a combination of random and
nonrandom. There are sufficient data to refute the first possibility.[40] For a start, a
quick view on the table of the genetic code already shows a clustering of amino
acid assignments. Furthermore, amino acids that share the same biosynthetic
pathway tend to have the same first base in their codons,[41] and amino acids with
similar physical properties tend to have similar codons.[42][43]

There are four themes running through the many theories that seek to explain
the evolution of the genetic code (and hence the origin of these patterns):[44]
• Chemical principles govern specific RNA interaction with amino acids. Aptamer
experiments showed that some amino acids have a selective chemical affinity for
the base triplets that code for them.[45] Recent experiments show that of the 8
amino acids tested, 6 show some RNA triplet-amino acid association.[46][47] This
has been called the stereochemical code. The stereochemical code could have
created an ancient core of assignments. The current complex translation
mechanism involving tRNA and associated enzymes may be a later development;
originally, protein sequences may have been directly templated on base sequences.
• Biosynthetic expansion. The standard modern genetic code grew from a simpler
earlier code through a process of "biosynthetic expansion". Here the idea is that
primordial life "discovered" new amino acids (e.g., as by-products of metabolism)
and later back-incorporated some of these into the machinery of genetic coding.
Although much circumstantial evidence has been found to suggest that fewer
different amino acids were used in the past than today,[48] precise and detailed
hypotheses about exactly which amino acids entered the code in exactly what
order have proved far more controversial.[49][50]
• Natural selection has led to codon assignments of the genetic code that minimize
the effects of mutations.[51] A recent hypothesis[52] suggests that the triplet code
was derived from codes that used longer-than-triplet codons. Longer-than-triplet
decoding has a higher degree of codon redundancy and is more error resistant than
triplet decoding. This feature could allow accurate decoding in the absence of
highly complex translational machinery such as the ribosome.
• Information channels: Information-theoretic approaches see the genetic code as
an error-prone information channel [53]. The inherent noise (i.e. errors) in the
channel poses a fundamental question for the organism: how to construct a
genetic code that can withstand the impact of noise [54] while accurately and
efficiently translating information? These “rate-distortion” models [55] suggest that
the genetic code originated as a result of the interplay of the three conflicting
evolutionary forces: the needs for diverse amino-acids [56], for error-tolerance [51]
and for minimal cost of resources. The code emerges at a coding transition when
the mapping of codons to amino-acids becomes nonrandom. The emergence of
the code is governed by the topology defined by the probable errors and is related
to the map coloring problem.

The Operon Model of Gene Expression

On the basis of their studies with the lac system, and results such as the
PaJaMo experiment, François Jacob and Jacques Monod proposed the
Operon Model of Gene Expression in bacteria.

An operon is a cluster of genes that are transcribed as a single mRNA. Genes
in an operon code for a diffusible gene product which may be a polypeptide or an
RNA molecule.

[Lod11-5]
The following are the important features of the model:

• There are 2 classes of gene:

Structural Genes

These genes code for protein and RNA molecules that are required for normal
enzymatic or structural functions in the cell.

Regulator Genes

These genes code for protein and RNA molecules whose function is to regulate
the expression of other genes. Because these gene products act at another site,
they are trans-acting factors.

• Structural genes are organized as a unit.

• Structural genes are expressed as a single messenger RNA.

• Expression of structural genes is regulated by a Regulatory protein whose activity
depends on the presence or absence of an Effector substance.

• The regulatory protein acts by binding to a site on the DNA.

If the regulatory protein is a repressor, the site on the DNA to which it binds is
called an OPERATOR.

In the absence of a repressor, RNA polymerase can bind to the promoter and
initiate transcription of the operon:

In the presence of a repressor, RNA polymerase is unable to transcribe the
operon:

The exact details whereby repressor interferes with RNA polymerase and its
ability to transcribe need to be described on an operon-by-operon basis.

The ability of a repressor to block transcription depends, however, on the
presence or absence of an effector. In some cases (e.g. the trp operon), the
effector is required to assist the repressor to bind to DNA; in other cases (e.g. the
lac operon), the repressor will bind to DNA in the absence of the effector but not
in its presence.
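
The two modes of effector action can be captured as a small truth table. The Python sketch below is a deliberately coarse on/off model; the mode labels "inducible" (lac-like) and "repressible" (trp-like) and both function names are illustrative assumptions, and real operons show graded rather than all-or-nothing expression.

```python
def repressor_binds_operator(effector_present, mode):
    """Does the repressor sit on the operator?

    mode "repressible": the effector helps the repressor bind (as in the trp operon).
    mode "inducible":   the effector prevents binding (as in the lac operon).
    """
    if mode == "repressible":
        return effector_present
    if mode == "inducible":
        return not effector_present
    raise ValueError(f"unknown mode: {mode!r}")


def operon_transcribed(effector_present, mode):
    """RNA polymerase can transcribe the operon only if the operator is free."""
    return not repressor_binds_operator(effector_present, mode)


for mode in ("inducible", "repressible"):
    for effector in (False, True):
        print(mode, "effector" if effector else "no effector",
              "->", "ON" if operon_transcribed(effector, mode) else "OFF")
```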

The Lactose Operon

Structural Genes

The lac operon consists of three structural genes:

[MVH26-17]

• lacZ codes for β-galactosidase.

The lacZ gene is 3072 bp in length. The β-galactosidase enzyme, with a
subunit molecular weight of 125 kDa, is one of the largest polypeptides in the
E. coli cell: 1024 amino acids. The active form of the enzyme is a tetramer.

• lacY codes for β-galactoside permease.

The lacY gene is 1251 bp in length and codes for a 30 kDa monomeric
membrane protein of 417 amino acids.

• lacA codes for thiogalactoside transacetylase.

The lacA gene is 609 bp in length and codes for a dimeric protein, with
subunit polypeptide of length 203 amino acids.
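
The gene lengths and polypeptide lengths quoted for the three structural genes are related by the triplet code: three base pairs per amino acid. A quick Python check, using only the figures given above (the dictionary and function names are hypothetical):

```python
# Gene lengths in base pairs and polypeptide lengths in amino acids, as quoted above.
LAC_GENES = {
    "lacZ": (3072, 1024),
    "lacY": (1251, 417),
    "lacA": (609, 203),
}


def codons_in(length_bp):
    """Number of triplet codons in a coding sequence of the given length."""
    if length_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return length_bp // 3


for gene, (length_bp, amino_acids) in LAC_GENES.items():
    print(gene, length_bp, "bp ->", codons_in(length_bp), "codons;",
          "quoted:", amino_acids, "amino acids")
```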

Regulator Genes

The lac operon has a regulator gene: lacI which codes for the regulatory
protein, lactose repressor. The lacI gene is 1080 bp in length; the repressor
functions as a tetramer.

Gene Organization

The three structural genes are organized as a unit -- lacZ-lacY-lacA and are
expressed as a unit from lacZ through lacA.

The lacI regulator gene is located immediately adjacent to and upstream of
the structural genes. This is not a required feature of the operon model of gene
expression. There are many operons in which the regulator gene is located far
away from the structural genes whose expression it regulates. For example, the
galR gene is located halfway round the E. coli chromosome from the galETK
operon whose expression it regulates.

Regulator genes code for diffusible molecules which can potentially act at
many other locations in the genome.

Control of Gene Expression

In the absence of an inducer, the lactose repressor binds to its operator and
blocks RNA polymerase from transcribing the structural genes of the operon:
[27-9a] [MVH26-18]

Note:

• the lacI gene is expressed from its own promoter which is a very weak promoter
so the amount of mRNA transcribed, and, hence, the amount of protein made, will
be low.

• the repressor actually functions as a tetramer

In the presence of an inducer, the repressor is converted into a form that
cannot bind to the operator. RNA polymerase bound to the lac promoter will
be able to transcribe the three genes of the operon. The lac promoter is a medium
strength promoter so the amount of lac operon mRNA made and, hence, the
amount of the protein products will be moderate.

[27-9b] [MVH26-18]

Note:
• Lactose is not per se the true effector. Allolactose is the true inducer of the
operon. Isopropyl-thio-galactoside (IPTG) is an artificial inducer of the operon,
one that is commonly used in research laboratories.

[allolactose] [IPTG]

As we will see, a key aspect of the lactose repressor and its function is its
dual properties:

• It binds to DNA (viz. the operator)

• It binds to the inducer.

Protein synthesis can be divided into the same three phases as any of the
other polymerization reactions we have discussed in this course, but it also
contains an explicit fourth phase:

Initiation

where a functionally competent ribosome is assembled in the correct place on
an mRNA ready to commence protein synthesis.

Elongation

whereby the correct amino acid is brought to the ribosome, is joined to the
nascent polypeptide chain, and the entire assembly moves one position along
the mRNA.

Termination

which happens when a stop codon is reached, there is no amino acid to be
incorporated and the newly-synthesized polypeptide is released from the
ribosome.

Disassembly

whereby a special factor binds to the ribosome so that it can release the
mRNA and tRNA that are still bound to it and so that it can be recycled in another
round of protein synthesis.

There are two rules about protein synthesis to keep in mind:

• mRNA is translated 5' -> 3'

• Proteins are synthesized from the N-terminus to the C-terminus

This account describes the steps of protein synthesis in bacteria; we will
mention eukaryotic protein synthesis briefly at the end.

Initiation

This phase of protein synthesis results in the assembly of a functionally
competent ribosome in which an mRNA has been positioned correctly so that its
start codon is positioned in the P (peptidyl) site and is paired with the initiator
tRNA.

The following ingredients are needed for this phase of protein synthesis:

• Two ribosome subunits - 30S and 50S

• The mRNA

• Three Initiation Factors - IF1, IF2 (GTP) and IF3

• The initiator fMet-tRNAfMet.

[26-27] [MVH27-20]

The following steps take place:

Binding of the ribosome 30S subunit with Initiation Factors

IF3 promotes the dissociation of the ribosome into its two component subunits.
The presence of IF3 permits the assembly of the initiation complex and prevents
binding of the 50S subunit prematurely.

IF1 assists IF3 in some way, perhaps by increasing the dissociation rate of the
30S and 50S subunits of the ribosome.

Binding of the mRNA and the fMet-tRNAfMet

IF3 assists the mRNA to bind with the 30S subunit of the ribosome so that the
start codon is correctly positioned at the peptidyl site of the ribosome. The
mRNA is positioned by means of base-pairing between the 3' end of the 16S
rRNA with the Shine-Dalgarno sequence immediately upstream of the start
codon.
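
The positioning rule can be illustrated with a Python sketch that looks for a start codon preceded by a Shine-Dalgarno-like motif. Several things here are assumptions rather than statements from the text: the anti-Shine-Dalgarno core CCUCC is taken as the pairing region at the 3' end of E. coli 16S rRNA, only an exact match to its complement is accepted, the spacing window of 5 to 11 nucleotides is illustrative, and the example mRNA and function names are invented.

```python
ANTI_SD = "CCUCC"   # assumed core of the 16S rRNA 3' end, written 5'->3'
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that base-pairs with `rna`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


SD_CORE = reverse_complement(ANTI_SD)   # the mRNA motif that pairs with the rRNA


def find_start_sites(mrna, min_gap=5, max_gap=11):
    """Indices of AUG codons with an SD core ending min_gap..max_gap nt upstream."""
    sites = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        for gap in range(min_gap, max_gap + 1):
            sd_start = i - gap - len(SD_CORE)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(SD_CORE)] == SD_CORE:
                sites.append(i)
                break
    return sites


# Hypothetical mRNA with two AUG codons; only the second has an SD core upstream.
mrna = "UUAUGCAGGAGGUAACAUAUGGCU"
print(SD_CORE)                  # GGAGG
print(find_start_sites(mrna))   # [18]
```

Real Shine-Dalgarno sequences vary in length and tolerate mismatches, so a practical search would score partial complementarity instead of requiring an exact core.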

IF2(GTP) assists the fMet-tRNAfMet to bind to the 30S subunit in the correct site
- the P site.

It is not clear whether the mRNA or fMet-tRNAfMet binds first. It may be that
either can bind first.

At this stage of assembly, the 30S initiation complex is complete and IF3 can
dissociate.

Binding of the ribosome 50S subunit and release of Initiation Factors

Three events now happen "simultaneously". As the 50S subunit of the
ribosome associates with the 30S initiation complex, GTP hydrolysis occurs on
IF2. This hydrolysis may be helped by the L7/L12 ribosomal proteins rather than
by IF2 itself. GTP hydrolysis probably serves as a timing mechanism to ensure
that the tRNA is correctly positioned before IF3 (and IF1) dissociates. Hydrolysis
is also required for dissociation of IF2. Once the initiation factors have
dissociated, the initiation complex is complete and translation can proceed.

The following diagram illustrates the relative rates of these events during
initiation:

Diagram from:
Late events of translation initiation in bacteria: a kinetic analysis
J. Tomic, L.A. Vitali, T. Daviter, A. Savelsbergh, R. Spurio, P. Striebeck, W.
Wintermeyer, M.V. Rodnina and C.O. Gualerzi
The EMBO Journal, Vol. 19, No. 9 pp. 2127-2136, 2000

Elongation

Three special Elongation Factors are required for this phase of protein
synthesis: EF-Tu (GTP), EF-Ts and EF-G (GTP).

The Elongation phase of protein synthesis consists of a cyclic process
whereby a new aminoacyl-tRNA is positioned in the ribosome, the amino acid is
transferred to the C-terminus of the growing polypeptide chain, and the whole
assembly moves one position along the mRNA:

[26-31]

A new codon is now positioned at the A site and awaits a new aminoacyl-
tRNA.

[26-28] [MVH27-22]

Binding of a new aminoacyl-tRNA at the A site

At the start of each cycle:

• the A (aminoacyl) site on the ribosome is empty
• the P (peptidyl) site contains a peptidyl-tRNA,
• and the E (exit) site contains an uncharged tRNA.

The elongation factor, EF-Tu (GTP) binds with an aminoacyl-tRNA and brings
it to the ribosome. Once the correct aminoacyl-tRNA is positioned in the
ribosome, GTP is hydrolyzed, EF-Tu (GDP) undergoes a conformational change
and then dissociates away from the ribosome.

There are two ways that EF-Tu functions to ensure that the correct aminoacyl-
tRNA is in place:

• EF-Tu prevents the aminoacyl end of the charged tRNA from entering the A site
on the ribosome. This ensures that codon-anticodon pairing is checked first before
the charged tRNA is irreversibly bound in the A site and a new, potentially
incorrect, peptide bond is made.

• GTP hydrolysis is SLOW and EF-Tu cannot dissociate from the ribosome until it
occurs. The amount of time prior to GTP hydrolysis allows the final fidelity
check to take place. Hydrolysis is associated with a conformational change in
EF-Tu.

[26-29]

If the anticodon-codon interaction is incorrect, the aminoacyl-tRNA simply
dissociates and a new one is brought in. This check, however, can verify nothing
about the amino acid -- it simply verifies that the correct pairing takes place.

Experiments using GTP analogues have been used to establish these results:

o If a GTP analogue such as GTP-γ-S, which is hydrolyzed very slowly,
  is used, then protein synthesis slows down because of the slow rate of
  hydrolysis, but it also becomes more accurate because there is more time to
  check that the correct aminoacyl-tRNA is in place.

o If a GTP analogue such as GMP-PCP, which contains a non-hydrolyzable
  methylene bridge between the β and γ phosphates, is used,
  then protein synthesis stops because EF-Tu cannot dissociate from the
  ribosome.
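
Why a longer delay before GTP hydrolysis should improve accuracy can be illustrated with a toy calculation in Python. Everything in it is an assumption made for illustration, not a measured value: each tRNA is taken to dissociate with first-order kinetics, a mismatched tRNA is given a larger off-rate than a correct one, and the rate constants and delays are arbitrary.

```python
import math


def fraction_still_bound(k_off, delay):
    """Fraction of tRNAs still bound after `delay`, for first-order dissociation."""
    return math.exp(-k_off * delay)


def error_ratio(k_off_correct, k_off_incorrect, delay):
    """Surviving incorrect tRNAs relative to surviving correct tRNAs."""
    return (fraction_still_bound(k_off_incorrect, delay)
            / fraction_still_bound(k_off_correct, delay))


# Hypothetical off-rates (per second): mismatched tRNAs fall off ten times faster.
K_OFF_CORRECT, K_OFF_INCORRECT = 1.0, 10.0

for delay in (0.1, 0.5, 1.0):          # seconds before GTP hydrolysis
    print(delay, error_ratio(K_OFF_CORRECT, K_OFF_INCORRECT, delay))
```

In this toy model the ratio falls as the delay grows, which matches the qualitative observation for the slowly hydrolyzed analogue: slower synthesis, but fewer incorrect tRNAs surviving the check.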

The following diagram illustrates the relative rates of events during EF-Tu
dependent tRNA binding:
Diagram from:
Late events of translation initiation in bacteria: a kinetic analysis
J. Tomic, L.A. Vitali, T. Daviter, A. Savelsbergh, R. Spurio, P. Striebeck, W.
Wintermeyer, M.V. Rodnina and C.O. Gualerzi
The EMBO Journal, Vol. 19, No. 9 pp. 2127-2136, 2000

Kanamycin      Causes misreading of the code by interfering with the wobble base
               pairing.

Streptomycin   This antibiotic was the first aminoglycoside characterized. It
               inhibits prokaryotic ribosomes in a couple of ways. It causes
               misreading by interfering with the normal pairing between codon
               and anticodon. It can also prevent initiation. Streptomycin
               resistant bacteria carry an altered S12 subunit.

               [Box26-4-2]

Tetracycline   Inhibits aminoacyl-tRNA binding to the A site on the ribosome.

Kirromycin     Blocks dissociation of GDP from EF-Tu after hydrolysis. This
               prevents dissociation of EF-Tu from the ribosome and effectively
               stalls protein synthesis.

EF-Tu is the most abundant protein in the E. coli cell. There are approximately
70,000-100,000 molecules/cell, which is 5% of the total cell protein. There are also
approximately 70,000-100,000 tRNA molecules/cell. Nearly all of the aminoacyl-tRNA
in the cell is bound by EF-Tu.

EF-Tu cannot bind with tRNAfMet. This tRNA has a slight difference in its
structure compared with that of tRNAMet which means that it is not bound by EF-
Tu.

EF-Tu (GDP) is inactive and cannot bind aminoacylated tRNAs. However, EF-Tu
has a higher affinity for GDP (Kd = 10^-8 M) than for GTP (Kd = 10^-6 M).

In order to recycle EF-Tu, the elongation factor EF-Ts binds to the EF-Tu
(GDP) complex to displace the GDP. GTP then, in turn, displaces EF-Ts. Many
other G-proteins require a guanine nucleotide release protein (GNRP) to
release GDP; EF-Ts is the GNRP for EF-Tu.

[MVH27-23]
Formation of the new peptide bond (Transpeptidation)

Peptide bond formation occurs as a result of nucleophilic attack by the lone
pair of electrons on the amino nitrogen of the aminoacyl-tRNA on the carbonyl
carbon that attaches the growing polypeptide chain to a tRNA molecule in the P
site of the ribosome. As a result, the peptide chain is attached to the tRNA which
is paired with the codon in the A site. The new amino acid is, therefore, added to
the C-terminal end of the polypeptide chain.

[26-23]

Older illustrations show this reaction as a transfer of the entire polypeptide
chain from the tRNA in the P site to the tRNA in the A site. This is not an
accurate representation. It is more likely that the aminoacyl arm of the tRNA in
the A site extends to join with the polypeptide chain in the P site.

The peptidyltransferase activity of the ribosome which catalyzes this reaction
is located on the 23S rRNA, though it is assisted by some of the ribosomal
protein subunits. In other words, peptidyl transferase is a ribozyme, another
example of a catalytic RNA.

From: Cech, T.R. (2000) The Ribosome is a Ribozyme. Science 289: 878-879.

Adenine 2451 (in the E. coli 23S rRNA) is located in a microenvironment such
that the pKa is shifted by 4 units to a value of 7.6. This permits it to act as a
general acid/base for catalysis as shown above. This adenine is universally
conserved in all known 23S rRNAs.

Chloramphenicol  Inhibits peptidyl transferase in prokaryotes. It binds near the
                 L16 protein and seems to prevent the aminoacylated end of
                 charged tRNAs from binding correctly to the A site on the
                 ribosome.

Puromycin        Causes premature chain termination. Its structure resembles that
                 of the 3' end of a tyrosyl-tRNA and it participates as a
                 substrate in a peptidyl transferase reaction.

                 [Box26-4-1]

                 However, once it is added to the 3' end of a nascent protein, it
                 does not provide a suitable centre for any further nucleophilic
                 reactions, and protein synthesis is aborted.

Cycloheximide    Inhibits peptidyl transferase in eukaryotes.

Translocation of the Ribosome

Finally, the ribosome translocates along the mRNA thereby moving the new
peptidyl-tRNA to the P site and the old (now uncharged) tRNA, which has just
lost its peptidyl chain, to the E site. This step requires the elongation factor, EF-
G(GTP). There are 20,000 molecules/cell of EF-G which is the same as the
number of ribosomes.

GTP is hydrolyzed during translocation and, once again, GTP hydrolysis is
required for dissociation of EF-G, not for binding.

EF-G blocks the binding of aminoacyl tRNAs to the A site as well as blocking
the binding of Release Factors. It effectively makes sure that translocation must
take place before the cycle continues.
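
The three steps of the elongation cycle (A-site binding, transpeptidation, translocation) can be followed with a small state model in Python. This is a bookkeeping sketch only: the function name, the tRNA labels and the representation of each site as a (tRNA, peptide) pair are all hypothetical, and no factor binding or GTP hydrolysis is modeled.

```python
def elongation_cycle(sites, trna, amino_acid):
    """Run one cycle on `sites`, a dict mapping "E", "P", "A" to None or a
    (tRNA name, list of residues attached to that tRNA) pair."""
    if sites["A"] is not None:
        raise ValueError("the A site must be empty at the start of a cycle")

    # 1. The aminoacyl-tRNA is delivered to the empty A site.
    sites["A"] = (trna, [amino_acid])

    # 2. Transpeptidation: the chain held in the P site is joined to the
    #    A-site amino acid, which becomes the new C-terminal residue.
    p_trna, peptide = sites["P"]
    sites["A"] = (trna, peptide + [amino_acid])
    sites["P"] = (p_trna, [])                 # P-site tRNA is now uncharged

    # 3. Translocation: P -> E and A -> P; the previous E-site tRNA leaves.
    sites["E"], sites["P"], sites["A"] = sites["P"], sites["A"], None
    return sites


ribosome = {"E": None, "P": ("tRNA-fMet", ["fMet"]), "A": None}
for trna, amino_acid in [("tRNA-Ala", "Ala"), ("tRNA-Lys", "Lys")]:
    elongation_cycle(ribosome, trna, amino_acid)
    print(ribosome)
```

After each cycle the A site is empty again, the P site holds the peptidyl-tRNA with the chain one residue longer, and the E site holds the uncharged tRNA that has just given up the chain.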

EF-G and the tRNA-EF-Tu complex are mutually exclusive. The structures of
these two are remarkably similar and demonstrate very nicely why these two
cannot bind to the ribosome simultaneously:

Phe-tRNA-EF-Tu EF-G

[26-30]

The following figure compares the binding of tRNA-EF-Tu and EF-G with the
ribosome. Notice the similarity in the manner in which both the structures can fit
into the anticodon binding part of the A site. Notice also that there are differences
in the manner in which EF-Tu and EF-G interact with the ribosome.

Note that as a new protein is being synthesized, it must leave the ribosome.
Structural studies show that there is an exit tunnel, but it is quite narrow, so
it is unlikely that any significant protein folding could occur within the
ribosome. The following image shows a cross-section through the ribosome with
the rRNA (grey), the ribosomal proteins (green), the peptidyl transferase
centre (PT), and the nascent polypeptide (white).

Image adapted from:


M. Selmer, S. Al-Karadaghi, G. Hirokawa, A. Kaji, A. Liljas (1999) Crystal
Structure of Thermotoga maritima Ribosome Recycling Factor: A tRNA
Mimic. Science 286: 2349-2352.

Erythromycin: Blocks the entrance to the exit tunnel, which is 7-8 amino acids
away from the peptidyltransferase site.

FIGURE: shows an overview of the steps in elongation. Note that a
proofreading step is included.

Termination

The final phase of protein synthesis requires that the finished polypeptide
chain be detached from a tRNA. This can only happen in response to the signal
that a stop codon has been reached.
[26-32] [MVH27-26]

Binding of Release factors

There are no tRNAs that recognize the stop codons (except the tRNAs for
selenocysteine and pyrrolysine as well as the suppressor tRNAs). Rather stop
codons are recognized by release factor RF1 (which recognizes the UAA and
UAG stop codons) or RF2 (which recognizes the UAA and UGA stop codons).
These release factors act at the A site of the ribosome. A third release factor,
RF3 (GTP), stimulates the binding of RF1 and RF2.
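
The codon specificity of RF1 and RF2 described above can be written as a small lookup. This is a minimal sketch; the names `RELEASE_FACTOR_CODONS` and `release_factors_for` are invented for this illustration, and RF3 is left out because it does not read codons.

```python
from typing import List

# Stop codons recognized by each release factor, as stated in the text.
RELEASE_FACTOR_CODONS = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}

def release_factors_for(codon: str) -> List[str]:
    """Return the release factors that recognize `codon`.

    An empty list means the codon is not a stop codon.
    """
    codon = codon.upper()
    return sorted(
        rf for rf, codons in RELEASE_FACTOR_CODONS.items() if codon in codons
    )

for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, release_factors_for(codon))
```

Note that UAA is the only stop codon read by both factors.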

Hydrolysis of the peptidyl-tRNA

Binding of the release factors alters the peptidyltransferase activity so that
water is now the attacking nucleophile. The result is hydrolysis of the
peptidyl-tRNA and release of the completed polypeptide chain. The uncharged
tRNA in the E site can dissociate, as can the release factors. GTP is hydrolyzed.
The uncharged tRNA in the P site can NOT dissociate.

Disassembly

There is one final step in the overall cycle of protein synthesis, namely,
disassembly of the ribosome. In bacteria this requires the participation of the
ribosome recycling factor (RRF).

Following the action of the release factors, the ribosome complex contains a
70S ribosome, a bound mRNA, an empty A-site, and a deacylated tRNA in the
P-site. RRF along with EF-G(GTP) disassembles the complex.

RRF is a small protein containing 185 amino acids. Structurally, it contains two
domains. Overall, the shape of the molecule mimics that of tRNA. As with
EF-Tu and EF-G, this mimicry may explain how the protein functions.
Image adapted from:
M. Selmer, S. Al-Karadaghi, G. Hirokawa, A. Kaji, A. Liljas (1999) Crystal
Structure of Thermotoga maritima Ribosome Recycling Factor: A tRNA
Mimic. Science 286: 2349-2352.

EF-G and EF-Tu are shown in the left two images, respectively. The third
image shows a superposition of RRF with a tRNA molecule. Note, however, that
RRF does not have any structural elements that correspond with the acceptor
arm of the tRNA.

It is thought that RRF could bind to the A-site of the ribosome. Selmer et al.
propose that EF-G then binds and that translocation may occur. This would move
the empty tRNA that is still bound in the P-site to the E-site, thereby releasing it.
The ribosome would then dissociate releasing the mRNA as well as RRF and
EF-G.

The following figure shows that molecular mimicry extends among the release
factors: RF2, eRF and RRF. Notice that RF2 has the structure that can fit into the
anticodon binding part of the A site as does eRF.



FIGURE: shows an overview of the steps in termination and disassembly.

Summary: The Steps

Initiation

Binding of the ribosome 30S subunit with Initiation Factors

Binding of the mRNA and the fMet-tRNAfMet

Binding of the ribosome 50S subunit and release of Initiation Factors

Elongation

Binding of a new aminoacyl-tRNA at the A site

Formation of the new peptide bond (Transpeptidation)

Translocation of the Ribosome

Termination

Binding of Release factors

Hydrolysis of the peptidyl-tRNA

Disassembly

Binding of ribosome recycling factor (RRF)

Antibiotics and Protein Synthesis

Many antibiotics and toxins function by blocking certain steps during protein
synthesis. As well as their utility in treating infections, antibiotics have been
useful in dissecting many of the molecular details of the steps and reactions of
protein synthesis. The following will give you a feel for this important topic.

[MVH27-28]

Chloramphenicol: Inhibits peptidyl transferase in prokaryotes. It binds near the L16
protein and seems to prevent the aminoacylated end of charged tRNAs
from binding correctly to the A site on the ribosome.

Cycloheximide: Inhibits peptidyl transferase in eukaryotes.

Diphtheria toxin: Inhibits the activity of eEF-2 (the eukaryotic counterpart of
EF-G) by ADP-ribosylation.

Erythromycin: Blocks the entrance to the exit tunnel, which is 7-8 amino acids
away from the peptidyltransferase site.

Fusidic acid: Blocks the dissociation of EF-G from the ribosome in prokaryotes
(and of its counterpart eEF-2 in eukaryotes).

Kanamycin: Causes misreading of the code by interfering with the wobble base
pairing.

Kirromycin: Blocks dissociation of GDP from EF-Tu after hydrolysis. This prevents
dissociation of EF-Tu from the ribosome and effectively stalls protein
synthesis.

Puromycin: Causes premature chain termination. Its structure resembles that of
the 3' end of a tyrosyl-tRNA and it participates as a substrate in a
peptidyl transferase reaction. However, once it is added to the C-terminal end
of a nascent polypeptide, it does not provide a suitable centre for any further
nucleophilic reactions, and protein synthesis is aborted.

[Box26-4-1]

Streptomycin: This antibiotic was the first aminoglycoside characterized. It inhibits
prokaryotic ribosomes in a couple of ways. It causes misreading by
interfering with the normal pairing between codon and anticodon. It can
also prevent initiation. Streptomycin-resistant bacteria carry an altered
S12 protein.

[Box26-4-2]

Tetracycline: Inhibits aminoacyl-tRNA binding to the A site on the ribosome.

Rescuing synthesis on "broken" mRNA in prokaryotes

Bacterial mRNA has a short half-life and is generally degraded quickly. As a
result, there is a high probability that the 3'-end of an mRNA will be missing. If
this happens, the consequences could be severe. If an mRNA has lost its stop
codons, there will be no signals to promote dissociation of the ribosomes. Any
ribosomes that have bound to a defective mRNA will therefore stall when they
reach the broken end, unable to continue and unable to dissociate efficiently.
E. coli (and other bacteria) have a mechanism to deal with this situation.

E. coli contains a small RNA, encoded by the ssrA gene, that is synthesized as a
457 nt precursor RNA and processed by RNase E to a mature 363 nt RNA.
This RNA is also known as tmRNA or 10Sa RNA.

tmRNA has properties of tRNA and mRNA combined in a single molecule. It
functions during protein synthesis to rescue ribosomes that have become "stuck"
while translating mRNA molecules that have lost their stop codons.

The ssrA RNA has a number of important properties:

• Its secondary and tertiary structure partially resembles that of tRNA

• It can be charged with alanine

• It can be used as an mRNA which codes for a 10 amino acid long
oligopeptide: ANDENYALAA.

Image of Escherichia coli tmRNA from the tmRNA Database.

The mechanism of action of the ssrA RNA is shown in the following figure:

Diagram from:
E.D. Roche and R.T. Sauer (1999) SsrA-mediated peptide tagging caused by
rare codons and tRNA scarcity. The EMBO Journal 18(16): 4579-4589.

When a ribosome stalls, the ssrA RNA charged with alanine is brought to the
A-site of the ribosome by the SmpB protein. Peptidyl transferase activity transfers
the nascent polypeptide to the alanine attached to ssrA.

The mRNA template is also displaced by the ssrA RNA. Further protein
synthesis now uses ssrA as a template and ten further amino acids
(ANDENYALAA) are added to the C-terminal end of the polypeptide.

However, the final two amino acids that are added (AA) mark the new protein
for proteolysis by the two proteases ClpAP and ClpXP.

Thus any proteins that are only partially synthesized by stalled ribosomes can
be rapidly destroyed and turned over.
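
The tagging steps above can be sketched as simple string operations on one-letter amino acid sequences. This is an illustration only: the function names and the example fragment "MKTLLV" are hypothetical, and recognition by ClpAP and ClpXP is reduced to a check for the complete tag.

```python
# Peptide encoded by the mRNA-like part of tmRNA (from the text above).
TMRNA_ENCODED_TAG = "ANDENYALAA"

def rescue_stalled_peptide(nascent: str) -> str:
    """Return the tagged product released from a stalled ribosome.

    Step 1: the nascent chain is transferred to the alanine carried by tmRNA.
    Step 2: translation of the tmRNA reading frame appends the encoded tag.
    """
    after_transfer = nascent + "A"
    return after_transfer + TMRNA_ENCODED_TAG

def is_marked_for_proteolysis(protein: str) -> bool:
    """Crude check: does the protein end with the complete ssrA tag?"""
    return protein.endswith("A" + TMRNA_ENCODED_TAG)

# "MKTLLV" is an arbitrary fragment standing in for a partially made protein.
tagged = rescue_stalled_peptide("MKTLLV")
print(tagged)
print(is_marked_for_proteolysis(tagged))
```

The tagged product ends in the two alanines that the text identifies as the degradation signal.
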
The Genetic Code

Soon after the structure of DNA was proposed, Francis Crick turned his
thoughts to the Genetic Code. At first he realised that any code that used only 2
bases at a time did not have enough information capacity to specify all of the
amino acids found in proteins. He also thought that a code that used 3 bases at a
time had too much capacity.

In fact, the idea that there are 20 standard amino acids was not clear at that
time. The search to unravel the Genetic Code was partly instrumental in leading
to that conclusion as well.

Crick and Sidney Brenner, along with their many colleagues, spent a lot of
time thinking about the Code and how it might be interpreted. Once it was
accepted that there was a standard repertoire of 20 amino acids, the triplet
nature of the code followed.
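
The capacity argument can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are chosen for this example only.

```python
N_BASES = 4          # A, C, G and U
N_AMINO_ACIDS = 20   # the standard repertoire

# Number of distinct "words" available for each codon length.
for length in (1, 2, 3):
    n_words = N_BASES ** length
    verdict = "enough" if n_words >= N_AMINO_ACIDS else "too few"
    print(f"{length} base(s) per codon: {n_words} combinations ({verdict})")

# Smallest codon length that can specify all 20 amino acids.
min_length = next(n for n in range(1, 10) if N_BASES ** n >= N_AMINO_ACIDS)
print("minimum codon length:", min_length)
```

Two bases give only 16 combinations, while three give 64, which is the excess capacity that Crick noticed.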

What did not follow was how these triplets might be arranged. For a time, they
considered an overlapping arrangement of codons (a word coined by Seymour
Benzer) but they were able to dismiss this on the basis of protein sequence
analysis.

Once they felt that the code was non-overlapping, the question became one
of knowing where each triplet began. Proof that the code was indeed a triplet as
well as the determination of the meaning of each triplet came from that old
standby: experimentation.

[26-2]

Crick started a series of elegant genetic experiments using bacteriophage
crosses which demonstrated very conclusively that the genetic code was a triplet
code. At the same time, Marshall Nirenberg and Heinrich Matthaei showed
that UUU was the codon for phenylalanine. The way appeared clear to solve the
complete code. For this work, Nirenberg shared the 1968 Nobel Prize in
Physiology or Medicine with Robert Holley (who solved the structure of yeast
alanyl-tRNA, the first determination of the complete chemical structure of a
biologically active nucleic acid) and with Har Gobind Khorana (whose methods
for chemically synthesising nucleic acids were a prerequisite for the final
solution of the genetic code).

Solving the Genetic Code


The first steps to solving the Genetic Code depended on
the development of a cell-free in vitro translation system by
Paul Zamecnik (right). This system, which consisted of a
membrane-free cell supernatant, ATP, GTP,
radioactively labelled amino acids and RNA, was
capable of directing the synthesis of radioactively labelled
protein.

[S5-15]

In 1961, Marshall Nirenberg and Heinrich Matthaei
were using such a system to investigate the synthesis of
viral proteins. They used the Tobacco Mosaic Virus (TMV)
RNA as their experimental template. As a control RNA
template they used the homopolymer poly(U) -- which they
synthesized from UDP using polynucleotide
phosphorylase. They did not expect that this template
would code for or direct protein synthesis.

But it did! Nirenberg and Matthaei went on to show that
the only amino acid that was incorporated into a polypeptide
when poly(U) was the RNA template was phenylalanine.
The way to crack the code was open!

[Lod4-28]
above pictures from Nobel web site

Francis Crick (in What Mad Pursuit) describes how he heard about
Nirenberg's results while on a visit to the Biochemical Congress in Moscow in
1961:

"The Moscow meeting was made especially interesting because of the
results reported by Marshall Nirenberg, then almost unknown. I had
heard rumours of these experiments but no details. Matt Meselson,
whom I ran into in a corridor, alerted me to Marshall's talk in a remote
seminar room. I was so impressed that I asked Marshall to take part in a
much larger meeting, of which I was the chairman. What he had
discovered was that he could add an artificial message to a test-tube
system that synthesized proteins and get it to direct some synthesis. In
detail, he had added poly U -- the RNA message consisting almost
entirely of a sequence of uracils -- to the system and it had synthesized
phenylalanine. This suggested that UUU (assuming a triplet code) was
a codon for phenylalanine (one of the "magic twenty" amino acids), as
indeed it is. I later claimed that the audience was "startled" (I think I
originally wrote "electrified") to receive this news. Seymour Benzer
countered this with a photograph showing everyone looking extremely
bored! Nevertheless it was an epoch-making discovery, after which
there was no looking back."

The use of poly(A) and poly(C) as templates similarly showed that AAA was
a codon for lysine and that CCC was a codon for proline. However, poly(G) did
not work at all in the system.

This use of homopolymers is clearly quite limited. The use of random mixed
copolymers helped to extend the utility of the system and the information
obtained from it.

Random copolymers can be synthesized from a mixture of two
ribonucleotides with polynucleotide phosphorylase. Thus if ADP and UDP are
used in a 5:1 ratio, then the frequency of each possible triplet in the synthesized
RNA will vary according to this ratio. For example, AAA triplets will be found 125
times more frequently than UUU triplets.

CODON    FREQUENCY    RELATIVE FREQUENCY (AAA = 100)
AAA      0.579        100
AAU      0.116        20
AUA      0.116        20
UAA      0.116        20
AUU      0.023        4
UAU      0.023        4
UUA      0.023        4
UUU      0.00463      0.8

By measuring the ratios of the different amino acids that are incorporated into
protein using random copolymer templates, it is possible to narrow down the
range of codons that correspond to particular amino acids.
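
The triplet frequencies in the table follow from multiplying the fractions of each base. The sketch below recomputes them for a 5:1 mixture of A and U, assuming that each position in the polymer is filled independently. The function name `triplet_frequencies` is invented for this example.

```python
from itertools import product

def triplet_frequencies(base_fractions):
    """Expected frequency of every triplet in a random copolymer.

    Assumes each position is filled independently, with probability equal
    to the fraction of that base in the nucleotide mixture.
    """
    freqs = {}
    for triplet in product(base_fractions, repeat=3):
        p = 1.0
        for base in triplet:
            p *= base_fractions[base]
        freqs["".join(triplet)] = p
    return freqs

# A 5:1 mixture of A and U means fractions of 5/6 and 1/6.
freqs = triplet_frequencies({"A": 5 / 6, "U": 1 / 6})

# Print codon, frequency, and frequency relative to AAA = 100.
for codon, p in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{codon}  {p:.5f}  {100 * p / freqs['AAA']:.1f}")
```

The same function can be called with other ratios or other base pairs to predict which codons should dominate a given copolymer.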
This method did not yield all of the codon assignments. That required the
chemical synthesis of short oligonucleotides with defined sequences. These were
used in two ways:

Nirenberg and Phil Leder showed that aminoacylated tRNAs could be bound
to ribosomes if the ribosomes contained trinucleotides acting as mRNA.

[Lod4-30] [S5-16]

Gobind Khorana showed that tri- and tetranucleotides
could be polymerized into polymers with
repeating sequences that could be used in cell-free in
vitro translation assays.

In the case of trinucleotides, three polypeptides will be
synthesized, each of which is a homopolymer of a
single amino acid.

[MVH27-2] [Lod4-29]

In the case of tetranucleotides, a single polypeptide
(usually) will be synthesized which contains a repeating
amino acid sequence.

above picture from Nobel web site

In these ways, the entire Genetic Code was determined.
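
The behaviour of the repeating tri- and tetranucleotide polymers described above can be reproduced by translating such a polymer in all three reading frames. This is an illustrative sketch: the helper `translate_frames` and the repeat units UUC and UAUC are choices made for this example, amino acids are given in one-letter codes, and stop codons are simply shown as "*" rather than ending the chain.

```python
BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate_frames(repeat_unit, n_repeats=12):
    """Translate a repeating polymer in each of its three reading frames."""
    rna = repeat_unit * n_repeats
    products = []
    for frame in range(3):
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        products.append("".join(CODON_TABLE[c] for c in codons))
    return products

tri = translate_frames("UUC")     # repeating trinucleotide
tetra = translate_frames("UAUC")  # repeating tetranucleotide
print(tri)
print(tetra)
```

The trinucleotide polymer gives a different homopolymer in each frame, while the tetranucleotide polymer gives the same four-residue repeat in every frame, only starting at a different residue.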

The Genetic Code

Second base:
U          C          A          G
UUU Phe    UCU Ser    UAU Tyr    UGU Cys
UUC Phe    UCC Ser    UAC Tyr    UGC Cys
UUA Leu    UCA Ser    UAA Stop   UGA Stop
UUG Leu    UCG Ser    UAG Stop   UGG Trp

CUU Leu    CCU Pro    CAU His    CGU Arg
CUC Leu    CCC Pro    CAC His    CGC Arg
CUA Leu    CCA Pro    CAA Gln    CGA Arg
CUG Leu    CCG Pro    CAG Gln    CGG Arg

AUU Ile    ACU Thr    AAU Asn    AGU Ser
AUC Ile    ACC Thr    AAC Asn    AGC Ser
AUA Ile    ACA Thr    AAA Lys    AGA Arg
AUG Met    ACG Thr    AAG Lys    AGG Arg

GUU Val    GCU Ala    GAU Asp    GGU Gly
GUC Val    GCC Ala    GAC Asp    GGC Gly
GUA Val    GCA Ala    GAA Glu    GGA Gly
GUG Val    GCG Ala    GAG Glu    GGG Gly

For a simpler view of this table go to
http://esg-www.mit.edu:8001/esgbio/dogma/images/code.gif.

[T26-1]

The following are features to note in the genetic code:

• The code is triplet, unpunctuated and nonoverlapping.
Three bases are required to specify each amino acid. There are no gaps between
codons. Codons do not overlap.

[MVH27-1]
• The code is degenerate. Most amino acids are specified by more than
one codon. In fact, only Met and Trp are specified by a single codon:

# of codons    amino acids
1              Met, Trp
2              Asn, Asp, Cys, Gln, Glu, His, Lys, Phe, Tyr
3              Ile
4              Ala, Gly, Pro, Thr, Val
6              Arg, Leu, Ser

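Degeneracy is found mostly in the third nucleotide of the codon. The exceptions
are Arg, Leu and Ser, whose six codons also differ at the first or second position.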
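
The degeneracy counts can be verified directly from the standard codon table. The sketch below builds the table from a compact one-letter string (an encoding chosen for this example, with "*" for stop codons) and groups the amino acids by the number of codons that specify them.

```python
from collections import defaultdict

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Collect the codons that specify each amino acid (stop codons excluded).
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        codons_for[aa].append(codon)

# Group amino acids by how many codons specify them.
by_degeneracy = defaultdict(list)
for aa, codons in codons_for.items():
    by_degeneracy[len(codons)].append(aa)

for n in sorted(by_degeneracy):
    print(n, "".join(sorted(by_degeneracy[n])))

# Amino acids whose codons differ somewhere other than the third base.
beyond_third = sorted(
    aa for aa, codons in codons_for.items() if len({c[:2] for c in codons}) > 1
)
print(beyond_third)
```

The last line lists the amino acids whose codons do not all share the same first two bases.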

• The Genetic Code is Unambiguous.

In general, no codon specifies more than one amino acid. The exceptions so far
are AUG, UGA and UAG. In the first case, AUG specifies both Methionine and
N-formyl-Methionine, which is used to initiate protein synthesis in bacteria. In
the second case, UGA specifies the twenty-first amino acid, selenocysteine, as
well as being a stop codon. And, in the last case, UAG specifies the
twenty-second amino acid (the most recent to be added to the list), pyrrolysine.

Pyrrolysine is an amide-linked 4-substituted pyrroline-5-carboxylate lysine
derivative. It was discovered following crystallisation of the enzyme
monomethylamine methyltransferase in Methanosarcina barkeri.
Examination of the gene sequence revealed that a UAG codon was being used
to specify pyrrolysine. Further examination of the genome revealed that the
enzyme required to modify a lys-tRNA is encoded immediately adjacent to the
gene for monomethylamine methyltransferase.

• There are 3 stop codons: UAA, UAG, and UGA.

• There is one start codon: AUG. However, note that GUG and UUG are
occasionally found as start codons.

• The Genetic Code is Universal. Although there are a number of
exceptions to this rule -- particularly in organelle systems -- the genetic code is
remarkably the same in all organisms. The most common exception is the use of
UGA as a codon for Tryptophan in mitochondria.
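
The effect of the mitochondrial UGA exception can be illustrated by translating the same message with two tables. This is a simplified sketch: the variant table changes only UGA, which is the one difference named above, and the message "AUGUGAGGCUAA" is a made-up example rather than a real gene.

```python
BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))
# Variant used here for illustration: identical except UGA reads as Trp (W).
UGA_TRP_CODE = {**STANDARD_CODE, "UGA": "W"}

def translate(rna, table):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

message = "AUGUGAGGCUAA"  # AUG UGA GGC UAA
standard_product = translate(message, STANDARD_CODE)
variant_product = translate(message, UGA_TRP_CODE)
print(standard_product, variant_product)
```

With the standard code, translation ends at UGA after a single methionine. With the variant, UGA is read as tryptophan and translation continues to UAA.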