15: RNA
Q - Jack of all trades, master of none?
Chapter Map
RNA assists with: Replication, Information storage, Catalysis
15.1 The Question
molecules.
For example, proteins are manufactured with the help of highly specialized molecular
complexes called ribosomes, but ribosomes themselves contain proteins. How is this
possible? Is this another one of life's chicken and egg problems? To find out, we need
to have a closer look at how protein production actually works and take it from there.
It turns out that special molecules of ribonucleic acid (RNA) bring information about
the structure of the protein from DNA to the ribosome and that different RNA molecules
bring the building blocks (amino acids) from which the ribosome produces the protein.
Initially, therefore, RNA appears in a somewhat auxiliary role. Important but not glorious.
Figure 15.1: Modern cells are highly complex structures. Source: NHGRI
Some further inspection shows that RNA is quite flexible and involved in a rather
large number of processes in the cell. Considering the hard work done by proteins and
the center-stage role of DNA in storing genes, does this make RNA a jack of all trades?
Perhaps an evolutionary relic soon to be replaced?
To probe further, let us contemplate some essential aspects which one would expect
any cell, even the earliest ones, to have.
If we look at early cells, we know that there must have been some kind of information
storage to make heredity possible, some kind of (self-)replication to allow for offspring
and some kind of catalysis to speed up otherwise very slow chemical reactions. Due to
its complicated nature, interdependency between different types of molecular groups is
almost certainly something that has evolved as a specialization to specific requirements
like being the most efficient catalyst. Consequently, we may surmise that a single class
of molecules that is at least adequate in each of the three necessary functions
originated first.
Fig. 15.2: Key requirements for life (replication, information storage, catalysis).
If we take a hint from modern cells we find that there are three types of polymers to
consider: DNA, RNA and proteins. As argued in chapter 14, only polymers are considered
to be suitable for functions like heredity in living systems. Now clearly, one has to be
careful when trying to draw conclusions about pre-modern cells by starting from modern
cells. However, since the modern genetic apparatus in essence dates back some 3 billion
years and since evolutionary processes do have a clear path even when their details are
unknown, it is reasonable to take the modern cell as a starting point and work back from
there.
Amongst DNA, RNA and proteins, RNA is the only type of molecule that is known to be
good, albeit perhaps not always excellent, with regard to all three must-haves of early
cells: information storage, self-replication and catalysis. DNA, while excelling in
information storage, is at best marginal in catalysis (not enough flexibility in its 3D
structure) and self-replication (as a consequence of not being able to catalyze the
replication). Similarly, proteins excel in catalysis but are poor choices for information
storage (no base-pairing and hence hard to copy) and self-replication (most 3D structures
cannot be reversibly unfolded). Hence amongst the three modern types of molecules, RNA
seems to be good at many things but outstanding at nothing. Hence the chapter question:
Is RNA a jack of all trades and a master of none?
In order to answer this question, we need to understand what RNA can do.
Therefore let us now look at RNA in the context of each of the three functions that
must have been available to some degree in early cells: replication, catalysis and
information storage.
It is therefore necessary to employ some sort of a mechanism which translates the
information about a protein that is stored in DNA as a nucleotide sequence to the correct
amino acid sequence found in a protein. It turns out that this is achieved with the
so-called genetic code, discussed in section 19.10 in the context of information
processing, where groups of three successive nucleotides encode for a specific amino acid.
Contrary to the huge differences between the building blocks of proteins and DNA, DNA
and RNA consist of very similar, if not identical monomers. Both DNA and RNA are built up
of nucleotides containing the bases adenine (A), guanine (G), and cytosine (C), while DNA
further uses thymine (T) and RNA uracil (U), the unmethylated form of thymine. Thus both
DNA and RNA use four distinct nucleotides.
Fig. 15.3: Nucleotide base-pairs are A-T, G-C for DNA and A-U, G-C for RNA.
The five-carbon sugar in DNA nucleotides lacks an oxygen atom that the ribose in RNA
nucleotides does have, and is hence called deoxyribose. A special property essential for
the process of information storage and processing is that these nucleotides can pair up
in predetermined and generally fixed ways. The base adenine (A) can pair up with the base
thymine (T) or uracil (U) while the base guanine (G) can pair up with cytosine (C). Since
the pairs are formed between the bases of these nucleotides, such pairs are generally
referred to as base pairs.
The bonds between the base pairs are hydrogen bonds and as a consequence
quite weak. That means that they can fairly easily be broken if so desired.
Let us now have a closer look at how RNA assists in the process of obtaining a protein
from a sequence of nucleotides in DNA. When a gene encoding a protein needs to be
expressed, first the relevant sequence of nucleotides is copied onto a strand of so-called
messenger RNA (mRNA) with the help of RNA polymerase such that the mRNA strand is exactly
complementary to the DNA being copied. This process is called transcription. Now DNA has
two complementary strands, so how does RNA polymerase know which one to copy? RNA
polymerase only works in one direction and thus which of the two strands is copied is
predetermined. If this weren't so, then all copyable regions would need to be symmetric,
clearly an undesirable situation.
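The copying step of transcription can be sketched in a few lines of C++. This is only an illustration of the base-pairing rule, not a model of RNA polymerase: the function names rna_partner and transcribe are made up for this example, and the template strand is assumed to be given in the 3' to 5' direction so that the returned mRNA reads 5' to 3'.

```cpp
#include <cassert>
#include <string>

// Return the RNA base that pairs with a given DNA base (A-U, T-A, G-C, C-G).
// Hypothetical helper for this sketch; '?' marks anything that is not a DNA base.
char rna_partner(char dna_base) {
    switch (dna_base) {
        case 'A': return 'U';
        case 'T': return 'A';
        case 'G': return 'C';
        case 'C': return 'G';
        default:  return '?';
    }
}

// Build the mRNA that is complementary to a DNA template strand.
// Assumption: the template is written 3'->5', so the result reads 5'->3'.
std::string transcribe(const std::string& template_3to5) {
    std::string mrna;
    mrna.reserve(template_3to5.size());
    for (char base : template_3to5) {
        char partner = rna_partner(base);
        assert(partner != '?' && "template must contain only A, T, G, C");
        mrna.push_back(partner);  // one nucleotide is attached per step
    }
    return mrna;
}
```

For instance, transcribe("TACGGT") pairs T with A, A with U, C with G and so on, and returns "AUGCCA".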
At first, in what usually is referred to as initiation, RNA polymerase binds to a
specific region of the DNA identified by a certain sequence of nucleotides (e.g. CAAT in
eukaryotes). After stabilization with the help of a small protein (in bacteria, the sigma
factor), the RNA polymerase separates the double stranded DNA to form a bubble so that it
can pair the first nucleotide monomer with the beginning of the DNA sequence to be copied.
It then moves down the DNA, pairing one nucleotide after another to the DNA while attaching
that nucleotide covalently to the growing RNA sequence (this process is generally called
elongation) as illustrated in Figure 15.4. Rather than leaving the growing RNA sequence
paired with the DNA, RNA polymerase detaches the newly formed RNA trailing a few
nucleotides behind the attachment site such that the DNA can rewind. Finally, when the RNA
polymerase reaches a stop signal in the DNA, termination occurs. In eukaryotes, the
resulting RNA strand is then post-processed before being translated into a protein by a
ribosome while in prokaryotes, the mRNA can be translated immediately.
Figure 15.4: Schematic representation of transcription. (Labels: direction of movement,
DNA unwinding, DNA rewinding, active site, RNA, RNA polymerase.)
It should come as little surprise that quite some energy is necessary to carry out
the process described in the previous paragraph. How is this energy supplied?
Rather than having separate energy and nucleotide sources, RNA polymerase
uses energy-rich triphosphate versions of the nucleotides, i.e. ATP, UTP, CTP, GTP.
When these triphosphate nucleotides arrive at the RNA polymerase, the energy
stored in the phosphate-phosphate bonds is released by splitting the triphosphate
nucleotides into phosphate groups and the needed RNA nucleotide monomer.
correct spatial arrangement so that the substrates are spatially suitably oriented for
the reaction to proceed at a fast pace. Furthermore, in many cases, an enzyme needs
to be able to (partially) change shape to e.g. first capture the substrates, then with a
slightly different conformation catalyze the reaction, and lastly change shape one
more time to release the product, quite akin to the workings of a tool like a pair of
pliers.
In the same way, for RNA to be able to catalyze a large range of different reactions,
it needs to be able to assume many different shapes. Catalytic RNA molecules are often
referred to as ribozymes, a combination of the words ribonucleic acid and enzyme.
Although RNA as such is single stranded, the fact that it is made up of nucleotides
that allow for complementary base pairing means that an RNA sequence can have some of its
sections pair up with other sections of the same strand, thus creating intricate three
dimensional structures. Furthermore, it turns out that some short structural elements are
used quite frequently, as illustrated in Figure 15.5.
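How a single strand can pair with itself can be illustrated with a small sketch. It assumes the simplest possible case, a hairpin in which the bases at the 5' end pair one by one with the bases at the 3' end, and it only uses the A-U and G-C pairs named in the text. The function names pairs and stem_length are hypothetical, and real RNA folding is considerably more involved than this.

```cpp
#include <cassert>  // used by the usage example in the text
#include <cstddef>
#include <string>

// True if two RNA bases form one of the pairs named in the text (A-U or G-C).
bool pairs(char a, char b) {
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
}

// Count how many bases, starting from the 5' end, pair with the bases at the
// 3' end when the strand folds back on itself. The count stops at the first
// mismatch; the unpaired bases in between would form the loop of the hairpin.
std::size_t stem_length(const std::string& rna) {
    std::size_t n = 0;
    while (2 * (n + 1) <= rna.size() &&
           pairs(rna[n], rna[rna.size() - 1 - n])) {
        ++n;
    }
    return n;
}
```

As a usage example, assert(stem_length("GGCAAAAAGCC") == 3) holds: the three bases GGC at the start pair with GCC at the end, and the five A's in the middle remain unpaired.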
[Figure 15.5. Elements shown: double strand, single strand, single nucleotide bulge,
three nucleotide bulge, hairpin loop, three stem junction, four stem junction.
Legend: base pair, unpaired nucleotide.]
In modern cells ribozymes are quite rare, with the exception of the ribosome, which is
essential for life on earth and contains relatively large ribosomal RNA molecules,
generally abbreviated as rRNA.
In ribosomes, the messenger RNA is translated into proteins according to the genetic
code, which matches sequences of three nucleotides to certain amino acids; the ribosome
strings these amino acids together with covalent bonds.
Although ribosomes also contain about 35% proteins besides rRNA (3 rRNA molecules in
prokaryotes and 4 rRNA molecules in eukaryotes), the catalytic activity needed for joining
amino acids into a protein is carried out by the rRNA.
In bacteria, only one type of RNA polymerase is used but in eukaryotes, there are
three types: RNA polymerase I is responsible for three of the four rRNA molecules by
transcribing a precursor rRNA which is then modified into the three types of rRNA, while
RNA polymerase III directly transcribes the fourth rRNA molecule. On the other hand, mRNA
molecules are transcribed by RNA polymerase II, which is similar to the RNA polymerase of
bacteria.
Just like in the case of enzymes, ribozymes often have metal atoms to assist them in
their function. Due to the relative rarity of ribozymes versus enzymes in modern cells,
one could suspect that ribozymes are not as versatile. However, experiments have shown
that ribozymes can catalyze a great number of reactions. The key difference with enzymes
is that ribozymes appear to have in general lower maximum reaction speeds. It is therefore
quite conceivable that many reactions which are now catalyzed by enzymes once were
catalyzed by ribozymes during the earlier stages of evolution.
What we therefore see with regard to the catalytic activity of RNA is that while it is
perhaps often simply adequate, in a case where it really counts, namely protein
production, it is very capable.
up of RNA only. Indeed, viruses with both RNA and DNA are rather rare. Even
though it is arguable whether viruses are life forms or not, it is clear that informa-
tion is being transported. Therefore, at least under certain circumstances, RNA can
be a good information carrier.
In the context of viruses, it is notable that viral genomes can be single stranded
or double stranded, be they made up of DNA or RNA. Hence in viruses we find an
example of single stranded DNA.
    #include <iostream>
    int main()
    {
        int i;
        int sum = 0;
        for(i=1;i<=10;i++){ sum = sum + i; }
        if(sum > 20){ std::cout << sum << std::endl; }
        return 0;
    }
The first line indicates that a set of instructions defining input and output needs
to be used. The line int main() indicates the start of the code that needs to be
processed while the following two lines, int i; and int sum = 0;, indicate that we have
an integer variable with the name i and also an integer variable with the name sum that
is initialized to 0. Much happens in the next line, for(i=1;i<=10;i++){ sum = sum + i; }.
Firstly, for(i=1;i<=10;i++) means that our integer variable i starts at 1, after which
the command { sum = sum + i; } is executed, and then i is increased by 1 (this is
indicated by i++). The part i<=10 in for(i=1;i<=10;i++) means that the command is
repeated as long as i is at most 10 and that the program proceeds to the next instruction
(if(sum > 20){ std::cout << sum << std::endl; }) once the integer i is larger than 10.
The line if(sum > 20){ std::cout << sum << std::endl; } is a conditional statement. The
part if(sum > 20) checks whether the value of the variable sum is larger than 20. If so,
the part std::cout << sum << std::endl; is executed, printing the value of sum on the
monitor. If sum is smaller than or equal to 20, nothing happens. Here the loop adds up
the numbers 1 to 10, so sum ends at 55 and is printed. Finally, return 0; marks the end
of the program.
In computers, a large collection of similar constructs eventually amounts to outcomes
in which the individual constructs are, in a certain way, no longer recognizable, such as
a word processor.
Translation
One of the key processes in a cell, the translation of the information stored in
messenger RNA into a protein, has a distinctly computational flavor to it. After a
messenger RNA strand has been transcribed, and if necessary processed, it proceeds to the
ribosome where it is translated into a protein. This procedure is called translation since
specific sequences of nucleotides need to be 'translated' into a corresponding amino acid.
After all, in humans there are five times as many different amino acids as there are
nucleotides. To be more specific, three successive nucleotides, referred to as a codon,
represent a certain amino acid.
Since there are four different types of nucleotides, there are in total 4^3 = 64
different codons. The table which matches these codons with amino acids as well as the
instructions start and stop is called the genetic code and is shown in figure 19.12.
The question of course is, how would the genetic code be processed? One option would
be for the ribosome to recognize the codons and then grab a suitable amino acid. This is,
however, not quite how nature does it. In nature, amino acids destined for protein
production are attached to specialized RNA molecules called transfer RNA (tRNA). The tRNA
molecules, an example of which is shown in Figure 15.8, have a complementary sequence of
bases called an anticodon on one side. This anticodon exactly matches a codon and since a
specific tRNA molecule can only have one kind of amino acid attached to it, if the
ribosome finds a tRNA whose anticodon matches the codon of the mRNA currently being
processed, then the ribosome has the correct amino acid for the protein to be manufactured.
Figure 15.8: The tRNA molecule for the amino acid phenylalanine. (Labels: attached amino
acid (phenylalanine), T loop, D loop, anticodon loop, anticodon, base-pair.)
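The matching of codons and anticodons described above can be sketched as a small program. It is a toy model under stated assumptions: the names TransferRna, matches, translate and example_pool are invented for this illustration, the tRNA pool contains only four entries rather than a full set, anticodons are written 3' to 5' so that they line up position by position with a 5' to 3' codon, and pairing is taken to be exact as in the text. A codon for which no tRNA is present, such as a stop codon, simply ends the chain.

```cpp
#include <cassert>  // used by the usage example in the text
#include <cstddef>
#include <string>
#include <vector>

// A tRNA in this sketch: an anticodon (written 3'->5') and the amino acid it carries.
struct TransferRna {
    std::string anticodon_3to5;
    std::string amino_acid;
};

// True if two RNA bases form an A-U or G-C pair.
bool pairs(char a, char b) {
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
}

// True if every base of the codon pairs with the facing base of the anticodon.
bool matches(const std::string& codon, const TransferRna& trna) {
    if (codon.size() != 3 || trna.anticodon_3to5.size() != 3) return false;
    for (std::size_t i = 0; i < 3; ++i) {
        if (!pairs(codon[i], trna.anticodon_3to5[i])) return false;
    }
    return true;
}

// Read the mRNA three bases at a time and collect the amino acids delivered
// by matching tRNAs. Reading stops at the first codon without a matching tRNA.
std::vector<std::string> translate(const std::string& mrna,
                                   const std::vector<TransferRna>& pool) {
    std::vector<std::string> protein;
    for (std::size_t pos = 0; pos + 3 <= mrna.size(); pos += 3) {
        const std::string codon = mrna.substr(pos, 3);
        bool found = false;
        for (const TransferRna& trna : pool) {
            if (matches(codon, trna)) {
                protein.push_back(trna.amino_acid);
                found = true;
                break;
            }
        }
        if (!found) break;
    }
    return protein;
}

// Hypothetical, deliberately tiny tRNA pool for illustration only.
std::vector<TransferRna> example_pool() {
    return {
        {"UAC", "Met"},  // reads codon AUG
        {"AAA", "Phe"},  // reads codon UUU
        {"AAG", "Phe"},  // reads codon UUC
        {"CCG", "Gly"},  // reads codon GGC
    };
}
```

With this pool, translate("AUGUUUUUCGGCUAA", example_pool()) yields Met, Phe, Phe, Gly and then stops at UAA, for which the pool holds no tRNA. Note that phenylalanine appears twice in the pool, once for each of the two codons listed.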
Since there are significantly more codons than different types of amino acids, even
taking into account start and stop signals, many amino acids can be represented by more
than one codon. Of course, it would be possible that nature simply doesn't use all the
possible 64 codons but this is not the case. All codons are used. Since base-pairing needs
an exact match between codon and anticodon, some amino acids are therefore carried by more
than one type of tRNA.
A notable feature of tRNA is that after its synthesis, some bases are chemically
modified. For example the base 'D', after which the D loop is named, is a modification
of uracil.
Since reading an mRNA strand does not destroy it, it can in principle be read over and
over again. When that happens, enormous amplification can occur and a single gene can
yield a huge quantity of protein.
In contrast, rRNA (after suitable processing) is the final product of transcription
and consequently, in order to manufacture the 10 million or so ribosomes a mammalian cell
needs, multiple copies of the necessary rRNA genes exist in the genome. For example,
humans have about 200 rRNA gene copies per haploid genome!
Control
In computational processes, control is essential, especially if constant adaptation
to changing environments is necessary. Due to their versatility, RNA molecules are
involved in a number of regulatory activities such as RNA processing, modification
and editing. Let us now have a look at some interesting classes of RNA molecules:
snRNA
The so-called small nuclear RNA (snRNA) is a class of RNA molecules in-
volved in RNA splicing and the regulation of transcription factors.
They are also involved in the regulation of telomeres. Telomeres are highly
repetitive sequences of DNA that can be found at the terminal ends of chromo-
somes. In general telomeres become shorter as a cell divides since the DNA poly-
merase responsible for replication cannot proceed all the way to the end of a strand.
By having telomeres, none of the essential genetic material will be missed at the
end of a chromosome when a cell divides. However, the telomeres do become
shorter and are therefore thought to play a role in ageing. A subclass of snRNA,
the small nucleolar RNAs (snoRNA), plays an important role in the chemical modification
of certain types of RNA molecules like rRNA and tRNA.
snRNA are generally about 150 nucleotides long and, to fulfill their function, form
complexes with proteins; these complexes are called small nuclear ribonucleoproteins
(snRNP).
eRNA
In general the use of promoters and inhibitors leads to a kind of on-off regulation
that resembles a digital process. It appears that by interfering with the transcription
apparatus, efference RNA (eRNA) allows for a somewhat analog, fine-grained control.
tmRNA
Found in all bacteria but thus far not in eukaryotes is a class of RNA molecules that
have both tRNA and mRNA regions. Their main purpose is to deal with ribosomes where the
production of a protein has become stuck. Unfinished proteins could be entirely useless
but could also be damaging to a cell as their function is unpredictable. It is therefore
important to identify stuck ribosomes and tag the incomplete protein for destruction. This
is done by the transfer-messenger RNA (tmRNA).
siRNA
For example, if a certain type of mRNA strand has already been transcribed but needs
to be stopped from being translated into a protein by a ribosome, then the so-called
RNA-induced silencing complex (RISC) can be used as follows: one of the strands of siRNA,
called the guide strand, is incorporated into the RISC, which subsequently binds to the
complementary region on an mRNA strand. The RISC then destroys the mRNA, thus silencing it.
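The search for a complementary region can be sketched as a simple string search. This is a strong simplification made for illustration: the function names are invented, the guide strand in the usage example is an unrealistically short made-up sequence, and the sketch assumes that the guide must pair perfectly along its whole length. Both strands are written 5' to 3', so the site on the mRNA is the reverse complement of the guide.

```cpp
#include <algorithm>
#include <cassert>  // used by the usage example in the text
#include <cstddef>
#include <string>

// Return the pairing partner of an RNA base (A-U, G-C); 'N' for anything else.
char rna_complement(char base) {
    switch (base) {
        case 'A': return 'U';
        case 'U': return 'A';
        case 'G': return 'C';
        case 'C': return 'G';
        default:  return 'N';
    }
}

// Reverse complement of an RNA strand: the sequence, read 5'->3', of the
// strand that would pair with it.
std::string reverse_complement(std::string rna) {
    std::reverse(rna.begin(), rna.end());
    for (char& base : rna) base = rna_complement(base);
    return rna;
}

// Position on the mRNA where the guide strand can bind, or std::string::npos
// if the mRNA has no fully complementary region.
std::size_t find_target_site(const std::string& mrna, const std::string& guide) {
    return mrna.find(reverse_complement(guide));
}

// In this toy model an mRNA counts as silenced as soon as a binding site exists.
bool is_silenced(const std::string& mrna, const std::string& guide) {
    return find_target_site(mrna, guide) != std::string::npos;
}
```

For the made-up guide "AAGC" the complementary site is "GCUU", so assert(find_target_site("AUGGCUUAA", "AAGC") == 3) holds, while an mRNA such as "AUGCCCUAA" has no such site and is left alone.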
miRNA
15.7 Overview
The main RNA functions with regard to protein production are summarized in
table 15.1. Some of the other non-coding RNAs not involved in protein synthesis are
listed in table 15.2.
Given names such as small and micro one might be tempted to assume that all non-coding
RNA is relatively small. This is not the case, however. For example, in female mammals,
one of the two X chromosomes is inactivated (so as to have the same number of active X
chromosomes as males, namely one) by an RNA gene named Xist which is 18,000 base pairs
long. Nevertheless, the majority of ncRNA is rather small.

Molecule                 Abbreviation  Function
transfer RNA             tRNA          brings amino acid monomers to ribosome
messenger RNA            mRNA          brings instruction for protein to ribosome
ribosomal RNA            rRNA          catalyzes the joining of amino acids
Table 15.1: Key RNA types in protein manufacture

Molecule                 Abbreviation  Function
small nuclear RNA        snRNA         regulatory functions in eukaryotic nuclei
efference RNA            eRNA          gene regulation
transfer-messenger RNA   tmRNA         identifies faulty ribosomal activity in bacteria
small interfering RNA    siRNA         regulates gene expression and combats viruses
micro RNA                miRNA         controls gene expression
Table 15.2: Other key non-coding RNA types
Another issue is the number of different ncRNAs. Humans have slightly over 20,000
protein-coding genes, so it is interesting to consider how many RNA genes there are.
Although this is currently not known, some estimates go as high as 100,000.
can answer our chapter question somewhat tongue in cheek by RNA, master of all
trades, much better than grand-master of one!
15.9 Exercises
1. Give two reasons why RNA might have preceded DNA in the evolution of
life on earth.
2. How much energy does it take to break a bond between two nucleotides in
RNA?
8. The catalytic activity of the ribosome is carried out by which type of biopoly-
mer?