Article Contents
Introduction
The Measurement and Manipulation of DNA Reassociation
DNA Reassociation Kinetics: Viruses and Bacteria
The Concept of Sequence Complexity
Eukaryotic Genomes: Multiple Kinetic Classes
Stringency of Hybrid Formation: Divergence among Repeated DNAs
Survey of Genomes: Representation of Different Kinetic Classes
Applications of Cot Procedures and Analysis
Rot Curves; Analysis of RNA Populations
Introduction
Native DNA is a double helix: it consists of two antiparallel strands joined noncovalently. (This double-helical DNA is termed duplex DNA.) Under normal physiological conditions, association of the two strands is greatly favoured over dissociation; however, at high temperatures the two strands will spontaneously dissociate. If the temperature is then returned to normal, the single strands will reassociate to form new duplexes. (Reassociation describes the behaviour of the DNA population as a whole. In most cases, the new duplexes are formed by strands that were not originally paired.) Analysis of reassociation rates, and of the thermal stability of resulting duplexes, has yielded valuable information about the types of DNA sequences that are present in genomes and how they are organized. Such analysis is termed Cot analysis. Dissociation of DNA duplexes into single strands is also termed melting or denaturation of the DNA. Association of single strands to form a duplex is also termed hybridization or reannealing of the DNA.
much more rapidly than does one whose reassociation is 50% complete at Cot = 6000. The DNA of the first solution can be taken to have much less sequence complexity. By convention, the DNA concentration is expressed in moles of nucleotides per litre and the time is expressed in seconds.
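The Cot convention can be illustrated with a short sketch. It assumes ideal second-order kinetics for a single kinetic component, in which the fraction of DNA still single-stranded is 1/(1 + Cot/Cot½). That kinetic form and the function names are assumptions introduced here for illustration, not something stated in the text above.

```python
def cot_value(conc_mol_nt_per_l: float, time_s: float) -> float:
    """Cot = initial DNA concentration (mol nucleotides per litre) x time (seconds)."""
    if conc_mol_nt_per_l < 0 or time_s < 0:
        raise ValueError("concentration and time must be non-negative")
    return conc_mol_nt_per_l * time_s


def fraction_reassociated(cot: float, cot_half: float) -> float:
    """Fraction of DNA in duplex form for one kinetic component.

    Assumes ideal second-order kinetics (an assumption of this sketch).
    cot_half is the Cot at which reassociation is 50% complete.
    """
    if cot_half <= 0:
        raise ValueError("cot_half must be positive")
    return 1.0 - 1.0 / (1.0 + cot / cot_half)
```

Under this idealization, a DNA with a small Cot½ is much further along at any given Cot than one with Cot½ = 6000, which is the comparison the paragraph draws.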
Usually, two characteristics of the reacted DNA solution are of interest: the amount of reassociation that has occurred, and the thermal stability of the resulting duplexes. The amount of reassociation is of interest because repeated sequences reveal themselves by reassociating more rapidly than single-copy DNA. The thermal stability (stability as the temperature increases) is of interest because duplexes can form between DNA strands that are not a perfect match. However, the greater the mismatch, the lower the temperature needed to reseparate the strands. This depression of dissociation temperature, relative to the dissociation temperature of native DNA, can be used to calculate the percentage of mismatch within populations of related DNA sequences.

S1 nuclease
Since highly repeated sequences are often interspersed with low-copy or single-copy sequences, reassociated DNA is likely to consist of duplex regions attached to single-stranded tails. These single-stranded tails increase the amount of DNA that appears to have formed duplexes. The tails can be removed with S1 nuclease of Aspergillus oryzae, which selectively degrades single-stranded DNA. If the average fragment length is known, comparison of the amount of DNA in duplex structures before and after removal of the tails gives an indication of how repeated and single-copy sequences are interspersed.

Hydroxyapatite chromatography
Under certain conditions, hydroxyapatite (calcium phosphate hydroxide) binds double-stranded but not single-stranded DNA. Because of this, hydroxyapatite column chromatography can be used to measure both the amount of duplex DNA that has formed in a reassociation reaction, and its thermal stability. The thermal stability of the duplexes can be measured if the column is enclosed in a temperature-controlled water jacket. As the temperature of the column increases, duplex DNA bound to the hydroxyapatite dissociates into single strands, and is then eluted.

UV absorption spectroscopy
DNA absorbs ultraviolet light (UV light) whose wavelength is near 260 nm. The absorption by DNA increases as the DNA dissociates. The increase in absorption (termed hyperchromic shift) of native DNA varies with its G + C composition, but is about 27% for the DNA of higher organisms. As single strands reassociate, their UV absorption decreases (termed hypochromic shift). Changes in UV absorption allow monitoring of DNA dissociation and reassociation in solution. This allows the thermal stability of reassociated DNA duplexes to be measured without hydroxyapatite.

Other methods
Single-stranded and duplex DNA can be distinguished by both electron microscopy and density-gradient ultracentrifugation (single-stranded DNA is denser). Both techniques have been used to characterize partly reassociated genomes. In addition, radioactively labelled DNA is often used in reassociation experiments. Two specific uses of labelled DNA are described below.
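The way UV absorption tracks dissociation can be sketched as a simple interpolation. The sketch assumes that absorbance at 260 nm varies linearly between the reading for fully duplex DNA and the reading for fully dissociated DNA. That linearity and the function name are assumptions of this example, and both reference readings would have to be measured for the sample in question.

```python
def fraction_single_stranded(a260: float, a260_duplex: float, a260_single: float) -> float:
    """Estimate the single-stranded fraction of a DNA sample from its A260.

    Assumes a linear change in absorbance between the fully duplex and the
    fully dissociated state (a simplifying assumption of this sketch).
    """
    if a260_single <= a260_duplex:
        raise ValueError("dissociated DNA must absorb more than duplex DNA")
    frac = (a260 - a260_duplex) / (a260_single - a260_duplex)
    # Clamp so that measurement noise cannot give a fraction outside [0, 1].
    return min(1.0, max(0.0, frac))
```

With a hyperchromic shift of 27% (duplex reading 1.00, dissociated reading 1.27), a reading midway between the two corresponds to half of the DNA being single-stranded.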
Practical considerations

Control over hybridization specificity
Cot hybridization experiments often involve coincubation of two distinct DNA populations. In these cases, it is usually duplex formation between DNA single strands of different populations that is of interest; duplex formation by DNA strands of the same population is an unwanted and confounding side effect. Hence, two methods are used to monitor only the reassociations occurring between single strands of different populations. Both methods involve radioactively labelling molecules of one of the two populations.

The first method involves physically immobilizing DNA single strands of one of the two populations on a solid support such as nitrocellulose paper. Molecules of the other population are radioactively labelled and remain free to diffuse. When the reaction is finished, the nitrocellulose paper is removed from the solution, washed and then analysed for such characteristics as the total amount of bound radioactivity and the thermal stability of the binding (which presumably involves duplex formation between the immobilized DNA on the nitrocellulose paper and radioactively labelled DNA from the solution).

The second method is to incubate a small amount of labelled DNA from one population (termed probe) with a large excess of DNA (usually at least a 10 000-fold excess) from the other population (termed driver). The results of the reaction are then monitored by assaying radioactivity. Duplexes formed entirely of probe DNA are negligible in number because the initial concentration of probe is very low and (usually) because most of the probe DNA will form duplexes with driver DNA before it has a chance to form probe-probe duplexes. Duplexes formed entirely of driver DNA are very common in such a reaction, but are not radioactive and hence do not interfere with monitoring of the probe-driver duplexes.

Shearing of the DNA
For the results of Cot reactions to be interpretable, the DNA molecules present at the start of the reaction should be sheared to lengths of 200-300 base pairs. If the DNA is not sheared, it will reassociate to form networks whose composition cannot be ascertained. This network formation results from the presence of highly repeated sequences dispersed throughout the genome. Shearing is generally done with the DNA still in duplex form. The best methods for shearing DNA include agitation in a Virtis homogenizer and squirting of the DNA solution through a needle valve at high pressure. Other methods include sonication, treatment with deoxyribonuclease and acid depurination followed by alkaline hydrolysis.

The effect of salt concentration
The presence of salt in the reassociation solution affects the rate of reassociation. Generally, the higher the concentration of monovalent cations, the faster the reassociation. (Divalent cations such as Mg²⁺ are extremely potent accelerators of reassociation, but are usually omitted from reassociation reactions.) The standard solution for measuring reassociation reactions is 0.12 mol L⁻¹ sodium phosphate buffer, pH 7.0. Reassociation reactions measured at other salt concentrations are expressed in equivalent Cot (Ecot) units.

Suppression of DNA composition effects
A major potential complication in measurements of DNA thermal stability is its dependence on base composition. Other circumstances being equal, GC-rich DNA dissociates at a higher temperature than does AT-rich DNA. The distribution of GC base pairs throughout genomes is uneven and largely unknown; hence, variations in GC content could mimic or obscure more interesting effects on thermal stability due to base mismatch. The influence of base composition can be minimized by dissociating the DNA in the presence of 2.4 mol L⁻¹ tetraethylammonium chloride (TEACl). Unfortunately, TEACl cannot be used with hydroxyapatite.

Other considerations
Temperature, pH and DNA fragment length all affect DNA reassociation rates. Their effects must be considered in the planning of reassociation experiments and they must be carefully controlled as the experiment is performed.
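Equivalent Cot units, used for reactions run at nonstandard salt concentrations, can be sketched as a simple rescaling. The sketch assumes that the conversion is a multiplication by the reassociation rate relative to the standard 0.12 mol L⁻¹ sodium phosphate buffer. The function name and the `relative_rate` parameter are hypothetical, and no rate factors are built in; they would have to come from published calibration tables.

```python
def equivalent_cot(cot: float, relative_rate: float) -> float:
    """Rescale a measured Cot to the standard buffer condition.

    relative_rate is the reassociation rate under the actual conditions divided
    by the rate in the standard buffer. It must be supplied by the caller;
    this sketch deliberately contains no calibration values.
    """
    if cot < 0:
        raise ValueError("cot must be non-negative")
    if relative_rate <= 0:
        raise ValueError("relative_rate must be positive")
    return cot * relative_rate
```

A reaction run in conditions that double the rate would reach a given extent of reassociation at half the measured Cot, so its equivalent Cot is twice the measured value.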
[Figure 1 plot: fraction reassociated (%) against Cot.]
Figure 1 The time course of reassociation is shown for bacterial and for calf DNA. The amount of reassociation is plotted against the Cot. The time course for bacterial (Escherichia coli) DNA, almost all of which is single-copy DNA, plots as an S-shaped curve (red). If calf DNA were also single-copy DNA, it would plot as a similar curve displaced to the right (green). The actual curve for calf DNA (blue) has a different shape, however. The shape of the curve indicates that the reaction takes place in two different stages, one early and one late; the midpoints of the two stages (broken vertical lines) are separated by a factor of 100 000. The early stage represents the reassociation of repeated DNA and the later stage that of single-copy DNA. (From Britten and Kohne, 1970.)
would suggest, has given rise to the concept of sequence complexity. The sequence complexity of a genome is the total length of the different sequences it contains, measured in nucleotide pairs. For species with no repetitive DNA, the sequence complexity is equal to the genome size. The measured complexity of a genome can depend on the criterion at which measurements are performed, because repeats are often imperfect.
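The distinction between genome size and sequence complexity can be made concrete with a small sketch. It treats each repeat family as a set of perfect copies of one unit, which is a simplification given that real repeats are often imperfect. The function name and the example genome are hypothetical.

```python
from typing import Iterable, Tuple


def genome_size_and_complexity(families: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (genome size, sequence complexity) in nucleotide pairs.

    Each family is (unit_length_bp, copies_per_haploid_genome). Repeats are
    treated as perfect copies, so every family adds its unit length to the
    complexity exactly once, however many copies are present.
    """
    size = 0
    complexity = 0
    for unit_length, copies in families:
        if unit_length <= 0 or copies <= 0:
            raise ValueError("unit length and copy number must be positive")
        size += unit_length * copies
        complexity += unit_length
    return size, complexity


# Hypothetical genome: 1 000 000 bp of single-copy DNA plus a 300 bp repeat
# present in 1000 copies.
example = [(1_000_000, 1), (300, 1000)]
size, complexity = genome_size_and_complexity(example)
```

For a genome made only of single-copy sequences, the two returned values coincide, as the paragraph states.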
more copies per haploid genome. On the other hand, many families of tandem repeats are much smaller, containing as few as two members.

Interspersed repeats form a third type of rapidly reassociating sequence. They are usually derived from transposable elements, although some simple-sequence interspersed repeats may arise spontaneously. Interspersed repeats range in number from only a few copies per haploid genome to hundreds of thousands of copies or more. The Alu interspersed repeat of the human genome, for example, is present in roughly 500 000 copies. Interspersed repeats are interspersed with low-copy or single-copy sequences and with each other. To a much lesser extent, they are also interspersed with satellite DNA repeats.

A fourth category of repeated sequences consists of low-copy long repeats. The genes encoding the pigments for human red-green colour vision, for example, are present in long, tandem repeats of 39 kilobase pairs. Each human X chromosome contains between two and six copies of the repeat (Nathans et al., 1992).

A fifth kinetic class consists of sequences that are present only once per haploid genome. These are single-copy sequences.

Figure 1 illustrates how Cot curves revealed the existence of repetitive DNA. Calf DNA reassociates much more rapidly than it would if calf DNA were entirely single copy, and the reassociation curve has a shape indicative of at least two components.
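A curve with several components, such as the calf DNA curve of Figure 1, can be written as a sum over kinetic classes. This assumes the common idealization that each class reassociates independently with second-order kinetics. The symbols f_i and k_i are introduced here for illustration and do not appear in the text above.

```latex
\[
\frac{C}{C_0} \;=\; \sum_{i} \frac{f_i}{1 + k_i \, C_0 t},
\qquad \sum_{i} f_i = 1 ,
\]
```

Here C/C_0 is the fraction of DNA still single-stranded at a given Cot (C_0 t), f_i is the fraction of the genome belonging to kinetic class i, and k_i is that class's rate constant. Each class is half reassociated at C_0 t = 1/k_i, so midpoints separated by a factor of 100 000, as in Figure 1, correspond under this model to rate constants that differ by the same factor.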
been repeated cycles of expression and sequence drift within specific families.

Divergence among satellite or interspersed repeats can be measured by Cot analysis. Diverged sequence families will reassociate to form duplexes containing mismatched base pairs. The greater the average divergence within a family, the more mismatches will occur in the reassociated duplexes. This mismatch can be estimated from the reduction in dissociation temperature that the reassociated duplexes show, compared with native DNA. In a typical measurement, a radioactively labelled recombinant DNA clone containing a member of the repeat family is dissociated, mixed with a great excess of dissociated genomic DNA, and allowed to reassociate. The dissociation profile of the reassociated duplexes is then compared with the dissociation profile of the native recombinant DNA clone.

Very large DNA repeat families tend to contain many diverged members, suggesting perhaps that significant time is needed for sequence families to acquire very many copies. Nevertheless, families of the same size can have very different average divergences. Less-divergent families are usually thought to be newer in origin; however, it is possible instead that divergence has been suppressed by natural selection or some other influence.
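The estimate of mismatch from the reduction in dissociation temperature can be sketched as a linear conversion. The default coefficient of 1 °C per 1% mismatch is a commonly quoted rule of thumb and is an assumption of this example, not a value given in the text; in practice it would be calibrated for the experimental conditions. The function and parameter names are hypothetical.

```python
def percent_mismatch(tm_native_c: float, tm_reassociated_c: float,
                     deg_c_per_percent: float = 1.0) -> float:
    """Estimate percentage base mismatch from the depression of the dissociation temperature.

    deg_c_per_percent is the assumed drop in dissociation temperature (deg C)
    per 1% mismatched base pairs. The default of 1.0 is a rule-of-thumb
    assumption and should be replaced by a calibrated value where available.
    """
    if deg_c_per_percent <= 0:
        raise ValueError("deg_c_per_percent must be positive")
    depression = tm_native_c - tm_reassociated_c
    return depression / deg_c_per_percent
```

For example, reassociated duplexes that dissociate 5 °C below the native clone would be scored as roughly 5% mismatched under the default assumption.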
Still other species have what could be termed an intermediate period interspersion pattern. Among these are chickens and ducks.
cated. On the other hand, the chorion gene region of D. melanogaster is overreplicated at least 10-fold in late-stage egg chambers. There are many species where ribosomal genes are overreplicated in some tissue or developmental stage. In addition, human cancers often contain overreplicated regions of DNA that affect their behaviour. Measurement of uneven replication is done by DNA hybridization and Cot analysis.
[Figure 2 plot: fraction of DNA in duplex form (0 to 0.9) against temperature (40 to 65 °C).]
Figure 2 Dissociation profiles reveal sequence divergence within a species. For simplicity, the species is considered to be haploid and to have only single-copy DNA sequences. In assay no. 1, DNA from several individuals is dissociated, reassociated separately to a specified Cot value, mixed and then subjected to a dissociation profile (red line). In assay no. 2, the DNA samples are first mixed, then dissociated, reassociated to the specified Cot value and subjected to a dissociation profile (blue line). In assay no. 1, all duplexes are composed of single strands from the same individual. In assay no. 2, most duplexes (the fraction depending on the number of individuals tested) are composed of single strands from different individuals. The duplexes derived from different individuals contain base mismatches absent from duplexes derived from a single individual. The decrease in thermal stability that occurs when the DNA is from different individuals gauges DNA sequence polymorphism within the species. This technique also has been used to gauge sequence divergence in diploid species containing repetitive DNA.
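Comparing two dissociation profiles of the kind shown in Figure 2 comes down to comparing their midpoint temperatures. The sketch below finds the temperature at which the duplex fraction has fallen to half its starting value, by linear interpolation between measured points. The function name and the two example profiles are hypothetical and are not data from the figure.

```python
from typing import Sequence


def midpoint_temperature(temps_c: Sequence[float], duplex_fraction: Sequence[float]) -> float:
    """Temperature at which the duplex fraction falls to half its initial value.

    temps_c must be in increasing order; linear interpolation is used between
    the two measurements that bracket the half-way point.
    """
    if len(temps_c) != len(duplex_fraction) or len(temps_c) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    half = duplex_fraction[0] / 2.0
    for i in range(1, len(temps_c)):
        f0, f1 = duplex_fraction[i - 1], duplex_fraction[i]
        if f0 >= half >= f1 and f0 != f1:
            t0, t1 = temps_c[i - 1], temps_c[i]
            return t0 + (f0 - half) * (t1 - t0) / (f0 - f1)
    raise ValueError("profile never falls to half of its starting value")


# Hypothetical profiles (not data from Figure 2).
temps = [40, 45, 50, 55, 60, 65]
same_individual = [0.9, 0.9, 0.8, 0.5, 0.1, 0.0]
mixed_individuals = [0.9, 0.8, 0.5, 0.2, 0.0, 0.0]
shift_c = midpoint_temperature(temps, same_individual) - midpoint_temperature(temps, mixed_individuals)
```

The difference `shift_c` is the decrease in thermal stability that the caption describes as a gauge of sequence polymorphism.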
Often, interspecies comparisons are restricted to single-copy and low-copy DNA. This genomic fraction contains most of the genes and its Cot behaviour is relatively easy to interpret. However, highly repetitive sequences have also been used in interspecies comparisons.

Cot analysis of closely related species has also shed light on evolution itself. The sequence divergence between humans and chimpanzees, for example, is less than 2% overall and only about 0.5% in the active coding sequences of functional nuclear genes. These surprisingly low values limit explanations of what the genetic difference between humans and chimpanzees is, and how it evolved.

In recent years, Cot analysis has been used to address more subtle evolutionary questions. In one study of parasitic tapeworms in ocean fish, in which seven tapeworm species are each highly specific to a single host, it was concluded that there is much less genetic variation between the tapeworm species than between the hosts (Verneau et al., 1997). This suggests that the tapeworm species have not coexisted with their hosts for long evolutionary periods, and that there has instead been host-switching in relatively recent times.

Although direct analysis of DNA sequences has largely replaced Cot analysis in evolutionary studies, in some cases the two methods complement each other. In a recent construction of a phylogenetic tree for stork species (Slikas, 1997), DNA-DNA hybridization measurements and comparisons of a mitochondrial gene sequence each provided information that the other method could not. DNA-DNA hybridization measurements resolved relationships between distantly related species, but could not resolve relationships between more closely related species. Sequence comparisons of the mitochondrial gene used (cytochrome b) resolved relationships between closely related species, but not between more distantly related species.
A fourth is to measure the concentration of specific sequences of interest, such as the amount of a tumour virus RNA present in tumours (Jaenisch et al., 1975).

Cot technology has largely been replaced by DNA sequencing. Although DNA sequence information is harder to obtain, it allows weak relationships between sequences to be detected more reliably. It also allows results from different experiments and laboratories to be compared, and it provides more clues about DNA function. Nevertheless, as the examples above indicate, there are still important uses for Cot techniques.
References
Britten RJ and Kohne DE (1970) Repeated segments of DNA. Scientific American 222(4): 24-31.
Craig JM, Kraus J and Cremer T (1997) Removal of repetitive sequences from FISH probes using PCR-assisted affinity chromatography. Human Genetics 100(3-4): 472-476.
Daniell E, Kohne DE and Abelson J (1975) Characterization of the inhomogeneous DNA in virions of the bacteriophage Mu by DNA reannealing kinetics. Journal of Virology 15(4): 739-743.
Galau GA, Klein WH, Davis MM et al. (1976) Structural gene sets active in embryos and adult tissues of the sea urchin. Cell 7(4): 487-505.
Green MR, Chinnadurai G, Mackey JK and Green M (1976) A unique pattern of integrated viral genes in hamster cells transformed by highly oncogenic human adenovirus 12. Cell 7(3): 419-428.
Jaenisch R, Fan H and Croker B (1975) Infection of preimplantation mouse embryos and of newborn mice with leukemia virus: tissue distribution of viral DNA and RNA and leukemogenesis in the adult animal. Proceedings of the National Academy of Sciences of the USA 72(10): 4008-4012.
Landegent JE, Jansen in de Wal N, Dirks RW, Baas F and van der Ploeg M (1987) Use of whole cosmid cloned genomic sequences for chromosomal localization by non-radioactive in situ hybridization. Human Genetics 77(4): 366-370.
Nathans J, Merbs SL, Sung C-H, Weitz CJ and Wang Y (1992) Molecular genetics of human visual pigments. Annual Review of Genetics 26: 403-424.
Sibley CG, Comstock JA and Ahlquist JE (1990) DNA hybridization evidence of hominid phylogeny: a reanalysis of the data. Journal of Molecular Evolution 30(3): 202-236.
Slikas B (1997) Phylogeny of the avian family Ciconiidae (storks) based on cytochrome b sequences and DNA-DNA hybridization distances. Molecular Phylogenetics and Evolution 8(3): 275-300.
Verneau O, Catzeflis FM and Renaud F (1997) Molecular relationships between closely-related species of Bothriocephalus (Cestoda: Platyhelminthes). Molecular Phylogenetics and Evolution 7(2): 201-207.
Further Reading
Britten RJ and Davidson EH (1976) Studies on nucleic acid reassociation kinetics: empirical equations describing DNA reassociation. Proceedings of the National Academy of Sciences of the USA 73(2): 415-419.
Britten RJ, Graham DE and Neufeld BR (1974) Analysis of repeating DNA sequences by reassociation. Methods in Enzymology 29: 363-418.