
Molecular Microbiology (2003) 49(2), 277–300
doi:10.1046/j.1365-2958.2003.03580.x
© 2003 Blackwell Publishing Ltd

MicroReview
Prophages and bacterial genomics: what have we learned so far?


Sherwood Casjens*
Department of Pathology, University of Utah Medical School, 30 North 1900 East, Salt Lake City, UT 84132-2501, USA.

Epigraph
"There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact." Mark Twain (1883), Life on the Mississippi

Summary
Bacterial genome nucleotide sequences are being completed at a rapid and increasing rate. Integrated virus genomes (prophages) are common in such genomes. Fifty-one of the 82 such genomes published to date carry prophages, and these contain 230 recognizable putative prophages. Prophages can constitute as much as 10–20% of a bacterium's genome and are major contributors to differences between individuals within species. Many of these prophages appear to be defective and are in a state of mutational decay. Prophages, including defective ones, can contribute important biological properties to their bacterial hosts. Therefore, if we are to comprehend bacterial genomes fully, it is essential that we are able to recognize accurately and understand their prophages from nucleotide sequence analysis. Analysis of the evolution of prophages can shed light on the evolution of both bacteriophages and their hosts. Comparison of the Rac prophages in the sequenced genomes of three Escherichia coli strains and the Pnm prophages in two Neisseria meningitidis strains suggests that some prophages can lie in residence for very long times, perhaps millions of years, and that recombination events have occurred between related prophages that reside at different locations in a bacterium's genome. In addition, many genes in defective prophages remain functional, so a significant portion of the temperate bacteriophage gene pool resides in prophages.
Accepted 3 April, 2003. *For correspondence. E-mail sherwood.casjens@path.utah.edu; Tel. (+1) 801 581 5980; Fax (+1) 801 581 3607.

Prophage biology

The genomes of cellular organisms are often littered with both functional and defunct viral chromosomes. For example, the human genome is about 8% retrovirus genes (Lander et al., 2001), and some bacterial genomes may be composed of as much as 20% bacteriophage genes (Casjens et al., 2000). Clearly, in order to understand these genomes completely, we must be able to recognize these viral genes and understand any effects they may have on the host cells.

Bacteriophages, the viruses that infect bacteria, are extremely varied. Different types of phage virions may carry single- or double-stranded (ds)DNA or RNA, and the details of their replication cycles reflect this diversity. The dsDNA phages, the subject of this review, can be grossly divided into lytic and temperate virus groups, each of which is extremely diverse. Lytic dsDNA phages infect bacterial cells and always programme the synthesis of progeny virions, which are then released from the dead, infected cell. Temperate dsDNA phages, on the other hand, although they are able to propagate lytically under some circumstances, are also able to establish a stable relationship with their host bacteria in which the phage DNA is replicated in concert with the host's chromosome, and virus genes that are detrimental to the host are not expressed. This long-term, apparently benign, association of bacteriophages with bacterial cells was first described in the 1920s (Gildmeister and Herzberg, 1924; Bail, 1925; Bordet, 1925), but its acceptance and an understanding of the real nature of this association took many years (Lwoff, 1953; 1966). Subsequent work has shown that, during this association, the phage DNA (now called the prophage) is usually physically integrated into one of the native replicons of the host (Campbell, 1962; Freifelder and Meselson, 1970); however, a few phages, such as P1, N15, LE1, φ20 and φBB-1, are not integrated and exist as circular or linear plasmids (Ikeda and Tomizawa, 1968; Ravin and Shulga, 1970; Inal and Karunakaran, 1996; Eggers et al., 2000; Girons et al., 2000). Different individuals of a given integrating temperate phage always have the same unique integration site on the phage chromosome, but may or may not always integrate their DNA at precisely the same site in the bacterial chromosome. In Escherichia coli, for example, phage λ DNA normally integrates at only one site, phage P2 DNA can quite readily integrate into at least 10 sites (Barreiro and Haggard-Ljungquist, 1992), and phage Mu DNA integrates essentially randomly into host DNA (Harshey, 1988).

Bacteriophage virions can be released from cells containing an intact prophage by a process called induction, during which prophage genes required for lytic growth are turned on and progeny virions are produced and released from the cell. Cells carrying a prophage are called lysogens because of this potential to induce and lyse. Induction can happen spontaneously and randomly in a small fraction of the bacteria that harbour a given prophage, or specific environmental signals can cause simultaneous induction of a particular prophage in many cells. A number of the important model system dsDNA tailed phages were first discovered after they were released from lysogenic bacteria in the laboratory; for example, phages λ (Lederberg, 1951), P22 (Zinder and Lederberg, 1952), P1 and P2 from the same E. coli strain (Bertani, 1951), P4 (Six, 1963) and N15 (Ravin, 1968) were originally isolated in this manner. Most genes, including those required for lytic growth and virion production, are turned off in integrated prophages but, in the few studied cases, plasmid prophages typically express most of their non-lysis, non-virion assembly genes.
Some of the genes that are expressed from the prophage in a lysogen are lysogenic conversion genes, which alter the properties of the host bacterium. The products of these genes can have very important effects on the host bacterium, ranging from protection against further phage infection to increased virulence of a pathogenic host. This subject has been reviewed frequently and recently and will not be covered in depth here (see Bishai and Murphy, 1988; Cheetham and Katz, 1995; Waldor, 1998; Miao and Miller, 1999; Boyd et al., 2001; Banks et al., 2002; Boyd and Brussow, 2002; Wagner and Waldor, 2002; Casjens and Hendrix, 2003). The presence or absence of prophages can account for a large fraction of the variation among individuals within a bacterial species, and phages are likely to be important vehicles for horizontal transfer of genetic information between bacteria (Ohnishi et al., 2001; Banks et al., 2002; Casjens and Hendrix, 2003). Clearly, in order to understand fully the information in bacterial whole-genome nucleotide sequences, it is essential that we be able to recognize and understand prophages when they are present. The medical and evolutionary importance of prophages makes this all the more urgent.

Types of prophages and related entities

Fully functional prophages can be induced to initiate a round of lytic growth; however, not all prophage-like entities in bacterial genomes encode functional bacteriophages. Four additional types of prophage-related entities have been characterized: defective prophages, satellite prophages, bacteriocins and gene transfer agents. (i) Defective prophages (sometimes called cryptic prophages, although in theory this term could include fully functional prophages that have never been induced to lytic growth) are prophages that are in a state of mutational decay. Although they may still harbour functional genes, defective prophages are unable to programme the full phage replication cycle (reviewed by Campbell, 1994; 1996).
Several defective prophages in E. coli K-12, Rac (Kaiser and Murray, 1979), e14 (Greener and Hill, 1980), DLP12 (Lindsey et al., 1989) and QIN (Espion et al., 1983) (Table 1), and in Bacillus subtilis 168, PBSX (Krogh et al., 1996) and SKIN (Takemaru et al., 1995; Mizuno et al., 1996), were discovered before genomic sequencing became possible and have been studied in some detail. Each of these harbours some functional genes. For example, Rac encodes the RecE homologous recombination system (Kaiser and Murray, 1979), QIN harbours intact cell lysis genes (Espion et al., 1983), and PBSX encodes the synthesis of a virion-like particle (Okamato et al., 1968). (ii) Satellite phages are otherwise functional phages that do not carry their own virion structural protein genes, and have chromosomes that have been evolutionarily designed to be encapsidated by the virion proteins of other specific phages. The best understood example of such a parasitic relationship is that between satellite phage P4 and fully functional phage P2 (see also Ruzin et al., 2001). P4 carries genes encoding proteins that replicate its own DNA, turn on the virion protein genes of the P2 prophage, and modify the P2 head so that it is smaller and can accommodate only the smaller P4 chromosome (Bertani and Six, 1988). (iii) Some bacteria produce bacteriocins (devices that kill other bacteria) that resemble phage tails (e.g. Gratia, 1989; Thaler et al., 1995; Zink et al., 1995; Nguyen et al., 1999; Nakayama et al., 2000). Two of these that have been characterized, the type F and R bacteriocins of Pseudomonas aeruginosa PAO1, are similar to phage λ tails and phage P2 tails respectively (Nakayama et al., 2000). The gene clusters encoding them have nearly complete sets of λ and P2 tail gene homologues in nearly the same order as they are found in those phages. (iv) Finally, gene transfer agents (GTAs) are encoded by some bacterial genomes (Yen et al., 1979; Starich et al., 1985; Rapp and Wall, 1987; Humphrey et al., 1997). GTAs are tailed phage-like particles that encapsidate random fragments of the bacterial genome. These particles cannot propagate as viruses, as the vast majority of the particles do not carry the genes that encode the GTA and, in the cases that have been studied, those that do carry them contain a DNA fragment that is too short to include the full set of GTA genes. These virion-like particles can deliver their DNA payload into another bacterium of the same species, where the DNA can replace the resident cognate chromosomal region by homologous recombination. The best characterized GTA is encoded by a cluster of genes on the Rhodobacter capsulatus chromosome (Lang et al., 2000; Lang and Beatty, 2001). Although not all the proteins encoded by the genes in this GTA cluster have been characterized in detail, the number of genes involved makes it likely that the cluster contains the genes for the structural components of the virion-like particles and little else.

Do the tail-like bacteriocins and GTAs have a positively selected function, or are they simply defective prophages that happen by chance to be able to perform these functions and serve no important purpose for the host? There are several arguments for such a selected function. (i) They are often universally present in the species that harbour them. The Brachyspira hyodysenteriae GTA has been found in every isolate of that species that has been examined (T. Stanton and G. Thompson, personal communication), as has the R. capsulatus GTA (Wall et al., 1975), and the F and R bacteriocins were present in all nine P. aeruginosa strains examined (Nakayama et al., 2000). (ii) They do not appear to be in a state of evolutionary decay, as pseudogenes (used here to mean any mutationally inactivated gene) have not been identified within them. (iii) Expression of their genes appears to be regulated differently from that of the phages to which they are related (Nakayama et al., 2000). In spite of this accumulated knowledge, it is often not possible to distinguish among functional prophages and these prophage-like entities by simply examining their nucleotide sequences. For example, a tail gene cluster in a bacterial chromosome could encode a bacteriocin or simply be what remains of a partly deleted prophage. Induced PBSX encapsulates host DNA, and its virion-like particles kill B. subtilis cells that do not carry PBSX (McDonnell et al., 1994) (it has not been demonstrated to be able to transduce other bacteria with its packaged DNA, but this remains a possibility). Is PBSX a GTA, a bacteriocin or a decaying prophage? Because of such current unknowables, in this discussion I will usually not attempt to distinguish fully functional prophages from defective prophages, satellite prophages, GTAs or phage-like bacteriocins, and will include them all within the term prophage. I will only consider the temperate dsDNA tailed phages of bacteria, although temperate phages with ssDNA-containing filamentous virions are known that integrate as dsDNA prophages (Waldor and Mekalanos, 1996; Chang et al., 1998; Davis et al., 1999; Lin et al., 2001; da Silva et al., 2002), and not yet well-studied lytic and temperate dsDNA tailed phages that infect Archaea are known (for example, see Pfister et al., 1998; Klein et al., 2002; Tang et al., 2002).

Table 1. Prophages in three E. coli genomes.
E. coli K-12 (a): CP4-6 (b); DLP-12; λ (c); e14; Rac; QIN; CP4-44 (b); PR-X; CPS-53 (d); Eut (c); CP4-57 (b)
E. coli O157 EDL933 (a): CP-933I, CP-933H; CP-933K; CP-933M; 933W; CP-933N; CP-933C; CP-933X (2?); CP-933O (24); CP-933R; CP-933P; CP-933T; CP-933U; CP-933V; CP-22 (d); CP-933Y
E. coli O157 Sakai (a): Sp1, Sp2; Sp3; Sp4; Sp5; Sp6; Sp7; Sp8; Sp9; Sp10; Sp11, Sp12; Sp13; Sp14; SpLE2; Sp15; Sp16; Sp17; Sp18
Phage type: Lambdoid, P4-like; Lambdoid; Lambdoid; Lambdoid; Lambdoid; Lambdoid; Unstudied type; Lambdoid; Lambdoid; Lambdoid; Lambdoid; Somewhat P2-like; Lambdoid; Unclear; P2-like, highly deleted; Lambdoid; P22-like, highly deleted; P22-like, highly deleted; Lambdoid; Mu-like

a. Each row represents a different integration site. In some cases (e.g. the QIN site), rearrangements have made it difficult to tell whether they have identical attachment sites. This list was compiled from the following publications and references therein: Blattner et al. (1997); Rudd (1999); Hayashi et al. (2001b); Ohnishi et al. (2001); Perna et al. (2001; 2002). Elements are listed in order clockwise around the standard E. coli map; see Table S1 in Supplementary material for a list of the genes thought to lie within each prophage. Only λ and 933W have been shown to be fully functional phage genomes. Duplicate morphogenesis functions suggest that CP-933X may have evolved from two original prophages. The correspondence between Sp9 and Sp11 + Sp12 and CP-933O and CP-933P, respectively, is complex because of an inversion in the EDL933 strain lineage that involved these prophages, and other rearrangements that have occurred among the prophages (Perna et al., 2002).
b. These elements are possibly phage derived, but do not carry any uniquely phage-derived genes. CP4-6, CP4-44 and CP4-57 of K-12 and SpLE2 of Sakai are probably phage derived, but convincing proof of this is lacking (see text); in Sakai, SpLE1 and SpLE4, not shown in this table, have some similarity to the CP4 elements (Blattner et al., 1997; Rudd, 1999; Hayashi et al., 2001b). The CP4 elements are not closely related to the prophages at the same location in the other strains.
c. Phage λ was cured from the sequenced version of E. coli K-12 (Blattner et al., 1997); Eut [also called CPZ-55 (Rudd, 1999) and CP-unnamed (Hayashi et al., 2001b)] is missing from some extant K-12 laboratory strains (Kofoid et al., 1999); Rac and e14 are also excisable (Evans et al., 1979; Brody et al., 1985).
d. CP-22 is a provisional name for a region not formally identified as a prophage by Perna et al. (2001). CPS-53 (Rudd, 1999) has also been called KpLE1.

Prophage abundance

Should we expect prophages to be present in bacterial genome sequences and, if so, how many? In addition to the anecdotal observation that many of the phages currently under study were isolated after their release from lysogenic bacteria, more systematic studies have indicated that prophages can be very common. Osawa et al. (2000) found that 51 different functional phages were released from 27 E. coli strains, and Schicklmaier et al. (1998) found that 83 of 107 E. coli strains released at least one functional phage type. Schmieger and coworkers (Schicklmaier et al., 1998; Schmieger and Schicklmaier, 1999) examined 173 Salmonella enterica (serovar Typhimurium) isolates and found that 136 released functional phages. Indeed, the LT2 isolate of S. enterica that is commonly used in laboratory studies carries four intact, fully functional prophages (Yamamoto, 1967; 1969; Figueroa-Bossi and Bossi, 1999; McClelland et al., 2001).
Mitomycin C was found to induce the synthesis of functional phages from seven of 170 Yersinia strains (Popp et al., 2000) and of phages or phage-like particles from 38 of 68 Gram-positive dairy Streptococcus strains (Huggins and Sandine, 1977). Of course, all such searches give only a minimum count of functional prophages, as they depend upon successful induction and the use of permissive indicator strains. Other studies have asked about the presence of particular prophage features in multiple isolates of the same bacterial species. In the E. coli chromosome, the attachment site of the λ-like (lambdoid) phage 21 is occupied by phage-like sequences in 28 of 77 strains examined (Wang et al., 1997), the lambdoid phage Atlas attachment site is occupied in 23 of 72 strains examined (Milkman and Bridges, 1990; Sandt and Hill, 2000), and four of 33 strains examined have something (probably λ-like in two cases) inserted at the phage λ attachment site (Kuhn and Campbell, 2001). Hybridization of DNA from various bacterial strains with authentic phage or prophage DNA probes has shown that related prophages are often present in a substantial fraction of other isolates of the same species [a few of the many such analyses are as follows: Gram-negative enterobacteria (Anilionis et al., 1980; Lindsey et al., 1989; Faubladier and Bouche, 1994; Agron et al., 2001), Wolbachia (Masui et al., 2000) and Haemophilus (Chang et al., 2000); the spirochaete Borrelia (Casjens et al., 1997); Gram-positive Streptococcus (Ramirez et al., 1999; Beres et al., 2002; Smoot et al., 2002) and diphtheria-causing Corynebacterium (Pappenheimer and Murphy, 1983)]. Finally, a substantial fraction of searches for strain-specific bacterial sequences for use in the typing of related bacterial isolates have found prophage sequences [e.g. enterobacteria (Emmerth et al., 1999; McClelland et al., 2000), Campylobacter (Dep et al., 2001), Neisseria (Klee et al., 2000) and Lactobacillus (Brandt et al., 2001)].
Clearly, prophages are common in many, widely diverse bacterial species.

A plethora of putative prophages in bacterial genome sequences

In spite of this anecdotal evidence that prophages can be common, their abundance in bacterial genome sequences came as a bit of a surprise to many microbiologists. In the 14 published γ-Proteobacteria genomes (the bacterial group whose phages are the best studied, and in which prophages are therefore most easily recognized), the number of convincing prophages is high. Eleven of these genomes, those of S. enterica serovars Typhi and Typhimurium, two Yersinia pestis strains, Shigella flexneri, two Xylella fastidiosa strains and four E. coli strains, each carry between seven and 20 prophages (Blattner et al., 1997; Simpson et al., 2000; Hayashi et al., 2001a; McClelland et al., 2001; Parkhill et al., 2001a,b; Perna et al., 2001; Deng et al., 2002; Jin et al., 2002; Welch et al., 2002; Van Sluys et al., 2003), and the Shewanella oneidensis, Xanthomonas axonopodis and Xanthomonas campestris genomes contain three, two and one recognized prophages respectively (Heidelberg et al., 2002; da Silva et al., 2002). Bacteria from other phyla also often harbour multiple prophages. For example, among the Gram-positive bacteria, the sequenced genomes of B. subtilis, Clostridium acetobutylicum, Clostridium perfringens, Clostridium tetani, Lactococcus lactis, Listeria innocua, Listeria monocytogenes, Staphylococcus aureus and Streptococcus pyogenes strains all carry multiple, easily recognizable and, in many cases, largely intact prophages (Kunst et al., 1997; Bolotin et al., 2001; Ferretti et al., 2001; Glaser et al., 2001; Kuroda et al., 2001; Nolling et al., 2001; Beres et al., 2002; Shimizu et al., 2002; Smoot et al., 2002; Bruggemann et al., 2003). The phages that infect B. subtilis and L. lactis are the best studied in this rather diverse group. B. subtilis 168 contains three very convincing and largely intact prophages plus at least two smaller possible prophage remnants. Of its three unambiguous prophages, one, SPβ, is a fully functional 134 kbp phage genome (Lazarevic et al., 1999; it is the largest known temperate phage), whereas the other two, PBSX and SKIN, are defective (Krogh et al., 1996; Mizuno et al., 1996). At least two of the six L. lactis IL1403 prophages are fully functional (Chopin et al., 1989; 2001). Prophages can make up a significant fraction of these genomes; E. coli O157 Sakai's 18 recognized prophages make up about 12% of its chromosome (Ohnishi et al., 2001), and the six prophages in Streptococcus pyogenes M3 MGAS315 make up about 12% of its chromosome (Beres et al., 2002). Phages of other phyla have been studied in less detail, but the multiple plasmid prophages of the spirochaete Borrelia burgdorferi B31 may constitute as much as 20% of its genome (Casjens et al., 2000). I emphasize that, although it is clear that many prophages are present in bacterial genomes, our current knowledge is far from complete, and some of the interpretations made here may have to be revised in the future.

Although prophages are common in bacterial genomes, they have not been found in every individual or in every species. Among the 82 currently published and annotated bacterial genome sequences, 51 harbour apparent prophages and, of these, all but two have integrated prophages. At least 230 prophages are currently recognizable in these 51 genomes. These prophages are listed, along with the genes that they encompass, in Table S1 in the Supplementary material. As even the most conserved phage-specific genes (below) are not always recognizable with current methods or might have been deleted, this is a minimum estimate, especially in bacterial phyla in which phages have not been studied in detail.
The 31 bacterial genome sequences that contain no recognized prophages are largely clustered at the lower end of the bacterial genome size range (Fig. 1). Two of the smallest genomes that have prophages, B. burgdorferi B31 and Chlamydia pneumoniae AR39, are exceptions that prove the rule, in that the prophages they harbour are plasmids (Casjens et al., 2000; Read et al., 2000). The absence of integrated prophages in small-genome bacteria could reflect the evolutionary pressure to remove non-essential chromosomal DNA that led to the reduction in the size of their genomes (Lawrence et al., 2001). A few of the larger bacterial genome sequences, for example those of high G+C Gram-positive bacteria such as Mycobacterium (4.4 Mbp) and Streptomyces (9.07 Mbp), have relatively few convincing prophages (Fleischmann et al., 2002; Bentley et al., 2002). In addition, P. aeruginosa PAO1 (6.3 Mbp) carries only two tail-like bacteriocins, and Sinorhizobium meliloti 1021 (6.7 Mbp) has no recognized prophages (Stover et al., 2000; Galibert et al., 2001). In some cases, temperate phages that infect these bacteria are known, making it less likely (but not impossible) that prophages are present in the genomes but remain unrecognized. For example, temperate phage φC31 of Streptomyces has been characterized (Smith et al., 1999), and P. aeruginosa phages are known that are similar to the well-studied E. coli phages λ and P2 [e.g. phages D3 (Kropinski, 2000) and φCTX (Nakayama et al., 1999) respectively]. Perhaps some bacteria have devised mechanisms to avoid such parasites or, by chance, individuals with no integrated phage genomes were chosen for sequencing. It should also be noted that, if laboratory bacterial growth conditions cause frequent induction of a resident prophage, this will impose an artificial selection for derivatives that have lost the prophage. This has apparently happened for the prophages Gifsy-1 and Gifsy-2 in some laboratory strains of S. enterica LT2 (Bunny et al., 2002).

Fig. 1. Putative prophages in sequenced bacterial genomes. The number of recognizable prophages in each of the 82 published bacterial genome sequences is indicated. Closed circles represent genomes with only integrated prophages, and open circles indicate genomes with prophage plasmids (Borrelia burgdorferi B31, 12 prophages; Chlamydia pneumoniae AR39, one prophage). These probably represent minimum prophage numbers, as some may not be currently recognizable. The individual prophages in each genome sequence are delineated in Table S1 (Supplementary material).


The genetic structure of prophages

As nearly 100 complete sequences of fully functional dsDNA tailed phage genomes have been determined, it might seem a trivial exercise to search bacterial genome sequences for homologues of known phage genes and thus identify prophages; however, there are confounding factors. The most important of these is the extreme diversity of the dsDNA tailed phages (e.g. Casjens et al., 1992; Hendrix et al., 1999). The phages that infect the enteric bacteria E. coli and Salmonella are the most intensively studied. Yet even today, the sequence of a new phage that is closely related to the well-characterized phages of these hosts is expected to include novel genes. For example, our recently determined sequence of the genome of phage ES18, a typical lambdoid phage that infects S. enterica (serovar Typhimurium), has about 20 novel genes out of 75 total predicted genes (M. Pedulla, R. Hendrix, G. Hatfull and S. Casjens, unpublished). Prophages in less well-studied bacterial phyla can be expected to contain a majority of novel genes (e.g. 40 of 52 predicted genes in the convincing prophage RadMu in the Deinococcus radiodurans R1 genome have no known homologue; Morgan et al., 2002). The genomes of most phages that are closely related to one another can be described as having a mosaic relationship, as comparison of any two individuals shows patches of (sometimes very high) sequence similarity separated by non-homologous regions. The notion that such mosaicism has arisen by horizontal transfer of genetic material among the tailed phages has been discussed extensively (Susskind and Botstein, 1978; Botstein, 1980; Campbell and Botstein, 1983; Casjens et al., 1992; Campbell, 1994; 1996; Hendrix et al., 1999; Lucchini et al., 1999; Juhala et al., 2000; Moreira, 2000; Desiere et al., 2001; Brussow and Hendrix, 2002; Lawrence et al., 2002).
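The mosaic pattern just described, patches of high similarity separated by non-homologous stretches, can be pictured with a sliding-window identity scan over two aligned sequences. The Python sketch below is a toy illustration only. It assumes the sequences are already aligned to equal length, with gaps written as "-". The function name `similar_patches`, the window size and the identity threshold are hypothetical choices, not parameters from the studies cited above.

```python
def similar_patches(aln_a: str, aln_b: str, window: int = 4,
                    min_identity: float = 0.75) -> list[tuple[int, int]]:
    """Return half-open (start, end) column intervals covered by windows whose
    identity reaches min_identity; overlapping windows are merged into patches.

    Hypothetical helper for illustration: window and threshold are arbitrary
    assumptions, not values used in any cited study.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    patches: list[tuple[int, int]] = []
    for start in range(len(aln_a) - window + 1):
        end = start + window
        # Count identical, non-gap columns inside this window.
        matches = sum(
            1 for a, b in zip(aln_a[start:end], aln_b[start:end])
            if a == b and a != "-"
        )
        if matches / window >= min_identity:
            # Extend the previous patch if this window overlaps or touches it.
            if patches and start <= patches[-1][1]:
                patches[-1] = (patches[-1][0], max(patches[-1][1], end))
            else:
                patches.append((start, end))
    return patches


# Toy "genomes": shared left and right segments, unrelated middle segment.
left, right = "A" * 8, "G" * 8
genome_1 = left + "C" * 8 + right
genome_2 = left + "T" * 8 + right
print(similar_patches(genome_1, genome_2, min_identity=1.0))  # [(0, 8), (16, 24)]
```

With the default threshold of 0.75, the same call returns [(0, 9), (15, 24)]. This happens because windows that straddle a boundary still pass, so the reported patch edges depend on the window and threshold chosen.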
Such mosaicism is strikingly demonstrated by the relationships among the well-studied phages λ, P22 and N15, all of which have historically been included in the lambdoid phage group. Figure 2 shows that P22 and λ have similar but mosaically related right halves (early regions) but very different left halves (late operon/virion protein genes), whereas N15 and λ have very similar left halves and little similarity in their right halves. A curious result of this is that P22 and N15 are both considered to be lambdoid phages, but they are almost completely non-homologous and only distantly related in their few homologous genes (Ravin et al., 2000). The genetic diversity of phages has been studied only among those that infect the Gram-negative γ-Proteobacteria and the Gram-positive Firmicutes, and these studies are far from reaching saturation. Nonetheless, comparison of phages with very similar transcriptional programmes that infect γ-Proteobacteria, such as the lambdoid phages of E. coli, phages P22, Gifsy-1, Gifsy-2, Fels1 and ES18 of S. enterica (McClelland et al., 2000; Pedulla et al., 2003; S. Casjens, R. Hendrix and M. Pedulla, unpublished), Sf6 and SfV of S. flexneri (Allison et al., 2002; S. Casjens, A. J. Clark, W. Inwood and R. Moreno, unpublished), prophages XfP1 and XfP2 of X. fastidiosa (Simpson et al., 2000), prophage λSo of Shewanella oneidensis (Heidelberg et al., 2002) and phage D3 of P. aeruginosa (Kropinski, 2000), suggests that exchanges among them have taken place such that quite similar genes can be present even in distantly related phages within this group. There have also been recent exchanges of genetic material between very different phages that infect the same host. For example, the E. coli temperate phage λ and the large lytic phage T4 have tail fibre assembly genes that are similar in sequence and functionally interchangeable (George et al., 1983; Montag and Henning, 1987).
Although genes can be exchanged among distantly related phages with the same host and among phages with different host species, two phages of the same type are more likely (but not guaranteed) to have a higher proportion of closely related genes if they infect closely related hosts. The lessons for this discussion are that (i) horizontal exchanges are common among the dsDNA tailed phages, so it will not be surprising to find similar mosaic relationships among the prophages found in bacterial genome sequences; and (ii) prophages in the chromosomes of bacteria that are distantly related to the above two phyla may be very different from known phages and so be much more difficult to recognize.

Fig. 2. Temperate phage genome mosaicism: three unrelated lambdoid phages. The genes on the phage P22, λ and N15 virion chromosomes are shown as rectangles; grey rectangles are genes that are transcribed rightward and white rectangles are genes transcribed leftward. The three lytic operons are indicated by arrows below each genome. The ends of each phage's circularly permuted prophage are marked by a black vertical line. Sequence homology is indicated by the light grey areas between genomes.


Recognizing prophages in bacterial genome nucleotide sequence

Some, but not all, phage genome sequences per se have unique properties. For example, some prophages have G+C contents, oligonucleotide frequencies or codon usage that differ from those of their host's genome, but this type of analysis has not progressed to the point that it can unequivocally identify prophage sequences (Blaisdell et al., 1996). We must therefore identify prophages in bacterial genome sequences by the similarity of their genes to known phage genes. In spite of the fact that the dsDNA tailed phage genomes encompass an enormous amount of sequence diversity, some genes appear to be more highly conserved than others (below). These have served, and will continue to serve, as cornerstones for the identification of prophages in bacterial genomes (the range of diversity makes it imperative that sequence searches be done at the encoded protein level, and not at the DNA level). It would be useful if the phage gene families used to identify new prophages in DNA sequence had no non-phage-encoded members that perform non-phage functions, so that the mere presence of such cornerstone genes could prove that a region of a bacterial genome is phage derived.

Genes in prophages that do not encode virion components

Should phage genes such as those involved in integration, lysis, regulation of gene expression or DNA replication be considered prophage cornerstone genes? Integrases are usually sufficiently conserved to be recognizable, but plasmid prophages do not integrate, and non-phage elements such as plasmids, pathogenicity islands and integrons can carry integrase genes for their own purposes. Thus, although most temperate phages carry an integrase gene, its presence is neither necessary nor sufficient to prove the existence of a prophage.
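The recommendation earlier in this section that similarity searches be done at the encoded protein level rests partly on the redundancy of the genetic code: synonymous codon changes alter the DNA without altering the protein. The minimal Python sketch below makes the following assumptions: the standard genetic code, in-frame coding sequences of equal length, and no insertions or deletions. The helper names and the two toy genes are hypothetical. It shows two sequences that are only about two-thirds identical at the DNA level but encode the same peptide.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical genes that differ only by synonymous codon choices.
gene_1 = "ATGAAAGTTCTG"  # ATG AAA GTT CTG
gene_2 = "ATGAAGGTCTTA"  # ATG AAG GTC TTA
print(round(identity(gene_1, gene_2), 1))              # 66.7
print(translate(gene_1), translate(gene_2))            # MKVL MKVL
print(identity(translate(gene_1), translate(gene_2)))  # 100.0
```

Real searches use alignment tools that also tolerate indels and score conservative amino acid substitutions, which widens the gap between DNA-level and protein-level sensitivity further.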
Phage lysis enzymes are often true homologues of chicken egg white lysozyme but may be of other types, such as phage λ endolysin or phage amidases, or may be similar to other polysaccharide-degrading enzymes such as chitinases (Mediavilla et al., 2000). These proteins can be quite similar, even among distantly related phages, but some bacteria encode autolysins that are homologues of phage lysis enzymes. Autolysin genes often appear not to be in a prophage context (e.g. Whatmore and Dowson, 1999; Smith et al., 2000), and such enzymes might be used in normal bacterial cell wall remodelling. It is unknown whether these are ancient prophage relics that have since become useful parts of the bacterial genomes. Every host and many temperate phages encode their own DNA-binding proteins, nucleases, helicases and/or DNA polymerases that function in DNA metabolism, as well as regulatory proteins that control gene expression. The existence of non-prophage bacterial homologues to nearly all these genes shows that they also do not uniquely mark prophages (e.g. Lewis et al., 1998). No host homologue of the transcriptional antiterminators of the λ gene Q family is known, so these might mark some prophages. Families of homologous phage genes involved in the above processes may or may not form discrete phylogenetic clusters that are separable from their bacterial homologues; however, a very close relationship to a bona fide phage gene is likely to signify that the gene in question is part of a prophage. We will consider two examples, the phage-borne replicon-partitioning proteins and the single-strand DNA-binding proteins (SSBs). The sopA family of plasmid-partitioning genes on the prophage plasmids of E. coli phages N15 and P1 are not particularly close relatives; the N15 SopA protein is 60–75% identical to SopAs encoded by several non-prophage plasmids of enteric bacteria but is only 25% identical to its phage P1 homologue. On the other hand, the SSB protein of the S. flexneri lambdoid phage Sf6 (S. Casjens, unpublished) is a very close relative (93% identity) of the E. coli phage 1639 SSB (GenBank accession no. AJ304858), but is only moderately closely related to the SSBs of E. coli phage P1 (60%) and the non-phage SSBs of enterobacteria (58–62%); it is only distantly related to the SSBs of Gram-positive bacteria (22–30%) and their phages (A118, 29%; φPVL, 32%). Thus, when members of the same gene family are used in both phage and non-phage contexts, the phage and bacterial genes often do not fall into well-separated lineages. Because of these issues, and the variation in DNA metabolism, gene regulation and lysis mechanisms among phages, the presence of genes for these processes should be considered supportive evidence for the existence of a prophage, but not proof.
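The distinction drawn here between merely supportive genes and the virion assembly genes taken up in the next part of the text can be summarized as a simple decision rule. The sketch below is a hypothetical encoding, not a published pipeline: the label strings and the three-level scale are assumptions, and it deliberately ignores match strength, even though the text makes clear that it is a *close* relationship to a bona fide phage gene that counts.

```python
# Hypothetical function labels; in practice these would come from protein-level
# similarity searches, and the sets below are illustrative rather than exhaustive.
VIRION_ASSEMBLY = {"terminase_large", "portal", "head_protease", "coat",
                   "tail_shaft", "tapemeasure", "tail_fibre"}
SUPPORTIVE_ONLY = {"integrase", "endolysin", "holin", "ssb",
                   "q_antiterminator", "repressor"}


def evidence_level(labels: list[str]) -> str:
    """Classify a candidate region by the kinds of phage-like genes it contains."""
    found = set(labels)
    if found & VIRION_ASSEMBLY:
        # Close matches to virion assembly genes: the strongest indication.
        return "strong"
    if found & SUPPORTIVE_ONLY:
        # Integrase, lysis, DNA metabolism or regulatory genes alone:
        # supportive, but not sufficient to prove a prophage.
        return "supportive"
    return "none"
```

A region carrying only an integrase and an SSB homologue would be rated "supportive" here, in line with the conclusion above that such genes cannot by themselves establish a prophage.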
Virion protein genes as prophage indicator cornerstones

On the other hand, one might expect the genes that encode proteins involved in building the virion to be unique to phages, as bacterial cells are not known to make similar structures for their own purposes (again, here I include GTAs and tail-like bacteriocins as prophages), and this is indeed the case; phage morphogenetic genes usually do not have homologues that are known to perform unrelated functions in other contexts. Therefore, at our current state of knowledge, the presence in a bacterial genome of genes that are closely related to known phage morphogenetic genes is a virtually unassailable indication of a prophage. The icosahedral heads of the different tailed phages are extremely similar in physical appearance, although they do have different sizes and some are elongated. Similarly, tails are known in only three general morphotypes: short (e.g. phages P22 and T7), long and contractile (P2 and Mu), and long and non-contractile (λ), although details of tail structure sometimes allow recognition of subtypes within these general tail types (for reviews of phage virion structure and assembly, see Casjens and Hendrix, 1988; Casjens, 1997). However, the proteins that build the various structurally similar virions are at first glance startlingly diverse. For example, scaffolding protein (required catalytically for head shell assembly), proteins at the head–tail junction and proteins at the tail tip/baseplate are very often not recognizably similar among different phages. Even central virion assembly players such as the coat proteins (the building blocks of the icosahedral head shell) are often not recognizably similar. For example, the coat proteins of the very well-studied enterobacterial phages λ, P2, P22, HK97, Mu and T7 are not recognizably homologous even though their heads are virtually indistinguishable in appearance in the electron microscope. It is not known whether such diversity indicates that these are all truly unrelated proteins or whether these proteins are ancient homologues that have diverged to the point of having no recognizable amino acid sequence similarity. The recent observation that HK97 and P22 coat proteins have similar folds supports the latter idea for these two coat proteins (Jian et al., 2003). Nonetheless, some phage virion assembly proteins are more highly conserved than others, and homology of these genes can often be recognized between phage types.
These are as follows: (i) the larger of the two subunits of terminase, the enzyme that cleaves virion-length molecules from concatemeric replicating DNA and is probably part of the motor that drives DNA into the preformed protein capsid; (ii) portal protein, which forms the hole through which DNA is packaged into the capsid and is also part of the packaging motor; (iii) head maturation protease (the assembly of some, but not all, phage heads is accompanied by assembly-controlled proteolytic cleavage of virion proteins); (iv) coat protein (above); (v) the proteins that build the tail shaft; (vi) tail tapemeasure protein, which determines the length of the tail shaft in the long-tailed phages; and (vii) tail fibres, the tail tip proteins that make the initial contact between the virion and the bacterial surface. Although the above proteins appear to be more highly conserved than other virion assembly proteins, in no case have all known members of one of these functional protein types been shown to form a single protein sequence family. It is possible that some or all of these may coalesce into single groups as more phage genome sequences are determined. How confident can we be that weak or tenuous matches to virion assembly genes identify a prophage? The tail fibre proteins and tapemeasure proteins adopt extended, fibrous conformations, and they often contain imperfect amino acid sequence repeats that reflect these structures. These repeats are sometimes found to match other unrelated extended proteins such as myosin and collagen, as well as long coiled-coil proteins. For example, some phage tail fibres contain substantial numbers of the collagen Gly-X-Y repeat (Smith et al., 1998). In addition, the sequences of coat proteins, tail shaft proteins and head maturation proteases are somewhat more variable than those of the other proteins in this conserved protein list. Protease motifs can often be recognized in the proteases, but such motifs are not phage specific.
For all three of these protein types, similarity is sometimes found between distantly related phages, yet it is not uncommon to find no substantive similarity between otherwise rather close relatives. Probably the most universally conserved, and therefore the best, cornerstone proteins for prophage identification are the large terminase subunit and portal protein. If PSI-BLAST (Altschul et al., 1997) is used to build up related families of terminase and portal homologues from the current sequence database, a small number of currently unconnected families accumulate in both cases, and no convincing matches to these proteins are found that have a known non-phage function. Yet there are a few orphan homologues of terminase and portal genes present in bacterial genomes that have no other unequivocal phage genes nearby. For example, the Sinorhizobium meliloti 1021 genome contains an isolated, excellent homologue (gene SMc04187) of the phage P22 large terminase subunit (Galibert et al., 2001), and an orphan portal homologue (gene Spy0555) is present in the Streptococcus pyogenes M1 SF370 genome (Ferretti et al., 2001). The functions of these particular genes have not been studied. Are these all that remains of once functional prophages, or might they have other, as yet unknown, non-phage-related roles in these cases? At present, we do not know the answer to this question, but current information suggests that such a lone homologue may well be a relict prophage.

Subjective prophage criteria

Given the immense variation among phages and our incomplete knowledge of that variation, recognition of prophages can be a rather subjective and delicate art, especially as satellite prophages and partly deleted defective prophages may contain no morphogenetic cornerstone genes. However, there are less objective criteria that can contribute substantially to our confidence in prophage identification.
In spite of their diversity, the temperate phages appear to have settled on a limited number of transcriptional arrangements, and they tend to have operons that are longer than the average E. coli operon, presumably to allow the lytic genes to be turned off by repression at a small number of operators. Long operons can build confidence in prophage identification in bacteria such as E. coli, which have more or less randomly oriented genes, but are less useful in genomes whose genes are largely oriented in the direction of DNA replication, such as Clostridium (Shimizu et al., 2002) and Thermoanaerobacter (Bao et al., 2002). More importantly, phage genomes show striking gene clustering according to general function, and ordering according to detailed function within some of the clusters, and genes that encode DNA-interacting proteins usually lie near the DNA target of those proteins. For example, prophage integrase genes are essentially always adjacent to or very near the attachment (integration) site on the phage chromosome, and so they typically mark one end of integrated prophages. Of particular interest here is the observation that, within the gene cluster that encodes the virion assembly proteins, there exists a striking conservation of gene order (Casjens and Hendrix, 1988; Casjens et al., 1992; Hendrix and Duda, 1998). Recombination, replication and control functions are not found in this cluster, although a small number of non-assembly genes appear to have been relatively recently inserted into this operon in some temperate phages (Hendrix et al., 2000). In nearly every tailed phage and prophage whose gene order is known, the order is terminase → portal → protease → scaffold → major head shell (coat) protein → head/tail-joining proteins → tail shaft protein → tapemeasure protein → tail tip/baseplate proteins → tail fibre (listed in the order of transcription). The large lytic phages such as those typified by T4 often have some rearrangements relative to this order, but the order is especially well conserved in the temperate phages. This is shown for the most highly conserved genes in some of the best-characterized phages in Fig. 3.
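The conserved late-operon order described above lends itself to a simple consistency check: given putative functions assigned gene by gene along a candidate cluster, do the recognized ones occur in the canonical order? The sketch below is illustrative only; the label strings are hypothetical, and a negative result is not evidence against a prophage, since, for example, P2 and its relatives have inverted terminase and portal genes and lytic phages such as T4 show rearrangements.

```python
# Canonical late-operon order from the text (transcription order); labels are hypothetical.
CANONICAL_ORDER = ["terminase", "portal", "protease", "scaffold", "coat",
                   "head_tail_joining", "tail_shaft", "tapemeasure",
                   "tail_tip", "tail_fibre"]
RANK = {name: i for i, name in enumerate(CANONICAL_ORDER)}


def follows_canonical_order(predicted: list) -> bool:
    """True if the recognized functions appear in non-decreasing canonical rank.

    `predicted` lists putative functions gene by gene in transcription order;
    None (or any label not in CANONICAL_ORDER) marks a gene with no assigned
    function and is skipped. Missing steps are tolerated, e.g. short-tailed
    phages such as P22 lack tail shaft and tapemeasure genes.
    """
    ranks = [RANK[p] for p in predicted if p in RANK]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

Repeated labels (several head/tail-joining genes, say) are allowed because the comparison is non-strict; only an out-of-order pair makes the check fail.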
Fifteen to 25 proteins are typically used to build a temperate tailed phage's virion, so the more highly conserved proteins are typically embedded, in this order, in an apparent operon of this size. The lysis genes usually lie in the same orientation, adjacent to and at either end of the virion protein cluster. This is biology, so there are of course exceptions to any rules we might attempt to derive. Some temperate phages such as P22 have short tails and so have no tapemeasure or tail shaft proteins, and the well-studied E. coli phage P2 and its close relatives have terminase and portal genes that are inverted relative to those of other phages, and their lysis genes lie between tail genes. But, overall, the above conserved morphogenetic gene order has relatively few exceptions and, when weak matches are present in this order, credence can be lent to otherwise uncertain similarities. An instructive case in point is the family of 30–32 kbp circular cp32 plasmids found in the spirochaete B. burgdorferi. Each of these plasmids carries a similar, very poorly expressed, 22-gene-long putative operon, which at the time of sequencing contained only novel genes (Fraser et al., 1997; Casjens et al., 2000; Ojaimi et al., 2003). As the phage sequence database grew, a moderately weak match (protein BLAST e-value = 3 × 10⁻⁸) was found between the second gene from the beginning of these Borrelia operons and a Streptococcus phage φO1205 gene (Stanley et al., 1997). This φO1205 gene, which is located near the promoter-proximal end of the putative morphogenetic operon (the expected position for a terminase gene), is a moderately weak match (e = 5.5 × 10⁻⁷) to the well-characterized terminase of B. subtilis phage SPP1. [The transitive nature of such sequence families (A matches B, B matches C, but A does not readily match C) is often a feature of relationships between distantly related phage virion proteins, and transitive matches should be accepted in such searches (see Gerstein, 1998).] Later, when the X. fastidiosa genome was sequenced (Simpson et al., 2000), the protein encoded by the adjacent, transcriptionally downstream Borrelia cp32 gene was found to match very weakly (e = 0.13) a protein encoded at the portal position (immediately downstream of the putative large terminase gene) in X. fastidiosa's convincing prophages XfP3 and XfP4. After two additional rounds of PSI-BLAST alignment, a family of proteins accumulates that includes the putative Borrelia portal proteins (now e = 3 × 10⁻⁷⁷) and proteins encoded at the portal position by very unambiguous prophages in S. enterica, Haemophilus influenzae and L. innocua, but no connection to experimentally proven portal proteins is made. In addition, a novel gene near the 3′ end of this Borrelia gene cluster was found to be able to replace a phage λ lysis (holin) gene functionally (Damman et al., 2000). None of these observations alone constitutes a very convincing argument that these Borrelia plasmids are or harbour prophages, but the fact that each of these three matches is at the expected location within a phage late operon (see Fig. 3) makes the argument considerably stronger.
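The bracketed remark about transitive matches can be made concrete with single-linkage clustering over pairwise hits, implemented here with a union-find structure. This is a deliberately simplified stand-in, not PSI-BLAST: PSI-BLAST iteratively builds a position-specific scoring profile, whereas this sketch only chains pairwise hits. The sequence labels are hypothetical names for the genes discussed above, the first two e-values are those quoted in the text, and the 1e-5 cut-off is an assumption; with that cut-off the very weak e = 0.13 portal-position hit is not accepted on its own.

```python
def transitive_families(hits, max_evalue=1e-5):
    """Group sequences into families by single linkage over pairwise hits.

    hits: iterable of (query, subject, evalue) tuples. Two sequences end up in
    the same family if a chain of hits with evalue <= max_evalue connects them,
    even when they do not match each other directly (A~B, B~C => A, B, C).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # also registers unseen sequences
        if evalue <= max_evalue and root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    families = {}
    for member in list(parent):
        families.setdefault(find(member), set()).add(member)
    return list(families.values())


# Hypothetical labels; the e-values are those quoted in the text.
hits = [("cp32_orf2", "phiO1205_orf", 3e-8),
        ("phiO1205_orf", "SPP1_terminase", 5.5e-7),
        ("cp32_orf3", "XfP3_portal", 0.13)]
print(transitive_families(hits))
```

Single linkage is permissive by design: one spurious hit can merge unrelated families, which is why independent positional evidence, such as location within a late operon, is a useful check on transitive matches.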
Finally, Eggers and Samuels (1999) found that cp32 plasmid DNA is present in tailed phage-like particles released from Borrelia, considerably strengthening the argument that these plasmids are indeed prophages (even though 90% of the genes in these putative virion assembly operons have no recognized homologues, and none has been studied in more detail). Although it is impossible to quantify the increase in confidence one obtains when such weak matches occur in the relative positions expected for a phage genome, anecdotal observations like this suggest that increased confidence is at least partly justified and can certainly provide impetus for further directed experimental studies.

Highly deleted defective prophages

The evolutionary history of strain-specific elements that have no remaining virion assembly genes can be difficult to deduce, and it may never be possible to know unambiguously whether they are in fact prophage relics. Even in the E. coli K-12 genome, there are elements whose origins remain uncertain. For example, the 22-kbp-long CP4-57 element is inserted into the tmRNA gene, a site at which other, more obvious prophages often lie in other bacteria (Table 1) (Kirby et al., 1994; Retallack et al., 1994). It contains an integrase, a functional homologue of the satellite phage P4 orf88 regulatory gene, no obviously non-phage genes and no recognizable homologues of virion protein genes. Similarly, the 34 kbp CP4-6 and 13 kbp CP4-44 elements in K-12 are possible prophages (Blattner et al., 1997; Rudd, 1999). CP4-6 carries an integrase gene at one end, several transposon parts, the arginine metabolism argF gene and a glycosyl hydrolase gene (the last two have been argued to have arrived in E. coli by relatively recent horizontal transfer; Van Vliet et al., 1988; Garcia-Vallve et al., 1999). Genes in these three regions have a similar codon usage that differs from that of E. coli (Perna et al., 2002), and these elements are not present in other E. coli strains. All three elements contain genes of unknown function that are homologous to one another and are similarly arranged. These CP4s have been called prophages without qualification in the literature, but their only overt phage homologies are integrase and control genes (Blattner et al., 1997; Garcia-Vallve et al., 1999; Rudd, 1999); genuine proof of phage ancestry awaits the discovery of a true phage with a genome structure that is similar to the CP4s. In the genomes of less well-studied bacteria, it is even more difficult to recognize partly deleted or satellite prophages that contain none of the prophage cornerstone genes.

Fig. 3. Conserved genes and gene order in temperate phage morphogenetic operons. The most highly conserved genes in the morphogenetic (late) operons of temperate phages are shown as coloured rectangles; rectangle colours indicate similar functions as labelled. Identical colours do not necessarily indicate sequence similarity; phages are sufficiently diverse that not all proteins of similar function are recognizably homologous (see text). Black circles indicate the location of packaging initiation sites where this is known. A gap between rectangles indicates that there is a gene(s) between them that is not shown in the figure. The black arrow above indicates the direction of transcription for all the genes in the figure except two phage P2 genes, which are indicated to be transcribed in the opposite direction. The functions of most of the indicated E. coli and S. enterica phage genes have been determined directly, whereas the functions of most of the genes of the other phages shown in the figure have been deduced by sequence homology.

Prophage evolution and genetic exchange between prophages

Prophages and the bacteria they inhabit have a somewhat precarious mutual existence. From the prophage's perspective, many of its genes are not in use and so are not under selection for function. Therefore, mutations, including deleterious ones, can accumulate in these genes, resulting in a defective prophage. The host bacterium is under threat of death by prophage induction, and it seems that, in the long term, it would be advantageous from the bacterium's perspective if the prophages were to suffer debilitating mutations, especially if those mutations blocked the ability of the prophage to express its potentially lethal genes (Lawrence et al., 2001). It is therefore not surprising that a large fraction of the prophages that have been identified in bacterial genome sequences appear to be defective (only nine of the more than 200 prophages in Table S1 have been shown experimentally to be fully functional phages). To begin to understand the evolutionary processes that act on prophage DNA, it is instructive to examine specific cases. Two cases will be considered here: the Rac prophages of E. coli (Table 1) and the Pnm prophages of Neisseria meningitidis. These are both Proteobacteria, and they may not be representative of all other bacterial phyla. For example, the sequenced Gram-positive Lactococcus, Lactobacillus and Streptococcus genomes contain multiple prophages, but very highly decayed prophages have not been identified there (nonetheless, possible defective prophages such as SF370.4 do exist in Streptococcus pyogenes; Canchaya et al., 2002).
It is not yet clear whether this is a sampling difference or whether some species might carry only relatively newly arrived prophages and/or have ways of avoiding the accumulation of defective prophages (see also above).

The Rac prophages

Figure 4 diagrammatically compares the prophage entities that lie at the Rac attachment site in the three sequenced E. coli chromosomes. Rac was the first defective prophage to be discovered in E. coli K-12 (Low, 1973; Kaiser and Murray, 1979). In this strain, it was shown that, although no Rac virions were ever produced upon induction, (i) parts of Rac can be picked up by the phage λ chromosome through homologous recombination (Zissler et al., 1971; Kaiser and Murray, 1979); (ii) the Rac prophage can be excised upon induction (Evans et al., 1979; Brikun et al., 1994); (iii) Rac is lethal to the host if expression of its genes is induced, and this lethality results from an inhibitor of host cell division that is homologous to the λ Kil protein (Feinstein and Low, 1982; Conter et al., 1996); (iv) mutations (called sbcA) in Rac can restore homologous recombination in recB recC mutants by expressing the prophage's RecE function (Fouts et al., 1983; Willis et al., 1985); and (v) these sbcA mutants also express a function, Lar, that enhances EcoKI-mediated DNA methylation (similar to λ Ral function) (King and Murray, 1995). The sbcA mutations are thought to turn on the non-lethal part of the Rac prophage early left operon rather than altering RecE and Lar functions directly (Mahajan et al., 1990), indicating that the lar and recE genes are functional but unexpressed in the Rac prophage. The K-12 genome sequence confirmed that Rac is indeed a lambdoid prophage that has lost about 60% of its original DNA (Blattner et al., 1997). Its early left operon contains the recE gene at the position in which other lambdoid phages carry their genes for homologous recombination. More recently, the fully functional Salmonella phages Gifsy-1 and Gifsy-2 have been found to carry recE homologues in similar positions in their early left operons (McClelland et al., 2001), suggesting that the recE gene is most likely an authentic part of the original Rac phage. In addition, it is likely that the Rac repressor and integrase still function, as conjugational transfer induces gene expression from the prophage and causes excision (Evans et al., 1979; Feinstein and Low, 1982). Rac's right arm has not fared as well (Fig. 4); deletions have removed at least (i) the region between the DNA replication and lysis genes; (ii) the head and upstream tail genes (equivalent to λ genes nu1 to G-T); and (iii) the tail tip genes (equivalent to λ M to J). In addition, two transposons now reside in its right arm, one of which disrupts a homologue of the λ lom lysogenic conversion gene. There are four obvious pseudogenes in the right arm: the interrupted lom gene and the truncated b1361, tail tapemeasure (H) and lysis (Rz) genes.

Fig. 4. Three λ-like E. coli Rac prophages. Prophages Rac (E. coli strain K-12), Sp10 (strain Sakai) and CP-933R (strain EDL933) are located at identical positions in the three genomes. Genes and predicted genes are indicated by rectangles: black, genes outside the prophage; white, prophage genes that are transcribed to the left; grey, prophage genes that are transcribed to the right; cross-hatched, genes that currently have no homologues in other phages or prophages and so could in theory have been inserted since the original phage genome integrated at this site (see text). Below is a scale in kbp and arrows that indicate the major operons of the prophages as predicted by homology with other, better characterized lambdoid phages. Cross-hatching between the three prophages marks regions of nucleotide sequence similarity; in some sections, the percentage identity is given. The labels for the various genes indicate known function or putative function as deduced from homology relationships. Open circles indicate apparent pseudogenes that have obviously been inactivated by mutation; closed circles indicate genes that have been shown to be functional in Rac; and closed triangles indicate deletions relative to known lambdoid infectious phages.
Of course, it is not possible to tell whether an open reading frame that has not been studied experimentally but is approximately full length relative to its homologues is in fact functional, so this is the minimum number of defective genes. Curiously, immediately to the right of Rac's Rz homologue, the trkG gene for potassium uptake (Dosch et al., 1991; Schlosser et al., 1991) lies in a region that is very variable among the lambdoid phages and is not known to carry genes essential for the phage. Was the trkG gene part of the original prophage, or was it moved into this location after the phage's original integration? To date, no functional phage is known that carries a trkG homologue. The huge diversity of phages makes it difficult even to guess whether such a putative prophage gene, which has not yet been found on other phages, was or was not part of the phage that integrated to form the original prophage. The trkG gene in Rac and the argF homologue in CP4-6 (above) are such cases in point, but both are redundant to other genes with the same function in K-12 and so may be recent arrivals. Our (admittedly not exhaustive) analysis of the prophages in Table S1 suggests that there are few compelling examples of putative non-phage genes that have moved into a prophage after its integration. It seems inevitable that some non-phage genes would end up inside defective prophages during rearrangements that might accompany the decay process, and the frequency of such events could vary among hosts; nevertheless, such events appear to be rare in prophages that have not yet decayed into unrecognizability.

More recently, the genomes of two closely related O157-type E. coli strains, EDL933 and Sakai, have been sequenced (Hayashi et al., 2001a; Perna et al., 2001); both carry a prophage located precisely at the Rac attachment site (Table 1), named CP-933R in EDL933 and Sp10 in Sakai (Fig. 4). In a fourth E. coli strain, CFT073 (Welch et al., 2002), all that remains at this attachment site is 320 bp (including a C-terminal fragment of an integrase gene) that are 98.4% identical to the left end of the above three prophages. It thus appears that a related prophage once occupied the Rac attachment site in CFT073 but, as it has been nearly completely deleted, it will not be discussed further here. CP-933R and Sp10 are similar to one another, but are not identical. Both have lengths similar to those of known lambdoid phages (which range from about 39 kbp to 62 kbp). They are typically mosaic lambdoid genomes, with many homologues of known lambdoid phage genes arranged with the correct clustering, order and orientation. Neither contains any genes that are clearly related to non-phage genes, and both contain a few obvious pseudogenes. Among the essential virion assembly genes, the Sp10 putative coat protein gene contains a frameshift relative to several other prophages in these strains. CP-933R has head and tail genes that are similar to those of phage λ and, using λ gene nomenclature, its essential genes E, V, H, I and J are truncated or contain frameshifting mutations, and genes FI, FII, Z and U are missing. Thus, neither prophage is expected to be able to produce viable virions upon induction, and they appear to have had different mutational histories since their arrival at this location. As in Rac, the left arm of these two prophages appears, at this level of analysis, to be largely intact. The leftmost 21 kbp are >99.9% identical in CP-933R and Sp10, and their leftmost 8 kbp are 99.0% identical to the K-12 Rac prophage. Are Rac, CP-933R and Sp10 the result of integration by different phages at the same bacterial attachment site, or are they descendants of the same progenitor prophage? Independently isolated phages with identical integration specificities are known so, at first glance, the former scenario seems plausible, as the central regions of Sp10 and CP-933R are not closely related.

The head genes of Sp10 are very similar to those of prophages Sp6, Sp9 and Sp12 (which are not close relatives of any experimentally studied phage). Those of CP-933R are very similar to the head genes of phages λ and 21 and are also closely related to genes in the CP-933Od portion of the complex CP-933O prophage (see Table S1). (Sp prophages are in E. coli strain Sakai and CP prophages are in strain EDL933.) This could be interpreted as evidence for independent origins for Sp10 and CP-933R; however, the lambdoid phages are so diverse that very rarely, if ever, have any two independently isolated infectious lambdoid phages been found that are as nearly identical over such an extended region as Sp10 and CP-933R are at their left and right ends. HK97, 434 and λ integrate at the same site, as do Sf6 and HK620, and these phages do not have this 'similar prophage ends with different central regions' relationship; they are typically mosaically related with nearly identical integrase genes (Juhala et al., 2000; S. Casjens, A. J. Clark, W. Inwood and R. Moreno, unpublished). Thus, independent integration at the Rac attachment site by two different progenitor phages with such similar genomes seems unlikely. The deletion in CP-933R that has end-points in its λ E and V gene homologues (Fig. 4) contributes to a stronger argument that genes have, in fact, been exchanged among prophages within these bacteria. This deletion is also present with exactly the same end-points (between genes Z2136 and Z2137) in the EDL933 prophage CP-933Od. It is very unlikely that identical deletions happened independently in CP-933R and CP-933Od, so one of these head regions was apparently replaced by a copy of the other after the deletion occurred. It is also unlikely that this deletion (which removes six essential genes) would be present in an infecting phage virion's DNA.
We cannot be absolutely sure, but it therefore seems most reasonable to propose that CP-933R and Sp10 are in fact descendants of the same original prophage, and that either (i) in EDL933, the head genes of the original prophage at this site were replaced by a copy of the deletion-carrying head genes from CP-933Od; or (ii) in Sakai, the phage λ-like head genes of the original prophage were replaced by a copy of those from Sp6, Sp9 or Sp12. Although such recombination events could be seen as homogenizing, the recipient carries a new overall combination of alleles not present in the parent prophages. As there is a very low probability that two phage DNAs of independent origin with such extended regions of nearly identical nucleotide sequence would integrate into the same chromosome, such identity, when present, could constitute tentative evidence for such duplicative exchanges. For example, the 14 317 bp of identity between prophages XfP3 and XfP4 in X. fastidiosa 9a5c and the more than 4000 bp of identity between the DNA replication–Nin regions of the Gifsy-1 and Gifsy-2 prophages in S. enterica LT2 suggest that such exchanges may also have occurred in these cases. Even more surprising is the observation that the same type of relationship as is seen between CP-933R and Sp10 (extremely similar outside regions with very different central regions) is common when other cognate prophage pairs in EDL933 and Sakai are compared. Prophage pairs Sp14/CP-933U, Sp4/CP-933M and Sp15/CP-933V all have this type of relationship (Fig. 5). For example, lambdoid prophages Sp14 and CP-933U are both integrated into the same site within the serU tRNA gene. These two prophages have about 12 kbp of 99.2% identity at their tail gene ends and 16 kbp of 99.9% identity at their integrase ends. Between these long terminal similarities, they have >10 kbp of sequence in which little similarity can be found.
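Comparing cognate prophage pairs in this way amounts to recognizing a particular shape in a percent-identity profile computed along the two prophages: near-identical ends flanking a divergent centre. The sketch below assumes such a profile has already been computed (for example in 4 kbp windows from a pairwise alignment); the function name, window count and thresholds are hypothetical, and the example profile is a toy loosely modelled on the Sp14/CP-933U figures quoted above (about 16 kbp at 99.9%, more than 10 kbp with little similarity, about 12 kbp at 99.2%), not real data.

```python
def shared_ends_divergent_centre(identity, end_windows=3, end_min=99.0, centre_max=50.0):
    """Detect the 'similar ends, different centre' pattern in an identity profile.

    identity: percent nucleotide identity per window along two aligned cognate
    prophages (0.0 where no similarity is detectable). Thresholds are illustrative.
    """
    if len(identity) < 2 * end_windows + 1:
        return False  # too short to have two ends and a centre
    left, right = list(identity[:end_windows]), list(identity[-end_windows:])
    centre = identity[end_windows:-end_windows]
    # Both ends must be near-identical, and somewhere in the centre identity must collapse.
    return min(left + right) >= end_min and min(centre) <= centre_max


# Toy profile in 4 kbp windows (hypothetical values, not measured data).
profile = [99.9] * 4 + [0.0] * 3 + [99.2] * 3
print(shared_ends_divergent_centre(profile))
```

A pair of independently integrated phages that are merely mosaic relatives would usually fail the end test, which is the contrast the text draws with HK97, 434 and λ.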
This central part of Sp14 contains an 8 kbp section of the head genes that is 99.8% identical to the head gene region of Sp4. If it is unlikely that two phage genomes with this type of sequence relationship happened to have integrated independently at the Rac attachment site in these two strains (above), then it is all the more unlikely that these four prophage pairs would also have such a relationship. Furthermore, the identical deletion of DNA between the coat and tail shaft genes that is present in CP-933R and CP-933Od in EDL933 is present in Sp4 and Sp14 in Sakai. As none of these four is a cognate prophage (i.e. integrated at the same site in these two strains), this deletion appears to have occurred in a common ancestor of EDL933 and Sakai and then moved between prophages several times after their divergence. The relative abundance of this type of duplicative rearrangement between prophages in these two isolates suggests that interprophage homologous recombination may occur much more frequently than previously imagined, and that such events could well be an important route by which new temperate phage allele combinations are formed.

Fig. 5. Central region shuffling among E. coli O157 prophages. Top. Five E. coli O157 Sakai prophages are indicated by coloured rectangles. Bottom. The E. coli O157 EDL933 prophages integrated at cognate sites are similarly indicated. The host gene at the site of integration is shown between cognate prophages. All five cognate pairs have outer regions that are extremely similar (in most cases >99% identical). The colours of the central sections of the prophages indicate their sequence relationships in the head gene regions, and the asterisk (*) indicates the presence of the deletion that ends in the coat and tail shaft protein genes (see text). Similar colours indicate nucleotide sequences that are >93% identical. The central (head) regions indicated by different colours are not close sequence relatives; the closest is about two-thirds of the Sp15 head region, which is about 75% identical to that of CP-933U, and the others are much more distantly related. Rectangle sizes are not proportional to DNA length, and the situation is actually more complex than the diagram indicates in that some of the non-head gene regions of the central non-homologous parts of cognate prophages have different relationships from the indicated head genes. Prophage CP-933X contains the remaining unsequenced section of the strain EDL933 genome.
© 2003 Blackwell Publishing Ltd, Molecular Microbiology, 49, 277–300

Pnm2 and Pnm3 prophages
Neisseria meningitidis cognate prophages Pnm2 in strain Z2491 and NeisMu1 in strain MC58 (Parkhill et al., 2000; Tettelin et al., 2000) are mosaic relatives of the Mu-like group of phages [E. coli phage Mu and three largely intact prophages, FluMu, Sp18 and Pnm1, present in H. influenzae Rd, E. coli Sakai and N. meningitidis Z2491, respectively, have been completely sequenced (Fleischmann et al., 1995; Parkhill et al., 2000; Hayashi et al., 2001a; Morgan et al., 2002); NeisMu1 is a provisional name used here, as the original annotators did not name this element]. This type of phage integrates essentially randomly by a transposition mechanism (reviewed by Harshey, 1988). Thus, as the number of potential integration targets in any genome is huge, natural prophages of this type that are found at identical positions in the genomes of two independently isolated bacteria are extremely likely to be descendants of the same past phage integration event.
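The argument that co-located Mu-like prophages share a single integration event rests on simple arithmetic: if insertion is treated as uniform over positions and orientations, the chance that an independent insertion reproduces a given one exactly is the reciprocal of the number of possible target configurations. The sketch below assumes uniform targeting (real Mu transposition has weak target-sequence preferences, so this is only an approximation) and uses a round genome size of about 2.2 Mbp for N. meningitidis purely for illustration. The function name is hypothetical.

```python
def same_insertion_probability(genome_length_bp: int, orientations: int = 2) -> float:
    """Chance that one independent insertion hits exactly the same position and orientation as another.

    Assumes every (position, orientation) target is equally likely, which is only
    an approximation for transposition-based integration.
    """
    if genome_length_bp <= 0 or orientations <= 0:
        raise ValueError("genome length and number of orientations must be positive")
    return 1.0 / (genome_length_bp * orientations)


# Approximate N. meningitidis genome size (illustrative round number).
p_one = same_insertion_probability(2_200_000)
# Two independent coincidences, e.g. both the Pnm2/NeisMu1 and the Pnm3/NeisMu2 pairs.
p_two = p_one ** 2
print(f"one coincidence: {p_one:.2e}; two coincidences: {p_two:.2e}")
```

Even if targeting were biased enough to raise these values by several orders of magnitude, exact coincidence of independent insertions would remain very improbable, which is why shared positions are read as shared ancestry.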
Fig. 6. Defective Mu-like Neisseria meningitidis prophages. Defective prophages Pnm2 and Pnm3 (N. meningitidis strain Z2491) and NeisMu1 and NeisMu2 (strain MC58) are shown as in Fig. 4. Below each prophage, selected genes are marked by the gene number of the homologous phage Mu gene and/or a predicted function (Morgan et al., 2002). Grey arrows connect genes that have similar predicted function but not sequence similarity. Black bars marked A, B or C denote regions where more detailed comparisons were made (see text).

Pnm2 and NeisMu1 occupy precisely the same integration site within an ABC-type transporter gene (Fig. 6). In both prophages, the reading frames of the two transporter gene halves seem to be essentially intact (97.4% identical in nucleotide sequence); however, the N-terminal fragment of the strain MC58 gene contains a frameshift mutation. These prophages are both certainly defective, and their deletion histories are different. For example, Pnm2 appears to have suffered a 9 kbp deletion in the tail region, and NeisMu1 has a major deletion in its middle gene region and a shorter deletion of the putative coat protein gene. As in the case of Sp10 and CP-933R above, differential DNA replacements appear to have occurred after integration. An example of such a replacement is near the left end of the two prophages, where Pnm2 and NeisMu1 have unrelated genes, the best matches of which are other transcriptional repressors, at the position where other Mu-like phages encode repressors. In general, the homologous genes in Pnm2 and NeisMu1 are about as different from each other as are chromosomal backbone genes in N. meningitidis MC58 and Z2491. Sections A and B (Fig. 6) of Pnm2 and NeisMu1 are 99.3% and 97.4% identical in nucleotide sequence, respectively, and the three intact chromosomal genes adjacent to the left end of NeisMu1 in MC58 are 97% identical to the same genes in strain Z2491. This is consistent with the notion that NeisMu1 and Pnm2 have been diverging for about the same length of time as the chromosomes in which they reside. Sections A and B are in the head and tail gene clusters, respectively, neither of which should be under selection for function in the prophage. N. meningitidis is a naturally competent bacterium, in which DNA uptake is mediated through a DNA uptake sequence (Goodman and Scocca, 1988). Both Pnm2 and NeisMu1 contain this sequence, greatly over-represented, so it is impossible to know whether one of the putative repressor genes entered the prophage from an infecting phage, from another prophage (now gone) or from transforming phage or prophage DNA. Also present at identical locations in N. meningitidis Z2491 and MC58 is another region that is probably a more highly decayed Mu-like prophage, called Pnm3 and NeisMu2 in the two strains respectively (Fig. 6). These are much more highly deleted than Pnm2 and NeisMu1 (they retain only 15–20% of their putative original DNA), and so are likely to have been decaying for a longer period of time; yet they are 97.5% identical to each other in region C (Fig. 6). This is consistent with the divergence of Z2491 and MC58 after this element started to decay. These two prophages highlight the use of gene order and clustering in recognizing highly deleted prophages. The only match to an authentic phage gene in Pnm3 and NeisMu2 is the presence of a homologue of Mu gene 16 (also called gemA).
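The comparison of sections A and B with the flanking chromosomal genes uses raw percent identities. As a hedged illustration of why that is reasonable at these levels of divergence, the sketch below converts identity into a p-distance and into a Jukes–Cantor corrected distance, a standard substitution-model correction that the source does not itself apply. Because the correction is monotonic it cannot change the ranking, and for identities of roughly 97–99% the two measures differ only slightly. The function names are hypothetical.

```python
import math


def p_distance(percent_identity: float) -> float:
    """Proportion of differing sites implied by a percent identity."""
    return 1.0 - percent_identity / 100.0


def jukes_cantor_distance(percent_identity: float) -> float:
    """Jukes-Cantor corrected substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(percent_identity)
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Identities quoted in the text for Pnm2 vs NeisMu1 and for flanking chromosomal genes.
comparisons = {
    "section A (head genes)": 99.3,
    "section B (tail genes)": 97.4,
    "flanking chromosomal genes": 97.0,
}
for label, identity in comparisons.items():
    print(f"{label}: p = {p_distance(identity):.4f}, JC = {jukes_cantor_distance(identity):.4f}")
```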
The presence of a single phage-like gene, especially a regulatory gene (Ghelardini et al., 1994) such as this one, cannot be considered unequivocal evidence of a prophage. However, there are a number of genes in Pnm3/NeisMu2 that are similar to otherwise novel open reading frames present in Pnm2/NeisMu1 (and in Pnm1, another largely intact Mu-like prophage in Z2491; Klee et al., 2000). As homologues of these genes are not present outside these prophages, and as they are present in the same order in each of the putative prophages, it can be rather firmly concluded that Pnm3 and NeisMu2 are real but highly deleted prophages.

The complex decay of prophages
It might have been expected that derelict prophage DNAs would be in a straightforward mutational free fall in which inactivating mutations occur at random until the prophage is completely eliminated. Lysogenic conversion (or possibly other) integrated prophage genes may be advantageous to the host and be kept functional by selection as the rest of the prophage decays into oblivion, and so they may eventually be appropriated as integral parts of the host chromosome. Examples of possible intermediates in this assimilation process might be some pathogenicity islands and the Shigella dysenteriae Shiga toxin that is encoded by a small prophage remnant (McDonough and Butterton, 1999). Likewise, plasmid prophages might evolve into plasmid replicons. However, the situation is certainly much more complex than this. Understanding prophage evolution and decay is significantly complicated by possible excision and subsequent replacement by another, possibly related, phage genome, as well as by homologous recombination with infecting phage genomes and with other prophages in the same cell. Infecting phages can clearly acquire genetic information from prophages in the cells they infect (e.g. Kaiser, 1980; Espion et al., 1983; Bouchard and Moineau, 2000). However, as prophages express immunity and superinfection exclusion systems that can allow cell survival after a superinfecting phage has injected its DNA (Susskind et al., 1974; Susskind and Botstein, 1980), transfer of information from infecting phage DNA to related prophages might occur as well. Studies of the DNA sequences present in different E. coli isolates at the phage 21, λ and Atlas attachment sites have found evidence for different entities being present at each site in different strains (Milkman and Bridges, 1990; Wang et al., 1997; Kuhn and Campbell, 2001), so complete excision and replacement is certainly plausible. Although such findings could be interpreted to support the idea that, in a given bacterial lineage, prophages come and go (perhaps frequently?; Campbell, 1996), the arguments presented above suggest that complete replacement may be less common than other types of genetic interactions.
Some prophages may in fact spend rather long times in residence in bacterial chromosomes before being completely removed. At least large parts of Proteobacterial prophages such as Rac and Pnm2, which are still quite far from complete assimilation, appear to have been in residence at least long enough for bacterial genes to diverge about 1.5 and 3% respectively. (The genes of E. coli K-12, for example, are on average 98.5% and 98.3% identical to those of EDL933 and Sakai respectively; Perna et al., 2002.) If estimates of divergence rates in bacteria are correct (Ochman and Wilson, 1987; Reid et al., 2000), this suggests that these prophages may have been in place for as long as several million years. This does not mean that all prophages have such long residence times, and the suggestion of such antiquity for some is rather speculative. Comparison of additional genome sequences will help to decide upon its accuracy.
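The residence-time argument above amounts to a one-line molecular-clock estimate: if two lineages differ at a fraction d of sites and each lineage accumulates substitutions at rate r per site per year, then d ≈ 2rT, so T ≈ d/(2r). The sketch below only restates that arithmetic under a strict-clock assumption. The rate it uses is a hypothetical placeholder chosen for illustration, not a value taken from Ochman and Wilson (1987) or Reid et al. (2000), and a careful estimate would use divergence at neutral (e.g. synonymous) sites rather than whole-gene identity.

```python
def divergence_time_years(percent_identity: float, rate_per_site_per_year: float) -> float:
    """Strict-clock estimate T = d / (2r), where d is the fraction of differing sites.

    The factor of 2 reflects substitutions accumulating independently along both lineages.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    d = 1.0 - percent_identity / 100.0
    return d / (2.0 * rate_per_site_per_year)


HYPOTHETICAL_RATE = 5e-9  # substitutions per site per year; placeholder, not a published estimate

# Divergence levels quoted in the text: about 1.5% (Rac) and about 3% (Pnm2).
for identity in (98.5, 97.0):
    years = divergence_time_years(identity, HYPOTHETICAL_RATE)
    print(f"{identity}% identity -> ~{years / 1e6:.1f} million years at the placeholder rate")
```

Because T scales as 1/r, halving or doubling the assumed rate doubles or halves every estimate; the absolute numbers are only as good as the rate plugged in.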

Analysis of such decaying prophages clearly shows that point mutations, transposon insertions and deletions all occur. Interestingly, it appears that, as prophages decay, prophage-debilitating deletions can accumulate more rapidly than gene-inactivating point mutations, as numerous genes, even in moderately highly deleted prophages such as Rac, remain functionally intact. The functionality of many normally unexpressed genes in defective prophages has been demonstrated in the laboratory through mutations that turn on their expression (Willis et al., 1985; Blasband et al., 1986; Bejar et al., 1988; Mahajan et al., 1990), recombination onto a related phage that depends upon that function (Kaiser, 1980; Espion et al., 1983; Bouchard and Moineau, 2000) or expression of functional proteins from a cloning vector (Morimyo et al., 1992; King and Murray, 1995; Jin et al., 1996; Mahdi et al., 1996). This lack of debilitating point mutations could be the result of chance (the genes have simply not yet been hit by inactivating mutations) or of selection for function, if the genes are in fact weakly expressed and have a function in the lysogen. The latter seems unlikely, except for lysogenic conversion genes, given current knowledge about gene expression from prophages. On the other hand, inactivated genes could be repaired to full functionality by recombination with other prophages or with infecting phages. It has long been known that homologous recombination between lambdoid prophages is possible in the laboratory (Meselson, 1967; Redfield and Campbell, 1987), but simple, single break-and-join recombination events between non-tandem prophages integrated in the same chromosome would result in inversion or deletion of the intervening DNA. Such events could be detrimental to the host; however, one such inversion event does appear to have occurred that involved prophages CP-933O and CP-933P in E. coli strain EDL933 (Perna et al., 2002).
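Whether a single crossover between two prophages in the same chromosome deletes or inverts the DNA between them depends only on the relative orientation of the two homologous copies. The small sketch below encodes that textbook rule for a circular chromosome; the enum and function names are hypothetical, and it deliberately ignores double crossovers and gene conversion, which are treated separately in the text.

```python
from enum import Enum


class CrossoverOutcome(Enum):
    DELETION = "intervening DNA excised as a separate circle (deleted from the chromosome)"
    INVERSION = "intervening DNA inverted in place"


def single_crossover_outcome(orientation_a: str, orientation_b: str) -> CrossoverOutcome:
    """Result of one reciprocal crossover between two homologous copies on one circular chromosome.

    Orientations are '+' or '-' relative to the chromosome. Copies in the same
    orientation are direct repeats (a crossover deletes the DNA between them);
    copies in opposite orientations are inverted repeats (a crossover inverts it).
    """
    for o in (orientation_a, orientation_b):
        if o not in ("+", "-"):
            raise ValueError(f"orientation must be '+' or '-', got {o!r}")
    if orientation_a == orientation_b:
        return CrossoverOutcome.DELETION
    return CrossoverOutcome.INVERSION


print(single_crossover_outcome("+", "+").name)  # prints DELETION
print(single_crossover_outcome("+", "-").name)  # prints INVERSION
```

Either outcome relocates or removes host DNA lying between the prophages, which is why such simple exchanges are expected to be selected against more often than the non-reciprocal events discussed next.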
Non-reciprocal double break-and-join or long gene conversion events could replace parts of one prophage with sequences from another prophage. Either mechanism could create relationships such as those observed between the prophages in Fig. 5, where multiple prophages within a bacterium contain sections of nearly identical sequence. Such duplicative replacement events among prophages should not distinguish between functional and non-functional genes, and so would be just as likely to replace a functional gene with a non-functional one as vice versa. On the other hand, replacement of part of a prophage by part of an infecting phage genome would be more likely to repair damaged prophage genes to functionality, as genes on an infectious phage genome have presumably been under recent selection for functionality. Nonetheless, at present, we cannot know whether (for example) the apparently functional left early operon of Rac was left intact by chance, was somehow selected to remain functional or is currently functional because of recent repair from another prophage (since lost) or an infecting phage. We can, however, conclude that, even if a prophage is defective, not all of its genes are necessarily doomed to be lost forever. As many of their genes retain functionality and remain accessible to the phage population, and as phage virions may be in only about 10-fold excess over bacterial cells in the environment (Bergh et al., 1989), prophage genes constitute a significant portion of the phage gene pool in the earth's biosphere.

Comments on the identification and annotation of prophages
In order to understand fully the true nature of bacterial genomes, we must be able to recognize prophages in nucleotide sequence; however, the extreme variability of phage nucleotide sequences makes it quite possible that unrecognized prophages still lurk in bacterial genome sequences. The gold standard of prophage recognition is, and should remain, high similarity of sequence and gene organization to authentic temperate phages that infect the same bacterial species. In addition, (i) recognition of the conserved nature of some dsDNA tailed phage morphogenetic proteins such as portal and terminase; and (ii) the observation that these proteins do not have homologues with known non-phage functions have made the recognition of many prophages, even in distantly related bacterial genome sequences, quite unambiguous. Can our ability to recognize prophages and annotate their sequences be improved? Yes. Most importantly, the study of additional infectious tailed phages, especially those that infect the less well-studied phylogenetic branches of bacteria, will help to fill in the current gaps in sequence space and so make prophages more easily recognizable in those phyla. Hopefully, this will eventually lead to a situation in which at least the most highly conserved phage proteins will form one or a few (transitive) sets of related sequences that will include, for example, all the known terminases or portal proteins and will contain recognizable homologues of all subsequently sequenced members of those families. But such cornerstone genes may not be present in authentic but defective prophages or satellite prophages; how can we recognize these with higher accuracy and confidence? Several simple things can be done now.
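The hoped-for "transitive sets" of related sequences correspond to connected components in a graph whose edges are pairwise homology calls: two proteins with no detectable direct similarity end up in the same family if a chain of intermediate homologues links them. A minimal union-find sketch follows. The protein identifiers and pairwise hits are hypothetical, and in practice the edges would come from whatever sequence-search tool and significance cutoff an annotator chooses.

```python
from typing import Dict, Iterable, List, Tuple


def transitive_families(proteins: Iterable[str], homologous_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group proteins into families linked by chains of pairwise homology (connected components).

    Every protein named in a pair must also appear in `proteins`.
    """
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families: Dict[str, List[str]] = {}
    for p in parent:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


# Hypothetical portal-protein identifiers: A-B and B-C are detectable hits, A-C is not.
portals = ["portal_A", "portal_B", "portal_C", "portal_D", "portal_E"]
hits = [("portal_A", "portal_B"), ("portal_B", "portal_C"), ("portal_D", "portal_E")]
print(transitive_families(portals, hits))
# prints [['portal_A', 'portal_B', 'portal_C'], ['portal_D', 'portal_E']]
```

Single linkage of this kind is permissive: one spurious hit can merge two unrelated families, so any real use would need conservative edge criteria.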
(i) As relatively few non-phage genes appear to have moved into the known prophages after integration, it seems justified at this point for bacterial genome annotators to indicate that hypothetical (novel) or conserved hypothetical (having a homologue of unknown function in the database) genes within apparent prophages are putative prophage genes. To date, some bacterial genomes have been annotated in this manner, whereas many have not. If this were universally done, it would be much easier to determine whether conserved hypothetical genes in new prophages are present elsewhere in other prophages. If they are, and especially if they are present in the same order, they probably represent the remains of another prophage. When this logic is applied to the X. fastidiosa 9a5c genome, for example, at least five more highly deleted putative prophage remnants are found in addition to the four prophages identified in the original genome report (Simpson et al., 2000) (Table S1). (ii) Many phage genes have specific, very well-understood functions, so possible prophage genes with homology to these should be annotated with their specific presumed function, not just "phage-related protein" as is often done currently. (iii) Prophages very often repair the bacterial gene into which they integrate by carrying a similar replacement part on the phage genome that gets fused to the target gene upon integration (Campbell et al., 1992; Campbell, 1994). Even when this does not occur, the identity between the phage and bacterial attachment sites is usually ≥10 bp long. Thus, there are typically exact direct repeats tens of base pairs long at the prophage boundaries [e.g. 10–148 bp in the various E. coli Sakai prophages (Hayashi et al., 2001b); such repeats can be even longer and need not be perfect throughout their length (Campbell et al., 1992)]. Genome annotators and analysers should attempt to locate and report such repeats, as finding these features identifies the outside boundaries of the prophage with precision. (iv) A strong argument for integrated phage DNA (or any mobile DNA element) is its absence in some other strains. This may be subject to exception if there has been a recent population bottleneck in a species or if phages are so abundant that ancestors of every extant bacterium acquired a prophage at a given attachment site.
As many genome sequencing operations have an interest in using their sequence information to examine genomic variation within species, this author recommends that, whenever possible, if a prophage is tentatively identified in a new bacterial genome sequence, the sequencers check for its absence in other strains. This can be done by DNA array analysis, but this approach has the disadvantage that it can be fooled by the not unlikely occurrence of similar prophages at different locations in other strains. A more informative approach is polymerase chain reaction amplification across the putative attachment site in other strains, followed by sequencing of the product expected when no prophage is present (if such a product is made). This would both help to confirm a sequence region as a prophage and precisely locate the prophage attachment site and prophage ends with confidence, which in turn would make the assignment of prophage genes much more robust. (v) Finally, annotators should give names to putative prophage elements in bacterial genome sequences. This may seem a trivial point, but it has not been done in many of the published genome sequences, and the lack of names makes it difficult for others (who are reluctant to name them themselves) to deal with them in print. Prophages at cognate sites in different strains of the same species should not be given the same name, as they are probably not identical. If these recommendations were universally implemented, global analysis of prophage sequences would be much easier, which in turn would make annotation much more accurate and our understanding much more sophisticated.

Prophage sequences and bacteriophage diversity
For those who are interested in understanding the range of diversity of phages on earth, the sequenced prophages represent a wealth of information that cannot be ignored, as more prophage sequences have been determined than sequences of bona fide infectious phages. For example, it is currently possible to use the prophage sequences to learn about the different types of non-homologous (convergent) gene modules that are used for a particular function by a group of temperate phages. A few such cases are as follows. (i) The E. coli RecE-type homologous recombination function was first found in the defective Rac prophage and only subsequently found in other infectious lambdoid phages (above). This recombination system uses the recE and recT genes of prophage Rac but, in other phages, we can recognize only a recT homologue [e.g. B. subtilis phage SPP1 (Alonso et al., 1997) and L. monocytogenes phage A118 (Loessner et al., 2000)] or only a recE homologue (e.g. S. enterica phages Gifsy-1 and Gifsy-2; McClelland et al., 2001).
This suggests that these phages may have another, non-homologous protein that replaces the missing partner. (ii) The lambdoid phages P22 and λ have convergent replication genes: the λ gene P protein recruits the host DnaB helicase to the replication initiation complex, whereas the cognate, non-homologous P22 gene 12 protein is a homologue of the host DnaB protein that does the helicase job itself (Wickner, 1984a,b). An E. coli dnaC homologue was first seen in the Rac prophage at this location and, recently, the lambdoid phages Gifsy-1 and Gifsy-2 have been found to carry a clear dnaC homologue in their DNA replication region. E. coli DnaC protein is a helicase loader, so perhaps these DnaC homologues act in the same way as phage λ P protein? In addition, the lambdoid S. enterica

LT2 prophage Fels-1 encodes a novel protein that contains a primase motif in its replication gene position; does Fels-1 use a new, unstudied type of lambdoid phage replication initiation? (iii) Most lambdoid phages carry a homologue of the λ Rz lysis gene adjacent to and downstream of their endolysin gene. The K-12 prophage QIN has a different, novel gene in this position, and phage N15 was subsequently found to have a homologue of this QIN protein in the same location (Ravin et al., 2000). Does this gene represent a functional alternative to Rz? (iv) Several prophages in the E. coli genome sequences that are lambdoid in other respects have head and/or tail genes (as deduced from their position within the prophage) that are unrelated in sequence to any previously studied virion assembly genes, and lambdoid prophages found in the genomes of Wolbachia (Masui et al., 2000; 2001) and X. fastidiosa (Simpson et al., 2000) have tail genes that are homologous to genes that encode contractile tails in other phages (all previously characterized lambdoid phages had non-contractile or short tails). More recently, E. coli and S. flexneri lambdoid phages φP27 and SfV were found to have contractile tail genes (Allison et al., 2002; Recktenwald and Schmidt, 2002). Clearly, the sequenced prophages are an excellent place to find variations on temperate phage lifestyle themes. Finally, we can learn about the overall variety of types of temperate phages from the examination of prophage sequences. A dramatic example of this may be indicated by genes homologous to the RNA polymerase gene of virulent E. coli phage T7 in the X. axonopodis 903 genome (da Silva et al., 2002) and in that of Pseudomonas putida KT2440 (Nelson et al., 2002). In both cases, homologues of phage head and tail genes lie nearby, supporting the notion that these putative RNA polymerase genes are parts of prophages PP03 in P. putida and XacP2 in X. axonopodis (Table S1, Supplementary material).
If true, this would be a completely new type of temperate phage, as no temperate phage is currently known to encode its own RNA polymerase. Many such discoveries no doubt await the careful analysis of the numerous prophages present in bacterial genome sequences.

Acknowledgements
The author's research is supported by NSF grant MCB990526 and NIH grant AI49003. I thank Roger Hendrix and Jeff Lawrence for reading this manuscript and for many productive discussions of phage biology and evolution, and Thad Stanton, Kenn Rudd, Guy Plunkett and Nicole Perna for access to unpublished information.

Supplementary material
The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mmi/mmi3580/mmi3580sm.htm.
Table S1. Prophages and phage-like objects in 82 published bacterial complete genomes.

References
Agron, P.G., Walker, R.L., Kinde, H., Sawyer, S.J., Hayes, D.C., Wollard, J., et al. (2001) Identification by subtractive hybridization of sequences specific for Salmonella enterica serovar Enteritidis. Appl Environ Microbiol 67: 4984–4991.
Allison, G.E., Angeles, D., Tran-Dinh, N., and Verma, N.K. (2002) Complete genomic sequence of SfV, a serotype-converting temperate bacteriophage of Shigella flexneri. J Bacteriol 184: 1974–1987.
Alonso, J.C., Luder, G., Stiege, A.C., Chai, S., Weise, F., and Trautner, T.A. (1997) The complete nucleotide sequence and functional organization of Bacillus subtilis bacteriophage SPP1. Gene 204: 201–212.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
Anilionis, A., Ostapchuk, P., and Riley, M. (1980) Identification of a second cryptic lambdoid prophage locus in the E. coli K12 chromosome. Mol Gen Genet 180: 479–481.
Bail, O. (1925) Der kolistamm 88 von Gildmeister und Herzberg. Med Klin (Munich) 21: 1271–1273.
Banks, D.J., Beres, S.B., and Musser, J.M. (2002) The fundamental contribution of phages to GAS evolution, genome diversification and strain emergence. Trends Microbiol 10: 515–521.
Bao, Q., Tian, Y., Li, W., Xu, Z., Xuan, Z., Hu, S., et al. (2002) A complete sequence of the T. tengcongensis genome. Genome Res 12: 689–700.
Barreiro, V., and Haggard-Ljungquist, E. (1992) Attachment sites for bacteriophage P2 on the Escherichia coli chromosome: DNA sequences, localization on the physical map, and detection of a P2-like remnant in E. coli K-12 derivatives. J Bacteriol 174: 4086–4093.
Bejar, S., Bouche, F., and Bouche, J.P. (1988) Cell division inhibition gene dicB is regulated by a locus similar to lambdoid bacteriophage immunity loci. Mol Gen Genet 212: 11–19.
Bentley, S., Chater, K., Cerdeno-Tarrage, A., Challis, G., Thomson, R., James, K., et al. (2002) Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417: 141–147.
Beres, S.B., Sylva, G.L., Barbian, K.D., Lei, B., Hoff, J.S., Mammarella, N.D., et al. (2002) Genome sequence of a serotype M3 strain of group A Streptococcus: phage-encoded toxins, the high-virulence phenotype, and clone emergence. Proc Natl Acad Sci USA 99: 10078–10083.
Bergh, O., Borsheim, K., Bratbak, G., and Heldal, M. (1989) High abundance of viruses found in aquatic environments. Nature 340: 467–468.
Bertani, G. (1951) Studies on lysogenesis. I. The mode of phage liberation by lysogenic Escherichia coli. J Bacteriol 62: 293–299.
Bertani, E., and Six, E. (1988) The P2-like phages and their parasite P4. In The Bacteriophages, Vol. 2. Calendar, R. (ed.). New York: Plenum Press, pp. 73–143.
Bishai, W., and Murphy, J. (1988) Bacteriophage gene products that cause human disease. In The Bacteriophages, Vol. 2. Calendar, R. (ed.). New York: Plenum Press, pp. 683–724.
Blaisdell, B.E., Campbell, A.M., and Karlin, S. (1996) Similarities and dissimilarities of phage genomes. Proc Natl Acad Sci USA 93: 5854–5859.
Blasband, A.J., Marcotte, W.R., Jr, and Schnaitman, C.A. (1986) Structure of the lc and nmpC outer membrane porin protein genes of lambdoid bacteriophage. J Biol Chem 261: 12723–12732.
Blattner, F.R., Plunkett, G., III, Bloch, C.A., Perna, N.T., Burland, V., Riley, M., et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277: 1453–1474.
Bolotin, A., Wincker, P., Mauger, S., Jaillon, O., Malarme, K., Weissenbach, J., et al. (2001) The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403. Genome Res 11: 731–753.
Bordet, J. (1925) Le problème de l'autolyse microbienne transmissible ou du bactériophage. Ann Inst Pasteur 39: 711–763.
Botstein, D. (1980) A theory of modular evolution in bacteriophages. Ann NY Acad Sci 354: 484–491.
Bouchard, J.D., and Moineau, S. (2000) Homologous recombination between a lactococcal bacteriophage and the chromosome of its host strain. Virology 270: 65–75.
Boyd, E.F., and Brussow, H. (2002) Common themes among bacteriophage-encoded virulence factors and diversity among the bacteriophages involved. Trends Microbiol 10: 521–529.
Boyd, E.F., Davis, B.M., and Hochhut, B. (2001) Bacteriophage–bacteriophage interactions in the evolution of pathogenic bacteria. Trends Microbiol 9: 137–144.
Brandt, K., Tilsala-Timisjarvi, A., and Alatossava, T. (2001) Phage-related DNA polymorphism in dairy and probiotic Lactobacillus. Micron 32: 59–65.
Brikun, I., Suziedelis, K., and Berg, D.E. (1994) DNA sequence divergence among derivatives of Escherichia coli K-12 detected by arbitrary primer PCR (random amplified polymorphic DNA) fingerprinting. J Bacteriol 176: 1673–1682.
Brody, H., Greener, A., and Hill, C.W. (1985) Excision and reintegration of the Escherichia coli K-12 chromosomal element e14. J Bacteriol 161: 11121117. Bruggemann, H., Baumer, S., Fricke, W.F., Wiezer, A., Liesegang, H., Decker, I., et al. (2003) The genome sequence of Clostridium tetani, the causative agent of tetanus disease. Proc Natl Acad Sci USA 100: 13161321. Brussow, H., and Hendrix, R.W. (2002) Phage genomics: small is beautiful. Cell 108: 1316. Bunny, K., Liu, J., and Roth, J. (2002) Phenotypes of lexA mutations in Salmonella enterica: evidence for a lethal lexA null phenotype due to the Fels-2 prophage. J Bacteriol 184: 62356249. Campbell, A. (1962) The episomes. Adv Genet 11: 101 118. Campbell, A. (1994) Comparative molecular biology of lambdoid phages. Annu Rev Microbiol 48: 193222. Campbell, A. (1996) Cryptic prophages. In Escherichia coli and Salmonella: Cellular and Molecular Biology.
2003 Blackwell Publishing Ltd, Molecular Microbiology, 49, 277300

Neidhardt, F. (ed.). Washington, DC: American Society for Microbiology Press, pp. 2041–2046.
Campbell, A., and Botstein, D. (1983) Evolution of the lambdoid phages. In Lambda II. Hendrix, R., Roberts, J.W., Stahl, F.W., and Weisberg, R. (eds). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 365–380.
Campbell, A., Schneider, S.J., and Song, B. (1992) Lambdoid phages as elements of bacterial genomes (integrase/phage 21/Escherichia coli K-12/icd gene). Genetica 86: 259–267.
Canchaya, C., Desiere, F., McShan, W., Ferretti, J., Parkhill, J., and Brussow, H. (2002) Genome analysis of an inducible prophage and prophage remnants integrated into Streptococcus pyogenes strain SF370. Virology 302: 245–258.
Casjens, S. (1997) Principles of virion structure, function and assembly. In Structural Biology of Viruses. Chiu, W., Burnett, R., and Garcea, R. (eds). Oxford: Oxford University Press, pp. 3–37.
Casjens, S., and Hendrix, R. (1988) Control mechanisms in dsDNA bacteriophage assembly. In The Bacteriophages, Vol. 1. Calendar, R. (ed.). New York: Plenum Press, pp. 15–91.
Casjens, S., and Hendrix, R. (2003) Bacteriophage roles in bacterial chromosome evolution. In The Bacterial Chromosome. Higgins, P. (ed.). Washington, DC: American Society for Microbiology Press, (in press).
Casjens, S., Hatfull, G., and Hendrix, R. (1992) Evolution of dsDNA tailed-bacteriophage genomes. Semin Virol 3: 383–397.
Casjens, S., van Vugt, R., Tilly, K., Rosa, P.A., and Stevenson, B. (1997) Homology throughout the multiple 32-kilobase circular plasmids present in Lyme disease spirochetes. J Bacteriol 179: 217–227.
Casjens, S., Palmer, N., Van Vugt, R., Mun Huang, W., Stevenson, B., Rosa, P., et al. (2000) A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochaete Borrelia burgdorferi. Mol Microbiol 35: 490–516.
Chang, C.C., Gilsdorf, J.R., DiRita, V.J., and Marrs, C.F. (2000) Identification and genetic characterization of Haemophilus influenzae genetic island 1. Infect Immun 68: 2630–2637.
Chang, K.H., Wen, F.S., Tseng, T.T., Lin, N.T., Yang, M.T., and Tseng, Y.H. (1998) Sequence analysis and expression of the filamentous phage φLf gene I encoding a 48-kDa protein associated with host cell membrane. Biochem Biophys Res Commun 245: 313–318.
Cheetham, B.F., and Katz, M.E. (1995) A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol Microbiol 18: 201–208.
Chopin, M.C., Chopin, A., Rouault, A., and Galleron, N. (1989) Insertion and amplification of foreign genes in the Lactococcus lactis subsp. lactis chromosome. Appl Environ Microbiol 55: 1769–1774.
Chopin, A., Bolotin, A., Sorokin, A., Ehrlich, S.D., and Chopin, M. (2001) Analysis of six prophages in Lactococcus lactis IL1403: different genetic structure of temperate and virulent phage populations. Nucleic Acids Res 29: 644–651.
Conter, A., Bouche, J.P., and Dassain, M. (1996) Identification of a new inhibitor of essential division gene ftsZ as the kil gene of defective prophage Rac. J Bacteriol 178: 5100–5104.
Damman, C.J., Eggers, C.H., Samuels, D.S., and Oliver, D.B. (2000) Characterization of Borrelia burgdorferi BlyA and BlyB proteins: a prophage-encoded holin-like system. J Bacteriol 182: 6791–6797.
Davis, B.M., Kimsey, H.H., Chang, W., and Waldor, M.K. (1999) The Vibrio cholerae O139 Calcutta bacteriophage CTXφ is infectious and encodes a novel repressor. J Bacteriol 181: 6779–6787.
Deng, W., Burland, V., Plunkett, G., III, Boutin, A., Mayhew, G.F., Liss, P., et al. (2002) Genome sequence of Yersinia pestis KIM. J Bacteriol 184: 4601–4611.
Dep, M.S., Mendz, G.L., Trend, M.A., Coloe, P.J., Fry, B.N., and Korolik, V. (2001) Differentiation between Campylobacter hyoilei and Campylobacter coli using genotypic and phenotypic analyses. Int J Syst Evol Microbiol 51: 819–826.
Desiere, F., Mahanivong, C., Hillier, A.J., Chandry, P.S., Davidson, B.E., and Brussow, H. (2001) Comparative genomics of lactococcal phages: insight from the complete genome sequence of Lactococcus lactis phage BK5-T. Virology 283: 240–252.
Dosch, D.C., Helmer, G.L., Sutton, S.H., Salvacion, F.F., and Epstein, W. (1991) Genetic analysis of potassium transport loci in Escherichia coli: evidence for three constitutive systems mediating uptake potassium. J Bacteriol 173: 687–696.
Eggers, C.H., and Samuels, D.S. (1999) Molecular evidence for a new bacteriophage of Borrelia burgdorferi. J Bacteriol 181: 7308–7313.
Eggers, C.H., Casjens, S., Hayes, S.F., Garon, C.F., Damman, C.J., Oliver, D.B., et al. (2000) Bacteriophages of spirochetes. J Mol Microbiol Biotechnol 2: 365–373.
Emmerth, M., Goebel, W., Miller, S.I., and Hueck, C.J. (1999) Genomic subtraction identifies Salmonella typhimurium prophages, F-related plasmid sequences, and a novel fimbrial operon, stf, which are absent in Salmonella typhi. J Bacteriol 181: 5652–5661.
Espion, D., Kaiser, K., and Dambly-Chaudiere, C. (1983) A third defective lambdoid prophage of Escherichia coli K12 defined by the lambda derivative, lambdaqin111. J Mol Biol 170: 611–633.
Evans, R., Seeley, N.R., and Kuempel, P.L. (1979) Loss of rac locus DNA in merozygotes of Escherichia coli K12. Mol Gen Genet 175: 245–250.
Faubladier, M., and Bouche, J.P. (1994) Division inhibition gene dicF of Escherichia coli reveals a widespread group of prophage sequences in bacterial genomes. J Bacteriol 176: 1150–1156.
Feinstein, S.I., and Low, K.B. (1982) Zygotic induction of the rac locus can cause cell death in E. coli. Mol Gen Genet 187: 231–235.
Ferretti, J.J., McShan, W.M., Ajdic, D., Savic, D.J., Savic, G., Lyon, K., et al. (2001) Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci USA 98: 4658–4663.
Figueroa-Bossi, N., and Bossi, L. (1999) Inducible prophages contribute to Salmonella virulence in mice. Mol Microbiol 33: 167–176.
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496–512.
Fleischmann, R., Alland, D., Eisen, J., Carpenter, L., White, O., Peterson, J., et al. (2002) Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184: 5479–5490.
Fouts, K.E., Wasie-Gilbert, T., Willis, D.K., Clark, A.J., and Barbour, S.D. (1983) Genetic analysis of transposon-induced mutations of the Rac prophage in Escherichia coli K-12 which affect expression and function of recE. J Bacteriol 156: 718–726.
Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R., Lathigra, R., et al. (1997) Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390: 580–586.
Freifelder, D., and Meselson, M. (1970) Topological relationship of prophage lambda to the bacterial chromosome in lysogenic cells. Proc Natl Acad Sci USA 65: 200–205.
Galibert, F., Finan, T.M., Long, S.R., Puhler, A., Abola, P., Ampe, F., et al. (2001) The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293: 668–672.
Garcia-Vallve, S., Palau, J., and Romeu, A. (1999) Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis. Mol Biol Evol 16: 1125–1134.
George, D.G., Yeh, L.S., and Barker, W.C. (1983) Unexpected relationships between bacteriophage lambda hypothetical proteins and bacteriophage T4 tail-fiber proteins. Biochem Biophys Res Commun 115: 1061–1068.
Gerstein, M. (1998) Measurement of the effectiveness of transitive sequence comparison, through a third intermediate sequence. Bioinformatics 14: 707–714.
Ghelardini, P., La Valle, R., and Paolozzi, L. (1994) The Mu gem operon: its role in gene expression, recombination and cell cycle. Genetica 94: 151–156.
Gildemeister, E., and Herzberg, K. (1924) Zur theorie der bakteriophagen (d'Herelle Lysine). 6. Mitteilung über das d'Herellesche phänomen. Zentr Bakteriol Parasitenk I Abt Orig 93: 402–420.
Girons, I.S., Bourhy, P., Ottone, C., Picardeau, M., Yelton, D., Hendrix, R.W., et al. (2000) The LE1 bacteriophage replicates as a plasmid within Leptospira biflexa: construction of an L. biflexa–Escherichia coli shuttle vector. J Bacteriol 182: 5700–5705.
Glaser, P., Frangeul, L., Buchrieser, C., Rusniok, C., Amend, A., Baquero, F., et al. (2001) Comparative genomics of Listeria species. Science 294: 849–852.
Goodman, S.D., and Scocca, J.J. (1988) Identification and arrangement of the DNA sequence recognized in specific transformation of Neisseria gonorrhoeae. Proc Natl Acad Sci USA 85: 6982–6986.
Gratia, J.P. (1989) Products of defective lysogeny in Serratia marcescens SMG 38 and their activity against Escherichia coli and other Enterobacteria. J Gen Microbiol 135: 25–35.
Greener, A., and Hill, C.W. (1980) Identification of a novel genetic element in Escherichia coli K-12. J Bacteriol 144: 312–321.
Harshey, R. (1988) Phage Mu. In The Bacteriophages, Vol. 1. Calendar, R. (ed.). New York: Plenum Press, pp. 193–234.
Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., et al. (2001a) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8: 11–22.
Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., et al. (2001b) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8 (Suppl.): 47–52.
Heidelberg, J.F., Paulsen, I.T., Nelson, K.E., Gaidos, E.J., Nelson, W.C., Read, T.D., et al. (2002) Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nature Biotechnol 20: 1118–1123.
Hendrix, R.W., and Duda, R.L. (1998) Bacteriophage HK97 head assembly: a protein ballet. Adv Virus Res 50: 235–288.
Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., and Hatfull, G.F. (1999) Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc Natl Acad Sci USA 96: 2192–2197.
Hendrix, R.W., Lawrence, J.G., Hatfull, G.F., and Casjens, S. (2000) The origins and ongoing evolution of viruses. Trends Microbiol 8: 504–508.
Huggins, A.R., and Sandine, W.E. (1977) Incidence and properties of temperate bacteriophages induced from lactic streptococci. Appl Environ Microbiol 33: 184–191.
Humphrey, S.B., Stanton, T.B., Jensen, N.S., and Zuerner, R.L. (1997) Purification and characterization of VSH-1, a generalized transducing bacteriophage of Serpulina hyodysenteriae. J Bacteriol 179: 323–329.
Ikeda, H., and Tomizawa, J. (1968) Prophage P1, an extrachromosomal replication unit. Cold Spring Harb Symp Quant Biol 33: 791–798.
Inal, J.M., and Karunakaran, K.V. (1996) φ20, a temperate bacteriophage isolated from Bacillus anthracis exists as a plasmidial prophage. Curr Microbiol 32: 171–175.
Jiang, W., Li, Z., Zhang, Z., Baker, M., Prevelige, P., and Chiu, W. (2003) Coat protein fold and maturation transition of bacteriophage P22 seen at sub-nanometer resolution. Nature Struct Biol 10: 131–135.
Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., et al. (2002) Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res 30: 4432–4441.
Jin, S., Chen, Y., Christie, G.E., and Benedik, M.J. (1996) Regulation of the Serratia marcescens extracellular nuclease: positive control by a homolog of P2 Ogr encoded by a cryptic prophage. J Mol Biol 256: 264–278.
Juhala, R.J., Ford, M.E., Duda, R.L., Youlton, A., Hatfull, G.F., and Hendrix, R.W. (2000) Genomic sequences of bacteriophages HK97 and HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. J Mol Biol 299: 27–51.
Kaiser, K. (1980) The origin of Q-independent derivatives of phage lambda. Mol Gen Genet 179: 547–554.
Kaiser, K., and Murray, N.E. (1979) Physical characterisation of the Rac prophage in E. coli K12. Mol Gen Genet 175: 159–174.
King, G., and Murray, N.E. (1995) Restriction alleviation and modification enhancement by the Rac prophage of Escherichia coli K-12. Mol Microbiol 16: 769–777.
Kirby, J., Trempy, J., and Gottesman, S. (1994) Excision of a P4-like cryptic prophage leads to Alp protease expression in Escherichia coli. J Bacteriol 176: 2068–2081.
Klee, S.R., Nassif, X., Kusecek, B., Merker, P., Beretti, J.L., Achtman, M., et al. (2000) Molecular and biological analysis of eight genetic islands that distinguish Neisseria meningitidis from the closely related pathogen Neisseria gonorrhoeae. Infect Immun 68: 2082–2095.
Klein, R., Baranyi, U., Rossler, N., Greineder, B., Scholz, H., and Witte, A. (2002) Natrialba magadii virus φCh1: first complete nucleotide sequence and functional organization of a virus infecting a haloalkaliphilic archaeon. Mol Microbiol 45: 851–863.
Kofoid, E., Rappleye, C., Stojiljkovic, I., and Roth, J. (1999) The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J Bacteriol 181: 5317–5329.
Krogh, S., O'Reilly, M., Nolan, N., and Devine, K.M. (1996) The phage-like element PBSX and part of the skin element, which are resident at different locations on the Bacillus subtilis chromosome, are highly homologous. Microbiology 142: 2031–2040.
Kropinski, A.M. (2000) Sequence of the genome of the temperate, serotype-converting, Pseudomonas aeruginosa bacteriophage D3. J Bacteriol 182: 6066–6074.
Kuhn, J., and Campbell, A. (2001) The bacteriophage lambda attachment site in wild strains of Escherichia coli. J Mol Evol 53: 607–614.
Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G., Azevedo, V., et al. (1997) The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390: 249–256.
Kuroda, M., Ohta, T., Uchiyama, I., Baba, T., Yuzawa, H., Kobayashi, I., et al. (2001) Whole genome sequencing of meticillin-resistant Staphylococcus aureus. Lancet 357: 1225–1240.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Lang, A.S., and Beatty, J.T. (2001) The gene transfer agent of Rhodobacter capsulatus and constitutive transduction in prokaryotes. Arch Microbiol 175: 241–249.
Lang, A.S., Beatty, J.T., LeBlanc, H., Towers, G., Harris, J., Lang, G., et al. (2000) Genetic analysis of a bacterial genetic exchange element: the gene transfer agent of Rhodobacter capsulatus. Proc Natl Acad Sci USA 97: 859–864.
Lawrence, J.G., Hendrix, R.W., and Casjens, S. (2001) Where are the bacterial pseudogenes? Trends Microbiol 9: 535–540.
Lawrence, J.G., Hatfull, G., and Hendrix, R. (2002) The imbroglios of viral taxonomy: genetic exchange and the failings of phenetic approaches. J Bacteriol 184: 4891–4905.
Lazarevic, V., Dusterhoft, A., Soldo, B., Hilbert, H., Mauel, C., and Karamata, D. (1999) Nucleotide sequence of the Bacillus subtilis temperate bacteriophage SPβc2. Microbiology 145: 1055–1067.
Lederberg, E. (1951) Lysogenicity in E. coli K-12. Genetics 36: 560.
Lewis, R.J., Brannigan, J.A., Offen, W.A., Smith, I., and Wilkinson, A.J. (1998) An evolutionary link between sporulation and prophage induction in the structure of a repressor:anti-repressor complex. J Mol Biol 283: 907–912.
Lin, N.T., Chang, R.Y., Lee, S.J., and Tseng, Y.H. (2001) Plasmids carrying cloned fragments of RF DNA from the filamentous phage φLf can be integrated into the host chromosome via site-specific integration and homologous recombination. Mol Gen Genet 266: 425–435.
Lindsey, D.F., Mullin, D.A., and Walker, J.R. (1989) Characterization of the cryptic lambdoid prophage DLP12 of Escherichia coli and overlap of the DLP12 integrase gene with the tRNA gene argU. J Bacteriol 171: 6197–6205.
Loessner, M.J., Inman, R.B., Lauer, P., and Calendar, R. (2000) Complete nucleotide sequence, molecular analysis and genome structure of bacteriophage A118 of Listeria monocytogenes: implications for phage evolution. Mol Microbiol 35: 324–340.
Low, K.B. (1973) Restoration of the rac locus of recombinant forming ability in recB and recC merozygotes of Escherichia coli K12. Mol Gen Genet 122: 119–130.
Lucchini, S., Desiere, F., and Brussow, H. (1999) Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J Virol 73: 8647–8656.
Lwoff, A. (1953) Lysogeny. Bacteriol Rev 17: 269–337.
Lwoff, A. (1966) The prophage and I. In Phage and the Origins of Molecular Biology. Cairns, J., Stent, G., and Watson, J. (eds). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 88–99.
McClelland, M., Florea, L., Sanderson, K., Clifton, S.W., Parkhill, J., Churcher, C., et al. (2000) Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res 28: 4974–4986.
McClelland, M., Sanderson, K.E., Spieth, J., Clifton, S.W., Latreille, P., Courtney, L., et al. (2001) Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413: 852–856.
McDonnell, G.E., Wood, H., Devine, K.M., and McConnell, D.J. (1994) Genetic control of bacterial suicide: regulation of the induction of PBSX in Bacillus subtilis. J Bacteriol 176: 5820–5830.
McDonough, M.A., and Butterton, J.R. (1999) Spontaneous tandem amplification and deletion of the shiga toxin operon in Shigella dysenteriae 1. Mol Microbiol 34: 1058–1069.
Mahajan, S.K., Chu, C.C., Willis, D.K., Templin, A., and Clark, A.J. (1990) Physical analysis of spontaneous and mutagen-induced mutants of Escherichia coli K-12 expressing DNA exonuclease VIII activity. Genetics 125: 261–273.
Mahdi, A.A., Sharples, G.J., Mandal, T.N., and Lloyd, R.G. (1996) Holliday junction resolvases encoded by homologous rusA genes in Escherichia coli K-12 and phage 82. J Mol Biol 257: 561–573.
Masui, S., Kamoda, S., Sasaki, T., and Ishikawa, H. (2000) Distribution and evolution of bacteriophage WO in Wolbachia, the endosymbiont causing sexual alterations in arthropods. J Mol Evol 51: 491–497.
Masui, S., Kuroiwa, H., Sasaki, T., Inui, M., Kuroiwa, T., and Ishikawa, H. (2001) Bacteriophage WO and virus-like particles in Wolbachia, an endosymbiont of arthropods. Biochem Biophys Res Commun 283: 1099–1104.
Mediavilla, J., Jain, S., Kriakov, J., Ford, M.E., Duda, R.L., Jacobs, W.R., Jr, et al. (2000) Genome organization and characterization of mycobacteriophage Bxb1. Mol Microbiol 38: 955–970.
Meselson, M. (1967) Reciprocal recombination in prophage lambda. J Cell Physiol 70 (Suppl. 1): 113–118.
Miao, E.A., and Miller, S.I. (1999) Bacteriophages in the evolution of pathogen–host interactions. Proc Natl Acad Sci USA 96: 9452–9454.
Milkman, R., and Bridges, M.M. (1990) Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126: 505–517.
Mizuno, M., Masuda, S., Takemaru, K., Hosono, S., Sato, T., Takeuchi, M., et al. (1996) Systematic sequencing of the 283 kb 210 degrees-232 degrees region of the Bacillus subtilis genome containing the skin element and many sporulation genes. Microbiology 142: 3103–3111.
Montag, D., and Henning, U. (1987) An open reading frame in the Escherichia coli bacteriophage lambda genome encodes a protein that functions in assembly of the long tail fibers of bacteriophage T4. J Bacteriol 169: 5884–5886.
Moreira, D. (2000) Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery. Mol Microbiol 35: 1–5.
Morgan, G., Hatfull, G., Casjens, S., and Hendrix, R. (2002) Bacteriophage Mu genome sequence: analysis and comparison with Mu-like prophages in Haemophilus, Neisseria and Deinococcus. J Mol Biol 317: 337–359.
Morimyo, M., Hongo, E., Hama-Inaba, H., and Machida, I. (1992) Cloning and characterization of the mvrC gene of Escherichia coli K-12 which confers resistance against methyl viologen toxicity. Nucleic Acids Res 20: 3159–3165.
Nakayama, K., Kanaya, S., Ohnishi, M., Terawaki, Y., and Hayashi, T. (1999) The complete nucleotide sequence of φCTX, a cytotoxin-converting phage of Pseudomonas aeruginosa: implications for phage evolution and horizontal gene transfer via bacteriophages. Mol Microbiol 31: 399–419.
Nakayama, K., Takashima, K., Ishihara, H., Shinomiya, T., Kageyama, M., Kanaya, S., et al. (2000) The R-type pyocin of Pseudomonas aeruginosa is related to P2 phage, and the F-type is related to lambda phage. Mol Microbiol 38: 213–231.
Nelson, K.E., Weinel, C., Paulsen, I.T., Dodson, R.J., Hilbert, H., Martins dos Santos, V.A., et al. (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 4: 799–808.
Nguyen, A.H., Tomita, T., Hirota, M., Sato, T., and Kamio, Y. (1999) A simple purification method and morphology and component analyses for carotovoricin Er, a phage-tail-like bacteriocin from the plant pathogen Erwinia carotovora Er. Biosci Biotechnol Biochem 63: 1360–1369.
Nolling, J., Breton, G., Omelchenko, M.V., Makarova, K.S., Zeng, Q., Gibson, R., et al. (2001) Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 183: 4823–4838.
Ochman, H., and Wilson, A.C. (1987) Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J Mol Evol 26: 74–86.
Ohnishi, M., Kurokawa, K., and Hayashi, T. (2001) Diversification of Escherichia coli genomes: are bacteriophages the major contributors? Trends Microbiol 9: 481–485.
Ojaimi, C., Brooks, C., Casjens, S., Rosa, P., Elias, A., Barbour, A., et al. (2003) Profiling temperature-induced changes in Borrelia burgdorferi gene expression using whole genome arrays. Infect Immun 71: 1689–1705.
Okamoto, K., Mudd, J., Mangon, J., Huang, W.M., and Marmur, J. (1968) Properties of the defective phage of Bacillus subtilis. J Mol Biol 34: 413–428.
Osawa, R., Iyoda, S., Nakayama, S.I., Wada, A., Yamai, S., and Watanabe, H. (2000) Genotypic variations of Shiga toxin-converting phages from enterohaemorrhagic Escherichia coli O157:H7 isolates. J Med Microbiol 49: 565–574.
Pappenheimer, A.M., Jr, and Murphy, J.R. (1983) Studies on the molecular epidemiology of diphtheria. Lancet 2: 923–926.
Parkhill, J., Achtman, M., James, K.D., Bentley, S.D., Churcher, C., Klee, S.R., et al. (2000) Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404: 502–506.
Parkhill, J., Dougan, G., James, K.D., Thomson, N.R., Pickard, D., Wain, J., et al. (2001a) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413: 848–852.
Parkhill, J., Wren, B.W., Thomson, N.R., Titball, R.W., Holden, M.T., Prentice, M.B., et al. (2001b) Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413: 523–527.
Pedulla, M.L., Ford, M.E., Karthikeyan, T., Houtz, J.M., Hendrix, R.W., Hatfull, G.F., et al. (2003) Corrected sequence of the bacteriophage P22 genome. J Bacteriol 185: 1475–1477.
Perna, N.T., Plunkett, G., III, Burland, V., Mau, B., Glasner, J.D., Rose, D.J., et al. (2001) Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409: 529–533.
Perna, N., Glasner, J., Burland, V., and Plunkett, G. III (2002) The Genomes of Escherichia coli K-12 and Pathogenic E. coli. In Escherichia coli: Virulence Mechanisms of a Versatile Pathogen. Donnenberg, M. (ed.). San Diego, CA: Academic Press, pp. 3–53.
Pfister, P., Wasserfallen, A., Stettler, R., and Leisinger, T. (1998) Molecular genomics of methanobacterium phage ψM2. Mol Microbiol 30: 233–244.
Popp, A., Hertwig, S., Lurz, R., and Appel, B. (2000) Comparative study of temperate bacteriophages isolated from Yersinia. Syst Appl Microbiol 23: 469–478.
Ramirez, M., Severina, E., and Tomasz, A. (1999) A high incidence of prophage carriage among natural isolates of Streptococcus pneumoniae. J Bacteriol 181: 3618–3625.
Rapp, B., and Wall, J. (1987) Genetic transfer in Desulfovibrio desulfuricans. Proc Natl Acad Sci USA 84: 9128–9130.
Ravin, V.K. (1968) The functioning of the genes of temperate bacteriophage in lysogenic cells. Genetika 4: 119–124 (in Russian).
Ravin, V., and Shulga, M.G. (1970) The evidence of extrachromosomal location phage prophage N15. Virology 40: 800–805.
Ravin, V., Ravin, N., Casjens, S., Ford, M.E., Hatfull, G.F., and Hendrix, R.W. (2000) Genomic sequence and analysis of the atypical temperate bacteriophage N15. J Mol Biol 299: 53–73.
Read, T.D., Brunham, R.C., Shen, C., Gill, S.R., Heidelberg, J.F., White, O., et al. (2000) Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39. Nucleic Acids Res 28: 1397–1406.
Recktenwald, J., and Schmidt, H. (2002) The nucleotide sequence of Shiga toxin (Stx) 2e-encoding phage φP27 is not related to other Stx phage genomes, but the modular genetic structure is conserved. Infect Immun 70: 1896–1908.
Redfield, R.J., and Campbell, A. (1987) Structure of cryptic λ prophages. J Mol Biol 198: 393–404.
Reid, S.D., Herbelin, C.J., Bumbaugh, A.C., Selander, R.K., and Whittam, T.S. (2000) Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406: 64–67.
Retallack, D.M., Johnson, L.L., and Friedman, D.I. (1994) Role for 10Sa RNA in the growth of lambda-P22 hybrid phage. J Bacteriol 176: 2082–2089.
Rudd, K.E. (1999) Novel intergenic repeats of Escherichia coli K-12. Res Microbiol 150: 653–664.
Ruzin, A., Lindsay, J., and Novick, R.P. (2001) Molecular genetics of SaPI1, a mobile pathogenicity island in Staphylococcus aureus. Mol Microbiol 41: 365–377.
Sandt, C.H., and Hill, C.W. (2000) Four different genes responsible for nonimmune immunoglobulin-binding activities within a single strain of Escherichia coli. Infect Immun 68: 2205–2214.
Schicklmaier, P., Moser, E., Wieland, T., Rabsch, W., and Schmieger, H. (1998) A comparative study on the frequency of prophages among natural isolates of Salmonella and Escherichia coli with emphasis on generalized transducers. Antonie Van Leeuwenhoek 73: 49–54.
Schlosser, A., Kluttig, S., Hamann, A., and Bakker, E.P. (1991) Subcloning, nucleotide sequence, and expression of trkG, a gene that encodes an integral membrane protein involved in potassium uptake via the Trk system of Escherichia coli. J Bacteriol 173: 3170–3176.
Schmieger, H., and Schicklmaier, P. (1999) Transduction of multiple drug resistance of Salmonella enterica serovar typhimurium DT104. FEMS Microbiol Lett 170: 251–256.
Shimizu, T., Ohtani, K., Hirakawa, H., Ohshima, K., Yamashita, A., Shiba, T., et al. (2002) Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc Natl Acad Sci USA 99: 996–1001.
da Silva, A.C., Ferro, J.A., Reinach, F.C., Farah, C.S., Furlan, L.R., Quaggio, R.B., et al. (2002) Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature 417: 459–463.
Simpson, A.J., Reinach, F.C., Arruda, P., Abreu, F.A., Acencio, M., Alvarenga, R., et al. (2000) The genome sequence of the plant pathogen Xylella fastidiosa. Nature 406: 151–157.
Six, E. (1963) A defective phage depending on phage P2. Bacteriol Proc 80: 138.
Smith, M.C., Burns, N., Sayers, J.R., Sorrell, J.A., Casjens, S.R., and Hendrix, R.W. (1998) Bacteriophage collagen. Science 279: 1834.
Smith, M.C., Burns, R.N., Wilson, S.E., and Gregory, M.A. (1999) The complete sequence of the Streptomyces temperate phage φC31: evolutionary relationships to other viruses. Nucleic Acids Res 27: 2145–2155.
Smith, T.J., Blackman, S.A., and Foster, S.J. (2000) Autolysins of Bacillus subtilis: multiple enzymes with multiple functions. Microbiology 146: 249–262.
Smoot, J.C., Barbian, K.D., Van Gompel, J.J., Smoot, L.M., Chaussee, M.S., Sylva, G.L., et al. (2002) Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci USA 99: 4668–4673.
Stanley, E., Fitzgerald, G.F., Le Marrec, C., Fayard, B., and van Sinderen, D. (1997) Sequence analysis and characterization of φO1205, a temperate bacteriophage infecting Streptococcus thermophilus CNRZ1205. Microbiology 143: 3417–3429.
Starich, T., Cordes, P., and Zissler, J. (1985) Transposon tagging to detect a latent virus in Myxococcus xanthus. Science 230: 541–543.
Stover, C.K., Pham, X.Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., et al. (2000) Complete genome sequence of Pseudomonas aeruginosa PA01, an opportunistic pathogen. Nature 406: 959–964.
Susskind, M.M., and Botstein, D. (1978) Molecular genetics of bacteriophage P22. Microbiol Rev 42: 385–413.
Susskind, M.M., and Botstein, D. (1980) Superinfection exclusion by lambda prophage in lysogens of Salmonella typhimurium. Virology 100: 212–216.
Susskind, M.M., Botstein, D., and Wright, A. (1974) Superinfection exclusion by P22 prophage in lysogens of Salmonella typhimurium. III. Failure of superinfecting phage DNA to enter sieA+ lysogens. Virology 62: 350–366.
Takemaru, K., Mizuno, M., Sato, T., Takeuchi, M., and Kobayashi, Y. (1995) Complete nucleotide sequence of a skin element excised by DNA rearrangement during sporulation in Bacillus subtilis. Microbiology 141: 323–327.
Tang, S., Nuttall, S., Ngui, K., Fisher, C., Lopez, P., and Dyall-Smith, M. (2002) HF2: a double-stranded DNA tailed haloarchaeal virus with a mosaic genome. Mol Microbiol 44: 283–296.
Tettelin, H., Saunders, N.J., Heidelberg, J., Jeffries, A.C., Nelson, K.E., Eisen, J.A., et al. (2000) Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287: 1809–1815.
Thaler, J.O., Baghdiguian, S., and Boemare, N. (1995) Purification and characterization of xenorhabdicin, a phage tail-like bacteriocin, from the lysogenic strain F1 of Xenorhabdus nematophilus. Appl Environ Microbiol 61: 2049–2052.
Van Sluys, M.A., de Oliveira, M.C., Monteiro-Vitorello, C.B., Miyaki, C.Y., Furlan, L.R., Camargo, L.E., et al. (2003) Comparative analyses of the complete genome sequences of Pierce's disease and citrus variegated chlorosis strains of Xylella fastidiosa. J Bacteriol 185: 1018–1026.
Van Vliet, F., Boyen, A., and Glansdorff, N. (1988) On interspecies gene transfer: the case of the argF gene of Escherichia coli. Ann Inst Pasteur Microbiol 139: 493–496.
Wagner, P.L., and Waldor, M.K. (2002) Bacteriophage control of bacterial virulence. Infect Immun 70: 3985–3993.
Waldor, M.K. (1998) Bacteriophage biology and bacterial virulence. Trends Microbiol 6: 295–297.
Waldor, M.K., and Mekalanos, J.J. (1996) Lysogenic conversion by a filamentous phage encoding cholera toxin. Science 272: 1910–1914.
Wall, J.D., Weaver, P.F., and Gest, H. (1975) Gene transfer agents, bacteriophages, and bacteriocins of Rhodopseudomonas capsulata. Arch Microbiol 105: 217–224.
Wang, F.S., Whittam, T.S., and Selander, R.K. (1997) Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. J Bacteriol 179: 6551–6559.
Welch, R.A., Burland, V., Plunkett, G., III, Redford, P., Roesch, P., Rasko, D., et al. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99: 17020–17024.
Whatmore, A.M., and Dowson, C.G. (1999) The autolysin-encoding gene (lytA) of Streptococcus pneumoniae displays restricted allelic variation despite localized recombination events with genes of pneumococcal bacteriophage encoding cell wall lytic enzymes. Infect Immun 67: 4551–4556.
Wickner, S. (1984a) Oligonucleotide synthesis by Escherichia coli dnaG primase in conjunction with phage P22 gene 12 protein. J Biol Chem 259: 14044–14047.
Wickner, S. (1984b) DNA-dependent ATPase activity associated with phage P22 gene 12 protein. J Biol Chem 259: 14038–14043.
Willis, D.K., Satin, L.H., and Clark, A.J. (1985) Mutation-dependent suppression of recB21 recC22 by a region cloned from the Rac prophage of Escherichia coli K-12. J Bacteriol 162: 1166–1172.
Yamamoto, K. (1967) The origin of bacteriophage P221. Virology 33: 545–547.
Yamamoto, N. (1969) Genetic evolution of bacteriophage. I. Hybrids between unrelated bacteriophages P22 and Fels2. Proc Natl Acad Sci USA 62: 63–69.
Yen, H.C., Hu, N.T., and Marrs, B.L. (1979) Characterization of the gene transfer agent made by an overproducer mutant of Rhodopseudomonas capsulata. J Mol Biol 131: 157–168.
Zinder, N., and Lederberg, J. (1952) Genetic exchange in Salmonella. J Bacteriol 64: 679–699.
Zink, R., Loessner, M.J., and Scherer, S. (1995) Characterization of cryptic prophages (monocins) in Listeria and sequence analysis of a holin/endolysin gene. Microbiology 141: 2577–2584.
Zissler, J., Signer, E., and Schaefer, F. (1971) The role of recombination in growth of bacteriophage lambda. In Bacteriophage Lambda. Hershey, A.D. (ed.). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 455–475.
