Available online at www.sciencedirect.com

ScienceDirect

Taking the pseudo out of pseudogenes


Ian Goodhead and Alistair C Darby

Pseudogenes are defined as fragments of once-functional genes that have been silenced by one or more nonsense, frameshift or missense mutations. Despite continuing increases in the speed of sequencing and annotating bacterial genomes, the identification and categorisation of pseudogenes remains problematic. Even when identified, pseudogenes are considered to be rare and tend to be ignored. On the contrary, pseudogenes are surprisingly prevalent and can persist for long evolutionary time periods, representing a record of once-functional genetic characteristics. Most importantly, pseudogenes provide an insight into prokaryotic evolutionary history as a record of phenotypic traits that have been lost. Focusing on the intracellular and symbiotic bacteria in which pseudogenes predominate, this review discusses the importance of identifying pseudogenes to fully understand the abilities of bacteria, and to understand prokaryotes within their evolutionary context.

Addresses
Functional and Comparative Genomics, School of Biological Sciences, University of Liverpool, Crown St., Liverpool L69 7ZB, UK

Current Opinion in Microbiology 2015, 23:102–109
This review comes from a themed issue on Genomics
Edited by Neil Hall and Jay Hinton

http://dx.doi.org/10.1016/j.mib.2014.11.012
1369-5274/© 2014 Published by Elsevier Ltd.

Introduction
The paradigm for bacterial genomes is that they are small and efficient. Coding sequences are short (approximately 1 kb in length), lack introns and are arranged in such a manner that little coding capacity (i.e. the amount of ‘active’ coding sequence as a percentage of the total genome size) is wasted. Operons are organised ‘end-to-end’ with few gaps in between. Generally, prokaryotic genome size correlates well with gene number, unlike in some eukaryotes, where expansions in genome size tend to show a reduced overall percentage of the genome coding for genes [1]. Generally, intergenic and non-functional sequences are scarce in bacteria, probably due to a propensity for redundant sequences to be deleted, although decreased coding capacities can be observed in intracellular bacteria [2]. It is assumed that the increased amount of non-coding, intergenic sequences in intracellular bacteria are the remnants of genes that have been systematically scrambled by mutation [3].

In bacteria, pseudogenes, defined here as ‘genes that have been silenced by one or more deleterious mutations,’ are commonly detected as fragments of previously described homologs, and therefore can become difficult to annotate once significant degradation to the original sequence has occurred. A comparison between the genomes of any two species can identify polymorphisms between related genes that may have altered or ablated their function. Pseudogenes, therefore, can function as a record of the proteins, enzymes or pathways that are no longer necessary as the bacterium has adapted to a novel environment.

High throughput DNA sequencing is now an indispensable tool in life sciences. The technology allows the rapid description of genomes, transcriptomes and epigenetic features in ways that were not possible a decade ago. In the area of bacterial genomics and informatics we are feeling very pleased with ourselves. We can now rapidly sequence large panels of bacteria to high standards of finished qualities, and describe important differences associated with phenotypes [4,5]. Nevertheless, we are struggling to define or categorise pseudogenes, and to explain why they persist (i.e. have not been entirely removed from the genome). We have not examined whether pseudogenes have any residual function, and it is not clear why we should invest resources to do so.

The dynamics of gene evolution
In order to emphasise the importance of pseudogenes, it is important to understand them in the context of the evolution of a microbial genome. Microbial pathogens and symbionts exist at different stages along a pathway of pseudogenisation, which ultimately ends with gene deletion and a greatly reduced genome size, depending on the length of time that the cell has co-evolved alongside its host (Figure 1). From an evolutionary standpoint, bacterial genomes both expand and contract, depending upon the rate of acquisition of new genes (and phenotypes) and the deletion of redundant sequences. Such dynamics are observable within genera and species (Box 1). The ‘accessory’ genome contains genes that encode adaptive traits and can expand through horizontal gene transfer [6], to increase host-range or virulence [7], or introduce tolerance to antimicrobial compounds [8], environmental toxins [9], or genes associated with the establishment or maintenance of symbiosis [10]. Correspondingly, the ‘core’ genome, which encodes essential


Figure 1

[Schematic: a free-living bacterium becomes a recent symbiont (host-restriction) and then an obligate symbiont (genome reduction) inside a host cell. Processes shown: deletion, transposable elements, recombination, pseudogenisation, with the number of pseudogenes tracked across the stages. Panel labels: free-living bacterium (large genome size, high % coding capacity, few transposable elements, free-living / multiple hosts); recent symbiont (genome size, coding capacity, transposable elements, selection); obligate symbiont (genome size, host-specific, % coding capacity, AT-bias).]

Genome reduction with respect to endosymbionts. Schematic of genome reduction as bacteria move from a free-living environment to a more specific (intracellular) niche. Initially having large genomes, some reduction in both size and coding capacity is seen as the organism adapts to an intracellular environment as the selective constraints on some genes are relaxed due to protection or nutrients offered by the host cell. Such genes are prone to ‘pseudogenisation’ by mutation. Further specialisation results in deletion of redundant sequences and a much-reduced genome size. The long-term maintenance of reduced coding capacities in recent symbionts may be as a result of some residual function.
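Coding capacity, as used in Figure 1 and defined in the Introduction, is the amount of intact coding sequence expressed as a percentage of total genome size. The following is a minimal sketch of that calculation, not a method taken from the review: the function names, the 1-based inclusive coordinate convention and the decision to merge overlapping CDS so that no base is counted twice are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumption: CDS coordinates are 1-based and inclusive, as in GenBank/GFF.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coding_capacity(cds: Iterable[Interval], genome_length: int) -> float:
    """Return intact coding sequence as a percentage of genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    coding_bases = sum(end - start + 1 for start, end in merge_intervals(cds))
    return 100.0 * coding_bases / genome_length
```

Whether annotated pseudogenes are passed in as CDS changes the result, which is one reason the same genome can be reported with different coding capacities by different annotation sources.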

genes associated with functions such as metabolism, can remain stable unless large environmental changes occur, such as adaptation to an intracellular lifestyle within a host. Horizontal gene transfers are common in bacteria, and transfer large amounts of content, reaching rates exceeding 16 kbp per million years in Escherichia coli, for instance [11].

Once removed from evolutionary selective constraints, genes can accumulate deleterious single nucleotide polymorphisms, frameshifts or truncations relative to an ancestral orthologue (Figure 2). A frequent feature of intracellular prokaryotes is the inability of the organism to correct such non-synonymous substitutions, often due to the lack of proofreading during DNA replication. For example, the loss of DnaQ-mediated proofreading activities of DNA polymerase III in M. leprae, the etiological agent of leprosy [12,13], or recA/F in Buchnera, the primary symbiont in pea aphids [14], results in increased rates of mutation. A combination of a lack of error-correction and the frequent bottlenecks that are associated with an

Box 1 Intracellular bacteria genome evolution

Compared to the free-living lifestyle, the intracellular environment can be viewed as a protected, homeostatic, nutrient-rich compartment. In such an environment a bacterium will have less need for some of the pathways whose function may now be provided by the host, for instance those involved with metabolism or defence, genes that would be under strong positive selection to be retained in a free-living bacterium. Because of this, many obligate intracellular bacteria have much smaller genomes than their free-living counterparts, as such pathways tend to have been deleted. There are a number of good examples where bacterial genomes have been streamlined. Wigglesworthia glossinidia, the mutualist symbiont of tsetse flies, has a genome size of just 700 kb [26]; Mycoplasma genitalium, the obligate parasite of primate genital and respiratory tracts, 580 kb [66] and some Buchnera species, mutualist symbionts in aphids, have genome sizes of between 450 kb and 650 kb [27,67].

Figure 2

[Schematic of six gene panels. Panels 1 to 4: polymorphism causing a premature stop, loss or change of a functional domain, loss of the start codon, and loss of the promoter / RBS. Panels 5 and 6: insertion causing a premature stop, and insertion causing a frameshift (reading frames 1, 2, 3).]

Methods by which pseudogenes can be formed. A schematic showing actions of polymorphisms on gene sequences leading to ‘pseudogene’ formation. The top four panels show polymorphism within coding sequence or promoter regions. The fifth and sixth panels show insertions within coding sequences leading to a premature stop or a frameshift, respectively.
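Three of the lesions in Figure 2 (loss of the start codon, a premature stop and a frameshift) can be screened for by comparing a candidate sequence with an intact orthologue. The sketch below is illustrative only and is not a method from the review. The function names are hypothetical, the set of accepted bacterial start codons is an assumption, and the frameshift test (a length difference that is not a multiple of three) is a crude proxy that would miss compensating indels. Promoter, RBS and functional-domain lesions are not covered.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial starts


def first_in_frame_stop(seq: str) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def classify_candidate(candidate: str, reference: str) -> List[str]:
    """List lesions found in `candidate` relative to an intact `reference`.

    Both sequences are expected to begin at the annotated start codon.
    """
    candidate, reference = candidate.upper(), reference.upper()
    lesions: List[str] = []
    if candidate[:3] not in START_CODONS:
        lesions.append("loss of start codon")
    # Crude proxy: a net indel that is not a multiple of 3 shifts the frame.
    if (len(candidate) - len(reference)) % 3 != 0:
        lesions.append("frameshift")
    ref_stop = first_in_frame_stop(reference)
    cand_stop = first_in_frame_stop(candidate)
    if cand_stop is not None and (ref_stop is None or cand_stop < ref_stop):
        lesions.append("premature stop")
    return lesions
```

An empty list means only that none of these three lesions was detected; it does not show that the gene is functional.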


intracellular lifestyle results in frequent fixation of nonsense mutations within the population: non-synonymous/synonymous (dN/dS) substitution ratios in intracellular bacteria regularly reach 1, outstripping those of free-living bacteria [15]. Nonsense mutations and frameshifts frequently create pseudogenes by introducing premature stop-codons, or by destroying promoters [16,17], start-codons [18], or functional domains [16,19]. Horizontal gene transfers are probably the major source of pseudogenes in prokaryotes, as the majority of the genes transferred likely provide few benefits and are rapidly silenced due to a lack of positive selective pressure. Some genes may even be toxic, increasing negative selective pressure [20]. Deletion, therefore, represents a mechanism to defend against deleterious products of gene acquisition and maintains small genome sizes [21]. Prokaryotic pseudogenisation differs from the mechanism in eukaryotes, in which retrotransposition (‘processed’), and gene duplication and subsequent degradation of additional gene copies (‘non-processed’), predominate [22,23]. There are also examples in intracellular bacteria where transposons or mobile elements disrupt gene function [24,25].

In obligate mutualist symbionts, genome reduction may be coupled with the host's dependence upon the symbiont for the production of certain metabolites that are not provided by standard nutrition, such as Wigglesworthia providing essential B-vitamins that are not present in a tsetse fly’s regular blood meal [26], or Buchnera providing aphids with essential amino acids that are absent from phloem sap [27]. Arguably, the best example of genome reduction is the mitochondrion, which is an ancient symbiosis that has become a eukaryotic organelle in its own right [28]; phylogenetic analysis of cytochrome B and cytochrome C oxidase I genes suggests mitochondria may have diverged from a free-living Rickettsiaceae ancestor approximately 1.5–2.0 billion years ago [29,30]. The most closely related organism to the eukaryotic mitochondrion is Rickettsia prowazekii, a Gram-negative Alpha-proteobacterium that causes epidemic typhus. R. prowazekii has a 1.1 Mb genome that is undergoing substantial reduction and has a coding capacity of only 76%. Reduced coding capacities are common in many of the Rickettsiae, which represent a taxonomically diverse group of obligate intracellular bacteria existing in both symbiotic and pathogenic relationships with their hosts, and therefore represent a useful model for understanding pseudogene content across environments [3,31]. The 49 Rickettsia genomes available in Ensembl Bacteria (release 22) [32] have small chromosome sizes (range 0.86–1.82 Mb; mean 1.23 Mb), low %GC contents (mean 31.7%) and reduced coding capacities (mean 76%). Low %GC contents are important in this context, as alternative-frame premature stop-codons are more prevalent in AT-rich genomes, which are hypothesised to serve as error-correcting sequences to minimise the effect of deleterious frameshift mutations [33]. The genomes of both M. leprae [13] and S. glossinidius [24,34] have coding capacities of only 50%, suggesting there has not been sufficient evolutionary time or selective pressure for non-coding regions to be removed.

Pseudogene accumulation is a hallmark of recent adaptation to a new host, wherein genes are being selectively silenced as the organism adapts to its new environmental niche (Figure 2). Salmonella genomes serve as a useful example in this instance: Salmonella is a broad host-range Gram-negative facultative anaerobe that exists in a number of different hosts, and for which six subspecies and over 2500 serovars have been identified and several genomes have been sequenced (reviewed in [35]). Furthermore, the genomes are clonal, around 4.5 Mb in size and reflect evolutionary changes, such as genome reduction and horizontal gene acquisition, to adapt to different niches. Generally, Salmonella enterica serovars with a broad host-range, such as S. Enteritidis PT4 (113 pseudogenes) or S. Typhimurium SL1344 (63), have fewer pseudogenes than host-specific serovars like S. Gallinarum (309), which causes fowl typhoid, or the human-restricted S. Typhi CT18 (204) [35]. Expanding this to more bacterial genera, the same pattern of increased pseudogenisation in narrow host range species can be seen, for example in Yersinia [36], Escherichia and Shigella [37] and Rickettsia [38].

The mystery of persistence
Some bacteria, particularly younger symbionts such as Sodalis glossinidius or Baumannia cicadellinicola, maintain high numbers of pseudogenes, and reduced coding capacities. Comparisons have shown that pseudogenes are generally species-specific, and that they are rarely shared between closely related species [39–44]. A study in cicadas reported that two recently diverged bacterial symbiont species within the same host have non-overlapping pseudogenes, and complementary gene contents that, together, ameliorate individually silenced pathways [45]. A lack of shared pseudogenes between species also suggests that redundant genes are either eliminated rapidly from genomes, or that they are subject to such a high degree of degradation that they are no longer recognised by standard annotation pipelines [44].

Due to the efficiency of genome architecture in bacteria, it is unclear why prokaryotic genomes would maintain reduced coding-capacities. It is tempting to hypothesise that persistent pseudogenes maintain some function, and therefore there is some positive selective pressure to maintain their presence. If there is a selective advantage to the removal of extraneous DNA, however, due to, for instance, reducing unnecessary energy expenditure on DNA replication, transcriptional or translational mechanisms, or even to remove toxic genes [35], why should it take so long for such regions to be deleted? Pseudogenes


can be surprisingly persistent: throughout its evolution, the mean half-life of pseudogenes in Buchnera is approximately 24 million years [46]. There are two hypotheses to explain such a high level of persistence: either there is some residual function associated with pseudogenes resulting in a small selective pressure to maintain the pseudogenised sequence [47], or perhaps there is a bias towards silencing regulatory sequences thereby removing the need to delete the pseudogene itself [46].

It is important to understand that ‘pseudogenes’ are, by definition, a result of bioinformatic analyses that follow a set of rules for what a gene should look like. Therefore, if a hypothetical ‘pseudogene’ is processed by cellular machinery and produces RNA, or a protein (albeit possibly altered relative to an ancestral orthologue), is it actually a ‘pseudo’ gene? Unless the regulatory machinery for a given gene has been altered, it is likely that pseudogene-derived RNA is transcribed. There is increasing evidence for pseudogene transcription in eukaryotes: the ENCODE project predicts that 1/5th of pseudogenes are being actively transcribed [48], and that some pseudogenes have an active role in regulation through siRNA pathways [49], although evidence remains elusive in prokaryotes. Microarray-based expression analyses suggested that a large proportion of the M. leprae pseudogenes are transcribed [50,51], although it is unlikely that these transcripts are translated into functional protein. No pseudogene-derived products were observed from over 300 proteins identified in extracts from M. leprae isolated from armadillos [52] and few peptides were detected matching the nine expressed pseudogenes identified in S. Typhi Ty2 [53]. Pseudogene-derived protein products are viable, if perhaps rare: cloning the M. leprae pyrR pseudogene into E. coli yielded an appropriately sized protein product [51]. The pyrR pseudogene contains a premature stop codon, which corresponds to approximately 75% of its original coding sequence relative to its M. tuberculosis orthologue, and an otherwise intact translational architecture. In a study of three Yersiniae, between one and forty annotated ‘pseudogenes’ showed evidence of translation [54]. The concept of pseudogenes being silent in a translational sense seems to be incorrect, although further evidence for active protein products being produced from degraded sequences will help elucidate the extent to which this is the case (Figure 3).

The size of the problem
Automatic annotation pipelines for prokaryotic genomes have vastly improved alongside the increase in sequencing speed: Prokka, for instance, can annotate the E. coli K12 genome in only six minutes using a desktop computer [55]. However, the ability to predict and annotate pseudogenes has lagged behind. Prokka does provide a list of suspicious genes that are potentially pseudogenes, but these need to be checked carefully.

In contrast, manual re-annotation projects have shown some success in identifying originally unidentified pseudogenes: for instance, Belda and colleagues used BLASTX [56] searches of intergenic sequences >50 bp against the E. coli K12 proteome, and found 529 pseudogenes in addition to the original 927 identified in Sodalis glossinidius [34]. Similarly, the original E. coli K12 genome annotation contained only a single pseudogene [57], whereas a subsequent study found 207 [58]. Table 1 lists some basic statistics for annotations held for a few bacterial genera taken from two databases: the Prokaryotic Genome Analysis Tool (PGAT, [59]) and the Integrated Microbial Genomes (IMG) database [60]. The level of pseudogene annotation can be seen to vary depending upon the source of the data. Downstream analyses, such as gene expression, comparative genomics, metabolomics or proteomic studies, rely heavily upon comprehensive and accurate genome annotations. In this context, it is important to

Figure 3

[Schematic: evidence tracks over a genome region (homology, proteome, transcriptome, genome) supporting gene models labelled ‘CDS unannotated’, ‘CDS predicted’ and ‘ORF pseudogene’; a Venn diagram at right overlaps pseudogenes, expressed sequences and protein.]

Building evidence for gene models and putatively functional pseudogenes. Coding sequences (CDS) can be predicted on the genome by one or more of: (1) Homology to closely related genomes. (2) Overlapping proteomic data. (3) Overlapping transcriptome data. (4) Genomic data, that is an open reading frame with a valid start codon, stop codon, ribosomal binding site and functional domain. Some functionality may exist for pseudogenised open reading frames (Venn diagram, right).
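The four evidence types listed in the Figure 3 caption can be combined into a simple decision rule. The sketch below is one hypothetical way of doing so and is not the scheme used by the authors or by any annotation pipeline: the class name, the label strings and the order in which evidence is weighed are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    homology: bool     # (1) similarity to a gene in a closely related genome
    peptides: bool     # (2) overlapping proteomic data
    transcripts: bool  # (3) overlapping transcriptome data
    intact_orf: bool   # (4) valid start, stop, RBS and functional domain


def label_locus(ev: Evidence) -> str:
    """Assign an illustrative label to a locus (labels are assumptions)."""
    if ev.intact_orf:
        return "CDS predicted"
    if not ev.homology:
        # Without an intact ORF or a homolog there is nothing to call.
        return "no gene model"
    if ev.peptides:
        return "pseudogene, translated"
    if ev.transcripts:
        return "pseudogene, transcribed"
    return "pseudogene, no expression evidence"
```

Under this rule a disrupted open reading frame with matching peptides is still labelled a pseudogene, but is flagged as translated, which keeps the question of residual function visible in the annotation.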


Table 1

A list of seven genera with the number of genomes analysed, mean genome size, mean numbers of annotated coding sequences and pseudogenes, and the maximum number of annotated pseudogenes

Genus                 Source  Genome size  # coding sequences  Pseudogenes  Pseudogenes  Coding capacity
                              (mean bp)    (mean)              (mean)       (max)        (mean %)
Buchnera              IMG       620 134     539                  2            18         84.7
Burkholderia          PGAT    6 741 756    5831                295           493         68.1
                      IMG     6 788 019    6355                 48          4011         84.9
Escherichia/Shigella  PGAT    4 979 798    4573                229           858         82.2
                      IMG     5 167 476    5083                  5           675         88.2
Francisella           PGAT    1 914 213    1529                264           496         32.3
                      IMG     1 857 009    1902                 25           317         88.0
Pseudomonas           PGAT    6 467 936    5717                 40            62         86.9
                      IMG     6 142 996    5814                 16          1523         88.4
Rickettsia            IMG     1 377 304    1373                 10           416         78.4
Salmonella            PGAT    4 790 558    4500                153           336         83.4
                      IMG     4 755 870    4677                 13           660         87.2
                      NB      4 849 106    4460                162           311         n/a

Listed is the mean coding capacity (%) for all of the examined genomes within each genus. Data were taken from the PGAT and IMG databases, or from the Nuccio and Bäumler (2014) reannotation of fifteen Salmonella genomes (‘NB’) [59,60,65]. For the latter, overall coding capacity could not be calculated.
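The dependence of pseudogene counts on annotation source can be made concrete using the mean values in Table 1. The sketch below computes, for the five genera present in both databases, how many times more pseudogenes PGAT reports than IMG. The numbers are copied from the table; the dictionary layout and function name are assumptions made for illustration.

```python
# Mean annotated pseudogenes per genome, copied from Table 1 for the
# genera that appear in both PGAT and IMG.
MEAN_PSEUDOGENES = {
    "Burkholderia": {"PGAT": 295, "IMG": 48},
    "Escherichia/Shigella": {"PGAT": 229, "IMG": 5},
    "Francisella": {"PGAT": 264, "IMG": 25},
    "Pseudomonas": {"PGAT": 40, "IMG": 16},
    "Salmonella": {"PGAT": 153, "IMG": 13},
}


def fold_difference(counts, numerator="PGAT", denominator="IMG"):
    """Return numerator/denominator mean pseudogene counts for each genus."""
    return {
        genus: by_source[numerator] / by_source[denominator]
        for genus, by_source in counts.items()
    }


if __name__ == "__main__":
    ranked = sorted(fold_difference(MEAN_PSEUDOGENES).items(),
                    key=lambda item: item[1], reverse=True)
    for genus, fold in ranked:
        print(f"{genus}: {fold:.1f}-fold more pseudogenes in PGAT than IMG")
```

The ratios compare database means over different sets of genomes, so they indicate the scale of the discrepancy rather than a like-for-like difference between annotation methods.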

know whether or not gene fragments have been annotated in the genome, as discrepancies between annotations may have biological relevance, changing gene function or active metabolic pathways.

An excellent study of fifteen S. enterica strains found that comparative genomic analysis was hindered by the different methods of annotation used on each genome, particularly with reference to predicted pseudogenes [65]. The authors therefore performed a substantial re-annotation and identified 1004 novel ‘pseudogenes’. The 15 genomes contained 471 intact CDS that had been misidentified as pseudogenes. Analysis of genome degradation in the corrected genome annotation revealed microbial pathways essential for Salmonella gastrointestinal success, which were inactivated by pseudogenes in extra-intestinal pathovars. Interestingly, the authors propose the term ‘hypothetically disrupted coding DNA sequences’ to replace the ambiguous ‘pseudogene’ term.

The issue of properly annotating pseudogene-derived sequences has been underestimated: an extensive survey of 64 diverse prokaryotic genomes in 2004 estimated a background level of between 1% and 5% of pseudogene-derived sequences, and identified 7000 shared pseudogenes between genomes [20]. Release 22 of the Ensembl Bacteria database now contains over 10 000 bacterial genomes [32], and with the increase in speed of whole genome sequencing, the number of available bacterial genomes for comparison is rapidly expanding.

Clearly, high-throughput genomic data analyses are reliant upon high quality, finished genomes, with correspondingly accurate and thorough annotations. Disappointingly, there is little support for submitting updated annotations to public databases, although ‘wiki’ based solutions have been suggested as a potential avenue to address this problem [61]. Producing and curating comprehensive, open-source databases of complete bacterial genomes, their annotations (including pseudogenes) and associated ‘omics data would serve as an unprecedented resource for evolutionary biology. Without the ability to constantly curate ‘re-annotations’ or provide ‘notes’ for these genomes, we risk such information being lost or sub-standard genomic information being used.

Systematic analyses combining genomics, transcriptomics and proteomics can produce comprehensive annotations, and making such datasets available for wider comparison is of fundamental importance for addressing various prokaryotic genomic and evolutionary questions. Initially, high-quality, broad and deep bacterial genome sequencing can be rapidly achieved by the latest DNA sequencing technologies (e.g. PacBio [4,5,62]). The presence of active transcription and translation in annotated pseudogenes could be determined by RNAseq [63], including transcription start sites, operon structure, promoters, and antisense regulation (e.g. [64]). Combining computational predictions of gene content with deep proteomic profiling using mass spectrometry, for instance, has been shown to improve the accuracy of annotations [54]. Further studies could reveal the presence of any functional activity that remains selectively important, implicating reasons for pseudogene persistence, or conversely, any post-transcriptional regulation mechanisms that remove any residual pseudogene products that may be damaging.


Acknowledgements
The authors are grateful to the BBSRC for funding to ACD (Grant BBJ017698/1) and to the reviewers and editors for useful comments for improving this manuscript.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:

• of special interest
•• of outstanding interest

1. Hou Y, Lin S: Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes. PLoS ONE 2009, 4:e6978 http://dx.doi.org/10.1371/journal.pone.0006978.

2. Mira A, Ochman H, Moran NA: Deletional bias and the evolution of bacterial genomes. Trends Genet 2001, 17:589-596 http://dx.doi.org/10.1016/S0168-9525(01)02447-7.

3. Andersson JO, Andersson SG: Insights into the evolutionary process of genome degradation. Curr Opin Genet Dev 1999, 9:664-671 http://dx.doi.org/10.1016/S0959-437X(99)00024-6.

4. Koren S, Phillippy AM: One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 2015, 23 in this issue.

5. Loman NJ, Constantinidou C, Chan JZM, Halachev M, Sergeant M, Penn CW et al.: High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol 2012, 10:599-606 http://dx.doi.org/10.1038/nrmicro2850.
Second-generation and third-generation sequencing technologies are revolutionising microbial genomics. We are now in the position where novel bacterial strains can be fully sequenced in hours or days. This review excellently summarises the current state-of-play and the future direction of NGS in relation to microbiology.

6. Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405:299-304 http://dx.doi.org/10.1038/35012500.

7. Baker S, Hardy J, Sanderson KE, Quail M, Goodhead I, Kingsley RA et al.: A novel linear plasmid mediates flagellar variation in Salmonella Typhi. PLoS Pathog 2007, 3:e59.

8. Stokes HW, Hall RM: A novel family of potentially mobile DNA elements encoding site-specific gene-integration functions: integrons. Mol Microbiol 1989, 3:1669-1683 http://dx.doi.org/10.1111/j.1365-2958.1989.tb00153.x.

9. Ramage HR, Connolly LE, Cox JS: Comprehensive functional analysis of Mycobacterium tuberculosis toxin–antitoxin systems: implications for pathogenesis, stress responses, and evolution. PLoS Genet 2009, 5:e1000767 http://dx.doi.org/10.1371/journal.pgen.1000767.

10. Piel J, Hofer I, Hui D: Evidence for a symbiosis island involved in horizontal acquisition of pederin biosynthetic capabilities by the bacterial symbiont of Paederus fuscipes Beetles. J Bacteriol 2004, 186:1280-1286 http://dx.doi.org/10.1128/JB.186.5.1280-1286.2004.

11. Darmon E, Leach DRF: Bacterial genome instability. Microbiol Mol Biol Rev 2014, 78:1-39 http://dx.doi.org/10.1128/MMBR00035-13.

12. Mizrahi V, Dawes SS, Rubin H: DNA replication. Mol Genet Mycobact 2000:159-172.

13. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR et al.: Massive gene decay in the leprosy bacillus. Nature 2001, 409:1007-1011 http://dx.doi.org/10.1038/35059006.

14. Moran NA, Mira A: The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol 2001, 2:1-12 http://dx.doi.org/10.1186/gb-2001-2-12-research0054.

15. Moran NA: Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc Natl Acad Sci U S A 1996, 93:2873-2878 http://dx.doi.org/10.1073/pnas.93.7.2873.

16. Feavers IM, Maiden MC: A gonococcal porA pseudogene: implications for understanding the evolution and pathogenicity of Neisseria gonorrhoeae. Mol Microbiol 1998, 30:647-656.

17. Dunbar HE, Wilson ACC, Ferguson NR, Moran NA: Aphid thermal tolerance is governed by a point mutation in bacterial symbionts. PLoS Biol 2007, 5:e96 http://dx.doi.org/10.1371/journal.pbio.0050096.

18. Halling SM, Peterson-Burch BD, Bricker BJ, Zuerner RL, Qing Z, Li L-L et al.: Completion of the genome sequence of Brucella abortus and comparison to the highly similar genomes of Brucella melitensis and Brucella suis. J Bacteriol 2005, 187:2715-2726 http://dx.doi.org/10.1128/JB.187.8.2715-2726.2005.

19. Jacq C, Miller JR, Brownlee GG: A pseudogene structure in 5S DNA of Xenopus laevis. Cell 1977, 12:109-120 http://dx.doi.org/10.1016/0092-8674(77)90189-1.

20. Liu Y, Harrison PM, Kunin V, Gerstein M: Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes. Genome Biol 2004, 5:R64 http://dx.doi.org/10.1186/gb-2004-5-9-r64.
Pseudogenes were once thought of as being rare in bacteria, as their genomes are generally efficient in their coding capacity. This study comprehensively analysed 64 annotated prokaryotic genomes, and suggests that, actually, pseudogenes may occur at a background level of 1–5% of all genes, the majority arising from failed horizontal gene transfer events.

21. Lawrence JG, Hendrix RW, Casjens S: Where are the pseudogenes in bacterial genomes? Trends Microbiol 2001, 9:535-540 http://dx.doi.org/10.1016/S0966-842X(01)02198-9.

22. Mighell AJ, Smith NR, Robinson PA, Markham AF: Vertebrate pseudogenes. FEBS Lett 2000, 468:109-114 http://dx.doi.org/10.1016/S0014-5793(00)01199-6.

23. Khachane AN, Harrison PM: Strong association between pseudogenization mechanisms and gene sequence length. Biol Direct 2009, 4:38 http://dx.doi.org/10.1186/1745-6150-4-38.

24. Toh H, Weiss BL, Perkin SAH, Yamashita A, Oshima K, Hattori M et al.: Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res 2006 http://dx.doi.org/10.1101/gr.4106106.

25. Belda E, Silva FJ, Peretó J, Moya A: Metabolic networks of Sodalis glossinidius: a systems biology approach to reductive evolution. PLoS ONE 2012, 7:e30652 http://dx.doi.org/10.1371/journal.pone.0030652.

26. Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, Hattori M et al.: Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet 2002, 32:402-407 http://dx.doi.org/10.1038/ng986.

27. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H: Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 2000, 407:81-86 http://dx.doi.org/10.1038/35024074.

28. Gray MW, Burger G, Lang BF: The origin and early evolution of mitochondria. Genome Biol 2001, 2:1-5 http://dx.doi.org/10.1186/gb-2001-2-6-reviews1018.

29. Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-Pontén T, Alsmark UC, Podowski RM et al.: The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 1998, 396:133-140 http://dx.doi.org/10.1038/24094.

30. Emelyanov VV: Evolutionary relationship of Rickettsiae and mitochondria. FEBS Lett 2001, 501:11-18 http://dx.doi.org/10.1016/S0014-5793(01)02618-7.

31. Fuxelius H-H, Darby AC, Cho N-H, Andersson SGE: Visualization of pseudogenes in intracellular bacteria reveals the different tracks to gene destruction. Genome Biol 2008, 9:R42 http://dx.doi.org/10.1186/gb-2008-9-2-r42.

32. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S et al.: Ensembl 2014. Nucleic Acids Res 2014, 42:D749-D755 http://dx.doi.org/10.1093/nar/gkt1196.


33. Wong T-Y, Fernandes S, Sankhon N, Leong PP, Kuo J, Liu J-K: Role of premature stop codons in bacterial evolution. J Bacteriol 2008, 190:6718-6725 http://dx.doi.org/10.1128/JB00682-08.

34. Belda E, Moya A, Bentley S, Silva FJ: Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies. BMC Genomics 2010:1-17.

35. Kuo C-H, Ochman H: The extinction dynamics of bacterial pseudogenes. PLoS Genet 2010, 6:e1001050 http://dx.doi.org/10.1371/journal.pgen.1001050.

36. Chain PSG, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Regala WM et al.: Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A 2004, 101:13826-13831 http://dx.doi.org/10.1073/pnas.0404012101.

37. Feng Y, Chen Z, Liu S-L: Gene decay in Shigella as an incipient stage of host-adaptation. PLoS ONE 2011, 6:e27754 http://dx.doi.org/10.1371/journal.pone.0027754.

49. Sasidharan R, Gerstein M: Genomics: protein fossils live on as RNA. Nature 2008, 453:729-731 http://dx.doi.org/10.1038/453729a.

50. Suzuki K, Nakata N, Bang PD, Ishii N, Makino M: High-level expression of pseudogenes in Mycobacterium leprae. FEMS Microbiol Lett 2006, 259:208-214 http://dx.doi.org/10.1111/j.1574-6968.2006.00276.x.

51. Williams DL, Slayden RA, Amin A, Martinez AN, Pittman TL, Mira A et al.: Implications of high level pseudogene transcription in Mycobacterium leprae. BMC Genomics 2009, 10:397 http://dx.doi.org/10.1186/1471-2164-10-397.

52. Marques MAM, Neves-Ferreira AGC, da Silveira EKX, Valente RH, Chapeaurouge A, Perales J et al.: Deciphering the proteomic profile of Mycobacterium leprae cell envelope. Proteomics 2008, 8:2477-2491 http://dx.doi.org/10.1002/pmic.200700971.

53. Perkins TT, a Kingsley R, Fookes MC, Gardner PP, James KD, Yu L et al.: A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 2009, 5:e1000569 http://dx.doi.org/10.1371/journal.pgen.1000569.
54. Schrimpe-Rutledge AC, Jones MB, Chauhan S, Purvine SO,
38. Blanc G, Ogata H, Robert C, Audic S, Suhre K, Vestris G et al.:  Sanford JA, Monroe ME et al.: Comparative omics-driven
Reductive genome evolution from the mother of Rickettsia. genome annotation refinement: application across yersiniae.
PLoS Genet 2007, 3:e14 http://dx.doi.org/10.1371/ PLoS ONE 2012, 7 http://dx.doi.org/10.1371/
journal.pgen.0030014. journal.pone.0033903.
A study that uses a thorough proteomic profile to assess the full comple-
39. Baumler AJ, Tsolis RM, Ficht TA, Adams LG: Evolution of host ment of coding genes within three Yersinia genomes. Not only do they find
adaptation in Salmonella enterica. Infect Immun 1998, 66: a number of novel genes, but also proteins being translated from pre-
4579-4587. viously uncharacterised pseudogenes.
40. Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J 55. Seemann T: Prokka: rapid prokaryotic genome annotation.
et al.: Complete genome sequence of a multiple drug resistant Bioinformatics 2014:btu153 http://dx.doi.org/10.1093/
Salmonella enterica serovar Typhi CT18. Nature 2001, 413:848- bioinformatics/btu153.
852 http://dx.doi.org/10.1038/35101607.
56. Altschul S: Gapped BLAST and PSI-BLAST: a new generation of
41. Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, protein database search programs. Nucleic Acids Res 1997,
Churcher C et al.: Comparative genome analysis of Salmonella 25:3389-3402 http://dx.doi.org/10.1093/nar/25.17.3389.
Enteritidis PT4 and Salmonella Gallinarum 287/91 provides
insights into evolutionary and host adaptation pathways. 57. Blattner FR: The complete genome sequence of Escherichia
Genome Res 2008, 18:1624-1637 http://dx.doi.org/10.1101/ coli K-12. Science 1997, 80-.(277):1453-1462 http://dx.doi.org/
gr.077404.108. 10.1126/science.277.5331.1453.
42. Holt KE, Thomson NR, Wain J, Langridge GC, Hasan R, Bhutta ZA 58. Ochman H, Davalos LM: The nature and dynamics of bacterial
et al.: Pseudogene accumulation in the evolutionary histories  genomes. Science 2006, 311:1730-1733 http://dx.doi.org/
of Salmonella enterica serovars Paratyphi A and Typhi. BMC 10.1126/science.1119966.
Genomics 2009, 10:36 http://dx.doi.org/10.1186/1471- The original E. coli genome was annotated as having a single pseudo-
2164-10-36. gene. This study shows that actually there were many more degraded
sequences than originally predicted. Understanding the provenance of
43. Bronowski C, Fookes MC, Gilderthorp R, Ashelford KE, Harris SR, such degraded sequences is essential to elucidating biological and
Phiri A et al.: Genomic characterisation of invasive non- evolutionary processes.
typhoidal Salmonella enterica subspecies enterica Serovar
Bovismorbificans isolates from Malawi. PLoS Negl Trop Dis 59. Brittnacher MJ, Fong C, Hayden HS, Jacobs MA, Radey M,
2013, 7:e2557 http://dx.doi.org/10.1371/journal.pntd.0002557. Rohmer L: PGAT: a multistrain analysis resource for microbial
genomes. Bioinformatics 2011, 27:2429-2430 http://dx.doi.org/
44. Lerat E, Ochman H: Recognizing the pseudogenes in bacterial 10.1093/bioinformatics/btr418.
genomes. Nucleic Acids Res 2005, 33:3125-3132.
60. Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Pillay M
45. Van Leuven JT, Meister RC, Simon C, McCutcheon JP: Sympatric et al.: IMG 4 version of the integrated microbial genomes
speciation in a bacterial endosymbiont results in two genomes comparative analysis system. Nucleic Acids Res 2014, 42 http://
with the functionality of one. Cell 2014, 158:1270-1280 http:// dx.doi.org/10.1093/nar/gkt963.
dx.doi.org/10.1016/j.cell.2014.07.047.
61. Salzberg SL: Genome re-annotation: a wiki solution? Genome
46. Mira A, Pushker R: The silencing of pseudogenes. Mol Biol Evol  Biol 2007, 8:102 http://dx.doi.org/10.1186/gb-2007-8-1-102.
2005, 22:2135-2138 http://dx.doi.org/10.1093/molbev/msi209. Annotations are rarely updated, and the support for making such
resources available is lacking. This opinion article is an excellent summary
47. Tamas I, Wernegreen JJ, Nystedt B, Kauppinen SN, Darby AC, of the pitfalls on relying on GenBank (or similar) databases and empha-
 Gomez-Valero L et al.: Endosymbiont gene functions impaired sises the utility of updating, making available, and actively curating our
and rescued by polymerase infidelity at poly(A) tracts. Proc current knowledge of bacterial genomes.
Natl Acad Sci U S A 2008, 105:14934-14939 http://dx.doi.org/
10.1073/pnas.0806554105. 62. Koren S, Harhay GP, Smith TPL, Bono JL, Harhay DM, Mcvey SD
Long-term obligate symbionts, with degraded %AT-rich genomes have et al.: Reducing assembly complexity of microbial genomes
an over-representation of poly-A tracts within pseudogenes, silencing with single-molecule sequencing. Genome Biol 2013, 14:R101
their function. Despite this, this study shows that such function can be http://dx.doi.org/10.1186/gb-2013-14-9-r101.
rescued through polymerase infidelity. Such symbionts also express a
chaperonin at very high-levels which may counteract problems with 63. Wilhelm BT, Marguerat S, Goodhead I, Bähler J: Defining
incorrect protein folding that results from polymerase slippage within transcribed regions using RNA-seq. Nat Protoc 2010, 5:255-266
AT-rich genes. http://dx.doi.org/10.1038/nprot.2009.229.
48. The ENCODE Project Consortium: The ENCODE (ENCyclopedia 64. Kröger C, Dillon SC, Cameron ADS, Papenfort K, Sivasankaran SK,
of DNA Elements) project. Science 2004, 306:636-640 http://  Hokamp K et al.: The transcriptional landscape and small RNAs
dx.doi.org/10.1126/science.1105136. of Salmonella enterica serovar Typhimurium. Proc Natl Acad

Current Opinion in Microbiology 2015, 23:102–109 www.sciencedirect.com


The open problem of pseudogenes Goodhead and Darby 109

Sci U S A 2012, 109:E1277-E1286 http://dx.doi.org/10.1073/ ‘pseudogene’ annotations. It includes a thorough reannotation of fifteen
pnas.1201061109. Salmonella genomes in the Supplementary Information.
This study utilises RNAseq to fully characterise the intricate transcrip-
tional landscape of an important intracellular bacterium, including operon 66. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA,
structure, multiple transcriptional start sites, regulators and small RNAs. Fleischmann RD et al.: The minimal gene complement of
Mycoplasma genitalium. Science (80–) 1995, 270:397-404 http://
65. Nuccio SP, Bäumler AJ: Comparative analysis of Salmonella dx.doi.org/10.1126/science.270.5235.397.
 genomes identifies a metabolic network for escalating growth
in the inflamed gut. MBio 2014, 5 http://dx.doi.org/10.1128/ 67. Gil R, Sabater-Muñoz B, Latorre A, Silva FJ, Moya A: Extreme
mBio.00929-14. genome reduction in Buchnera spp.: toward the minimal
This study found a novel signature of gastrointestinal pathogenesis in genome needed for symbiotic life. Proc Natl Acad Sci U S A
Salmonella, that had been missed due to inconsistencies in the available 2002, 99:4454-4458 http://dx.doi.org/10.1073/pnas.062067299.
