com
ScienceDirect
Pseudogenes are defined as fragments of once-functional the increased amount of non-coding, intergenic
genes that have been silenced by one or more nonsense, sequences in intracellular bacteria are the remnants of
frameshift or missense mutations. Despite continuing genes that have been systematically scrambled by
increases in the speed of sequencing and annotating bacterial mutation [3].
genomes, the identification and categorisation of pseudogenes
remains problematic. Even when identified, pseudogenes are In bacteria, pseudogenes, defined here as ‘genes that have
considered to be rare and tend to be ignored. On the contrary, been silenced by one or more deleterious mutations,’ are
pseudogenes are surprisingly prevalent and can persist for long commonly detected as fragments of previously described
evolutionary time periods, representing a record of once- homologs, and therefore can become difficult to annotate
functional genetic characteristics. Most importantly, once significant degradation to the original sequence has
pseudogenes provide an insight into prokaryotic evolutionary occurred. A comparison between the genomes of any two
history as a record of phenotypic traits that have been lost. species can identify polymorphisms between related
Focusing on the intracellular and symbiotic bacteria in which genes that may have altered or ablated their function.
pseudogenes predominate, this review discusses the Pseudogenes, therefore, can function as a record of the
importance of identifying pseudogenes to fully understand the proteins, enzymes or pathways that are no longer necess-
abilities of bacteria, and to understand prokaryotes within their ary as the bacterium has adapted to a novel environment.
evolutionary context.
Addresses High throughput DNA sequencing is now an indispen-
Functional and Comparative Genomics, School of Biological Sciences, sible tool in life sciences. The technology allows the rapid
University of Liverpool, Crown St., Liverpool L69 7ZB, UK description of genomes, transcriptomes and epigenetic
features in ways that were not possible a decade ago. In
the area of bacterial genomics and informatics we are
feeling very pleased with ourselves. We can now rapidly
Current Opinion in Microbiology 2015, 23:102–109 sequence large panels of bacteria to high standards of
This review comes from a themed issue on Genomics finished qualities, and describe important differences
Edited by Neil Hall and Jay Hinton
associated with phenotypes [4,5]. Nevertheless, we are
struggling to define or categorise pseudogenes, and to
explain why they persist (i.e. have not been entirely
removed from the genome). We have not examined
http://dx.doi.org/10.1016/j.mib.2014.11.012 whether pseudogenes have any residual function, and
1369-5274/# 2014 Published by Elsevier Ltd. it is not clear why we should invest resources to do so.
Figure 1
Deletion
Transposable elements
Recombination
Pseudogenisation
# Pseudogenes
Host cell
Host cell
Bacterial cell
Genome reduction with respect to endosymbionts. Schematic of genome reduction as bacteria move from a free-living environment to a more
specific (intracellular) niche. Initially having large genomes, some reduction in both size and coding capacity is seen as the organism adapts to an
intracellular environment as the selective constraints on some genes are relaxed due to protection or nutrients offered by the host cell. Such
genes are prone to ‘pseudogenisation’ by mutation. Further specialisation results in deletion of redundant sequences and a much-reduced
genome size. The long-term maintenance of reduced coding capacities in recent symbionts may be as a result of some residual function.
genes associated with functions such as metabolism, can to correct such non-synonymous substitutions, often due
remain stable unless large environmental changes occur to the lack of proofreading during DNA replication. For
— such as adapting to a new environment (e.g. adaptation example, the loss of DnaQ-mediated proofreading activi-
to an intracellular lifestyle within a host). Horizontal gene ties of DNA polymerase III in M. leprae, the etiological
transfers are common in bacteria, and transfer large agent of leprosy [12,13], or recA/F in Buchnera, the primary
amounts of content, reaching rates exceeding 16 kbp symbiont in pea aphids [14], results in increased rates of
per million years in Escherichia coli, for instance [11]. mutation. A combination of a lack of error-correction and
the frequent bottlenecks that are associated with an
Once removed from evolutionary selective constraints,
genes can accumulate deleterious single nucleotide poly-
morphisms, frameshifts or truncations relative to an Figure 2
ancestral orthologue (Figure 2). A frequent feature of
intracellular prokaryotes is the inability of the organism polymorphism
premature stop
loss / change of functional domain
Box 1 Intracellular bacteria genome evolution loss of start codon
Compared to the free living life style the intracellular environment can promoter / RBS
loss of promoter / RBS
be viewed as a protected, homeostatic, nutrient-rich compartment.
In such an environment a bacterium will have less need for some the
pathways whose function may now be provided by the host, for premature stop
instance those involved with metabolism or defence — genes that
insertion
would be under strong positive selection to be retained — in a free-
1
living bacterium. Because of this, many obligate intracellular bacteria 2 frameshift
3
have much smaller genomes than their free-living counterparts, as
such pathways tend to have been deleted. There are a number of Current Opinion in Microbiology
good examples where bacterial genomes have been streamlined.
Wigglesworthia glossinidia, the mutualist symbiont of tsetse flies,
Methods by which pseudogenes can be formed. A schematic showing
has a genome size of just 700 kb [26]; Mycoplasma genitalium,
actions of polymorphisms on gene sequences leading to ‘pseudogene’
the obligate parasite of primate genital and respiratory tracts, 580 kb
formation. The top four panels showing polymorphism within coding
[66] and some Buchnera species, mutualist symbionts in aphids,
sequence or promoter regions. The fifth and sixth panels show
have genome sizes of between 450 kb and 650 kb [27,67].
insertions within coding sequences leading to a premature stop or a
frameshift, respectively.
intracellular lifestyle results in frequent fixation of non- frameshift mutations [33]. The genomes of both M. leprae
sense mutations within the population: Non-synon- [13] and S. glossinidius [24,34] have coding capacities of
ymous/synonymous (dN/dS) substitutions in only 50%, suggesting there has not been sufficient
intracellular bacteria regularly reach rates of 1, outstrip- evolutionary time or selective pressure for non-coding
ping that of free-living bacteria [15]. Non-sense regions to be removed.
mutations and frameshifts frequently create pseudogenes
by introducing premature stop-codons, or by destroying Pseudogene accumulation is a hallmark of recent adap-
promoters [16,17], start-codons [18], or functional tation to a new host, wherein genes are being selectively
domains [16,19]. Horizontal gene transfers are probably silenced as the organism adapts to its new environmental
the major source of pseudogenes in prokaryotes, as the niche (Figure 2). Salmonella genomes serve as a useful
majority of the genes transferred likely provide few example in this instance: Salmonella is a broad host-range
benefits and are rapidly silenced due to a lack of positive Gram-negative facultative anaerobe that exists in a num-
selective pressure. Some genes may even be toxic increas- ber of different hosts, and for which six subspecies and
ing negative selective pressure [20]. Deletion, therefore over 2500 serovars have been identified and several
represents a mechanism to defend against deleterious genomes have been sequenced (Reviewed [35]). Further-
products of gene acquisition and maintains small gen- more, the genomes are clonal, around 4.5 Mb in size and
omes sizes [21]. Prokaryotic pseudogenisation differs reflect evolutionary changes, such as genome reduction
from the mechanism in eukaryotes, in which retrotran- and horizontal gene acquisition, to adapt to different
sposition (‘processed’), and gene duplication and sub- niches. Generally, Salmonella enterica serovars with a
sequent degradation of additional gene copies (‘non- broad host-range, such as S. Enteritidis PT4 (113 pseu-
processed’), predominate [22,23]. There are also dogenes) or S. Typhimurium SL1344 (63) have fewer
examples in intracellular bacteria where transposons or pseudogenes than host-specific serovars like S. Galli-
mobile elements disrupt gene function [24,25]. narum (309), which causes Fowl Typhoid, or the
human-restricted S. Typhi CT18 (204) [35]. Expanding
In obligate mutualist symbionts, genome reduction may this to more bacterial genera, the same pattern of
be coupled with dependence upon the symbiont for the increased pseudogenisation in narrow host range species
production of certain metabolites that are not provided by can be seen, for example in Yersinia [36], Escherichia and
standard nutrition, such as Wigglesworthia providing essen- Shigella [37] and Rickettsia [38].
tial B-vitamins that are not present in a tsetse fly’s regular
blood meal [26], or Buchnera providing aphids with essen- The mystery of persistence
tial amino acids that are absent from phloem sap [27]. Some bacteria, particularly younger symbionts such as
Arguably, the best example of genome reduction is the Sodalis glossinidius or Baumannia cicadellinicola, maintain
mitochondrion, which is an ancient symbiosis that has high numbers of pseudogenes, and reduced coding
become a eukaryotic organelle in its own right [28]; capacities. Comparisons have shown that pseudogenes
phylogenetic analysis of cytochrome B and cytochrome are generally species-specific, and that they are rarely
C oxidase I genes suggest mitochondria may have shared between closely related species [39–44]. A study in
diverged from a free-living Rickettsiaceae ancestor Cicadas reported that two recently diverged bacterial
approximately 1.5–2.0 billion years ago [29,30]. The most symbiont species within the same host have non-over-
closely related organism to the eukaryotic mitochondrion lapping pseudogenes, and complementary gene contents
is Rickettsia prowazekii, a gram negative Alpha-proteobac- that, together, ameliorate individually silenced pathways
terium that causes epidemic typhus. R. Prowazekii has a [45]. A lack of shared pseudogenes between species also
1.1 Mb genome that is undergoing substantial reduction suggests that redundant genes are either eliminated
and has a coding capacity of only 76%. Reduced coding rapidly from genomes, or that they are subject to a high
capacities are common in many of the Rickettsiae, which degree of degradation that they are no longer recognised
represent a taxonomically diverse group of obligate intra- by standard annotation pipelines [44].
cellular bacteria existing in both symbiotic and patho-
genic relationships with their hosts, and therefore Due to the efficiency of genome architecture in bacteria,
represent a useful model for understanding pseudogene it is unclear why prokaryotic genomes would maintain
content across environments [3,31]. The 49 Rickettsia reduced coding-capacities. It is tempting to hypothesise
genomes available in Ensembl Bacteria (release 22) that persistent pseudogenes maintain some function, and
[32] have small chromosome sizes (range 0.86–1.82 Mb; therefore there is some positive selective pressure to
mean 1.23Mb), low %GC contents (mean 31.7%) and maintain their presence. If there is a selective advantage
reduced coding capacities (mean 76%). Low %GC con- to the removal of extraneous DNA, however, due to, for
tents are important in this context, as alternative-frame instance, reducing unnecessary energy expenditure on
premature stop-codons are more prevalent in AT-rich DNA replication, transcriptional or translational mechan-
genomes, which are hypothesised to serve as error-cor- isms, or even to remove toxic genes [35], why should it
recting sequences to minimise the effect of deleterious take so long for such regions to be deleted? Pseudogenes
can be surprisingly persistent: Throughout its evolution, sequence relative to its M. tuberculosis orthologue, and an
the mean half-life of pseudogenes in Buchnera is approxi- otherwise intact translational architecture. In a study of
mately 24 million years, [46]. There are two hypotheses to three Yersiniae, between one and forty annotated ‘pseu-
explain such a high level of persistence: Either there is dogenes’ showed evidence of translation [54]. The
some residual function associated with pseudogenes concept of pseudogenes being silent in a translational
resulting in a small selective pressure to maintain the sense seems to be incorrect, although further evidence for
pseudogenised sequence [47], or perhaps there is a bias active protein products being produced from degraded
towards silencing regulatory sequences thereby removing sequences will help elucidate the extent to which this is
the need to delete the pseudogene itself [46]. the case (Figure 3).
Figure 3
Homology
Proteome Pseudogenes
Expressed
Transcriptome
Protein
Genome
Building evidence for gene models and putatively functional pseudogenes. Coding sequences (CDS), can be predicted on the genome by one or
more of: (1) Homology to closely related genomes. (2) Overlapping proteomic data. (3) Overlapping transcriptome data. (4) Genomic data, that is
an open reading frame with a valid start codon, stop codon, ribosomal binding site and functional domain. Some functionality may exist for
pseudogenised open reading frames (Venn diagram, right).
Table 1
A list of seven genera with the number of genomes analysed, mean genome size, mean numbers of annotated coding sequences and
pseudogenes, and the maximum number of annotated pseudogenes
Listed is the mean coding capacity (%) for all of the examined genomes within each genus. Data was taken from the PGAT and IMG databases, or
from the Nuccio and Bäumler (2014) reannotation of fifteen Salmonella genomes (‘NB’) [59,60,65]. For the latter, overall coding capacity could not
be calculated.
know whether or not gene fragments have been anno- accurate and thorough annotations. Disappointingly, there
tated in the genome as discrepancies between annota- is little support for submitting updated annotations to
tions may have biological relevance changing gene public databases, although ‘wiki’ based solutions have
function or active metabolic pathways. been suggested as a potential avenue to address this
problem [61]. Producing and curating comprehensive,
An excellent study of fifteen S. enterica strains found that open-source databases of complete bacterial genomes,
comparative genomic analysis was hindered by the differ- their annotations — including pseudogenes — and
ent methods of annotation used on each genome, particu- associated ‘omics data would serve as an unprecedented
larly with reference to predicted pseudogenes [65]. The resource for evolutionary biology. Without the ability to
authors therefore performed a substantial re-annotation constantly curate ‘re-annotations’ or provide ‘notes’ for
and identified 1004 novel ‘pseudogenes’. The 15 genomes these genomes, we risk such information being lost or
contained 471 intact CDS that had been misidentified as sub-standard genomic information being used.
pseudogenes. Analysis of genome degradation in the
corrected genome annotation revealed microbial path- Systematic analyses combining genomics, transcriptomics
ways essential for Salmonella gastrointestinal success, and proteomics can produce comprehensive annotations
which were inactivated by pseudogenes in extra-intesti- and making available such datasets for wider comparison is
nal pathovars. Interestingly, the authors propose the term of fundamental importance of addressing various prokar-
‘hypothetically disrupted coding DNA sequences’ to yotic genomic and evolutionary questions. Initially, high-
replace the ambiguous ‘pseudogene’ term. quality, broad and deep bacterial genome sequencing can
be rapidly achieved by the latest DNA sequencing tech-
The issue of properly annotating pseudogene-derived nologies (e.g. PacBio [4,5,62]). The presence of active
sequences has been underestimated: an extensive survey transcription and translation in annotated pseudogenes
of 64 diverse prokaryotic genomes in 2004 estimated a could be determined by RNAseq [63], including transcrip-
background level of between 1% and 5% of pseudogene- tion start sites, operon structure, promoters, and antisense
derived sequences, and identified 7000 shared pseudo- regulation (e.g. [64]). Combining computational predic-
genes between genomes [20]. Release 22 of the Ensembl tions of gene content with deep proteomic profile using
Bacteria database now contains over 10 000 bacterial mass spectrometry, for instance, has been shown to
genomes [32], and with the increase in speed of whole improve the accuracy of annotations [54]. Further studies
genome sequencing, the number of available bacterial could reveal the presence of any functional activity that
genomes for comparison is rapidly expanding. remains selectively important, implicating reasons for
pseudogene persistence, or conversely, any post-transcrip-
Clearly, high-throughput genomic data analyses are reliant tional regulation mechanisms that remove any residual
upon high quality, finished genomes, with correspondingly pseudogene products that may be damaging.
11. Darmon E, Leach DRF: Bacterial genome instability. Microbiol 29. Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-
Mol Biol Rev 2014, 78:1-39 http://dx.doi.org/10.1128/ Pontén T, Alsmark UC, Podowski RM et al.: The genome
MMBR00035-13. sequence of Rickettsia prowazekii and the origin of
mitochondria. Nature 1998, 396:133-140 http://dx.doi.org/
12. Mizrahi V, Dawes SS, Rubin H: DNA replication. Mol Genet 10.1038/24094.
Mycobact 2000:159-172.
30. Emelyanov VV: Evolutionary relationship of Rickettsiae and
13. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, mitochondria. FEBS Lett 2001, 501:11-18 http://dx.doi.org/
Wheeler PR et al.: Massive gene decay in the leprosy bacillus. 10.1016/S0014-5793(01)02618-7.
Nature 2001, 409:1007-1011 http://dx.doi.org/10.1038/35059006.
31. Fuxelius H-H, Darby AC, Cho N-H, Andersson SGE: Visualization
14. Moran NA, Mira A: The process of genome shrinkage in the of pseudogenes in intracellular bacteria reveals the different
obligate symbiont Buchnera aphidicola. Genome Biol 2001, tracks to gene destruction. Genome Biol 2008, 9:R42 http://
2:1-12 http://dx.doi.org/10.1186/gb-2001-2-12-research0054. dx.doi.org/10.1186/gb-2008-9-2-r42.
15. Moran NA: Accelerated evolution and Muller’s rachet in 32. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S et al.:
endosymbiotic bacteria. Proc Natl Acad Sci U S A 1996, Ensembl 2014. Nucleic Acids Res 2014, 42:D749-D755 http://
93:2873-2878 http://dx.doi.org/10.1073/pnas.93.7.2873. dx.doi.org/10.1093/nar/gkt1196.
33. Wong T-Y, Fernandes S, Sankhon N, Leong PP, Kuo J, Liu J-K: 49. Sasidharan R, Gerstein M: Genomics: protein fossils live on as
Role of premature stop codons in bacterial evolution. J RNA. Nature 2008, 453:729-731 http://dx.doi.org/10.1038/
Bacteriol 2008, 190:6718-6725 http://dx.doi.org/10.1128/ 453729a.
JB00682-08.
50. Suzuki K, Nakata N, Bang PD, Ishii N, Makino M: High-level
34. Belda E, Moya A, Bentley S, Silva FJ: Mobile genetic element expression of pseudogenes in Mycobacterium leprae. FEMS
proliferation and gene inactivation impact over the genome Microbiol Lett 2006, 259:208-214 http://dx.doi.org/10.1111/
structure and metabolic capabilities of Sodalis glossinidius, j.1574-6968.2006.00276.x.
the secondary endosymbiont of tsetse flies. BMC Genomics
2010:1-17. 51. Williams DL, Slayden RA, Amin A, Martinez AN, Pittman TL, Mira A
et al.: Implications of high level pseudogene transcription in
35. Kuo C-H, Ochman H: The extinction dynamics of bacterial Mycobacterium leprae. BMC Genomics 2009, 10:397 http://
pseudogenes. PLoS Genet 2010, 6:e1001050 http://dx.doi.org/ dx.doi.org/10.1186/1471-2164-10-397.
10.1371/journal.pgen.1001050.
52. Marques MAM, Neves-Ferreira AGC, da Silveira EKX, Valente RH,
36. Chain PSG, Carniel E, Larimer FW, Lamerdin J, Stoutland PO, Chapeaurouge A, Perales J et al.: Deciphering the proteomic
Regala WM et al.: Insights into the evolution of Yersinia pestis profile of Mycobacterium leprae cell envelope. Proteomics
through whole-genome comparison with Yersinia 2008, 8:2477-2491 http://dx.doi.org/10.1002/pmic.200700971.
pseudotuberculosis. Proc Natl Acad Sci U S A 2004, 101:13826-
13831 http://dx.doi.org/10.1073/pnas.0404012101. 53. Perkins TT, a Kingsley R, Fookes MC, Gardner PP, James KD, Yu L
et al.: A strand-specific RNA-Seq analysis of the transcriptome
37. Feng Y, Chen Z, Liu S-L: Gene decay in Shigella as an incipient of the typhoid bacillus Salmonella typhi. PLoS Genet 2009,
stage of host-adaptation. PLoS ONE 2011, 6:e27754 http:// 5:e1000569 http://dx.doi.org/10.1371/journal.pgen.1000569.
dx.doi.org/10.1371/journal.pone.0027754.
54. Schrimpe-Rutledge AC, Jones MB, Chauhan S, Purvine SO,
38. Blanc G, Ogata H, Robert C, Audic S, Suhre K, Vestris G et al.: Sanford JA, Monroe ME et al.: Comparative omics-driven
Reductive genome evolution from the mother of Rickettsia. genome annotation refinement: application across yersiniae.
PLoS Genet 2007, 3:e14 http://dx.doi.org/10.1371/ PLoS ONE 2012, 7 http://dx.doi.org/10.1371/
journal.pgen.0030014. journal.pone.0033903.
A study that uses a thorough proteomic profile to assess the full comple-
39. Baumler AJ, Tsolis RM, Ficht TA, Adams LG: Evolution of host ment of coding genes within three Yersinia genomes. Not only do they find
adaptation in Salmonella enterica. Infect Immun 1998, 66: a number of novel genes, but also proteins being translated from pre-
4579-4587. viously uncharacterised pseudogenes.
40. Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J 55. Seemann T: Prokka: rapid prokaryotic genome annotation.
et al.: Complete genome sequence of a multiple drug resistant Bioinformatics 2014:btu153 http://dx.doi.org/10.1093/
Salmonella enterica serovar Typhi CT18. Nature 2001, 413:848- bioinformatics/btu153.
852 http://dx.doi.org/10.1038/35101607.
56. Altschul S: Gapped BLAST and PSI-BLAST: a new generation of
41. Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, protein database search programs. Nucleic Acids Res 1997,
Churcher C et al.: Comparative genome analysis of Salmonella 25:3389-3402 http://dx.doi.org/10.1093/nar/25.17.3389.
Enteritidis PT4 and Salmonella Gallinarum 287/91 provides
insights into evolutionary and host adaptation pathways. 57. Blattner FR: The complete genome sequence of Escherichia
Genome Res 2008, 18:1624-1637 http://dx.doi.org/10.1101/ coli K-12. Science 1997, 80-.(277):1453-1462 http://dx.doi.org/
gr.077404.108. 10.1126/science.277.5331.1453.
42. Holt KE, Thomson NR, Wain J, Langridge GC, Hasan R, Bhutta ZA 58. Ochman H, Davalos LM: The nature and dynamics of bacterial
et al.: Pseudogene accumulation in the evolutionary histories genomes. Science 2006, 311:1730-1733 http://dx.doi.org/
of Salmonella enterica serovars Paratyphi A and Typhi. BMC 10.1126/science.1119966.
Genomics 2009, 10:36 http://dx.doi.org/10.1186/1471- The original E. coli genome was annotated as having a single pseudo-
2164-10-36. gene. This study shows that actually there were many more degraded
sequences than originally predicted. Understanding the provenance of
43. Bronowski C, Fookes MC, Gilderthorp R, Ashelford KE, Harris SR, such degraded sequences is essential to elucidating biological and
Phiri A et al.: Genomic characterisation of invasive non- evolutionary processes.
typhoidal Salmonella enterica subspecies enterica Serovar
Bovismorbificans isolates from Malawi. PLoS Negl Trop Dis 59. Brittnacher MJ, Fong C, Hayden HS, Jacobs MA, Radey M,
2013, 7:e2557 http://dx.doi.org/10.1371/journal.pntd.0002557. Rohmer L: PGAT: a multistrain analysis resource for microbial
genomes. Bioinformatics 2011, 27:2429-2430 http://dx.doi.org/
44. Lerat E, Ochman H: Recognizing the pseudogenes in bacterial 10.1093/bioinformatics/btr418.
genomes. Nucleic Acids Res 2005, 33:3125-3132.
60. Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Pillay M
45. Van Leuven JT, Meister RC, Simon C, McCutcheon JP: Sympatric et al.: IMG 4 version of the integrated microbial genomes
speciation in a bacterial endosymbiont results in two genomes comparative analysis system. Nucleic Acids Res 2014, 42 http://
with the functionality of one. Cell 2014, 158:1270-1280 http:// dx.doi.org/10.1093/nar/gkt963.
dx.doi.org/10.1016/j.cell.2014.07.047.
61. Salzberg SL: Genome re-annotation: a wiki solution? Genome
46. Mira A, Pushker R: The silencing of pseudogenes. Mol Biol Evol Biol 2007, 8:102 http://dx.doi.org/10.1186/gb-2007-8-1-102.
2005, 22:2135-2138 http://dx.doi.org/10.1093/molbev/msi209. Annotations are rarely updated, and the support for making such
resources available is lacking. This opinion article is an excellent summary
47. Tamas I, Wernegreen JJ, Nystedt B, Kauppinen SN, Darby AC, of the pitfalls on relying on GenBank (or similar) databases and empha-
Gomez-Valero L et al.: Endosymbiont gene functions impaired sises the utility of updating, making available, and actively curating our
and rescued by polymerase infidelity at poly(A) tracts. Proc current knowledge of bacterial genomes.
Natl Acad Sci U S A 2008, 105:14934-14939 http://dx.doi.org/
10.1073/pnas.0806554105. 62. Koren S, Harhay GP, Smith TPL, Bono JL, Harhay DM, Mcvey SD
Long-term obligate symbionts, with degraded %AT-rich genomes have et al.: Reducing assembly complexity of microbial genomes
an over-representation of poly-A tracts within pseudogenes, silencing with single-molecule sequencing. Genome Biol 2013, 14:R101
their function. Despite this, this study shows that such function can be http://dx.doi.org/10.1186/gb-2013-14-9-r101.
rescued through polymerase infidelity. Such symbionts also express a
chaperonin at very high-levels which may counteract problems with 63. Wilhelm BT, Marguerat S, Goodhead I, Bähler J: Defining
incorrect protein folding that results from polymerase slippage within transcribed regions using RNA-seq. Nat Protoc 2010, 5:255-266
AT-rich genes. http://dx.doi.org/10.1038/nprot.2009.229.
48. The ENCODE Project Consortium: The ENCODE (ENCyclopedia 64. Kröger C, Dillon SC, Cameron ADS, Papenfort K, Sivasankaran SK,
of DNA Elements) project. Science 2004, 306:636-640 http:// Hokamp K et al.: The transcriptional landscape and small RNAs
dx.doi.org/10.1126/science.1105136. of Salmonella enterica serovar Typhimurium. Proc Natl Acad
Sci U S A 2012, 109:E1277-E1286 http://dx.doi.org/10.1073/ ‘pseudogene’ annotations. It includes a thorough reannotation of fifteen
pnas.1201061109. Salmonella genomes in the Supplementary Information.
This study utilises RNAseq to fully characterise the intricate transcrip-
tional landscape of an important intracellular bacterium, including operon 66. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA,
structure, multiple transcriptional start sites, regulators and small RNAs. Fleischmann RD et al.: The minimal gene complement of
Mycoplasma genitalium. Science (80–) 1995, 270:397-404 http://
65. Nuccio SP, Bäumler AJ: Comparative analysis of Salmonella dx.doi.org/10.1126/science.270.5235.397.
genomes identifies a metabolic network for escalating growth
in the inflamed gut. MBio 2014, 5 http://dx.doi.org/10.1128/ 67. Gil R, Sabater-Muñoz B, Latorre A, Silva FJ, Moya A: Extreme
mBio.00929-14. genome reduction in Buchnera spp.: toward the minimal
This study found a novel signature of gastrointestinal pathogenesis in genome needed for symbiotic life. Proc Natl Acad Sci U S A
Salmonella, that had been missed due to inconsistencies in the available 2002, 99:4454-4458 http://dx.doi.org/10.1073/pnas.062067299.