Athanasios Zervas1, Yonghui Zeng1,2, Anne Mette Madsen3 and Lars H. Hansen1,*
1
Section of Environmental Microbiology and Biotechnology, Department of Environmental Science, Aarhus
Corresponding author:
lhha@envs.au.dk
Data deposition: All sequences will be available in GenBank upon acceptance of the final
manuscript.
bioRxiv preprint first posted online Dec. 6, 2018; doi: http://dx.doi.org/10.1101/488148. The copyright holder for this preprint
(which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
All rights reserved. No reuse allowed without permission.
Abstract
The phyllosphere is a habitat for a variety of viruses, bacteria, fungi and other microorganisms,
which play a fundamental role in maintaining the health of plants and mediating the interaction
between plants and ambient environments. A recent addition to this catalogue of microbial diversity
was the aerobic anoxygenic phototrophs (AAPs), a group of widespread bacteria that absorb light
molecular oxygen. However, cultured representatives of AAPs from the phyllosphere and their genome
information are lacking, limiting our ability to assess their potential ecological roles in this unique
niche. In this study, we investigated the presence of AAPs in the phyllosphere of winter wheat
(Triticum aestivum L.) in Denmark by employing colony-based infrared imaging and
MALDI-TOF mass spectrometry (MS) techniques. A total of ~4480 colonies were screened for the
presence of cellular BChl a, resulting in 129 AAP isolates that were further clustered into 25 groups
based on MALDI-TOF MS profiling, representatives of which were sequenced using the Illumina
NextSeq and Oxford Nanopore MinION platforms. Twenty draft and five complete genomes of AAPs
Nanopore, photosystems.
Introduction
Plant-microbe interactions both above (phyllosphere) and below (rhizosphere) ground are
common in nature. Traditionally, these relationships are investigated in the rhizosphere, where
conditions are relatively stable and nutrient availability is rather high (Vorholt 2012; Carvalho &
Castillo 2018). Despite recent work (reviewed in e.g. Lindow & Brandl 2003; Vorholt 2012), microbial
interactions in the phyllosphere (which comprises the aerial parts of plants, especially the leaves)
remain relatively understudied compared to those in the rhizosphere. Models suggest that the total
leaf surface is greater than 1 billion km2 (Woodward & Lomas 2004), potentially colonized by up to
thrive in, it may be “extreme” for a plethora of reasons. First, different plants exhibit different growth
patterns and climate adaptations. On one hand, annual plants complete their life cycle in just one
growth season, whereas perennial plants grow and shed leaves every year. This creates a
discontinuous, ever-changing habitat (Vorholt 2012). At the same time, environmental changes also
affect the phyllosphere and its inhabitants. Winds, rainfall, frost, and drought all play important roles
in shaping the conditions encountered by microorganisms of the phyllosphere. However, the most
significant environmental driver is sunlight. Temperature differences in the upper leaves, especially
those that form the canopy of various habitats, may range up to 50 °C between day and night. Sunlight,
damaging to the DNA of both prokaryotes and eukaryotes (Remus-Emsermann et al. 2014). In
addition to the abiotic factors that make the phyllosphere a much more hostile environment compared
to the rhizosphere, its inhabitants also face – directly or indirectly – the influence of their plant host.
Leaves are surrounded by a thin, laminar, waxy layer, the cuticle, which renders the leaf surface
hydrophobic, thus removing the excess of water that is collected due to rainfall, dew, or respiration
from the stomata. This leads to the retention of little water most usually in veins and other cavities of
the cuticle. These micro-formations also protect the colonizers from the surrounding environment
(Beattie 2002). Apart from niche competition and scarce water availability, phyllosphere colonizers
also need to compete for nutrients, as the cuticle makes the surface virtually impermeable to nutrients
deriving from diffusion from the cells of the plant host (Wilson & Lindow 1994a, 1994b). At the
same time, they also need to protect themselves from potential invaders and a plethora of
Despite the harsh conditions they encounter, phyllosphere microorganisms exhibit remarkable
biodiversity, which, in certain cases, can be comparable to that of the human gut microbiome (Xu &
Gordon 2003; Knief et al. 2012). Through recent genomics and metagenomics studies, several
bacterial genera, such as Methylobacterium, have been shown to be ubiquitous in the phyllosphere
and their role in nutrient recycling, plant growth promotion and protection has been evidenced
previously (Beattie & Lindow 1999; Gourion et al. 2006; Atamna-Ismaeel, Omri Finkel, et al. 2012;
Kwak et al. 2014). One of the most interesting, recent findings in phyllosphere microbiota was the
variety of land plants (Atamna-Ismaeel, Omri M. Finkel, et al. 2012; Atamna-Ismaeel, Omri Finkel,
et al. 2012). Employing 454 metagenome sequencing and also targeted PCR approaches for the
identification of the chlorophyllide reductase subunit Y gene (bchY) and reaction center subunit M
gene (pufM) it was possible to identify more than 150 bacterial rhodopsins belonging to 5 phyla and
a rich AAP community for the first time in non-aquatic habitats. AAP bacteria are commonly found
in aquatic environments ranging from the arctic to the tropics and from high salinity lakes to pristine
high altitude lakes (Yurkov & Hughes 2017). They rely on organic carbon compounds to cover their
nutritional requirements and can also utilize light to produce energy in the form of ATP, without
fixing carbon and without producing oxygen. Having been found in a variety of extreme
environments and having shown their potential to outgrow non-AAP bacteria under nutrient limiting
conditions, their presence in the phyllosphere may not be surprising. However, their relative
abundance, at least in the rice phyllosphere (Oryza sativa), was up to 3 times higher compared to
what is commonly found in marine ecosystems (Atamna-Ismaeel, Omri Finkel, et al. 2012).
the presence of AAP bacteria in the phyllosphere. However, no pure cultures of AAPs have been
described so far from the phyllosphere and thus detailed genome information is lacking, which prevents
an in-depth understanding of the ecological roles and evolutionary trajectory of AAPs in this unique
niche. To expand our genomic view of this ecologically important group of bacteria in the phyllosphere,
we designed this study with the following aims: i) to create the first collection of AAP isolates from
the phyllosphere, ii) to characterize their photosynthesis gene clusters (PGCs) and compare their gene
content and molecular evolution, and iii) to expand the database of complete genomes of AAP
bacteria from non-aquatic environments. Employing culturomics (Lagier et al. 2018), which combines
colony infrared (IR) imaging systems, MALDI-TOF mass spectrometry, Illumina and Oxford
Nanopore sequencing technologies, we were able to provide 20 draft genomes that contain complete
photosynthetic gene clusters and 5 complete genomes that contain plasmids, including a novel strain
in the Methylocystaceae family. The further comparison of the evolutionary trajectories of their PGCs
revealed a diverging pattern among the highly homogeneous Methylobacterium strains, highlighting
An intact, living winter wheat plant (Triticum aestivum L.) with a height of ca. 60 cm was
collected from a field in Roskilde, Denmark, on 6 June, 2018 and transported to the lab for bacterial
isolation on the same day. Wheat leaves were cut off and cleaned with running sterile water to remove
dust and other temporary deposits from the ambient environment. Microbiota were collected from 8
pieces of leaves by rinsing the leaves in 80 mL PBS solution (pH 7.4) in a Ø 150-mm petri dish and
scraping the leaf surface repeatedly with sterile swabs until the PBS solution turned slightly greenish
due to the shedding of leaf cells. From the supernatant, 14 mL were plated on 1/5 strength R2A plates
(resulting in a total of 137 agar plates) and incubated at room temperature with normal indoor light
conditions for two weeks. AAP bacteria were identified using a custom screening chamber equipped
with an infrared imaging system, described in detail previously (Zeng et al. 2014). In short, plates
were placed in a chamber sealed from light and exposed to a source of green light. AAP bacteria
containing bacteriochlorophyll absorb the green light and emit radiation in the near-IR spectrum,
which was captured by a NIR-sensitive CMOS camera. The image was processed and the glowing
colonies indicated the presence of BChl a. The colonies that gave positive signals were marked,
restreaked onto 1/5 strength R2A plates and incubated at room temperature for 72 h. Verification of
the light absorbing capabilities of the isolated strains was performed using the same setup. Strains
that passed the verification step were further restreaked onto 1/5 strength R2A plates to ensure pure
The isolated AAP strains were grouped and de-replicated using a MALDI-TOF mass
spectrometer (Microflex LT, Bruker Daltonics, Bremen, Germany) before performing whole genome
sequencing to reduce the cost while maintaining the biodiversity. Briefly, a toothpick was used to transfer
a small amount of an AAP bacterial colony onto the target plate (MSP 96 polished steel, Bruker), where
it was spread evenly to form a thin layer of biomass on the steel plate. The sample was then
overlaid with 70% formic acid and allowed to air dry before addition of 1 µL MALDI-MS matrix
characterized peaks (Bruker Daltonics) was used to calibrate the instrument. The standard method
“MBT_AutoX” was applied to obtain proteome profiles within the mass range of 2-20 kDa
using the flexControl software (Bruker). The flexAnalysis software (Bruker) was used to smooth the
data, subtract baseline and generate main spectra (MSP), followed by a hierarchical clustering
analysis with the MALDI Biotyper Compass Explorer software, which produced a dendrogram as the
output for visual inspection of similarities between samples. An empirical distance value of 50 was
used as the cutoff for defining different groups on the dendrogram, corresponding to different
strains/species.
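The grouping and de-replication step can be illustrated with a minimal sketch. Only the cutoff of 50 comes from the text above; the single-linkage rule, the isolate names and the distance values are assumptions made for illustration, and the linkage method and distance measure actually used by the MALDI Biotyper Compass Explorer software may differ.

```python
from itertools import combinations


def cluster_by_cutoff(distances, labels, cutoff):
    """Single-linkage grouping (an assumption, not the Biotyper method):
    two isolates share a group if a chain of pairwise distances
    below `cutoff` connects them."""
    parent = {label: label for label in labels}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(labels, 2):
        if distances[frozenset((a, b))] < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical isolates and hypothetical pairwise distances.
labels = ["iso_A", "iso_B", "iso_C", "iso_D"]
distances = {
    frozenset(("iso_A", "iso_B")): 12,
    frozenset(("iso_A", "iso_C")): 480,
    frozenset(("iso_A", "iso_D")): 510,
    frozenset(("iso_B", "iso_C")): 470,
    frozenset(("iso_B", "iso_D")): 495,
    frozenset(("iso_C", "iso_D")): 35,
}

groups = cluster_by_cutoff(distances, labels, cutoff=50)
# One representative per group would then go forward to sequencing.
representatives = [group[0] for group in groups]
```

In this toy case the four isolates collapse into two groups, so only two genomes would need to be sequenced.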
From each group on the dendrogram, one isolate was chosen and restreaked on ½ R2A plates
and incubated at room temperature for one week. Prior to DNA isolation, the plates were again tested
for the presence of light absorbing pigments as previously described. Bacterial colonies were scraped
from the plates using a loop and immersed in 400 µL PBS solution, vortexed until the cells had
separated, and spun down in a table-top centrifuge at 10,000 × g at 4 °C; the supernatant was removed and
the pellets were resuspended in milliQ H2O. High molecular weight DNA was extracted from each
strain using a modified MasterPure DNA purification kit (Epicentre) by replacing the elution buffer
with a 10 mM Tris-HCl, 50 mM NaCl (pH 7.5-8.0) solution at the final step. DNA concentration
was quantified on a Qubit 2.0 (Life Technologies) using the Broad Range DNA Assay kit and DNA
quality was measured on a Nanodrop 2000C (Thermo Scientific). Whole genome shotgun sequencing
was performed on all AAP strains on the Illumina NextSeq platform in house using the 2x150 bp
chemistry. Selected strains were also sequenced on the MinION platform (Oxford Nanopore
Technologies, UK) in house using the Rapid Barcoding Kit (SQK-RBK004) and the FLO-MIN106
The Illumina paired-end reads were trimmed for quality and ambiguities using Cutadapt (Martin
2011) and assembled using SPAdes (Bankevich et al. 2012). The resulting assemblies were analyzed
using dRep (Olm et al. 2017) to compare their average nucleotide identities in order to identify
clusters of closely related taxa. The assemblies were imported in Geneious R.11.2.5 (Biomatters Ltd.)
and photosynthetic gene clusters were annotated using the Roseobacter litoralis strain Och149
plasmid pRL0149 (CP002624) and the Tardiphaga sp. strain G40 proteorhodopsin gene (AU
collection - data not published) as references under relaxed similarity criteria (40%). The suggested
genes of probable photosynthetic gene clusters were analyzed using BLAST (megablast, blastn) to
verify their annotations. For full genome annotations, the assemblies were uploaded on RAST (Aziz
et al. 2008; Overbeek et al. 2014; Brettin et al. 2015). The predicted, translated protein coding genes
were imported in CMG-Biotools (Vesth et al. 2013) for pairwise proteome comparisons using the
Hybrid assemblies of paired-end Illumina reads and long Oxford Nanopore Technologies reads
were performed using the Unicycler assembler (Wick et al. 2017) utilizing the bold assembly mode.
The resulting complete genomes were imported in Geneious, where their circularity was confirmed
by mapping-to-reference runs using the short and long reads from the respective hybrid assemblies
and the native Geneious mapper under default settings in the Medium-Low sensitivity mode. Whole
genome alignments were performed using Mauve (Darling et al. 2004) and its progressiveMauve
Phylogenetic analyses
CMG-Biotools. Datasets containing the different genes from the identified photosynthetic gene
clusters were created in Geneious. The resulting sequences were aligned using MAFFT v.7.388
(Katoh et al. 2002) using the FFT-NS-i x1000 algorithm with default settings as implemented in
Geneious. Phylogenetic trees were created using RAxML (Stamatakis 2006) employing the
GTR-GAMMA nucleotide substitution model and the rapid bootstrapping and search for best scoring
ML-tree with 100 bootstrap replicates and starting from a random tree. Analysis of the photosynthetic
gene clusters was performed on 6 selected genes that were present in all isolates, namely acsF, bchL,
bchY, crtB, pufL and pufM. Individual gene alignments and phylogenetic analyses were performed as
described above. A super-matrix (8,240 characters) comprising all 7 alignments (the 16S region and the 6 genes) was
created using the built-in “Concatenate alignments” function in Geneious, and a phylogenetic tree
was created as above. All trees were visually inspected for disagreements. Finally, a phylogenomic
tree was constructed in RAxML employing the GAMMA BLOSUM62 substitution model and the
rapid bootstrapping and search for best scoring ML-tree with 100 bootstrap replicates and starting
from a random tree, using as input the core protein alignment consisting of 6,988 characters created
with CheckM (Parks et al. 2015) and its lineage_wf algorithm with default settings. Substitution rates
were calculated as distances from the root, using the legend provided by the phylogenetic trees.
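As an illustration of how distances from the root can be read off a tree, the sketch below sums branch lengths from the root to each tip of a rooted Newick string. It is a minimal, assumed implementation rather than the procedure used in this study: the tree, the tip names and the branch lengths are hypothetical, and the parser handles only plain Newick without quoted labels or comments.

```python
def root_to_tip(newick):
    """Return {tip_name: summed branch length from the root}
    for a rooted Newick string (plain labels only)."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_label():
        # Read a tip name or an internal label (e.g. a bootstrap value).
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        return text[start:pos]

    def parse_length():
        # Read an optional ":<length>"; a missing length counts as 0.
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0

    def parse_node():
        # Return tip distances measured from the parent of this node.
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            tips = {}
            while True:
                tips.update(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # skip ")"
                break
            parse_label()  # internal label is ignored
            length = parse_length()
            return {name: dist + length for name, dist in tips.items()}
        name = parse_label()
        return {name: parse_length()}

    return parse_node()


# Hypothetical rooted tree with a bootstrap label (95) on one internal node.
example = root_to_tip("((A:0.1,B:0.2)95:0.3,C:0.5);")
```

The result maps each tip to its distance from the root in substitutions per site, which is the quantity compared across genes and isolates.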
Results
phyllosphere
The 129 phyllosphere isolates that tested positive for the presence of
bacteriochlorophyll a were placed in 25 groups according to their protein mass profiles based on
MALDI-TOF MS. A representative from each group was sequenced (Illumina) and its genome
assembled (SPAdes). Information about their phylogeny, total genome size, and genome assembly
Table 1: Genome statistics and phylogenetic information of the 25 isolates that were Illumina
Analysis of the 16S region showed that isolate WL44 was in fact a mixture of 2 species
(Roseomonas and an Actinobacterium), while isolates WL46 and WL96 actually were fungi
(separated in 4 distinct groups), and the last 4 belonged to Rhizobium, Roseomonas, Sphingomonas
Fig. 1: Phylogenetic tree based on the core proteome (CheckM) of the 21 pure bacterial isolates of
the analysis. Rhizobium sp. WL3 was chosen as the outgroup. The legend shows substitution rates
from the root of the tree and the node labels indicate bootstrap support values (%).
For WL4, analysis of the 16S region gave positive results on the family level, with most hits
belonging to uncultured, environmental strains. The only characterized hits belonged to Methylosinus
trichosporium (Gorlach et al. 1994) and Alsobacter metallidurans, a recently characterized novel
genus, novel species (Bao et al. 2006). However, in both instances, the similarity was rather low
(~98.5%). We included strains of both species in the core proteome analysis (CheckM – figure 2) and
observed that WL4 is highly supported as sister to Alsobacter sp. SH9 (BS = 100%), both of which
were distinct from both Methylosinus trichosporium OB3b and the remaining isolates, with very high
bootstrap support values. The placement of WL4 was consistent, on this rather small dataset,
Fig. 2: Phylogenomic tree based on the 16S region of the 21 pure bacterial isolates of the analysis.
Rhizobium sp. WL3 was chosen as the outgroup. The legend shows substitution rates from the root
of the tree and the node labels indicate bootstrap support values (%).
The 17 Methylobacterium isolates were further divided into 4 groups based on Average
Nucleotide Identities (ANI) (figure 3). Group 1 is comprised of isolates WL9, WL19, and WL69,
which probably comprise a distinct Methylobacterium species (ANI similarity < 85%). Group 2 is
comprised of isolates WL1, WL2, WL64, WL7, and WL18, which belong to either the same species
but different strains, or 2 different species (Species 2.1: WL1, WL2, WL64, Species 2.2: WL7,
WL18). Group 3 is comprised of isolates WL6, WL30, WL93, WL116, and WL119, in 3
strains of the same species. Lastly, Group 4 is comprised of isolates WL8, WL12, WL103, and
WL122, that are most likely 2 strains of the same species.
Fig. 3: Whole genome, Average Nucleotide Identity (ANI) cladogram showing percentages of k-mer
This grouping is further supported by pairwise whole proteome comparisons (figure 4). For
Groups 2 and 3, the average core protein content is ~4,500 proteins, while for Group 4 it is closer to
4,000. This may signify the presence of either more or larger plasmids that characterize the strains of
this group. For Group 1, this number falls even more as in this case there clearly are 3 distinct species
Fig. 4: Pairwise proteome comparisons (BLAST-matrix) of the 21 bacterial isolates of the analysis.
The grey-to-green color gradient indicates low to high percentages of identity. The bottom boxes
show the presence of homologs within the same proteome (grey-to-red color gradient shows low to
The photosynthetic gene clusters identified in the sequenced isolates all belong to RC-II type,
meaning they contain bacteriochlorophyll a and possess pheophytin-quinone type reaction centers
(Xiong & Bauer 2002). All strains possess a full bacteriochlorophyll synthase gene set, except for
Table 2: Gene content of the photosynthetic gene clusters identified in the 21 bacterial isolates. Grey
boxes indicate presence of genes and white boxes indicate absence of genes.
For the carotenoid biosynthesis genes, all strains possess crtB, while crtA is only present in
strain WL4 and Rhizobium sp. strain WL3. For the crtC and crtK genes an interesting pattern is
observed: Methylobacterium group 1 and group 2 strains contain crtC (with 3 exceptions, discussed
below), while group 3 and 4 strains completely lack the gene. On the other hand, crtK is only present
in Methylobacterium group 3 strains and missing from groups 1, 2, and 4. Similar patterns are
evidenced in pufB which is absent from Methylobacterium groups 3 and 4, Rhizobium sp. WL3 and
Roseomonas sp. WL45, and pufC missing from Methylobacterium group 4 as well as the novel strain
WL4. The remaining genes are found ubiquitously in all strains of the analysis, with only a few
exceptions (table 2). Proteorhodopsin genes were not detected in any of the strains analyzed, except
for sample WL44, which is comprised of a Roseomonas and an Actinobacterial strain (table 1).
BLAST searches showed that the identified Proteorhodopsin gene belongs to the Actinobacterial part
The photosynthetic gene clusters were consistently found on the longer contigs of the initial
assemblies of Illumina data produced by SPAdes (for the strains with fewer than 500 contigs) (table
1). The different architectures of the isolated photosynthetic gene clusters are shown in figure 6. In
Rhizobium sp. WL3 and Roseomonas sp. WL45 the photosynthetic genes form one continuous
cluster, albeit with different architectures. In Methylocystaceae WL4 the genes are organized in 2
clusters, similar to all Methylobacterium spp. isolates. Unique to the architecture present in WL4 is
the presence of bchI before the crtB, crtI genes in the first cluster. For Methylobacterium spp., groups
2 and 3 show similar gene synteny – though there are a few differences in gene content (table 2) –
while group 1 and group 4 appear to also share the overall architecture of the photosynthetic gene
clusters. For isolates WL9 and WL19 (both in group 1) the exact location of the crtB and crtI genes
Fig. 6: Architecture of the different photosynthetic gene clusters. Arrows indicate the orientation of
the genes. Sizes are not representative of gene length. Genes that may be missing from some strains
Annotating the assemblies on RAST revealed several housekeeping genes present in the vicinity of
these clusters, which further supported the notion of their chromosomal placement. The completed
genomes resulting from the hybrid assemblies (strain WL1, WL2, WL3, WL4, WL45) provided the
final evidence of the exact placement of the photosynthetic gene clusters on the chromosomes of all
our isolates. General descriptive statistics of the assembled complete genomes are given in table 3.
Table 3: General descriptive statistics of the 5 assembled, complete AAP bacterial genomes.
The hybrid assemblies resulted in both complete genomes and plasmids. Rhizobium sp.
WL3 contains 2 plasmids; Roseomonas sp. WL45 contains 2 chromosomes and 7 plasmids; the novel
Methylocystaceae strain WL4 contains 2 plasmids; Methylobacterium sp. WL1 contains 1 plasmid; and
Methylobacterium sp. WL2 contains none. Interestingly, isolates WL1 and WL2 exhibit a peculiar feature:
the two isolates appear identical in the 16S phylogenetic analysis (figure 1), ANI comparison (figure
3), as well as in phylogenetic analysis of the super-matrix comprised of the 6 genes of the
photosynthetic cluster (figure 5). However, the assembled genomes of WL1 and WL2 are not
identical. Though close in size, WL1’s genome is slightly bigger, and it also contains 1 plasmid.
Synteny between the two genomes is rather high, except for a 45 kb rearrangement that is present in
WL2 (Mauve alignment, data not shown). Mapping-to-reference runs with the Illumina contigs support
the rearrangement in WL2. This needs to be further investigated in order to verify
whether this is an artifact of the assembly process or that we truly managed to capture the beginning
Fig 5. Phylogenetic tree based on the super-matrix consisting of the 16S region and the acsF, bchL,
bchY, crtB, pufL and pufM genes that are present in the photosynthetic gene clusters of all 21 bacterial
isolates. The legend shows substitution rates from the root of the tree and the node labels indicate
Using Rhizobium sp. WL3 as outgroup and calculating substitution rates from the root of the
different trees, it is observed that Roseomonas sp. WL45 has the most divergent genes out of the 21
isolates (table 4). This is consistent for the 16S region, the 6 genes of the photosynthetic gene clusters
that were used, the super-matrix of all 7 genes, as well as the phylogenomics tree from CheckM. In
Methylobacterium spp., which make up the majority of the isolates, it is evidenced that groups 3 and
4 have comparable rates. Methylobacterium spp. group 2 appear to have more divergent genes
compared to the other 3 groups, with only exceptions being the genes bchL and crtB. Thus, evolution
rates in the super-matrix are closer than for the individual genes. For Methylobacterium sp. WL103,
crtB seems to be as divergent as that of Roseomonas sp. WL45, and much more different compared to
the other Methylobacterium strains. However, the observed differences in the sequences in the
selected genes of the 17 Methylobacterium strains disappear in the core proteome comparison
(CheckM), where the margin ranges between 0.37 and 0.40 substitutions per site for all 4 groups,
Table 4: Substitutions per site in the different phylogenetic analyses. All values correspond to
distances from the root, where Rhizobium sp. WL3 was chosen as the outgroup for all the alignments.
Discussion
environmental isolates.
bacteria in the phyllosphere, we designed this study using wheat as the plant of choice, and employed
a high-throughput, multi-disciplinary approach to identify, isolate and analyze AAP strains. The
possible using colony infrared imaging techniques (Zeng et al. 2014). This method requires the
presence of bacterial colonies growing on agar plates. It is expected that bacteria that inhabit the
phyllosphere will be able to grow in such conditions in the laboratory. After identifying the “glowing”
From the initial 137 plates, we ended up with ~4480 bacterial colonies, more than 500 of
which were probable AAP strains. Given the large number of isolates that we collected, we decided
to employ a 2-step approach to reduce the total number to a more manageable figure. Thus, after
macroscopic investigation of size, shape, color, and texture of the colonies, we chose 129 unique
strains, for which we ran proteome profiling using MALDI-TOF MS. This approach has been shown
to reproducibly distinguish different taxa and, in cases where reference MALDI-TOF
spectra are available in the database, to assign genus and/or species names (Madsen et al.
2015). The currently available MALDI-TOF spectra libraries lack spectra from environmental type
strains (Madsen et al. 2015). Thus, we were only able to categorize the 129 isolates into 25 distinct
groups. We then proceeded with whole genome sequencing on the Illumina NextSeq platform for 1
representative from each of these groups. Analyzing the draft assemblies and the 16S sequences we
were able to further reduce the number of unique genomes down to 11, for which we performed
Oxford Nanopore Technologies sequencing, aiming to generate complete genomes and plasmids.
(plating, morphology etc.), biochemical analyses (MALDI-TOF MS), and two different types of high-
throughput whole genome sequencing we were able to quickly, efficiently, and effectively generate
draft and complete genomes of novel AAP bacteria from the wheat phyllosphere in a very short
timeframe (weeks).
The prevalence of AAP bacteria in the phyllosphere is not known. Studies conducted thus far
relied on either metagenomes or amplicons (e.g. Atamna-Ismaeel, Omri M. Finkel, et al. 2012;
Atamna-Ismaeel, Omri Finkel, et al. 2012). Looking at bacterial rhodopsins, we identified only one
in sample WL44 (~0.1% of total bacteria). The gene was shown (BLAST) to belong to an
Actinobacterium, which however was mixed with a Roseomonas strain (16S analysis), which did not
allow for closing its genome using ONT sequencing. Actinobacterial rhodopsins have been found in
a variety of plant leave surfaces, including Glycine max (L.) Merr., Oryza sativa L., Tamarix nilotica
(Ehrenb.) Bunge, and others (Atamna-Ismaeel, Omri M. Finkel, et al. 2012). Compared to the total
number of microbial rhodopsins that were identified (156) they only constitute a minor fraction (15
sequences, ~10%). Actinobacteria, in general, grow slowly compared to other bacteria and mostly
remain dormant under nutrient-limiting conditions (Barka et al. 2016), so it is possible that more
rhodopsin-bearing strains are present in the wheat phyllosphere, that we were not able to detect using
In our investigation we sequenced two strains that were later revealed to belong to the
anamorphic yeast Sporobolomyces roseus (isolates WL46 and WL96). Apart from the obvious
explanation that contamination had occurred, it is possible that these strains actually contain fungal
rhodopsins, as was found earlier (Atamna-Ismaeel, Omri M. Finkel, et al. 2012). Searching for such
genes under relaxed similarity criteria returned no hits. Their draft assemblies consist of 1840
and 2078 contigs, respectively, which made them unsuitable for submission to RAST for annotation.
Moreover, a GenBank search (last accessed on 30/11/2018) using the terms “fungal rhodopsin”,
“yeast rhodopsin” and “Sporobolomyces rhodopsin” yielded either no results or draft genome
assemblies of yeasts with many predicted or uncharacterized opsin-type genes. We decided not to
proceed with these sequences
to identify 21 strains harboring them, the majority of which belonged to the genus Methylobacterium.
Members of this genus are ubiquitous in nature and have been detected both in soil and on plant
surfaces (Green 2006). An amplicon sequencing analysis targeting the bchY (chlorophyllide reductase
subunit Y) and pufM (reaction centre subunit M) genes in phyllosphere samples of different plants
showed that the Methylobacteriaceae (Alphaproteobacteria) comprised 28% of the total pufM
sequences (Atamna-Ismaeel, Omri Finkel, et al. 2012). Our results are, thus, in
phyllosphere and have consistently been shown to harbor photosynthetic gene clusters, it may be that
these photosystems are native parts of the Methylobacteria that inhabit this niche. Interestingly, in
their study, Atamna-Ismaeel et al. obtained negative results from their bchY amplicon sequencing
approach, suggesting the absence of bacteria that contain RC1 (bacteriochlorophyll a-containing
reaction centers) type photosystems. Our results show that all isolates contain the complete cassette
of chlorophyllide synthase subunit encoding genes (bchB, bchC, bchF, bchG, bchI, bchL, bchM,
bchN, bchP, bchX, bchY, bchZ), as well as both the pufL and pufM genes (table 2). Testing the same
primers (Yutin et al. 2009) in silico against our bchY sequences, we obtained positive results for
all of them (data not shown). Using a whole genome sequencing approach on the positive isolates, we
were able to retrieve all genes contained in these photosystems, and we discovered an interesting
pattern: the 18 Methylobacterium isolates can be categorized into 4 distinct groups based not only
on average nucleotide identity and whole proteome pairwise comparisons (figures 3, 4), but also on
the genes missing from their photosynthetic gene clusters (table 2). Thus, Methylobacterium strains
of group 1 are only missing the crtA (carotenoid synthase subunit A) gene, strains of group 2 are
missing crtA and crtK, strains of group 3 are also missing crtC and pufB, and, lastly, strains of
group 4 are additionally missing pufC. For the remaining 3 isolates, no patterns can be detected, as
we only have 1 representative from each taxon. However, all 21 isolates contain an intact
chlorophyllide synthase, while different carotenoid synthase subunit genes may be randomly missing.
In their Illumina assemblies, strains
(Methylobacterium group 4) appear to be missing more genes of the photosynthetic gene cluster
compared to the closely related strains in their respective groups. This is most likely a result of
poor draft genome assemblies for the 3 strains, as they assembled into 1376, 1179, and 1332 contigs,
respectively. When we mapped their quality-controlled reads to contigs of strains of their
respective groups, the missing genes had the same coverage as the rest (data not shown). Thus, we
decided to report them in table 2 as present. As far as group 1 is concerned, the 3
Methylobacterium strains that comprise it are quite different and most likely belong to 3 distinct
species, so the remaining, randomly missing
Patterns of loss of the carotenoid genes have been observed in other Alpha- and
Gammaproteobacteria, such as the Rhodobacterales and Sphingomonadales (Zheng et al. 2011). The
absence of crtA results in the inability to perform the final step in the spheroidene pathway where
Methylobacterium spp. of groups 3 and 4 completely affects both the spheroidene and the
spirilloxanthin pathways, as its product is essential for the first step of both metabolic pathways,
starting from neurosporene and lycopene, respectively. This means that strains of these 2 groups
rely on the zeaxanthin pathway to produce carotenoid pigments, most likely nostoxanthin and
erythroxanthin, which, however, do not participate in the light-harvesting process, but rather have a
In our study we sequenced and analyzed 21 aerobic anoxygenic photosynthetic bacteria, most
of which belonged to the genus Methylobacterium, whose presence in the environment has been
discussed above. Methylobacterium spp. are facultative methylotrophs and can utilize a variety of
C1, C2, C3, and C4 compounds. Such compounds are often found in the phyllosphere due to plant
metabolism (Vorholt 2012; Remus-Emsermann et al. 2014). Apart from this genus, we also identified
1 Rhizobium sp., 1 Roseomonas sp., and a strain (WL4) that belongs to the Methylocystaceae, a
family of methanotrophic bacteria. This is the first time that AAP bacteria from this family have
been isolated and their complete genomes assembled. The closely related Alsobacter spp. have been
shown to exhibit high thallium tolerance (Bao et al. 2006), and it will be
samples. We propose that strain WL4 belongs to the Methylocystaceae and is in fact a novel species,
and possibly a novel genus. The hybrid assembly using Illumina and Nanopore reads led to a complete
genome with 2 plasmids, with the photosynthetic gene clusters located on the chromosome. Further
analysis of WL4 to verify these claims and suggest a species and/or genus name will be the focus of
another investigation. However, it is worth mentioning that if WL4 contains heavy-metal resistance
genes in addition to the photosynthetic gene cluster, this would be the first such report, and its
potential to bioremediate heavy-metal contaminated areas if applied as Plant Growth Promotion and
Phylogenetic analyses of the Methylobacterium groups show that the isolates belonging to
groups 3 and 4 each comprise a single species and possibly a single strain (figures 1, 2, 5). Group 2
Methylobacterium spp. possibly constitute 1 species with 2 strains (WL7 and WL18; WL1, WL2 and
WL64), while group 1 isolates correspond to 3 distinct Methylobacterium species. This brings the
total of unique isolated AAP taxa to 9, of which 5 complete genomes were assembled (WL1, WL2,
WL3, WL4, and WL45). The photosynthetic gene cluster in Rhizobium sp. WL3 and Roseomonas sp.
WL45 forms a single stretch of ~50,000 bp containing all genes. On the other hand, the novel
Methylocystaceae strain WL4 and Methylobacterium sp. strains WL1 and WL2 exhibit the bipartite
architecture that has been shown elsewhere (Zheng et al. 2011). For the remaining Methylobacterium
isolates, for which draft assemblies produced contigs long enough to encompass
the photosynthetic gene cluster in one part (like WL3 and WL4), it appears that they follow a similar
bipartite architecture, though it is not possible to conclude on possible differences due to the presence
All phylogenetic trees were constructed using Rhizobium sp. WL3 as the outgroup.
Calculating substitution rates from the root on each of the constructed trees, we observed that
Roseomonas sp. WL45 is more divergent than the other isolates in all analyses, including the
16S alignment. For the 18 Methylobacterium isolates, group 2 strains (WL1, WL2, WL7,
WL18, WL64) consistently evolve at different rates compared to the other 3 groups. Surprisingly, the
trend followed by most genes in the analysis does not apply in the case of bchL and crtB. This is
interesting, as all 6 genes analyzed belong to the same photosynthetic gene cluster. It appears
that there is relatively strong selection pressure on bchL, pufL and pufM, which in all cases are more
similar to each other compared to acsF, bchY and crtB. This result partially contrasts with
previously adopted methods of using bchY as a marker gene for RC1 clusters (e.g. Atamna-Ismaeel,
Omri Finkel, et al. 2012). On the other hand, the use of pufM appears more sensible. It would,
however, be worthwhile to investigate the possibility of using other genes for metagenome
amplicon sequencing approaches, such as pufL or bchL, which show a smaller degree of divergence
across different genera, so that universal primers might be more successful in detecting a wide
variety of AAP bacteria in environmental samples. These differences diminish when analyzing the core
proteome (CheckM) of the isolates, where all Methylobacterium isolates fall within the range between
0.38 and 0.49 substitutions per site (table 4). Overall, it seems that the photosynthetic gene clusters
were incorporated in ancestral strains that later diverged to give rise to the different AAP taxa. This
is further corroborated by the fact that almost all phylogenetic trees (e.g. figure 5) are consistent and
in agreement with both the 16S phylogenetic tree (figure 1) and the phylogenomic tree (figure
2). If the genes of the photosynthetic gene clusters had been transferred to these AAP bacteria
several times, there would have been significant differences both at the sequence level and in the
phylogenetic analysis of the different genes, which would be expected to result in different
topologies and diverse distances from the root, even for strains that are closely related based on
16S analysis.
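The comparison of root-to-tip distances across genes can be made concrete with a short Python sketch. It uses only the table 4 values for the five Methylobacterium rows that are legible in this copy (WL9, WL19, WL69, WL1, WL2), so it does not cover all 18 isolates, and the spread (maximum minus minimum) is our illustrative summary statistic, not one reported by the study. The names `ROOT_TO_TIP` and `spread_per_gene` are hypothetical.

```python
# Substitutions per site from table 4, restricted to the Methylobacterium rows
# that are legible in this copy. Column order follows the table header.
GENES = ["16S", "acsF", "bchL", "bchY", "crtB", "pufL", "pufM"]
ROOT_TO_TIP = {
    "WL9":  [0.21, 0.52, 0.30, 0.46, 0.86, 0.35, 0.40],
    "WL19": [0.21, 0.52, 0.43, 0.52, 0.80, 0.31, 0.41],
    "WL69": [0.22, 0.50, 0.46, 0.53, 0.81, 0.34, 0.43],
    "WL1":  [0.22, 0.55, 0.36, 0.54, 0.81, 0.37, 0.46],
    "WL2":  [0.22, 0.55, 0.36, 0.54, 0.81, 0.37, 0.46],
}


def spread_per_gene(table, genes):
    """Return max minus min root-to-tip distance per gene, rounded to 2 decimals."""
    spreads = {}
    for index, gene in enumerate(genes):
        column = [row[index] for row in table.values()]
        spreads[gene] = round(max(column) - min(column), 2)
    return spreads


spreads = spread_per_gene(ROOT_TO_TIP, GENES)
most_variable = max(spreads, key=spreads.get)

for gene in GENES:
    print(f"{gene:5s} spread = {spreads[gene]:.2f}")
print("largest spread among these rows:", most_variable)
```

For these five rows, bchL shows the largest spread, which is in line with the observation that bchL departs from the trend followed by most other genes. A full assessment would require the complete table.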
In this study we employed a culturomics approach to isolate, for the first time, a variety of AAP
bacteria that are present in the wheat phyllosphere, adding to the metagenomic knowledge provided
earlier from 5 other plants (Atamna-Ismaeel, Omri M. Finkel, et al. 2012; Atamna-Ismaeel, Omri
Finkel, et al. 2012). As stated earlier, the phyllosphere is a rather harsh environment, with little water
availability, scarce resources, very few suitable locations, extreme temperature differences between
day and night, and high doses of UV radiation (Vorholt 2012). AAP bacteria have the ability to
utilize light in photophosphorylation processes, through which they produce ATP and at the same
time funnel the few available nutrients into other metabolic pathways (Yurkov & Hughes 2017). This
has been shown to give them an edge in vitro after exposure to light, compared to non-AAP bacteria
(Ferrera et al. 2011). Considering that phyllosphere bacteria protect the plant from pathogenic
microbes by secreting metabolites and by occupying the available niches, being able to persevere and
outgrow other bacteria gives a significant edge to AAP taxa and probably points towards
positive selection of such bacteria in the phyllosphere by the plants themselves, as
they appear to be absent from the soil (Atamna-Ismaeel, Omri M. Finkel, et al. 2012). At this point,
it is not clear exactly how AAP bacteria benefit the plant, especially since the phyllosphere is a
complex habitat with an ever-changing, harsh environment, and this question requires greater
attention. In the past years, great interest has been shown in how to improve crop production by
studying the rhizosphere. It has also been suggested that the phyllosphere communities play a
significant role in plant health and growth, which ultimately leads to increased yield (Lindow &
Brandl 2003). Given the current projections of world population growth and the supply of food
(United Nations 2017), it becomes evident that more in-depth studies investigating the phyllosphere
of plants, and especially crops, and identifying the underlying drivers of the bacterial communities
and their interactions with their plant hosts are of utmost importance. Thus, by isolating and
maintaining phyllosphere isolates and analyzing their genomes, we provide candidates that may prove
useful as plant growth promoting agents in possible field trials investigating the effect of
phyllosphere inoculation on plant growth, health, and yield.
Acknowledgements
The authors would like to thank Tue Sparholt Jorgensen for collecting the wheat sample. The authors
also want to thank Tina Thane (AU), Tanja Begovic (AU) and Margit Frederiksen (NRCWE) for
capable laboratory assistance. We thank Michal Koblížek for providing the initial model of the colony
Author Contributions
AZ, YZ, LHH designed the study. YZ isolated the strains, performed colony infrared imaging,
extracted DNA for Illumina sequencing. YZ and AMM performed the MALDI-TOF MS analysis.
AZ extracted DNA and performed ONT Sequencing. AZ carried out the bioinformatics and
phylogenetics analyses. AZ and YZ drafted the manuscript. AZ, YZ, AMM and LHH revised the
manuscript for intellectual content. All authors read and approved the manuscript.
Conflicts of interests
Funding
This work was funded by a Villum Experiment grant (no. 17601) and a Marie Skłodowska-Curie
AIAS-COFUND fellowship (EU-FP7 program, Grant Agreement No. 609033) awarded to YZ.
Data availability
Complete and draft genome assemblies analyzed in this investigation are available in GenBank under
References
Atamna-Ismaeel N, Finkel OM, et al. 2012. Microbial rhodopsins on leaf surfaces of terrestrial
Aziz RK et al. 2008. The RAST Server: rapid annotations using subsystems technology. BMC
Bankevich A et al. 2012. SPAdes: A New Genome Assembly Algorithm and Its Applications to
Bacteria from Heavy Metal-polluted River Sediment and Non-polluted Soils. Microbes
Barka EA et al. 2016. Taxonomy, Physiology, and Natural Products of Actinobacteria. Microbiol.
Beattie GA. 2002. Leaf surface waxes and the process of leaf colonization by microorganisms.
Beattie GA, Lindow SE. 1999. Bacterial Colonization of Leaves: A Spectrum of Strategies.
Brettin T et al. 2015. RASTtk: A modular and extensible implementation of the RAST algorithm
for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 5:8365.
doi: 10.1038/srep08365.
Carvalho SD, Castillo JA. 2018. Influence of Light on Plant–Phyllosphere Interaction. Front. Plant
Darling ACE, Mau B, Blattner FR, Perna NT. 2004. Mauve: multiple alignment of conserved
Ferrera I, Gasol JM, Sebastián M, Hojerová E, Koblízek M. 2011. Comparison of growth rates
soil bacteria for population analysis. J. Gen. Appl. Microbiol. 40:509–517. doi:
10.2323/jgam.40.509.
Gourion B, Rossignol M, Vorholt JA, Lindow SE. 2006. A proteomic study of Methylobacterium
Green PN. 2006. Methylobacterium. In: The Prokaryotes - A handbook on the biology of bacteria.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple
sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059–3066. doi:
10.1093/nar/gkf436.
Knief C et al. 2012. Metaproteogenomic analysis of microbial communities in the phyllosphere and
methylotroph in the phyllosphere Yun, S-H, editor. PLoS One. 9:e106704. doi:
10.1371/journal.pone.0106704.
Lagier J-C et al. 2018. Culturing the human microbiota and culturomics. Nat. Rev. Microbiol. doi:
10.1038/s41579-018-0041-0.
Lindow SE, Brandl MT. 2003. Microbiology of the phyllosphere. Appl. Environ. Microbiol.
Madsen AM, Zervas A, Tendal K, Nielsen JL. 2015. Microbial diversity in bioaerosol samples
Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads.
Noguchi T, Hayashi H, Shimada K, Takaichi S, Tasumi M. 1992. In vivo states and functions of
Olm MR, Brown CT, Brooks B, Banfield JF. 2017. dRep: a tool for fast and accurate genomic
comparisons that enables improved genome recovery from metagenomes through de-
Overbeek R et al. 2014. The SEED and the Rapid Annotation of microbial genomes using
10.1093/nar/gkt1226.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the
quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome
10.1093/bioinformatics/btl446.
UN. 2017. World Population Prospects: The 2017 Revision, Data Booklet.
Vesth T, Lagesen K, Acar Ö, Ussery D. 2013. CMG-biotools, a free workbench for basic
Vorholt JA. 2012. Microbial life in the phyllosphere. Nat. Rev. Microbiol. 10:828–840. doi:
10.1038/nrmicro2910.
Wick RR, Judd LM, Gorrie CL, Holt KE. 2017. Unicycler: Resolving bacterial genome assemblies
from short and long sequencing reads Phillippy, AM, editor. PLoS Comput. Biol. 13:e1005595.
doi: 10.1371/journal.pcbi.1005595.
Wilson M, Lindow SE. 1994a. Coexistence among Epiphytic Bacterial Populations Mediated
Wilson M, Lindow SE. 1994b. Ecological Similarity and Coexistence of Epiphytic Ice-Nucleating
(Ice) Pseudomonas syringae Strains and a Non-Ice-Nucleating (Ice) Biological Control Agent.
Woodward FI, Lomas MR. 2004. Vegetation dynamics – simulating responses to climatic change.
Xiong J, Bauer CE. 2002. Complex Evolution of Photosynthesis. Annu. Rev. Plant Biol. 53:503–
Xu J, Gordon JI. 2003. Honor thy symbionts. Proc. Natl. Acad. Sci. U. S. A. 100:10452–9. doi:
10.1073/pnas.1734063100.
Yurkov V, Csotonyi JT. 2009. New Light on Aerobic Anoxygenic Phototrophs. In: The Purple
Photosynthetic Bacteria. Hunter, NC, Daldal, F, Thurnauer, MC, & Beatty, JT, editors. Springer
Yurkov V, Hughes E. 2017. Aerobic Anoxygenic Phototrophs: Four decades of mystery. In: Modern
Topics in the Phototrophic Prokaryotes: Environmental and Applied Aspects. Hallenbeck, PC,
Yutin N et al. 2009. BchY-based degenerate primers target all types of anoxygenic photosynthetic
09.
Zeng Y, Feng F, Medova H, Dean J, Koblížek M. 2014. Functional type 2 photosynthetic reaction
centers found in the rare bacterial phylum Gemmatimonadetes. Proc. Natl. Acad. Sci.
Zheng Q et al. 2011. Diverse arrangement of photosynthetic gene clusters in aerobic anoxygenic
Table 1
Table 2
Strain bchM bchC bchN pucC pufM bchG bchB bchF bchP bchX puhA pufC bchL bchY bchZ acsF pufA pufB pufL crtD bchI crtC crtK crtA crtB crtE crtF crtI
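The group-wise gene losses described in the Discussion for the Methylobacterium isolates can be written down as sets, which makes the nested pattern explicit. The sketch below is an illustration, not part of the original analysis: it assumes the losses are cumulative (group 3 "also" and group 4 "additionally" missing genes on top of the previous group), and the names `MISSING_BY_GROUP`, `is_nested` and `newly_missing` are hypothetical.

```python
# Genes reported as missing from the photosynthetic gene cluster in each
# Methylobacterium group, read cumulatively from the text (an assumption).
MISSING_BY_GROUP = {
    1: {"crtA"},
    2: {"crtA", "crtK"},
    3: {"crtA", "crtK", "crtC", "pufB"},
    4: {"crtA", "crtK", "crtC", "pufB", "pufC"},
}


def is_nested(missing_by_group):
    """True if every group's missing set is contained in the next group's set."""
    groups = sorted(missing_by_group)
    return all(
        missing_by_group[a] <= missing_by_group[b]
        for a, b in zip(groups, groups[1:])
    )


def newly_missing(missing_by_group):
    """For each group, list the genes missing in addition to the previous group."""
    result = {}
    previous = set()
    for group in sorted(missing_by_group):
        result[group] = sorted(missing_by_group[group] - previous)
        previous = missing_by_group[group]
    return result


nested = is_nested(MISSING_BY_GROUP)
steps = newly_missing(MISSING_BY_GROUP)

print("losses are nested:", nested)
for group, genes in steps.items():
    print(f"group {group}: additionally missing {', '.join(genes)}")
```

Under this reading the losses form a strictly nested series, with each group adding one or two genes to the set missing in the previous group.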
Table 3
Taxon                  Genome size (bp)  GC%    Number of genes  Plasmid  Plasmid size (bp)  Plasmid GC%
Rhizobium sp. WL3      4,568,855         61.60  4,450            1        700,631            58.6
                                                                 2        85,350             57.5
Roseomonas sp. WL45    4,533,887         70.8   5,042            1        262,815            65.5
                       1,011,854         70.7   1,138            2        240,556            66.2
                                                                 3        152,922            65.8
                                                                 4        85,279             64.6
                                                                 5        84,774             66.1
                                                                 6        83,273             65.6
                                                                 7        65,860             64.8
Methylocystaceae WL4   5,405,668         67.9   5,013            1        27,178             62.0
                                                                 2        9,451              60.6
Methylobacterium WL1   6,215,463         69.1   6,431            1        35,731             63.8
Methylobacterium WL2   6,168,811         69.1   6,323            -        -                  -
Table 4
Substitutions per site
Strain                     Group    16S   acsF  bchL  bchY  crtB  pufL  pufM  super-matrix  CheckM
Rhizobium sp. WL3          -        0.06  0.18  0.24  0.21  0.45  0.12  0.18  0.19          0.11
Roseomonas sp. WL45        -        0.29  0.57  0.71  0.78  1.08  0.36  0.53  0.56          0.48
Methylocystaceae WL4       -        0.12  0.37  0.61  0.38  0.84  0.26  0.32  0.35          0.34
Methylobacterium sp. WL9   Group 1  0.21  0.52  0.30  0.46  0.86  0.35  0.40  0.44          0.40
Methylobacterium sp. WL19  Group 1  0.21  0.52  0.43  0.52  0.80  0.31  0.41  0.45          0.39
Methylobacterium sp. WL69  Group 1  0.22  0.50  0.46  0.53  0.81  0.34  0.43  0.45          0.39
Methylobacterium sp. WL1   Group 2  0.22  0.55  0.36  0.54  0.81  0.37  0.46  0.47          0.38
Methylobacterium sp. WL2   Group 2  0.22  0.55  0.36  0.54  0.81  0.37  0.46  0.47          0.38
Methylobacterium sp. WL7   Group 2
Figure 1
[Figure content not recoverable from the extracted text. Surviving labels: Methylocystaceae WL4; panel title "Homology between proteomes" (pairwise proteome comparison matrix giving percent homology and shared/total protein families for each pair of isolates).]
5,
3,357 / 7,603 3,393 / 8,299 3,346 / 7,578 3,321 / 8,351 2,748 / 7,351 9 / 7,573
2,921 2,786iu
80 / 8,094 1,601 / 9,343 1,138 / 12,302 760 / 9,349 1,094 / 10,610 pr 7 ns ili
76 %
lo hy er ei
pr 70.6 % 55.3 % 48.9 % 40.5 % 52.2 3 % 69.4 42.7 % 44.9 % 39.0 % 41.5 % 52.9 % 39.3 % 35.1 % 37.9 % 15.6 % 8.6 % 8.3 % 11.5 %
on
5 4, m
m 22 45 es M ot fa
ct W
ot fa
M eib ac m 4,075 / 5,770 3,782 / 6,837 3,595 / 7,346 3,213 / 7,932 3,709 4, / 7,103
et 4,473s,
in / 6,448 ili / 8,085
3,454 W 3,417 / 7,614 3,323 / 8,517 3,298 / 7,943 4,324 / 8,176 2,852 / 7,259 2,873 / 8,187 2,779 / 7,323 1,550 / 9,936 911 / 10,595 705 / 8,469 985 / 8,552 pr 5
% 42.1 % 42.4 % 44.2 % 38.2 % 5 40.4 %
, et
ns 34.8 %
,
te
W
ili 36.7 %
e 16.5 % 9.3 % 7.7 % 12.2 %
M
te
ba a m
ae 9 55
om 6,
41
as s
ce se
o
on
53 s f 4, ,
127 3,363 / 7,991 3,373 / 7,948 3,357 / 7,596 3,359 / 8,803 2,841h
5 / 7,032
5,
2,924
riu
6 / 8,393 L
2,762 / 7,519
10 1,594 / 9,670 1,098 / 11,860 775 / 10,071 1,026 / 8,404 pr lo 9
29 %
ns ili
e
ta Ro
8 ei
yl hy
p 4 49.5 % 70.7 % 50.4 % 40.5 % 58.5 % 68.4 9 % 68.4 42.7 % 39.3 % 41.0 % 62.0 % 55.6 % 37.1 % 37.1 % 33.0 % 19.2 % 8.6 % 9.0 % 16.6 % m
ro 09 4, s ot m fa
cs
f
3 5, / 6,603
go L3
ob a s,
et
t e pr
ei m
W 3,489 / 7,049 4,197 / 5,937 3,792 / 7,524 3,203 / 7,906 3,819 / 6,525 4,514 4,296
in / 6,278 ili / 8,101
3,462 3,247 / 8,270 3,381 / 8,255 4,514 / 7,279 4,282 / 7,706 2,828 / 7,613 2,869 / 7,725 2,686 / 8,143 1,488 / 7,765 961 / 11,138 711 / 7,912 1,401 / 8,446 2
40.1 % 44.9 % 42.5 % 38.8 % 5, 38.7 % n
, 4 36.7 %
ac i l es 36.7 % 35.3 % 17.3 % 9.3 % 9.0 % 12.0 %
cy 5 4
3,
99
L4
e m
in
s i t 23
3,400 / 8,481 3,405 / 7,587 3,390 / 7,975 3,198 / 8,249
10
8
3,342 ,9 te L1 M p ro
lo 99
fa 7,
h s, W ili
es
p / 8,633 2,87591 / 7,830 2,881 / 7,851 2,765 / 7,837 1,583 / 9,153 1,166 / 12,487 702 / 7,824 1,078 / 8,965 n
rot riu
fa 2 58.3 % 47.2 % 60.4 % 42.4 % 58.7 % 88.7 % 64.7
54
6 %
hy
68.44 ,6 % W 38.1 %
s 24 42.1 % 67.1 % 64.9 % 58.1 % 39.9 % 32.5 % 46.5 % 17.7 % 8.3 % 8.4 % 6.9 %
Sp ot
ei
iu
m fa
m
% 42.7 % 44.9 % 39.0 % 41.5 %
e ins 52.9 %
m
m ili 39.3 % 3,848 / 6,601
35.1 %
3,501 / 7,420
37.9 %
4,060 / 6,719
15.6 %
3,411 / 8,049
8.6 %
3,816 / 6,496
8.3 %
4,972 / 5,607
11.5 %
4,2274, / 6,537
et 4,307
i n s, / 6,299
as 3,320
i lie / 8,713
L1 3,346 / 7,957 4,852 / 7,228 4,453 / 6,857 4,521 / 7,777 2,842 / 7,115 2,772 / 8,532 2,740 / 5,893 1,485 / 8,384 883 / 10,671 697 / 8,344 784 / 11,356 1
pr
ob ,8
61
on
, e te am 15 ,4
448 3,454 / 8,085 3,417 / 7,614 3,323 / 8,517 3,298 / 7,943
4 ,6 / 8,176
4,324
s
W
2,852 / 7,259 2,873 / 8,187 2,779 / 7,323 1,550 / 9,936 911 / 10,595 705 / 8,469 985 / 8,552 M pr
o
15
f
W 4,
iz ns
om as Rh
ei
6 3
L8 73.0 % 59.3 % 57.2 % 48.0 % 55.4 % 87.4 % 95.4 % 65.055 % 65.0 , 4 % 40.2 % 71.1 % 91.5 % 61.3 % 63.6 % 34.2 % 50.4 % 41.9 % 18.0 % 8.7 % 6.4 % 7.6 %
ot
fa ,9 ,6 es
m 4,126 / 5,655 3,998 / 6,741 4,206 / 7,347 3,551 / 7,402 3,804 / 6,867 4,919 / 5,630 4,996 / 5,236 4,253 4 / 6,544 se ns / 6,725
4,370
on ili / 8,431
3,393 4,892 / 6,881 5,514 / 6,028 4,417 / 7,207 4,600 / 7,237 2,731 / 7,975 3,047 / 6,046 2,736 / 6,524 1,429 / 7,928 956 / 10,997 683 / 10,632 646 / 8,537 pr
Ro
68.4 % 42.7 % 39.3 % 41.0 % 62.0 % il 55.6 % 37.1 % 37.1 % 33.0 % 19.2 % 8.6 % 9.0 % 16.6 % ei m 5
i ot 08
om
e fa
L3
s
pr 5,
4,296 / 6,278 3,462 / 8,101 3,247 / 8,270 3,381 / 8,255 4,514 / 7,279 4,282 / 7,706 2,828 / 7,613 2,869 / 7,725 2,686 / 8,143 1,488 / 7,765 961 / 11,138 711 / 7,912 1,401 / 8,446 92
ng
6.5 % 7.1 % 4.9 % 4.1 % 5.3 % 6.4 % 6.5 % 6.7 % 6.24 % 5.9,9 % 8.0 % 6.9 % 6.5 % 6.1 % 5.4 % 4.1 % 4.9 % 4.0 % 2.9 % 7.1 % 3.3 % 2.8 %
23 ,3 es
303 / 4,663 355 / 4,991 280 / 5,684 237 / 5,805 268 / 5,067 347 / 5,459 333 / 5,109 346 / 5,129 348 7, / 5,615
hi ns / 5,445
322
ei
W ili / 6,027
480 387 / 5,622 389 / 5,939 337 / 5,505 337 / 6,236 174 / 4,213 233 / 4,767 170 / 4,299 135 / 4,699 457 / 6,415 132 / 3,992 138 / 4,861
Sp m
% 68.4 % 38.1 % 42.1 % 67.1 % 64.9 % 58.1 % 39.9 % 32.5 % 46.5 % 17.7 % 8.3 % 8.4 % 6.9 % m
ot fa
537 4,307 / 6,299 3,320 / 8,713 3,346 / 7,957 4,852 / 7,228 4,453 / 6,857 4,521 / 7,777 2,842 / 7,115 2,772 / 8,532 2,740 / 5,893 1,485 / 8,384 883 / 10,671 697 / 8,344 784 / 11,356 pr iu 61
ob
1 ,8
15 ,4
4,
iz ns
Rh
65.0 % 65.0 % 40.2 % 71.1 % 91.5 % 61.3 % 63.6 % 34.2 % 50.4 % 41.9 % 18.0 % 8.7 % 6.4 % 7.6 % ei
ot
4,253 / 6,544 4,370 / 6,725 3,393 / 8,431 4,892 / 6,881 5,514 / 6,028 4,417 / 7,207 4,600 / 7,237 2,731 / 7,975 3,047 / 6,046 2,736 / 6,524 1,429 / 7,928 956 / 10,997 683 / 10,632 646 / 8,537 pr
5
08
5,
6.2 % 5.9 % 8.0 % 6.9 % 6.5 % 6.1 % 5.4 % 4.1 % 4.9 % 4.0 % 2.9 % 7.1 % 3.3 % 2.8 %
29 348 / 5,615 322 / 5,445 480 / 6,027 387 / 5,622 389 / 5,939 337 / 5,505 337 / 6,236 174 / 4,213 233 / 4,767 170 / 4,299 135 / 4,699 457 / 6,415 132 / 3,992 138 / 4,861
2.8 %
Homology between proteomes Homology within proteomes
6.4 % 95.4 % 2.8 % 8.0 %
Figure 5
Methylocystaceae WL4