5002–5017
0021-9193/10/$12.00 doi:10.1128/JB.00542-10
Copyright 2010, American Society for Microbiology. All Rights Reserved.
from within the oral cavity. Studies have shown that different
oral structures and tissues are colonized by distinct microbial
communities (2, 39). Approximately 280 bacterial species from
the oral cavity have been isolated in culture and formally
named. It has been estimated that less than half of the bacterial
species present in the oral cavity can be cultivated using anaerobic microbiological methods and that there are likely 500
to 700 common oral species (47). Cultivation-independent molecular methods, primarily using 16S rRNA gene-based cloning
studies, have validated these estimates by identifying approximately 600 species or phylotypes (47; http://www.homd.org).
The oral cavity is a major gateway to the human body. Food
enters the mouth and is chewed and mixed with saliva on its
way to the stomach and intestinal tract. Air passes through the
nose and mouth on the way to the trachea and lungs. Microorganisms colonizing one area of the oral cavity have a significant probability of spreading on contiguous epithelial surfaces
to neighboring sites. Microorganisms from the oral cavity have
been shown to cause a number of oral infectious diseases,
including caries (tooth decay), periodontitis (gum disease),
endodontic (root canal) infections, alveolar osteitis (dry
socket), and tonsillitis. Evidence is accumulating which links
oral bacteria to a number of systemic diseases (58), including
cardiovascular disease (6, 32), stroke (31), preterm birth (46),
diabetes (25), and pneumonia (4).
For most of the history of infectious diseases, medical practitioners focused on individual organisms in pure culture
through the perspective of Koch's postulates. With the realization that essentially all surfaces of humans, animals, plants, and
inanimate objects, which have air or water interfaces, are cov-
The human oral cavity contains a number of different habitats, including the teeth, gingival sulcus, tongue,
cheeks, hard and soft palates, and tonsils, which are colonized by bacteria. The oral microbiome is comprised
of over 600 prevalent taxa at the species level, with distinct subsets predominating at different habitats. The
oral microbiome has been extensively characterized by cultivation and culture-independent molecular methods
such as 16S rRNA cloning. Unfortunately, the vast majority of unnamed oral taxa are referenced by clone
numbers or 16S rRNA GenBank accession numbers, often without taxonomic anchors. The first aim of this
research was to collect 16S rRNA gene sequences into a curated phylogeny-based database, the Human Oral
Microbiome Database (HOMD), and make it web accessible (www.homd.org). The HOMD includes 619 taxa
in 13 phyla, as follows: Actinobacteria, Bacteroidetes, Chlamydiae, Chloroflexi, Euryarchaeota, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, SR1, Synergistetes, Tenericutes, and TM7. The second aim was to analyze
36,043 16S rRNA gene clones isolated from studies of the oral microbiota to determine the relative abundance
of taxa and identify novel candidate taxa. The analysis identified 1,179 taxa, of which 24% were named, 8% were
cultivated but unnamed, and 68% were uncultivated phylotypes. Upon validation, 434 novel, nonsingleton taxa
will be added to the HOMD. The number of taxa needed to account for 90%, 95%, or 99% of the clones examined
is 259, 413, and 875, respectively. The HOMD is the first curated description of a human-associated microbiome and provides tools for use in understanding the role of the microbiome in health and disease.
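The coverage figures quoted above (the number of taxa needed to account for 90%, 95%, or 99% of the clones) follow from a rank-abundance calculation. The sketch below shows one way such a figure could be computed. The function name `taxa_needed` and the toy counts are illustrative assumptions and are not taken from the study's data or scripts.

```python
from typing import Iterable


def taxa_needed(clone_counts: Iterable[int], percent: int) -> int:
    """Smallest number of top-ranked taxa whose clones make up at least
    `percent` percent of all clones (integer arithmetic, no rounding)."""
    counts = sorted(clone_counts, reverse=True)  # rank taxa by abundance
    total = sum(counts)
    if total == 0:
        raise ValueError("no clones supplied")
    running = 0
    for rank, n in enumerate(counts, start=1):
        running += n
        if running * 100 >= percent * total:
            return rank
    return len(counts)


# Hypothetical clone counts per taxon (100 clones in 7 taxa), for illustration only.
toy_counts = [50, 30, 10, 5, 3, 1, 1]
for pct in (90, 95, 99):
    print(pct, taxa_needed(toy_counts, pct))
```

With real data, `clone_counts` would be the number of clones assigned to each taxon; a long tail of singleton taxa is what drives the number needed for 99% coverage so far above the number needed for 90%.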
was added and incubated at 55°C for 2 h. Proteinase K was inactivated by being
heated at 95°C for 5 min. A total of 1 µl of this preparation was used for PCR.
16S rRNA gene clone library-based studies. Table S1 in the supplemental
material describes the study source for all clones. Specific details on patient
populations, sampling protocols, and sequencing methods used in published
studies are given in the references listed in this table. In brief, 16S rRNA gene
clone libraries were created from and analyzed in unpublished studies and the
following published studies: treponemes in a subject with severe destructive
periodontitis (10); treponemes from several subjects with periodontitis and acute
necrotizing ulcerative gingivitis (ANUG) (18); subgingival plaque from healthy
subjects and subjects with periodontitis, HIV periodontitis, and acute necrotizing
ulcerative gingivitis (ANUG) (47); dental plaque from children with caries (7);
endodontic lesions (45); subjects with advanced noma lesions (49); HIV-positive subjects with
necrotizing ulcerative periodontitis (1, 50); dorsum
tongue microbiota in subjects with halitosis (36); dental caries in adults (44);
normal biota of healthy subjects at subgingival, supragingival, dorsal tongue,
ventral tongue, hard palate, vestibule, and tonsil sites (2); periodontitis in adults
(15); aggressive periodontitis (22); caries-active and caries-free twins (14); root
caries in elderly subjects (52); ventilator-associated pneumonia (5).
DNA purification from clinical samples. Dental plaque from teeth or subgingival periodontal pockets was collected using sterile Gracey curettes. Plaque
from the curette was transferred into 100 µl of TE buffer (50 mM Tris-HCl, pH
7.6; 1 mM EDTA). Bacteria on soft tissues were sampled using nylon swabs. The
material from the swab was dispersed into 150 µl of TE buffer. DNA extraction
was performed using the UltraClean microbial DNA isolation kit (Mo Bio
Laboratories, Carlsbad, CA) by following the manufacturer's instructions for the
isolation of genomic DNA from Gram-positive bacteria.
16S rRNA gene amplification. Purified DNA samples were generally amplified
with universal primers F24/Y36 to construct broad-coverage libraries. The sequences of primers are given in Table S2 in the supplemental material. Additional libraries seeking expanded coverage of Bacteroidetes/TM7/SR1 groups or
Spirochaetes/Synergistetes groups were amplified with F24/F01 or F24/M98 selective primers, respectively. PCR was performed in thin-walled tubes using a
PerkinElmer 9700 Thermo Cycler. The reaction mixture (50 µl, final volume)
contained 1 µl of the purified DNA template, 20 pmol of each primer, 40 nmol
of deoxynucleoside triphosphates (dNTPs), 2.5 U of Platinum Taq polymerase
(Invitrogen, Carlsbad, CA), and 5 µl of 10× PCR buffer (200 mM Tris-HCl, pH 8.4;
500 mM KCl). A hot-start protocol was used in which samples were preheated at
94°C for 4 min, followed by amplification using the following conditions: denaturation at 94°C for 45 s, annealing at 60°C for 45 s, and elongation at 72°C for
2 min, with an additional 1 s for each cycle. Thirty cycles were performed,
followed by a final elongation step at 72°C for 15 min. Amplicon size and amount
were examined by electrophoresis in a 1% agarose gel stained with SYBR Safe
DNA gel stain (Invitrogen, Carlsbad, CA) and visualized under UV light. After
verification that a strong amplicon of the correct size was produced, a preparative
gel was run, and the full-length amplicon band was cut out and DNA purified
using a Qiagen gel extraction kit (Qiagen, Valencia, CA).
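The reaction recipe and cycling program above imply final concentrations and a total programmed time that are easy to check by arithmetic. The sketch below is only such a check. The function names are illustrative, and it assumes that the first cycle extends for 2 min and each later cycle adds 1 s, which is one reading of "an additional 1 s for each cycle". Ramp times are ignored, and the text does not say whether the 40 nmol of dNTPs is a combined amount or a per-nucleotide amount, so only the stated 40 nmol is converted.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """pmol per microliter is numerically equal to micromol per liter (uM)."""
    return amount_pmol / volume_ul


def diluted(stock_conc: float, stock_ul: float, final_ul: float) -> float:
    """Concentration after diluting `stock_ul` of stock into `final_ul` total."""
    return stock_conc * stock_ul / final_ul


def programmed_hold_seconds(cycles: int = 30, denature_s: int = 45,
                            anneal_s: int = 45, extend_s: int = 120,
                            extend_increment_s: int = 1,
                            initial_s: int = 240, final_s: int = 900) -> int:
    """Sum of programmed hold times, excluding temperature ramps.
    Assumption: cycle 1 extends for `extend_s`; each later cycle adds
    `extend_increment_s` to the previous extension time."""
    cycling = sum(denature_s + anneal_s + extend_s + extend_increment_s * i
                  for i in range(cycles))
    return initial_s + cycling + final_s


print(micromolar(20, 50))        # each primer, uM
print(micromolar(40_000, 50))    # stated 40 nmol of dNTPs, uM
print(diluted(200, 5, 50))       # Tris-HCl in the reaction, mM
print(diluted(500, 5, 50))       # KCl in the reaction, mM
print(programmed_hold_seconds() / 60)  # minutes of programmed holds
```

Under these assumptions the 10× buffer is diluted to 1× (20 mM Tris-HCl, 50 mM KCl) and the program holds for a little over two hours before ramp time is added.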
Cloning procedures. Size-purified 16S rRNA gene amplicons were cloned
using a TOPO TA cloning kit (Invitrogen, Carlsbad, CA) by following the
manufacturer's instructions. Transformation was performed using competent
Escherichia coli TOP10 cells provided by the manufacturer. Transformed cells
were plated onto Luria-Bertani agar plates supplemented with kanamycin (50
µg/ml) and incubated overnight at 37°C.
Library screening. Approximately 90 colonies were picked for each library and
were placed into tubes containing 40 µl of 10 mM Tris-HCl, pH 8.0. A total of
1 µl of the cell suspension was used directly as the template for PCR with
Invitrogen vector M13 (−21) forward and M13 reverse primers. Electrophoresis
on a 1% agarose gel was used to verify the correct amplicon size. PCR product
for preliminary sequencing with primer Y31 (positions 519 to 533, reverse) was
treated with exonuclease and shrimp alkaline phosphatase to remove primers
and dNTPs. Five microliters of PCR product was combined with 0.4 µl exonuclease I (10 U/µl; USB Corporation, Cleveland, OH) and 0.4 µl shrimp alkaline
phosphatase (1 U/µl; USB Corporation, Cleveland, OH). The reaction mixture
was incubated at 37°C for 15 min and then deactivated at 85°C for 15 min. The
PCR products from clones chosen for full sequencing with eight additional
primers were further concentrated and purified using QIAquick PCR purification kits (Qiagen, Valencia, CA).
16S rRNA gene sequencing. Purified DNA was sequenced using an ABI Prism
cycle sequencing kit (BigDye Terminator cycle sequencing kit) on an ABI 3100
genetic analyzer (Applied Biosystems, Foster City, CA). The sequencing primers
(see Table S2 in the supplemental material) were used in quarter-dye reactions
by following the manufacturer's instructions.
DEWHIRST ET AL.
ing clone sequence was compared by BLASTN to the reference sequence(s) and,
if matched, added to that taxon folder. If the BLASTN match failed, the clone
sequence was used to establish a new taxon folder and added to the reference
sequence list. The scripts for these analyses can be obtained from T. Chen.
Extended human oral taxon numbers (A01 to H70) were assigned for each novel
cluster/folder.
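The clustering procedure described in this section (sort the clones by length, seed the first taxon folder with the first clone, then compare each succeeding clone with the accumulated reference sequences and either file it in a matching folder or open a new one) is a greedy, order-dependent scheme. The sketch below illustrates that control flow only. It is not the authors' script: the BLASTN comparison is replaced by a toy ungapped identity function (`toy_match`, hypothetical), the 95% coverage criterion is not modeled because it requires a real aligner, and sorting longest-first is an assumption, since the text says only that clones were sorted by length.

```python
from typing import Callable, Dict, List


def toy_match(query: str, reference: str, min_identity: float = 0.98) -> bool:
    """Toy stand-in for a BLASTN match: ungapped identity from the first base.
    Coverage is NOT modeled here."""
    n = min(len(query), len(reference))
    same = sum(q == r for q, r in zip(query, reference))
    return n > 0 and same / n >= min_identity


def greedy_cluster(clones: Dict[str, str],
                   matches: Callable[[str, str], bool] = toy_match
                   ) -> List[List[str]]:
    """Return taxon folders (lists of clone ids) built greedily."""
    # Assumption: longest clones first, so references are full-length where possible.
    ordered = sorted(clones.items(), key=lambda kv: len(kv[1]), reverse=True)
    references: List[str] = []      # one reference sequence per folder
    folders: List[List[str]] = []   # clone ids filed under each reference
    for clone_id, seq in ordered:
        for idx, ref in enumerate(references):
            if matches(seq, ref):
                folders[idx].append(clone_id)  # matched: add to that folder
                break
        else:
            references.append(seq)             # no match: new taxon folder
            folders.append([clone_id])
    return folders


# Hypothetical toy sequences, for illustration only.
a = "ACGT" * 25
toy_clones = {
    "clone_a": a,
    "clone_b": "T" + a[1:],          # one mismatch in 100 bases
    "clone_c": "TTGCA" * 20,         # unrelated sequence
    "clone_d": ("TTGCA" * 20)[:80],  # shorter copy of clone_c
}
print(greedy_cluster(toy_clones))
```

Because each clone is filed under the first reference it matches, the result can depend on input order; sorting by length before clustering makes that order reproducible.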
Nucleotide sequence accession numbers. The 16S rRNA gene sequences for
the 34,753 clones analyzed are available for download at the Human Oral
Microbiome Database website (http://www.homd.org) and from GenBank under
accession numbers GU397556 to GU432434. GenBank accession numbers for
each taxon in the seven phylogenetic tree figures are included with each taxon
label. Additional full-length 16S rRNA gene sequences deposited for this work
include GenBank accession numbers FJ577249 to FJ577261, FJ717335,
FJ717336, FJ7173350, GQ131410 to GQ131418, and GU470887 to GU470911.
16S rRNA data analysis. Sequence information determined using primer Y31
(positions 519 to 533, reverse) allows preliminary identification of clones. Clones
or strains whose sequences appeared novel (differing by more than 7 bases from
previously identified oral reference sequences in the first 500 bp) were fully
sequenced on both strands (approximately 1,540 bases) using 6 to 8 additional
sequencing primers (see Table S2 in the supplemental material). Sequences were
assembled from the ABI electropherogram files using Sequencher (Gene Codes
Corporation, Ann Arbor, MI).
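The preliminary screen described above treats a clone as apparently novel when its partial sequence differs by more than 7 bases from every known oral reference within the first 500 bp. A minimal sketch of that rule follows, assuming the sequences are already aligned so that equal indices correspond to equal positions. The function names and the toy sequences are illustrative and not part of the original workflow.

```python
from typing import Iterable


def mismatches_in_window(query: str, reference: str, window: int = 500) -> int:
    """Count differing bases over the first `window` aligned positions."""
    return sum(q != r for q, r in zip(query[:window], reference[:window]))


def appears_novel(query: str, references: Iterable[str],
                  max_differences: int = 7, window: int = 500) -> bool:
    """True if the query differs by MORE than `max_differences` bases
    from every reference within the window."""
    return all(mismatches_in_window(query, ref, window) > max_differences
               for ref in references)


def mutate_prefix(seq: str, n: int) -> str:
    """Toy helper: change each of the first n bases to a different base."""
    return "".join(("C" if base == "A" else "A") if i < n else base
                   for i, base in enumerate(seq))


reference = "ACGT" * 125                               # hypothetical 500-base reference
print(appears_novel(mutate_prefix(reference, 7), [reference]))  # exactly 7 differences
print(appears_novel(mutate_prefix(reference, 8), [reference]))  # 8 differences
```

Seven differences in 500 bases is 1.4%, so the screen sends a clone for full sequencing once it is slightly more divergent than that from all known references.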
Aligned 16S rRNA database. All full-length human oral 16S rRNA gene
sequences which we believed represented novel taxa and those of named human
oral species available in GenBank were entered into a new Aligned Reference
Sequence Database. More than 100 nonoral sequences were also entered to link
oral phylogenetic clusters to named taxa. The basic program set for data entry,
editing and sequence alignment, secondary structure comparison, similarity matrix generation, and phylogenetic tree construction was written by F. E. Dewhirst
in Microsoft QuickBasic and has been previously described (48) (the program is
available from F. E. Dewhirst). Trees for this work were made by exporting
aligned sequences from our database into MEGA version 4 (60). Similarity
matrices were corrected for multiple base changes at single positions by the
method of Jukes and Cantor (33). Similarity matrices were constructed from
the aligned sequences by using only those sequence positions for which 95%
of the strains had data. Phylogenetic trees were constructed using the neighbor-joining method of Saitou and Nei (56). Bootstrapping was performed
using 1,000 resamplings.
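Two of the steps above lend themselves to a short illustration: restricting the comparison to alignment positions for which 95% of the strains have data, and applying the Jukes and Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing positions. The sketch below is not the QuickBasic program or MEGA. The function names are illustrative, and treating "-", "N", and "?" as missing data is an assumption.

```python
import math
from typing import List, Sequence

MISSING = "-N?"  # assumption: characters treated as "no data"


def usable_positions(alignment: Sequence[str], min_fraction: float = 0.95) -> List[int]:
    """Indices of alignment columns where at least `min_fraction`
    of the sequences have data."""
    n_seq = len(alignment)
    keep = []
    for i in range(len(alignment[0])):
        with_data = sum(seq[i] not in MISSING for seq in alignment)
        if with_data >= min_fraction * n_seq:
            keep.append(i)
    return keep


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed difference p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_distance(seq_a: str, seq_b: str, positions: Sequence[int]) -> float:
    """Corrected distance over the chosen columns, skipping any column
    where either sequence lacks data."""
    compared = differing = 0
    for i in positions:
        a, b = seq_a[i], seq_b[i]
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return jukes_cantor(differing / compared)


toy_alignment = ["ACGT-", "ACGTA", "AC-TA"]  # hypothetical aligned sequences
cols = usable_positions(toy_alignment)
print(cols, jc_distance(toy_alignment[0], toy_alignment[1], cols))
print(round(jukes_cantor(0.015), 6))  # 1.5% observed difference
```

For closely related sequences the correction is small: an observed difference of 1.5% corresponds to a corrected distance only slightly above 1.5%, and the correction grows as sequences diverge.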
Creation of the human oral microbiome 16S rRNA gene reference set. The
sequences for named species, isolates, and clones were obtained primarily by
sequencing efforts in our laboratories or from GenBank. The list of named oral
organisms was compiled from the literature and relied heavily on literature
reports from investigators at the Forsyth Institute (20, 21, 59, 61, 62) and from
Lillian Holdeman Moore and W. E. C. Moore (41, 42, 43), formerly at the
Anaerobe Laboratory at the Virginia Polytechnic Institute. To the initial list of
oral microorganisms, we added exogenous pathogens, such as Corynebacterium
diphtheriae, Bordetella pertussis, Treponema pallidum, Neisseria gonorrhoeae, and
several other species which are causative agents of oral lesions and diseases. For
the 16S rRNA gene sequences of strains or clones that did not match the named
species, we created novel 16S rRNA gene-based phylotypes. We define a phylotype as a cluster of full-length 16S rRNA gene sequences that have greater than
98.5% similarity to one another (23 base mismatches per 1,540 bases) and have
less than 98.5% similarity to neighboring taxa (species or phylotypes). Each
species and phylotype was assigned a human oral taxon (HOT) number, starting
at 001. Prior to assigning the HOT numbers, all provisional sequences were
compared, and those with sequences having greater than 98.5% similarity were
merged into single taxa, except for validly named species, which retained individual HOT numbers regardless of rRNA gene sequence similarity. Sequences
were checked for the possibility of being chimeric using multiple methods.
Neighbor-joining trees were generated using the first 600 bases and compared
with trees using the last 900 bases. Taxa that changed position in the two trees
were further examined with the Chimera Check program at the Ribosomal
Database Project (11) and with Mallard (3). The first and last 100 bases of all
clone sequences described below were analyzed by BLASTN analysis against the
HOMD Reference Set. The distance between end matches was captured from a
distance matrix file for all full-length HOMD reference sequences. All sequences whose ends differed by more than 10% were rejected as chimeric. The
script for this program is available from us. The HOMD reference sequences and
clone sequences were rescreened using Chimera Slayer (courtesy of Brian Haas,
the Broad Institute [http://sourceforge.net]).
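The end-match screen described above can be summarized as: find the best reference match for the first 100 bases and for the last 100 bases of a clone, look up the distance between those two references in a precomputed matrix, and reject the clone if that distance exceeds 10%. The sketch below illustrates this logic only and is not the authors' script. The `best_reference` callable stands in for a BLASTN search, and the toy references, toy distances, and exact-substring lookup are hypothetical.

```python
from typing import Callable, Dict, FrozenSet


def is_chimeric(sequence: str,
                best_reference: Callable[[str], str],
                reference_distance: Dict[FrozenSet[str], float],
                end_length: int = 100,
                max_distance: float = 0.10) -> bool:
    """Flag a clone whose two ends best match references that are
    more than `max_distance` apart."""
    head_ref = best_reference(sequence[:end_length])
    tail_ref = best_reference(sequence[-end_length:])
    if head_ref == tail_ref:
        return False  # both ends point at the same reference
    return reference_distance[frozenset((head_ref, tail_ref))] > max_distance


# Hypothetical toy data; real use would query a reference set with BLASTN.
REFERENCES = {
    "ref_A": "AAAACCCCAAAA",
    "ref_B": "GGGGTTTTGGGG",
    "ref_C": "AAAACCCCAAAT",
}
DISTANCES = {
    frozenset(("ref_A", "ref_B")): 0.25,
    frozenset(("ref_A", "ref_C")): 0.02,
    frozenset(("ref_B", "ref_C")): 0.25,
}


def toy_best_reference(fragment: str) -> str:
    """Toy stand-in for BLASTN: first reference containing the fragment."""
    for ref_id, ref_seq in REFERENCES.items():
        if fragment in ref_seq:
            return ref_id
    raise LookupError("no reference contains this fragment")


for seq in ("AAAACCCCAAAA", "AAAACCCCTTTTGGGG", "AAAACCCCAAAT"):
    print(seq, is_chimeric(seq, toy_best_reference, DISTANCES, end_length=4))
```

Looking up the distance between the two matched references, rather than requiring both ends to hit the same reference, keeps clones whose ends match two closely related taxa from being rejected.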
Analysis of 16S rRNA gene sequences obtained from clone studies. Clone
sequences were subject to BLASTN analysis against the HOMD Reference
Sequence Set (version 10). Because the first 500 bases of the 16S rRNA molecule
generally contain almost half the variability of the full sequence, a match cutoff
of 98% similarity with 95% coverage was used as the identification criterion.
Those sequences that did not match a human oral taxon sequence were subject
to BLASTN analysis against all sequences at the Ribosomal Database Project
(RDP; release 10, update 3) (11) and Greengenes (16). Because many clone
queries can match the same RDP or Greengenes subject match, a set of unique
reference matches was generated. Because this set could contain multiple entries
representing a single phylotype, the external database match sequences were
clustered as described below. Unique human oral taxon numbers were assigned
to each unique phylotype. The clones that did not meet the match criteria to the
HOMD, RDP, or Greengenes were clustered into novel taxa defined by the 98%
identity with 95% coverage criteria. Clustering was performed by first sorting the
clones by length. The first clone was considered a novel taxon and placed in a first
taxon folder, and its sequence was declared a reference sequence. Each succeed-
J. BACTERIOL.
Taxa^a              Total         Named         Unnamed cultivated   Unnamed uncultivated
                                  species^b     taxa^c               taxa^d
Bacteria
  Firmicutes        227 (36.7)    120 (52.9)    45 (19.8)            62 (27.3)
  Bacteroidetes     107 (17.3)    39 (36.4)     27 (25.2)            41 (38.3)
  Proteobacteria    106 (17.1)    70 (66.0)     9 (8.5)              27 (25.5)
  Actinobacteria    72 (11.6)     37 (51.4)     25 (34.7)            10 (13.9)
  Spirochaetes      49 (7.9)      11 (22.4)     3 (6.1)              35 (71.4)
  Fusobacteria      32 (5.2)      12 (37.5)     4 (12.5)             16 (50.0)
  TM7               12 (1.9)      0 (0.0)       0 (0.0)              12 (100.0)
  Synergistetes     10 (1.6)      2 (20.0)      0 (0.0)              8 (80.0)
  Chlamydiae        1 (0.2)       1 (100.0)     0 (0.0)              0 (0.0)
  Chloroflexi       1 (0.2)       0 (0.0)       0 (0.0)              1 (100.0)
  SR1               1 (0.2)       0 (0.0)       0 (0.0)              1 (100.0)
Archaea
  Euryarchaeota     1 (0.2)       1 (100.0)     0 (0.0)              0 (0.0)
Total               619 (100)     293 (47.3)    113 (18.3)           213 (34.4)

a Taxa refer to named species and to phylotypes with or without cultivable members. Phylotypes are defined as clusters whose members have 98.5% full 16S rRNA
sequence similarity. The data in this table are based on those in HOMD version 10.
b Named species are those with validly published names.
c Unnamed cultivable taxa are phylotypes that have at least one extant isolate.
d Uncultivated taxa are phylotypes known only from clone sequences.
34,879 that were identified as this taxon, and a symbol indicating that the taxon is a named species (F), an unnamed cultivated taxon (f), or an
uncultivated phylotype (). The tree was constructed with MEGA 4.3 using the Jukes and Cantor correction neighbor-joining distance matrix.
Comparisons with missing data were eliminated pairwise. The numbers to the left of the branches indicate the percentage of times the clade was
recovered out of 1,000 bootstrap resamplings. Only bootstrap percentages greater than 50 are shown. Roman numerals in square brackets following
a genus name indicate Collins Clostridia cluster numbers (17). Major clades are marked as follows: encircled 1, Bacilli; encircled 2, Firmicutes;
encircled 3, Erysipelotrichia; and encircled 4, Tenericutes (previously known as the class Mollicutes within Firmicutes).
previously misclassified as belonging to either the Deferribacteres or Firmicutes (28). Oral strains and clones fall into two
main groups, with one which is readily cultivable and includes
the named species Jonquetella anthropi (34) and Pyramidobacter piscolens (19). Most oral sequences fall into the
second group (oral taxon 363 through 359), which until recently had no cultivable representatives. However, Vartoukian
et al. (67) have successfully cultured a member of this group by
FIG. 3. Neighbor-joining tree for human oral taxa in the Veillonellaceae (previously Acidaminococcaceae) family of the class Clostridia of the
phylum Firmicutes. Labeling and methods used are as described in Fig. 1.
FIG. 7. Neighbor-joining tree for human oral taxa in the phyla Spirochaetes, Chlamydiae, Chloroflexi, Synergistetes, TM7, and SR1. Labeling and
methods used are as described in Fig. 1. The phyla are labeled as follows: encircled 1, Spirochaetes; encircled 2, Chlamydiae; encircled 3,
Chloroflexi; encircled 4, Synergistetes; encircled 5, TM7; and encircled 6, SR1.
TABLE 2. Phylogenetic distribution of 34,753 oral clones

                                 % of taxa that represent:                           % of clones that represent:
Phylum             No. of taxa   Named    Unnamed   Uncultivated  No. of clones      Named    Unnamed   Uncultivated
                   (%)           species  isolates  phylotypes    (%)                species  isolates  phylotypes
Firmicutes         494 (41.9)    20.6     8.7       70.6          23,354 (67.212)    78.7     7.3       14.1
Proteobacteria     237 (20.1)    33.3     3.0       63.7          3,909 (11.250)     72.7     3.8       23.6
Bacteroidetes      153 (13.0)    19.0     16.3      64.7          2,571 (7.403)      58.6     24.1      17.3
Actinobacteria     133 (11.3)    35.3     14.3      50.4          2,641 (7.572)      55.7     20.3      24.0
Spirochaetes       73 (6.2)      9.6      2.7       87.7          577 (1.671)        33.8     4.3       61.9
Fusobacteria       42 (3.6)      23.8     7.1       69.0          1,356 (3.893)      76.6     4.0       19.4
Synergistetes      18 (1.5)      11.1     0.0       88.9          209 (0.605)        12.4     0.0       87.6
TM7                15 (1.2)      0.0      0.0       100.0         85 (0.244)         0.0      0.0       100.0
Tenericutes        4 (0.3)       75.0     0.0       25.0          10 (0.028)         90.0     0.0       10.0
Deinococcus        3 (0.2)       0.0      0.0       100.0         13 (0.037)         0.0      0.0       100.0
SR1                1 (0.1)       0.0      0.0       100.0         7 (0.020)          0.0      0.0       100.0
Chloroflexi        1 (0.1)       0.0      0.0       100.0         1 (0.003)          0.0      0.0       100.0
Acidobacteria      1 (0.1)       0.0      0.0       100.0         1 (0.003)          0.0      0.0       100.0
Cyanobacteria      1 (0.1)       0.0      0.0       100.0         1 (0.003)          0.0      0.0       100.0
Plant chloroplast  4 (0.3)       100.0    0.0       0.0           19 (0.054)         100.0    0.0       0.0
Total              1,179 (100)   24.0     8.4       67.7          34,753 (100.000)   73.3     8.8       17.8

FIG. 8. Rank abundance graph for 34,753 16S rRNA clones obtained from oral samples in 1,179 taxa. Clones were placed in taxa on
the basis of 98% BLASTN identities. The first ranked taxon was
Veillonella parvula, with 2,304 clones. Ranks 769 to 1,179 were singletons.
Zakhari, J. Read, B. Watson, and M. Guyer. 2009. The NIH Human Microbiome Project. Genome Res. 19:2317–2323.
52. Preza, D., I. Olsen, J. A. Aas, T. Willumsen, B. Grinde, and B. J. Paster. 2008. Bacterial profiles of root caries in elderly patients. J. Clin. Microbiol. 46:2015–2021.
53. Rheims, H., F. A. Rainey, and E. Stackebrandt. 1996. A molecular approach to search for diversity among bacteria in the environment. J. Ind. Microbiol. Biotechnol. 17:159–169.
54. Rosebury, T. 1962. Microorganisms indigenous to man. McGraw-Hill Book Company, Inc., New York, NY.
55. Ruoff, K. L. 1991. Nutritionally variant streptococci. Clin. Microbiol. Rev. 4:184–190.
56. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
57. Senol, E. 2004. Stenotrophomonas maltophilia: the significance and role as a nosocomial pathogen. J. Hosp. Infect. 57:1–7.
58. Seymour, G. J., P. J. Ford, M. P. Cullinan, S. Leishman, and K. Yamazaki. 2007. Relationship between periodontal infections and systemic disease. Clin. Microbiol. Infect. 13(Suppl. 4):3–10.
59. Socransky, S. S., and A. D. Haffajee. 1994. Evidence of bacterial etiology: a historical perspective. Periodontol. 2000 5:7–25.
60. Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24:1596–1599.
61. Tanner, A., M. F. Maiden, P. J. Macuch, L. L. Murray, and R. L. Kent, Jr. 1998. Microbiota of health, gingivitis, and initial periodontitis. J. Clin. Periodontol. 25:85–98.
62. Tanner, A. C., C. Haffer, G. T. Bratthall, R. A. Visconti, and S. S. Socransky.