Resource

Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets
Graphical Abstract

Authors
Evan Z. Macosko, Anindita Basu, ...,
Aviv Regev, Steven A. McCarroll

Correspondence
emacosko@genetics.med.harvard.edu
(E.Z.M.),
mccarroll@genetics.med.harvard.edu
(S.A.M.)

In Brief
Capturing single cells along with sets of
uniquely barcoded primer beads together
in tiny droplets enables large-scale,
highly parallel single-cell transcriptomics.
Applying this analysis to cells in mouse
retinal tissue revealed transcriptionally
distinct cell populations along with
molecular markers of each type.

Highlights
- Drop-seq enables highly parallel analysis of individual cells by RNA-seq
- Drop-seq encapsulates cells in nanoliter droplets together with DNA-barcoded beads
- Systematic evaluation of Drop-seq library quality using species mixing experiments
- Drop-seq analysis of 44,808 cells identifies 39 cell populations in the retina

Macosko et al., 2015, Cell 161, 1202-1214
May 21, 2015 ©2015 Elsevier Inc.
http://dx.doi.org/10.1016/j.cell.2015.05.002

Accession Numbers
GSE63473

Resource
Highly Parallel Genome-wide Expression Profiling
of Individual Cells Using Nanoliter Droplets
Evan Z. Macosko,1,2,3,* Anindita Basu,4,5 Rahul Satija,4,6,7 James Nemesh,1,2,3 Karthik Shekhar,4 Melissa Goldman,1,2
Itay Tirosh,4 Allison R. Bialas,8 Nolan Kamitaki,1,2,3 Emily M. Martersteck,9 John J. Trombetta,4 David A. Weitz,5,10
Joshua R. Sanes,9 Alex K. Shalek,4,11,12 Aviv Regev,4,13,14 and Steven A. McCarroll1,2,3,*
1Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
2Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
4Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
5School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
6New York Genome Center, New York, NY 10013, USA
7Department of Biology, New York University, New York, NY 10003, USA
8The Program in Cellular and Molecular Medicine, Children's Hospital Boston, Boston, MA 02115, USA
9Department of Molecular and Cellular Biology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
10Department of Physics, Harvard University, Cambridge, MA 02138, USA
11Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA 02139, USA
12Institute for Medical Engineering and Science and Department of Chemistry, MIT, Cambridge, MA 02139, USA
13Department of Biology, MIT, Cambridge, MA 02139, USA
14Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
*Correspondence: emacosko@genetics.med.harvard.edu (E.Z.M.), mccarroll@genetics.med.harvard.edu (S.A.M.)
http://dx.doi.org/10.1016/j.cell.2015.05.002

SUMMARY

Cells, the basic units of biological structure and function, vary broadly in type and state. Single-cell genomics can characterize cell identity and function, but limitations of ease and scale have prevented its broad application. Here we describe Drop-seq, a strategy for quickly profiling thousands of individual cells by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell's RNAs, and sequencing them all together. Drop-seq analyzes mRNA transcripts from thousands of individual cells simultaneously while remembering transcripts' cell of origin. We analyzed transcriptomes from 44,808 mouse retinal cells and identified 39 transcriptionally distinct cell populations, creating a molecular atlas of gene expression for known retinal cell classes and novel candidate cell subtypes. Drop-seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution.
INTRODUCTION
Individual cells are the building blocks of tissues, organs, and organisms. Each tissue contains cells of many types, and cells of
each type can switch among biological states. In most biological
systems, our knowledge of cellular diversity is incomplete; for
example, the cell-type complexity of the brain is unknown and
widely debated (Luo et al., 2008; Petilla Interneuron Nomenclature Group, et al., 2008). To understand how complex tissues
work, it will be important to learn the functional capacities and responses of each cell type.
A major determinant of each cell's function is its transcriptional
program. Recent advances now enable mRNA-seq analysis of
individual cells (Tang et al., 2009). However, methods of preparing cells for profiling have been applicable in practice to just hundreds (Hashimshony et al., 2012; Picelli et al., 2013) or (with automation) a few thousand cells (Jaitin et al., 2014), typically after
first separating the cells by flow sorting (Shalek et al., 2013) or
microfluidics (Shalek et al., 2014) and then amplifying each cell's
transcriptome separately. Fast, scalable approaches are needed
to characterize complex tissues with many cell types and states,
under diverse conditions and perturbations.
Here, we describe Drop-seq, a method to analyze mRNA
expression in thousands of individual cells by encapsulating
cells in tiny droplets for parallel analysis. Droplets, nanoliter-scale aqueous compartments formed by precisely combining
aqueous and oil flows in a microfluidic device (Thorsen et al.,
2001; Umbanhowar et al., 2000), have been used as tiny reaction chambers for PCR (Hindson et al., 2011; Vogelstein and
Kinzler, 1999) and reverse transcription (Beer et al., 2008). We
sought here to use droplets to compartmentalize cells into nanoliter-sized reaction chambers for analysis of all of their RNAs. A
basic challenge of using droplets for transcriptomics is to retain
a molecular memory of the identity of the cell from which each
mRNA transcript was isolated. To accomplish this, we developed
a molecular barcoding strategy to remember the cell-of-origin of
each mRNA. We critically evaluated Drop-seq, then used it to
profile cell states along the cell cycle. We then applied it to a complex neural tissue, mouse retina, and from 44,808 cell profiles
identified 39 distinct populations, each corresponding to one or
a group of closely related cell types. Our results demonstrate
how large-scale single-cell analysis can help deepen our understanding of the biology of complex tissues and cell populations.

[Figure 1 graphic. Panel A: complex tissue, cell isolation, cell suspension, suspension in droplets with barcoded primer beads (microparticles), STAMPs (single-cell transcriptomes attached to microparticles), and an RNA-seq library with 10,000 single-cell transcriptomes. Panel B: primer layout of PCR handle, cell barcode, UMI, and oligo-dT (TTT(T27)). Panel C: synthesis of the cell barcode (12 bases) over synthesis rounds 1 to 12, with the number of unique barcodes in the pool growing from 4 to 16 to 16,777,216. Panel D: synthesis of the UMI (8 bases) in 8 rounds, giving millions of copies of the same cell barcode and 4^8 different molecular barcodes (UMIs) per bead.]

Figure 1. Molecular Barcoding of Cellular Transcriptomes in Droplets
(A) Drop-Seq barcoding schematic. A complex tissue is dissociated into individual cells, which are then encapsulated in droplets together with microparticles
(gray circles) that deliver barcoded primers. Each cell is lysed within a droplet; its mRNAs bind to the primers on its companion microparticle. The mRNAs are
reverse-transcribed into cDNAs, generating a set of beads called "single-cell transcriptomes attached to microparticles" (STAMPs). The barcoded STAMPs can
then be amplified in pools for high-throughput mRNA-seq to analyze any desired number of individual cells.
(B) Sequence of primers on the microparticle. The primers on all beads contain a common sequence ("PCR handle") to enable PCR amplification after STAMP
formation. Each microparticle contains more than 10^8 individual primers that share the same "cell barcode" (C) but have different unique molecular identifiers
(UMIs), enabling mRNA transcripts to be digitally counted (D). A 30-bp oligo dT sequence is present at the end of all primer sequences for capture of mRNAs.
(C) Split-and-pool synthesis of the cell barcode. To generate the cell barcode, the pool of microparticles is repeatedly split into four equally sized oligonucleotide
synthesis reactions, to which one of the four DNA bases is added, and then pooled together after each cycle, in a total of 12 split-pool cycles. The barcode
synthesized on any individual bead reflects that bead's unique path through the series of synthesis reactions. The result is a pool of microparticles, each
possessing one of 4^12 (16,777,216) possible sequences on its entire complement of primers (see also Figure S1).
(D) Synthesis of a unique molecular identifier (UMI). Following the completion of the split-and-pool synthesis cycles, all microparticles are together subjected to
eight rounds of degenerate synthesis with all four DNA bases available during each cycle, such that each individual primer receives one of 4^8 (65,536) possible
sequences (UMIs).

RESULTS
Drop-seq consists of the following steps (Figure 1A): (1) prepare
a single-cell suspension from a tissue; (2) co-encapsulate each
cell with a distinctly barcoded microparticle (bead) in a nanoliter-scale droplet; (3) lyse cells after they have been isolated in
droplets; (4) capture a cell's mRNAs on its companion microparticle, forming STAMPs (single-cell transcriptomes attached to
microparticles); (5) reverse-transcribe, amplify, and sequence
thousands of STAMPs in one reaction; and (6) use the STAMP
barcodes to infer each transcript's cell of origin.
A Split-Pool Synthesis Approach to Generate Large
Numbers of Distinctly Barcoded Beads
To deliver large numbers of distinctly barcoded primer molecules into individual droplets, we use microparticles (beads).
We synthesized oligonucleotide primers directly on beads
(from 5' to 3', yielding free 3' ends available for enzymatic priming). Each oligonucleotide is composed of four parts (Figure 1B):
(1) a constant sequence (identical on all primers and beads) for
use as a priming site for downstream PCR and sequencing; (2)
a cell barcode (identical across all the primers on the surface
of any one bead, but different from the cell barcodes on other
beads); (3) a Unique Molecular Identifier (UMI) (different on
each primer, to identify PCR duplicates) (Kivioja et al., 2012);
and (4) an oligo-dT sequence for capturing polyadenylated
mRNAs and priming reverse transcription.

To efficiently generate massive numbers of beads, each with a distinct barcode, we developed a split-and-pool DNA synthesis strategy (Figure 1C). A pool of millions of microparticles is
divided into four equally sized groups; a different DNA base
(A, G, C, or T) is then added to each. All microparticles are
then re-pooled, mixed, and re-split at random into another four
groups, and then a different DNA base (A, G, C, or T) is added
to each of the four new groups. After 12 cycles of split-and-pool DNA synthesis, the primers on any given microparticle
possess the same one of 4^12 = 16,777,216 possible 12-bp barcodes, but different microparticles have different sequences
(Figure 1C). The entire microparticle pool then undergoes eight
rounds of degenerate oligonucleotide synthesis to generate
the UMI on each oligo (Figure 1D); finally, an oligo-dT sequence
(T30) is synthesized on the 3' end of all oligos on all beads.
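The split-and-pool logic can be illustrated with a short simulation. This is a minimal sketch, not the synthesis protocol itself: the function names, bead counts, and random seeds are hypothetical, and each bead's assignment to one of the four reactions is drawn independently, which only approximates the "four equally sized groups" described above.

```python
import random

BASES = "AGCT"

def split_pool_barcodes(n_beads, n_cycles=12, seed=0):
    """Simulate split-and-pool synthesis: each bead gains one base per cycle."""
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for _ in range(n_cycles):
        # After pooling, every bead is re-assigned at random to one of the
        # four base-specific reactions (independent draws, so the groups
        # are only approximately equal in size).
        for i in range(n_beads):
            barcodes[i] += rng.choice(BASES)
    return barcodes

def degenerate_umis(n_primers, umi_length=8, seed=1):
    """Degenerate synthesis: each primer independently receives a random UMI."""
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(umi_length))
            for _ in range(n_primers)]

# Sizes of the two sequence spaces named in the text.
barcode_space = 4 ** 12   # 16,777,216 possible cell barcodes
umi_space = 4 ** 8        # 65,536 possible UMIs

beads = split_pool_barcodes(n_beads=1000)   # hypothetical pool size
umis = degenerate_umis(n_primers=50)        # hypothetical primer count
```

Every primer on a simulated bead would share that bead's 12-base barcode, whereas the UMIs differ from primer to primer, mirroring the distinction between Figures 1C and 1D.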
To confirm that we could distinguish RNAs based on attached
barcodes, we reverse-transcribed a pool of synthetic RNAs onto
11 microparticles and sequenced the resulting cDNAs (Figure S1A and Supplemental Experimental Procedures); 11 microparticle barcodes each constituted 3.5%-14% of the resulting
sequencing reads, whereas the next-most-abundant 12-mer
constituted only 0.06% (Figure S1A). These results suggested
that the microparticle-of-origin for most cDNAs can be recognized by sequencing. We also found that each bead contained
more than 10^8 barcoded primer sites and that the sequence
complexity of the barcodes approached theoretical limits (Figures S1B and S1C, Supplemental Experimental Procedures).

Microfluidics Device for Co-encapsulating Cells with Beads
We designed a microfluidic co-flow device (Utada et al., 2007)
to co-encapsulate cells with barcoded microparticles (Figures
2A and S2 and Data S1). This device quickly co-flows two
aqueous solutions across an oil channel to form more than
100,000 nanoliter-sized droplets per minute. One flow contains
the barcoded microparticles suspended in a lysis buffer; the
other flow contains a cell suspension (Figure 2A, left, and 2B).
The number of droplets created greatly exceeds the number of
beads or cells injected, so that a droplet will generally contain
zero or one cells, and zero or one beads. Millions of nanoliter-sized droplets are generated per hour, of which thousands
contain both a bead and a cell (Movie S1). STAMPs are produced
in the subset of droplets that contain both a bead and a cell.
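Why dilute loading keeps most occupied droplets at a single cell can be sketched with a simple occupancy model. The sketch below assumes, as an illustration only, that cells and beads enter droplets independently with Poisson-distributed counts; this model and the mean occupancies used are assumptions, not values reported in the text.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k arrivals for a Poisson mean of lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def occupancy(lam_cells, lam_beads):
    """Droplet occupancy under independent Poisson loading (an assumption)."""
    p_cell = 1 - poisson_pmf(0, lam_cells)    # at least one cell
    p_bead = 1 - poisson_pmf(0, lam_beads)    # at least one bead
    p_multi = p_cell - poisson_pmf(1, lam_cells)  # two or more cells
    return {
        # Fraction of all droplets that can yield a STAMP.
        "both": p_cell * p_bead,
        # Among droplets holding a cell, the fraction holding more than one.
        "multi_cell_given_cell": p_multi / p_cell,
    }

# Hypothetical mean occupancies per droplet.
dilute = occupancy(lam_cells=0.05, lam_beads=0.10)
dense = occupancy(lam_cells=0.20, lam_beads=0.10)
```

Under this model, raising the mean number of cells per droplet increases the share of cell-containing droplets that hold more than one cell, which is the trade-off between throughput and doublets.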
Sequencing and Analysis of Many STAMPs in a Single
Reaction
To efficiently process thousands of STAMPs at once, we break
droplets, collect the mRNA-bound microparticles, and reverse-transcribe the mRNAs (from the microparticle-attached primers)
together in one reaction, forming covalent, stable STAMPs (Figure 2A, step 7, and Experimental Procedures). A scientist can
then select any desired number of STAMPs for the preparation
of 3'-end digital expression libraries (Figure 2C, Experimental
Procedures). We sequence the resulting molecules from each
end (Figure 2C) using high-capacity parallel sequencing. We
digitally count the number of mRNA transcripts of each gene ascertained in each cell, using the UMIs to avoid double-counting
sequence reads that arose from the same mRNA transcript. We
thereby create a matrix of digital gene-expression measurements (one measurement per gene per cell) for further analysis
(Figure 2D, Experimental Procedures).
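The digital counting step can be sketched as follows. This is a simplified illustration that assumes reads have already been aligned and annotated as (cell barcode, UMI, gene) tuples; the function name and the toy reads are hypothetical, and the sketch omits the error handling described in the Supplemental Experimental Procedures.

```python
from collections import defaultdict

def digital_expression(reads):
    """Count distinct UMIs per (gene, cell barcode) pair.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        # Reads sharing cell, gene, and UMI are treated as copies of one
        # captured transcript, so a set keeps each UMI only once.
        umis[(gene, cell)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

# Toy input: made-up barcodes and UMIs for a single gene.
reads = [
    ("AAACGGTTCAGT", "ACGTACGT", "Rho"),
    ("AAACGGTTCAGT", "ACGTACGT", "Rho"),  # same UMI again: counted once
    ("AAACGGTTCAGT", "TTGCAAGT", "Rho"),
    ("GGTACCATTGCA", "ACGTACGT", "Rho"),  # same UMI, different cell
]
matrix = digital_expression(reads)
```

Each entry of `matrix` corresponds to one cell of the gene-by-cell matrix: the integer number of transcripts detected from that gene in that cell.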
The Single-Cell Accuracy and Sensitivity of Drop-Seq
Libraries
To measure the accuracy with which Drop-seq remembers the
cell-of-origin of each mRNA, we analyzed mixtures of cultured
human (HEK) and mouse (3T3) cells, scoring the numbers of human and mouse transcripts that associated with each cell barcode (Figures 3A, 3B, and S3A). We found that the individual
STAMPs created by Drop-seq were highly organism-specific
(Figures 3A and 3B), indicating high single-cell integrity of the libraries. At saturating levels of sequence coverage, we detected
an average of 44,295 mRNA transcripts from 6,722 genes in HEK
cells and 26,044 transcripts from 5,663 genes in 3T3 cells (Figures 3C and 3D).
To understand how Drop-seq libraries compare to other
single-cell methods, we used three quality metrics: (1) the frequency of cell-cell doublets; (2) single-cell purity; and (3) transcript capture rates.
Cell Doublets
One potential mode of failure in any single-cell method involves
cells that stick together or happen to otherwise be co-isolated for
library preparation. In Drop-seq, across four conditions spanning
12.5 cells/µl to 100 cells/µl, the fraction of species-mixed
STAMPs correlated with cell concentration (Figures 3A, 3B,
and S3B; Experimental Procedures), with cell doublet estimates
ranging from 0.36% to 11.3% for the various cell concentrations tested (under the assumption that human-mouse doublets account for half of all doublets). This reflects the greater chance
at higher cell concentrations that a droplet could encapsulate
multiple cells. By comparison, previous studies that used
FACS (Jaitin et al., 2014) or a commercial microfluidics platform
(Shalek et al., 2014) to isolate single cells reported doublet rates
of 2.3% and 11%, respectively, based upon examining microscopy images of captured cells. In analyzing the above mouse-human cell suspension mixture in a commercial microfluidics
system (Fluidigm C1), we found that 30% of the resulting libraries
in that experiment were species-mixed (Figure S3C); about one-third of these doublets were visible in the microscopy images.
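The correction from species-mixed libraries to total doublets can be written out explicitly. The sketch below assumes that the two cells in a doublet are drawn independently from the suspension, so that with a human fraction p the share of doublets that are human-mouse is 2p(1-p); for an equal mixture this is one half, matching the assumption stated above. The function name and the example input are illustrative.

```python
def doublet_rate(mixed_fraction, human_fraction=0.5):
    """Estimate the total doublet rate from the species-mixed fraction.

    Assumes the two cells of a doublet are sampled independently, so only
    2 * p * (1 - p) of all doublets are visible as human-mouse mixtures.
    """
    mixed_share = 2 * human_fraction * (1 - human_fraction)
    return mixed_fraction / mixed_share

# Example: a species-mixed fraction of 1.9% in an equal human-mouse mixture.
estimate = doublet_rate(0.019)
```

Same-species doublets are invisible in this design, which is why the observed mixed fraction must be scaled up rather than reported directly.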
Single-Cell Impurity
Species-mixing experiments enabled us to measure single-cell
purity across thousands of libraries prepared at different cell
concentrations. We found that purity was strongly related to
cell concentration, ranging from 98.8% at 12.5 cells/µl to
90.4% at 100 cells/µl (Figure S3B). The largest source of single-cell impurity appeared to be ambient RNA that is present in
the cell suspension (a first step of almost all single-cell methods)
and presumably results from cells that are damaged during preparation (Figure S3D). We measured a mean single-cell purity
of 95.8% for the same cell mixtures in the Fluidigm C1 system
(Figure S3C), similar to Drop-seq at 50 cells/µl.
Conversion Efficiency
The use of synthetic RNA spike-in controls at known concentrations, together with UMIs to avoid double-counting, allows
estimation of capture rates for digital single-cell expression technologies (Brennecke et al., 2013; Islam et al., 2014). We identified
evidence that PCR and sequencing errors inflate the numbers of
apparently unique UMIs (Table S1 and Supplemental Experimental Procedures), so we developed a more conservative estimation method than has been used in earlier studies (Islam et al.,
2014); in our approach, we collapse similar UMI sequences into a
single count. Using this approach we calculated a capture rate of
12.8% for Drop-seq (Figure 3G). We corroborated this estimate
by making independent digital expression measurements (on
bulk RNA from 50,000 HEK cells) on ten genes using droplet digital PCR (ddPCR) (Hindson et al., 2011), calculating an average
conversion efficiency of 10.7% (Figures S4A, S4B, and S4C).
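One way to collapse similar UMI sequences into a single count is sketched below. The specific rule used here (keep UMIs in order of decreasing read support and drop any UMI within a Hamming distance of 1 of one already kept) is an assumption chosen for illustration; the procedure actually applied is described in the Supplemental Experimental Procedures and may differ.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def collapse_umis(umi_reads, max_dist=1):
    """Greedily merge UMIs that differ by at most max_dist bases.

    umi_reads: list of UMI strings, one per read, for one gene in one cell.
    Returns the UMIs retained as distinct molecules.
    """
    counts = Counter(umi_reads)
    kept = []
    # Visit UMIs from most to least supported; ties are broken alphabetically.
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming(umi, k) <= max_dist for k in kept):
            kept.append(umi)
    return kept

# Toy input: the single-read UMI differs from the 5-read UMI at one base.
observed = ["ACGTACGT"] * 5 + ["ACGTACGA"] + ["TTTTCCCC"] * 2
molecules = collapse_umis(observed)
```

Collapsing in this way lowers the apparent molecule count, which is why it yields a more conservative capture-rate estimate than counting every distinct UMI.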
To further evaluate how the digital transcriptomes ascertained
by Drop-seq related to the underlying mRNA content of cells,
we compared Drop-seq log-expression measurements to those
made by a commonly used in-solution amplification process,
finding strong correlation (r = 0.94, Figure 3E), though Drop-seq ascertained GC-rich transcripts at a lower rate (Figure S4D).
We also compared Drop-seq single-cell log-expression measurements with measurements from bulk mRNA-seq, observing
a correlation of r = 0.90 (Figures 3F, S4E, and S4F).
Cell States: Drop-Seq Analysis of the Cell Cycle
To evaluate the visibility of cell states in Drop-seq, we first examined cell-to-cell variation among the 589 HEK and 412 3T3
STAMPs shown in Figure 3B. Both cultures consisted of asynchronously dividing cells; principal components analysis (PCA)
of the single-cell expression profiles showed the top principal
components to be dominated by genes with roles in protein

Figure 2. Extraction and Processing of Single-Cell Transcriptomes by Drop-Seq
(A) Schematic of single-cell mRNA-seq library preparation with Drop-seq. A custom-designed microfluidic device joins two aqueous flows before their
compartmentalization into discrete droplets. One flow contains cells, and the other flow contains barcoded primer beads suspended in a lysis buffer. Immediately
following droplet formation, the cell is lysed and releases its mRNAs, which then hybridize to the primers on the microparticle surface. The droplets are broken by
adding a reagent to destabilize the oil-water interface (Experimental Procedures), and the microparticles are collected and washed. The mRNAs are then reverse-transcribed in bulk, forming STAMPs, and template switching is used to introduce a PCR handle downstream of the synthesized cDNA (Zhu et al., 2001).
(B) Microfluidic device used in Drop-seq. Beads (brown in image), suspended in a lysis agent, enter the device from the central channel; cells enter from the top
and bottom. Laminar flow prevents mixing of the two aqueous inputs prior to droplet formation (see also Movie S1). Schematics of the device design and how it is
operated can be found in Figure S2.
(C) Molecular elements of a Drop-seq sequencing library. The first read yields the cell barcode and UMI. The second, paired read interrogates sequence from the
cDNA (50 bp is typically sequenced); this sequence is then aligned to the genome to determine a transcript's gene of origin.
(D) In silico reconstruction of thousands of single-cell transcriptomes. Millions of paired-end reads are generated from a Drop-seq library on a high-throughput
sequencer. The reads are first aligned to a reference genome to identify the gene-of-origin of the cDNA. Next, reads are organized by their cell barcodes,
and individual UMIs are counted for each gene in each cell (Supplemental Experimental Procedures). The result, shown at far right, is a digital expression matrix
in which each column corresponds to a cell, each row corresponds to a gene, and each entry is the integer number of transcripts detected from that gene, in
that cell.

Figure 3. Critical Evaluation of Drop-Seq Using Species-Mixing Experiments
(A and B) Drop-seq analysis of mixtures of mouse and human cells. Mixtures of human (HEK) and mouse (3T3) cells were analyzed by Drop-seq at the concentrations shown. The scatter plot shows the number of human and mouse transcripts associating to each STAMP. Blue dots indicate STAMPs that were
designated from these data as human-specific (average of 99% human transcripts); red dots indicate STAMPs that were mouse-specific (average 99%). At the
lower cell concentration, one STAMP barcode (of 570) associated with a mixture of human and mouse transcripts (A, purple). At the higher cell concentration,
about 1.9% of STAMP barcodes associated with mouse-human mixtures (B). Data for other cell concentrations and a different single-cell analysis platform are in
Figures S3B and S3C.
(C and D) Sensitivity analysis of Drop-seq at high read-depth. Violin plots show the distribution of the number of transcripts (C, scored by UMIs) and genes (D)
detected per cell for 54 HEK (human) STAMPs (blue) and 28 3T3 (mouse) STAMPs (green) that were sequenced to a mean read depth of 737,240 high-quality
aligned reads per cell.
(E and F) Correlation between gene expression measurements in Drop-seq and non-single-cell RNA-seq methods. Comparison of Drop-seq gene expression
measurements (averaged across 550 STAMPs) to measurements from bulk RNA analyzed by: (E) an in-solution template switch amplification (TSA) procedure
similar to Smart-seq2 (Picelli et al., 2013) (Supplemental Experimental Procedures); and (F) Illumina TruSeq mRNA-seq. All comparisons involve RNA derived from
the same cell culture flask (3T3 cells). All expression counts were converted to average transcripts per million (ATPM) and plotted as log (1+ATPM).
(G) Quantitation of Drop-seq capture efficiency by ERCC spike-ins. Drop-seq was performed with ERCC control synthetic RNA at an estimated concentration of
100,000 ERCC RNA molecules per droplet. 84 beads were sequenced at a mean depth of 2.4 million reads, aligned to the ERCC reference sequences, and UMIs
counted for each ERCC species, after applying a stringent down-correction for potential sequencing errors (Table S1 and Supplemental Experimental Procedures). For each ERCC RNA species above an average concentration of one molecule per droplet, the predicted number of molecules per droplet was plotted
in log space (x-axis), versus the actual number of molecules detected per droplet by Drop-seq, also in log space (y-axis). Error bars indicate SD. The intercept of a
regression line, constrained to have a slope of 1 and fitted to the seven highest points, was used to estimate a conversion factor (0.128). A second estimation,
using the average number of detected transcripts divided by the number of ERCC molecules used (100,000), yielded a conversion factor of 0.125.
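The slope-constrained fit in (G) has a simple closed form: for a line y = x + b in log space, the least-squares intercept b is the mean of (y - x), and the conversion factor is 10^b. The sketch below illustrates this with made-up values; the function name and the data are hypothetical, and the choice of which points to include (the caption uses the seven highest) is left to the caller.

```python
import math

def conversion_factor(expected, detected):
    """Fit log10(detected) = log10(expected) + b with slope fixed at 1.

    With the slope constrained to 1, least squares gives
    b = mean(log10(detected) - log10(expected)); the factor is 10 ** b.
    """
    diffs = [math.log10(d) - math.log10(e)
             for e, d in zip(expected, detected)]
    intercept = sum(diffs) / len(diffs)
    return 10 ** intercept

# Made-up molecule counts per droplet (expected) and detected UMI counts.
expected = [10.0, 100.0, 1000.0]
detected = [1.28, 12.8, 128.0]
factor = conversion_factor(expected, detected)
```

Because the fit is done in log space, the result is the geometric mean of the per-species detected-to-expected ratios rather than their arithmetic mean.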

synthesis, growth, DNA replication, and other aspects of the cell cycle. We inferred the cell-cycle phase of each of the 1,001 cells
by scoring for gene sets (signatures) reflecting five phases of the
cell cycle previously characterized in chemically synchronized
cells (G1/S, S, G2/M, M, and M/G1) (Figure 4A, Table S2) (Whitfield et al., 2002). We identified 544 human and 668 mouse genes
with expression patterns that varied along the cell cycle (at a
false discovery rate of 5%; Experimental Procedures) (Figure 4B),
including 200 orthologous gene pairs (p < 10^-65 by hypergeometric test). Of these orthologous gene pairs, most (82.5%)

[Figure 4 graphic. Panel A: phase-specific score per cell, with rows including G1/S, G2/M, and M/G1, for individual human cells (HEK) and individual mouse cells (3T3). Panel B: average normalized expression by gene cluster (numbered 1 to 8). Panel C, classic cell cycle genes: CCNB1, CCNB2, MCM2, MCM3, MCM4, MCM5, MCM6, MCM7, MCM10, AURKA, AURKB. Panel C, novel, conserved cell cycle genes: ATF4, ARHGAP11A, ARPC2, CDCA4, E2F7, HISTH1E, MCMBP, NCAPG, NXT1, OTUB1, PARPBP, RPL26, SNHG3, SRP9, TCF19, WDHD1, ZFHX4.]

have been previously annotated as related to the cell cycle in at least one species; among the other 17.5%, we found some that
would be expected to show cell-cycle variation (e.g., E2F7 and
PARPBP) and many that to our knowledge were not previously
connected to the cell cycle (Figure 4C and Table S2). Single-cell analysis at this scale enabled characterization of cell-cycle
gene expression without chemical synchronization and at high
temporal resolution.
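Signature-based phase scoring can be sketched in a few lines. This is a deliberately simplified illustration: it scores each phase as the mean expression of its signature genes and picks the highest-scoring phase, whereas the scoring actually used is defined in the Supplemental Experimental Procedures. The two toy signatures below are illustrative stand-ins, not the Table S2 gene sets.

```python
def phase_scores(cell_expr, signatures):
    """Mean expression of each phase's signature genes in one cell.

    cell_expr: dict mapping gene name to a normalized expression value.
    signatures: dict mapping phase name to a list of signature genes.
    Genes absent from cell_expr are treated as unexpressed (0.0).
    """
    scores = {}
    for phase, genes in signatures.items():
        values = [cell_expr.get(gene, 0.0) for gene in genes]
        scores[phase] = sum(values) / len(values)
    return scores

def assign_phase(cell_expr, signatures):
    """Return the phase whose signature scores highest in this cell."""
    scores = phase_scores(cell_expr, signatures)
    return max(scores, key=scores.get)

# Toy signatures and a toy cell (hypothetical values).
toy_signatures = {
    "G1/S": ["MCM2", "MCM5"],
    "G2/M": ["CCNB1", "AURKA"],
}
toy_cell = {"MCM2": 3.0, "MCM5": 2.0, "CCNB1": 0.5}
toy_phase = assign_phase(toy_cell, toy_signatures)
```

Ordering cells by such scores, rather than only assigning a single label, is what allows expression to be followed continuously around the cycle.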
Cell Types: Drop-Seq Analysis of the Retina
We selected the retina as the first tissue to study with Drop-seq
because decades of work have generated molecular information
about many retinal cell types (Masland, 2012; Sanes and Zipursky, 2010), allowing us to relate our RNA-seq data to prior classification. The retina contains five neuronal classes (retinal ganglion, bipolar, horizontal, photoreceptor, and amacrine), each
defined by morphological, physiological, and molecular criteria
(Figure 5A). Most of the classes are divisible into discrete
types (a total currently estimated at about 100), but well under
half of these types possess known, distinguishing molecular
markers.
We sequenced 49,300 STAMPs prepared from the retinas of
14-day-old mice (STAMPs were collected in seven batches
over 4 days). We performed principal components analysis on
the 13,155 largest libraries (Figure S5, Table S3), then reduced
the 32 statistically significant PCs (Experimental Procedures)
to two dimensions using t-Distributed Stochastic Neighbor


Figure 4. Cell-Cycle Analysis of HEK and 3T3 Cells Analyzed by Drop-Seq
(A) Cell-cycle state of 589 HEK cells (left) and 412
3T3 cells (right) measured by Drop-seq. Cells were
assessed for their progression through the cell
cycle by comparison of each cell's global pattern
of gene expression with gene sets known to
be enriched in one of five phases of the cycle
(horizontal rows). A phase-specific score was
calculated for each cell across these five phases
(Supplemental Experimental Procedures), and the
cells ordered by their phase scores.
(B) Discovery of cell-cycle regulated genes. Heat
map showing the average normalized expression
of 544 human and 668 mouse genes found to be
regulated by the cell cycle. Maximal and minimal
expression was calculated for each gene across a
sliding window of the ordered cells, and compared
with shuffled cells to obtain a false discovery
rate (FDR) (Experimental Procedures). The plotted
genes (FDR threshold of 5%) were then clustered
by k-means analysis to identify sets of genes with
similar expression patterns. Cluster boundaries
are represented by dashed gray lines.
(C) Representative cell-cycle regulated genes
discovered by Drop-seq. Selected genes that
were found to be cell-cycle regulated in both the
HEK and 3T3 cell sets. Left: genes that are
well-known to be cell-cycle regulated. Right: some
genes identified in this analysis that were not
previously known to be associated with the cell
cycle (Experimental Procedures). A complete
list of cell-cycle regulated genes can be found in
Table S2.

Embedding (tSNE) (Amir et al., 2013; van der Maaten and Hinton,
2008). We projected the remaining 36,145 cells in the data into
the tSNE analysis. We then combined a density clustering
approach with post hoc differential expression analysis to divide
44,808 cells among 39 transcriptionally distinct clusters (Supplemental Experimental Procedures) ranging from 50 to 29,400
cells in size (Figures 5B and 5C). Finally, we organized the 39
cell populations into larger categories (classes) by building a
dendrogram of similarity relationships among the 39 cell populations (Figure 5D, left).
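The similarity relationships behind such a dendrogram start from per-cluster average expression and pairwise Euclidean distances. The sketch below shows only that first step on toy data; the function names and the three tiny "clusters" are hypothetical, and building the full tree (and the upstream PCA, tSNE, and density clustering) is not attempted here.

```python
import math

def cluster_means(profiles, labels):
    """Average the expression vectors of the cells assigned to each cluster."""
    sums, counts = {}, {}
    for vec, lab in zip(profiles, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair(means):
    """Return (distance, label_a, label_b) for the two most similar clusters."""
    labs = sorted(means)
    pairs = [(euclidean(means[a], means[b]), a, b)
             for i, a in enumerate(labs) for b in labs[i + 1:]]
    return min(pairs)

# Toy data: four cells, two genes, three cluster labels.
profiles = [[1.0, 0.0], [3.0, 0.0], [2.0, 1.0], [10.0, 10.0]]
labels = ["c1", "c1", "c2", "c3"]
means = cluster_means(profiles, labels)
nearest = closest_pair(means)
```

An agglomerative tree would repeatedly merge the closest pair and recompute distances; here only the first merge candidate is identified.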
The cell populations inferred from this analysis were readily
matched to the known retinal cell types, including all five
neuronal cell classes, based on the specific expression of known
markers for these cell types (Figure 5D, right, and Figure S6A).
Additional clusters corresponded to astrocytes (associated
with retinal ganglion cell axons exiting the retina), resident microglia, endothelial cells (from intra-retinal vasculature), pericytes,
and fibroblasts (Figure 5D). The relative abundances of the
major cell classes in our data agreed with earlier estimates
from microscopy (Jeon et al., 1998) (Table 1).
Replication and Cumulative Power of Drop-Seq Data
Replication across experimental sessions enables the construction of cumulatively powerful datasets, but only if data are replicable and comparable. The retinal STAMPs were generated on 4
different days (weeks apart), utilizing different litters and multiple
runs in several sessions, for a total of seven replicates. One of the

Figure 5. Ab Initio Reconstruction of Retinal Cell Types from 44,808 Single-Cell Transcription Profiles Prepared by Drop-Seq
(A) Schematic representation of major cell classes in the retina. Photoreceptors (rods or cones) detect light and pass information to bipolar cells, which in turn
contact retinal ganglion cells that extend axons into other CNS tissues. Amacrine, bipolar and horizontal cells are retinal interneurons; Muller glia act as support
cells for surrounding neurons.
(B) Clustering of 44,808 Drop-seq single-cell expression profiles into 39 retinal cell populations. The plot shows a two-dimensional representation (tSNE) of global
gene expression relationships among 44,808 cells; clusters are colored by cell class, according to Figure 5A.
(C) Differentially expressed genes across 39 retinal cell populations. In this heat map, rows correspond to individual genes found to be selectively upregulated in
individual clusters (p < 0.01, Bonferroni corrected); columns are individual cells, ordered by cluster (1-39). Clusters with > 1,000 cells were downsampled to 1,000
cells to prevent them from dominating the plot.
(D) Gene expression similarity relationships among 39 inferred cell populations. Average expression across all detected genes was calculated for each of 39 cell
clusters, and the relative (Euclidean) distances between gene-expression patterns for the 39 clusters are represented by a dendrogram. The branches of the
dendrogram were annotated by examining the differential expression of known markers for retina cell classes and types. Twelve examples are shown at right,
using violin plots to represent the distribution of expression within the clusters. Violin plots for additional genes are in Figure S6A.

Table 1. Ascertainment of Cell Types and Frequencies in the Mouse Retina by Drop-Seq

Cell Class                 Percentage of Retina         Percentage of Cell Population
                           (Jeon et al., 1998) (%)      in Drop-Seq (%)
Rod photoreceptors         79.9                         65.6
Cone photoreceptors        2.1                          4.2
Muller glia                2.8                          3.6
Retinal ganglion cells     0.5                          1.0
Horizontal cells           0.5                          0.6
Amacrine cells             7.0                          9.9
Bipolar cells              7.3                          14.0
Microglia                  -                            0.2
Retinal endothelial cells  -                            0.6
Astrocytes                 -                            0.1

The sizes of the 39 annotated cell clusters produced from Drop-seq were
used to estimate their fractions of the total cell population. These data
were compared with those obtained by microscopy techniques (Jeon
et al., 1998).
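
The comparison in Table 1 can be summarized with simple arithmetic. The following is a minimal sketch in Python using only the percentages listed in the table; the dictionary layout, the function name, and the ratio summary are illustrative additions and were not part of the original analysis.

```python
# Percentages from Table 1: (microscopy, Jeon et al., 1998; Drop-seq).
# Only classes with a value in both columns are included.
TABLE_1 = {
    "Rod photoreceptors": (79.9, 65.6),
    "Cone photoreceptors": (2.1, 4.2),
    "Muller glia": (2.8, 3.6),
    "Retinal ganglion cells": (0.5, 1.0),
    "Horizontal cells": (0.5, 0.6),
    "Amacrine cells": (7.0, 9.9),
    "Bipolar cells": (7.3, 14.0),
}


def dropseq_to_microscopy_ratio(table):
    """Return the Drop-seq percentage divided by the microscopy percentage."""
    return {
        name: dropseq / microscopy
        for name, (microscopy, dropseq) in table.items()
    }


ratios = dropseq_to_microscopy_ratio(TABLE_1)
# Print classes from most under-represented to most over-represented.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:24s} {ratio:.2f}x")
```

With these numbers, rods are the only class whose Drop-seq share falls below the microscopy estimate; every other class listed in both columns has a ratio above 1.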

runs was performed at a particularly low cell concentration (15 cells/µl) and thus high purity, to evaluate whether results were artifacts of cell-cell doublets or single-cell impurity. We found that
all 39 clusters contained cells from every experiment. One cluster (arrow in Figure 5E; star in Figure S6B), which drew disproportionately from two replicates, expressed markers of fibroblasts, a
non-retinal cell type that is present in tissue surrounding the
retina, and hence likely represents imprecise dissection.
We examined how the classification of cells (based on their
patterns of gene expression) evolved as a function of the
numbers of cells in analysis. We used 500, 2,000, or 9,731 cells
from our dataset, and asked how (for example) cells identified as
amacrines in the full dataset clustered in analyses of smaller
numbers of cells (Figure 5F). As the number of cells in the data
increased, distinctions between related clusters became clearer,
stronger, and finer in resolution, with the result that a greater
number of rare amacrine cell sub-populations (each representing 0.1%–0.9% of the cells in the experiment) could ultimately
be distinguished from one another (Figure 5F).
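
One way to see why dataset size matters for rare populations is to count how many cells of a given type each downsampled dataset is expected to contain. The sketch below is illustrative only: it assumes cells are drawn uniformly at random, so that a population's fraction is preserved in expectation, and the function name is hypothetical.

```python
# Expected number of cells from a rare population in each downsampled dataset,
# assuming uniform random sampling of cells (an illustrative assumption).
DATASET_SIZES = (500, 2_000, 9_731)  # cell counts used in Figure 5F
RARE_FRACTIONS = (0.001, 0.009)      # 0.1% and 0.9% of cells


def expected_cells(n_cells, fraction):
    """Expected count of cells belonging to a population of the given fraction."""
    return n_cells * fraction


for n in DATASET_SIZES:
    low, high = (expected_cells(n, f) for f in RARE_FRACTIONS)
    print(f"{n:>5} cells: {low:.1f} to {high:.1f} cells per rare population")
```

Under this assumption, a population at 0.1% abundance contributes less than one cell on average to a 500-cell dataset, which is consistent with smaller analyses failing to separate such sub-populations.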
Profiles of Amacrine Cell Types
To characterize distinctions among closely related cell populations, we focused on the 21 clusters of amacrines. Amacrines
are the most morphologically diverse neuronal class (Masland,
2012), but the majority of types lack defining molecular markers.
Most amacrine cells are inhibitory, utilizing either GABA or
glycine as a neurotransmitter. Excitatory amacrine cells that
release glutamate have also been identified (Haverkamp and
Wassle, 2004). Another amacrine cell population expresses no
GABAergic, glycinergic or glutamatergic markers; its neurotransmitter is unidentified (nGnG amacrines) (Kay et al., 2011).
We first identified markers that were most universally expressed by amacrines relative to other cell classes (Figure 6A).
We then assessed the expression of known glycinergic and
GABAergic markers; their mutually exclusive expression is a
fundamental distinction among amacrines. Of the 21 amacrine
clusters, 12 were identifiable as GABAergic (Gad1 and/or
Gad2-positive) and 5 others were glycinergic (glycine transporter
Slc6a9-positive) (Figure 6B). An additional cell population was
identified as excitatory by its expression of a glutamate transporter, Slc17a8 (Figure 6B). The remaining three clusters (clusters
4, 20, and 21) had low levels of GABAergic, glycinergic, and glutamatergic markers; these likely include nGnG amacrines.
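
The marker logic described above can be written as a small rule-based classifier. This is a sketch under stated assumptions, not the procedure used in the paper: the expression threshold, the input format (mean expression per gene for one cluster), and the example values are all hypothetical. Only the marker genes themselves come from the text.

```python
# Assign a neurotransmitter class to an amacrine cluster from marker expression.
# Marker genes are those named in the text; the threshold is hypothetical.
MARKERS = {
    "GABAergic": ("Gad1", "Gad2"),
    "glycinergic": ("Slc6a9",),
    "glutamatergic": ("Slc17a8",),
}


def classify_cluster(mean_expression, threshold=1.0):
    """Return the first class with a marker at or above the threshold.

    Clusters with no marker reaching the threshold are reported as
    'nGnG candidate' rather than as a confirmed type.
    """
    for label, genes in MARKERS.items():
        if any(mean_expression.get(gene, 0.0) >= threshold for gene in genes):
            return label
    return "nGnG candidate"


# Hypothetical mean expression values for one cluster.
example = {"Gad1": 2.4, "Gad2": 0.1, "Slc6a9": 0.0, "Slc17a8": 0.0}
print(classify_cluster(example))
```

Because the classes are checked in a fixed order, a cluster expressing markers of two classes would be assigned to the first one; real data with such overlap would need a more careful rule.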
Among the glycinergic and GABAergic clusters, we found
many amacrine types with known markers. The most divergent
glycinergic cluster appeared to correspond to the A-II amacrine
neurons (Figure 6B, cluster 16), as this was the only cluster to
strongly express the Gjd2 gene encoding the gap junction protein connexin 36 (Feigenspan et al., 2001). Ebf3, a transcription
factor found in SEG glycinergic as well as nGnG amacrines,
was specific to clusters 17 and 20. Starburst amacrine neurons
(SACs), the only retinal cells that use acetylcholine as a co-transmitter, were identifiable as cluster 3 by their expression of the
cholinergic marker Chat (Figure 6B). Unlike other GABAergic
cells, SACs expressed Gad1 but not Gad2, as previously
observed in rabbit (Famiglietti and Sundquist, 2010).
We then identified selectively expressed markers for each of the
21 amacrine cell populations (Figure 6C and Table S4). We validated two of the markers immunohistochemically. First, we costained retinal sections with antibodies to the transcription factor
MAF, the top marker of cluster 7, plus antibodies to either GAD1 or
SLC6A9, markers of GABAergic and glycinergic transmission,
respectively. As predicted by the Drop-seq analysis, MAF was
found in a small subset of amacrine cells that were GABAergic
and not glycinergic (Figure 6D). Cluster 7 had numerous genes
that were enriched relative to its nearest neighbor, cluster 6 (Figure 6E, 16 genes > 2.8-fold enrichment, p < 10⁻⁹), including
Crybb3, which belongs to the crystallin family of proteins that
are known to be directly upregulated by Maf (Yang and Cvekl,
2005), and another, the protease Mmp9, which accepts crystallins
as substrates (Descamps et al., 2005). Second, we stained sections with antibodies to PPP1R17 (Figure 6F), a nominated marker
of cluster 20. Cluster 20 shows weak, infrequent glycine transporter expression and is one of only two clusters (with cluster
21) that express Neurod6, a marker of nGnG neurons (Kay et al.,
2011). We used a transgenic strain (MitoP) that has been shown
to express CFP specifically in nGnG amacrines (Kay et al.,
2011). PPP1R17 stained 85% of all CFP-positive amacrines in

(E) Representation of experimental replicates in each cell population. tSNE plot from Figure 2B, with each cell now colored by experimental replicate (for visual
clarity, the central rod cluster was downsampled to 10,000 cells). Each of the seven replicates contributes to all 39 cell populations. Cluster 36 (arrow), in which
these replicates are unevenly represented, expressed markers of fibroblasts, which are not native to the retina and are presumably a dissection artifact (see also
Figure S6B).
(F) Trajectory of amacrine clustering as a function of number of cells analyzed. Three different downsampled datasets were generated: (1) 500, (2) 2,000, or (3)
9,731 cells (Supplemental Experimental Procedures). Cells identified as amacrines (clusters 3–23) in the full analysis are here colored by their cluster identities in
that analysis. Analyses of smaller numbers of cells incompletely distinguished these subpopulations from one another.


Figure 6. Finer-Scale Expression Distinctions among Amacrine Cells, Cones, and Retinal Ganglion Cells
(A) Pan-amacrine markers. The expression levels of the six genes identified (Nrxn2, Atp1b1, Pax6, Slc32a1, Slc6a1, Elavl3) are represented as dot plots across all
39 clusters; larger dots indicate broader expression within the cluster; deeper red denotes a higher expression level.


the MitoP line, validating this as a marker of nGnG cells (Figure 6F).
PPP1R17 was one of several markers that distinguished Cluster
20 from its closest neighbor, Cluster 21 (Figure 6G; 12 genes >
2.8-fold enrichment, p < 10⁻⁹). The differences between Clusters
20 and 21 suggest a hitherto unsuspected level of heterogeneity
among nGnG amacrines.
Supervised Analysis Reveals Additional Diversity
Our unsupervised analysis grouped cells into 39 transcriptionally distinct populations, but morphological and functional
criteria suggest that there are ~100 retinal cell types. We asked
whether supervised analysis could reveal multiple types within
individual clusters. For example, retinal ganglion cells (RGCs),
which consist of about 30 types (Sanes and Masland, 2015),
formed a single cluster in our analysis, perhaps because it is
a rare cell population (1%, Table 1). Five RGC types, called
intrinsically photosensitive RGCs (ipRGCs), express Opn4, the
gene encoding the photopigment melanopsin. Opn4+ RGCs
(26/432) expressed nine genes at levels 2-fold higher than
Opn4- RGCs (p < 10⁻⁹, Figure 6H), including Tbr2/Eomes,
known to be a selective marker for this population (Sweeney
et al., 2014). This result reveals additional heterogeneity that
may also emerge ab initio as analyses expand to include
more cells.
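
The supervised comparison above amounts to splitting one cluster by a marker and looking for genes enriched in the marker-positive cells. The sketch below illustrates the fold-enrichment step only. The cell counts (26 of 432) are from the text; the expression values, the gene name GeneX, the pseudocount, and the function name are hypothetical, and significance testing is left out.

```python
# Fraction of RGCs that are Opn4-positive, from the counts given in the text.
opn4_positive, total_rgcs = 26, 432
print(f"Opn4+ fraction of RGCs: {opn4_positive / total_rgcs:.1%}")


def enriched_genes(mean_pos, mean_neg, min_fold=2.0, pseudocount=1.0):
    """Genes whose mean expression in the positive group exceeds that of the
    negative group by more than min_fold.

    The pseudocount (a hypothetical choice) avoids division by zero.
    """
    hits = []
    for gene, pos_value in mean_pos.items():
        neg_value = mean_neg.get(gene, 0.0)
        if (pos_value + pseudocount) / (neg_value + pseudocount) > min_fold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical mean expression values, for illustration only.
pos = {"Opn4": 9.0, "Eomes": 5.0, "GeneX": 4.0}
neg = {"Opn4": 0.0, "Eomes": 1.0, "GeneX": 4.2}
print(enriched_genes(pos, neg))
```

In practice a fold-change filter like this would be paired with a statistical test, as the p-value threshold reported in the text indicates.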
DISCUSSION
Ascertaining transcriptional variation across individual cells is a
valuable way of learning about complex tissues and functional
responses, but single-cell analysis has been limited by the time
and cost of preparing libraries from many individual cells. A scientist employing Drop-seq can prepare 10,000 single-cell libraries for sequencing in 12 hr, for about 6.5 cents per cell (Table
S5), representing a >100-fold improvement in both time and cost
relative to existing methods. A Drop-seq setup can be constructed quickly and inexpensively in a standard biology lab using readily available equipment (Figure S2B and Supplemental
Experimental Procedures). We hope that ease, speed, and low
cost facilitate exuberant experimentation, careful replication,
and many cycles of experiments, analyses, ideas, and more
experiments.

In validating Drop-seq, we developed stringent species-mixing experiments to measure single-cell purity and cell doublet
rates in our libraries. In another article in this issue, Klein et al.
(2015) describe a droplet-based approach to single-cell RNA-seq and also use species-mixing experiments to
evaluate it. Our results indicate that all methods of isolating
single cells from a cell suspension, including Drop-seq, fluorescence activated cell sorting (FACS) and microfluidics, are vulnerable to impurities, and highlight the value of performing species
mixing experiments to assess single-cell approaches. In our
retina analysis, even relatively impure libraries generated in
ultra-high-throughput modes (100 cells per µl, allowing the
processing of 10,000 cells per hour at ~10% doublet and impurity rates) appeared to yield a robust and biologically validated
cell classification, but other tissues or applications may require
using Drop-seq in purer modes.
Unsupervised computational analysis of Drop-seq data
identified 39 transcriptionally distinct retinal cell populations,
many representing specific subtypes of the major retinal cell
classes (Figures 5 and 6). It is a particular strength of the
retina that establishing correspondence between cluster and
type was in many cases straightforward; an important direction will be to identify cell types and states in other parts of
the brain, as well as in other tissues, about which less is
currently known.
We see many applications of Drop-seq, beyond the identification of cell types and cell states. Genome-scale genetic studies
are identifying many genes whose variation contributes to disease
risk, but biology has lacked similarly high-throughput ways of
connecting these genes to specific cell populations and unique
functional responses. Drop-seq could be used to provide initial insights into how these genes function in the diverse cell types
composing each tissue. In addition, coupling Drop-seq to perturbations (such as small molecules, mutations, pathogens, or other
stimuli) could generate an information-rich, multi-dimensional
readout of the influence of perturbations on many kinds of cells.
The functional implications of a gene's expression are a product not just of that gene's intrinsic properties, but also of the
entire cell-level context in which the gene is expressed. We
hope Drop-seq enables the abundant and routine discovery of
such relationships in many areas of biology.

(B) Identification of known amacrine types among clusters. The 21 amacrine clusters consisted of 12 GABAergic, five glycinergic, one glutamatergic, and three
non-GABAergic non-glycinergic clusters. Starburst amacrines were identified in cluster 3 by their expression of Chat; excitatory amacrines by expression of
Slc17a8; A-II amacrines by their expression of Gjd2; and SEG amacrine neurons by their expression of Ebf3.
(C) Nomination of novel candidate markers of amacrine subpopulations. Each cluster was screened for genes differentially expressed in that cluster relative to all
other amacrine clusters (p < 0.01, Bonferroni corrected) (McDavid et al., 2013), and filtered for those with highest relative enrichment. Expression of a single
candidate marker for each cluster is shown across all amacrines.
(D) Validation of MAF as a marker for a GABAergic amacrine population. Staining of a fixed adult retina from wild-type mice for MAF (i, ii, v, and green staining in iv
and vii), GAD1 (iii and iv, red staining), and SLC6A9 (vi and vii, red staining), demonstrating co-localization of MAF with GAD1, but not SLC6A9.
(E) Differential expression of cluster 7 (Maf+) with nearest neighboring amacrine cluster (#6). Average gene expression was compared between cells in clusters 6
and 7; 16 genes (red dots) were identified with >2.8-fold enrichment in cluster 7 (p < 10⁻⁹).
(F) Validation of PPP1R17 as a marker for an amacrine subpopulation. Staining of a fixed adult retina from Mito-P mice, which express CFP in both nGnG
amacrines and type 1 bipolars (Kay et al., 2011). Overlapping labeling by PPP1R17 antibody (green) and Mito-P CFP (red) supports Drop-seq identification of
Ppp1r17 expression in the nGnG amacrine neurons. 85% of CFP+ cells were PPP1R17+ and 50% of the PPP1R17+ cells were CFP−, suggesting a second
amacrine type expressing this marker. Blue staining is for VSX2, a marker of bipolar neurons.
(G) Differential expression of cluster 20 (Ppp1r17+) with nearest neighboring amacrine cluster (#21). Average gene expression was compared between cells in
clusters 20 and 21; 12 genes (red dots) were identified with >2.8-fold enrichment in cluster 20 (p < 10⁻⁹).
(H) Differential expression of melanopsin-positive and negative RGCs. Average expression was compared between Opn4-positive and -negative RGCs in cluster
2. Seven genes were identified as enriched in Opn4-positive cells (red dots, > 2-fold, p < 10⁻⁹).


EXPERIMENTAL PROCEDURES
Device Design and Fabrication
Microfluidic devices were designed using AutoCAD software (Autodesk), and
the components tested using COMSOL Multiphysics (COMSOL). Full details
are described in Supplemental Experimental Procedures.
Barcoded Microparticle Synthesis
Bead functionalization and reverse-direction phosphoramidite synthesis were
performed by Chemgenes Corp (Wilmington, MA). Split-and-pool cycles
were accomplished by removing the dry resin from each column, hand mixing,
and weighing out four equal portions before returning the resin for an additional
cycle of synthesis. Full details are described in Supplemental Experimental
Procedures.
Drop-Seq Procedure
Monodisperse droplets ~1 nl in size were generated using the microfluidic device described in Supplemental Experimental Procedures, in which barcoded
microparticles, suspended in lysis buffer, were flowed at a rate equal to that of
a single-cell suspension, so that resulting droplets were composed of an equal
amount of each component. As soon as droplet generation was complete,
droplets were broken with perfluorooctanol in 30 ml of 6× SSC. The addition
of a large aqueous volume to the droplets reduces hybridization events after
droplet breakage, because DNA base pairing follows second-order kinetics
(Britten and Kohne, 1968; Wetmur and Davidson, 1968). The beads were
then washed and resuspended in a reverse transcriptase mix, followed by a
treatment with exonuclease I to remove unextended primers. The beads
were then washed, counted, aliquoted into PCR tubes, and PCR amplified.
The PCR reactions were purified and pooled, and the amplified cDNA quantified on a BioAnalyzer High Sensitivity Chip (Agilent). The cDNA was fragmented and amplified for sequencing with the Nextera XT DNA sample prep
kit (Illumina) using custom primers that enabled the specific amplification of
only the 3′ ends (Table S6). The libraries were purified, quantified, and then
sequenced on the Illumina NextSeq 500. All details regarding reaction
conditions, primers used, and sequencing specifications can be found in the
Supplemental Experimental Procedures.
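
The dilution step in the procedure above rests on second-order kinetics, which can be stated explicitly. The expression below is an idealized, well-mixed approximation: the symbols P (unhybridized primer), R (free mRNA), k (rate constant), and f (dilution factor) are introduced here for illustration, and bead-bound primers are not truly in free solution.

```latex
\frac{d[\mathrm{PR}]}{dt} = k\,[\mathrm{P}]\,[\mathrm{R}]
\qquad\Longrightarrow\qquad
\left.\frac{d[\mathrm{PR}]}{dt}\right|_{\text{diluted}}
  = k\,\frac{[\mathrm{P}]}{f}\,\frac{[\mathrm{R}]}{f}
  = \frac{1}{f^{2}}\,k\,[\mathrm{P}]\,[\mathrm{R}]
```

Under this approximation, diluting both species by a factor f lowers the initial hybridization rate by f squared; a 100-fold dilution, for example, would reduce it 10,000-fold.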
Cell-Cycle Analysis of HEK and 3T3 Cells
Gene sets reflecting five phases of the HeLa cell cycle (G1/S, S, G2/M, M and
M/G1) were taken from Whitfield et al. (2002) with some modification (Supplemental Experimental Procedures and Table S2). A phase-specific score was generated for each cell, across all five phases, using averaged
normalized expression levels (log2(TPM+1)) of the genes in each set. Cells
were then ordered along the cell cycle by comparing the patterns of these
five phase scores per cell. To identify cell-cycle-regulated genes, we used a
sliding window approach, and identified windows of maximal and minimal
average expression, both for ordered cells, and for shuffled cells, to evaluate
the false-discovery rate. Full details may be found in Supplemental Experimental Procedures.
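
The phase-score calculation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the gene names and TPM values are hypothetical placeholders (the actual gene sets are in Table S2), any normalization beyond log2(TPM+1) is omitted, and taking the highest-scoring phase is a simplification of the pattern comparison used to order cells.

```python
import math


def phase_scores(tpm, gene_sets):
    """Average log2(TPM + 1) over the genes of each phase-specific set."""
    scores = {}
    for phase, genes in gene_sets.items():
        values = [math.log2(tpm.get(gene, 0.0) + 1.0) for gene in genes]
        scores[phase] = sum(values) / len(values)
    return scores


# Hypothetical two-gene sets standing in for the five phase-specific sets.
GENE_SETS = {
    "G1/S": ("geneA", "geneB"),
    "S": ("geneC", "geneD"),
    "G2/M": ("geneE", "geneF"),
    "M": ("geneG", "geneH"),
    "M/G1": ("geneI", "geneJ"),
}

# Hypothetical TPM values for a single cell; missing genes count as zero.
cell = {"geneA": 3.0, "geneB": 0.0, "geneE": 15.0, "geneF": 63.0}
scores = phase_scores(cell, GENE_SETS)
print(max(scores, key=scores.get), scores)
```

Each cell ends up with a five-element score vector, which is the quantity the text describes comparing across cells to place them along the cycle.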
Principal Components and Clustering Analysis of Retina Data
The clustering algorithm for the retinal cell data was implemented and performed using Seurat, a recently developed R package for single-cell analysis
(Satija et al., 2015). PCA was first performed on a 13,155-cell training set
of the 49,300-cell dataset, using single-cell libraries in which transcripts from
>900 genes were detected. We found this approach was more effective in
discovering structures corresponding to rare cell types than performing PCA
on the full dataset, which was dominated by numerous, tiny rod photoreceptors (Supplemental Experimental Procedures). Thirty-two statistically significant PCs were identified using a permutation test and independently
confirmed using a modified resampling procedure (Chung and Storey, 2015).
We projected individual cells within the training set based on their PC scores
onto a single two-dimensional map using t-Distributed Stochastic Neighbor
Embedding (t-SNE) (van der Maaten and Hinton, 2008). The remaining
36,145 single-cell libraries (<900 genes detected) were next projected on
this t-SNE map, based on their representation within the PC-subspace of
the training set (Berman et al., 2014; Shekhar et al., 2014). This approach mitigates the impact of noisy variation in the lower complexity libraries due to
gene dropouts. It was also reliable in the sense that when we withheld from
the t-SNE all cells from a given cluster and then tried to project them, these
withheld cells were not spuriously assigned to another cluster by the projection
(Table S7). Point clouds on the t-SNE map represent candidate cell types; density clustering (Ester et al., 1996) identified these regions. Differential expression testing (McDavid et al., 2013) was then used to confirm that clusters
were distinct from each other. Hierarchical clustering based on Euclidean distance and complete linkage was used to build a tree relating the clusters. We
noted expression of several rod-specific genes, such as Rho and Nrl, in every
cell cluster, an observation that has been made in another retinal cell gene
expression study (Siegert et al., 2012) and likely arises from solubilization
of these high-abundance transcripts during cell suspension preparation.
Additional information regarding retinal cell data analysis can be found in the
Supplemental Experimental Procedures.
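
The tree-building step (hierarchical clustering with Euclidean distance and complete linkage) can be illustrated with a short standard-library sketch. The analysis in the paper was carried out in R with Seurat; the Python code below is a stand-in written for illustration, and the cluster names and two-dimensional profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerate named profiles and return the merge history.

    Each entry is (members_a, members_b, distance), in the order merged.
    """
    groups = {(name,): [vector] for name, vector in profiles.items()}
    merges = []
    while len(groups) > 1:

        def linkage(pair):
            # Complete linkage: distance between the two farthest members.
            return max(
                euclidean(u, v)
                for u in groups[pair[0]]
                for v in groups[pair[1]]
            )

        a, b = min(combinations(sorted(groups), 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        groups[a + b] = groups.pop(a) + groups.pop(b)
    return merges


# Hypothetical average-expression profiles for four clusters.
profiles = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [5.0, 5.0], "D": [5.0, 7.0]}
for members_a, members_b, distance in complete_linkage(profiles):
    print(members_a, members_b, round(distance, 3))
```

The merge history is equivalent to a dendrogram: early merges join the most similar clusters, and the final merge joins the two most distinct groups.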

ACCESSION NUMBERS
The accession number for the raw and analyzed data reported in this paper is
GEO: GSE63473.

SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures,
six figures, seven tables, one movie, and one data file and can be found with
this article online at http://dx.doi.org/10.1016/j.cell.2015.05.002.
AUTHOR CONTRIBUTIONS
E.Z.M. developed the barcoding and molecular biology analysis, advised by
S.A.M. A.B. designed and fabricated the microfluidic devices, advised by
D.A.W. and A.R. E.Z.M. and M.G. developed Drop-seq experimental protocols
and performed the Drop-seq experiments in S.A.M.'s lab. J.N. developed the
methods and software for obtaining digital gene expression measurements for
each cell, advised by E.Z.M. and S.A.M. J.N., E.Z.M. and S.A.M. performed the
analyses of species-mixing experiments. I.T. performed the cell-cycle analysis. A.R.B. prepared the retinal cell suspensions. R.S., K.S., and A.R. developed and performed the retinal cell type clustering analyses with contribution
from N.K. E.Z.M., R.S., K.S., and J.R.S. interpreted the retina expression data.
E.M.M. and J.R.S. performed the immunohistochemistry experiments. J.J.T.
and A.K.S. performed the Fluidigm C1 experiments. E.Z.M., S.A.M., A.R.,
A.B., and A.K.S. conceived the study and key ways that Drop-seq works
together as an integrated system. E.Z.M. and S.A.M. wrote the manuscript
with contributions from all authors.
ACKNOWLEDGMENTS
This work was supported by the Stanley Center for Psychiatric Research (to
S.M.), the MGH Psychiatry Residency Research Program and Stanley-MGH
Fellowship in Psychiatric Neuroscience (to E.Z.M.), a Stewart Trust Fellows
Award (to S.M.), a grant from the Simons Foundation to the Simons Center
for the Social Brain at MIT (to A.R., S.M., and D.W.), an NHGRI CEGS P50
HG006193 (to A.R.), the Klarman Cell Observatory (to A.R. and A.B.), NIMH
grant U01MH105960 (to S.M., A.R. and J.R.S.), NIMH grant R25MH094612
(to E.M.), NIH F32 HD075541 (to R.S.). A.R. is an investigator of the Howard
Hughes Medical Institute. Microfluidic device fabrication was performed at
the Harvard Center for Nanoscale Systems (CNS), a member of the National
Nanotechnology Infrastructure Network (National Science Foundation award
no. ECS-0335765), with support from the National Science Foundation
(DMR-1310266) and the Harvard Materials Research Science and Engineering
Center (DMR-1420570). We thank Christina Usher and Leslie Gaffney for contributions to the manuscript figures and Chris Patil for helpful comments on the
manuscript. We thank Connie Cepko for helpful conversations about the retina
data, Beth Stevens for advice on retinal dissociations, and Assaf Rotem and
Huidan Zhang for advice on microfluidics design and fabrication. A.R. is a
member of the Scientific Advisory Board for Thermo Fisher Scientific and
Syros Pharmaceuticals and a consultant for Driver Genomics.

Kay, J.N., Voinescu, P.E., Chu, M.W., and Sanes, J.R. (2011). Neurod6 expression defines new retinal amacrine cell subtypes and regulates their fate. Nat.
Neurosci. 14, 965–972.

Received: November 9, 2014


Revised: March 4, 2015
Accepted: April 30, 2015
Published: May 21, 2015

Kivioja, T., Vaharautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S.,
and Taipale, J. (2012). Counting absolute numbers of molecules using unique
molecular identifiers. Nat. Methods 9, 72–74.

REFERENCES
Amir, A.D., Davis, K.L., Tadmor, M.D., Simonds, E.F., Levine, J.H., Bendall,
S.C., Shenfeld, D.K., Krishnaswamy, S., Nolan, G.P., and Peer, D. (2013).
viSNE enables visualization of high dimensional single-cell data and reveals
phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552.
Beer, N.R., Wheeler, E.K., Lee-Houghton, L., Watkins, N., Nasarabadi, S.,
Hebert, N., Leung, P., Arnold, D.W., Bailey, C.G., and Colston, B.W. (2008).
On-chip single-copy real-time reverse-transcription PCR in isolated picoliter
droplets. Anal. Chem. 80, 1854–1858.
Berman, G.J., Choi, D.M., Bialek, W., and Shaevitz, J.W. (2014). Mapping the
stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11,
20140672.
Brennecke, P., Anders, S., Kim, J.K., Kołodziejczyk, A.A., Zhang, X., Proserpio, V., Baying, B., Benes, V., Teichmann, S.A., Marioni, J.C., and Heisler,
M.G. (2013). Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095.
Britten, R.J., and Kohne, D.E. (1968). Repeated sequences in DNA. Hundreds
of thousands of copies of DNA sequences have been incorporated into the
genomes of higher organisms. Science 161, 529–540.

Klein, A.M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., Peshkin, L., Weitz, D.A., and Kirschner, M.W. (2015). Droplet barcoding for single
cell transcriptomics and its application to embryonic stem cells. Cell 161,
this issue, 1187–1201.
Luo, L., Callaway, E.M., and Svoboda, K. (2008). Genetic dissection of neural
circuits. Neuron 57, 634–660.
Masland, R.H. (2012). The neuronal organization of the retina. Neuron 76,
266–280.
McDavid, A., Finak, G., Chattopadyay, P.K., Dominguez, M., Lamoreaux, L.,
Ma, S.S., Roederer, M., and Gottardo, R. (2013). Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29, 461–467.
Petilla Interneuron Nomenclature Group, Ascoli, G.A., Alonso-Nanclares, L.,
Anderson, S.A., Barrionuevo, G., Benavides-Piccione, R., Burkhalter, A., Buzsaki, G., Cauli, B., Defelipe, J., Fairen, A., et al. (2008). Petilla terminology:
nomenclature of features of GABAergic interneurons of the cerebral cortex.
Nat. Rev. Neurosci. 9, 557–568.
Picelli, S., Bjorklund, A.K., Faridani, O.R., Sagasser, S., Winberg, G., and
Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome
profiling in single cells. Nat. Methods 10, 1096–1098.

Chung, N.C., and Storey, J.D. (2015). Statistical Significance of Variables
Driving Systematic Variation in High-Dimensional Data. Bioinformatics 31,
545–554.

Sanes, J.R., and Masland, R.H. (2015). The Types of Retinal Ganglion Cells:
Current Status and Implications for Neuronal Classification. Annu. Rev. Neurosci. Published online April 9, 2015.

Descamps, F.J., Martens, E., Proost, P., Starckx, S., Van den Steen, P.E., Van
Damme, J., and Opdenakker, G. (2005). Gelatinase B/matrix metalloproteinase-9 provokes cataract by cleaving lens betaB1 crystallin. FASEB J. 19,
29–35.

Sanes, J.R., and Zipursky, S.L. (2010). Design principles of insect and vertebrate visual systems. Neuron 66, 15–36.

Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise (Menlo
Park, Calif: AAAI Press).
Famiglietti, E.V., and Sundquist, S.J. (2010). Development of excitatory and
inhibitory neurotransmitters in transitory cholinergic neurons, starburst amacrine cells, and GABAergic amacrine cells of rabbit retina, with implications
for previsual and visual development of retinal ganglion cells. Vis. Neurosci.
27, 19–42.
Feigenspan, A., Teubner, B., Willecke, K., and Weiler, R. (2001). Expression
of neuronal connexin36 in AII amacrine cells of the mammalian retina.
J. Neurosci. 21, 230–239.
Hashimshony, T., Wagner, F., Sher, N., and Yanai, I. (2012). CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673.
Haverkamp, S., and Wassle, H. (2004). Characterization of an amacrine cell
type of the mammalian retina immunoreactive for vesicular glutamate transporter 3. J. Comp. Neurol. 468, 251–263.
Hindson, B.J., Ness, K.D., Masquelier, D.A., Belgrader, P., Heredia, N.J.,
Makarewicz, A.J., Bright, I.J., Lucero, M.Y., Hiddessen, A.L., Legler, T.C.,
et al. (2011). High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal. Chem. 83, 8604–8610.

Satija, R., Farrell, J.A., Gennert, D., Schier, A.F., and Regev, A. (2015). Spatial
reconstruction of single-cell gene expression data. Nat. Biotechnol. Published
online 13 April, 2015. http://dx.doi.org/10.1038/nbt.3192.
Shalek, A.K., Satija, R., Adiconis, X., Gertner, R.S., Gaublomme, J.T.,
Raychowdhury, R., Schwartz, S., Yosef, N., Malboeuf, C., Lu, D., et al.
(2013). Single-cell transcriptomics reveals bimodality in expression and
splicing in immune cells. Nature 498, 236–240.
Shalek, A.K., Satija, R., Shuga, J., Trombetta, J.J., Gennert, D., Lu, D., Chen,
P., Gertner, R.S., Gaublomme, J.T., Yosef, N., et al. (2014). Single-cell
RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510,
363–369.
Shekhar, K., Brodin, P., Davis, M.M., and Chakraborty, A.K. (2014). Automatic
Classification of Cellular Expression by Nonlinear Stochastic Embedding
(ACCENSE). Proc. Natl. Acad. Sci. USA 111, 202–207.
Siegert, S., Cabuy, E., Scherf, B.G., Kohler, H., Panda, S., Le, Y.Z., Fehling,
H.J., Gaidatzis, D., Stadler, M.B., and Roska, B. (2012). Transcriptional code
and disease map for adult retinal cell types. Nat. Neurosci. 15, 487–495,
S1–S2.
Sweeney, N.T., Tierney, H., and Feldheim, D.A. (2014). Tbr2 is required to
generate a neural circuit mediating the pupillary light reflex. J. Neurosci. 34,
5447–5453.

Islam, S., Zeisel, A., Joost, S., La Manno, G., Zajac, P., Kasper, M., Lonnerberg, P., and Linnarsson, S. (2014). Quantitative single-cell RNA-seq with
unique molecular identifiers. Nat. Methods 11, 163–166.

Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., Wang, X.,
Bodeau, J., Tuch, B.B., Siddiqui, A., et al. (2009). mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382.

Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I.,
Mildner, A., Cohen, N., Jung, S., Tanay, A., and Amit, I. (2014). Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell
types. Science 343, 776–779.

Thorsen, T., Roberts, R.W., Arnold, F.H., and Quake, S.R. (2001). Dynamic
pattern formation in a vesicle-generating microfluidic device. Phys. Rev.
Lett. 86, 4163–4166.

Jeon, C.J., Strettoi, E., and Masland, R.H. (1998). The major cell populations of
the mouse retina. J. Neurosci. 18, 8936–8946.

Umbanhowar, P.B., Prasad, V., and Weitz, D.A. (2000). Monodisperse Emulsion Generation via Drop Break Off in a Coflowing Stream. Langmuir 16,
347–351.


Utada, A.S., Fernandez-Nieves, A., Stone, H.A., and Weitz, D.A. (2007).
Dripping to jetting transitions in coflowing liquid streams. Phys. Rev. Lett.
99, 094502.
van der Maaten, L., and Hinton, G. (2008). Visualizing Data using t-SNE.
J. Mach. Learn. Res. 9, 2579–2605.
Vogelstein, B., and Kinzler, K.W. (1999). Digital PCR. Proc. Natl. Acad. Sci.
USA 96, 9236–9241.
Wetmur, J.G., and Davidson, N. (1968). Kinetics of renaturation of DNA. J. Mol.
Biol. 31, 349–370.


Whitfield, M.L., Sherlock, G., Saldanha, A.J., Murray, J.I., Ball, C.A., Alexander,
K.E., Matese, J.C., Perou, C.M., Hurt, M.M., Brown, P.O., and Botstein, D.
(2002). Identification of genes periodically expressed in the human cell cycle
and their expression in tumors. Mol. Biol. Cell 13, 1977–2000.
Yang, Y., and Cvekl, A. (2005). Tissue-specific regulation of the mouse alphaA-crystallin gene in lens via recruitment of Pax6 and c-Maf to its promoter. J. Mol.
Biol. 351, 453–469.
Zhu, Y.Y., Machleder, E.M., Chenchik, A., Li, R., and Siebert, P.D. (2001).
Reverse transcriptase template switching: a SMART approach for full-length
cDNA library construction. Biotechniques 30, 892–897.
