Rickard Sandberg
Assistant Professor
Ludwig Institute for Cancer Research
Department of Cell and Molecular Biology
Karolinska Institutet
zygote blastocyst
muscle cells
kidney cells
Non-RNA applications: ChIP-Seq, DNase hypersensitive sites, ...
Figure: gene SLC25A3 in testes, liver, skeletal muscle and heart. Sequencing tracks: log10(reads), 0 to 2. Array tracks: log2(probe intensity), 0 to 10. Transcripts shown: 3B (AK074759), 3B (BC011574), 3A (AK092689).
Wang*, Sandberg* et al. 2008 Nature; Mortazavi et al. 2008 Nat Methods
gene A (2 kb transcript)
gene B (600 bp transcript)
Sequencing: ACGCG... TCGAG... AGGTA... CCGTG... CTGCG...
Normalize for different transcript lengths and different sequencing depths in different samples.
RPKM (Reads per kilobase and million mappable reads):
Given 10 million mappable reads:
RPKM, Gene A: 500 reads x 1000/2000 x 10^6/10^7
= 500 / (2 x 10) = 25 RPKM
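The RPKM arithmetic for gene A can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk; the function name `rpkm` and its parameter names are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mappable_reads):
    """Reads per kilobase of transcript and million mappable reads."""
    kilobases = transcript_length_bp / 1_000          # transcript length in kb
    millions = total_mappable_reads / 1_000_000       # sequencing depth in millions
    return read_count / (kilobases * millions)


# Gene A from the slide: 500 reads, 2 kb transcript, 10 million mappable reads.
gene_a = rpkm(read_count=500, transcript_length_bp=2_000,
              total_mappable_reads=10_000_000)
print(gene_a)  # 25.0
```

Dividing by both the length in kilobases and the depth in millions is what makes values comparable between a 2 kb and a 600 bp transcript, and between samples sequenced to different depths.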
Biol. Replicates
MAQC samples
UHR (cell line mix)
Brain
Spiked-in RNAs
~20M reads
Gene    ES     TS     XEN    EpiSC
Nanog   6525   20     1      263
Cdx2    124    6256   1      1
Sox17   11     5      9814   99
Sox3    151    1234   6      796
Shh     0      0      0      1
Ihh     4      12     107    17
Dhh     10     212    575    80
Wang et al. 2009 Nat Rev Gen; Guttman et al. 2010 Nat Biotech
More quantitative and larger dynamic range
Improved power to detect RNA isoforms generated by alternative splicing, promoters and polyadenylation
De novo identification of transcripts and junctions

Cost (higher depths for RNA isoforms still more expensive)
Limited number of packages and tools for non-bioinformaticians
background model: 0.05 RPKM
0.05 x 1.5 x 20 = 1.5 reads
expressed at 1 RPKM:
1 x 1.5 x 20 = 30 reads
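The two read-count estimates above follow from inverting the RPKM definition. The sketch below assumes that the factors 1.5 and 20 stand for a 1.5 kb transcript and 20 million mappable reads; the slide does not label them, so treat that reading and the function name `expected_reads` as assumptions.

```python
def expected_reads(rpkm, transcript_length_kb, million_mappable_reads):
    """Expected read count for a transcript at a given RPKM level."""
    return rpkm * transcript_length_kb * million_mappable_reads


# Assumed meaning of the slide's factors: 1.5 kb transcript, 20 million reads.
background = expected_reads(0.05, 1.5, 20)   # background level, 0.05 RPKM
expressed = expected_reads(1.0, 1.5, 20)     # gene expressed at 1 RPKM

print(f"background: {background:.1f} reads")
print(f"expressed:  {expressed:.1f} reads")
```

Under these assumptions a gene at 1 RPKM is expected to collect about twenty times as many reads as the background level.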
Figure: fraction of reads and # reads against reads per kilobase and million mappable reads (10^-4 to 10^4).
The density of reads in exons and introns of Ensembl genes with one annotated isoform, for 10 human tissue samples. The density is an average across all introns or exons of a gene.
Ramskold et al. PLoS Comp Biol 2009
Wednesday, March 16, 2011
More on background/level of detection
Figure (panels a, b): read density in introns and gene expression (exons) against reads per kilobase and million mappable reads (10^-4 to 10^4). Annotations: 80%, 92%; 11-13,000 genes per tissue; absolute expression levels.
A shared set of genes in tissue transcriptomes
Figure: axis labels "Number of detected genes", "Millions of reads used", "Number of genes in all" and "Number of samples".
The number of shared genes is sensitive to the expression level used for detection,
but is 5-10 times higher than microarray- and SAGE-based estimates
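The sensitivity to the detection level can be illustrated with a small sketch that counts genes detected in every tissue at different RPKM cutoffs. The tissue names, gene identifiers and expression values below are hypothetical toy data, and `shared_genes` is an illustrative helper, not the analysis used in the paper.

```python
def shared_genes(expression_by_tissue, min_rpkm):
    """Return the genes whose RPKM is at least min_rpkm in every tissue."""
    detected_sets = [
        {gene for gene, value in profile.items() if value >= min_rpkm}
        for profile in expression_by_tissue.values()
    ]
    return set.intersection(*detected_sets) if detected_sets else set()


# Hypothetical RPKM values for three genes in three tissues.
toy = {
    "brain":  {"g1": 12.0, "g2": 0.40, "g3": 3.0},
    "liver":  {"g1": 8.0,  "g2": 0.60, "g3": 0.1},
    "muscle": {"g1": 20.0, "g2": 0.35, "g3": 5.0},
}

for threshold in (0.3, 1.0):
    print(threshold, sorted(shared_genes(toy, threshold)))
```

Raising the cutoff from 0.3 to 1.0 RPKM removes the lowly expressed gene from the shared set in this toy example, which is the kind of threshold dependence the statement above refers to.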
Figure: portion of mRNA pool (y-axis, 0% to 20% shown) against number of most expressed genes (x-axis, 1 to 10000). Labels: Testes, Mouse brain, Human brain.
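A curve of this kind can be computed by ranking genes by expression and asking what share of the total the top N genes account for. The sketch below assumes that summed expression values (for example RPKM) approximate the mRNA pool; the function name and the toy values are hypothetical.

```python
def pool_fraction_of_top_genes(expression_values, top_n):
    """Fraction of total expression contributed by the top_n most expressed genes."""
    ranked = sorted(expression_values, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:top_n]) / total


# Hypothetical expression values for five genes.
toy_expression = [5, 50, 10, 30, 5]

for n in (1, 2, 5):
    print(n, pool_fraction_of_top_genes(toy_expression, n))
```

Evaluating the function for N = 1, 10, 100, 1000 and 10000 on a real expression table would give the points of one curve per tissue.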
Figure: read coverage relative to coding region (0.0 to 1.0) for 5'UTR and 3'UTR in brain, muscle and liver.
Figure: expression weighted UTR length estimates (5'UTR, CDS, 3'UTR; length in nucleotides, 0 to 2000) for all genes and tissue-specific genes in brain, muscle and liver.
Figure: relative number of tissue-specific genes; categories shown include Extracellular and Plasma membrane.
Ramskold et al. PLoS Comp Biol 2009
Wednesday, March 16, 2011
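One plausible reading of "expression weighted UTR length" is a mean length in which each gene contributes in proportion to its expression. The sketch below implements that weighted mean; the interpretation, the function name and the toy numbers are assumptions, not details taken from the paper.

```python
def expression_weighted_mean_length(genes):
    """Weighted mean length; genes is a list of (expression, length_nt) pairs."""
    total_expression = sum(expression for expression, _ in genes)
    if total_expression == 0:
        raise ValueError("total expression is zero; weighted mean is undefined")
    weighted_sum = sum(expression * length for expression, length in genes)
    return weighted_sum / total_expression


# Hypothetical example: a highly expressed gene with a short 3'UTR
# and a lowly expressed gene with a long 3'UTR.
toy_utrs = [(10.0, 200), (1.0, 1300)]

print(expression_weighted_mean_length(toy_utrs))   # weighted by expression
print(sum(length for _, length in toy_utrs) / len(toy_utrs))  # unweighted mean
```

The weighted value is pulled toward the UTR length of the highly expressed gene, so it describes the typical transcript in the pool instead of the typical gene in the annotation.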
RNA-Sequencing:
Transcriptome Reconstruction
Alternative Promoters
Skipped Exons
Alternative Polyadenylation
Diagram labels: Extens., Core; MXE1, MXE2; pA, pA
5 6 7
2 2v 3 Low frequencies
2 3 High frequencies
Ramanathan et al. 1999
Extent of Alternative Splicing: controls
Figure: fraction of genes against No. of reads (log10, 0 to 6) for multi-exon genes with Isoform 1 and Isoform 2. Series: alternatively spliced genes (9 tissue samples), sigmoid fit to observed, alternatively spliced genes subsampled.
Figure: fraction of genes with AS (top bin) and estimated fraction of genes with AS, against minimum minor isoform fraction.
Figure S2. Assessment of frequency of alternativ... a, Fraction of alternatively spliced genes binned by r... of 250 genes per bin. The upper asymptote of the sig... observed data was used to infer the true fraction of ... using bin size of 250). To assess whether high cove...
Wang*, Sandberg* et al. 2008 Nature
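The binning step behind the coverage control can be sketched as follows: sort genes by read count, group them into fixed-size bins, and report the fraction called alternatively spliced in each bin. The calling rule used here (minor isoform fraction at or above a cutoff) and all names and numbers are assumptions for illustration; the slide uses bins of 250 genes, while the toy example uses bins of 2. The sigmoid fit itself is not shown.

```python
def minor_isoform_fraction(isoform1_reads, isoform2_reads):
    """Share of reads supporting the less abundant isoform, or None without reads."""
    total = isoform1_reads + isoform2_reads
    if total == 0:
        return None
    return min(isoform1_reads, isoform2_reads) / total


def as_fraction_by_bin(genes, bin_size, min_minor_fraction):
    """Fraction of genes called alternatively spliced per bin of increasing coverage.

    genes is a list of (isoform1_reads, isoform2_reads) pairs.
    """
    ordered = sorted(genes, key=lambda pair: pair[0] + pair[1])
    fractions = []
    for start in range(0, len(ordered), bin_size):
        bin_genes = ordered[start:start + bin_size]
        called = 0
        for reads1, reads2 in bin_genes:
            fraction = minor_isoform_fraction(reads1, reads2)
            if fraction is not None and fraction >= min_minor_fraction:
                called += 1
        fractions.append(called / len(bin_genes))
    return fractions


# Hypothetical read counts for six genes (isoform 1, isoform 2).
toy_genes = [(1, 0), (3, 0), (40, 10), (0, 60), (500, 100), (300, 300)]
print(as_fraction_by_bin(toy_genes, bin_size=2, min_minor_fraction=0.15))
```

In the toy data the fraction rises with coverage, which is the pattern that motivates estimating the true fraction from the upper asymptote of a fitted curve.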
Figure: mean phastCons score against position relative to splice junction, grouped by switch score (medium, low), for SE and MXE.
Figure: UGCAUG (Fox): tissue-biased inclusion and tissue-biased exclusion (log p-value) by switch score bin, for constitutive exon and skipped exon. Tissues: adipose, brain, breast, cerebellum, colon, heart, liver, lymph node, skel. muscle, testes.
Licatalosi D., et al. Nature 2008
Sandberg Lab
Daniel Ramsköld
Helena Storvall
Ersen Kavak, PhD
Mats Ensterö, PhD
Gösta Winberg, PhD
Liudmila Matskova, PhD

Dept of Biology, MIT, USA
Chris Burge
Eric Wang

Cancer Center, MIT, USA
Phillip A. Sharp
Joel Neilson