
UCSC Genome Browser

http://genome.ucsc.edu/

Louis Tang
Bioinformatics R&D
National Genotyping Center
Academia Sinica.
Quick Overview of features on genomic regions
Loads of Data
Visual Correlation
Customization
The Rise
[Timeline slides, 1998 to 2001]
Francis Collins
Craig Venter vs. Francis Collins
David Haussler
“Almost Done!”
“It’s looking grim…”
W. James Kent


One Ring to rule them all,
One Ring to find them,
One Ring to bring them all
and in the darkness bind them
Coordinate: Genome Assembly
Annotation Tracks: Gene, Regulation, Variation, Expression, More…
Annotation Tracks
Coordinate
Everlasting Assembly
Organism Code + version
hg19: human genome, version 19
mm9: Mus musculus (mouse), version 9
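The naming convention above (letters for the organism, digits for the version) can be split mechanically. A minimal sketch, assuming every assembly name is letters followed by digits; the function name `split_assembly` is hypothetical.

```python
import re

# Organism code (letters) followed by a version number (digits), e.g. "hg19", "mm9".
ASSEMBLY_RE = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_assembly(name: str) -> tuple[str, int]:
    """Split a UCSC-style assembly name into (organism code, version)."""
    match = ASSEMBLY_RE.match(name)
    if match is None:
        raise ValueError(f"not an organism code + version name: {name!r}")
    return match.group(1), int(match.group(2))


print(split_assembly("hg19"))  # ('hg', 19)
print(split_assembly("mm9"))   # ('mm', 9)
```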
Ever-Changing Tracks
http://genome.ucsc.edu/
Coordinate
chr3:1,000,000-2,000,000
chr3:1,000,000+2000 → chr3:1,000,000-1,001,999
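The two coordinate forms above can be parsed the same way; the only subtlety is that "+2000" means 2,000 bases starting at the start position, so the inclusive end is start + 2000 - 1. A minimal sketch under that reading; `parse_position` is a hypothetical helper, not part of any UCSC tool.

```python
import re

# "chrom:start-end" or the "chrom:start+length" shorthand; commas and spaces are ignored.
POSITION_RE = re.compile(r"^(\w+):([\d,]+)([-+])([\d,]+)$")


def parse_position(text: str) -> tuple[str, int, int]:
    """Return (chrom, start, end) as 1-based, inclusive browser coordinates."""
    match = POSITION_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    chrom, first, op, second = match.groups()
    start = int(first.replace(",", ""))
    value = int(second.replace(",", ""))
    if op == "+":
        # "+2000" covers 2000 bases beginning at `start`.
        return chrom, start, start + value - 1
    return chrom, start, value


print(parse_position("chr3:1,000,000+2000"))  # ('chr3', 1000000, 1001999)
```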
Landmark
Chromosome, Gene, SNP, STS, EST, Cytogenetic band

chr7 → chr7:1-158,821,424
20q12 → chr20:37,100,001-41,100,000
apoe → chr19:50,100,879-50,104,490
rs328 → chr8:19,864,004 → ±250 → chr8:19,863,754-19,864,254
D16S2837 → chr16:81,120,163-81,120,399 → ±100,000 → chr16:81,020,163-81,220,399
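The ±250 and ±100,000 windows above are simple arithmetic on 1-based, inclusive coordinates. A minimal sketch; `pad_region` and `format_position` are hypothetical helpers, and the end is not clamped because the chromosome length is not known here.

```python
def pad_region(chrom: str, start: int, end: int, flank: int) -> tuple[str, int, int]:
    """Widen a 1-based, inclusive region by `flank` bases on each side.

    The start is clamped at base 1; the end is left unclamped because the
    chromosome length is not known in this sketch.
    """
    return chrom, max(1, start - flank), end + flank


def format_position(chrom: str, start: int, end: int) -> str:
    """Render a region the way the browser's position box shows it."""
    return f"{chrom}:{start:,}-{end:,}"


# rs328 sits at chr8:19864004; padding by 250 gives the slide's window.
print(format_position(*pad_region("chr8", 19864004, 19864004, 250)))
# chr8:19,863,754-19,864,254
```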
Landmark1;Landmark2 → region spanning Landmark1 and Landmark2
rs328;rs316 → chr8:19,864,004 and chr8:19,862,716 → Sort → chr8:19,862,716-19,864,004
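Combining two landmarks amounts to sorting their positions and taking the outermost bases. A minimal sketch; `span` is a hypothetical helper, and rejecting landmarks on different chromosomes is an assumption of this sketch, not documented browser behavior.

```python
def span(regions: list[tuple[str, int, int]]) -> tuple[str, int, int]:
    """Return the smallest region covering every (chrom, start, end) in `regions`."""
    chroms = {chrom for chrom, _, _ in regions}
    if len(chroms) != 1:
        # Assumption of this sketch: landmarks on different chromosomes are rejected.
        raise ValueError("landmarks are on different chromosomes")
    start = min(s for _, s, _ in regions)
    end = max(e for _, _, e in regions)
    return chroms.pop(), start, end


# rs328 and rs316 (single-base positions on chr8), given in either order.
print(span([("chr8", 19864004, 19864004), ("chr8", 19862716, 19862716)]))
# ('chr8', 19862716, 19864004)
```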
Author
McAndrew,P.E.
Practice

hg18, TP53 (uc002gij.2)


Browser Graphic

Track Control
Mark

Gene

UTR, Intron, CDS

PDB

Reviewed

RefSeq Provisional

Non-RefSeq

Alignment

TAACCAGCTGCCCAA--------TAGAAACTACGAGAGACAACAGGGAGT
||||| ||||||||| ||||||| | ||||||||||
TAACCAGCTGCCCAACTGTAGAAACTACCAACTCATTTCGAACAGGGAGT

Wiggle

Mapping and Sequencing
Phenotype and Disease Association (OMIM)
Genes and Gene Prediction (sno/miRNA)
mRNA and EST
Expression
Regulation (TFBS, miRNA target)
Comparative Genomics
Variation and Repeats (SNP, CNV)
ENCODE Pilot
ENCODE Production

Display Mode
Full
Pack
Squish
Dense
Hide
Configuration

Description
Display Convention

Method

Credit

Data Usage Restriction

References
Practice

hg18, TP53
UCSC Genes: full
RefSeq: dense
SNP: squish
RepeatMasker: pack
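The same display settings can also be captured in a link. A minimal sketch, assuming that `hgTracks` accepts `trackName=visibility` URL parameters and that the hg18 tables are named `knownGene`, `refGene`, `snp130` and `rmsk`; both are assumptions to check against the browser's linking help.

```python
from urllib.parse import urlencode


def browser_url(db: str, position: str, visibility: dict) -> str:
    """Build a Genome Browser link that opens `position` with the given track modes."""
    params = {"db": db, "position": position}
    params.update(visibility)  # track name -> full / pack / squish / dense / hide
    return "http://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


url = browser_url("hg18", "TP53", {
    "knownGene": "full",   # UCSC Genes (assumed table name)
    "refGene": "dense",    # RefSeq Genes (assumed table name)
    "snp130": "squish",    # SNPs (130) (assumed table name)
    "rmsk": "pack",        # RepeatMasker (assumed table name)
})
print(url)
```

Because the settings travel in the URL, the link can be shared much like a saved session.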
Different Display Modes
Exhibit Different Behavior
Stroll Along the Genome
>
Move End >
Click
Drag
Ctrl + Click
Practice

Find out if the mouse Brca1 gene has non-synonymous SNPs, color them blue, and get external data about one codon-changing SNP.

(Hint: the color option is in the SNP track controls.)

http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
Where is the Sequence?
RefSeq Gene
SNP
Practice

Retrieve the genomic sequence of APOE (hg18); color RefSeq Gene green and SNPs blue.


Browser Graphic Download
Session
Browser remembers
where you were
Session Sharing?
Practice Time

Find your favorite region,
show only your favorite track,
and save and email the session
to the one in your heart.
chr8:19,863,754-19,864,254

rs328

McAndrew,P.E.

?
AACTAGAAATCAGTCAACAAATTGGATGCTTAGGATAAATTCAAGAACTG
AGTAGAGAAATAAAGCTTAATGAATGACCTTTTGGGCTCCTTCCAGTTCC
AAGGTTTTAGTATTCTAAAATTTTCGGCACAGAACAACTCCAAATGCTCA
GGAAATAAGAATGAGGTCTGTTTTTAAAAGGTGCAGTTTGGAGCATGTTG
GGTGGATGAGGCTATAAAAAGTGAAGTACGATTTTCAAGGAAAGGAAGCT
GACCAATCAAAGTCTTTTGGGCAGCCCCTCCAGAAATCCAGGTGAAGCCC
GGCTCCAGGCTGAGTTGCTGTTACTCTACACGAAAGCCAGGCCGCTACTT
BLAT

BLAST-Like Alignment Tool


W. James Kent
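DNA & RNA: 500x faster than popular existing tools
Protein: 50x faster

DNA & RNA: finds matches of >= 95% similarity, >= 25 bases
Protein: finds matches of >= 80% similarity, >= 20 amino acids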
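The similarity and length figures above describe what BLAT is designed to find quickly. As a rough rule of thumb only (this is not BLAT's algorithm, and real alignments have gaps), an ungapped pair can be checked against those figures. The thresholds come from the slide; the function names are hypothetical.

```python
# Design targets as stated on the slide: (minimum % identity, minimum length).
THRESHOLDS = {"dna": (95.0, 25), "protein": (80.0, 20)}


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def in_blat_design_range(a: str, b: str, kind: str = "dna") -> bool:
    """True if the ungapped match meets the slide's similarity and length figures."""
    min_pct, min_len = THRESHOLDS[kind]
    return len(a) >= min_len and percent_identity(a, b) >= min_pct
```

Matches below these figures may still be found; they are simply outside the range the tool is tuned for.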
DNA

mRNA
BLAT’s Guess

DNA (RNA): query DNA vs. genome DNA
Protein: query protein vs. genome DNA translated in 6 frames
Translated RNA: query DNA translated in 3 frames vs. genome DNA translated in 6 frames
Translated DNA: query DNA translated in 6 frames vs. database DNA translated in 6 frames
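"3 frames" and "6 frames" refer to reading a DNA sequence as codons starting at offset 0, 1 or 2, on one strand (3 frames) or on both strands (6 frames). The sketch below shows what those translations are, using the standard genetic code; it illustrates the idea only and is not BLAT's code.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons from the start of `seq`; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frames(seq: str) -> list[str]:
    """Forward-strand translations at offsets 0, 1 and 2."""
    return [translate(seq[offset:]) for offset in range(3)]


def six_frames(seq: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    return three_frames(seq) + three_frames(reverse_complement(seq))


print(six_frames("ATGGCC")[0])  # MA
```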
Usage Restriction
DNA query: 25,000 bases
Protein query: 10,000 amino acids
Translated query: 10,000 bases

BEFORE
Total: 25 sequences, 50,000 letters
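A batch of queries can be checked against these limits before submission. A minimal sketch using the values listed on the slide; the live BLAT page may use different limits, and `check_blat_batch` is a hypothetical helper.

```python
# Limits as listed on the slide; verify against the current BLAT page.
MAX_LENGTH = {"dna": 25_000, "protein": 10_000, "translated": 10_000}
MAX_SEQUENCES = 25
MAX_TOTAL_LETTERS = 50_000


def check_blat_batch(sequences: list[str], query_type: str = "dna") -> list[str]:
    """Return a list of limit violations (an empty list means the batch looks fine)."""
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"{len(sequences)} sequences (limit {MAX_SEQUENCES})")
    limit = MAX_LENGTH[query_type]
    for number, seq in enumerate(sequences, start=1):
        if len(seq) > limit:
            problems.append(f"sequence {number} has {len(seq)} letters (limit {limit})")
    total = sum(len(seq) for seq in sequences)
    if total > MAX_TOTAL_LETTERS:
        problems.append(f"{total} letters in total (limit {MAX_TOTAL_LETTERS})")
    return problems
```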
Practice

Find the protein sequence for mouse APOE (mm9). BLAT this sequence against the human genome (hg19) to find the human homolog. Look for SNPs (snp130) in the coding region of this gene. Obtain the human DNA sequence for this region, and underline the SNPs.

http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
Pretty graphic is good,
but…
I want raw data
Table Browser
Text-based access to features
on genomic regions
Mission

(hg18, snp130)
Find all single SNPs on TP53
Genome Browser vs. Table Browser
[Diagram: a database of tables; the Genome Browser draws each table as an annotation track, while the Table Browser reads the tables directly]
Positional

chrom chromStart chromEnd name strand
chr17 7512594 7512595 rs1794293 -
chr17 7512594 7512596 rs34734132 +
chr17 7512595 7512596 rs1794292 -
chr17 7512715 7512716 rs55817367 +
chr17 7512765 7512766 rs35659787 -
chr17 7512796 7512797 rs17884586 -
chr17 7512825 7512826 rs17884306 -
chr17 7512854 7512854 rs35940853 +
chr17 7512977 7512978 rs34182553 +

Non-positional

ccds srcDb mrnaAcc protAcc
CCDS10.1 H ENST00000379268 ENSP00000368570
CCDS10.1 N NM_004195.2 NP_004186.1
CCDS10.1 H OTTHUMT00000004083 OTTHUMP00000001519
CCDS100.2 H ENST00000377411 ENSP00000366628
CCDS100.2 N NM_024980.4 NP_079256.4
CCDS100.2 H OTTHUMT00000127658 OTTHUMP00000082681
CCDS1000.1 H ENST00000368849 ENSP00000357842
CCDS1000.1 N NM_020127.2 NP_064512.1
Coordinate Mismatch

         C A C C T C A G A
Biology  1 2 3 4 5 6 7 8 9
Computer 0 1 2 3 4 5 6 7 8
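Python strings happen to use the "computer" convention: 0-based positions and half-open slices. The sketch below uses the nine-base example sequence to show the conversion in both directions; the helper names are hypothetical.

```python
seq = "CACCTCAGA"  # the example sequence above


def one_based_to_slice(start: int, end: int) -> slice:
    """Biology (1-based, inclusive) -> computer (0-based, half-open) slice."""
    return slice(start - 1, end)


def zero_based_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Computer [start, end) -> biology [start + 1, end]; only the start changes."""
    return start + 1, end


# Biology positions 5 to 7 are computer positions 4, 5, 6, i.e. seq[4:7].
print(seq[one_based_to_slice(5, 7)])  # TCA
```

Only the start coordinate shifts by one; the end number is the same in both conventions.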
3 Questions

Table?
Output Format?
Filter Criteria?
Table
snp130CodingDbSnp.name (via snp130.name)

snp130
name chrom observed
rs1642789 chr17 A/T
… … …

snp130CodingDbSnp
name transcript alleles codons
rs1642789 NM_001126113 TGT,AGT,
… … …
Connected Table

Table Schema
Connected Table

Sample Rows

Track Description
Output
Practice

hg18
snp130
chr6:1,000,000-1,001,000
Sequence Output
100 extra bases upstream and downstream
Filter Criteria
Positional

Non-Positional
Coordinate
Landmark
Landmark;Landmark
Author

Zero-Based (BED)
chrX 151073054 151173000
chrX 151183000 151190000
chrX 151283000 151290000

One-Based (browser position)
chrX:151,073,055-151,173,000
chrX:151,183,001-151,190,000
chrX:151,283,001-151,290,000

Limit

1,000 regions
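A region list written in zero-based BED form can be converted to one-based browser positions by adding 1 to each start, with the list checked against the 1,000-region limit stated above. A minimal sketch; `bed_regions_to_positions` is a hypothetical helper.

```python
MAX_REGIONS = 1_000  # limit stated on the slide


def bed_regions_to_positions(text: str) -> list[str]:
    """Convert 'chrom start end' lines (0-based start) into 1-based browser positions."""
    positions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        positions.append(f"{chrom}:{start + 1:,}-{end:,}")
    if len(positions) > MAX_REGIONS:
        raise ValueError(f"{len(positions)} regions exceeds the limit of {MAX_REGIONS}")
    return positions


regions = """chrX 151073054 151173000
chrX 151183000 151190000
chrX 151283000 151290000"""
for position in bed_regions_to_positions(regions):
    print(position)
```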
Practice Time

hg18
snp130
chr6:1,000,000-1,500,000
chr6:2,000,000-2,500,000

Item Count: 6,739


Practice
hg18
snp130
genome
rs100, rs200, rs300, rs400, rs500
Data Type

String e.g. Gene Name (TP53)

Number e.g. Chrom Start (100000)

Enumeration e.g. Strand (+/-)


String

value [does / doesn’t] match criteria

TP53 = TP53
? = any single character
* = any characters, any length

String

ap* = APOL1, apol2, apoo, APOBEC3G…
apo?? = APOL1, apol2…
?po* = IPO13, APOL1, apol2…
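The `?` and `*` wildcards above behave like shell-style patterns, and the examples match regardless of case. A minimal sketch using Python's `fnmatch.fnmatchcase` with both sides lower-cased; note that `fnmatch` also treats `[...]` as a character class, which may differ from the Table Browser's handling.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, value: str) -> bool:
    """Case-insensitive wildcard match: ? = one character, * = any run of characters."""
    return fnmatchcase(value.lower(), pattern.lower())


names = ["TP53", "APOL1", "apol2", "apoo", "APOBEC3G", "IPO13"]
for pattern in ["ap*", "apo??", "?po*"]:
    print(pattern, [name for name in names if matches(pattern, name)])
```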
Number

value is [ignored / in range / < / <= / = / != / >= / >] criteria

in range: 100,200 or 100 200 or 100, 200
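The three equivalent ways of typing a range above differ only in the separator. A minimal sketch of a parser that accepts all three; `parse_range` is hypothetical, and because commas act as separators it cannot accept numbers written with thousands separators such as "1,000".

```python
import re


def parse_range(text: str) -> tuple[int, int]:
    """Parse '100,200', '100 200' or '100, 200' into (low, high)."""
    parts = [p for p in re.split(r"[,\s]+", text.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two numbers, got {text!r}")
    low, high = int(parts[0]), int(parts[1])
    if low > high:
        # This sketch rejects reversed ranges rather than swapping them.
        raise ValueError("range start is greater than range end")
    return low, high
```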
Enumeration

value [does / doesn’t] match (choices)
Free-form Query

where
Logical: AND, OR, NOT, LIKE
Arithmetic: + - * /
Comparison: <, <=, =, !=, >=, >

Free-form Query

((txEnd - txStart) > 30000)
AND (exonCount = 1
OR exonCount = 2)
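The free-form query is written as an SQL-style `where` expression over the table's columns. The sketch below runs the example expression against a tiny in-memory SQLite table as a stand-in for the real database; the gene rows are made up for illustration, and only the column names `txStart`, `txEnd` and `exonCount` come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (name TEXT, txStart INTEGER, txEnd INTEGER, exonCount INTEGER)")
conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?)", [
    ("geneA", 1000, 50000, 1),  # hypothetical: long, 1 exon -> kept
    ("geneB", 1000, 50000, 5),  # hypothetical: long, 5 exons -> dropped
    ("geneC", 1000, 20000, 2),  # hypothetical: too short -> dropped
    ("geneD", 0, 40000, 2),     # hypothetical: long, 2 exons -> kept
])

# The free-form expression from the slide, used as the WHERE clause.
# (Fine for a fixed string; never paste untrusted input into SQL this way.)
where = "((txEnd - txStart) > 30000) AND (exonCount = 1 OR exonCount = 2)"
rows = conn.execute(f"SELECT name FROM genes WHERE {where} ORDER BY name").fetchall()
print(rows)  # [('geneA',), ('geneD',)]
```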
Practice Time
hg18
Variation and Repeats
Simple Repeats
simpleRepeat
chrX:1,000,000-2,000,000
Copy number > 100

Item Count: 47
Intersect SNP with Gene
[Diagrams: each operation applied to a SNP track and a Gene track, drawn 5’ to 3’]

Intersect Items
Base-Pair-Wise Intersect
Base-Pair-Wise Union
Complement
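The four operations above can be expressed on lists of zero-based, half-open `(start, end)` intervals on one chromosome. A minimal sketch of generic interval set operations, not the Table Browser's implementation; the example SNP and gene intervals are made up.

```python
def merge(intervals):
    """Sort and merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_items(items, other):
    """Item-level intersect: keep whole items that overlap anything in `other`."""
    return [x for x in items if any(x[0] < y[1] and y[0] < x[1] for y in other)]


def bp_intersect(a, b):
    """Base-pair-wise intersection: only bases covered by both a and b."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def bp_union(a, b):
    """Base-pair-wise union: every base covered by a or b."""
    return merge(list(a) + list(b))


def complement(a, region_start, region_end):
    """Bases in [region_start, region_end) not covered by a."""
    out, pos = [], region_start
    for start, end in merge(a):
        if end <= region_start or start >= region_end:
            continue  # interval lies outside the region
        start, end = max(start, region_start), min(end, region_end)
        if start > pos:
            out.append((pos, start))
        pos = max(pos, end)
    if pos < region_end:
        out.append((pos, region_end))
    return out


snps = [(150, 151), (250, 251), (390, 410)]   # hypothetical items
genes = [(100, 200), (300, 400)]              # hypothetical items
print(intersect_items(snps, genes))  # [(150, 151), (390, 410)]
print(bp_intersect(genes, snps))     # [(150, 151), (390, 400)]
```

Note the difference: item-level intersect keeps the whole SNP at 390-410, while the base-pair-wise intersect clips it to the part inside the gene.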
Practice Time
hg18
chr1:1,000,000-2,000,000
Find SNPs (v.130) on sno/miRNA

chr1 1092425 1092426 rs72563729 0 +


Custom Track
Baum AE et al. A genome-wide association study implicates
diacylglycerol kinase eta (DGKH) and several other genes in the etiology
of bipolar disorder. Mol Psychiatry. 2008 Feb;13(2):197-207
Radwan A et al. Prediction and analysis of nucleosome exclusion
regions in the human genome. BMC Genomics. 2008 Apr 22;9(1):186.
http://tiny.cc/c9gb2

browser hide all


track visibility=full useScore=1 itemRgb=on
chr4 1000000 1005000 myData 200 + 1000100 1004900 100,150,200
GFF, GTF, PSL, bedGraph, MAF, BED, bigWig, WIG, BED15, bigBed, BAM
Data

browser hide all
track visibility=full useScore=1 itemRgb=on
chr4 1000000 1001000 myData 200 + 1000300 1000700 100,150,200

chr4 1000000 1005000 myData 200 + 1000100 1004900 100,150,200

chrom start end: chr4 1000000 1005000 (Zero-Based)
name: myData
score (0 ~ 1000): 200
strand: +
thickStart thickEnd: 1000100 1004900
color (Red,Green,Blue, each 0~255): 100,150,200
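The nine fields above can be read and sanity-checked programmatically. A minimal sketch of a BED9 line parser that applies the ranges listed above; the class and function names are hypothetical, and accepting "." as a strand is an assumption beyond the slide.

```python
from dataclasses import dataclass


@dataclass
class Bed9:
    chrom: str
    start: int        # 0-based
    end: int          # one past the last base
    name: str
    score: int        # 0-1000; shades the item when useScore=1
    strand: str       # "+", "-" or "." (the "." is an assumption of this sketch)
    thick_start: int  # thickStart
    thick_end: int    # thickEnd
    rgb: tuple        # (red, green, blue), each 0-255; used when itemRgb=on


def parse_bed9(line: str) -> Bed9:
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    rgb = tuple(int(c) for c in fields[8].split(","))
    rec = Bed9(fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
               fields[5], int(fields[6]), int(fields[7]), rgb)
    if not 0 <= rec.score <= 1000:
        raise ValueError("score must be 0-1000")
    if rec.strand not in {"+", "-", "."}:
        raise ValueError("strand must be +, - or .")
    if not rec.start <= rec.thick_start <= rec.thick_end <= rec.end:
        raise ValueError("thickStart/thickEnd must lie within start/end")
    if len(rgb) != 3 or not all(0 <= c <= 255 for c in rgb):
        raise ValueError("color must be three values, each 0-255")
    return rec


rec = parse_bed9("chr4 1000000 1005000 myData 200 + 1000100 1004900 100,150,200")
print(rec.end - rec.start)  # 5000
```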
Track

browser hide all
track visibility=full useScore=1 itemRgb=on
chr4 1000000 1001000 myData 200 + 1000300 1000700 100,150,200

visibility: full, pack, squish, dense (Default), hide
useScore: 1 = shading, 0 = no shading (Default)
itemRgb: on = color, off = no color (Default)
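A custom track file is the optional browser line, one track line, and the data lines. A minimal sketch that assembles one from the options listed above; `custom_track` is a hypothetical helper, and it always writes all three track settings even when they equal the defaults.

```python
VISIBILITIES = ("full", "pack", "squish", "dense", "hide")


def custom_track(data_lines, visibility="dense", use_score=False, item_rgb=False,
                 hide_all=True):
    """Return custom track text: browser line, track line, then the data lines."""
    if visibility not in VISIBILITIES:
        raise ValueError(f"visibility must be one of {VISIBILITIES}")
    header = ["browser hide all"] if hide_all else []
    header.append(
        f"track visibility={visibility} useScore={int(use_score)} "
        f"itemRgb={'on' if item_rgb else 'off'}"
    )
    return "\n".join(header + list(data_lines)) + "\n"


text = custom_track(
    ["chr4 1000000 1001000 myData 200 + 1000300 1000700 100,150,200"],
    visibility="full", use_score=True, item_rgb=True,
)
print(text, end="")
```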


Have Fun
in silico PCR
Primers: min 15 bases (5’ to 3’)
Max Product Size
Min Perfect Match (>= 15 bases)
Min Good Match
Flip
[Diagrams: forward and reverse primers on opposite strands, with 5’/3’ orientation]
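The idea behind the settings above can be sketched with exact string matching: the 3' end of the forward primer must match the + strand, the 3' end of the reverse primer must match the - strand (i.e. its reverse complement on the +), and the product must not exceed the maximum size. This is a simplified illustration, not UCSC's algorithm; it searches one orientation only and ignores Min Good Match. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) start offsets of `pattern` in `text`."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def pcr_products(template: str, fwd: str, rev: str,
                 max_size: int = 4000, min_perfect: int = 15) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of predicted products on the + strand."""
    if min(len(fwd), len(rev)) < min_perfect:
        raise ValueError("primers must be at least min_perfect bases long")
    t, fwd = template.upper(), fwd.upper()
    rev_site = reverse_complement(rev)  # reverse primer as it appears on the + strand
    fwd_core = fwd[-min_perfect:]       # 3' end of the forward primer
    rev_core = rev_site[:min_perfect]   # 3' end of the reverse primer, on the + strand
    products = []
    for f in find_all(t, fwd_core):
        start = f + min_perfect - len(fwd)  # where the full forward primer would begin
        for r in find_all(t, rev_core):
            end = r + len(rev_site)         # where the full reverse primer site ends
            if start >= 0 and r >= f and end <= len(t) and end - start <= max_size:
                products.append((start, end))
    return products


fwd, rev = "ATGCGTACCTGAAGT", "GATTCCAGTAGCTAA"  # hypothetical primers
template = "AAAA" + fwd + "C" * 10 + reverse_complement(rev) + "TTTT"
print(pcr_products(template, fwd, rev))  # [(4, 44)]
```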
LiftOver
Mission

hg18 chr1:1,000,000-1,100,000

hg19 ?
Zero-Based
chr1 1010136 1020137

DNA Duster
Mission

Clean up and reverse complement a DNA sequence
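The clean-up and reverse-complement steps can be sketched in a few lines. This is similar in spirit to the DNA Duster but not its code: it drops FASTA header lines, keeps only A/C/G/T/N in either case (discarding digits, spaces and line breaks), and reverse-complements while preserving case. The function names and the sample input are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def clean_dna(text: str) -> str:
    """Drop FASTA header lines, then keep only A/C/G/T/N letters (either case)."""
    body = "".join(line for line in text.splitlines() if not line.startswith(">"))
    return "".join(ch for ch in body if ch in "ACGTNacgtn")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a cleaned sequence, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


raw = """>example
  1 aactagaaat cagtcaacaa
 21 attggatgct"""
print(reverse_complement(clean_dna(raw)))
```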
Help
Mailing List
Subject: in-Silico PCR Min Perfect/Good Match confuses me
From: louis@ibms.sinica.edu.tw
To: genome@soe.ucsc.edu
Date: 04/22/2010 03:24:06 PM

Hi,

On the in-silico PCR input page there are two options: min perfect match
& min good match. There are some explanations on the same page:

But how can these two options be specified simultaneously? Will one
option be overridden by the other?

Louis
Subject: Re: [Genome] in-Silico PCR Min Perfect/Good Match confuses me
From: Galt Barber galt@soe.ucsc.edu
To: genome@lists.soe.ucsc.edu
Date: 04/23/2010 02:50:01 AM

Hi, Louis!

All conditions apply at once:

You must have valid forward and reverse primers matching


to give a result.

This does allow you to increase Min Perfect Match above 15 if you want.
It allows you to increase Min Good Match over Min Perfect Match if you
want. But the specificity near the 3' end of the primers is always at
least 15 bp perfectly matching.

-Galt
"It's been a wonderful stone soup, where
other people have contributed bits,"

- James Kent
