
BI201C dChip for Gene Expression and SNP Microarray Data Analysis

Cheng Li
Jan 26, 2009
This file's address: http://www.dchip.org/dchip_expression.doc
This course is jointly conducted by Cheng Li and Shailender Nagpal, through Bioinformatics.Org: http://wiki.bioinformatics.org/BI201C_dChip_for_Gene_Expression_and_SNP_Genotyping
dChip website: www.dchip.org

Session 3: Gene Expression Analysis: Case Study on Filtering, Comparison, Clustering, Enrichment
1. Obtain dChip and demo data
Download and unzip the files to a directory: http://biosun1.harvard.edu/complab/dchip/dchip_demo.zip (69 MB)
You can follow me to explore this dataset using dChip in today's session. Alternatively, the full data can be obtained by following steps 1.1-1.5 (in your own time):
1.1 dChip software: http://biosun1.harvard.edu/~cli/dchip.exe (download to a directory, e.g. "c:\dchip")
1.2 Download and unzip the example data CEL files
-- The paper describing this dataset (Armstrong et al. Nature Genetics 30, 41-47, 2001)
-- Download and unzip these files: scaling_factors_and_fig_key.txt and the ALL1, ALL2, MLL1, MLL2 zipped files; you may need to rename the extension to .gz before using WinZip to unzip them.
1.3 Download and unzip the CDF file: HG_U95A.zip
1.4 Download and unzip the gene information file: HG-U95Av2 gene info2.zip
1.5 Download the sample information file made from scaling_factors_and_fig_key.txt: ALL sample info.xls

2. Basic steps to open expression data (covered in sessions 1-2)
2.1 Data extraction, normalization and expression computation

-- Analysis/Open group: specify data directory, working directory (in Options), sample information file, gene information file
-- Analysis/Normalize & Model: are the arrays being normalized to have similar median intensity?
2.2 Check probe level data
-- Click the PM/MM data on the left; use the Home and End keys (go to another probe set) and the PageUp and PageDown keys (go to another array) to look at the probe level data and the model fitted for the current probe set.
2.3 Check outlier arrays
-- After step 2.1, look at the array summary file for any outlying arrays.
-- Also check array images for single outliers marked in pink; press the O key to toggle the display of array outliers; toggle back and forth between two array images to see if these outliers are identified reasonably.
-- Use Image/Normalization plot to view the scatterplot of outlier arrays and baseline arrays.

3. Filter genes and clustering
3.1 Analysis/Filter genes: usually it's good to obtain < 1000 genes to look at in clustering
3.2 Analysis/Hierarchical clustering: check both sample and gene clustering
-- Are samples of similar types clustering together? Is there anything special about mis-clustered samples?
-- Enlarge the image; which genes are highly expressed in particular groups of samples? Are replicate probe sets for the same gene selected and clustered closely?
-- Redo gene filtering using different criteria to get gene lists of different sizes, and then do clustering. Is the sample clustering similar?
-- Redo Analysis/Open group with Options/Log 2 transform expression values checked; redo filtering and clustering. Is the result similar to that on the original scale?
-- Click a gene name on the right to go to the online database
3.3 The Clustering menu

4. Gene function enrichment analysis
4.1 What are the known genes or functionally significant gene clusters in step 3.2?
4.2 Tools/Gene function enrichment
-- Clicking a gene branch before doing this step will classify the genes in the branch

5. Compare samples for supervised gene selection

5.1 Analysis/Compare samples
5.2 Analysis/Hierarchical clustering
-- Use Tools/Array list file to order samples.
-- Sample clustering is not necessary, but may identify outlying samples
5.3 Combine comparisons
5.4 Permute samples to assess FDR
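
To make the idea behind steps 5.1 and 5.4 concrete, the sketch below selects genes by a two-group t statistic and then shuffles the group labels to estimate how many genes would pass the same threshold by chance. It is only an illustration under stated assumptions, not dChip's own procedure: the function names, the Welch-style t statistic, the threshold, and the use of the mean count over permutations are all choices made here, and the data are simulated.

```python
import random
from statistics import mean, variance


def t_stat(a, b):
    """Two-sample t statistic with unequal variances (an assumed choice)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(expr, labels, threshold):
    """Indices of genes with |t| > threshold between label groups 0 and 1."""
    selected = []
    for i, row in enumerate(expr):
        g0 = [v for v, lab in zip(row, labels) if lab == 0]
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        if abs(t_stat(g0, g1)) > threshold:
            selected.append(i)
    return selected


def permutation_fdr(expr, labels, threshold, n_perm=200, seed=0):
    """Return (number selected, estimated FDR).

    FDR is estimated as the mean number of genes selected under
    shuffled labels divided by the number selected with true labels.
    """
    rng = random.Random(seed)
    observed = len(select_genes(expr, labels, threshold))
    if observed == 0:
        return 0, 0.0
    shuffled = list(labels)
    false_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        false_counts.append(len(select_genes(expr, shuffled, threshold)))
    return observed, min(1.0, mean(false_counts) / observed)


# Simulated data: 50 genes, 6 + 6 samples, first 5 genes truly different.
data_rng = random.Random(1)
labels = [0] * 6 + [1] * 6
expr = []
for g in range(50):
    shift = 5.0 if g < 5 else 0.0
    expr.append([data_rng.gauss(10, 1) for _ in range(6)]
                + [data_rng.gauss(10 + shift, 1) for _ in range(6)])

observed, fdr = permutation_fdr(expr, labels, threshold=4.0)
print("genes selected:", observed, "estimated FDR:", round(fdr, 3))
```

A high estimated FDR suggests the comparison criteria are too loose for the number of samples; tightening the threshold usually trades a shorter gene list for a lower FDR.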

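For the enrichment analysis in step 4.2, one common way to judge whether a functional term is over-represented in a gene cluster is a hypergeometric tail probability. The sketch below shows that calculation; whether dChip computes exactly this quantity is an assumption, and the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, n_cluster, n_hits):
    """P(X >= n_hits) when n_cluster genes are drawn without replacement
    from n_total genes, of which n_annotated carry the functional term."""
    upper = min(n_annotated, n_cluster)
    denom = comb(n_total, n_cluster)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_cluster - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 12000 genes on the array, 100 annotated with a term,
# a cluster of 50 genes containing 5 annotated genes.
p = enrichment_p_value(12000, 100, 50, 5)
print("enrichment p-value:", p)
```

A small p-value means the cluster contains more annotated genes than random sampling from the array would typically give. Because many terms are tested at once, such p-values are best read as a ranking rather than as strict significance levels.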
Session 4: Gene Expression Analysis: Case Study on ANOVA & Correlation, Classification, Genome, Chromosome, Automating dChip
6. Use ANOVA or correlation analysis for supervised gene selection
6.1 Analysis/ANOVA & Correlation
-- Use type for ANOVA filtering, similar to Compare samples for two groups
-- Use fake_class for ANOVA filtering
-- Use fake_response for correlation filtering
-- Use a gene or gene branch for correlation filtering

7. Classifying samples
7.1 Tools/Classify samples, using fake_class, specifying an ANOVA gene list or a filtered gene list
-- LDA requires the R software to be installed
-- PCA (principal component analysis) may be performed, which doesn't use the class information
-- The Cross-validation option

8. Use chromosome and genome information
8.1 Obtain genome information and cytoband files: http://biosun1.harvard.edu/complab/dchip/chromosome.htm
Download the genome information file: hg_u95av2 genome info2.xls (hg11)
8.2 Analysis/Genome
-- Color gene branches in the clustering figure and then do this

8.3 Analysis/Chromosome

9. Automate dChip functions
Tools/Automate menu

Homework
1. Use your own dataset, or find an expression dataset from GEO: http://www.ncbi.nlm.nih.gov/geo/ (e.g. search "lung cancer" at Query/DataSets, click a GSE reference series, and download the CEL files from the bottom of the page).
2. Analyze the dataset using dChip. You may need to obtain the CDF file and the gene information file for the specific array type, and create your own sample info file based on the dataset's annotations. Follow analyses similar to those in the dataset's original paper, or use the functions covered in today's sessions. Do you reach conclusions similar to those of the original paper? Note any analysis questions for discussion.
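
For homework step 2, a sample info file can be drafted from the CEL file names and then completed by hand. The sketch below writes a tab-delimited skeleton. The column headers ("scan name", "sample name", "type") and the helper name are assumptions made for illustration; compare them with the demo file ALL sample info.xls before using the result in dChip. The GSM file names and group labels are made up.

```python
import csv
import io
from pathlib import Path


def write_sample_info(cel_names, annotations, out):
    """Write one tab-delimited row per CEL file to the open text stream out.

    cel_names: iterable of CEL file names
    annotations: dict mapping a file name without extension to a group label
    Column headers are assumed, not taken from the dChip documentation.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["scan name", "sample name", "type"])
    for cel in sorted(cel_names):
        stem = Path(cel).stem  # file name without the .CEL extension
        writer.writerow([stem, stem, annotations.get(stem, "")])


# Hypothetical file names and annotations.
cel_files = ["GSM1002.CEL", "GSM1001.CEL", "GSM1003.CEL"]
groups = {"GSM1001": "tumor", "GSM1002": "normal"}

buffer = io.StringIO()
write_sample_info(cel_files, groups, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("sample_info.txt", "w", newline="")` in place of the `StringIO` buffer. Samples missing from the annotation dictionary get an empty type, which makes gaps easy to spot before loading the file.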
