*European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
‡Department of Biochemistry and Stanford Genome Technology Center, Stanford University, 855 California Avenue, Palo Alto, California 94304, USA.
Correspondence to L.M.S. e-mail: lars.steinmetz@embl.de
doi:10.1038/nrg1293

COMPLEX TRAITS
A trait that is determined by many genes, almost always interacting with environmental influences.

Biology is entering an exciting era brought about by the increase in genome-wide information. Functional genomics in particular is making rapid progress in assigning biological meaning to genomic data. Its tools have enabled several systematic approaches that can answer a few basic questions for the majority of genes in a genome: when a gene is expressed, where its product is localized, which other gene products it interacts with, and what phenotype results when it is mutated. Functional genomics aspires to answer such questions systematically for all genes in a genome, in contrast to conventional approaches that address one gene at a time.

Several key biological challenges are central to continuing genome projects and are relevant to any eukaryotic organism, from yeast to humans. One challenge is to understand how the genes encoded in a genome operate and interact to produce a complex living system. A related challenge is to determine the function of all the sequence elements in the genome. A third challenge is to understand how the multitude of sequence variants contributes to phenotypic variation, both within and between species. One of the most enduring challenges in genetics has been to find the genetic variants that are responsible for COMPLEX TRAITS1. Current methods have mostly failed to meet this challenge2, so new concepts and genome-wide technologies are needed if this complexity is to be dissected.

Despite these unresolved issues, the power and potential of functional genomics are impressive. We illustrate this here by discussing three core applications of genome technology, using selected examples from different organisms: genome-wide knock-out, gene-expression and genetic-mapping studies. We go beyond these examples to point out the areas in which technological improvements are possible.

As functional approaches, and verification of their accuracy, often require genetic manipulation, many technical advances in functional genomics originated in model systems. Nonetheless, transferring some of these technologies to humans is becoming increasingly attractive3. The utility of such a transition can be maximized by carefully evaluating the power and limitations of these approaches.

To obtain the most benefit from functional genomics, we argue, the technology, which at present is mainly carried out by a few dedicated centres, needs to become integrated into individual laboratories. Individual laboratories often have crucial expertise in a specific biological problem, and although functional genomics might provide approaches to address it, a key discovery can often be made only by bringing the two together. We believe that to achieve this, two goals must be met: experiments must be further miniaturized, and costs must be lowered.

Technological innovations
Efforts towards increased miniaturization and decreased costs are exemplified by developments that originated from genome sequencing. In many ways, functional genomics was catalysed by the genome-sequencing projects: large-scale sequencing produced a surge in available DNA sequence, around which new technologies that exploit this information were developed. One result is among the most widely recognized and accessible genomics tools, the DNA microarray, which allows parallel hybridization assays to be carried out on an unprecedented, miniaturized scale. The second, often unrecognized, contribution of the genome projects is the ~1,000-fold decrease in the cost of DNA sequencing that had to be achieved to complete the Human Genome Project. This drop in sequencing costs facilitated large-scale sequencing projects for other organisms and has helped DNA sequencing remain the most frequently used technology for detecting DNA variation. Today, comparing the genomes of several species allows the study of numerous biological features, such as conserved sequences4–6.

Until now, developments in genomic technology have primarily focused on generating genome sequence data, from the development of genome-analysis technologies to the construction of physical and genetic maps, the sequencing of model-organism genomes and the completion of the human genome sequence. The next focus in genomics builds on these genome sequences and heralds an exciting phase of genome biology: the true genome era, in which deriving functional information from genome sequences takes centre stage3. With this role in mind, we evaluate three areas of functional genomics that have been piloted in different model systems. We indicate promising directions of research and suggest new approaches that need to be designed.

Interfering with gene function. Phenotypic analysis of mutants has been a powerful approach for determining gene function. Gene function can be altered through gene deletions, insertional mutagenesis and RNA INTERFERENCE (RNAi) (BOX 1).

Few methods offer the experimental control that is afforded by gene deletion. A true knock-out, or null mutation, completely eliminates the function of the encoded gene product. Because such mutations are difficult to generate in many organisms, compromises have been made by generating incomplete knock-outs: gene products can be knocked down or silenced by point mutagenesis, insertional mutagenesis or RNAi. Although this is not yet feasible on a large scale, proteins might also be targeted using drugs7, and it might eventually be possible to use drug compounds to generate knock-downs for every gene product in a genome and to apply them across species.

The power of systematic mutant analysis is well illustrated by an experiment in which an international consortium generated a gene-deletion strain for every gene in the genome of the yeast Saccharomyces cerevisiae and analysed the phenotypes in a single tube assay8,9 (FIG. 1). Because this tool yields a quantitative fitness measurement for each gene, it enables applications beyond determining whether a gene is essential. This is an important advance because it opens up a wide variety of applications based on quantitative analysis, such as identifying functionally relevant genes and drug targets, comparing function and expression, defining candidate disease genes and studying molecular evolution (BOX 2).
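Why a pooled assay yields a quantitative fitness value, rather than only an essential/non-essential call, can be shown with a minimal sketch. It assumes an idealized model that the text does not specify: each strain grows exponentially at a constant rate, expressed as doublings per generation of a reference strain, so the log2 change in a strain's relative abundance, minus that of the reference, divided by the number of generations recovers its relative growth rate. The function names, strain names and rates are hypothetical.

```python
import math
from typing import Dict


def grow_pool(initial: Dict[str, float], rates: Dict[str, float],
              generations: float) -> Dict[str, float]:
    """Relative abundance of each strain after competitive growth in one pool.

    Assumed model: strain s completes rates[s] doublings per reference-strain
    generation, so its absolute abundance becomes
    initial[s] * 2 ** (rates[s] * generations).
    """
    grown = {s: initial[s] * 2 ** (rates[s] * generations) for s in initial}
    total = sum(grown.values())
    # Only relative abundances are observable in a pooled measurement.
    return {s: v / total for s, v in grown.items()}


def relative_fitness(before: Dict[str, float], after: Dict[str, float],
                     generations: float, reference: str) -> Dict[str, float]:
    """Growth rate of each strain relative to a reference strain, estimated
    from relative abundances measured before and after selection."""
    ref_change = math.log2(after[reference] / before[reference])
    return {s: (math.log2(after[s] / before[s]) - ref_change) / generations
            for s in before}
```

With equal starting abundances, rates of 1.0 and 0.8 and 10 generations, the slower strain's estimate comes out at about -0.2, the rate difference built into the model; a strain growing like the reference scores about 0.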
[Figure 1 schematic. Panels a–c: deletion cassette (CP, UPTAG, CP, KanMX, CP, DNTAG, CP) replacing the ORF from start to stop codon; selection; PCR amplification; hybridization.]
Figure 1 | Assaying molecular barcode tags in yeast pools. a | Start-to-stop-codon deletion by double homologous
recombination with deletion cassettes. The 45-base pair (bp) sequences at each end of the PCR-amplified deletion cassettes are
identical to those found upstream and downstream of the targeted gene. On transformation, double homologous recombination
integrates the PCR product into the chromosome, displacing the target gene and generating a precise start-to-stop-codon
deletion. A dominant drug-resistance marker (KanMX) that is part of the PCR product serves to select for the integration event.
To allow pooling of the deletion strains and a parallel analysis, each PCR cassette also contains two unique 20-bp DNA
sequence tags that serve as molecular barcodes to uniquely identify each strain10 (UPTAG, 5′-end (upstream) tag sequence;
DNTAG, 3′-end (downstream) tag sequence). The tag sequences, flanked by two common priming (CP) sites, were designed to
be sufficiently distinct from one another to avoid cross-hybridization and to have optimal hybridization properties. b | Competitive growth of the deletion pool,
genomic DNA extraction, PCR amplification and array hybridization. Because tag sets are flanked by two CPs, all UPTAGs
or DNTAGs can be amplified in a single PCR reaction from genomic DNA that is isolated from the mixed pool. The tags
can be quantitatively measured by hybridizing the PCR products to a high-density oligonucleotide array that contains the
complementary tag sequences, so that the signal from each tag can be distinguished. The intensity of each tag signal is
proportional to the abundance of the corresponding strain in the culture. c | Measurement of quantitative deletion phenotypes in pools during an
experimental perturbation. By comparing tag signal intensities over the course of selection, the change in abundance of each strain (its fitness)
can be calculated for all strains in parallel.
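The fitness calculation in panel c can be sketched as follows, under assumptions that go beyond the caption: tag intensities are background-subtracted and proportional to strain abundance, each tag set is normalized to its total signal on the array, and fitness is reported as the log2 change in relative abundance averaged over the two barcodes. This is an illustrative sketch, not the consortium's analysis pipeline; the names `log2_fitness` and `TagSignals` are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical layout: strain name -> (UPTAG signal, DNTAG signal),
# both background-subtracted.
TagSignals = Dict[str, Tuple[float, float]]


def log2_fitness(before: TagSignals, after: TagSignals) -> Dict[str, float]:
    """Log2 change in each strain's relative abundance over a selection,
    averaged over its two barcodes (UPTAG and DNTAG)."""
    def tag_totals(signals: TagSignals) -> Tuple[float, float]:
        # Normalize each tag set separately, because UPTAGs and DNTAGs may be
        # amplified and measured separately and differ in overall signal.
        return (sum(up for up, _ in signals.values()),
                sum(dn for _, dn in signals.values()))

    up_before, dn_before = tag_totals(before)
    up_after, dn_after = tag_totals(after)
    fitness: Dict[str, float] = {}
    for strain in sorted(before.keys() & after.keys()):
        ub, db = before[strain]
        ua, da = after[strain]
        if min(ub, db, ua, da) <= 0:
            continue  # a tag without signal gives no usable ratio
        up_change = math.log2((ua / up_after) / (ub / up_before))
        dn_change = math.log2((da / dn_after) / (db / dn_before))
        fitness[strain] = (up_change + dn_change) / 2
    return fitness
```

Averaging the two barcodes in log space is one simple way to use the redundancy of having two tags per strain; a strain that is depleted during selection receives a negative value.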
One of the strengths of the systematic deletion analysis in yeast is that deletion strains can be pooled for parallel analysis. This is possible because the deletion cassettes contain unique 20-base-pair (bp) DNA sequence tags that serve as molecular barcodes to identify each strain10.

Despite their power, genome-wide deletions generated by homologous recombination have yet to be reported in organisms other than yeast. In many organisms in which a homologous recombination system has been developed, such as mouse11, Drosophila12 and sheep13, low targeting efficiencies and the practical demands of maintaining deletion collections are important concerns.

A targeted deletion approach is also not without caveats: generally only the annotated open reading frames (ORFs) in a genome are targeted; the phenotypes of overlapping ORFs cannot be easily distinguished; transformation events might have introduced secondary mutations14; essential genes are inviable as homozygous deletions; and functionally redundant genes might not show a detectable phenotype when deleted.

Apart from complete knock-outs achieved by gene deletion, two other approaches that are more easily applied to other organisms (insertional mutagenesis and RNAi) might generate partial knock-outs. Partial knock-outs are less suited to interpreting fitness effects, because the molecular nature of the mutation, its …
… parts of the same gene achieve different levels of reduction and secondary off-target effects24. Two studies have further shown that siRNAs mediate an interferon response in mammalian cells as a secondary effect25,26. Caution must therefore be exercised before attributing a particular response to the targeted gene. Nevertheless, it is hoped that these issues can be addressed, as RNA-silencing methodologies are likely to be applied genome-wide in many organisms owing to their ease of use.

Gene-expression profiling. Gene-expression profiling is the most widely used functional genomic technology today, in part because it was one of the earliest to be developed and in part because it achieves high throughput in a single tube assay.

Applications of gene-expression studies can be divided into three principal areas: those that generate signature profiles, those that study transcription and its regulation, and those that determine the function of gene products. Each raises different biological concerns. In the first two applications the measurement of gene expression is closely related to the biological goal, whereas more assumptions are required to infer function from expression differences in the third type of study.

Perhaps the most powerful application of gene-expression data is its use as a signature profile, which can serve as a detailed molecular phenotype. Although in this first application the identity of the individual genes that change expression is irrelevant, the array features must still be accurately assigned to genes (BOX 3). The profile across thousands of genes gives a distinctive pattern for a sample. Such patterns have been used in model organisms to classify genetic mutants by similarities in profile27 and to evaluate secondary effects of drug treatments28. However, the most powerful applications of signature profiling are likely to be in characterizing human disease populations (see below).

Another application of gene expression is to study mRNA transcription and its regulation. Gene-expression data can be used to search for regulatory sequences by looking for common sequence elements in the upstream regions of genes that have similar expression profiles29–31.
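The search for shared upstream elements can be illustrated with a deliberately simple k-mer count. This sketch is an assumption, not one of the cited methods: it compares the fraction of upstream sequences in a co-expressed cluster that contain each k-mer with the same fraction in a background set, and ranks k-mers by a smoothed ratio. The sequences, `k`, `pseudocount` and function names are hypothetical; practical motif finders also model degenerate positions, reverse complements and statistical significance.

```python
from collections import Counter
from typing import List, Set, Tuple


def distinct_kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in one upstream sequence, so each sequence counts once per k-mer."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sequences_containing(seqs: List[str], k: int) -> Counter:
    """For each k-mer, the number of sequences in seqs that contain it."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(distinct_kmers(s, k))
    return counts


def enriched_kmers(cluster: List[str], background: List[str], k: int = 6,
                   pseudocount: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank k-mers seen in the co-expressed cluster by the ratio of the smoothed
    fraction of cluster sequences containing them to the same fraction in the background."""
    in_cluster = sequences_containing(cluster, k)
    in_background = sequences_containing(background, k)
    n_c, n_b = len(cluster), len(background)
    scores = []
    for kmer, hits in in_cluster.items():
        frac_c = (hits + pseudocount) / (n_c + 2 * pseudocount)
        frac_b = (in_background[kmer] + pseudocount) / (n_b + 2 * pseudocount)
        scores.append((kmer, frac_c / frac_b))
    # Highest ratio first; ties broken alphabetically for reproducible output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]
```

Counting each k-mer at most once per sequence keeps a single repeat-rich promoter from dominating the ranking, which matters when the question is whether an element is shared across co-expressed genes.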
To find the target genes of specific transcription factors, one can identify genes that are underexpressed as a result of transcription-factor deletions or overexpressed as a result of transcription-factor gain-of-function mutations32. A complementary approach identifies the DNA that is bound to transcription factors by co-immunoprecipitation33. Integrating the analysis of DNA bound to transcription factors with the co-expression of genes has revealed a complex network of interactions, which explains the difficulty of distinguishing primary from secondary effects in the analysis of transcription-factor mutations33. Together, these data help to formulate hypotheses about transcription factors and their targets. The approaches can be complemented by knock-out analysis of predicted pathway members, with the aim of dissecting the regulatory circuitry and signalling pathways that regulate transcription.

Box 3 | Technical concerns about microarrays
Microarray analysis can detect the presence of individual target sequences in complex mixtures because it exploits hybridization specificity. Nevertheless, important technical concerns are associated with the different types of microarray platform. The two most widely used types of high-density platform can be distinguished by how probes are placed onto the array surface. In the first type, oligonucleotide probes are synthesized directly on the glass surface90,91. Photolithography and photosensitive oligonucleotide synthesis chemistry90 are the most common processes used, and they achieve a density of more than 1,000,000 spots/cm2. In the second type, represented mainly by the cDNA microarray92, probe synthesis is separated from array manufacture92–94: probes are synthesized and then mechanically spotted, or printed, in an arrayed format at a density of ~10,000 spots/cm2.
The method of array manufacture places different demands on quality control. Probes on synthesized microarrays are short and are synthesized directly on the glass surface, with a sequence database directing their synthesis. Because synthesis is highly reproducible, absolute levels of expression can be estimated, and because the probes are short, SNPs can be detected. Probes can be specifically designed to distinguish splice variants and members of gene families. However, short probes require accurate databases, as errors in sequence databases translate into errors in the probe sequences on the arrays. As sequences in databases are updated and sequence quality improves, the chance of such errors decreases. Importantly, because there can be multiple probes for each gene, the array designs provide internal redundancy.
For printed microarrays, probe generation is separated from array synthesis, so the storage and tracking of probe samples is crucial. Errors occur when probes are spotted from microtitre plates that are wrong, out of order, in the wrong orientation or contaminated. One study reported that only 62% of DNA samples from plates used for microarray spotting were of the correct identity95. Such errors would be further compounded by errors during and after printing96. Printing causes variation in spot size and probe concentration, which is at present addressed by co-hybridizing a reference sample to the same array. The choice of reference is crucial for data interpretation because it affects the ratio calculation, and the use of different reference samples by different laboratories makes it difficult to compare their experiments44.
Another problem is distinguishing splice variants and members of gene families: the long sequences on cDNA microarrays allow cross-hybridization. This is an important limitation because the molecular complexity of humans arises, to some extent, from differential splicing97. Array designs that can distinguish between splice variants need to be implemented98.

The application that is at present least understood is the use of gene expression to determine gene-product function. Most mRNAs are not functional themselves; they are intermediates that transmit information from the genome to the proteome. Inferring function from mRNA levels is therefore indirect and rests on several assumptions. One is that evolutionary selection has been tight enough to ensure that mRNAs are produced and present only when the corresponding gene products are needed. This assumes that no regulatory changes occur at the protein level that are not reflected by changes in the amount of mRNA. A second assumption is that the transcription of one gene is independent of that of another, and that there is no competition for resources.

These assumptions are questionable. A cell functions as a unit, so transcription-factor availability and protein concentrations probably depend on their use in other parts of the cell34. It is also evident that regulatory changes occur during mRNA translation, post-translational modification and protein degradation. In fact, experiments show that the correlation between
mRNA changes and changes in protein level in a cell is weak35–37. Although a genome-wide comparison is still missing, mainly because of the technical difficulty of accurately measuring the levels of many proteins under multiple conditions38,39, the trend is apparent: one cannot assume a correlation at the level of individual genes. In addition, the poor correlation between changes in gene expression and the fitness effects of gene products8,9,40–42 (BOX 2) indicates that, in yeast, many more genes change expression between any two experimental conditions than affect phenotype when deleted during the same transition. Conversely, these studies point out that most genes with a deletion phenotype in yeast do not show an expression change under the same perturbations.

Two questions arise when we interpret expression differences: if gene expression changes, or remains unchanged, between two conditions, how reliable is this result, and what is its biological significance? Whether the result is reliable depends largely on the technology used to measure the expression difference and on the accuracy of the experimental execution; these concerns exemplify the technical challenges associated with high-throughput approaches. In the case of gene-expression profiling, they indicate that more rigorous quality control and careful experimental design are needed43,44 (BOX 3). This technical concern is independent of the difficulty of interpreting expression changes functionally: that difficulty remains even for the most differentially expressed genes, for which different platforms often agree.

Genetic mapping of quantitative trait loci. A third principal area of high-throughput genomics is the identification of the genetic factors that underlie complex traits. Developments in this field can be divided into three areas that correspond to the steps in dissecting the genetic basis of complex traits: defining genetic intervals, identifying candidate genes and verifying an allele's contribution to the phenotype.

Genetic mapping has been successful in finding important genes for some complex diseases, such as asthma45. However, in most cases genetic mapping has been problematic1, as evidenced by reports that previously mapped intervals failed to withstand significance tests in subsequent studies46.

A few recently developed high-throughput genotyping technologies show promise for testing the feasibility of high-resolution association studies. The rationale of these studies is to test thousands of polymorphisms in carefully selected populations to identify sequence variants that are more frequent in individuals with one phenotype than in individuals with other phenotypes. The technologies can achieve throughputs of tens of thousands of markers (BOX 4). Unfortunately, genotyping the 50,000–1,000,000 markers that statistical predictions have proposed as the minimum needed47 still represents a tremendous challenge, especially because the markers need to be genotyped in large numbers of samples. The challenge is to develop methods that can accurately genotype thousands of markers at low cost when only a small amount of sample is available. One technology uses molecular barcodes to genotype directly from genomic DNA48. It uses the same concept of multiplexing as the yeast deletion project: after the genotyping reactions have occurred, the signal is amplified using common primer sequences (BOX 4).

Even when map intervals can be identified, the causative variants remain to be found. Several approaches have been taken to locate the underlying genes within identified intervals. A systematic approach involves identifying all sequence variants in these intervals. This is challenging because sequencing entire intervals is, in most cases, impractical given their large size and, often, the limited DNA samples. Furthermore, mutations that are present in a heterozygous state are difficult to identify. A technology that uses the bacterial mismatch-repair system holds promise for focusing attention on cloned fragments that contain a polymorphism49,50 (BOX 5). The approach can be combined with molecular barcodes for high-throughput mutation scanning.

Rather than scanning an interval for all polymorphisms, one can prioritize candidate genes. Such approaches can be applied either to genes in mapped intervals or, in the absence of positional information, to candidate genes genome-wide. One study used data mining of the published literature to establish connections between disease terms and functional terms associated with genes51. The approach scores a candidate on the basis of how frequently it can be connected to the disease description by terms that co-appear in the literature (using PubMed). Another approach is to use functional genomic data from model organisms to rank human candidates according to functional information about their orthologues41,52.

Gene-expression profiling has also been proposed as a general method for identifying candidate genes53. One study integrated a disease interval with experimental expression and proteomics data to identify a mutation in patients with Leigh syndrome52. Another study found a quantitative trait locus (QTL) gene in hypertensive rats on the basis of the absence of expression of the mutant gene54; in that case, expression was decreased because of a partial deletion of the ORF. However, for traits that are not caused by deletions that produce expression differences, or by mutations that affect mRNA levels, expression analysis might not identify the causative variants.

A recent review47 assessed the types of mutation found in Mendelian diseases, enabling inferences about the implications for disease-gene identification in complex traits. Surprisingly, regulatory changes are found in fewer than 1% of the more than 1,400 known Mendelian disease genes (see Human Gene Mutation Database in online links box). The vast majority of mutations are missense or nonsense changes in the ORFs (58%), insertions or deletions (30%) and splicing variants (10%). If similar patterns hold for complex traits, it might be expected that genes identified
by altered expression will primarily be modulators of the phenotype or will be involved in secondary effects, rather than being the causative variants themselves.

Meeting the challenges
The future of new genomic technologies is crucially linked to bringing them into the laboratories of individual investigators, because that is where the key biological expertise required for a biological breakthrough lies. Therefore, the most promising technologies are those that allow high-throughput biology to be carried out in an individual laboratory, instead of in genome centres and well-equipped, well-financed institutes. For this purpose, approaches that achieve high throughput by performing one method many thousands of times in parallel should be distinguished from those that achieve high throughput in a single tube assay (FIG. 2). The latter not only allow the transfer of high throughput to individual laboratories but also allow a further scale-up, by several orders of magnitude, through the use of robotics or high-throughput equipment. This advance also enables the next cycle: the reduction of multiple parallel high-throughput assays back into a single tube. These development cycles allow unprecedented exponential increases in throughput and decreases in cost that would define a paradigm shift in biological interrogation.

Molecular barcoding is among the most promising technologies for miniaturizing high-throughput approaches into single tube assays. Through a combination of molecular barcodes and a microarray-based detection system, the yeast-deletion approach has achieved an unparalleled level of throughput, and its many applications are a testament to its power (BOX 2). A molecular barcode assay might simply consist of attaching tags to the biological entities to be interrogated (for example, knock-out strains), collecting mixed tag pools after selection, performing a single PCR, labelling the product and hybridizing it to a microarray to obtain the relative abundance of all biological entities in the selected pool. The primary advantage of the barcode concept is its versatility: it can be applied to other knock-out approaches, such as insertional mutagenesis and RNAi, as well as to other biological assays (see below).
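Attaching tags is useful only if the tags can be told apart on the array. One hypothetical way to design such a set is a greedy random search: accept a candidate 20-mer only if its GC fraction lies in a chosen window (aiming at broadly similar hybridization behaviour) and it differs from every previously accepted tag at a minimum number of positions. The 20-bp length follows the text; the Hamming-distance criterion, the GC window, the parameter values and the function names are assumptions, not the published design rules.

```python
import random
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def design_tags(n: int, length: int = 20, min_distance: int = 8,
                gc_range: Tuple[float, float] = (0.4, 0.6), seed: int = 0,
                max_tries: int = 100_000) -> List[str]:
    """Greedily pick n random tags that pairwise differ at >= min_distance
    positions and whose GC fraction lies within gc_range."""
    rng = random.Random(seed)  # seeded for a reproducible tag set
    tags: List[str] = []
    for _ in range(max_tries):
        if len(tags) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not (gc_range[0] <= gc_fraction(candidate) <= gc_range[1]):
            continue  # reject tags with extreme base composition
        if all(hamming(candidate, t) >= min_distance for t in tags):
            tags.append(candidate)
    if len(tags) < n:
        raise RuntimeError("could not find enough tags; relax the constraints")
    return tags
```

A greedy search is simple but not optimal, and this sketch does not check tags against the genome or the common priming sites, which a practical design would probably also need to consider.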
Interfering with gene function. Although the deletion approach for yeast is high throughput, its power can be increased by complementing the assay with further approaches; for example, temporally induced knock-outs using repressible promoters55, or targeting proteins for proteolysis56, to study essential genes that are inviable as homozygous knock-outs. Furthermore, to uncover molecular differences among mutants when no growth differences are apparent, the fitness data of knock-outs could be complemented with metabolite profiling57,58 and expression profiling of deletion strains27. Finally, the systematic construction of double deletions, which has been initiated to assess synthetic lethality59,60, could help to characterize functionally redundant genes.

To ensure that molecular barcodes are associated with the right ORFs, Shoemaker et al.10 incorporated the barcode tags into the primers that contain the target-specific homologous sequences, making it unlikely that a barcode sequence would become associated with the deletion of a different ORF. This does not, however, guarantee that all barcode sequences are free of errors. DNA-primer synthesis might have introduced mutations into some of the barcode sequences, which would affect their hybridization signal on microarrays. Because each mutant carries two molecular barcodes, it is unlikely that both contain mutations; in addition, comparing samples before and after selection controls for such mistakes. Nevertheless, the barcodes are being sequenced, and the effect of any mutations on hybridization should be corrected by remaking the strains or by introducing the corresponding changes into the microarray probes.

To advance the insertional mutagenesis approach, barcodes could, in theory, also be incorporated into each insertion sequence, enabling a pooled knock-out analysis. Such pooled approaches using transposons have been carried out in bacteria61,62. Technically, the RNAi approach might also be aided by molecular barcodes, which could in theory be integrated into the plasmids or the genome-integration cassettes that express dsRNA molecules. Such an advance might simplify mutant tracking and enable pooled analysis in organisms for which this is practical.
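The correction step described above (sequencing the barcodes, then remaking strains or changing the probes) can be sketched as a comparison of designed and sequenced tags. In this hypothetical sketch, every tag that differs from its design gets a replacement probe, taken here as the reverse complement of the sequenced tag (an assumption about probe orientation), and strains in which both tags are mutated, the case that the two-barcode design makes unlikely, are listed for remaking. Only substitutions are handled, every designed strain is assumed to have been sequenced, and all names and the data layout are illustrative.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    # Substitutions only; insertions or deletions would need an alignment.
    if len(a) != len(b):
        raise ValueError("this sketch handles equal-length tags only")
    return sum(x != y for x, y in zip(a, b))


def audit_barcodes(designed: Dict[str, Tuple[str, str]],
                   sequenced: Dict[str, Tuple[str, str]]):
    """Compare designed and sequenced (UPTAG, DNTAG) pairs for each strain.

    Returns (updated_probes, both_mutated): updated_probes maps
    (strain, tag_name) to a replacement probe for each tag that differs from
    its design; both_mutated lists strains in which neither tag matches.
    """
    updated: Dict[Tuple[str, str], str] = {}
    both_mutated: List[str] = []
    for strain, design_pair in designed.items():
        observed_pair = sequenced[strain]
        n_mutated = 0
        for name, expected, observed in zip(("UPTAG", "DNTAG"), design_pair, observed_pair):
            if mismatches(expected, observed) > 0:
                n_mutated += 1
                # Assumed orientation: probe = reverse complement of the tag.
                updated[(strain, name)] = reverse_complement(observed)
        if n_mutated == 2:
            both_mutated.append(strain)
    return updated, both_mutated
```

Keeping the per-tag corrections separate from the strain-level list mirrors the two remedies in the text: changing probes fixes a single mutated tag, whereas a strain with two mutated tags is a candidate for remaking.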
EPISTASIS To speed the dissection of quantitative traits, new Technological improvements need to maximize the
An interaction between non- tools, such as reciprocal hemizygosity, genome-wide potential offered by functional genomics. False predic-
allelic genes, such that one gene marker mapping and mutation-scanning methods, need tion rates of high-throughput approaches need to be
masks, interferes with or
to be more broadly applied73,74. The ideal approaches eliminated, particularly as we advance towards applying
enhances the effect of the other
gene. should be applicable to different genetic backgrounds, as functional genomics in the clinical setting for diagnostics
the alleles identified in one background might not and personalized medicine. Improvements in data analy-
necessarily have the same effects in another background sis, comparison and the integration of approaches that
or under different environmental conditions75. Further- combine different measurements for a genome need to
more, allele frequencies are likely to range from common be improved to generate a global picture and to move
to rare72, highlighting the demand for approaches that towards generating refined models of cellular processes.
make no prior assumptions about allele frequency in populations. Alleles are also likely to have additive and EPISTATIC effects, requiring tools that can analyse single alleles as well as higher-order combinations of alleles.

Classification of phenotypes also deserves attention76. When samples are drawn from populations rather than from controlled crosses, phenotypic heterogeneity can reduce the ability to detect an association between a marker and a disease. Phenotypic classification might be aided by molecular phenotypes based on expression profiles, signature patterns and other molecular parameters. By integrating genetics with the high-throughput tools of functional genomics in this way, it might be possible to advance the dissection of complex traits and gain deeper insights into the interaction between genes and the environment.

Conclusions

Functional genomics has changed the way biology is done. Yet the field is still in its infancy in terms of detailing the complexity that underlies biological systems, such as the network of genetic regulation, protein interactions and biochemical reactions that make up a cell. As the applications illustrate, systematic knock-out analyses in multiple model organisms will help to define different cellular components and to describe the conservation of biological processes during evolution. Detailed analysis of mRNA regulation will help to clarify the functional significance of gene expression. Furthermore, examples of the genetic basis of complex traits in model organisms will help to formulate more rigorous hypotheses about what to expect in higher organisms and promise to provide better approaches for dissecting human disorders.

Importantly, to accelerate discoveries, high-throughput biology needs to be brought into individual laboratories, because that is where the biological expertise lies. The use of molecular barcodes is one example of an innovation that increases throughput through miniaturization while reducing costs. Incorporating such technology into other assays would provide the gain in efficiency needed to make high-throughput experimentation an integral part of any individual investigator's laboratory. It should also be possible to combine different whole-genome measurements, such as gene expression, genotyping, knock-out analysis, protein interactions, protein levels and splicing, into a single assay (a further step towards miniaturization), as proposed in FIG. 2. These developments would in turn maximize biological discovery and take functional genomics to the next level. Similar approaches could also revolutionize medical diagnostics, leading to better health care at lower cost.

Although we believe that high-throughput experimentation will become more widespread, we expect single-gene studies to remain essential. They continue to help to verify and evaluate high-throughput data sets: high-throughput technologies are often not sensitive enough to obtain accurate information for every gene in a genome, and in some cases detailed analysis of individual genes is needed to link the global picture to individual discoveries that involve small sets of genes. Nevertheless, the integration of functional genomics into individual laboratories will provide the most accurate test of the power of combining single-gene studies with high-throughput approaches to make biological discovery faster and more efficient.
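The distinction drawn above between additive and epistatic allele effects can be made concrete for the simplest case of two biallelic loci. The sketch below assumes hypothetical haploid segregants (for example, from a yeast cross) with alleles coded 0/1 by parental origin; the data, the coding and the function names are illustrative assumptions, not a published method. With this coding, the interaction contrast equals the interaction coefficient of a saturated two-locus linear model; higher-order combinations of alleles would add further contrasts of the same kind, and a real analysis would also need a significance test.

```python
from statistics import mean

# Hypothetical data: (allele at locus A, allele at locus B, phenotype) for
# haploid segregants; alleles are coded 0/1 by parental origin.
SEGREGANTS = [
    (0, 0, 10), (0, 0, 12),
    (1, 0, 11), (1, 0, 11),
    (0, 1, 12), (0, 1, 10),
    (1, 1, 20), (1, 1, 22),
]


def genotype_means(records):
    """Average the phenotype within each two-locus genotype class."""
    groups = {}
    for a, b, value in records:
        groups.setdefault((a, b), []).append(value)
    return {genotype: mean(values) for genotype, values in groups.items()}


def additive_effects(m):
    """Effect of switching each locus from allele 0 to allele 1,
    averaged over the two backgrounds at the other locus."""
    effect_a = ((m[1, 0] - m[0, 0]) + (m[1, 1] - m[0, 1])) / 2
    effect_b = ((m[0, 1] - m[0, 0]) + (m[1, 1] - m[1, 0])) / 2
    return effect_a, effect_b


def interaction_contrast(m):
    """Epistasis on the additive scale: the effect of locus B in the
    allele-1 background of locus A minus its effect in the allele-0
    background. It is zero when the two loci act purely additively."""
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])


means = genotype_means(SEGREGANTS)
print(additive_effects(means))      # (5.0, 5.0)
print(interaction_contrast(means))  # 10
```

In this toy example neither allele changes the phenotype on its own, yet the double combination does, so a single-locus scan would miss an effect that the interaction contrast captures.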
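The concern above about phenotypic heterogeneity in population samples can be illustrated with a standard allelic association test. In this hypothetical sketch, a risk allele is enriched in one disease subtype only; when half of the "cases" are phenocopies whose allele frequency matches that of the controls, the case allele frequency is diluted and the test statistic falls. All counts are invented, and the Pearson 2x2 chi-square is used as a generic test, not as the method of any study cited here.

```python
import math


def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table
    of allele counts in cases versus controls. Returns (chi2, p_value)."""
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical study: 100 cases and 100 controls, i.e. 200 alleles per group.
# Controls carry the risk allele at frequency 0.20.
CONTROLS = (40, 160)

# Homogeneous cases: all share one aetiology, risk-allele frequency 0.35.
homogeneous = allelic_chi_square(70, 130, *CONTROLS)

# Heterogeneous cases: 50 cases of that subtype (35 risk alleles out of 100)
# mixed with 50 phenocopies that resemble controls (20 out of 100).
heterogeneous = allelic_chi_square(35 + 20, 65 + 80, *CONTROLS)

print(homogeneous)    # chi2 about 11.3, P < 0.001
print(heterogeneous)  # chi2 about 3.1, P about 0.08
```

With these counts the homogeneous sample is clearly significant, whereas the diluted sample falls below the 3.84 threshold for P = 0.05 at 1 df, although the underlying genetic effect in the true subtype is unchanged. Separating the subtypes first, for instance by a molecular phenotype, would restore the homogeneous comparison in this toy setting.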
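Molecular barcodes raise throughput because every mutant in a pooled culture carries a unique tag, so the relative fitness of many strains can be read out from a single sample. The sketch below scores each strain by the log2 change in its share of the total barcode signal between the start and the end of pooled growth. The strain names, the counts and the pseudocount are hypothetical, the signal could be normalized array hybridization intensities or read counts (an assumption here), and published barcode studies use their own normalization and statistics.

```python
import math


def barcode_log_ratios(start_counts, end_counts, pseudocount=1.0):
    """log2 change in relative abundance of each barcoded strain in a pool.

    start_counts / end_counts map strain name -> barcode signal before and
    after pooled growth. Each signal is divided by its sample total, so a
    change in overall signal is not mistaken for a fitness effect; the
    pseudocount avoids log(0) for strains that drop out entirely.
    """
    start_total = sum(start_counts.values())
    end_total = sum(end_counts.values())
    ratios = {}
    for strain, n_start in start_counts.items():
        n_end = end_counts.get(strain, 0)
        frac_start = (n_start + pseudocount) / start_total
        frac_end = (n_end + pseudocount) / end_total
        ratios[strain] = math.log2(frac_end / frac_start)
    return ratios


# Hypothetical pool of three barcoded deletion strains.
start = {"del_A": 1000, "del_B": 1000, "del_C": 1000}
end = {"del_A": 1600, "del_B": 1300, "del_C": 100}

scores = barcode_log_ratios(start, end)
# List strains from most depleted (candidate growth defect) to most enriched.
for strain, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{strain}\t{score:+.2f}")
```

Because all strains share one culture and one readout, the cost per strain falls as the pool grows, which is the miniaturization argument made in the text.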
1. Risch, N. J. Searching for genetic determinants in the new millennium. Nature 405, 847–856 (2000).
2. Flint, J. & Mott, R. Finding the molecular basis of quantitative traits: successes and pitfalls. Nature Rev. Genet. 2, 437–445 (2001).
3. Collins, F. S., Green, E. D., Guttmacher, A. E. & Guyer, M. S. A vision for the future of genomics research. Nature 422, 835–847 (2003).
4. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
5. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003).
6. Cliften, P. et al. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301, 71–76 (2003).
7. Stockwell, B. R. Chemical genetics: ligand-based discovery of gene function. Nature Rev. Genet. 1, 116–125 (2000).
8. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002).
9. Winzeler, E. A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).
10. Shoemaker, D. D., Lashkari, D. A., Morris, D., Mittmann, M. & Davis, R. W. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nature Genet. 14, 450–456 (1996).
The first use of PCR-amplifiable molecular barcodes for high-throughput parallel biology.
11. Thomas, K. R. & Capecchi, M. R. Site-directed mutagenesis by gene targeting in mouse embryo-derived stem cells. Cell 51, 503–512 (1987).
12. Rong, Y. S. & Golic, K. G. Gene targeting by homologous recombination in Drosophila. Science 288, 2013–2018 (2000).
13. McCreath, K. J. et al. Production of gene-targeted sheep by nuclear transfer from cultured somatic cells. Nature 405, 1066–1069 (2000).
14. Hughes, T. R. et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nature Genet. 25, 333–337 (2000).
15. Ross-Macdonald, P. et al. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402, 413–418 (1999).
16. Alonso, J. M. et al. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653–657 (2003).
A large-scale insertional mutagenesis screen in Arabidopsis.
17. Fire, A. et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811 (1998).
18. Kamath, R. S. et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237 (2003).
19. Kennerdell, J. R. & Carthew, R. W. Heritable gene silencing in Drosophila using double-stranded RNA. Nature Biotechnol. 18, 896–898 (2000).
20. Nasevicius, A. & Ekker, S. C. Effective targeted gene ‘knockdown’ in zebrafish. Nature Genet. 26, 216–220 (2000).
21. Wianny, F. & Zernicka-Goetz, M. Specific interference with gene function by double-stranded RNA in early mouse development. Nature Cell Biol. 2, 70–75 (2000).
22. Ogita, S., Uefuji, H., Yamaguchi, Y., Koizumi, N. & Sano, H. RNA interference: producing decaffeinated coffee plants. Nature 423, 823 (2003).
23. Elbashir, S. M. et al. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494–498 (2001).
24. Jackson, A. L. et al. Expression profiling reveals off-target gene regulation by RNAi. Nature Biotechnol. 21, 635–637 (2003).
25. Bridge, A. J., Pebernard, S., Ducraux, A., Nicoulaz, A. L. & Iggo, R. Induction of an interferon response by RNAi vectors in mammalian cells. Nature Genet. 34, 263–264 (2003).
26. Sledz, C. A., Holko, M., de Veer, M. J., Silverman, R. H. & Williams, B. R. Activation of the interferon system by short-interfering RNAs. Nature Cell Biol. 5, 834–839 (2003).
27. Hughes, T. R. et al. Functional discovery via a compendium of expression profiles. Cell 102, 109–126 (2000).
The use of expression signature profiles on yeast knock-out mutants for achieving functional groupings.
28. Marton, M. J. et al. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nature Med. 4, 1293–1301 (1998).
29. Pilpel, Y., Sudarsanam, P. & Church, G. M. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genet. 29, 153–159 (2001).
30. Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nature Genet. 31, 370–377 (2002).
31. Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genet. 34, 166–176 (2003).
32. Holstege, F. C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).
33. Lee, T. I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
34. Brenner, S. Sillycon valley fever. Curr. Biol. 9, R671 (1999).
35. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934 (2001).
36. Griffin, T. J. et al. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell. Proteomics 1, 323–333 (2002).
37. Washburn, M. P. et al. Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 100, 3107–3112 (2003).
38. Patterson, S. D. & Aebersold, R. H. Proteomics: the first decade and beyond. Nature Genet. 33 (Suppl), 311–323 (2003).
39. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737–741 (2003).
40. Birrell, G. W. et al. Transcriptional response of Saccharomyces cerevisiae to DNA-damaging agents does not identify the genes that protect against these agents. Proc. Natl Acad. Sci. USA 99, 8778–8783 (2002).
41. Steinmetz, L. M. et al. Systematic screen for human disease genes in yeast. Nature Genet. 31, 400–404 (2002).
42. Deutschbauer, A. M., Williams, R. M., Chu, A. M. & Davis, R. W. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 99, 15530–15535 (2002).
43. Steinmetz, L. M. & Davis, R. W. High-density arrays and insights into genome function. Biotechnol. Genet. Eng. Rev. 17, 109–146 (2000).
44. Yang, Y. H. & Speed, T. Design issues for cDNA microarray experiments. Nature Rev. Genet. 3, 579–588 (2002).
45. Van Eerdewegh, P. et al. Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature 418, 426–430 (2002).
46. Sklar, P. et al. Association analysis of NOTCH4 loci in schizophrenia using family and population-based controls. Nature Genet. 28, 126–128 (2001).
47. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genet. 33 (Suppl), 228–237 (2003).
48. Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nature Biotechnol. 21, 673–678 (2003).
49. Faham, M. & Cox, D. R. A novel in vivo method to detect DNA sequence variation. Genome Res. 5, 474–482 (1995).
50. Faham, M., Baharloo, S., Tomitaka, S., DeYoung, J. & Freimer, N. B. Mismatch repair detection (MRD): high-throughput scanning for DNA variations. Hum. Mol. Genet. 10, 1657–1664 (2001).
51. Perez-Iratxeta, C., Bork, P. & Andrade, M. A. Association of genes to genetically inherited diseases using data mining. Nature Genet. 31, 316–319 (2002).
52. Mootha, V. K. et al. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc. Natl Acad. Sci. USA 100, 605–610 (2003).
53. Wayne, M. L. & McIntyre, L. M. Combining mapping and arraying: an approach to candidate gene identification. Proc. Natl Acad. Sci. USA 99, 14903–14906 (2002).
54. Aitman, T. J. et al. Identification of Cd36 (Fat) as an insulin-resistance gene causing defective fatty acid and glucose metabolism in hypertensive rats. Nature Genet. 21, 76–83 (1999).
55. Belli, G., Gari, E., Aldea, M. & Herrero, E. Functional analysis of yeast essential genes using a promoter-substitution cassette and the tetracycline-regulatable dual expression system. Yeast 14, 1127–1138 (1998).
56. Kanemaki, M., Sanchez-Diaz, A., Gambus, A. & Labib, K. Functional proteomic identification of DNA replication proteins by induced proteolysis in vivo. Nature 423, 720–725 (2003).
57. Raamsdonk, L. M. et al. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nature Biotechnol. 19, 45–50 (2001).
58. Allen, J. et al. High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nature Biotechnol. 21, 692–696 (2003).
59. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
60. Ooi, S. L., Shoemaker, D. D. & Boeke, J. D. DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray. Nature Genet. 35, 277–286 (2003).
61. Hensel, M. et al. Simultaneous identification of bacterial virulence genes by negative selection. Science 269, 400–403 (1995).
62. Karlyshev, A. V. et al. Application of high-density array-based signature-tagged mutagenesis to discover novel Yersinia virulence-associated genes. Infect. Immun. 69, 7810–7819 (2001).
63. Heller, R. A. et al. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc. Natl Acad. Sci. USA 94, 2150–2155 (1997).
One of the first applications of gene-expression profiling for disease sample classification.
64. Perou, C. M. et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl Acad. Sci. USA 96, 9212–9217 (1999).
65. Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
66. Bohen, S. P. et al. Variation in gene expression patterns in follicular lymphoma and the response to rituximab. Proc. Natl Acad. Sci. USA 100, 1926–1930 (2003).
67. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA 95, 14863–14868 (1998).
68. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. Systematic determination of genetic network architecture. Nature Genet. 22, 281–285 (1999).
69. Wu, L. F. et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nature Genet. 31, 255–265 (2002).
70. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
71. Glazier, A. M., Nadeau, J. H. & Aitman, T. J. Finding genes that underlie complex traits. Science 298, 2345–2349 (2002).
72. Steinmetz, L. M. et al. Dissecting the architecture of a quantitative trait locus in yeast. Nature 416, 326–330 (2002).
The first report of the dissection of a complex trait from a description of the phenotype to identification of the genes published in a single study. Provides evidence for complex QTL architecture and describes a new functional assay.
73. Darvasi, A. & Pisante-Shalom, A. Complexities in the genetic dissection of quantitative trait loci. Trends Genet. 18, 489–491 (2002).
74. Christians, J. K. & Keightley, P. D. Genetic architecture: dissecting the genetic basis of phenotypic variation. Curr. Biol. 12, R415–416 (2002).
75. Mackay, T. F. Quantitative trait loci in Drosophila. Nature Rev. Genet. 2, 11–20 (2001).
76. Freimer, N. & Sabatti, C. The human phenome project. Nature Genet. 34, 15–21 (2003).
77. Spradling, A. C. et al. The Berkeley Drosophila Genome Project gene disruption project: single P-element insertions mutating 25% of vital Drosophila genes. Genetics 153, 135–177 (1999).
78. Peter, A. et al. Mapping and identification of essential gene functions on the X chromosome of Drosophila. EMBO Rep. 3, 34–38 (2002).
79. Zambrowicz, B. P. et al. Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611 (1998).
80. Martin, E. et al. Identification of 1,088 new transposon insertions of Caenorhabditis elegans: a pilot study toward large-scale screens. Genetics 162, 521–524 (2002).
81. Gura, T. A silence that speaks volumes. Nature 404, 804–808 (2000).
82. Clemens, J. C. et al. Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc. Natl Acad. Sci. USA 97, 6499–6503 (2000).
83. Esposito, M. S. & Esposito, R. E. The genetic control of sporulation in Saccharomyces. I. The isolation of temperature-sensitive sporulation-deficient mutants. Genetics 61, 79–89 (1969).
84. Giaever, G. et al. Genomic profiling of drug sensitivities via induced haploinsufficiency. Nature Genet. 21, 278–283 (1999).
85. Lum, P. Y. et al. Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes. Cell 116, 121–137 (2004).
86. Hirsh, A. E. & Fraser, H. B. Protein dispensability and rate of evolution. Nature 411, 1046–1049 (2001).
87. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002).
88. Gu, Z. et al. Role of duplicate genes in genetic robustness against null mutations. Nature 421, 63–66 (2003).
An example of the use of fitness data in addressing fundamental questions in molecular evolution. Incorporates published data sets without new bench experiments.
89. Papp, B., Pal, C. & Hurst, L. D. Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197 (2003).
90. Fodor, S. P. et al. Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773 (1991).
The first high-density microarray made by direct synthesis.
91. Blanchard, A. P., Kaiser, R. J. & Hood, L. E. Synthetic DNA arrays. Biosens. Bioelectron. 11, 687–690 (1996).
92. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
The first cDNA microarray for gene-expression profiling made by printing.
93. Ferguson, J. A., Boles, T. C., Adams, C. P. & Walt, D. R. A fiber-optic DNA biosensor microarray for the analysis of gene expression. Nature Biotechnol. 14, 1681–1684 (1996).
94. Khrapko, K. R. et al. Hybridization of DNA with oligonucleotides immobilized in a gel: a convenient method for recording single base replacements. Mol. Biol. (Mosk) 25, 718–730 (1991).
95. Halgren, R. G., Fielden, M. R., Fong, C. J. & Zacharewski, T. R. Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res. 29, 582–588 (2001).
96. Knight, J. When the chips are down. Nature 410, 860–861 (2001).
97. Modrek, B. & Lee, C. A genomic view of alternative splicing. Nature Genet. 30, 13–19 (2002).
98. Johnson, J. M. et al. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141–2144 (2003).
99. Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719–1723 (2001).
100. Winzeler, E. A. et al. Direct allelic variation scanning of the yeast genome. Science 281, 1194–1197 (1998).
101. Kwok, P. Y. SNP genotyping with fluorescence polarization detection. Hum. Mutat. 19, 315–323 (2002).
102. Jurinke, C., van den Boom, D., Cantor, C. R. & Koster, H. Automated genotyping using the DNA MassArray technology. Methods Mol. Biol. 187, 179–192 (2002).

Acknowledgements
We would like to thank L. David and T. Neklesa for helpful comments on the manuscript.

Competing interests statement
The authors declare that they have no competing financial interests.

Online links
DATABASES
The following terms in this article are linked online to:
Entrez: http://www.ncbi.nlm.nih.gov/Entrez
LRPPRC | tetR
OMIM: http://www.ncbi.nlm.nih.gov/Omim
Leigh syndrome
FURTHER INFORMATION
Arabidopsis Mutations: http://www.arabidopsis.org/abrc
Human Gene Mutation Database: http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html
Yeast Deletion Database: http://yeastdeletion.stanford.edu