
REVIEWS

MAXIMIZING THE POTENTIAL OF FUNCTIONAL GENOMICS
Lars M. Steinmetz* and Ronald W. Davis‡
Geneticists have made tremendous progress in understanding the genetic basis
of phenotypes, and genomics promises to bring further insights at a rapid pace. The
progress in functional genomics has been driven primarily by the development of new
techniques that are used in a few dedicated research centres. Focusing on selected
advances in genomic technologies, we assess the results that have been obtained so far,
highlight the challenges faced by these new tools and suggest ways in which they can be
overcome. We argue that progress in functional genomics will depend on developing
high-throughput technologies that can easily be moved away from dedicated centres and
into individual laboratories.

*European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
‡Department of Biochemistry and Stanford Genome Technology Center, Stanford University, 855 California Avenue, Palo Alto, California 94304, USA.
Correspondence to L.M.S.
e-mail: lars.steinmetz@embl.de
doi:10.1038/nrg1293

COMPLEX TRAITS
A trait that is determined by many genes, almost always interacting with environmental influences.

Biology is entering an exciting era brought about by the increase in genome-wide information. Functional genomics in particular is making rapid progress in assigning biological meaning to genomic data. The tools of functional genomics have enabled several systematic approaches that can answer a few basic questions for the majority of genes in a genome, including when a gene is expressed, where its product is localized, with which other gene products it interacts and what phenotype results if the gene is mutated. Functional genomics aspires to answer such questions systematically for all genes in a genome, in contrast to conventional approaches that do so for one gene at a time.

Several key biological challenges are central to continuing genome projects and are relevant to any eukaryotic organism, from yeast to humans. One challenge is to understand how genes that are encoded in a genome operate and interact to produce a complex living system. A related challenge is to determine the function of all the sequence elements in the genome. A third challenge is to understand the contributions of the multitude of sequence variants to phenotypic variation, both within and between species. One of the most enduring challenges in genetics has been to find the genetic variants that are responsible for COMPLEX TRAITS1. Current methods have mostly failed to meet this challenge2, resulting in the need for new concepts and genome-wide technologies if this complexity is to be dissected.

Despite the unresolved issues, the power and potential of functional genomics are impressive. We illustrate this here by discussing three core applications of genome technology, using selected examples from different organisms: genome-wide knock-out, gene-expression and genetic mapping studies. We go beyond these examples to point out the areas in which technological improvements are possible.

As functional approaches and verification of their accuracy often require genetic manipulation, many technical advances in functional genomics have their origin in model systems. Nonetheless, an effective transition of some of the technologies to humans is becoming more attractive3. The utility of such a transition can be maximized by careful evaluation of the power and limitations of these approaches.

To obtain the most benefit from functional genomics, we argue, the technology, which is at present mainly carried out by a few dedicated centres, needs to become integrated into individual laboratories. Individual laboratories often have crucial expertise in a specific biological problem, and although functional genomics might provide approaches to address it, a key discovery can often only be made by bringing the

190 | MARCH 2004 | VOLUME 5 www.nature.com/reviews/genetics



two together. We believe that for this to be achieved, two goals should be met: experiments must be further miniaturized and costs must be lowered.

Technological innovations
Efforts towards increased miniaturization and decreased costs are exemplified by developments that originated from genome sequencing. In many ways, functional genomics was catalysed by the genome-sequencing projects: large-scale sequencing and the genome projects created an increase in available DNA sequences, around which new technologies that use this information were developed. One result is one of the most widely recognized and accessible genomics tools, the DNA microarray, which allows parallel hybridization assays to be carried out on an unprecedented, miniaturized scale. The second, and often unrecognized, contribution of the genome projects is the ~1,000-fold decrease in the cost of DNA sequencing, which had to be achieved to complete the Human Genome Project. The drop in sequencing costs facilitated large-scale sequencing projects of other organisms and has contributed to the fact that DNA sequencing is still the most frequently used technology for detecting DNA variation. Today, the comparison of genomes among several species allows numerous biological features to be studied, such as sequence conservation4–6.

Developments of genomic technology have until now primarily focused on the generation of genome sequence data, from the development of genome-analysis technologies to the generation of physical and genetic maps, the sequencing of model organism genomes and the completion of the human genome sequence. The next focus in genomics builds on the genome sequences and heralds the beginnings of an exciting phase of genome biology: the true genome era, when deriving functional information from genome sequences takes centre stage3. With this role in mind, we evaluate three areas of functional genomics that have been piloted in different model systems. We indicate promising directions of research and suggest new approaches that need to be designed.

Interfering with gene function. Phenotypic analysis of mutants has been a powerful approach for determining gene function. Gene function can be altered through gene deletions, insertional mutagenesis and RNA INTERFERENCE (RNAi) (BOX 1).

Few methods offer the experimental control that is afforded by gene deletion. A true knock-out or null mutation achieves complete functional reduction of the encoded gene product. Because this is difficult to achieve in many organisms, compromises have been made by generating incomplete knock-outs. Gene products can be knocked down or silenced as a result of point mutagenesis, insertional mutagenesis or RNAi. Although not yet feasible on a large scale, proteins might be targeted using drugs7, and it might eventually be possible to use drug compounds to generate knock-downs for every gene product in a genome and to apply them across species.

The power of systematic mutant analysis is well illustrated by an experiment in which an international consortium systematically generated a gene deletion strain for every gene in the yeast Saccharomyces cerevisiae genome and analysed the phenotypes in a single tube assay8,9 (FIG. 1). The quantitative fitness measurements that are obtained for each gene with this tool enable applications beyond determining whether a gene is essential. This is an important advance because it opens up a wide variety of applications based on quantitative analysis, such as identifying functionally relevant genes and drug targets, comparing function and expression, defining candidate disease genes and studying molecular evolution (BOX 2).
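Box 2 reports that genes with a deletion fitness defect and genes that change expression overlap very little, in the DNA-damage case no more than expected by chance. A minimal sketch of how "expected by chance" can be made concrete, assuming a hypergeometric null model (a standard choice for gene-set overlap, not necessarily the test used in the cited studies). All gene counts below are hypothetical.

```python
from math import comb


def expected_overlap(n_genes: int, n_fitness: int, n_expressed: int) -> float:
    """Mean overlap of two gene sets of these sizes drawn at random from n_genes genes."""
    return n_fitness * n_expressed / n_genes


def overlap_p_value(n_genes: int, n_fitness: int, n_expressed: int, observed: int) -> float:
    """One-sided hypergeometric P(overlap >= observed).

    The n_fitness genes are treated as 'marked'; we ask how often a random set of
    n_expressed genes contains at least `observed` marked genes.
    """
    upper = min(n_fitness, n_expressed)
    hits = sum(
        comb(n_fitness, k) * comb(n_genes - n_fitness, n_expressed - k)
        for k in range(observed, upper + 1)
    )
    # Exact integer arithmetic until this final division avoids underflow.
    return hits / comb(n_genes, n_expressed)


# Hypothetical counts for illustration only (not taken from the cited studies).
N_GENES, N_FITNESS, N_EXPRESSED, OBSERVED = 6000, 400, 300, 22
print("expected overlap:", expected_overlap(N_GENES, N_FITNESS, N_EXPRESSED))
print("P(overlap >= observed):", overlap_p_value(N_GENES, N_FITNESS, N_EXPRESSED, OBSERVED))
```

With these made-up counts the expected overlap is 20 genes, so an observed overlap of 22 gives a large P value, which is the operational meaning of an overlap "no larger than expected by chance".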

Box 1 | Comparison of knock-out approaches in model organisms


Targeted deletion by homologous recombination
Precise gene deletion can be readily achieved by homologous recombination in yeast8,9 and mouse11. Because this
approach removes the targeted gene, functional reduction is complete. In organisms in which it works, this method
is the gold standard. Unfortunately, homologous recombination does not work efficiently in several model
organisms, including Arabidopsis and Caenorhabditis elegans. Although it has been shown to work in some cases,
as seen recently in Drosophila12, the efficiencies are still too low for systematic application.
Insertional mutagenesis
Disruption of gene sequences can be achieved by insertional mutagenesis using transposons or other
insertion sequences. Because the genome insertions are random, screening for disruption in a gene of interest
is required. The insertion can lead to complete, incomplete or no functional reduction, depending on where
the integration occurs. The insertion site and level of functional reduction therefore need to be determined
experimentally. The method has been used extensively in Arabidopsis16, Drosophila77,78, yeast15, mouse79
and C. elegans80.
RNA interference
RNA interference (RNAi) is the newest technology for reducing gene expression. It follows reports of gene silencing in plants and other model organisms81, and is based on the observation from C. elegans that adding double-stranded RNA (dsRNA) to cells often interferes with gene function in a sequence-specific manner17. In most cases, the level of functional reduction is incomplete and the level of specificity is not entirely predictable24–26. Nevertheless, RNAi has been shown to work in many model organisms. Current applications are primarily in C. elegans18, Drosophila19, various plants81, tissue culture cells of Drosophila82 and mammals23.

RNA INTERFERENCE
(RNAi). A process by which double-stranded RNA specifically silences the expression of homologous genes through degradation of their cognate mRNA.
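Box 1 notes that insertions land at random, so a disruption in a gene of interest has to be screened for. A back-of-the-envelope sketch of why saturation is hard, assuming insertion sites are independent and uniformly distributed (an idealization; hot spots and refractory regions violate it): if a gene occupies a fraction f = L/G of a genome of size G, the chance that at least one of N insertions lands in it is P = 1 - (1 - f)^N. The genome and gene sizes are hypothetical; 88,000 is the number of Arabidopsis T-DNA lines cited in the main text. A "hit" here means only that an insertion falls inside the gene, not that function is lost, which Box 1 says depends on where the integration occurs.

```python
import math


def prob_gene_hit(gene_len_bp: float, genome_len_bp: float, n_insertions: int) -> float:
    """Probability that at least one of n_insertions independent, uniformly placed
    insertions falls inside a gene of the given length."""
    f = gene_len_bp / genome_len_bp  # fraction of the genome occupied by the gene
    return 1.0 - (1.0 - f) ** n_insertions


def insertions_needed(gene_len_bp: float, genome_len_bp: float, target_prob: float) -> int:
    """Smallest number of insertions giving at least target_prob chance of a hit."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must lie strictly between 0 and 1")
    f = gene_len_bp / genome_len_bp
    # Solve 1 - (1 - f)^N >= p for N; log1p keeps precision when f is tiny.
    return math.ceil(math.log1p(-target_prob) / math.log1p(-f))


# Hypothetical sizes: a 2-kb gene in a 100-Mb genome.
GENE, GENOME = 2_000, 100_000_000
print(f"P(hit) with 88,000 insertions: {prob_gene_hit(GENE, GENOME, 88_000):.2f}")
print("insertions for a 99% chance:", insertions_needed(GENE, GENOME, 0.99))
```

Under these assumed sizes, 88,000 insertions give roughly an 83% chance that a given 2-kb gene is hit at least once, and reaching 99% takes about 230,000 insertions; non-uniform insertion would lower the chance further for genes in regions that resist insertion.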

NATURE REVIEWS | GENETICS VOLUME 5 | MARCH 2004 | 191



[Figure 1 schematic. a: deletion cassette (CP-UPTAG-CP-KanMX-CP-DNTAG-CP) replacing the ORF from start to stop codon, followed by selection; b: PCR amplification of the tags from the pool and array hybridization; c: selection.]

Figure 1 | Assaying molecular barcode tags in yeast pools. a | Start-to-stop-codon deletion by double homologous
recombination with deletion cassettes. The 45-base pair (bp) sequences at each end of the PCR-amplified deletion cassettes are
identical to those found upstream and downstream of the targeted gene. On transformation, double homologous recombination
integrates the PCR product into the chromosome, displacing the target gene and generating a precise start-to-stop-codon
deletion. A dominant drug-resistance marker (KanMX) that is part of the PCR product serves to select for the integration event.
To allow pooling of the deletion strains and a parallel analysis, each PCR cassette also contains two unique 20-bp DNA
sequence tags that serve as molecular barcodes to uniquely identify each strain10 (UPTAG, 5′-end (upstream) tag sequence;
DNTAG, 3′-end (downstream) tag sequence). The tag sequences, flanked by two common priming (CP) sites, were designed to differ
sufficiently from one another to avoid cross-hybridization and to have optimal hybridization properties. b | Competitive growth of
the deletion pool, genomic DNA extraction, PCR amplification and array hybridization. Because each tag is flanked by two CPs, all
UPTAGs or all DNTAGs can be amplified in a single PCR from genomic DNA that is isolated from the mixed pool. The tags
can be quantitatively measured by hybridizing the PCR products to a high-density oligonucleotide array that contains the
complementary tag sequences. This microarray allows the signal from each tag to be distinguished. The signal intensity is
proportional to the abundance of the corresponding strain in the culture. c | Measurement of quantitative deletion phenotypes in
pools during an experimental perturbation. By comparing tag signal intensities over the course of selection, the change in
abundance of each strain (its fitness) can be calculated for all strains in parallel.
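Panel c describes turning changes in tag signal into fitness. A minimal sketch of one way to do this, under stated assumptions: tag intensity is proportional to strain abundance (as the caption states), each sample is normalized to its total signal, a pseudocount guards against zero signals, and fitness is reported as the log2 change in relative abundance, optionally per generation of pooled growth. Strain names, intensities and the pseudocount are hypothetical, and the sketch ignores array background and saturation and does not combine UPTAG and DNTAG signals.

```python
import math
from typing import Dict, Optional


def relative_abundance(signals: Dict[str, float], pseudocount: float = 1.0) -> Dict[str, float]:
    """Convert raw tag intensities into each strain's fraction of the total pool signal."""
    padded = {tag: s + pseudocount for tag, s in signals.items()}  # pseudocount avoids log(0)
    total = sum(padded.values())
    return {tag: s / total for tag, s in padded.items()}


def log2_fitness(
    before: Dict[str, float],
    after: Dict[str, float],
    generations: Optional[float] = None,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2 change in each strain's relative abundance across a selection.

    Negative scores mean the strain was depleted from the pool (a fitness
    defect). If `generations` is given, the change is reported per generation.
    """
    if before.keys() != after.keys():
        raise ValueError("both samples must contain the same tags")
    rel_before = relative_abundance(before, pseudocount)
    rel_after = relative_abundance(after, pseudocount)
    scores = {tag: math.log2(rel_after[tag] / rel_before[tag]) for tag in before}
    if generations is not None:
        if generations <= 0:
            raise ValueError("generations must be positive")
        scores = {tag: v / generations for tag, v in scores.items()}
    return scores


# Hypothetical tag intensities for three deletion strains before (t0) and after (t1) selection.
t0 = {"strain_A": 1000.0, "strain_B": 1000.0, "strain_C": 1000.0}
t1 = {"strain_A": 2000.0, "strain_B": 1000.0, "strain_C": 250.0}
for tag, score in sorted(log2_fitness(t0, t1).items(), key=lambda kv: kv[1]):
    print(f"{tag}: {score:+.2f}")
```

Note that strain_B's raw signal does not change, yet it scores slightly below zero because strain_A's expansion shrinks its share of the pool; normalizing to the pool total is what turns intensities into relative abundances.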

One of the strengths of the systematic deletion analysis in yeast lies in the fact that deletion strains can be pooled for parallel analysis. This is possible owing to the presence of unique 20-base pair (bp) DNA sequence tags in the deletion cassettes that serve as molecular barcodes to uniquely identify each strain10.

Despite their power, genome-wide deletions that are generated by homologous recombination have yet to be reported in organisms other than yeast. In many cases in which the homologous recombination system has been developed, such as mouse11, Drosophila12 and sheep13, low targeting efficiencies and practical considerations in maintaining deletion collections are important concerns.

A targeted deletion approach is also not without caveats: only annotated open reading frames (ORFs) in a genome are generally targeted; the phenotypes of overlapping ORFs cannot be easily distinguished; transformation events might have introduced secondary mutations14; essential genes are inviable as homozygous deletions; and functionally redundant genes might not show a detectable phenotype when deleted.

Apart from complete knock-outs achieved by gene deletion, two other approaches that are more easily applied to other organisms, insertional mutagenesis and RNAi, might generate partial knock-outs. Partial knock-outs are less suited to interpreting fitness effects, because the molecular nature of the mutation, its




Box 2 | Systematic mutant analysis in yeast


Knock-out analysis of the yeast genome is further along than that of any other organism, both in terms of the percentage
of the genome that has been deleted and the ease with which the quantitative phenotype of all the mutants can be
analysed. Its applications therefore serve to illustrate the power of genome-wide mutational analysis.
Identifying functionally relevant genes
By assessing quantitative reductions in fitness, a single study42 of homozygous diploid yeast deletion strains doubled the
number of genes that are functionally implicated in sporulation, even though yeast sporulation has been studied
genetically for more than 30 years83. This study exemplifies the general finding that for every pathway that has been
studied so far using yeast deletions — including those in which SATURATION MUTAGENESIS has been reached — new genes
have been uncovered using systematic deletions.
Comparing function and expression
Intriguingly, comparing results from deletion and mRNA-expression studies showed that there is little overlap in the genes
that the two approaches identify. For sporulation data, only 16% of genes with changes in expression showed a significant
defect in sporulation when deleted42; for growth on non-fermentable carbon sources, the overlap was 7% (REF. 41); for
growth in galactose, high pH, high salt and sorbitol, the overlap was less than 7% (REF. 8), and in the case of DNA-
damaging agents, the number of genes with a fitness defect that showed differential expression in response to DNA
damage was no larger than expected by chance40. The consistency of these findings indicates that the low overlap is not the result
of measurement error but is biologically significant, suggesting that changes in gene expression might be less
functionally relevant than commonly anticipated.
Identifying drug targets
In an application of systematic deletions to define drug targets, heterozygous diploid strains were used for drug-induced
HAPLOINSUFFICIENCY profiling84,85. The assay, in which sub-lethal concentrations of a drug reduce the fitness of strains
that carry heterozygous deletions of the drug target, was exploited genome-wide as a tool for drug target identification
and target-specificity evaluation.
Defining candidate disease genes
Deletion strains have specifically been used to study biological processes that have been conserved between yeast and
humans. A total of 466 genes in which deletions impaired mitochondrial respiration were identified and aligned against
the human genome to yield new mitochondrial candidate genes41. The candidates are of value in studies of human
mitochondrial disorders in which genomic intervals have been mapped but no responsible gene has been identified so far.
Recently, one gene (LRPPRC) whose yeast orthologue had such a characteristic deletion phenotype was implicated in a
human mitochondrial disorder52.
Studying molecular evolution
The quantitative knock-out analysis has catalysed new developments in molecular evolution. Using published genome-
wide fitness data in yeast, it was shown that highly conserved proteins tend to have larger fitness effects when deleted86, that
protein members of an interaction network tend to evolve at slower, yet similar, rates87, that duplicated genes tend to have
no, or weak, fitness effects, presumably as a result of functional redundancy between paralogues88, and that heterozygous
deletions of protein subunits of protein–protein complexes result in a STOICHIOMETRIC imbalance that is often deleterious89.
SATURATION MUTAGENESIS
A mutagenesis screen that has reached a stage of saturation in which additional mutagenesis does not seem to recover mutations in new genes.

HAPLOINSUFFICIENCY
A gene dosage effect that occurs when a diploid requires both functional copies of a gene for a wild-type phenotype. An organism that is heterozygous for a haploinsufficient locus does not have a wild-type phenotype.

STOICHIOMETRIC
The molar ratio of interacting molecules.

TRANSFERRED DNA
(T-DNA). The segment of DNA in the Ti plasmid of Agrobacterium tumefaciens that is transferred to plant cells and inserted into the chromosomes of the plant.

effect on functional reduction and the degree of secondary effects are often uncertain. Nevertheless, in some cases (for example, when a partial reduction in the function of essential genes is of interest), these approaches can be beneficial.

One approach to insertional mutagenesis in yeast involves transposons that have been designed to measure mutant phenotypes, gene expression and protein localization, all from the same collection of mutants15. In Arabidopsis thaliana, 88,000 Agrobacterium TRANSFERRED-DNA (T-DNA) insertion lines, in which more than 20,000 predicted genes have been mutated, have been generated, providing a powerful resource for future mutational analysis16.

The main disadvantages of insertional mutagenesis in comparison to a targeted gene deletion approach are that full genome saturation is difficult and that some regions in the genome are not susceptible to insertion, whereas other regions are hot spots. Furthermore, the degree of functional reduction for a desired gene depends on the location of the integration. Integrations into ORFs can lead to complete or partial reduction depending on exact location, whereas insertions into intergenic sequences often have no effect. The site of integration and degree of functional reduction therefore need to be determined for each mutant of interest.

A more directed approach uses RNAi to silence the function of a target gene17. Although RNAi has not been reported in S. cerevisiae, it is applicable in a wide variety of model systems, including worms18, flies19, zebrafish20, mouse21, plants22 and human cultured cells23. The primary advantages of RNAi are the ease of generating short double-stranded RNAs (dsRNAs) that mediate RNAi and the flexibility of inhibition: the user can spatially and temporally control the interference reaction. Furthermore, it has been applied genome-wide18 by feeding worms Escherichia coli that contain dsRNA-producing plasmids. The disadvantage in systematic genome-wide application, however, is that the level of functional reduction is unpredictable and difficult to measure experimentally. A study showed that small interfering RNAs (siRNAs) that target different




parts of the same gene achieve different levels of reduction and secondary off-target effects24. Two studies have further shown that siRNAs mediate an interferon response in mammalian cells as a secondary effect25,26. Caution must therefore be exercised before attributing a particular response to the targeted gene. Nevertheless, it is hoped that these issues can be addressed, as RNA-silencing methodologies are likely to be applied genome-wide in many organisms owing to their ease of use.

Gene-expression profiling. Gene-expression profiling is the most widely used functional genomic technology today, in part because it was one of the earliest to be developed and in part because it achieves high throughput in a single tube assay.

Applications of gene-expression studies can be divided into three principal areas: those that generate signature profiles, those that study transcription and its regulation and those that determine the function of gene products. Each is associated with different biological concerns. Although in the first two applications the measurement of gene expression is closely related to the biological goal, more assumptions are required to infer function from expression differences in the third type of study.

Perhaps the most powerful application of gene-expression data is its use as a signature profile, which can serve as a detailed molecular phenotype. Although the identity of the individual genes that change expression is irrelevant in this application, the probes must still be accurately assigned to genes (BOX 3). The profile across thousands of genes gives a distinctive pattern for a sample. Such patterns have been used in model organisms to classify genetic mutants by similarities in profile27 and to evaluate secondary effects of drug treatments28. However, the most powerful applications of signature profiling are likely to be for characterizing human disease populations (see below).

Another application of gene expression is to study mRNA transcription and its regulation. Gene-expression data can be used to search for regulatory sequences by looking for common sequence elements in upstream regions of genes that have similar expression profiles29–31. To find target genes of specific transcription factors, genes that are underexpressed as a result of transcription-factor deletions or overexpressed as a result of transcription-factor gain-of-function mutations can be identified32. A complementary approach identifies the DNA that is bound to transcription factors by co-immunoprecipitation33. Integrating the analysis of DNA that is bound to transcription factors with the co-expression of genes has revealed a complex network of interactions that explains the difficulty of distinguishing primary from secondary effects in the analysis of transcription-factor mutations33. Together, these data help to formulate hypotheses about transcription factors and their targets. The approaches can be complemented by knock-out analysis of predicted pathway members, with the aim of dissecting the regulatory circuitry and signalling pathways that regulate transcription.

Box 3 | Technical concerns about microarrays

Microarray analysis can detect the presence of individual target sequences in complex mixtures because it exploits the specificity of hybridization. Nevertheless, important technical concerns are associated with different types of microarray platform. Two types of high-density microarray platform are most widely used and can be distinguished by the method of probe placement onto the array surface. In the first type of microarray, oligonucleotide probes are synthesized directly on the glass surface90,91. Photolithography and photosensitive oligonucleotide synthesis chemistry90 are the most common processes used and achieve a density of more than 1,000,000 spots/cm². In the second type of microarray platform, represented mainly by the cDNA microarray92, probe synthesis is separated from array manufacture92–94. On printed microarrays, probes are synthesized and then mechanically spotted or printed in an arrayed format, with a density of ~10,000 spots/cm².
The method of array manufacture places different demands on quality control. Probes on synthesized microarrays are short and are synthesized directly on the glass surface, with a sequence database directing the synthesis. Because synthesis is highly reproducible, absolute levels of expression can be estimated, and because the probes are short, SNPs can be detected. Probes can be specifically designed to distinguish splice variants and members of gene families. Nevertheless, the short probes require accurate databases, as errors in sequence databases translate to errors in probe sequence on the arrays. However, as sequences in databases are updated and sequence quality improves, the chance of errors decreases. Importantly, because there can be multiple probes for each gene, the array designs provide internal redundancy.
For printed microarrays, probe generation is separated from array synthesis; therefore, storage and tracking of probe samples is crucial. Errors occur when probes are spotted from microtitre plates that are wrong, out of order, in the wrong orientation or contaminated. One study reported that only 62% of DNA samples from plates used for microarray spotting were of the correct identity95. Such errors would be further amplified by errors during and after printing96. Printing introduces variation in spot size and probe concentration, which is at present addressed by co-hybridizing a reference sample to the same array. The choice of reference is crucial for data interpretation as it affects the ratio calculation, and the use of different reference samples by different laboratories makes it difficult to compare experiments performed in different laboratories44.
Another problem is distinguishing splice variants and members of gene families: the long sequences on cDNA microarrays allow cross-hybridization. This is an important limitation because, to some extent, the molecular complexity in humans arises from differential splicing97. Array designs that can distinguish between splice variants need to be implemented98.

The application that is at present least understood is the use of gene expression to determine gene-product function. Most mRNAs are not functional themselves; they are intermediates and transmitters of information from the genome to the proteome. Inferring function from mRNA levels is indirect and rests on several assumptions. One is that evolutionary selection has been tight enough to ensure that mRNAs are produced and present only when the corresponding gene products are needed. This assumes that no regulatory changes occur at the protein level that are not reflected by changes in amounts of mRNA. A second assumption is that transcription of one gene is independent of another and that there is no competition for resources.

These assumptions are questionable. A cell functions as a unit and therefore transcription-factor availability and protein concentrations probably depend on their use in other parts of the cell34. It is also evident that regulatory changes occur during mRNA translation, post-translational modification and protein degradation. In fact, experiments show that the correlation between




mRNA changes and changes in protein level in a cell is weak35–37. Although a genome-wide comparison is still missing (mainly as a result of the technical difficulty of accurately measuring protein levels under multiple conditions for many proteins38,39), the trend is apparent: one cannot assume that there is a correlation at the individual gene level. In addition, the poor correlation between changes at the gene-expression level and fitness effects of gene products8,9,40–42 (BOX 2) indicates that there are many more genes that change expression between any two experimental conditions in yeast than there are genes that affect phenotype by gene deletion during the same transition. The same comparisons also show that most genes with a deletion phenotype in yeast do not show an expression change when measured under the same perturbations.

There are two types of issue when we interpret expression differences: if gene expression is changed or remains unchanged between two conditions, how reliable is this result and what is its biological significance? Whether the result is reliable depends to a large extent on the technology that is used to measure the expression difference and on the accuracy of the experimental execution; these concerns exemplify technical challenges that are associated with high-throughput approaches. In the case of gene-expression profiling, they indicate that more rigorous quality control and careful experimental design are needed43,44 (BOX 3). This technical concern is independent of the difficulty that is associated with the functional interpretation of expression changes, because there are difficulties even when considering the most differentially expressed genes, for which different platforms often agree.

Genetic mapping of quantitative trait loci. A third principal area of high-throughput genomics is the identification of the genetic factors that underlie complex traits. Developments in this field can be divided into three areas that correspond to the process of dissecting the genetic basis of complex traits: defining genetic intervals, identifying candidate genes and verifying an allele's contribution to the phenotype.

Genetic mapping has been successful in finding important genes for some complex diseases such as asthma45. However, in most cases, genetic mapping has been problematic1, as evidenced by reports that previously mapped intervals failed to withstand significance tests in subsequent studies46.

A few recently developed high-throughput genotyping technologies show promise for testing the feasibility of high-resolution association studies. The rationale in these studies is to test thousands of polymorphisms in carefully selected populations to identify sequence variants that are more frequent in individuals with one phenotype than in individuals with other phenotypes. The technologies can achieve throughputs of tens of thousands of markers (BOX 4). Unfortunately, genotyping the 50,000–1,000,000 markers that have been proposed as the minimum needed by statistical predictions47 still represents a tremendous challenge, especially because the markers need to be genotyped in large numbers of samples. The challenge is to develop methods that can accurately genotype thousands of markers at low cost when only a small amount of sample is available. One technology uses molecular barcodes to genotype directly from genomic DNA48. It uses the same concept of multiplexing as the yeast deletion project: the signal is amplified using common primer sequences after the genotyping reactions have occurred (BOX 4).

Even with the ability to identify map intervals, the causative variants remain to be identified. In cases in which intervals have been identified, several approaches have been taken to locate the underlying genes. A systematic approach involves identifying all sequence variants in these intervals. This approach is challenging because sequencing entire intervals is, in most cases, impractical given their large sizes and the often limited DNA samples. Furthermore, mutations that are present in a heterozygous state are difficult to identify. A technology that uses the mismatch-repair system of bacteria holds promise for focusing attention on cloned fragments that contain a polymorphism49,50 (BOX 5). The approach can be combined with molecular barcodes for high-throughput mutation scanning.

Rather than scanning an interval for all polymorphisms, candidate genes can be prioritized. Such approaches can be applied either to genes in mapped intervals or to candidate genes genome-wide, in the absence of positional information. One study has used data mining of the published literature to establish a connection between disease terms and functional terms associated with genes51. The approach scores a candidate on the basis of how frequently it can be connected to the disease description by terms that co-appear in the literature (using PubMed). Another approach is to use functional genomic data from model organisms to rank human candidates according to functional information about their orthologues41,52.

Gene-expression profiling has also been proposed as a general method for identifying candidate genes53. One study integrated a disease interval with experimental expression and proteomics data to identify a mutation in patients with Leigh syndrome52. Another study successfully found a quantitative trait locus (QTL) gene on the basis of the absence of mutant-gene expression in hypertensive rats54. In the latter study, gene expression was decreased as a result of partial deletion of the ORF. However, for traits that are not caused by deletions that result in expression differences or by mutations that affect mRNA levels, expression profiling might not identify causative variants.

A recent review47 assessed the types of mutation found in Mendelian diseases, which enables inferences about possible implications for disease-gene identification approaches in complex traits. Surprisingly, regulatory changes are found in fewer than 1% of the more than 1,400 known Mendelian disease genes (see Human Gene Mutation Database in online links box). The vast majority of mutations are missense or nonsense changes in the ORFs (58%), insertions or deletions (30%) and splicing variants (10%). If similar patterns are true for complex traits, it might be expected that genes identified


Box 4 | High-throughput concepts for genotyping


DNA hybridization to a microarray
PCR-amplified DNA99, or genomic DNA in the case of small genomes100, that contains the polymorphisms to be genotyped
is hybridized to an oligonucleotide array in one assay. If a target has a perfect match to a probe, a stronger hybridization
signal is obtained than if it has a mismatch. High throughput is achieved through a high density of probes on arrays.
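As a minimal sketch of how a genotype might be read from the two perfect-match probes for a biallelic SNP, the Python below compares their background-corrected signals through a simple contrast statistic. The function name, the contrast and the 0.3 cut-off are illustrative assumptions, not parameters from the cited array methods.

```python
def call_genotype(signal_a: float, signal_b: float, het_band: float = 0.3) -> str:
    """Call a biallelic SNP from the signals of two perfect-match probes.

    signal_a: background-corrected intensity of the probe matching allele A.
    signal_b: background-corrected intensity of the probe matching allele B.
    het_band: illustrative cut-off on the allelic contrast (an assumption).
    """
    total = signal_a + signal_b
    if total <= 0:
        return "no call"
    # Contrast is +1 when only the A probe lights up and -1 when only B does;
    # a heterozygote hybridizes to both probes, so its contrast sits near 0.
    contrast = (signal_a - signal_b) / total
    if contrast > het_band:
        return "AA"
    if contrast < -het_band:
        return "BB"
    return "AB"


for a, b in [(900, 100), (500, 450), (80, 920)]:
    print(a, b, call_genotype(a, b))  # AA, AB, BB respectively
```

A ratio-based contrast is used here because it is insensitive to the overall amount of target DNA on the array; real pipelines typically cluster many samples rather than apply one fixed cut-off.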
Fluorescence polarization
In one version of fluorescence polarization genotyping101, the genomic DNA that contains a polymorphism to be assayed
is amplified by PCR. A primer is designed to end one base before the polymorphic position. The primer is extended in a
reaction with two different fluorescent dideoxy-nucleotides that are specific to the alleles of the SNP. The amount of each
fluorescent base that is incorporated can be determined by scanning the plate at two wavelengths. Fluorescent bases
incorporated into the extended primer rotate in solution more slowly than free-floating (unincorporated) fluorescent
nucleotides, leading to fluorescence polarization. This method can be carried out in multi-well plates, is economical and is
simple to use.
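The read-out step can be sketched with the standard definition of fluorescence polarization, computed from emission measured parallel and perpendicular to polarized excitation light. The Python below is a minimal illustration, assuming one pair of such readings per dye wavelength; the 150 mP threshold and the function names are hypothetical.

```python
def polarization_mp(i_parallel: float, i_perpendicular: float) -> float:
    """Polarization in millipolarization units (mP) from emission measured
    parallel and perpendicular to the polarized excitation light."""
    return 1000.0 * (i_parallel - i_perpendicular) / (i_parallel + i_perpendicular)


def call_fp_genotype(dye1, dye2, threshold_mp=150.0):
    """dye1, dye2: (parallel, perpendicular) readings taken at the emission
    wavelength of the terminator specific to allele 1 and allele 2.
    threshold_mp is a hypothetical cut-off separating free from incorporated dye."""
    # Incorporated dye tumbles slowly with the primer and stays polarized;
    # free terminator tumbles fast and depolarizes its emission.
    has_allele1 = polarization_mp(*dye1) > threshold_mp
    has_allele2 = polarization_mp(*dye2) > threshold_mp
    if has_allele1 and has_allele2:
        return "1/2"
    if has_allele1:
        return "1/1"
    if has_allele2:
        return "2/2"
    return "no call"


print(call_fp_genotype((300.0, 100.0), (110.0, 100.0)))
```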
Mass spectrometry
Genotyping can be achieved by analysing primer extension products by mass spectrometry102. The increased mass of an
extended primer is detected by a shift in mass peak. The speed, throughput and accuracy of mass spectrometry allow for
fast and accurate genotype assignments.
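The mass shift described above can be sketched as follows: the extended primer is heavier by the mass of the terminator that was added, so each allele predicts a distinct peak. The residue masses below are approximate average values chosen for illustration (an assumption); real assays use the masses of the specific terminators supplied, and the 1-Da tolerance is hypothetical.

```python
# Approximate average masses (Da) added to a primer by one dideoxynucleotide:
# the usual DNA residue mass minus one oxygen, because the 3'-OH is missing.
# Illustrative assumption only; dye-labelled or mass-modified terminators differ.
DDN_ADDED_MASS = {"A": 297.2, "C": 273.2, "G": 313.2, "T": 288.2}


def expected_masses(primer_mass: float, alleles: tuple) -> dict:
    """Mass of the primer after single-base extension with each allele's terminator."""
    return {base: primer_mass + DDN_ADDED_MASS[base] for base in alleles}


def call_ms_genotype(primer_mass, alleles, observed_peaks, tol=1.0):
    """Report which extension products appear among the observed peaks (within tol Da)."""
    expected = expected_masses(primer_mass, alleles)
    present = [base for base, mass in expected.items()
               if any(abs(peak - mass) <= tol for peak in observed_peaks)]
    if not present:
        return "no call"
    if len(present) == 1:
        return f"{present[0]}/{present[0]}"
    return "/".join(present)


# A hypothetical 5,000-Da primer interrogating a C/T SNP; the unextended
# primer peak at 5,000 Da is simply ignored.
print(call_ms_genotype(5000.0, ("C", "T"), [5000.0, 5273.0, 5288.4]))
```

Allele pairs whose terminators differ by only a few daltons (A and T differ by about 9 Da here) demand correspondingly good mass resolution.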
Molecular barcodes
In a method termed ‘molecular inversion probes’, genotyping can be performed directly on genomic DNA and thousands
of SNPs can be analysed in one reaction by taking advantage of the multiplexing capability of DNA barcodes48. As
illustrated in the figure, for each SNP, a probe that contains a unique DNA barcode is designed to have two ends that are
complementary to the bases that flank the polymorphic site. The SNP base is filled in with a single base extension and
ligation closes the gap to form a circular piece of DNA (exonuclease digests away unreacted linear probes). The barcodes from reacted probes are detected by hybridization to a high-density array that contains the tag complements. A high level of multiplexing can be achieved by performing the genotyping reactions first and then amplifying the signal by PCR.
[Figure: molecular inversion probe assay. Steps: anneal; gap fill (polymerization); gap fill (ligation); exonuclease selection; probe release; amplification; hybridization. Labels: barcode tag, genomic DNA, probe.]
Figure modified with permission from REF. 48 © (2003) Macmillan Magazines Ltd.

by altered expression will primarily be modulators of the phenotype or will be involved in secondary effects and not the causative variants themselves.

Meeting the challenges
The future of new genomic technologies is crucially linked with bringing them into the laboratory of the individual investigator, because that is where the biological expertise required to make a breakthrough lies. Therefore, the technologies with most promise are those that allow high-throughput biology to be carried out in an individual laboratory, rather than only in genome centres and well-equipped, well-financed institutes. For this purpose, approaches that achieve high throughput by performing one method many thousands of times in parallel should be distinguished from those that achieve high throughput in a single tube assay (FIG. 2). The latter approaches not only allow the transfer of high throughput to individual laboratories but also allow a further scale-up, by several orders of magnitude, through the use of robotics or high-throughput equipment. This advance also enables the next cycle: the reduction of multiple parallel high-throughput assays back into a single tube. These development cycles allow unprecedented exponential increases in throughput and decreases in cost that would define a paradigm shift in biological interrogation.

Molecular barcoding is among the most promising technologies for miniaturizing high-throughput approaches into single tube assays. Through a combination of molecular barcodes and a microarray-based detection system, the yeast-deletion approach has achieved an unparalleled level of throughput, and its many applications are a testament to its power (BOX 2).

A molecular barcode assay might simply consist of attaching tags to the biological entities that are to be interrogated (for example, knock-out strains), collecting mixed tag pools after selection, performing a single PCR reaction, labelling the product and hybridizing it to a microarray to obtain the relative abundance of all biological entities in the selected pool. The primary advantage of the barcode concept is its versatility: it can be applied to other knock-out approaches, such as insertional mutagenesis and RNAi, as well as to other biological assays (see below).
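The read-out of such a pooled barcode assay, the relative abundance of each tagged entity after selection, can be sketched as a log ratio of normalized hybridization signals before and after selection. This minimal Python illustration assumes one signal per barcode per array; the tag names, the floor value and the sum-to-one normalization are hypothetical choices, not the processing used in the published yeast-deletion studies.

```python
import math


def barcode_log_ratios(before, after, floor=1.0):
    """log2 change in relative abundance of each tagged strain across selection.

    before, after: dicts mapping barcode -> hybridization signal for the pool
    before and after selection (hypothetical inputs). Signals are floored to
    avoid log(0) and scaled so each array sums to 1, a simple normalization
    chosen for illustration.
    """
    shared = sorted(before.keys() & after.keys())
    b = {k: max(before[k], floor) for k in shared}
    a = {k: max(after[k], floor) for k in shared}
    total_b, total_a = sum(b.values()), sum(a.values())
    return {k: math.log2((a[k] / total_a) / (b[k] / total_b)) for k in shared}


pool_before = {"tag_001": 100.0, "tag_002": 100.0, "tag_003": 200.0}
pool_after = {"tag_001": 200.0, "tag_002": 25.0, "tag_003": 175.0}
for tag, lr in barcode_log_ratios(pool_before, pool_after).items():
    print(tag, round(lr, 2))
```

A negative value flags a strain that was depleted during selection (a candidate fitness defect); a positive value flags one that was enriched.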


Interfering with gene function. Although the deletion approach for yeast is high throughput, its power can be improved by complementing the assay with further approaches; for example, temporally induced knock-outs using repressible promoters55 or targeting proteins for proteolysis56 for the study of essential genes that are inviable as homozygous knock-outs. Furthermore, to uncover molecular differences among mutants when no growth differences are apparent, the fitness data of knock-outs could be complemented with metabolite profiling57,58 and expression profiling of deletion strains27. Finally, the systematic construction of double deletions that has been initiated to assess synthetic lethality59,60 could help to characterize functionally redundant genes.

To ensure that molecular barcodes are associated with the right ORFs, Shoemaker et al.10 incorporated the molecular barcode tags into the primers that contain the target-specific homologous sequences, making it unlikely that one molecular barcode sequence will associate with a deletion of a different ORF. It does not, however, guarantee that all barcode sequences are without errors. DNA-primer synthesis might have introduced mutations in some of the barcode sequences, which would affect the hybridization signal on microarrays. The presence of two molecular barcodes in each mutant makes it unlikely that both contain mutations; in addition, the comparative analysis of samples before and after selection controls for such mistakes. Nevertheless, barcodes are being sequenced, and any resulting hybridization effects should be corrected by remaking the strains or by introducing the corresponding changes into the microarray probes.

To advance the insertional mutagenesis approach, barcodes could, in theory, also be incorporated into each insertion sequence, enabling a pooled knock-out analysis. Such pooled approaches using transposons have been carried out in bacteria61,62. Technically, the RNAi approach might also be aided by the use of molecular barcodes, which can in theory be integrated into the plasmids or the genome-integration cassettes that express dsRNA molecules. Such an advance might simplify mutant tracking and enable pooled analysis in organisms for which this is practical.
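To make the double-deletion idea concrete: if two genes are functionally redundant, each single deletion may grow almost normally while the double deletion is sick or dead. The sketch below scores this under one common convention, a multiplicative null model in which the expected double-mutant fitness is the product of the single-mutant fitnesses. The model choice, the cut-offs and the labels are assumptions for illustration, not the scoring used in refs 59 and 60.

```python
def interaction_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of double-mutant fitness from the multiplicative expectation.

    Fitness values are relative to wild type (1.0)."""
    return w_ab - w_a * w_b


def classify_pair(w_a, w_b, w_ab, dead=0.1, margin=0.1):
    """Label a double deletion; `dead` and `margin` are illustrative cut-offs."""
    if w_a > dead and w_b > dead and w_ab <= dead:
        # Both single mutants grow, the double mutant does not.
        return "synthetic lethal"
    eps = interaction_score(w_a, w_b, w_ab)
    if eps < -margin:
        return "negative (sicker than expected)"
    if eps > margin:
        return "positive (healthier than expected)"
    return "no interaction"


print(classify_pair(0.9, 0.95, 0.0))
```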

Box 5 | Identifying polymorphisms by mismatch-repair detection

Mismatch-repair detection uses the repair system of bacteria to separate DNA fragments that contain a polymorphism from those that do not50. A pool of PCR products to be tested is mixed with linear vector DNA. The vector DNA is methylated and contains a 5-base pair (bp) deletion of an incorporated CRE gene. A pool of single-stranded, unmethylated control DNAs (standards) is added to this mixture; the standards consist of, for example, PCR products that are amplified from a different individual and cloned into a vector that is identical to the first but contains the full-length Cre gene (no 5-bp deletion). On denaturing and reannealing, the mixture forms a heteroduplex vector that contains the methylated tester DNA and the Cre deletion on one strand and the unmethylated control DNA and the full-length Cre gene on the other strand. After digesting away unligated fragments, the heteroduplex molecules are transformed into an Escherichia coli strain that carries on its EPISOME two LOX SITES that flank tetracycline-resistance (tetR) and streptomycin-sensitive (strS) genes.
If there are no polymorphisms in the test fragment relative to the standard, no repair occurs (insertion/deletion polymorphisms of 5 bp or more, such as the one in the Cre gene, do not trigger the mismatch-repair system). The heteroduplex replicates and generates a plasmid with an active Cre gene, which in turn leads to recombination of the tetR/strS/lox cassette. This event renders the cell tetracycline sensitive and streptomycin resistant (the strain contains a streptomycin-resistance gene on its chromosome). If a polymorphism is present, repair occurs and extends into the Cre gene. Repair yields an inactive copy of Cre, leaving the cell tetracycline resistant and streptomycin sensitive. By growth on the respective selective media, variant and invariant pools can be separated.

[Figure: a pool of PCR products, a linear vector (5-bp CRE deletion, methylated) and a pool of single-stranded standards cloned in a vector (full-length CRE, unmethylated) are combined; heteroduplexes are treated with Taq ligase and endonuclease, then transformed. No variation: tet sensitive, str resistant. Variation: tet resistant, str sensitive.]

CRE
Cre encodes a site-specific recombinase that recognizes and binds to specific sites called lox. Two lox sites recombine at nearly 100% efficiency in the presence of Cre, allowing DNA that is cloned between two such sites to be removed by Cre-mediated recombination.

EPISOME
An independent DNA element, such as a plasmid, that can replicate extrachromosomally or that can be maintained by integrating into the genome of the host.

LOX SITE
A site to which Cre recombinase binds to mediate recombination, allowing DNA that is cloned between two such sites to be removed.
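The selection logic of Box 5 reduces to a small truth table. The Python sketch below encodes it under the box's stated assumptions; the function and class names are hypothetical, and the treatment of test-fragment indels of 5 bp or more is an inference from the box's note that such indels do not trigger mismatch repair.

```python
from dataclasses import dataclass


@dataclass
class MrdOutcome:
    cre_active: bool
    tet_resistant: bool
    str_resistant: bool


def mrd_outcome(differs_from_standard: bool, large_indel: bool = False) -> MrdOutcome:
    """Predicted phenotype of a transformant, following the logic of Box 5.

    differs_from_standard: the tester fragment carries a polymorphism
    relative to the standard. large_indel: that difference is an indel of
    5 bp or more, which (like the engineered Cre deletion) escapes repair.
    """
    repaired = differs_from_standard and not large_indel
    # Repair extends into Cre and leaves an inactive copy; without repair,
    # replication yields a plasmid with an active Cre.
    cre_active = not repaired
    # Active Cre recombines the lox sites and removes the tetR/strS cassette:
    # the cell loses tetracycline resistance and, with strS gone, the
    # chromosomal streptomycin-resistance gene determines the phenotype.
    return MrdOutcome(cre_active=cre_active,
                      tet_resistant=not cre_active,
                      str_resistant=cre_active)


def split_pool(fragments: dict) -> tuple:
    """Separate fragment names into (variant, invariant) pools by selection."""
    variant, invariant = [], []
    for name, differs in fragments.items():
        outcome = mrd_outcome(differs)
        # Tetracycline selects variants; streptomycin selects invariants.
        (variant if outcome.tet_resistant else invariant).append(name)
    return variant, invariant


print(split_pool({"frag1": True, "frag2": False, "frag3": True}))
```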


Gene-expression profiling. Evaluating gene expression today shows that, although very popular, the data and the technology might be among the least understood. The limited overlap between deletion phenotype and gene expression in yeast, for example, calls into question the functional relevance of expression changes.

Applications that do not require an explanation for the functional relevance of an expression change might avoid this potential pitfall. One application is in medicine, and might yield a true demonstration of the power of expression analysis. By classifying disease samples into groups63–65, signature profiling has immediate promise for disease diagnosis and personalized medicine. A comparison of expression profiles between patients and controls has identified groups of genes that can be used as predictors of treatment outcome and survival rates66. These predictors might be complemented in the future with genetic predictors that are identified at the DNA sequence level. In fact, it is possible that many biological assays, including identification of sequence variants, can be carried out on arrays so that millions of individual measurements can be made on an individual, providing excellent resolution for disease diagnosis by pattern recognition. Meanwhile, in the short term, while the challenge is to classify disease samples rather than to identify their genetic causes, expression analysis is already powerful on its own. Furthermore, if, as has been suggested, gene expression reflects the environmental past to which a patient has been exposed43, expression profiling of patients could yield further useful information. In both cases, the expression signatures are easy to score and provide diagnostic data that differ from those conventionally available through measures of clinical parameters and from a patient's description of symptoms.

How can the mRNA-expression changes be interpreted to obtain functional information? It is likely that understanding the functional relevance of the changes in gene expression will require understanding the complex networks that regulate expression, which in turn requires understanding the regulation of transcription factors and the function of the sequences to which they bind. Therefore, integrating expression profiles with binding sites of transcription factors and knock-out phenotypes of pathway members might help with understanding the significance of expression differences33.
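Signature profiling, as described above, amounts to assigning a new sample to the class whose typical expression pattern it most resembles. A nearest-centroid classifier is one of the simplest ways to do this; it is a stand-in chosen for illustration, not the method used in refs 63–66. The class labels and the three-gene log-ratio profiles below are invented.

```python
import math


def centroid(profiles):
    """Gene-by-gene mean of a list of equal-length expression profiles."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def train_centroids(labelled):
    """labelled: dict mapping class label -> list of training profiles."""
    return {label: centroid(profiles) for label, profiles in labelled.items()}


def classify(profile, centroids):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Hypothetical log-ratio profiles for three genes in four patients.
training = {
    "good outcome": [[2.0, -1.0, 0.5], [1.8, -1.2, 0.7]],
    "poor outcome": [[-0.5, 1.0, 0.0], [-0.7, 1.4, 0.2]],
}
centroids = train_centroids(training)
print(classify([1.5, -0.8, 0.4], centroids))  # good outcome
```

Because the classifier only matches patterns, it needs no account of why the signature genes change, which is the point the text makes about diagnostic uses of expression data.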
Figure 2 | High-throughput development cycles. High throughput can be achieved by amplifying the number of experiments using robotics and automation (transition from stage 1 to stage 2). However, a true advancement does not occur until a new concept or approach allows high throughput to be achieved in a single experiment, illustrated here with a single tube (stage 3). This development allows a further increase in throughput with multiple parallel high-throughput reactions (stage 4), which can in turn be reduced back to a single assay, completing the next cycle (stage 5). An example for stage 2 might be performing individual Northern blots for all genes in the yeast genome; for stage 3, it might be measuring genome-wide gene expression in parallel using a microarray. Stage 5 might involve combining different whole-genome measurements, such as gene expression, genotyping, knock-out phenotypes, protein interactions, protein levels, splicing, and so on, in a single assay. Extending this approach to include a dedicated unique molecular barcode for each element in each assay might make it possible to make all measurements with a single high-density microarray hybridization. Further developments could then allow these assays to be run not just for one strain, but for many strains or individuals in parallel.

It is also likely that inferring function from expression change is most powerful when analyses are combined en masse across many conditions to obtain a molecular readout of the regulatory patterns of a gene. Similar to using expression profiles as signature patterns for different samples, this application uses signature patterns obtained for each gene to group genes on the basis of similarity67,68. This application seems especially promising for studies of transcriptional regulation29–31,33, and a better view of the regulatory network that controls transcription will help us to understand why so many genes change expression in response to environmental perturbations. Analysis across many conditions might also help to define functional groupings30 and to make probability-scored functional assignments to unknowns


that have been assigned to groups69. Nonetheless, the co-expression of genes, even across thousands of conditions, is in many cases not enough to make a correct functional association. Studies have begun to integrate other parameters, such as evolution70, to filter out gene interactions that are not functionally relevant.

As the data sets for such global analyses are often not collected by the same experimenter, these analyses still raise issues of data reliability. For example, printing mistakes (BOX 3) that result in a partial randomization of spots on cDNA microarrays introduce errors. Historically, errors were detected by repeating experiments, and microarray experiments of expression patterns under many conditions might also benefit from being repeated. Raw data need to be made available to cross-validate experiments (for cDNA microarrays, that is, the data before ratio measurements are calculated); when gene expression is used as a diagnostic indicator, it is especially important that the technical and biological error rates of microarray data are addressed. Taken together across the diverse applications of expression data, these concerns indicate that our understanding of the biological significance of gene expression is still rudimentary and can be much improved by insights from more rigorous studies.
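Grouping genes by the similarity of their expression across many conditions can be sketched with Pearson correlation and single-linkage grouping. This is a swapped-in, deliberately simple technique (refs 67 and 68 describe the published clustering approaches); the gene names, profiles and the 0.8 correlation cut-off are hypothetical. As the text cautions, membership of such a group is a hypothesis about function, not proof of it.

```python
import math


def pearson(x, y):
    """Pearson correlation of two expression profiles (assumed non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_groups(profiles, min_r=0.8):
    """Single-linkage grouping: two genes share a group if a chain of
    pairwise correlations >= min_r connects them (union-find)."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= min_r:
                parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical expression values for four genes across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 1.9, 3.2, 3.9],
}
print(coexpression_groups(profiles))
```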

Genetic mapping of QTLs. New methods that allow the rapid discovery, scoring and functional analysis of genetic variation could revolutionize QTL analysis. Verifying and measuring the effect of an allele on the phenotype is an area in complex trait dissection that requires increased attention. Sequence-based approaches might identify polymorphisms but in practice they cannot distinguish between neutral polymorphisms and causative mutations. Candidate gene approaches are based on previous knowledge and cannot identify mutations in previously unknown genes. The most systematic approaches would combine identification of alleles with proof of their contribution to the phenotype. In model organisms, genes have often been identified and their role confirmed by COMPLEMENTATION71. Yet, in most cases, it is impractical to test every gene in an interval by complementation. To overcome this problem, a promising functional tool based on reciprocal HEMIZYGOSITY was developed in yeast72. The technology can systematically identify a QTL allele by determining the effects of a single allele in an otherwise uniform genetic background. The assay is based on a phenotypic comparison between two strains in which a single allele is switched in a heterozygous diploid strain. The method can be adapted for the whole genome: by uniquely tagging each strain with molecular barcode tags, all reciprocal hemizygous strains could be grown as a pool. With this approach, it might be possible to circumvent linkage mapping and analyse all allelic variants between two genomes in a single step (FIG. 3).

The dissection of the quantitative trait of high-temperature growth in yeast using reciprocal hemizygosity identified three genes that are located in a single QTL interval72. The conventional approaches of sequence, expression and marker-trait association analyses were tested in this study but failed to identify the QTL alleles. The tight linkage between three QTL alleles with different levels of contribution indicated that linkage mapping (the method of choice for single-gene Mendelian traits) is intrinsically deficient when applied to quantitative traits. Although narrowing a map interval in the hope of approaching a single point might locate the main contributor, neighbouring genetic factors could be missed. Contributors might also go undetected when their effect is small or when they are located in trans, opposite another QTL allele.

COMPLEMENTATION
One example is the use of partial diploids to determine whether two mutations affect the same or different genes. If the mutations are in the same gene, they generally fail to complement each other and the diploid retains the mutant phenotype. By contrast, mutations in different genes usually complement one another and restore a wild-type phenotype to the diploid. However, exceptions to both cases abound.

HEMIZYGOUS
A diploid genotype that has only one copy of a particular gene, as in X-chromosome genes in a male, or when the homologous chromosome carries a deletion.

Figure 3 | Allelic switching with reciprocal hemizygosity. a | All pairwise allelic comparisons are performed between two genomes in a heterozygous hybrid strain background. Two strains are compared that differ in the allele of only one gene and are heterozygous diploid for the rest of the genome. As illustrated, allelic comparisons are achieved using reciprocal hemizygous deletions that disrupt a single allele, although, in theory, a similar comparison can be carried out with allele replacements or disruptions of combinations of alleles. b | With molecular barcodes, the comparisons for an entire genome can be performed in pools. c | By hybridizing barcodes to arrays, each experiment promises to identify the allelic variants that contribute differentially to the phenotype under the tested conditions. In the example shown, the strain that carries the B1 allele yields a greater hybridization signal than the reciprocal strain that carries the B2 allele, indicating that the B1 allele confers a fitness advantage over the B2 allele.
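The pooled reciprocal-hemizygosity readout in FIG. 3c can be sketched as a per-gene comparison of barcode signals between the two reciprocal strains. The Python below assumes both strains of each pair are grown in the same pool and hybridized to the same arrays, so their signals are directly comparable; gene names, replicate values and the two-fold (log2 = 1) cut-off are hypothetical. In the terms of FIG. 3, a gene whose B1-carrying strain outcompetes its B2-carrying counterpart would be reported as "allele 1 favoured".

```python
import math
import statistics


def allelic_effects(signal_allele1, signal_allele2, min_abs_log2=1.0):
    """Compare reciprocal hemizygotes gene by gene.

    signal_allele1[gene]: replicate barcode signals for the strain that keeps
    allele 1 (allele 2 deleted); signal_allele2[gene]: the reciprocal strain.
    min_abs_log2 is an illustrative cut-off, not a published threshold.
    """
    calls = {}
    for gene, s1 in signal_allele1.items():
        ratios = [math.log2(a / b) for a, b in zip(s1, signal_allele2[gene])]
        mean_ratio = statistics.mean(ratios)
        if mean_ratio >= min_abs_log2:
            calls[gene] = "allele 1 favoured"
        elif mean_ratio <= -min_abs_log2:
            calls[gene] = "allele 2 favoured"
        else:
            calls[gene] = "no detectable difference"
    return calls


# Hypothetical signals from two replicate arrays after pooled growth.
s1 = {"GENE_A": [100, 110], "GENE_B": [400, 380], "GENE_C": [50, 60]}
s2 = {"GENE_A": [105, 100], "GENE_B": [100, 90], "GENE_C": [210, 250]}
print(allelic_effects(s1, s2))
```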


To speed the dissection of quantitative traits, new tools, such as reciprocal hemizygosity, genome-wide marker mapping and mutation-scanning methods, need to be more broadly applied73,74. The ideal approaches should be applicable to different genetic backgrounds, as the alleles identified in one background might not have the same effects in another background or under different environmental conditions75. Furthermore, allele frequencies are likely to range from common to rare72, highlighting the demand for approaches that make no prior assumptions about allele frequency in populations. Alleles are also likely to have additive and EPISTATIC effects, requiring tools that can analyse single alleles and higher-order combinations of alleles.

Classification of phenotypes also deserves attention76. When samples are drawn from populations rather than from crosses, phenotypic heterogeneity could impair the ability to detect association between a marker and disease. Phenotypic classification might be aided by molecular phenotypes based on expression profiles, signature patterns and other molecular parameters. By integrating genetics with high-throughput tools of functional genomics in this way, it might be possible to advance the dissection of complex traits and gain deeper insights into the interaction of genes and the environment.

EPISTASIS
An interaction between non-allelic genes, such that one gene masks, interferes with or enhances the effect of the other gene.

Conclusions
Functional genomics has changed the way biology is done. And yet the field is still in its infancy in terms of detailing the complexity that underlies biological systems, such as the complex network of genetic regulation, protein interactions and biochemical reactions that make up a cell. As the applications illustrate, systematic knock-out analyses in multiple model organisms will aid the definition of different cellular components and help to describe the conservation of biological processes during evolution. Detailed analysis of mRNA regulation will help to clarify the functional significance of gene expression. Furthermore, the discovery of examples of the genetic basis of complex traits in model organisms will help to formulate more rigorous hypotheses for what to expect in higher organisms and promises to provide better approaches for dissecting human disorders.

Technological improvements are needed to maximize the potential offered by functional genomics. False prediction rates of high-throughput approaches need to be eliminated, particularly as we advance towards applying functional genomics in the clinical setting for diagnostics and personalized medicine. Data analysis, comparison and the integration of approaches that combine different measurements for a genome need to be improved to generate a global picture and to move towards generating refined models of cellular processes.

Importantly, to accelerate discoveries, high-throughput biology needs to be brought into individual laboratories because that is where the biological expertise lies. The use of molecular barcodes is one example of an innovation that provides increased throughput through miniaturization, accompanied by reduced costs. Incorporating such technology into other assays would provide the advance in efficiency that makes it possible to imagine high-throughput experimentation as an integral part of any individual investigator's laboratory. It should also be possible to combine different whole-genome measurements, such as gene expression, genotyping, knock-out analysis, protein interactions, protein levels and splicing, into a single assay, a further step towards miniaturization, as proposed in FIG. 2. These developments would in turn maximize biological discovery and take functional genomics to the next level. Similar approaches would also promise to revolutionize medical diagnostics for the delivery of better health care and lower health care costs.

Although we believe that high-throughput experimentation will become more widespread, we expect single-gene studies to remain essential. They continue to help to verify and evaluate high-throughput data sets; high-throughput technologies are often not sensitive enough to obtain accurate information for every gene in a genome and, in some cases, detailed analysis of individual genes is needed to establish a link between the global picture and individual discoveries that involve few sets of genes. Nevertheless, the integration of functional genomics into individual laboratories will provide the most accurate test of the power brought about by integrating single-gene studies with high-throughput approaches for faster and more efficient biological discovery.

1. Risch, N. J. Searching for genetic determinants in the new millennium. Nature 405, 847–856 (2000).
2. Flint, J. & Mott, R. Finding the molecular basis of quantitative traits: successes and pitfalls. Nature Rev. Genet. 2, 437–445 (2001).
3. Collins, F. S., Green, E. D., Guttmacher, A. E. & Guyer, M. S. A vision for the future of genomics research. Nature 422, 835–847 (2003).
4. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
5. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E. S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003).
6. Cliften, P. et al. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301, 71–76 (2003).
7. Stockwell, B. R. Chemical genetics: ligand-based discovery of gene function. Nature Rev. Genet. 1, 116–125 (2000).
8. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002).
9. Winzeler, E. A. et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906 (1999).
10. Shoemaker, D. D., Lashkari, D. A., Morris, D., Mittmann, M. & Davis, R. W. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nature Genet. 14, 450–456 (1996).
The first use of PCR-amplifiable molecular barcodes for high-throughput parallel biology.
11. Thomas, K. R. & Capecchi, M. R. Site-directed mutagenesis by gene targeting in mouse embryo-derived stem cells. Cell 51, 503–512 (1987).
12. Rong, Y. S. & Golic, K. G. Gene targeting by homologous recombination in Drosophila. Science 288, 2013–2018 (2000).
13. McCreath, K. J. et al. Production of gene-targeted sheep by nuclear transfer from cultured somatic cells. Nature 405, 1066–1069 (2000).
14. Hughes, T. R. et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nature Genet. 25, 333–337 (2000).
15. Ross-Macdonald, P. et al. Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402, 413–418 (1999).
16. Alonso, J. M. et al. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653–657 (2003).
A large-scale insertional mutagenesis screen in Arabidopsis.
17. Fire, A. et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811 (1998).
18. Kamath, R. S. et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237 (2003).
19. Kennerdell, J. R. & Carthew, R. W. Heritable gene silencing in Drosophila using double-stranded RNA. Nature Biotechnol. 18, 896–898 (2000).
20. Nasevicius, A. & Ekker, S. C. Effective targeted gene 'knockdown' in zebrafish. Nature Genet. 26, 216–220 (2000).
21. Wianny, F. & Zernicka-Goetz, M. Specific interference with gene function by double-stranded RNA in early mouse development. Nature Cell Biol. 2, 70–75 (2000).


22. Ogita, S., Uefuji, H., Yamaguchi, Y., Koizumi, N. & Sano, H. RNA interference: producing decaffeinated coffee plants. Nature 423, 823 (2003).
23. Elbashir, S. M. et al. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494–498 (2001).
24. Jackson, A. L. et al. Expression profiling reveals off-target gene regulation by RNAi. Nature Biotechnol. 21, 635–637 (2003).
25. Bridge, A. J., Pebernard, S., Ducraux, A., Nicoulaz, A. L. & Iggo, R. Induction of an interferon response by RNAi vectors in mammalian cells. Nature Genet. 34, 263–264 (2003).
26. Sledz, C. A., Holko, M., de Veer, M. J., Silverman, R. H. & Williams, B. R. Activation of the interferon system by short-interfering RNAs. Nature Cell Biol. 5, 834–839 (2003).
27. Hughes, T. R. et al. Functional discovery via a compendium of expression profiles. Cell 102, 109–126 (2000).
The use of expression signature profiles on yeast knock-out mutants for achieving functional groupings.
28. Marton, M. J. et al. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nature Med. 4, 1293–1301 (1998).
29. Pilpel, Y., Sudarsanam, P. & Church, G. M. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genet. 29, 153–159 (2001).
30. Ihmels, J. et al. Revealing modular organization in the yeast transcriptional network. Nature Genet. 31, 370–377 (2002).
31. Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genet. 34, 166–176 (2003).
32. Holstege, F. C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).
33. Lee, T. I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
34. Brenner, S. Sillycon valley fever. Curr. Biol. 9, R671 (1999).
35. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934 (2001).
36. Griffin, T. J. et al. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell. Proteomics 1, 323–333 (2002).
37. Washburn, M. P. et al. Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 100, 3107–3112 (2003).
38. Patterson, S. D. & Aebersold, R. H. Proteomics: the first decade and beyond. Nature Genet. 33 (Suppl), 311–323 (2003).
39. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737–741 (2003).
40. Birrell, G. W. et al. Transcriptional response of Saccharomyces cerevisiae to DNA-damaging agents does not identify the genes that protect against these agents. Proc. Natl Acad. Sci. USA 99, 8778–8783 (2002).
41. Steinmetz, L. M. et al. Systematic screen for human disease genes in yeast. Nature Genet. 31, 400–404 (2002).
42. Deutschbauer, A. M., Williams, R. M., Chu, A. M. & Davis, R. W. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 99, 15530–15535 (2002).
43. Steinmetz, L. M. & Davis, R. W. High-density arrays and insights into genome function. Biotechnol. Genet. Eng. Rev. 17, 109–146 (2000).
44. Yang, Y. H. & Speed, T. Design issues for cDNA microarray experiments. Nature Rev. Genet. 3, 579–588 (2002).
45. Van Eerdewegh, P. et al. Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature 418, 426–430 (2002).
46. Sklar, P. et al. Association analysis of NOTCH4 loci in schizophrenia using family and population-based controls. Nature Genet. 28, 126–128 (2001).
47. Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genet. 33 (Suppl), 228–237 (2003).
48. Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nature Biotechnol. 21, 673–678 (2003).
49. Faham, M. & Cox, D. R. A novel in vivo method to detect DNA sequence variation. Genome Res. 5, 474–482 (1995).
50. Faham, M., Baharloo, S., Tomitaka, S., DeYoung, J. & Freimer, N. B. Mismatch repair detection (MRD): high-throughput scanning for DNA variations. Hum. Mol. Genet. 10, 1657–1664 (2001).
51. Perez-Iratxeta, C., Bork, P. & Andrade, M. A. Association of genes to genetically inherited diseases using data mining. Nature Genet. 31, 316–319 (2002).
52. Mootha, V. K. et al. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc. Natl Acad. Sci. USA 100, 605–610 (2003).
53. Wayne, M. L. & McIntyre, L. M. Combining mapping and arraying: an approach to candidate gene identification. Proc. Natl Acad. Sci. USA 99, 14903–14906 (2002).
54. Aitman, T. J. et al. Identification of Cd36 (Fat) as an insulin-resistance gene causing defective fatty acid and glucose metabolism in hypertensive rats. Nature Genet. 21, 76–83 (1999).
55. Belli, G., Gari, E., Aldea, M. & Herrero, E. Functional analysis of yeast essential genes using a promoter-substitution cassette and the tetracycline-regulatable dual expression system. Yeast 14, 1127–1138 (1998).
56. Kanemaki, M., Sanchez-Diaz, A., Gambus, A. & Labib, K. Functional proteomic identification of DNA replication proteins by induced proteolysis in vivo. Nature 423, 720–725 (2003).
57. Raamsdonk, L. M. et al. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nature Biotechnol. 19, 45–50 (2001).
58. Allen, J. et al. High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nature Biotechnol. 21, 692–696 (2003).
59. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
60. Ooi, S. L., Shoemaker, D. D. & Boeke, J. D. DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray. Nature Genet. 35, 277–286 (2003).
61. Hensel, M. et al. Simultaneous identification of bacterial virulence genes by negative selection. Science 269, 400–403 (1995).
62. Karlyshev, A. V. et al. Application of high-density array-based signature-tagged mutagenesis to discover novel Yersinia virulence-associated genes. Infect. Immun. 69, 7810–7819 (2001).
63. Heller, R. A. et al. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc. Natl Acad. Sci. USA 94, 2150–2155 (1997).
One of the first applications of gene-expression profiling for disease sample classification.
64. Perou, C. M. et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl Acad. Sci. USA 96, 9212–9217 (1999).
65. Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
66. Bohen, S. P. et al. Variation in gene expression patterns in follicular lymphoma and the response to rituximab. Proc. Natl Acad. Sci. USA 100, 1926–1930 (2003).
67. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA 95, 14863–14868 (1998).
68. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. Systematic determination of genetic network architecture. Nature Genet. 22, 281–285 (1999).
69. Wu, L. F. et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nature Genet. 31, 255–265 (2002).
70. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
71. Glazier, A. M., Nadeau, J. H. & Aitman, T. J. Finding genes that underlie complex traits. Science 298, 2345–2349 (2002).
72. Steinmetz, L. M. et al. Dissecting the architecture of a quantitative trait locus in yeast. Nature 416, 326–330 (2002).
The first report of the dissection of a complex trait from a description of the phenotype to identification of the genes published in a single study. Provides evidence for complex QTL architecture and describes a new functional assay.
73. Darvasi, A. & Pisante-Shalom, A. Complexities in the genetic dissection of quantitative trait loci. Trends Genet. 18, 489–491 (2002).
74. Christians, J. K. & Keightley, P. D. Genetic architecture: dissecting the genetic basis of phenotypic variation. Curr. Biol. 12, R415–R416 (2002).
75. Mackay, T. F. Quantitative trait loci in Drosophila. Nature Rev. Genet. 2, 11–20 (2001).
76. Freimer, N. & Sabatti, C. The human phenome project. Nature Genet. 34, 15–21 (2003).
77. Spradling, A. C. et al. The Berkeley Drosophila Genome Project gene disruption project: single P-element insertions mutating 25% of vital Drosophila genes. Genetics 153, 135–177 (1999).
78. Peter, A. et al. Mapping and identification of essential gene functions on the X chromosome of Drosophila. EMBO Rep. 3, 34–38 (2002).
79. Zambrowicz, B. P. et al. Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611 (1998).
80. Martin, E. et al. Identification of 1,088 new transposon insertions of Caenorhabditis elegans: a pilot study toward large-scale screens. Genetics 162, 521–524 (2002).
81. Gura, T. A silence that speaks volumes. Nature 404, 804–808 (2000).
82. Clemens, J. C. et al. Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc. Natl Acad. Sci. USA 97, 6499–6503 (2000).
83. Esposito, M. S. & Esposito, R. E. The genetic control of sporulation in Saccharomyces. I. The isolation of temperature-sensitive sporulation-deficient mutants. Genetics 61, 79–89 (1969).
84. Giaever, G. et al. Genomic profiling of drug sensitivities via induced haploinsufficiency. Nature Genet. 21, 278–283 (1999).
85. Lum, P. Y. et al. Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes. Cell 116, 121–137 (2004).
86. Hirsh, A. E. & Fraser, H. B. Protein dispensability and rate of evolution. Nature 411, 1046–1049 (2001).
87. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002).
88. Gu, Z. et al. Role of duplicate genes in genetic robustness against null mutations. Nature 421, 63–66 (2003).
An example of the use of fitness data in addressing fundamental questions in molecular evolution. Incorporates published data sets without new bench experiments.
89. Papp, B., Pal, C. & Hurst, L. D. Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197 (2003).
90. Fodor, S. P. et al. Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773 (1991).
The first high-density microarray made by direct synthesis.
91. Blanchard, A. P., Kaiser, R. J. & Hood, L. E. Synthetic DNA arrays. Biosens. Bioelectron. 11, 687–690 (1996).
92. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
The first cDNA microarray for gene-expression profiling made by printing.
93. Ferguson, J. A., Boles, T. C., Adams, C. P. & Walt, D. R. A fiber-optic DNA biosensor microarray for the analysis of gene expression. Nature Biotechnol. 14, 1681–1684 (1996).
94. Khrapko, K. R. et al. Hybridization of DNA with oligonucleotides immobilized in a gel: a convenient method for recording single base replacements. Mol. Biol. (Mosk) 25, 718–730 (1991).
95. Halgren, R. G., Fielden, M. R., Fong, C. J. & Zacharewski, T. R. Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res. 29, 582–588 (2001).
96. Knight, J. When the chips are down. Nature 410, 860–861 (2001).
97. Modrek, B. & Lee, C. A genomic view of alternative splicing. Nature Genet. 30, 13–19 (2002).
98. Johnson, J. M. et al. Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302, 2141–2144 (2003).
99. Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719–1723 (2001).
100. Winzeler, E. A. et al. Direct allelic variation scanning of the yeast genome. Science 281, 1194–1197 (1998).
101. Kwok, P. Y. SNP genotyping with fluorescence polarization detection. Hum. Mutat. 19, 315–323 (2002).
102. Jurinke, C., van den Boom, D., Cantor, C. R. & Koster, H. Automated genotyping using the DNA MassArray technology. Methods Mol. Biol. 187, 179–192 (2002).

Acknowledgements
We would like to thank L. David and T. Neklesa for helpful comments on the manuscript.

Competing interests statement
The authors declare that they have no competing financial interests.

Online links
DATABASES
The following terms in this article are linked online to:
Entrez: http://www.ncbi.nlm.nih.gov/Entrez
LRPPRC | tetR
OMIM: http://www.ncbi.nlm.nih.gov/Omim
Leigh syndrome
FURTHER INFORMATION
Arabidopsis Mutations: http://www.arabidopsis.org/abrc
Human Gene Mutation Database: http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html
Yeast Deletion Database: http://yeastdeletion.stanford.edu
Access to this interactive links box is free online.

NATURE REVIEWS | GENETICS VOLUME 5 | MARCH 2004 | 201