throughout the genome is both the strength and the weakness of the GWAS approach. The power of GWAS is that they provide a relatively unbiased examination of the entire genome for common risk variants; their weakness is that in doing so, they swamp the signal from true risk variants with statistical noise from the vast numbers of markers that aren't associated with disease. To separate true signals from noise, researchers have to set an exceptionally high threshold that a marker needs to exceed before it is accepted as a likely disease-causing candidate. That reduces the problem of false positives, but it also means that any true disease markers with small effects are lost in the background noise.
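The "exceptionally high threshold" can be made concrete with a back-of-the-envelope Bonferroni correction. The sketch below is illustrative only. It assumes roughly one million effectively independent markers and a family-wise error rate of 0.05, and neither figure comes from this post. It prints the per-marker p-value cutoff and the test statistic a marker must reach.

```python
from statistics import NormalDist

def genome_wide_threshold(n_tests: int, family_wise_alpha: float = 0.05) -> float:
    """Bonferroni-corrected p-value cutoff for each individual marker."""
    return family_wise_alpha / n_tests

def z_required(p_threshold: float) -> float:
    """Smallest |z| that reaches p_threshold in a two-sided normal test."""
    return NormalDist().inv_cdf(1 - p_threshold / 2)

# Assumption: roughly one million effectively independent markers per scan.
cutoff = genome_wide_threshold(1_000_000)
print(f"per-marker p-value cutoff: {cutoff:.1e}")           # 5.0e-08
print(f"|z| needed genome-wide:    {z_required(cutoff):.2f}")
print(f"|z| needed for one test:   {z_required(0.05):.2f}")  # 1.96
```

Under these assumptions a marker needs |z| of about 5.45 instead of 1.96. A variant of small effect rarely produces a statistic that large unless the sample is very big.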
The solution: This seems to be one problem that will need to be solved, at least to some extent, with sheer brute force. By increasing the numbers of samples in their disease and control groups, researchers will steadily dial down the statistical noise from non-associated markers until even disease genes with small effects stand out above the crowd. As the cost of genotyping (and sequencing) tumbles ever downward, such an approach will become more and more feasible; however, the logistical challenge of collecting large numbers of carefully ascertained patients will always be a serious obstacle.
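Why brute force works can be seen with a rough power calculation for a simple case-control allele-frequency test. This is a sketch under stated assumptions, not the method of any particular study:

- it uses a normal approximation and counts two alleles per person;
- it uses the hypothetical threshold of 5e-8;
- it models a hypothetical variant carried at 30% frequency in controls with an odds ratio of 1.1.

```python
from math import sqrt
from statistics import NormalDist

def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    return case_odds / (1 + case_odds)

def allelic_test_power(n_per_group: int, control_freq: float, odds_ratio: float,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allele-frequency z-test.

    Counts two alleles per person, uses the normal approximation and ignores
    the negligible chance of a 'significant' result in the wrong direction.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    n_alleles = 2 * n_per_group
    se = sqrt(p0 * (1 - p0) / n_alleles + p1 * (1 - p1) / n_alleles)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# Hypothetical variant: 30% frequency in controls, odds ratio 1.1.
for n in (1_000, 5_000, 20_000, 50_000):
    print(f"{n:>6} cases + {n:>6} controls: power = {allelic_test_power(n, 0.3, 1.1):.3f}")
```

In this sketch, power climbs from well under 1% with 1,000 cases and 1,000 controls to above 99% with 50,000 of each. That is the "dialling down" of noise in numerical form.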
Rare variants
The problem: Current genome scan technology relies heavily on the "common disease, common variant" (CDCV) assumption, which states that the genetic risk for common disease is mostly attributable to a relatively small number of common genetic variants. This is largely an assumption of convenience: firstly, our catalogue of human genetic variation (built up by efforts such as the HapMap project) is largely restricted to common variants, since rare variants are much harder to identify; and secondly, chip-makers are limited in how many different SNPs they can analyse on a single chip, so the natural tendency has been to cram in the high-frequency variants that capture the largest proportion of genetic variation per probe. There is also some theoretical justification for this assumption based on models of human demographic history, but these models are themselves based on numerous assumptions, and the argument may not apply equally to all common human diseases.
In any case, everyone agrees that some non-trivial fraction of the genetic risk of common diseases will be the result of rare variants, and the latest results from GWAS in a variety of diseases have failed to provide unambiguous support for the CDCV hypothesis. Whatever the proportion of variance that turns out to be explained by rare variants, current GWAS technologies are essentially powerless to unravel it.
The solution: Increasing sample sizes may help a little, but the fundamental problem is the inability of current chips to tag rare variation. In the short term, the solution will be higher-density SNP chips incorporating lower-frequency variants identified by large-scale sequencing projects like the 1000 Genomes Project. However, such approaches will have diminishing returns: as chip-makers lower the frequency of the variants on their chips, the number of probes that will have to be added to capture a reasonable fraction of total genetic variation will increase exponentially, with each new probe adding only a minute increase in power.
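One reason common SNPs cannot tag rare variation is a hard mathematical ceiling on linkage disequilibrium between alleles of different frequencies. The sketch below computes the largest r² possible between two SNPs given only their minor-allele frequencies. It is the standard bound reached when every copy of the rarer allele sits on a haplotype carrying the other allele. The 1% "disease variant" is hypothetical.

```python
def max_r_squared(freq_a: float, freq_b: float) -> float:
    """Upper bound on r^2 between two biallelic SNPs given minor-allele frequencies.

    The bound is reached when every copy of the rarer allele sits on a
    haplotype that also carries the other minor allele.
    """
    p_rare, p_common = sorted((freq_a, freq_b))
    if not 0 < p_rare <= p_common <= 0.5:
        raise ValueError("expects minor-allele frequencies in (0, 0.5]")
    return p_rare * (1 - p_common) / (p_common * (1 - p_rare))

rare = 0.01  # hypothetical 1% disease variant
for tag in (0.30, 0.10, 0.05, 0.02, 0.01):
    print(f"tag SNP frequency {tag:.2f}: max r^2 = {max_r_squared(rare, tag):.3f}")
```

A 30% tag SNP can reach an r² of only about 0.02 with a 1% variant. A commonly used approximation holds that testing a tag SNP behaves like testing the causal variant with the sample size multiplied by r². Effective tagging therefore needs markers of similarly low frequency, and there are a great many of those.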
Ultimately, the answer lies in large-scale sequencing, which will provide a complete catalogue of every variant in the genomes of both patients and controls. The problem here lies not so much in the sequencing itself - the costs of sequencing are currently plummeting due to massive investment in rapid sequencing technologies - as in the interpretation. Whole new analytical techniques will be required to convert these data into useful information.
http://scienceblogs.com/geneticfuture/2008/09/why_do_genomewide_scans_fail.php (2 of 8) [27/09/2008 14:49:28]
Genetic Future : Why do genome-wide scans fail?
Population differences
The problem: Over the last 50 to 100 thousand years modern humans have enthusiastically colonised much of the world's landmass. Each wave of expansion has carried with it a fraction of the genetic variation of its ancestral population, along with a few novel variants acquired through mutation. In each new habitat encountered, natural selection has acted to increase the frequency of variants that provided an advantage, and cull those that were harmful, while the rest of the genome passively gained and lost genetic variation. The end result is a set of human populations that, while extremely similar across the genome as a whole, can carry quite different sets of genetic variants relevant to disease. In addition, the correlation between markers close together in the genome (known as linkage disequilibrium) can also differ between populations, so that a marker that is tightly correlated with a disease variant in one population may be only weakly associated in other groups.
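Linkage disequilibrium between two SNPs is usually summarised as r², computed from haplotype and allele frequencies. The sketch below uses hypothetical frequencies and gives both allele frequencies the same values in two made-up populations. It shows how a change in how often the marker and disease alleles travel together changes the correlation.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two SNPs from the AB haplotype frequency and the allele frequencies."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotype data: in population 1 the marker allele A almost
# always travels with the disease allele B; in population 2 it often does not.
pop1 = ld_r_squared(p_ab=0.19, p_a=0.20, p_b=0.20)
pop2 = ld_r_squared(p_ab=0.08, p_a=0.20, p_b=0.20)
print(f"population 1: r^2 = {pop1:.2f}")  # 0.88
print(f"population 2: r^2 = {pop2:.2f}")  # 0.06
```

The same marker would be an excellent proxy for the disease variant in the first population and nearly useless in the second.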
These differences have profound implications for disease gene mapping efforts. As a result of this variation, markers that are associated with disease in one population can never be assumed to show the same associations in other human groups (this will be especially true for rare variants, of course). Current GWAS have been dominated by subjects of Western European ancestry, and our understanding of genetic risk variants in non-European populations is almost non-existent. In addition, these differences mean that mixing people with different ancestries together in a disease cohort can seriously confound the identification of causative genes - in certain situations, such mixing can greatly increase the risk of false positive findings.
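The false-positive risk from mixing ancestries (population stratification) can be shown with expected allele counts. The sketch below uses two hypothetical subgroups. The allele has the same frequency in cases and controls within each subgroup, so there is no true association. However, the subgroups differ both in allele frequency and in the proportion of cases they contribute.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Subgroup:
    cases: int
    controls: int
    allele_freq: float  # identical in cases and controls: no true association

def allelic_odds_ratio(groups: list[Subgroup]) -> float:
    """Odds ratio from pooled expected allele counts (two alleles per person)."""
    case_risk = sum(2 * g.cases * g.allele_freq for g in groups)
    case_other = sum(2 * g.cases * (1 - g.allele_freq) for g in groups)
    ctrl_risk = sum(2 * g.controls * g.allele_freq for g in groups)
    ctrl_other = sum(2 * g.controls * (1 - g.allele_freq) for g in groups)
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)

# Hypothetical cohort: ancestry 1 is over-represented among cases.
ancestry_1 = Subgroup(cases=800, controls=200, allele_freq=0.6)
ancestry_2 = Subgroup(cases=200, controls=800, allele_freq=0.2)
print(f"within ancestry 1: OR = {allelic_odds_ratio([ancestry_1]):.2f}")  # 1.00
print(f"within ancestry 2: OR = {allelic_odds_ratio([ancestry_2]):.2f}")  # 1.00
print(f"pooled cohort:     OR = {allelic_odds_ratio([ancestry_1, ancestry_2]):.2f}")  # 2.79
```

Neither subgroup shows any association on its own, yet the pooled cohort shows an odds ratio of about 2.8. Allele frequency has simply tracked ancestry, and ancestry has tracked case status.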
The solution: For GWAS results to be universally applicable, the studies will need to be performed in cohorts from a wide range of populations. Data-sets such as the HapMap project, the Human Genome Diversity Panel and the powerful new 1000 Genomes Project will provide the information about patterns of genetic variation in diverse populations that is needed to design the assays for GWAS. A greater challenge will be collecting the large numbers of ancestry-homogeneous samples - both well-validated disease patients and healthy controls - required for GWAS approaches to be successful. This problem is likely to be particularly acute for African populations, where linkage disequilibrium is lower and genetic diversity much higher than in other regions (thus requiring larger numbers of markers and individuals to identify disease variants); and of course, in Africa and much of the rest of the world, local governments typically have much more pressing issues than genome scans to spend their limited health budgets on.
Epistatic interactions
The problem: Most current genetic approaches assume that genetic risk is additive - in other words, that the presence of two risk factors in an individual will increase risk by the sum of the increases each factor causes on its own. However, there's no reason to expect that this will always be
the case. Epistatic interactions, in which combined risk is greater (or less) than the sum of the
risk from individual genes, are difficult to identify with genome scans and even harder to
untangle. If epistasis is strong, then just a few genes - each with a weak effect by itself, well
below the threshold of a scan - could in concert explain a large chunk of genetic risk. Such a
situation would be largely invisible to current approaches.
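How epistasis can hide from a single-marker scan can be sketched with a deliberately extreme, hypothetical "pure interaction" model. Risk is at baseline for everyone except people who carry the risk genotype at both of two independently inherited loci. The carrier frequency (5%) and the tripled risk in double carriers are made-up numbers chosen for illustration.

```python
def marginal_relative_risk(joint_rr: float, partner_carrier_freq: float) -> float:
    """Relative risk seen at one locus when risk is raised only in double carriers.

    Hypothetical 'pure interaction' model: carriers at this locus have
    baseline risk (relative risk 1) unless they also carry the partner
    locus, in which case their relative risk is joint_rr. The two loci
    are assumed to be inherited independently.
    """
    q = partner_carrier_freq
    return q * joint_rr + (1 - q) * 1.0

# Hypothetical numbers: 5% of people carry the risk genotype at each locus,
# and only double carriers have three times the baseline risk.
joint = 3.0
marginal = marginal_relative_risk(joint, 0.05)
additive_prediction = 1 + 2 * (marginal - 1)
print(f"relative risk seen at each locus alone:  {marginal:.2f}")             # 1.10
print(f"additive prediction for double carriers: {additive_prediction:.2f}")  # 1.20
print(f"actual risk in double carriers:          {joint:.2f}")                # 3.00
```

Each locus looks like a weak, easily missed effect (relative risk 1.1). An additive model built from those weak effects badly underestimates the risk of people who carry both.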
The solution: Large sample sizes, and clever analytical techniques. I'm not going to attempt a
more detailed answer as this area is well outside my knowledge zone - but fortunately, it's an
active area of research (see, for instance, the Epistasis Blog). I'd welcome any comments from
people who know more about epistasis than I do about the likely scope of this problem and
the methods that will be used to resolve it.
However, our understanding of these copy number variants (CNVs) is still in its infancy. The chips currently used in GWAS, which interrogate single base-pair differences between individuals known as single nucleotide polymorphisms (SNPs), can detect a small proportion of CNVs indirectly (by looking for distortions of signal intensity or inheritance patterns), and may effectively "tag" a fraction of the remainder (by using SNPs that are very close to the CNV, and therefore tend to be inherited along with it). Nonetheless, the vast majority of copy number variation remains invisible to current GWAS technology.
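The "distortions of signal intensity" approach can be sketched with a log ratio of observed to expected probe intensity. The log2 ratio sits near 0 for two copies and drops toward -1 for a single-copy deletion. Everything in this example is an illustrative assumption: the intensities are made up, and so are the thresholds and the minimum run length. Real CNV callers use considerably more sophisticated statistical models, such as hidden Markov models.

```python
from __future__ import annotations

from math import log2

def log_r_ratio(observed: list[float], expected: list[float]) -> list[float]:
    """log2(observed / expected) intensity per probe; about 0 for two copies."""
    return [log2(o / e) for o, e in zip(observed, expected)]

def call_segments(lrr: list[float], loss: float = -0.3, gain: float = 0.2,
                  min_probes: int = 3) -> list[tuple[str, int, int]]:
    """Report runs of >= min_probes consecutive probes beyond a threshold.

    Returns (state, first_index, last_index) tuples. The thresholds are
    illustrative assumptions, not values used by any particular chip.
    """
    states = ["loss" if x < loss else "gain" if x > gain else "normal" for x in lrr]
    calls = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run at the end of the list or when the state changes.
        if i == len(states) or states[i] != states[start]:
            if states[start] != "normal" and i - start >= min_probes:
                calls.append((states[start], start, i - 1))
            start = i
    return calls

# Made-up intensities: probes 3-6 show about half the expected signal.
expected = [1000.0] * 10
observed = [1000, 980, 1010, 510, 495, 500, 505, 990, 1500, 1020]
print(call_segments(log_r_ratio(observed, expected)))  # [('loss', 3, 6)]
```

The single high probe at position 8 is ignored because it does not form a run. Requiring several consecutive probes trades sensitivity to small CNVs for fewer calls driven by noise.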
The solution: High-resolution tiling arrays - chips containing millions of probes, each of which
binds to a small region of the genome - can be used to explore CNVs in some areas of the
genome, but they break down for the large fraction of the genome containing repetitive
elements. Ultimately, the complete detection of CNVs from patients and controls will require
whole-genome sequencing, preferably using methods with much longer read lengths than the
current crop of rapid sequencing technologies.
Epigenetic inheritance
The problem: Not all inherited information is carried in the DNA sequence of the genome; a child also receives "epigenetic" information from its parents in the form of chemical modifications of DNA that can alter the expression of genes - and thus physical traits - without changing the sequence. Although epigenetic inheritance is known to occur, the degree to which it influences human physical variation and disease risk is essentially unknown. All existing technologies used in GWAS are based on DNA sequence, and thus don't detect epigenetic variation; such variation is invisible even to whole-genome sequencing.
Disease heterogeneity
The problem: Some "diseases" are actually simply collections of symptoms, which may stem
from multiple, distinct genetic causes. Lumping patients with fundamentally different
conditions into a single patient cohort for a GWAS is a recipe for failure: even if there are
strong genetic risk factors for each one of the separate conditions, each of these will be
drowned out by the noise from the other, unrelated diseases. The problem is that for some
diseases - particularly mental illnesses, where causation lurks deep within the complex and
poorly-understood human brain - the knowledge and tools required to separate patients into
distinct sub-categories simply may not exist yet.
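The dilution effect of lumping distinct conditions together can be put into numbers. Consider a hypothetical variant that raises risk only in one clinical subtype. Cases from every other subtype carry it at the control frequency, so the case group's allele frequency becomes a weighted average. The control frequency (20%), subtype odds ratio (2) and subtype share of cases (30%) are illustrative assumptions.

```python
def diluted_odds_ratio(control_freq: float, subtype_or: float,
                       subtype_fraction: float) -> float:
    """Allelic odds ratio seen in a mixed case group.

    Hypothetical model: the variant has odds ratio subtype_or only in one
    subtype, which makes up subtype_fraction of all cases; in the remaining
    cases its frequency equals the control frequency.
    """
    p0 = control_freq
    subtype_odds = subtype_or * p0 / (1 - p0)
    p1 = subtype_odds / (1 + subtype_odds)  # allele frequency in the true subtype
    mixed = subtype_fraction * p1 + (1 - subtype_fraction) * p0
    return (mixed / (1 - mixed)) / (p0 / (1 - p0))

for fraction in (1.0, 0.5, 0.3, 0.1):
    print(f"subtype = {fraction:.0%} of cases: "
          f"observed OR = {diluted_odds_ratio(0.2, 2.0, fraction):.2f}")
```

An odds ratio of 2 within the true subtype shrinks to roughly 1.26 when that subtype makes up only 30% of the cases. That is the kind of effect size that sinks below a genome-wide threshold.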
The solution: The geneticists can't fix this one - it will take a combined effort from clinicians
and medical researchers to break down complex diseases into useful diagnostic categories,
which can then each be subjected to separate genetic analysis. In the cancer arena,
conditions previously lumped together as one entity have now been separated using new
technologies such as gene expression arrays; similar approaches will no doubt prove fruitful in
a range of other diseases, although the inaccessibility of brain tissue will make it more
difficult to apply such approaches to mental illness.
The application of cheap, rapid sequencing technology is likely to generate a harvest of new
disease genes that far exceeds the yield of current GWAS, by providing simultaneous access to
both the rare variants and copy number variations that are inaccessible to current chip-based
approaches. However, building a more complete catalogue of the heritable variants that drive
common disease risk will require more than just cheap sequencing: it will also take advances
in clinical diagnostics to better sub-categorise patients into homogeneous groups, as well as
new and powerful analytical approaches to cope with the torrent of sequence data, and to
efficiently identify epistatic interactions between disease variants. To have any chance of picking out variants of small effect from whole-genome sequencing data, sample sizes will have to be enormous; massive cohorts now being assembled, such as the 500,000-person UK Biobank and a similar NIH-funded study currently in the works, will provide essential raw material for the selection of participants. Naturally, to be applicable to
humanity as a whole, cohorts will need to be gathered separately from many different
human populations.
Finally, epigenetic variation remains a wild card of uncertain significance, which will need to
be tackled with a different set of high-throughput technologies (although it's likely that many
of these will feed on advances in high-throughput sequencing).
Although I probably sound pretty negative about GWAS, I want to emphasise that the current
problems are the result of technological limitations that will soon disappear. Barring global
catastrophe, within the lifetimes of most of those reading this post we will have a near-
complete catalogue of the genetic variants influencing the risk of most of the common
diseases that plague the industrialised world (and, hopefully, many of those that plague the
rest of humanity). Together with parallel advances in medical science, this catalogue will
provide an unprecedented ability to predict, treat and potentially completely eliminate a host
of common diseases. It will also bring social and ethical challenges of unprecedented
magnitude - but that's a topic for another post...
Comments
I'm not searching through your old archives right now, but it would be nice to read a post on "where genome-wide scans succeed." Given the current state of the technology and analyses, are there commonalities to the successes? Are there particular types of conditions where this approach is more likely to succeed? There would be some overlap with this text, but it would be a nice parallel.
Posted by: bsci | September 15, 2008 12:03 PM
Your study confirms the sad consequence of neglecting the structure and the function of the
epigenetic control system of the organism. Indeed, the subject requires focusing on the
physical carrier of this control system and nobody wants to challenge the current physical
paradigm.
Nicholas Wade interviews David Goldstein of Duke University on related subjects in today's NYT. Link via Razib at GNXP, who adds useful context.
Posted by: AMac | September 16, 2008 10:18 AM
Copyright ©2005-2008 ScienceBlogs LLC