Why Do Genome Wide Scans Fail

Genetic Future : Why do genome-wide scans fail?

Category: genome-wide association studies
Posted on: September 15, 2008 9:02 AM, by Daniel MacArthur

Reposted from the old Genetic Future domain.

The successes of genome-wide association studies (GWAS) in identifying genetic risk factors
for common diseases have been heavily publicised in the mainstream media - barely a week
goes by these days that we don't hear about another genome scan that has identified new risk
genes for diabetes, lupus, cardiac disease, or any of the other common ailments of Western
civilisation.

Some of this publicity is well-founded: for the first time in human history, we have the power
to identify the precise genetic differences between human beings that contribute to variation
in disease susceptibility. If we can document all of the factors, both genetic and
environmental, that result in common disease we will be able to target early interventions to
the individuals who are most susceptible. Every GWAS success brings us closer to the
long-awaited era of personalised medicine.

But while the media trumpet the successes of genome scans, little attention is paid to their
failures. The fact remains that despite the hundreds of millions of dollars spent on
genome-wide association studies, most of the genetic variance in risk for most common diseases
remains undiscovered. Indeed, some common diseases with a strong heritable component,
such as bipolar disease, have remained almost completely resistant to GWAS.

Where is this heritable risk hiding? It now seems likely that it's lurking in a number of different
places, with the fraction of the risk in each category varying from disease to disease. This
post serves as a generic list of the dark regions of the genome currently inaccessible to GWAS,
with some discussion of the techniques that will likely prove useful in mapping risk variants in
these areas.

Alleles with small effect sizes
The problem: The ability to simultaneously examine hundreds of thousands of variants
throughout the genome is both the strength and the weakness of the GWAS approach. The
power of GWAS is that they provide a relatively unbiased examination of the entire genome
for common risk variants; their weakness is that in doing so, they swamp the signal from true
risk variants with statistical noise from the vast numbers of markers that aren't associated
with disease. To separate true signals from noise, researchers have to set an exceptionally
high threshold that a marker needs to exceed before it is accepted as a likely disease-causing
candidate. That reduces the problem of false positives, but it also means that any true
disease markers with small effects are lost in the background noise.
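
To put rough numbers on that threshold, here is a minimal Python sketch. It assumes a simple
Bonferroni-style correction, which is one common way of setting a genome-wide cutoff but is an
illustrative choice rather than something this post specifies. The chip size of 500,000 markers
and the function names are hypothetical.

```python
def expected_false_positives(n_null_markers: int, alpha: float) -> float:
    """Expected number of unassociated markers passing a per-marker p-value cutoff."""
    return n_null_markers * alpha


def bonferroni_threshold(n_markers: int, family_alpha: float = 0.05) -> float:
    """Per-marker cutoff that keeps the family-wise error rate near family_alpha."""
    return family_alpha / n_markers


n_markers = 500_000  # assumed chip size, for illustration only

naive_hits = expected_false_positives(n_markers, alpha=0.05)
strict_cutoff = bonferroni_threshold(n_markers)

print(f"Expected false positives at p < 0.05: {naive_hits:,.0f}")
print(f"Bonferroni per-marker cutoff: {strict_cutoff:.1e}")
```

Under these assumptions a naive cutoff of 0.05 would let through roughly 25,000 spurious
markers, while the corrected cutoff sits around 1e-7 - a bar that a variant with a small effect
will rarely clear in a modest cohort.
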
The solution: This seems to be one problem that will need to be solved, at least to some
extent, with sheer brute force. By increasing the numbers of samples in their disease and
control groups researchers will steadily dial down the statistical noise from non-associated
markers until even disease genes with small effects stand out above the crowd. As the cost of
genotyping (and sequencing) tumbles ever downward such an approach will become more and
more feasible; however, the logistical challenge of collecting large numbers of carefully-
ascertained patients will always be a serious obstacle.
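
How much brute force is needed can be sketched with the textbook sample-size approximation for
comparing two proportions (here, risk-allele frequencies in cases and controls). This is a rough
sketch under stated assumptions, not the power calculation any particular study used: the
allele frequencies, the 1e-7 significance level and the 80% power target are all hypothetical.

```python
from statistics import NormalDist


def alleles_needed_per_group(freq_controls: float, freq_cases: float,
                             alpha: float = 1e-7, power: float = 0.8) -> float:
    """Approximate alleles per group for a two-sided two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = z.inv_cdf(power)           # quantile for the desired power
    mean_freq = (freq_controls + freq_cases) / 2
    null_sd = (2 * mean_freq * (1 - mean_freq)) ** 0.5
    alt_sd = (freq_controls * (1 - freq_controls)
              + freq_cases * (1 - freq_cases)) ** 0.5
    diff = abs(freq_cases - freq_controls)
    return ((z_alpha * null_sd + z_power * alt_sd) / diff) ** 2


# Hypothetical risk-allele frequencies: 0.30 in controls, slightly higher in cases.
for freq_cases in (0.35, 0.32, 0.31):
    n = alleles_needed_per_group(0.30, freq_cases)
    print(f"cases at {freq_cases:.2f}: ~{n:,.0f} alleles per group")
```

Because the required sample scales with one over the squared frequency difference, halving the
difference between cases and controls roughly quadruples the number of samples needed.
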
Rare variants
The problem: Current genome scan technology relies heavily on the "common disease,
common variant" (CDCV) assumption, which states that the genetic risk for common disease is
mostly attributable to a relatively small number of common genetic variants. This is largely
an assumption of convenience: firstly, our catalogue of human genetic variation (built up by
efforts such as the HapMap project) is largely restricted to common variants, since rare
variants are much harder to identify; and secondly, chip-makers have restrictions on how
many different SNPs they can analyse on a single chip, so the natural tendency has been to
cram in the high-frequency variants that capture the largest proportion of genetic variation
per probe. There is also some theoretical justification for this assumption based on models of
human demographic history, but these models are themselves based on numerous
assumptions, and the argument may not apply equally to all common human diseases.
In any case, everyone agrees that some non-trivial fraction of the genetic risk of common
diseases will be the result of rare variants, and the latest results from GWAS in a variety of
diseases have failed to provide unambiguous support for the CDCV hypothesis. Whatever the
proportion of variance that turns out to be explained by rare variants, current GWAS
technologies are essentially powerless to unravel it.
The solution: Increasing sample sizes may help a little, but the fundamental problem is the
inability of current chips to tag rare variation. Short-term, the solution will be higher-density
SNP chips incorporating lower frequency variants identified by large-scale sequencing projects
like the 1000 Genomes Project. However, such approaches will have diminishing returns: as
chip-makers lower the frequency of the variants on their chips, the number of probes that will
have to be added to capture a reasonable fraction of total genetic variation will increase
exponentially, with each new probe adding only a minute increase in power.
Ultimately, the answer lies in large-scale sequencing, which will provide a complete catalogue
of every variant in the genomes of both patients and controls. The problem here is not so
much the sequencing itself - the costs of sequencing are currently plummeting due to massive
investment in rapid sequencing technologies - but in the interpretation. Whole new analytical
techniques will be required to convert these data into useful information.
Population differences
The problem: Over the last 50 to 100 thousand years modern humans have enthusiastically
colonised much of the world's landmass. Each wave of expansion has carried with it a fraction
of the genetic variation of its ancestral population, along with a few novel variants acquired
through mutation. In each new habitat encountered, natural selection has acted to increase
the frequency of variants that provided an advantage, and cull those that were harmful, while
the rest of the genome passively gained and lost genetic variation. The end result is a set of
human populations that, while extremely similar across the genome as a whole, can carry
quite different sets of genetic variants relevant to disease. In addition, the correlation
between markers close together in the genome (known as linkage disequilibrium) can also
differ between populations, so that a marker that is tightly correlated with a disease variant
in one population may be only weakly associated in other groups.
These differences have profound implications for disease gene mapping efforts. As a result of
this variation, markers that are associated with disease in one population can never be
assumed to show the same associations in other human groups (this will be especially true for
rare variants, of course). Current GWAS have been dominated by subjects of Western
European ancestry, and our understanding of genetic risk variants in non-European populations
is almost non-existent. In addition, these differences mean that mixing people with different
ancestries together in a disease cohort can seriously confound the identification of causative
genes - in certain situations, such mixing can greatly increase the risk of false positive findings.
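
The false positive risk from mixed ancestry can be illustrated with a small Python sketch. It
builds expected case/control counts for two hypothetical populations in which a marker has no
effect on disease at all, then pools them. The carrier frequencies, disease rates, cohort sizes
and function names are all invented for illustration.

```python
def carrier_odds_ratio(case_carriers, case_non, control_carriers, control_non):
    """Odds ratio for carrying the marker, cases versus controls."""
    return (case_carriers * control_non) / (case_non * control_carriers)


def expected_counts(n_people, carrier_freq, disease_rate):
    """Expected 2x2 counts when marker and disease are independent."""
    cases = n_people * disease_rate
    controls = n_people - cases
    return (cases * carrier_freq, cases * (1 - carrier_freq),
            controls * carrier_freq, controls * (1 - carrier_freq))


# Two hypothetical populations; within each, the marker has no effect on disease.
pop_a = expected_counts(10_000, carrier_freq=0.8, disease_rate=0.10)
pop_b = expected_counts(10_000, carrier_freq=0.2, disease_rate=0.02)
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

for name, counts in (("A", pop_a), ("B", pop_b), ("pooled", pooled)):
    print(f"{name}: odds ratio = {carrier_odds_ratio(*counts):.2f}")
```

Within each population the odds ratio is 1, as it should be for an unassociated marker, but in
the pooled cohort it comes out at about 2.5: the marker simply tracks membership of the
population with the higher disease rate.
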
The solution: For GWAS results to be universally applicable, they will need to be performed in
cohorts from a wide range of populations. Data-sets such as the HapMap project, the Human
Genome Diversity Panel and the powerful new 1000 Genomes Project will provide information
about the patterns of genetic variation in diverse populations that is needed to design the
assays for GWAS. A greater challenge will be collecting the large numbers of ancestry-
homogeneous samples - both well-validated disease patients and healthy controls - required
for GWAS approaches to be successful. This problem is likely to be particularly acute for
African populations, where linkage disequilibrium is lower and genetic diversity much higher
than in other regions (thus requiring larger numbers of markers and individuals to identify
disease variants); and of course, in Africa and much of the rest of the world, local
governments typically have much more pressing issues than genome scans to spend their
limited health budgets on.
Epistatic interactions
The problem: Most current genetic approaches assume that genetic risk is additive - in other
words, that the presence of two risk factors in an individual will increase risk by the sum of
the two factors by themselves. However, there's no reason to expect that this will always be
the case. Epistatic interactions, in which combined risk is greater (or less) than the sum of the
risk from individual genes, are difficult to identify with genome scans and even harder to
untangle. If epistasis is strong, then just a few genes - each with a weak effect by itself, well
below the threshold of a scan - could in concert explain a large chunk of genetic risk. Such a
situation would be largely invisible to current approaches.
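
A toy two-locus example makes the contrast with the additive assumption concrete. The risk
table, the carrier frequency and the function names in this Python sketch are hypothetical, and
the two loci are assumed to be inherited independently.

```python
# Hypothetical disease risk by carrier status at two loci (A, B).
RISK = {
    (0, 0): 0.05,  # carries neither variant
    (1, 0): 0.06,  # carries A only
    (0, 1): 0.06,  # carries B only
    (1, 1): 0.20,  # carries both
}
CARRIER_FREQ = 0.1  # assumed carrier frequency at each locus


def additive_expectation(risk):
    """Risk for double carriers if the two single-locus effects simply added."""
    base = risk[(0, 0)]
    return base + (risk[(1, 0)] - base) + (risk[(0, 1)] - base)


def marginal_risks_at_a(risk, freq_b):
    """Risk for A carriers and non-carriers, averaged over genotypes at B."""
    carriers = (1 - freq_b) * risk[(1, 0)] + freq_b * risk[(1, 1)]
    non_carriers = (1 - freq_b) * risk[(0, 0)] + freq_b * risk[(0, 1)]
    return carriers, non_carriers


carriers, non_carriers = marginal_risks_at_a(RISK, CARRIER_FREQ)
print(f"Additive expectation for double carriers: {additive_expectation(RISK):.3f}")
print(f"Actual risk for double carriers:          {RISK[(1, 1)]:.3f}")
print(f"Single-locus view of A: {carriers:.3f} vs {non_carriers:.3f}")
```

A scan that tests one marker at a time only sees the last line, a modest difference in risk,
while the double carriers sit far above what the additive model predicts.
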
The solution: Large sample sizes, and clever analytical techniques. I'm not going to attempt a
more detailed answer as this area is well outside my knowledge zone - but fortunately, it's an
active area of research (see, for instance, the Epistasis Blog). I'd welcome any comments from
people who know more about epistasis than I do about the likely scope of this problem and
the methods that will be used to resolve it.
Copy number variation

The problem: One of the great surprises of the last five years has been the discovery of
widespread, large-scale insertions and deletions of DNA, known as copy number variations
(CNVs), in even healthy genomes. CNVs are now known to account for a substantial fraction of
human genetic variation, and have been shown to play a role in variation in human gene
expression and in human evolution. It seems highly likely that CNVs will be responsible for a
non-trivial proportion of common disease risk.
However, our understanding of these variants is still in its infancy. The chips currently used in
GWAS, which interrogate single base-pair variations between individuals known as SNPs, can
be used to detect a small proportion of CNVs indirectly (by looking for distortions of signal
intensity or inheritance patterns), and may effectively "tag" a fraction of the remainder (by
using SNPs that are very close to the CNV, and therefore tend to be inherited along with it).
However, the vast majority of copy number variation remains invisible to current GWAS
technology.
The solution: High-resolution tiling arrays - chips containing millions of probes, each of which
binds to a small region of the genome - can be used to explore CNVs in some areas of the
genome, but they break down for the large fraction of the genome containing repetitive
elements. Ultimately, the complete detection of CNVs from patients and controls will require
whole-genome sequencing, preferably using methods with much longer read lengths than the
current crop of rapid sequencing technologies.
Epigenetic inheritance
The problem: Not all inherited information is carried in the DNA sequence of the genome; a
child also receives "epigenetic" information from its parents in the form of chemical
modifications of DNA that can alter the expression of genes - and thus physical traits - without
changing the sequence. Although epigenetic inheritance is known to occur, the degree to
which it influences human physical variation and disease risk is essentially totally unknown.
All existing technologies used in GWAS are based on DNA sequence, and thus don't detect
epigenetic variation. It is even invisible to full-genome sequencing.
The solution: It first needs to be established that epigenetically inherited variations do
actually contribute a non-trivial fraction of human disease risk. If so, techniques currently
being developed to identify these variants in a high-throughput fashion could be used to
perform EWAS (epigenome-wide association studies).
Disease heterogeneity
The problem: Some "diseases" are actually simply collections of symptoms, which may stem
from multiple, distinct genetic causes. Lumping patients with fundamentally different
conditions into a single patient cohort for a GWAS is a recipe for failure: even if there are
strong genetic risk factors for each one of the separate conditions, each of these will be
drowned out by the noise from the other, unrelated diseases. The problem is that for some
diseases - particularly mental illnesses, where causation lurks deep within the complex and
poorly-understood human brain - the knowledge and tools required to separate patients into
distinct sub-categories simply may not exist yet.
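
The dilution effect can be sketched in a few lines of Python. The example assumes that a risk
allele is enriched only in one subtype of a "disease", and that cases with other subtypes carry
it at the same frequency as controls. The allele frequencies and function names are
hypothetical.

```python
def odds(p):
    """Convert a frequency into odds."""
    return p / (1 - p)


def mixed_cohort_odds_ratio(subtype_fraction, freq_in_subtype, freq_background):
    """Allele odds ratio when only a fraction of cases have the linked subtype.

    Cases with other subtypes are assumed to carry the allele at the
    background (control) frequency.
    """
    freq_in_cases = (subtype_fraction * freq_in_subtype
                     + (1 - subtype_fraction) * freq_background)
    return odds(freq_in_cases) / odds(freq_background)


# Hypothetical allele frequencies: 0.40 in the linked subtype, 0.20 in controls.
for fraction in (1.0, 0.5, 0.2):
    ratio = mixed_cohort_odds_ratio(fraction, 0.40, 0.20)
    print(f"{fraction:.0%} of cases in subtype: odds ratio = {ratio:.2f}")
```

As the linked subtype makes up a smaller share of the case cohort, the apparent odds ratio
shrinks towards 1, which pushes a strong subtype-specific signal into small-effect territory.
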
The solution: The geneticists can't fix this one - it will take a combined effort from clinicians
and medical researchers to break down complex diseases into useful diagnostic categories,
which can then each be subjected to separate genetic analysis. In the cancer arena,
conditions previously lumped together as one entity have now been separated using new
technologies such as gene expression arrays; similar approaches will no doubt prove fruitful in
a range of other diseases, although the inaccessibility of brain tissue will make it more
difficult to apply such approaches to mental illness.
The future of genetic association studies

Current chip-based technologies for genome-wide analysis, while having some success in
identifying the lowest-hanging genetic fruit for many common diseases, seem to have already
started to run up against barriers that are unlikely to be overcome by simply increasing
sample sizes. These technologies should really be regarded as little more than a place-
holder for whole-genome sequencing, which should become affordable enough to use for
large-scale association studies within 3-5 years.
The application of cheap, rapid sequencing technology is likely to generate a harvest of new
disease genes that far exceeds the yield of current GWAS, by providing simultaneous access to
both the rare variants and copy number variations that are inaccessible to current chip-based
approaches. However, building a more complete catalogue of the heritable variants that drive
common disease risk will require more than just cheap sequencing: it will also take advances
in clinical diagnostics to better sub-categorise patients into homogeneous groups, as well as
new and powerful analytical approaches to cope with the torrent of sequence data, and to
efficiently identify epistatic interactions between disease variants. To have any chance of
picking out variants of small effect from whole-genome sequencing data, sample sizes will
have to be enormous - massive cohorts currently being assembled, such as the 500,000-
person UK Biobank and a similar NIH-funded study currently in the works, will provide
essential raw material for the selection of participants. Naturally, to be applicable to
humanity as a whole, cohorts will need to be gathered separately from many different
human populations.
Finally, epigenetic variation remains a wild-card of uncertain significance, which will need to
be tackled with a different set of high-throughput technologies (although it's likely that many
of these will feed on advances in high-throughput sequencing).
Although I probably sound pretty negative about GWAS, I want to emphasise that the current
problems are the result of technological limitations that will soon disappear. Barring global
catastrophe, within the lifetimes of most of those reading this post we will have a near-
complete catalogue of the genetic variants influencing the risk of most of the common
diseases that plague the industrialised world (and, hopefully, many of those that plague the
rest of humanity). Together with parallel advances in medical science, this catalogue will
provide an unprecedented ability to predict, treat and potentially completely eliminate a host
of common diseases. It will also bring social and ethical challenges of unprecedented
magnitude - but that's a topic for another post...
Comments
I'm not searching through your old archives right now, but it would be nice to read a post on
"where genome-wide scans succeed." Given the current state of the technology and analyses,
are there commonalities to the successes? Are there particular types of conditions where this
approach is more likely to succeed? There would be some overlap with this text, but it would
be a nice parallel.
Posted by: bsci | September 15, 2008 12:03 PM
Your study confirms the sad consequence of neglecting the structure and the function of the
epigenetic control system of the organism. Indeed, the subject requires focusing on the
physical carrier of this control system and nobody wants to challenge the current physical
paradigm.
The subject is discussed in our book presented at www.misaha.com

Best regards,
Savely Savva
Posted by: Savely Savva | September 15, 2008 2:59 PM
bsci - I'm currently working on precisely that post. :-)

Posted by: Daniel MacArthur | September 15, 2008 8:07 PM
Add dyslexia to the list. Too many variables.

Posted by: gillt | September 15, 2008 11:22 PM
Nicholas Wade interviews David Goldstein of Duke University on related subjects in today's
NYT. Link via Razib at GNXP, who adds useful context.
Posted by: AMac | September 16, 2008 10:18 AM
Copyright ©2005-2008 ScienceBlogs LLC
http://scienceblogs.com/geneticfuture/2008/09/why_do_genomewide_scans_fail.php