We need to create a market for genetic-association data

In the past I have argued that we shouldn’t be too paternalistic about genetic information.  I reasoned that genetic information is getting very inexpensive and will soon be available as a commodity to many people.  Interposing a mandatory healthcare worker between genetic information and the consumer therefore seems cumbersome and unlikely to be effective.  Some have pointed out that many people will need assistance interpreting their results, and this is a reasonable concern.  I think we could handle it in other ways.

For example, suppose that for every genetic association study, journal editors got together and required the authors to write a lay-oriented summary of the study: what the potential health (or other) significance of each allele is, and how it might be combined with other genetic or environmental information.  These summaries could include statistics for risk and other quantitative information, all of which could be part of the peer review of the article.  In this way, a readable summary of each genetic association and its significance would be created and available at the time of publication.  Most importantly, we would ask that these summaries be written using standard templates so that they all had a similar format (perhaps represented in XML or whatever), and we would ask that the journals relinquish claim to copyright (as they do for abstracts) so that these summaries could be reused freely by others.
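To make the template idea concrete, here is a minimal sketch of what one machine-readable summary could look like, emitted as XML from Python’s standard library.  The field names, tags, variant, and numbers are all illustrative assumptions, not a proposed standard; the actual schema would be set by the journals.

```python
import xml.etree.ElementTree as ET

# Hypothetical template fields for a lay-oriented association summary.
# The variant/gene/trait and the statistics are illustrative only.
summary = {
    "variant": "rs429358",
    "gene": "APOE",
    "trait": "late-onset Alzheimer's disease",
    "lay_summary": ("Carriers of this variant have a higher lifetime risk; "
                    "risk also depends on age, family history, and other variants."),
    "odds_ratio": "3.2",
}

# Build a flat XML document, one element per template field.
root = ET.Element("association_summary")
for field, value in summary.items():
    ET.SubElement(root, field).text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because every summary would share the same tags, a third-party aggregator could parse thousands of them uniformly rather than scraping free text.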

What’s the benefit of this?  First, basic information about how to interpret it would be available with every genetic association study.  This would be useful for physicians and patients, and could help the public interpret their commodity genetic data.  Second, by relinquishing claims to copyright, the journals would allow the information to be widely disseminated.  Third-party information aggregators could then create improved products to help consumers understand their genetic information.  There could be a public “free” version, but companies could compete by providing high-quality annotation and other amenities, creating a competitive market for the presentation of genetic data.  Some could even have telephone hotlines with real people at the other end, explaining the genetics.  Consumers would have a choice about how much help they want in understanding their genome.  Finally, if these summaries were sufficiently structured and used appropriate controlled terminologies, bioinformatics folks could write pretty cool algorithms to annotate the genome, aggregate related genetic variations, and otherwise create value through the power of bioinformatics.

If scientists take issue with some of these summaries (despite peer review), we can have a process for rebuttal or refutation.  Rebuttals could also use standard templates, be in the public domain, and be linked to the initial summary.  They would provide alternative views, and would also need to be peer reviewed.

Basically, I am proposing that we handle the issue of “public education” by creating inexpensive and ubiquitous summaries of the health implications of genetic associations, peer reviewing them, disseminating them widely and freely, and allowing a market to emerge that helps individuals interpret their genome.  One could argue that scientists have a duty to create such summaries all the time, but we often fail to do so.  I think for genetic research it is mandatory that we scientists take an active role in interpreting the significance of our findings for the general public.


  1. As a bioinformatician, I am also dreaming of semantic abstracts (http://tinyurl.com/5f9p9k). The RDFa specification could help here, but I don’t think the NCBI is ready to make such changes. I also work with association studies, and here I dream of an RDF/XML format for sharing the information about the individuals, the markers, and the genotypes, instead of those ugly ‘linkage’ files filled with a bunch of ‘0’, ‘1’, and ‘2’ codes.
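The contrast the commenter draws can be sketched in a few lines of Python: decoding a terse linkage-style row (one individual, two allele columns per marker, ‘0’ meaning missing) into explicit, self-describing genotype records. The marker ids and the record layout here are hypothetical, not an existing format.

```python
# Hypothetical marker ids; a linkage-style row gives two allele
# columns per marker, with '0' denoting a missing allele.
markers = ["rs123", "rs456"]

def decode_row(row):
    """Turn 'IND001 1 2 0 0' into explicit per-marker records."""
    fields = row.split()
    individual, codes = fields[0], fields[1:]
    records = []
    for i, marker in enumerate(markers):
        a1, a2 = codes[2 * i], codes[2 * i + 1]
        records.append({
            "individual": individual,
            "marker": marker,
            # A '0' in either column means the genotype is missing.
            "genotype": None if "0" in (a1, a2) else (a1, a2),
        })
    return records

records = decode_row("IND001 1 2 0 0")
print(records)
```

The explicit records (or their RDF equivalent) carry their own meaning, which is exactly what the bare 0/1/2 columns do not.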

  2. It might also make sense to call for a “commons” for such data (in the sense that Lawrence Lessig uses the term). Right now, companies such as 23andMe, Navigenics, Smart Genetics, and the tens of genetic-testing wannabes are each “curating” the same associations and interpreting them in widely different ways. (For example, the Alzheimer’s Mirror test by Smart Genetics just had to be pulled from the market because of unreliable association data.)

    It is a shame that no information models exist for communicating such associations and everyone has their own ad-hoc interpretation mechanisms.

    Having a “commons” where the authors write the interpretation in a syndicatable format solves both these issues. Then everyone is free to dip into the commons and offer their analysis or interpretive services on it. Those that don’t need help interpreting don’t need to pay for it.

  3. Bravo, Doctor Altman: I am working with a community hospital to develop a clearinghouse of genetic tests considered “ready for prime time” for our medical staff and patients to access, with assistance from genetic counselors. Everyone overlooks hospitals as a source for disseminating such information, yet every hospital in the country “aggregates” hundreds of physicians and serves as a source of health information for its community. Perhaps your idea would be more readily adopted if one of the big hospital associations made the dissemination of such summaries available to its members. We would surely subscribe!

  4. I would take a completely different viewpoint: sequence information so far has been relatively clinically useless and, to some extent, disappointing (look at the published odds ratios; they are tiny). Maybe the magic of the genome is around the corner, but for now this sounds like an expensive way to provide jobs for bioinformaticians.

    The information is becoming a commodity, but so is the whole-body CT scan, and we don’t hear many proponents for that these days.

  5. An author’s lay-oriented summary would be a step forward (since it’s almost never done now) and could be very helpful. But it would only reflect the state of affairs at that point in time, and would shortly become obsolete and/or misleading given how quickly things change. To be more useful over time, these summaries would either have to be periodically updated in light of new data, or be somehow dynamic and/or self-maintaining (something for which I can’t quite envision a practical mechanism, semantic/syndicated or otherwise).

    That’s kind of the problem now with using any genetic-association collection (SNPedia, NIH’s geneticassociationdb, etc.): they’re only as useful as their most recent data and how well it’s integrated with the larger body of genetic-association evidence. Separating the good/accurate from the bad/inaccurate information may not be easy, nor is understanding the implications of that information.

    Even current, accurate genetic-association data may have limited usefulness given how little predictive value they usually seem to have (as jor’s comment alludes to), with few exceptions where effects are strong and straightforward, at least until lots more significantly-associated polymorphisms are discovered for a given disease or trait. Even then their effects would probably have to be considered in aggregate / cumulatively rather than individually. Interesting that NIGMS just funded InSilicos to work on software for roughly this sort of thing using a statistical technique to “produce stable predictions from data with a large number of dimensions … as a way of using these signals to predict disease.”
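The “considered in aggregate” point can be illustrated with a common (and admittedly simplistic) assumption: that independent risk loci combine multiplicatively on the odds scale, i.e. their log-odds add. The per-allele odds ratios below are made-up numbers chosen to be small, like the published ones the earlier comment complains about.

```python
import math

# Illustrative per-allele odds ratios for three hypothetical risk
# variants; under the naive independence assumption, log-odds add,
# so the combined odds ratio is the product of the individual ones.
odds_ratios = [1.15, 1.25, 1.10]

combined_or = math.exp(sum(math.log(o) for o in odds_ratios))
print(round(combined_or, 3))  # 1.581: tiny effects compound into a modest one
```

Even this toy calculation shows why single-variant summaries have limited predictive value on their own, and why aggregation (with far better models than this one) is where the interpretive work lies.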
