We need the Environment-500K

I often talk about how phenotype (phi) is a function of genotype (g):

phi ~ f(g)

With genome-wide association studies, we are trying to estimate f. Unfortunately, many of these studies produce a pretty poor estimate of f, judged by how well the computed phenotype matches the observed phenotype. The reasons include insufficient statistics, poor sampling of the relevant genotypes (g), and a potentially highly nonlinear f that is hard to approximate with simple functional forms. But I think these may be secondary. I think the primary reason is that phenotype is really a function of both genotype and the environment, e:

phi = f(g, e)

For example, whatever a person's genetic predisposition to smoking-related lung cancer, it is pretty irrelevant if that person does not smoke!
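To make the lung cancer point concrete, here is a toy simulation. Everything in it is a hypothetical assumption for illustration only: phenotype is a continuous score, g is a risk-allele count (0, 1 or 2), e is a smoker flag, and the genetic effect shows up only in smokers (a pure gene-by-environment interaction). Because g and e are discrete, the best possible f for a given set of inputs is simply the mean phenotype within each group, so the variance explained by those group means is an upper bound on how well any in-sample f of those inputs can do.

```python
import random
from collections import defaultdict


def r_squared_by_groups(phenotypes, keys):
    """Fraction of phenotype variance explained by per-group means.

    For discrete inputs the group mean is the best possible predictor, so this
    is an in-sample upper bound for any f built on the same inputs.
    """
    n = len(phenotypes)
    overall = sum(phenotypes) / n
    sst = sum((p - overall) ** 2 for p in phenotypes)
    sums, counts = defaultdict(float), defaultdict(int)
    for p, k in zip(phenotypes, keys):
        sums[k] += p
        counts[k] += 1
    group_mean = {k: sums[k] / counts[k] for k in sums}
    sse = sum((p - group_mean[k]) ** 2 for p, k in zip(phenotypes, keys))
    return 1.0 - sse / sst


def simulate(n=20000, smoking_rate=0.3, noise_sd=0.5, seed=0):
    """Hypothetical model: phi = g * e + noise (genetics matters only if exposed)."""
    rng = random.Random(seed)
    g, e, phi = [], [], []
    for _ in range(n):
        gi = rng.choice([0, 1, 2])                    # risk-allele count
        ei = 1 if rng.random() < smoking_rate else 0  # 1 = smoker
        phi.append(gi * ei + rng.gauss(0.0, noise_sd))
        g.append(gi)
        e.append(ei)
    return g, e, phi


g, e, phi = simulate()
r2_g = r_squared_by_groups(phi, g)                   # best f(g)
r2_ge = r_squared_by_groups(phi, list(zip(g, e)))    # best f(g, e)

# Among non-smokers, genotype should explain essentially nothing.
nonsmokers = [(gi, p) for gi, p, ei in zip(g, phi, e) if ei == 0]
r2_g_nonsmokers = r_squared_by_groups(
    [p for _, p in nonsmokers], [gi for gi, _ in nonsmokers]
)

print(f"R^2 using f(g) only:     {r2_g:.2f}")
print(f"R^2 using f(g, e):       {r2_ge:.2f}")
print(f"R^2 of g in non-smokers: {r2_g_nonsmokers:.2f}")
```

With these made-up settings, f(g) alone should explain under a tenth of the variance, while f(g, e) should explain roughly 60% of it, which is essentially all of the non-noise variance. No amount of extra genotyping moves the first number; only measuring e does.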

There is currently a huge mismatch between our ability to measure g (see my recent post about a company that thinks it can measure g almost completely for $5000) and our ability to measure e. The environment is a complicated thing to reduce to a standard digital representation. Genotype has been great because it is discrete and four-valued. Very digital. Environmental variables are protean and include:

  • lifetime exposures to infectious disease
  • occupational exposures
  • habits (drinking, smoking, illegal drugs)
  • medication history
  • history of stressful events
  • exposure to allergens, pets, different types of food, etc…
  • epigenetics (I don’t count this as genotype; we can argue about that)
  • many others that I am forgetting

My concern is that our ability to measure genotype is improving so much faster than our ability to categorize and measure the environment that our estimates of f will plateau at a less-than-satisfying level. What’s the solution? There needs to be a research agenda for folks to figure out a standard set of biomarkers (preferably mostly independent, but if not, we can take care of that with the awesome power of bioinformatics) that measure the current and past environment of the patient. It may not be 500K measures, but I would love to have a bunch. Right now, I think a wise person interested in getting really good estimates of f would probably accept a moratorium on improved genotyping in exchange for a covering set of information about the environment, e. Just as we had the Affy 500K GeneChip, we need the Environment-500K. Wouldn’t it be cool if you could take the blood sample from which you measure the genotype and also measure the envirotype?
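As for biomarkers that are not independent, one simple way bioinformatics can "take care of that" is to residualize: regress one marker on another and keep only the residual, which is by construction linearly uncorrelated with the first. The sketch below is an illustration under assumed conditions, not a method from this post: the two markers and their shared source are hypothetical simulated data, and the technique is ordinary least-squares residualization (principal component analysis would be a common alternative when there are many markers).

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5


def residualize(target, predictor):
    """Remove the part of `target` that is linearly predictable from `predictor`."""
    slope = covariance(target, predictor) / covariance(predictor, predictor)
    intercept = mean(target) - slope * mean(predictor)
    return [t - (intercept + slope * p) for t, p in zip(target, predictor)]


rng = random.Random(1)
# Two hypothetical exposure biomarkers driven by a common (simulated) source.
shared = [rng.gauss(0, 1) for _ in range(5000)]
marker_a = [s + rng.gauss(0, 0.5) for s in shared]
marker_b = [0.8 * s + rng.gauss(0, 0.5) for s in shared]

# Keep only the information in marker_b that marker_a does not already carry.
marker_b_resid = residualize(marker_b, marker_a)

print(f"corr(a, b) before:      {correlation(marker_a, marker_b):.2f}")
print(f"corr(a, b_resid) after: {correlation(marker_a, marker_b_resid):.2f}")
```

The residual retains whatever part of marker_b is not linearly predictable from marker_a, so a downstream model can use both without double-counting their shared signal. Note that this removes only linear dependence.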

One comment

  1. We’ll probably need more than one type of envirochip to assess both long-term and recent exposures. The best all-in-one solution so far is an old-style family physician.

    A famous French physician, visiting a patient who was supposed to follow a strict diet, took his pulse and said: why didn’t you follow my recommendations? You ate a soft-boiled egg.
    How do you know? asked the amazed patient. The sulfur and phosphorus from the egg affected your pulse, answered the doctor.
    After they left the patient, the doctor’s assistant asked: Dear teacher, how did you learn from his pulse that the patient had eaten an egg? You have to be more attentive, said the doctor. The egg was spilled on his shirtfront.
