I often talk about how phenotype (phi) is a function of genotype (g):
phi ~ f(g)
With genome-wide association studies, we are trying to estimate f. Unfortunately, many of these studies get a pretty poor estimate of f, judged by how well the computed phenotype matches the observed phenotype. The reasons include insufficient statistical power (too few samples), poor sampling of relevant genotypes (g), and a potentially highly nonlinear f that is hard to approximate with simple functional forms. But I think these may be secondary. I think the primary reason is that phenotype is really a function of both genotype and the environment, e:
phi = f(g, e)
Thus, for example, whatever the genetic predisposition to lung cancer in the context of smoking, it is pretty irrelevant if the person does not smoke!
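To make the point concrete, here is a toy simulation. It is a sketch on made-up data, not an analysis of any real study: the effect size, noise level, allele coding, and the helper names `simulate` and `r_squared` are all assumptions chosen for illustration. The phenotype depends on genotype only when an exposure is present, and we compare a linear fit that sees g alone with one that sees g, e, and their interaction.

```python
import numpy as np

def simulate(n=5000, seed=0):
    """Toy data: genotype only matters when the exposure is present (assumed model)."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 3, size=n)          # risk-allele count: 0, 1, or 2
    e = rng.integers(0, 2, size=n)          # exposure: 0 = absent, 1 = present
    noise = rng.normal(0.0, 0.5, size=n)    # everything we did not measure
    phi = 1.0 * g * e + noise               # phi = f(g, e) with a pure interaction
    return g, e, phi

def r_squared(X, y):
    """Fraction of variance in y explained by a least-squares linear fit on X."""
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

g, e, phi = simulate()

# Model 1: phenotype from genotype alone, phi ~ f(g)
r2_g = r_squared(g[:, None], phi)

# Model 2: phenotype from genotype, environment, and their interaction, phi = f(g, e)
r2_ge = r_squared(np.column_stack([g, e, g * e]), phi)

print(f"R^2 with g only:      {r2_g:.2f}")
print(f"R^2 with g, e, g*e:   {r2_ge:.2f}")
```

Under these assumed parameters, the genotype-only fit should explain roughly a fifth of the phenotypic variance, while the fit that also sees the exposure should explain roughly 70%. No amount of extra genotyping closes that gap in this toy setup, because the missing information is in e.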
There is currently a huge mismatch between our ability to measure g (see my recent post about one company that thinks it can measure g almost completely for $5000) and our ability to measure e. The environment is a complicated thing to reduce to a standard digital representation. Genotype has been great because it is discrete and four-valued. Very digital. Environmental variables are protean and include:
- lifetime exposures to infectious disease
- occupational exposures
- habits (drinking, smoking, illegal drugs)
- medication history
- history of stressful events
- exposure to allergens, pets, different types of food, etc…
- epigenetics (I don’t count this as genotype, though we can argue about that)
- many others that I am forgetting
The point is that I am concerned that our ability to measure genotype is improving so much faster than our ability to categorize and measure environment that our ability to estimate f is going to plateau at a level that is less than satisfying.
What’s the solution? There needs to be a research agenda for folks to figure out a standard set of biomarkers (preferably mostly independent, but if not we can take care of that with the awesome power of bioinformatics) that measure the current and past environment of the patient. It may not be 500K measures, but I would love to have a bunch. Right now, I think a wise person interested in really getting good estimates of f would probably accept a moratorium on improved genotyping if they could get a covering set of information about the environment, e. Just as we had the Affy 500K genechip, we need the Environment-500K. Wouldn’t it be cool if you could take the blood sample from which you are measuring the genotype, and also measure the envirotype?