Call for great translational bioinformatics papers 3/08-3/09

Friends,

I have agreed once again to give a talk at the AMIA Summit on Translational Bioinformatics on the “year in review,” highlighting some of the most important papers in our field of translational bioinformatics.  Last year’s talk is online.

I am writing to request that you nominate papers that you think are important for me to include from the last 12-15 months.  Nominations of your own papers are fine, but much less compelling than nominations of others’ papers (you can collude though…).   Here are the criteria:

  • I define translational bioinformatics broadly, but expect it to involve molecular-level (genes, small molecules) data AND clinical-level data (diseases, symptoms, drugs)
  • I define bioinformatics as the creation and application of novel methods of analysis/discovery

I have been trolling the sites myself, but want to make sure I ask colleagues in bioinformatics for their thoughts.

I would like to highlight papers that have done a good, creative job at the above, and have a chance to become “classics” over the next 10-20 years.   I expect to discuss ~15 papers in “1-2 slide” detail and then to have another list of 10 “noteworthy but not discussed in detail” papers.

I would like these nominations by 5 PM on 3/11.  Thanks for your help.  If you don’t want to post publicly, you can send to me by email at russ.altman-at-stanford.edu.

Bioinformatics & Computational Biology = same? No.

I spent the first 15 years of my professional life unwilling to recognize a difference between bioinformatics and computational biology.  It was not because I didn’t think that there was or could be a difference, but because I thought the difference was not significant.  I have changed my position on this.  I now believe that they are quite different and worth distinguishing.  For me,

Computational biology = the study of biology using computational techniques.  The goal is to learn new biology, knowledge about living systems.  It is about science.

Bioinformatics = the creation of tools (algorithms, databases) that solve problems.  The goal is to build useful tools that work on biological data.  It is about engineering.

All this became important to me when I finally joined a bioengineering department, and I was forced to ask myself if I was a scientist or an engineer.  I am both, and now am at peace.

When I build a method (usually as software, and with my staff, students, and post-docs–unfortunately, I never do it myself anymore), I am engaging in an engineering activity:  I design it to have certain performance characteristics, I build it using best engineering practices, I validate that it performs as I intended, and I create it to solve not just a single problem, but a class of similar problems that all should be solvable with the software.   I then write papers about the method, and these are engineering papers.   This is bioinformatics.

When I use my method (or those of others) to answer a biological question, I am doing science.  I am learning new biology.   The criterion for success has little to do with the computational tools that I use, and is all about whether the new biology is true and has been validated appropriately and to the standards of evidence expected among the biological community.   The papers that result report new biological knowledge and are science papers.  This is computational biology.

As I look at my published work, I have always tried to balance the publications in biological/medical journals and those in engineering/informatics journals.  It is an aesthetic, really; there is no reason why one should feel compelled to do this.  However, it is useful to know when you are doing biology and when you are doing something else.   I suppose someone can argue with my use of the term “bioinformatics” as an engineering discipline.  That’s fine–I’m open to a different term.  But I would ask why bioinformatics isn’t good.   I think computational biology is more solid–the ‘biology’ is clearly the noun and the ‘computational’ is clearly the adjective.

Bioinformatics and experimentation

I have been involved in a number of discussions recently about the proper role of bioinformatics in biomedical research.  A few themes emerge that bother me, and upon which I feel compelled to comment.  I will first take on this proposition:

“Bioinformaticians should do wet lab experiments.”

I disagree.  Experiments are hard and require as much training and expertise as algorithm design and implementation.  To suggest that they are getting “easy enough” for even bioinformaticians to perform misses the point.   It is a very rare person who will perform at the highest levels of technical virtuosity and innovation in both algorithm development and experiment development.   Why ask or expect a person who is great at computation to add this credential?  My fear is that they will become a mediocre experimentalist and (as a side effect) their algorithmic work will also suffer, thus creating a “jack of all trades, master of none” disappointment.    Of course, bioinformatics algorithms and databases should often be tested by experiments, and this is why we have collaborations (or even subcontracts), but expecting young bioinformaticians to do it all risks drowning out their real expertise with distractions.  It is also very critical that bioinformaticians understand the experiments that they analyze in great detail–otherwise they risk building irrelevant methods that make invalid assumptions about the data.   I could go on forever, but my advice to a talented young bioinformatician is to stay away from pipettes until your career is really secure.  After that, you can do what you want, but recognize whether you are doing something because it is an ego trip or because it is the most effective way to get something done.   Or perhaps you don’t want to share credit, and so want to bring everything in-house under your own control.  That’s a whole ‘nother discussion…

As a footnote, I have recently been convinced (by some students) that attending biological group meetings is probably the best way to fully understand the data and how biologists think about it without having to actually do the experiments.  That will require earning the trust of the biologists, so that they will let you sit with them and speak openly about their data and its shortcomings.   But it is probably the single best way to understand a biological experiment.

We are blushing.

I just learned last week that my efforts to blog here have gotten some recognition from Nature Publishing Group (NPG) which has awarded the blog the “Nature Network Science Blogging Challenge 2008 prize.”  I was inspired to do this by my student, Shirley Wu, who also is a winner of the prize.  She has an excellent blog entitled “I was lost but now I live here.”   The prize singled out my entry entitled “One of my first post-genomic moments” and this will appear in a digest of blog entries they are assembling.

What’s the prize?  According to the NPG press release “Russ Altman and Shirley Wu win an invitation to Science Foo Camp 2009 (Sci Foo), the annual invitation-only scientific ‘unconference’ organized by Nature Publishing Group and O’Reilly and hosted by Google at their headquarters in California.  Since Russ and Shirley live nearby, the travel expenses included in their prize will be used to help other deserving individuals, such as attendees from developing countries, to attend Sci Foo ’09.”

I look forward to participating in the meeting.  I want to thank anyone who (?) voted for me.   I will try to continue blogging with fervor.  My colleague in Bioengineering, Steve Quake, will soon start blogging for the New York Times, I understand.   Who needs to write papers?  😉

Stimulus package: fund academic research.

This blog is not really meant to be political, but with all the talk of politics these days, and the terrible financial situation, I would like to make an (admittedly, potentially self-serving) plug for having the new administration, and anyone who cares about jump-starting the economy, fund more academic research.  Why?

  • Research grants create jobs today for scientists, engineers and administrators
  • Research grants also have the side effect of training the next generation of scientists, engineers and administrators and giving them the skills for future innovation and success.  This particularly includes opportunities for undergraduates who might not enter academic research, but who will then carry an appreciation and knowledge of it into other spheres of endeavor.
  • At some stochastic (random) frequency, research grants will yield discoveries or technologies that can really impact, even change, the world, create companies and create jobs.
  • Scientists are pretty good at squeezing the maximum possible impact out of relatively modest amounts of money.   I don’t have data to prove this (I bet somebody does), but I can assure you that the numbers we see in other endeavors make our heads spin.   One reason is that peer review can be brutal, and when best implemented, it really provides very strong quality controls of the type that would reassure the taxpayer.

So research is a great investment:  jobs today and potential discoveries that will form the basis of our economy in the future.   I am not arguing only for NIH-type health-related research (mostly what I do), but also for NSF, Department of Energy, and any other agency that funds basic research.   Of course there need to be other aspects of a stimulus package (I am not an economist), but a big part of it needs to be research, and I think that these dollars are particularly well spent in terms of the double benefit of jobs now + investment in the future of the US economy.   Towards that end, training grant support to help educate the next generation is also a very cost-effective investment.

This is as close to politics as this blog will get.

My favorite bioinformatics/computational bio meeting = PSB

Of course, I am an organizer, so I am biased.  But the Pacific Symposium on Biocomputing (PSB) is about to start in a couple of days and I am psyched.   The meeting started in 1995 and grew out of the Hawaii International Conference on System Sciences (HICSS).  It turns out that Hawaiian hotels empty out the week after New Year’s Day as tourists go home, and so they are willing to make deals with scientists who want to have conferences.  PSB has switched islands every couple of years, but our most common venue is what is now known as the Fairmont Orchid.   Why is this meeting so good?

1.  We pick the sessions each year based on hot emerging topics in bioinformatics or computational biology.   Each session is organized by folks in that field who get colleagues to submit papers for peer review.  Our acceptance rate is typically around 33%, and so the quality is high.  More importantly, the work is usually cutting edge and emerging.   Once fields become “mainstream” they can go to the other conferences in the field.  No “general” track here.

2.  The venue makes for lots of good opportunities for side conversations at the beach, pool, or other venues.  It is a little isolated and so people can talk about science.

3.  The cost is a little high, but not much higher than other major conferences in the field.  The key is that lots of food is included in the registration fee (we negotiate with the hotel) and the rooms are really comparable in cost to those in New York or D.C.  We try to be very generous with travel awards, particularly to students and post-docs who are first authors on accepted papers.  Our annual surveys of attendees indicate that the costs are not prohibitive and the venue is beloved.

4.  The papers are all peer-reviewed and are indexed in PubMed.   We give a hardcover volume of the proceedings to attendees, published by World Scientific Publishers.   Importantly, WSP allows us to distribute the articles online for free, and so almost all the articles from the last 15 years are on the PSB website.

5.  This year, my student Shirley Wu is co-organizing a session on Open Science.  Because of this, there is a FriendFeed room for the conference.  (I’m still trying to figure out what that is, but I’m sure it is very hip.)

6. In addition to a scientific keynote speaker each year, we also have an “Ethical, Legal, and Social Implications” (ELSI) speaker.  This year it is Drew Endy (Stanford) for the former, and Greg Hampikian (Boise State) for the latter.  You can see summaries of their lives on the PSB website.

Anyway, this is a great meeting, and I hope you will all consider it in the future.

Aloha,

Russ

Biology + chemistry -> bioinformatics + chemoinformatics

(Sorry for the gap in frequency of postings, end of quarter gets busy…)  Last week, I attended a fascinating talk by Brian Shoichet of UCSF.   He talked about much of his work in pharmaceutical science, including great work docking small molecules to proteins.  But the work that excited me the most was his efforts to look at classes of drugs (grouped together based on a common target) and predict potential cross-interactions based on chemical similarities within these classes.  The work is introduced in J Chem Inf Model. 2008 Apr;48(4):755-65. Epub 2008 Mar 13.  I think this is fascinating work.

This really raises the more general issue of the meeting of biology and chemistry–clearly not a new phenomenon, since “without chemistry, life itself would be impossible” (thanks, Monsanto).  However, the increased interest in using modern informatics techniques to analyze small molecules in the context of biological macromolecules is pretty exciting.  Part of this renaissance stems from the increased availability of small molecule information via PubChem and similar projects.   A critical need is open source tools for small molecule informatics processing, and I believe these are starting to emerge (not fast enough; I have a student looking hard for a basic package upon which we can build).  But this will be very exciting, because it allows us to connect the substantial literature on manipulating chemical information (chemoinformatics) with that on manipulating biological information (bioinformatics).

I will write more on this, but it is a good topic–one that is emerging in response to a growing impatience for molecular-level understanding of key biological processes and interactions, in the face of the proliferation of associative science (e.g., X is associated with Y, but without a firm mechanistic understanding).
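As an aside on what those open source tools make possible, here is a minimal sketch of the kind of class-level chemical similarity comparison they enable, assuming the open-source RDKit toolkit (one candidate for the sort of basic package mentioned above).  The molecules, fingerprint type, and parameters are purely illustrative assumptions on my part, not taken from Shoichet’s paper.

    # Minimal sketch: fingerprint-based similarity between two small molecules.
    # Assumes the open-source RDKit toolkit; molecules and parameters are illustrative.
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    smiles_a = "CC(=O)Oc1ccccc1C(=O)O"        # aspirin
    smiles_b = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"   # ibuprofen

    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)

    # Morgan (circular) fingerprints, radius 2, 2048 bits
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)

    # Tanimoto similarity = |A AND B| / |A OR B| over the fingerprint bits
    print(DataStructs.TanimotoSimilarity(fp_a, fp_b))

Pairwise similarities of this sort, computed within and across drug classes, are one common ingredient in asking which drugs might share targets or cross-interact.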

We need to create a market for genetic-association data

In the past I have argued that we shouldn’t be too paternalistic about genetic information.  I reasoned that genetic information is getting very inexpensive, and will be available as a commodity to many people very soon.  Therefore, interposing a mandatory healthcare worker between the genetic information and the consumer just seems cumbersome and unlikely to be effective.  Some have pointed out that there are many people who will need assistance in interpreting their results, and this is a reasonable concern.   I think we could handle it in other ways.

For example, how about if for every genetic association study, the journal editors got together and decided that they would require that authors write a lay-oriented summary of their study, and what the potential health (or other) significance of each allele is, and how it may be combined with other genetic or environmental information.  These summaries could include statistics for risk and other quantitative information–all of which could be part of the peer review of the article.  In this way, there would be a readable summary of each genetic association and its significance created and available at the time of publication.  Most importantly, we would ask that these summaries be written using standard templates so that they all had a similar format (perhaps represented in XML or whatever), and we would ask that the journals relinquish claim to copyright (as they do for abstracts) so that these summaries could be reused freely by others.

What’s the benefit of this?  Well, first of all there would be basic information available with every genetic association study about how to interpret it.  This would be useful for physicians and patients, and could help the public interpret their commodity genetic data.   Second, by relinquishing claims to copyright, the journals would allow the information to be widely disseminated.  This would allow third-party information aggregators to create improved products to help consumers understand their genetic information.  There could be a public “free” version, but companies could compete by providing high quality annotation and other amenities, creating a competitive market for the presentation of genetic data.  Some could even have telephone hotlines with real people at the other end, explaining the genetics.  Consumers would have a choice about how much help they want to understand their genome.  Finally, if these summaries are sufficiently structured and use appropriate controlled terminologies, bioinformatics folks could write pretty cool algorithms to annotate the genome, aggregate related genetic variations, and otherwise create value through the power of bioinformatics (a rough sketch of what such a structured summary might look like follows below).
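To make “sufficiently structured” concrete, here is a hypothetical sketch, in Python, of what a machine-readable summary record and one simple aggregation step might look like.  The field names and the grouping-by-gene example are my own illustration, not a proposed standard.

    # Hypothetical structure for a lay-oriented genetic-association summary.
    # Field names are illustrative only, not a proposed standard.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class AssociationSummary:
        gene: str               # gene symbol, e.g. "CYP2C9"
        variant: str            # variant identifier, e.g. an rsID
        condition: str          # disease or drug-response phenotype studied
        effect_estimate: float  # risk ratio or odds ratio reported in the study
        lay_summary: str        # plain-language interpretation for non-specialists
        pmid: str               # pointer back to the peer-reviewed article

    def aggregate_by_gene(summaries):
        """Group summaries so all reported variants for a gene can be presented together."""
        grouped = defaultdict(list)
        for summary in summaries:
            grouped[summary.gene].append(summary)
        return dict(grouped)

With records like these in the public domain, the third-party aggregators described above could layer risk calculators, controlled-vocabulary links, and rebuttal threads on top without renegotiating rights for each article.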

If scientists take issue with some of these summaries (despite peer review), we can have a process for rebuttal or refutation.  These can also be in standard templates, in the public domain, and linked to the initial summary.  This would provide alternative views, and would also need to be peer reviewed.

Basically, I am proposing that we handle the issue of “public education” by creating inexpensive and ubiquitous summaries of the health implications of genetic associations, peer reviewing them, disseminating them widely and freely, and allowing a market to emerge that helps individuals interpret their genome.   One could argue that scientists have a duty to create such summaries all the time, but often fail.  I think for genetic research it is mandatory that we scientists take an active role in interpreting the significance of our findings for the general public.

The greatest strength and weakness of pharmacogenomics: mechanism

Many people don’t realize that drugs are approved by the FDA not based on a firm understanding of how they work, but based on whether they work and are safe.  Efficacy and safety are both (at least up until now) statistical claims based on the effects on hundreds to thousands of subjects during phase I-III trials.  If you give a particular drug to a large number of people, and if the adverse events are “tolerable” (by some reasonable definition, often in the context of how bad the disease is) and the drug has a positive effect, then the drug will be approved.

This FDA approach has been the right one for many decades–our scientific understanding of why drugs work often lags far behind the demonstrations that people can benefit from them.  We may never have had access to many great medications if we insisted that we know how they work before we approved them.

However, pharmacogenomics challenges this paradigm, because fundamental to pharmacogenomics is an understanding AT LEAST of the genes that are involved in either the way the drug works (pharmacodynamics, PD) or how the drug is absorbed, distributed, metabolized and eliminated from the body (pharmacokinetics, PK).   Thus, the “black box” model of how a drug works erodes as you gain pharmacogenomics knowledge.  Is this good or bad?

Well, the good part is that most will agree that it certainly seems beneficial to understand how a drug works (its mechanism), and to know what it interacts with.   We can use this knowledge to reason about potential opportunities for new drugs that interact with similar biological processes, for modifying existing drugs with new chemical properties that may improve their interactions, and to predict which other drugs might have unexpected interactions with our drug because they work in similar biological pathways.   As the noted founder of Faber College said, “Knowledge is good.”

On the other hand, we do not have good knowledge of mechanism for many drugs, and that is a major barrier to the rapid application of pharmacogenomics.   Which genetic variations should we assess for relevance to a drug response?  The lack of this information has led to great interest in genome-wide association studies (GWAS) to broadly search for genes that impact drug response.   The problem, of course, is that these studies are essentially trying to define the mechanism of drug action (both PK and PD, kind of rolled into one) with a somewhat crude tool.   The pitfalls of GWAS for picking up subtlety are pretty well chronicled.  Are there better ways to uncover drug mechanisms so that genetics can be used with surgical precision instead?  I think that this is what we call “good old-fashioned science.”  To be sure, we need to use modern technologies to make the experiments faster, cheaper and more accurate, but I think there needs to be an investment in drug mechanism.  It will make the pharmacogenetic task easier, but will also lead to a more profound understanding of how drugs work and how new drugs may work better.