So I’ve been admonished in the past for posting ponderings and opinions on my blog–I guess the problem is that my comments are not a priori peer-reviewed, and it seems a lot like I’m just pontificating to an audience on my personal peeves. I think, however, that writing to the blog helps me organize my internal thoughts, and I enjoy the a posteriori commentary on my posts, which can be more embarrassing and candid than any private peer review. Well, either way, if you don’t like reading about my opinions or don’t want to be influenced by them, skip this post and the one after it.
I was reading Nature again (a lot of my pondering posts seem to start there!) and, being a hacker as well as an armchair quarterback in molecular biology and genomics, I’m gently amused by the surprise the genetics community is registering at the results of the Human Genome Project. Simply put, there was a prevailing notion that once we had the entire genetic sequence written out, we would crack the code on all sorts of diseases and be able to trace out the function of a cell–and perhaps the human body–from the ground truth of the genetic code.
However, for the past year I have read numerous articles that contain a phrase similar to this: researchers were surprised to find that having the source code told them nothing about how the network was configured. Or better yet, having the source code wasn’t useful because the code is self-modifying. Simply put, the Human Genome Project is like having the source code to your OS, but humans are complex networks of cellular machines; many diseases and problems arise from a failure of the network or a failure of the configuration of the OS, neither of which is apparent from the source code alone.
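To make the analogy concrete for the hackers in the audience, here is a toy sketch in Python. It is only an illustration of "same source, different configuration", not a biological model: the names `SOURCE`, `run`, and `config`, and all the numbers, are made up for this example.

```python
# Toy analogy, not a biological model: all names and numbers are made up.
# The fixed "source code": gene -> baseline output.
SOURCE = {"gene_a": 10.0, "gene_b": 4.0}


def run(source, config):
    """Scale each gene's baseline output by a per-gene factor from the config.

    Genes missing from the config run at their baseline (factor 1.0).
    """
    return {gene: level * config.get(gene, 1.0) for gene, level in source.items()}


healthy = run(SOURCE, config={})
# Same source, but the configuration switches one gene off.
misconfigured = run(SOURCE, config={"gene_b": 0.0})

print(healthy)
print(misconfigured)
```

Diffing `SOURCE` between the two runs shows nothing, because it is the same object in both; the difference lives entirely in the configuration passed in at run time.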
I guess, to some extent, it’s not surprising that biologists are peeling the onion instead of cutting through it. I remember back in college, I took a couple of molecular biology courses. It was interesting to see the approach of the typical pre-med/biology student toward biology: lots of rote memorization, with no attention at all to system design. It’s like trying to study computer architecture by memorizing the configuration of all the transistors in a standard cell library, without understanding why you’d use one element over another.
My personal experience is that there is a significant amount of architecture in biology. When people found out I had none of the organic chemistry or genetics prerequisites for the molecular biology class, they looked at me like I was crazy. However, I survived the class with relatively little studying, the difference being that I looked at molecular biology from a system standpoint. I tried to look for high-level patterns, and totally skipped memorizing the basic ones–because for the tests, we were allowed to bring in an 8.5×11 sheet with notes. I wrote the basic organic chemistry operations on there, as well as the basic formulae and chemical reaction sequences I would need, so I didn’t have to memorize them. The class also focused a lot of its attention on the design of an experiment–how do you analyze a complex system and determine its features given a limited set of techniques? I remember we had a number of difficult questions about using radioactive carbon labeling to try to determine the metabolic pathway of a molecule. The techniques you use to design these experiments are very similar to those you use when reverse engineering a hardware system.
Epigenomics is a field that I think is very interesting and exciting, and is closing in on the idea of a “biological architect”. Epigenomics is the study of the tertiary and quaternary genetic code, to borrow terms from protein folding (okay, for you real biologists out there, I am really pushing it). It turns out that DNA is indeed self-modifying and carries information beyond the genetic code. For example, methyl (CH3) groups get attached to some of the bases in your DNA, which modifies the rate of protein expression from that segment of DNA. Also, DNA has a very complex 3-D structure. Those Hollywood views we have of DNA being this beautiful, perfect double-helix are eminently misleading. DNA is twisted upon itself, tied in knots, and bound up by histones (protein complexes that act like DNA katamari). Given that chemical machinery is essentially a mechanical computer, the 3-D morphology of a molecule is as much a part of the programming as is its composition. So epigenomics in my view should be the study of all the factors that aren’t coded in the genome–sort of like a study of all the different configurations of an OS and how they affect its race conditions, callbacks and stability. Stepping beyond that, we have the network context and ultimately the user behavior. A human cell is many orders of magnitude more complex than the internet, and a single cell is a far cry from a human being. We are a long way off from understanding the human genome and what it really means in the context of the human network, which means there will be a lot of interesting and exciting work for years to come.
And so I ponder on this beautiful, mellow Saturday afternoon in San Diego as I procrastinate on my long list of things to do…