Archive for the ‘Biology’ Category

Mythbusting Personalized Genomics

Sunday, October 11th, 2009

It’s the year 2009, and I’m wondering: where is my flying car? After all, Hollywood reels from the ’60s and ’70s all predicted that flying cars are what I’d be using to get around town these days. Of course, automotive technology isn’t the only victim of Hollywood hype. The potential impact of personalized genomics has been greatly overstated in movies like GATTACA. This has led to the pervasive myth that your genome is like a crystal ball, and that somehow your fate is predestined by your genetic programming. Recently, my perlfriend co-authored a paper in Nature (“A Personalized Medicine Research Agenda”, Nature Vol 461, October 8 2009) comparing Navigenics’ and 23andMe’s “Direct to Consumer” (DTC) personal genomics offerings. She’s qualified to offer deep insight into personal genomics: she designed the original Illumina bead chip used by leading companies to generate their DTC genetic data, and she is also the person who made sense of the first complete diploid human genome sequence (1 2). She’s sort of the biology equivalent of the reverse engineer who takes raw binaries and annotates meaning into the disassembly. So, let the mythbusting begin.

Myth: having your genome read is like hex-dumping the ROM of your computer. Many people (I was one of them) have the impression that “reading your genome” means that at the end of the day someone has a record of every base pair of DNA in your genome. This is called a “full sequence”. In reality, full sequencing is still cost-prohibitive; instead, a technique called “genotyping” is used. Here, a selective diff is done between your genome and a “reference” human genome: your genome is simply sampled at potentially interesting spots for single-point mutations called Single Nucleotide Polymorphisms (SNPs, pronounced “snips”). In the end, only about 1 in 3,000 base pairs is actually sampled in this process. Thus, the result of a personalized genomic screen is not your entire sequence, but a subset of potentially interesting mutations compared against a reference genome. This naturally leads to two questions: first, how do you choose the “interesting subset” of SNPs to sample? And second, how do we know the reference genome is an accurate comparison point? This sets us up to bust another two myths.
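To put this in programmer’s terms, here is a minimal sketch of the “selective diff” idea in Python; every position and allele below is invented purely for illustration, not taken from any real chip or genome:

# Conceptual sketch of genotyping as a selective diff against a reference.
# All positions and bases below are made up for illustration.

reference = {1234567: "A", 2345678: "G", 3456789: "C"}   # the "canonical" human
probed_positions = [1234567, 2345678, 3456789]           # the ~1-in-3000 spots the chip samples
your_bases = {1234567: "A", 2345678: "A", 3456789: "C"}  # what the assay read from you

for pos in probed_positions:
    ref, you = reference[pos], your_bases[pos]
    status = "match" if ref == you else f"SNP: {ref} -> {you}"
    print(f"position {pos}: {status}")
# Everything between the probed positions is never read at all.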

Myth: We know which mutations predict disease. Herein lies a subtle point. Many of the mutations are merely correlated with disease, not proven to predict or cause it. The truth is that we really don’t understand why many genetic diseases happen. For poorly understood diseases (which is still most of them), all we can say is that people who have a particular disease tend to have a certain pattern of SNP mutations. It’s important not to confuse correlation with causality. Doing so might lead you to conclude, for example, that diet coke makes you fat, because diet coke is often consumed by people who are overweight.

Thus, there are two echelons of understanding that can come from a genotype: disease correlations, and disease causes. The majority of SNP-based “predictions” are correlative, not causative. As a result, a genotype should not be considered a “crystal ball” for predicting your disease future; rather, it is closer to a “Rorschach blot” that we have to squint and stare at for a while before we can make a statement about what it means. A table in the paper illustrates how varied disease predictions can be as a result of these disagreements over the interpretation of mutation meanings.

Myth: the “reference genome” is an accurate reference. The term “reference genome” alone should tip you off to a problem: it implies there is such a thing as “reference people”. Ultimately, just a handful of individuals were sequenced to create today’s reference genome, and most of them are of European ancestry. As more full-sequence genetic data is collected, the reference genome will be merged and massaged to present a more accurate picture of the overall human race. For now, though, it’s important to remember that a genotype study is a diff against a source repository of questionable universal validity, partly because it’s questionable whether there is such a thing as a “reference human”: there are structural variations, and some SNPs have different frequencies in different populations (e.g., the base “A” could dominate at a given position in a European population, while the base “G” dominates at that same position in an African population). It’s also important to keep in mind that the “reference genome” has an aggregate error rate of about 1 error every 10,000 base pairs, although to be fair, the process of discovering a disease variant usually cleans up any errors in the reference genome for the relevant sequence regions.

So now you can see that “reading your genome” is in fact less like looking into a crystal ball and more like staring at a Rorschach blot obscured by cheesecloth (i.e., the genome is merely sampled, not sequenced). And even if we could remove the cheesecloth and sequence the genome so that we knew every base pair, it would still be … a Rorschach blot, but in high resolution. It will be decades until we have a full understanding of what all the sequences mean, and even then it’s unclear if they are truly predictive.

Here lies perhaps the most important message, and a point I cannot emphasize enough: in most situations, environment has as much to do, perhaps even more, with who you are, what you become, and what diseases you may develop as your genes do. If there is any upside to personal genomics, it won’t be crystal-ball predictions; it will be the lifestyle changes it can encourage. If there’s one thing I’ve learned from dating a preeminent bioinformaticist, it’s that no matter your genetic makeup, most common diseases can be prevented with proper diet and exercise.

The Immune System of Red Algae vs. Ebola

Wednesday, October 7th, 2009

Saw an article that I found particularly interesting in my perusal of Science this week…”Sugary Achilles’ Heel Raises Hope For Broad-Acting Antiviral Drugs” by Robert F. Service (Science 4 September 2009 325: 1200 [DOI: 10.1126/science.325_1200a]).

I had always wondered how the immune systems of plants and simple creatures worked. Sure, our immune system is adaptive and has all those macrophages and T-cells and B-cells, but would a plant or a lobster do the same thing? Turns out they don’t. I can’t say I really understand how these systems work, but from what I can tell, many organisms have passive immune systems that simply emit toxins and broad-spectrum antiviral compounds to protect themselves when they come under attack by bugs. Innate social behavior is also part of the immune response of certain simple animals; sick ones may instinctively isolate themselves from the group, for example, to prevent the further spread of disease.

Apparently, researchers at the US National Cancer Institute have discovered that red algae emit a compound called griffithsin (GRFT). GRFT targets mannose sugars that are commonly attached to viral protein particles but not commonly attached to human proteins. The article dives a little bit into the mechanism for the specificity, but what I found most interesting were the results of early studies:

For nearly all HIV strains, it takes less than 0.23 billionths of a mole, or nanomoles, of GRFT to inhibit half the viruses in vitro—a standard measure of drug effectiveness known as the compound’s IC50, in which the lower the number the more potent the compound. For SARS, GRFT’s IC50 is about 50 nanomoles, and for Ebola it is 380 nanomoles. That makes all three mannose binders some of the most powerful antivirals around.

In mice infected with SARS, 70% of the animals that received no antivirals died. By contrast, among those that received an intranasal dose of 5 milligrams per kilogram per day of GRFT for 4 days, 100% lived. With mice exposed to Ebola, one of nature’s most lethal viruses, all of the 10 control animals that didn’t receive GRFT died within 12 days. In the five groups of 10 animals that each received different injected doses of GRFT, up to 90% survived. Even when they were injected with the antiviral 2 days after being exposed to Ebola, 30% still lived.

That’s pretty remarkable. It’s also effective against H1N1. Of course, it’s only a matter of time before viruses manage to adapt, but until then this could cure a lot of very sick people. Hurray for red algae!

On Influenza A (H1N1)

Sunday, June 21st, 2009

I read a fantastic article in Nature magazine (vol 459, pp 931-939 (18 June 2009)) that not only summarizes the current state of novel H1N1 (aka Swine Flu) understanding, but also compares H1N1 against other flu strains. In particular, it discusses in depth how the pathogenic components — i.e., the stuff that kills you — compare against each other.

The Influenza virus is quite fascinating. Allow me to ramble on…

Comparison to Computer Viruses

How many bits does it take to kill a human?

The H1N1 virus has been comprehensively disassembled (sequenced) and logged into the NCBI Influenza Virus Resource database. For example, an instance of influenza known as A/Italy/49/2009(H1N1), isolated from the nose of a 26-year-old female Homo sapiens returning from the USA to Italy (I love the specificity of these database records), has its entire sequence posted at the NCBI website. It’s amazing — here are the first 120 bits of the sequence:

atgaaggcaa tactagtagt tctgctatat acatttgcaa ccgcaaatgc agacacatta

Remember, each symbol represents 2 bits of information. Through a translation lookup table, this can alternatively be represented as the following amino acid (peptide) sequence:

MKAILVVLLYTFATANADTL

In this case, each symbol represents an amino acid, which is the equivalent of 6 bits (one 3-base codon per amino acid). M is methionine, K is lysine, A is alanine, etc. (you can find the translation table here).

For those not familiar with molecular biology, DNA is information-equivalent to RNA on a 1 to 1 mapping; DNA is like a program stored on disk, and RNA is like a program loaded into RAM. Upon loading DNA, a transcription occurs where “T” bases are replaced with “U” bases. Remember, each base pair specifies one of four possible symbols (A [T/U] G C), so a single base pair corresponds to 2 bits of information.

Proteins are the output of running an RNA program. Proteins are synthesized according to the instructions in RNA on a 3 to 1 mapping. You can think of proteins a bit like pixels in a frame buffer. A complete protein is like an image on the screen; each amino acid on a protein is like a pixel; each pixel has a depth of 6 bits (3 to 1 mapping of a medium that stores 2 bits per base pair); and each pixel has to go through a color palette (the codon translation table) to transform the raw data into a final rendered color. Unlike a computer frame buffer, different biological proteins vary in amino acid count (pixel count).

To ground this in a specific example, six bits stored as “ATG” on your hard drive (DNA) is loaded into RAM (RNA) as “AUG” (remember the T->U transcription). When the RNA program in RAM is executed, “AUG” is translated to a pixel (amino acid) of color “M”, or methionine (which is incidentally the biological “start” codon, the first instruction in every valid RNA program). As a shorthand, since DNA and RNA are 1:1 equivalent, bioinformaticists represent gene sequences in DNA format, even if the biological mechanism is in RNA format (as is the case for Influenza — more on the significance of that later!).
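To make the disk/RAM/frame-buffer analogy concrete in code, here is a toy Python sketch of the whole pipeline, applied to the 60 bases quoted above. (The codon table is deliberately partial: just the entries this snippet needs; the real genetic code has 64.)

# Toy DNA -> RNA -> protein pipeline for the first 60 bases of the HA gene.
CODONS = {  # partial codon table: only the entries needed below
    "AUG": "M", "AAG": "K", "GCA": "A", "AUA": "I", "CUA": "L",
    "GUA": "V", "GUU": "V", "CUG": "L", "UAU": "Y", "ACA": "T",
    "UUU": "F", "ACC": "T", "AAU": "N", "GAC": "D", "UUA": "L",
}

dna = "atgaaggcaatactagtagttctgctatatacatttgcaaccgcaaatgcagacacatta"

# "Load from disk into RAM": transcription replaces T with U
rna = dna.upper().replace("T", "U")

# "Execute": translation renders one 6-bit pixel (amino acid) per 3-base codon
protein = "".join(CODONS[rna[i:i + 3]] for i in range(0, len(rna), 3))
print(protein)   # MKAILVVLLYTFATANADTL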

OK, back to the main point of this post. The particular RNA subroutine mentioned above is the HA gene, which produces the hemagglutinin protein: in particular, an H1 variety. This is the “H1” in the H1N1 designation.

If you thought of organisms as computers with IP addresses, each functional group of cells in the organism would be listening to the environment through its own active port. So, as port 25 maps specifically to SMTP services on a computer, port H1 maps specifically to the windpipe region of a human. Interestingly, the same port H1 maps to the intestinal tract of a bird. Thus, the same H1N1 virus will attack the respiratory system of a human, and the gut of a bird. In contrast, H5 — the variety found in H5N1, or the deadly “avian flu” — specifies the port for your inner lungs. As a result, H5N1 is much more deadly because it attacks your inner lung tissue, causing severe pneumonia. H1N1 is not as deadly because it attacks a much more benign port — one that just causes you to blow your nose a lot and cough up loogies, instead of ceasing to breathe.

Researchers are still discovering more about the H5 port; the Nature article indicates that perhaps certain human mutants have lungs that do not listen on the H5 port. So, those of us with the mutation that causes our lungs to ignore the H5 port would have a better chance of surviving an Avian flu infection, whereas those of us who open port H5 on the lungs have no chance to survive make your time / all your base pairs are belong to H5N1.

So how many bits are in this instance of H1N1? The raw number of bits, by my count, is 26,022; the actual number of coding bits is approximately 25,054 — I say approximately because the virus does the equivalent of self-modifying code to create two proteins out of a single gene in some places (pretty interesting stuff, actually), so it’s hard to say what counts as code and what counts as incidental non-executing NOP sleds required for the self-modifying code.

So it takes about 25 kilobits — 3.2 kbytes — of data to code for a virus that has a non-trivial chance of killing a human. This is more efficient than a computer virus, such as MyDoom, which rings in at around 22 kbytes.
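For the skeptical, here is the back-of-the-envelope arithmetic as a sketch (the human genome size is the rough textbook figure):

bits_per_base = 2
h1n1_bases = 26_022 // 2          # 26,022 raw bits -> 13,011 bases
human_bases = 3_400_000_000       # roughly 3.4 billion base pairs

print(h1n1_bases * bits_per_base / 8 / 1024)   # ~3.2 kbytes for the virus
print(human_bases * bits_per_base / 8 / 1e6)   # ~850 Mbytes for a human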

It’s humbling that I could be killed by 3.2 kbytes of genetic data. Then again, with 850 Mbytes of data in my genome, there’s bound to be an exploit or two.

Hacking Swine Flu

One interesting consequence of reading this Nature article, and having access to the virus sequence, is that I now know how to modify the virus sequence in a way that would probably make it more deadly.

Here’s how:

The Nature article notes, for example, that variants of the PB2 influenza gene with glutamic acid at position 627 in the sequence have low pathogenicity (not very deadly). However, PB2 variants with lysine at the same position are more deadly. Well, let’s see the sequence of PB2 for H1N1. Going back to our NCBI database:

601 QQMRDVLGTFDTVQIIKLLP
621 FAAAPPEQSRMQFSSLTVNV
641 RGSGLRILVRGNSPVFNYNK

As you can see from the above annotation, position 627 has an “E” in it, the code for glutamic acid. Thankfully, it’s the less-deadly version; perhaps this is why not as many people have died from contracting H1N1 as the press releases might have scared you into thinking. Let’s reverse this back to the DNA code:

621  F  A  A  A   P  P  E   Q  S  R  
1861 tttgctgctg ctccaccaga acagagtagg

As you can see, we have “GAA” coding for “E” (glutamic acid). To modify this genome to be more deadly, we simply need to replace “GAA” with one of the codes for lysine (“K”), which is either “AAA” or “AAG”. Thus, the more deadly variant of H1N1 would have its coding sequence read like this:

621  F  A  A  A   P  P  K   Q  S  R  
1861 tttgctgctg ctccaccaaa acagagtagg
                        ^ changed

There. A single base-pair change, flipping two bits, is perhaps all you need to turn the current less-deadly H1N1 swine flu virus into a more deadly variant.
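You can sanity-check the codon swap with a few lines of code (the table below contains only the glutamic acid and lysine entries from the standard genetic code):

# Verify the single-codon edit at position 627 (the 7th codon of this chunk).
CODON = {"GAA": "E", "GAG": "E", "AAA": "K", "AAG": "K"}  # E and K entries only

original = "tttgctgctgctccaccagaacagagtagg"      # codons 621-630 of PB2
modified = original[:18] + "aaa" + original[21:]  # swap codon 627: gaa -> aaa

for seq in (original, modified):
    codon_627 = seq[18:21].upper()             # bases 19-21 of this chunk
    print(codon_627, "->", CODON[codon_627])   # GAA -> E, then AAA -> K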

Theoretically, I could apply a long series of well-known biological procedures to actually synthesize this deadly variant; as a first step, I could go to any number of DNA synthesis websites (such as the cutely named “Mr. Gene”) and order the modified sequence to get my deadly little project going for a little over $1,000. Note that Mr. Gene implements a screening procedure against DNA sequences that could be used to implement biohazardous products. I don’t know if they specifically screen against variants such as this modified PB2 gene. Even if they do, there are well-known protocols for site-directed mutagenesis that could be used to modify a single base of RNA from material extracted from normal H1N1.

[Just noticed this citation from the Nature article: Neumann, G. et al. Generation of influenza A viruses entirely from cloned cDNAs. Proc. Natl Acad. Sci. USA 96, 9345-9350 (1999). This paper tells you how to DIY an Influenza A. Good read.]

Adaptable Influenza

OK, before we get our hackles up about this little hack, let’s give Influenza some credit: after all, it packs a deadly punch in 3.2 kbytes, and despite our best efforts we can’t eradicate it. Could Influenza figure this out on its own?

The short answer is yes.

In fact, the Influenza virus has evolved to allow for these adaptations. Normally, when DNA is copied, an error-checking protein runs over the copied genome to verify that no mistakes were made. This keeps the error rate quite low. But remember, Influenza uses an RNA architecture, so it needs a different copying mechanism than DNA does.

It turns out that Influenza packs inside its capsid a protein complex (an RNA-dependent RNA polymerase) that is customized for its style of RNA copying. Significantly, it omits the error-checking protein. The result is that about one error is made for every 10,000 base pairs copied. How long is the Influenza genome? About 13,000 base pairs. Thus, on average, every copy of an Influenza virus has about one random mutation in it.
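The arithmetic, as a quick sketch (the virion count is a made-up round number, purely for illustration):

error_rate = 1e-4          # ~1 copying error per 10,000 bases, no proofreading
genome_len = 13_000        # approximate influenza genome length, in bases
print(genome_len * error_rate)   # 1.3 -- about one mutation per copy

# Chance that at least one copy carries a *specific* single-base change
# (e.g., the PB2 627 edit above), assuming ~1e9 virions in one host:
p_hit = error_rate / 3     # a given base mutating to one given new base
print(1 - (1 - p_hit) ** 1e9)    # ~1.0: essentially guaranteed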

Some of these mutations make no difference; others render the virus harmless; and quite possibly, some render the virus much more dangerous. Since viruses are replicated and distributed in astronomical quantities, the chance that this little hack could end up occurring naturally is in fact quite high. This is part of the reason, I think, why the health officials are so worried about H1N1: we have no resistance to it, and even though it’s not quite so deadly today, it’s probably just a couple mutations away from being a much bigger health problem.

In fact, if anything, perhaps I should be trying to catch the strain of H1N1 going around today, because its pathogenicity is currently in line with normal flu variants — as of this article’s writing, the CDC has recorded 87 deaths out of 21,449 confirmed cases, or a 0.4% mortality rate (for contrast, “normal” flu is <0.1%, while the dreaded Spanish flu of 1918 was around 2.5%; H5N1, or avian flu, is over 50%(!), but thankfully it has trouble spreading between humans). By getting H1N1 today, I would get the added bonus of developing a natural immunity to H1N1, so when it mutates and comes back again I stand a better chance of fighting it. What doesn’t kill you makes you stronger! …or on second thought, maybe I’ll just wait until they develop a vaccine for it.
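The mortality arithmetic, as a one-liner:

deaths, cases = 87, 21_449
print(f"{deaths / cases:.2%}")   # 0.41% -- vs. <0.1% seasonal flu, ~2.5% in 1918, >50% for H5N1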

There is one other important subtlety to the RNA architecture of the influenza virus, aside from the well-adjusted mutation rate that it guarantees. The subtlety is that the genetic information is stored inside the virus as 8 separate snippets of RNA, instead of as a single unbroken strand (as it is in many other viruses and in living cells). Why is this important?

Consider what happens when a host is infected by two types of Influenza at the same time. If the genes were stored as a single unbroken strand, there would be little opportunity for the genes of the two types to shuffle. However, because Influenza stores its genes as 8 separate snippets, the snippets mix freely inside the infected cell and are randomly shuffled into virus packets as they emerge. Thus, if you are unlucky enough to get two types of flu at once, the result is a potentially novel strain of flu, as RNA strands are copied, mixed, picked out of the metaphorical hat, and then packed into virus particles. This process is elegant in that the same mechanism allows for mixing of an arbitrary number of strains in a single host: if you can infect a cell with three or four types of influenza at once, the result is an even wilder variation of flu particles.
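If it helps, here is the metaphorical hat in Python: a toy model with no biology in it beyond the segment count (real packaging is not perfectly random):

import random

# Influenza A's 8 RNA segments
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]

def reassort(parent_a, parent_b):
    """Assemble one progeny virus, drawing each segment from either parent."""
    return {seg: random.choice((parent_a, parent_b)) for seg in SEGMENTS}

progeny = reassort("human H3N2", "swine H1N1")
print(progeny)   # one of 2**8 = 256 possible segment combinations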

This is part of the reason why the novel H1N1 is called a “triple-reassortant” virus: through either a series of dual-infections, or perhaps a single calamitous infection of multiple flu varieties, the novel H1N1 acquired a mix of RNA snippets that has bestowed upon it high transmission rates along with no innate human immunity to the virus, i.e., the perfect storm for a pandemic.

I haven’t been tracking the latest efforts of computer virus writers, but if there were a computer analogy to this RNA-shuffling model, it would be a virus that distributes itself in the form of unlinked object code files plus a small helper program that, upon infecting a host, would first re-link its files in a random order before copying and redistributing itself. In addition, it would search for similar viruses that may already be infecting that computer, and on occasion it would link in object code with matching function templates from the other viruses. This re-arrangement and novel re-linking of the code itself would foil certain classes of anti-virus software that search for virus signatures based on fixed code patterns. It would also cause a proliferation of a diverse set of viruses in the wild, with less predictable properties.

Thus, the Influenza virus is remarkable in its method for achieving a multi-level adaptation mechanism, consisting of both a slowly evolving point-mutation mechanism and a mechanism for drastically altering the virus’ properties in a single generation through gene-level mixing with other viruses (it’s not quite like sex, but probably just as good, if not better). It’s also remarkable that these two important properties of the virus arise as a consequence of using RNA instead of DNA as the genetic storage medium.

Well, that’s it for me tonight — and if you made it this far through the post, I appreciate your attention; I do tend to ramble in my “Ponderings” posts. There’s actually a lot more fascinating stuff about Influenza A inside the aforementioned Nature article. If you want to know more, I highly recommend the read.

Bacteria Living on Antibiotics

Sunday, April 13th, 2008

I like dabbling in bio, so I keep abreast of recent developments by reading Nature and Science. One article in particular caught my eye the other day — George Church’s “Bacteria Subsisting on Antibiotics” in Science (April 4, 2008, Vol 320, p. 100).

The common wisdom is that “superbugs” — antibiotic-resistant bacteria — are being bred inside humans who don’t finish their full course of antibiotics. The theory is that when you stop your antibiotics early, you only weaken the infection, killing off the bacteria most susceptible to the drug; the remaining few are the ones most resistant to it. If these survivors cause you to relapse, the new infection will have a greater resistance to antibiotics. Repeat this process a few times, and you are the culture dish for evolving antibiotic-resistant bacteria. Clearly, the solution to this problem is to just make sure we all take our antibiotics to the end of the course. Or is it?

The interesting part of Church’s report is that the bacteria commonly found all around us in the soil have a high chance of being resistant to every known antibiotic; and not only do they resist antibiotics, they can use them as a food source! They are “ultimate superbugs”. The obvious question is: why haven’t these just taken over and killed every human? [Note: the rest is all my speculation, and not part of Church’s report…] The answer probably involves several factors. Typically, soil-based bacteria don’t grow well in human hosts; however, the article notes that several strains of resistant bacteria are close relatives of human pathogens, so maybe that’s not the reason. My thought is that antibiotic resistance requires the bacteria to spend extra energy and resources, so when left in a nutrient-rich environment — like the mucous lining of your sinus — they are out-reproduced by the more nimble, but less robust, human pathogens. Since bacterial reproduction happens on an exponential curve, even tiny extra metabolic costs add up to a huge disadvantage in the end. Anyone who has financed a mortgage knows how a fraction of a percent of compound interest per year can add up to a lot over many years!
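A toy calculation shows how fast a small metabolic handicap compounds (the 5% growth tax is an invented number, purely for illustration):

nimble, resistant = 1.0, 1.0
for generation in range(60):    # roughly a day at 20-30 minutes per division
    nimble *= 2.0               # doubles every generation
    resistant *= 2.0 * 0.95     # pays a hypothetical 5% growth tax for resistance

print(nimble / resistant)       # ~21x outnumbered after just 60 generations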

So, I guess that’s good — the superbugs aren’t winning yet. However, the remaining threat is that bacteria are very promiscuous. They will acquire or exchange DNA under a large number of conditions, including changes in heat, pH, and electric current, as well as through viral vectors. My thought is that human pathogens could “acquire” resistance genes from their soil-based kin when they mix together, and that the slow-growing but long-lived soil bacteria act like a genome archive where useful but expensive bacterial genes are stored. The problem with this theory, of course, is that when human pathogens acquire the resistance genes, they reproduce more slowly than those that don’t, so they eventually go extinct, probably before they can infect a human host.

But there’s one other factor that’s missing. A lot of antibiotics used on humans and animals are excreted through urine, feces, and sweat. These antibiotics are concentrated in sewage and released into the environment — into the soil. The presence of these antibiotics, even in small quantities, combined with the genetic archive stored in soil bacteria, could be enough to bias natural selection to favor the bacteria that have acquired the antibiotic resistance genes, thus providing a natural environmental reservoir for the breeding and storage of superbugs.

Think about it: the mere prescription of an antibiotic may ultimately lead to environmental bacteria acquiring resistance to it, and no amount of care or attention on your part or mine in finishing our antibiotic courses can prevent this.

That being said, it’s all just speculation on the part of someone who’s really an electronics hacker and not a biologist, so I wouldn’t go sounding any alarms. But it is interesting to think about the role of environmental DNA in the evolution of species; it may be one of those rule-changing, disruptive concepts. I’ve been reading about how sea water contains lots of DNA that codes for all kinds of interesting genes, and how our own DNA contains lots of “junk” DNA introduced by viruses, and so forth. Maybe there is more to evolution and genetics than simple random mutation and selection from a gene pool defined only by the parents. With the incorporation of environmental DNA, totally unexpected whole genes can be introduced from the environmental library, absent any parent. Furthermore, genes that fall out of favor (become “extinct”) due to external changes can be archived in this environmental library and brought back into service later, so evolution, at least for simple organisms like bacteria, might not be a simple linear progression.

Also, in the same issue of Science, there is a snazzy article titled “Single-Molecule DNA Sequencing of a Viral Genome”. Really, really clever stuff going on in there that probably has application beyond just DNA sequencing; if you have an interest in nanotechnology or single-molecule data storage/manipulation/retrieval it’s worth the read.

FOO Camp 07 and RNA Folding

Monday, July 2nd, 2007

I was at FOO camp last weekend and it was a blast. As usual, Tim brought together quite an interesting crowd of people. It was a pleasant surprise to see old friends from MIT, some of whom I hadn’t seen in years. To date, four FOO alumni worked on the same robotics team (ORCA) building autonomous submarines at MIT back when we were all students there, and at least three students/RAs of my MIT graduate advisor, Tom Knight, have also attended FOO. Of course, I got to meet some interesting new people, including a group of folks who have expertise and great interest in manufacturing in China (we had a little round-table discussion about doing business in China and China’s economic role in the world). I also gave a little presentation about how chumbys are made in China, something I will blog about over the next couple of days in a forthcoming set of posts (I have a lot of material to go through, so it’s taking me a while to organize and write them).

One FOO attendee whom I was most fortunate to stumble upon was Christina Smolke. She gave a fascinating talk about the capabilities of RNA that really opened my mind. As many may be aware, the “central dogma” of biology is being rewritten, and RNA is starting to take a more active role in everything from heredity of genetic traits to catalysis of reactions. Recent findings have prompted some hypotheses to be revisited, such as the “RNA world” hypothesis, which posits that life may actually have started with self-replicating strands of RNA instead of DNA.

The most interesting connection I made listening to her talk was with my experience looking at the protein folding problem. In a nutshell, protein folding is one of the “grand challenges” of computer science today: the basic mission is to predict the 3-D structure of a protein given its amino acid sequence. In my opinion, a solution would be one important part of the “uber-tool” for nanotechnology engineers — a tool that could create a catalyst for an arbitrary substrate (another application of protein folding is to elucidate the structure of proteins that cannot be crystallized and are thus unsuitable for X-ray diffraction analysis).

Protein folding is hard. I mean, really hard. It’s one of the few computational problems that truly scare me. There are whole supercomputer projects devoted to the subject, from DE Shaw’s ambitious project to IBM’s Blue Gene series of machines, to Stanford’s Folding at Home distributed computing project. My facts are a couple of years out of date, but IIRC a typical goal for such a big project would be to fold one smallish protein of about 50 to 100 amino acids in about a month — a reaction that happens in a cell on a timescale on the order of milliseconds. And the problem doesn’t scale particularly well. The reasons why protein folding is hard are numerous, and most of them have to do with the enormous dynamic range of timescales required for the simulation, the very sensitive interactions that the numerous hydrophilic and hydrophobic amino acids have with the surrounding water, and the sheer number of particles involved. The simplifying assumptions made in even the most sophisticated simulations today are crude compared to the actual conditions in the cell. The way a protein folds depends upon the rate of sequence output, the temperature, pH conditions, the presence of helper molecules, coordinating ions, and even post-folding sequence modifications — all things that challenge current computational models.

To illustrate the point, even the iconic double-helix of DNA is a direct result of its interaction with its surroundings. The double helix arises from the fact that the base pairs are “greasy” (hydrophobic) and they repel water, so they stick together…thus, a structure that might otherwise look like a straight ladder collapses in on itself to minimize the distance between the rungs, squeezing out the water, and in the process twisting the backbone into a double helix; the process also requires coordinating ions from the water to neutralize the concentration of charges brought on by the collapse into the double-helix. Before I learned about this I just took the twisting of DNA for granted…shows how little I know about the real mechanics of biochemistry, but boy, is it fascinating.

Christina’s talk on RNA got me thinking… RNA is nice: it can function single-stranded, and it is very pliable. It has only four bases, compared to the twenty basic amino acids found in proteins. The secondary structure of an RNA molecule is also relatively predictable. And RNA can be active on a variety of substrates. Granted, RNA may not be as effective, efficient, or versatile as its more complex protein counterparts, but I can’t help wondering if a good baby step would be to first try to solve the RNA folding problem. It’s only a hunch right now, but it feels like RNA might be an easier beast to tame than proteins. And as a molecular tinkerer, I’d rather have a tool that creates less-than-optimal results but is available sooner, iterates faster, and costs less, instead of a tool that gives ultimate results but comes at enormous cost and effort. There are a lot of simple molecular problems that need solutions today, and perhaps from these lessons we can eventually develop smarter tools for the more complex problems.
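To give a flavor of why RNA structure feels more tractable: the classic Nussinov algorithm predicts a (crude) secondary structure in O(n^3) time with simple dynamic programming, something that has no protein-folding equivalent. A minimal sketch (it maximizes base-pair count; real tools like mfold or ViennaRNA minimize free energy instead):

def can_pair(a, b):
    # Watson-Crick pairs plus the G-U wobble pair
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})

def nussinov(seq, min_loop=3):
    n = len(seq)
    dp = [[0] * n for _ in range(n)]          # dp[i][j]: max pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]               # case 1: j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))   # 3 pairs: a tiny hairpin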

Ah, if only I had the time and the money… too many interesting things to do! I wonder, had I become a professor instead of a professional, whether I would have had the privilege to investigate such interesting diversions, or whether I would simply be consumed by the tenure clock…