Monday, April 18, 2011

DNA sequencing

This project is a bit of a departure from my previous work in that, *gasp*, it might actually be somewhat useful. What’s DNA, you ask? The source code of life =)

Now, let’s say we happen to wander across a DNA sequence one day. “Hey there, Mr. ACTTTCGACA! What are you used for?” A DNA sequence by itself isn’t very useful, but what if we could find similar sequences in other animals (or even other places in our own genome!)? Maybe we could get clues to its function. Or maybe even figure out which DNA sequences might have evolved from other sequences (Go Darwin!). This sounds like a job for . . . COMPUTATIONAL GENOMICS!!!

For the last decade or so, scientists have been sequencing the DNA of everything they can get their hands on. Proteins, chromosomes, even the entire human genome! There are huge databases full of A’s, G’s, C’s and T’s out there, just waiting to be searched. The common way of searching is to take our “query” sequence and figure out where it lines up best with every other sequence in the database. Most of these will be junk, but every once in a while we might find an almost-identical gem of a sequence hidden somewhere in the database. These matches can give us clues to both the query’s role and possibly its evolutionary history. One problem, though - for the most accurate search methods, these searches can be *slow.* Really slow. For those computer scientists out there, we’re talkin’ O(n^2) slow.
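
To give a feel for where that O(n^2) comes from, here is a minimal sketch of one accurate-but-slow way to "line up" two sequences. I'm assuming a Smith-Waterman style local alignment here; the post above doesn't say which algorithm it has in mind. The function name and the scoring values (match, mismatch, gap) are just illustrative picks. The point is the pair of nested loops: every letter of the query gets compared against every letter of the target.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Assumed technique: Smith-Waterman with a simple linear gap penalty.
    The scoring values are illustrative defaults, not taken from the post.
    """
    # prev holds the previous row of the scoring table; row 0 is all zeros.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:                      # one row per query letter ...
        curr = [0]
        for j, t in enumerate(target, start=1):  # ... one cell per target letter
            diag = prev[j - 1] + (match if q == t else mismatch)  # align q with t
            up = prev[j] + gap           # gap in the target
            left = curr[j - 1] + gap     # gap in the query
            cell = max(0, diag, up, left)  # 0 lets a local alignment restart
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Our friend Mr. ACTTTCGACA, hiding inside a longer sequence.
    print(local_alignment_score("ACTTTCGACA", "GGGACTTTCGACAGGG"))
```

A database search along these lines would call this once per database sequence and keep the highest scorers, which is why the total work piles up so quickly.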