Modeling DNA Evolution

Suppose we sequence the entire DNA of an organism from the same species once a year for 1,000 years. If we count the frequency of each base (C, G, A, T) in each of these sequences, we will observe that the frequencies are not identical. In some sample sequences C might be more frequent than in others. In other sequences, G might be more prevalent. Evolutionary biologists are interested in studying and, indeed, formally modeling this variation.
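To make the counting step concrete, here is a minimal sketch in Python. The function name `base_frequencies` and the two short sample sequences are made up for illustration; real genomes are far longer and would normally be read from a file.

```python
from collections import Counter

def base_frequencies(sequence):
    """Return the relative frequency of each base (A, C, G, T) in a DNA string."""
    counts = Counter(sequence.upper())
    # Only count the four bases; ignore anything else (e.g. N for unknown).
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: counts[base] / total for base in "ACGT"}

# Hypothetical samples standing in for sequences gathered in different years.
samples = {
    "year_1": "ACGTACGTCC",
    "year_2": "ACGTGGGTAC",
}

for label, seq in samples.items():
    print(label, base_frequencies(seq))
```

Comparing the dictionaries across samples shows the kind of variation described above: C is more frequent in the first toy sample, G in the second.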

Several models have been proposed for DNA evolution. One such model, a Markov model, assumes that the probability of a base being replaced by another base (or staying the same) depends only on what the base currently is. Replacement, in a Markov model, does not depend on how prevalent other bases are, either in the past or in the present. Although the Markov model does not describe the evolution of DNA particularly well, it does a better job than a model that assumes bases are replaced at random.
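The Markov assumption can be sketched as a small simulation. Everything specific here is an assumption for illustration: the `TRANSITION` probabilities are invented rather than estimated from data, and `evolve` is a hypothetical helper that treats each position independently and applies one replacement step per generation.

```python
import random

BASES = "ACGT"

# Hypothetical replacement probabilities (illustrative, not measured).
# TRANSITION[current][new] is the chance that `current` becomes `new`
# in one step. Each row depends only on the current base and sums to 1.
TRANSITION = {
    "A": {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    "C": {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
    "G": {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    "T": {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
}

def evolve(sequence, generations, rng=None):
    """Apply `generations` Markov replacement steps to every base in `sequence`."""
    rng = rng or random.Random()
    seq = list(sequence)
    for _ in range(generations):
        # The next base is drawn using only the current base's row.
        seq = [
            rng.choices(BASES, weights=[TRANSITION[b][n] for n in BASES])[0]
            for b in seq
        ]
    return "".join(seq)

# Example: evolve a short toy sequence for 100 steps with a fixed seed.
print(evolve("ACGTACGTACGT", 100, random.Random(0)))
```

Note that nothing in `evolve` looks at the overall composition of the sequence or at earlier generations; that omission is exactly the Markov assumption described above.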


This is the first post in a series. Subscribe via RSS to learn when new posts appear.