Evolution is the ever-present change in the traits of living creatures over generations, as they adapt to changing environments. According to this theory, different populations of a species adapt to different local environments, and this leads to the gradual emergence of different species. This process is known as speciation. Walking back in time, almost all modern species should therefore be able to trace their ancestry to a single primordial ancestor. Traditionally, the job of reconstructing that ancestry is done by paleontologists, who study the history of life on Earth based on the fossil record, and by taxonomists, who maintain and update the resulting hierarchy of extant and extinct species. This hierarchy is also known as the Tree of Life.
Tree Of Life (Courtesy: TreeOfLife)
However, a more modern approach is to find the similarities between the genomes of several modern species, and then try to estimate their positions on the tree. Since a genome runs into millions of base pairs (the basic units of DNA), this pattern matching is not trivial. Add in random mutations, duplications, and inverted sequences, and the matching problem becomes a nightmare that requires extensive computing power :(
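To give a feel for the matching problem, here is a minimal Python sketch that scores how different two DNA strings are using edit distance (the fewest single-base substitutions, insertions, and deletions needed to turn one into the other). This is only an illustration, not the method used by any particular project; the function name `edit_distance` and the toy sequences are made up for the example, and real genomes are millions of times longer.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, base_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if base_a == base_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences, invented purely for illustration.
sequences = {
    "species_a": "GATTACAGGTCA",
    "species_b": "GATTACAGCTCA",   # one substitution relative to species_a
    "species_c": "GATTGGACATCA",   # an inverted stretch in the middle
}

if __name__ == "__main__":
    # Smaller distances suggest closer relatives on the tree.
    for (name1, seq1), (name2, seq2) in combinations(sequences.items(), 2):
        print(name1, name2, edit_distance(seq1, seq2))
```

The work grows with the product of the two sequence lengths, and every pair of species needs its own comparison, which hints at why full genomes for many thousands of species call for a cluster rather than a desktop.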
To that end, a new supercomputing cluster designed for the phylogenetic research community has been installed at the San Diego Supercomputer Center. The cluster has 128 Opteron processors, each with 4 GB of memory, and is funded by a grant from the National Science Foundation in support of the CyberInfrastructure for Phylogenetic Research project, a collaboration of biologists, computer scientists, statisticians and mathematicians at 19 institutions whose goal is to understand the evolutionary relationships among all living organisms.
According to project leader Mark Miller, the goal is to reconstruct the tree of life for 100,000 species or more. In addition to working out the exact relationships between the species of the world, the project will also develop new algorithms and database approaches that should benefit research in data mining, protein decoding, and drug manufacturing :)