Now that the age of the genome is upon us, scientists must find a way to spin mountains of DNA code into biological gold. To do it, they are building their own Rumpelstiltskins: powerful computer programs that automatically scrutinize the code and decipher its genetic elements. The April issue of Genome Research reports on a new effort to test the state of the art in computer "genome annotation." In a project organized by a team from the University of California, Berkeley, 12 international groups compared the power of their computer programs to predict gene elements within a 3-million-base-pair stretch of Drosophila DNA.
The groups compared the results of their programs against each other and against the results of an exhaustive experimental and computational effort to locate all the genes in this region; those reference results were not available to the participants during the test.
When the results were in, many programs had detected the genes in the region with 95% accuracy compared to the experimental effort. Furthermore, the programs predicted genes that had not been found in that effort, and researchers are now investigating those predictions. However, the programs were less accurate in defining the exact boundaries of the genes within the code, and groups that attempted to find elements controlling gene activity (e.g., promoters) made a large number of false predictions. The project showed that the state of the art has some way to go, but it also provided standards against which to measure future improvements.
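To see why a program can score well at detecting genes yet poorly at placing their boundaries, consider the rough Python sketch below. It is an illustration only, not the scoring method used in the project: the function names, the representation of a gene as a single (start, end) pair, and all coordinates are invented for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a gene is one (start, end) pair of
# inclusive coordinates. Real annotations are far richer than this.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def detection_rate(reference: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of reference genes touched by at least one prediction."""
    hits = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    return hits / len(reference)


def exact_boundary_rate(reference: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of reference genes whose start and end are both matched exactly."""
    predicted_set = set(predicted)
    hits = sum(r in predicted_set for r in reference)
    return hits / len(reference)


def unmatched_prediction_rate(reference: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of predictions that overlap no reference gene.

    These may be errors, or genes the reference effort missed.
    """
    unmatched = sum(not any(overlaps(p, r) for r in reference) for p in predicted)
    return unmatched / len(predicted)


# Invented toy data, not taken from the Drosophila study.
reference = [(100, 500), (900, 1400), (2000, 2600), (3000, 3300)]
predicted = [(100, 500), (880, 1400), (2000, 2650), (4000, 4200)]

print(detection_rate(reference, predicted))             # 0.75
print(exact_boundary_rate(reference, predicted))        # 0.25
print(unmatched_prediction_rate(reference, predicted))  # 0.25
```

In this toy case the program finds three of the four reference genes but gets the exact boundaries right for only one of them. Its one unmatched prediction could be a mistake or a gene that the reference effort overlooked.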
As published in Genome Research, the project consists of an overview article, eight independent reports from the participants, and two commentaries on the current state of computerized genome annotation.