To help understand the process followed by Yuan and the other two teams, think of the genome as a copy of James Joyce's lengthy novel Ulysses. Each chromosome would be a chapter, each gene a sentence.
The draft version of the genome's DNA sequences that was assembled by scientists at the Human Genome Project would then resemble a copy of Ulysses that lacked all punctuation and spacing. Each of this book's chapters would consist of one long string of letters.
To identify the sentences in that long continuous string, scientists would turn to databases of complete or partial sentences assembled by other researchers. The scientists would then use computers to match the fragments from the databases to the string of letters in each and every chapter of the novel.
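For readers who think in code, the matching step in this analogy can be sketched in a few lines of Python. This is only an illustration of the analogy, not the method used by any of the three teams: the function names, the sample sentence, and the tiny "database" are all invented here, and the sketch looks only for exact matches, whereas real sequence comparison has to tolerate mismatches and gaps.

```python
import re

def normalize(text):
    """Strip spacing and punctuation, leaving one lowercase string of letters."""
    return re.sub(r"[^a-z]", "", text.lower())

def locate_fragments(chapter, fragments):
    """Map each fragment found in the chapter to the offsets where it starts.

    `chapter` is assumed to be already normalized (letters only).
    Fragments with no exact match are left out of the result.
    """
    hits = {}
    for fragment in fragments:
        key = normalize(fragment)
        if not key:
            continue  # nothing left to search for
        starts = []
        pos = chapter.find(key)
        while pos != -1:
            starts.append(pos)
            pos = chapter.find(key, pos + 1)
        if starts:
            hits[fragment] = starts
    return hits

# Hypothetical "chapter" with punctuation and spacing removed.
chapter = normalize(
    "Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather."
)
# Hypothetical fragment database; the last entry is not in the chapter.
database = ["Buck Mulligan came", "a bowl of lather", "yes I said yes"]

hits = locate_fragments(chapter, database)
for fragment, starts in hits.items():
    print(fragment, starts)
```

Each additional database supplies more fragments to try against the same string, which is why drawing on more databases can turn up more candidate sentences.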
The genome maps published in Science and, particularly, in Nature relied mainly on just two databases to identify genes. The Ohio State researchers used those two databases plus 11 others.
For example, the Ohio State researchers used a rodent gene database, which provided evidence for 1,437 possible genes in the human genome.
"We used more experimental evidence in assembling our map, and that suggests that there are probably between 65,000 and 75,000 transcriptional units," said Yuan.