To create the database entries, dozens of trained biologists, most at the Institute of Bioinformatics in India, started with the database Online Mendelian Inheritance in Man (OMIM), the offspring of a paper catalog of disease genes started in 1966 by Victor A. McKusick, M.D., University Professor of Medical Genetics at Hopkins.
Focusing on these genes' proteins, the scientists critically reviewed hundreds of thousands of scientific papers, making connections between papers and resolving inconsistencies -- something automated computer programs cannot do, says Pandey. They also pulled information from smaller, existing databases to complete each protein's entry.
"We believe that manual curation -- lots of scientists poring through the literature -- is the key to building a more accurate and more complete database," says Pandey, who serves as chief scientific adviser to the Institute of Bioinformatics. "Eventually, we hope the database will be managed by the larger community of scientists, because it will be most useful if those who know these proteins best take responsibility for keeping entries up to date and accurate."
The database currently contains everything that's known about proteins involved in diseases, such as those encoded by the so-called breast cancer genes BRCA1 and BRCA2, and proteins in key pathways, such as families of enzymes that modify other proteins. It includes only experimentally proven or widely accepted facts about the proteins, without mixing in computer-generated predictions the way some other databases do, says Pandey.
The online database is also easy to use, in large part because those who designed it are experts in both computer science and biology, he adds. A biologist looking for information about BRCA1, for example, can search by any of its names and get a single entry that contains everything -- its alterna
Contact: Joanna Downer
Johns Hopkins Medical Institutions