Reporting in the journal Nature Genetics (Nature Genetics 36, 664, 01 Jul 2004), the two scientists describe how iHOP, which was developed as part of the EU-funded ORIEL and TEMBLOR projects, converts the 14 million abstracts in the PubMed (National Library of Medicine) bibliographic database into a network of interlinked references to genes, proteins, mutations, diseases and (bio)chemical compounds. By using genes and proteins as hyperlinks between sentences and articles, iHOP makes the information stored in PubMed accessible as a single navigable resource.
The technology behind iHOP combines state-of-the-art components with novel in-house developments. Key features are the organisation of textual and genomic information in a relational database and the use of the latest text-mining technology to detect biomedical entities in natural text. Production of state data is based entirely on XML technology, and the avoidance of complex front-end database queries means that response times are extremely fast.
Connecting biomedical concepts
While conventional keyword searches return long and not always informative lists of abstracts, navigation along this gene-guided network allows a stepwise and controlled exploration of the information space. The iHOP system shows that distant medical and biological concepts can be related by surprisingly few intermediate genes: the shortest path between any two genes involves, on average, only four steps.
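The idea of a shortest path between two genes can be illustrated with a small sketch. It is not the iHOP implementation: it assumes, for illustration only, that two genes are linked whenever they are mentioned in the same sentence, and it uses a plain breadth-first search to count the steps between them. The gene names, the sample data and the function names build_gene_network and shortest_path_length are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy data (invented for illustration): each set holds the gene names
# assumed to have been detected in one sentence of an abstract.
SENTENCE_GENES = [
    {"GENE_A", "GENE_B"},
    {"GENE_B", "GENE_C"},
    {"GENE_C", "GENE_D"},
    {"GENE_A", "GENE_E"},
]


def build_gene_network(sentence_genes):
    """Link every pair of genes that share a sentence (an assumption)."""
    graph = defaultdict(set)
    for genes in sentence_genes:
        for a, b in combinations(sorted(genes), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Return the number of steps from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        # .get avoids creating empty entries for unknown genes
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


network = build_gene_network(SENTENCE_GENES)
print(shortest_path_length(network, "GENE_E", "GENE_D"))  # prints 4
```

In this toy network, GENE_E reaches GENE_D through GENE_A, GENE_B and GENE_C, which gives four steps. Averaging this measure over all gene pairs in a real network would yield the kind of figure quoted above.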
Hoffmann and Valencia expect this highly connected network to trigger a revolution in new text-mining tools that will bring biomedicine within
Contact: Les Grivell
European Molecular Biology Laboratory