Earlier methods for deciding what is advertising and what is story depended on statistics. "To continue the magazine analogy, it is something like going through back issues of the magazine and finding that human-gene 'stories' are less likely to contain phrases such as 'For Sale,' telephone numbers, and the dollar sign," Gelfand explained.
While better than random reconstruction, these statistical methods are inaccurate at best.
The method developed by Pevzner and his colleagues zeros in on the proper pages that are potentially part of the "story" -- all pages that seem to have sequences that are part of the message.
The accuracy of the new method is always good, the scientists reported, and often remarkable -- 99 or 100 percent.
The Proceedings paper contains a listing of trials of the method on nearly 100 different genes, 47 of them from mammals (mostly mice) and 45 from other organisms, including bacteria. For mammals, 40 of 47 reconstructions were perfect -- 100 percent accurate. In six of the seven remaining cases, the method came close, with predictions that were 94 to 97 percent accurate.
Even in the lone case in which the method seemed to fall down -- predicting with 75 percent accuracy on the basis of mouse data -- the failure was interesting. In this case, chicken data for the same gene were also available to use for predictions, and the prediction of the human gene from the chicken data was 100 percent accurate. "This is surprising, given that we think of humans as more closely related to mice than to chickens," Pevzner noted.
Even when the starting poi
Contact: Eric Mankin
mankin@usc.edu
213-740-9344
University of Southern California
20-Aug-1996