For instance, multiple research groups have shown that microarray data can identify previously unappreciated molecular subtypes of lung cancer that differ in their prognoses. Unfortunately, results have been poorly reproducible across studies.
Furthermore, there is now a tremendous volume of data, particularly from human clinical specimens, which cannot be duplicated, so strategies to improve the analysis of (that is, to "clean up") existing data sets are needed. One limitation of microarray technology may stem from the failure of similar studies to measure identical biological parameters. In other words, the problem could arise because many microarray probes (there are now up to hundreds of thousands on a single slide) are based on gene sequences that are five or more years old.
Frustrated by more than two years of trying to analyze microarray data contrasting two known conditions, researchers at Harvard Medical School and Washington University in St. Louis decided to examine the nucleotide sequences used to measure gene expression on the most widely used commercial microarray technology. They found that in many cases the sequences did not match the most current information.
In this study, they undertook a global analysis of the microarrays and systematically attempted to confirm the accuracy of individual probe sequences. They checked every probe on the array to see whether it corresponded to the gene it was intended to measure. They found that a substantial percentage of the probe sequences -- sometimes as many as 20%, on both
Contact: Mayer Resnick
American Physiological Society