Regulatory sequences are emerging as an important part of the non-gene majority of human genetic material, once thought of as "junk DNA." A new frontier in genetic research is defining the regulome, the complete set of DNA sequences that regulate the behavior of genes. DNA segments that code for proteins average 200 base pairs in length, whereas regulatory sequences typically include just six to 10 base pairs, making them hard to find.
As a human embryo develops from a single cell into tens of billions of cells, DNA must be read and copied again and again to supply each cell with its own copy. Over time, random changes, or mutations, are introduced into the code during the copying process. Some mutations bring survival advantages and others cause disease. Most genetic diseases identified to date result from a mutation within a gene that directs protein construction, but that may soon change.
"We believe more and more disease-causing mutations will be found within regulatory sequences that control genes turning on or off," Miano said. "We therefore are very interested in defining as many functional regulatory elements as we can to help geneticists pinpoint a growing number of disease-causing mutations."
In Miano's study, the regulatory sequence under examination was the CArG box. Each nucleotide building block of a DNA chain contains one of four nucleobases: adenine (A), thymine (T), guanine (G) and cytosine (C). Any sequence that starts with two Cs, continues with any combination of six As or Ts, and ends with two Gs is a CArG box.
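The CArG box definition above maps directly onto a short pattern search. The following is a minimal sketch in Python, not the method used in Miano's study; the function name `find_carg_boxes` and the example sequence are made up for illustration.

```python
import re

# Pattern taken from the definition above:
# two Cs, then six bases that are each A or T, then two Gs.
CARG_BOX = re.compile(r"CC[AT]{6}GG")


def find_carg_boxes(sequence):
    """Return (start_index, motif) pairs for each CArG box in a DNA string.

    Indices are 0-based. Input is upper-cased so lower-case sequence
    text is also accepted.
    """
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in CARG_BOX.finditer(seq)]


# Hypothetical 30-base sequence containing two CArG boxes.
example = "GATTCCATATATGGACGTCCTTTTAAGGCA"
print(find_carg_boxes(example))
# [(4, 'CCATATATGG'), (18, 'CCTTTTAAGG')]
```

Because each of the six middle positions can be either A or T, this definition covers 2^6 = 64 distinct 10-base sequences, which helps explain why such short elements are hard to tell apart from chance occurrences.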