Studies suggest the gene discovery efforts of the Human Genome Project were just the beginning, and the research consortium aims to encourage the scientific community to integrate the data into the major human genome databases.
FREMONT, CA: Thousands of open reading frames (ORFs), many of them very small, have been discovered in the human genome in recent years. These are DNA sequence segments that may contain protein-building instructions. Several of the current study's authors have previously discovered ORFs and described them in scientific journals. Nonetheless, none of these virtually unexplored segments was later included in reference databases. Other sequences have been reported in journals such as Science and Nature Chemical Biology, but they have remained largely out of reach for most members of the scientific community, despite evidence that they produce RNA molecules that bind to ribosomes, the cell's protein factories.
Protein-coding regions in genes have traditionally been identified by comparing DNA sequences from different species, on the basis that the most important coding regions have been preserved throughout animal evolution. This method has a drawback: relatively young coding regions, that is, those that arose during the evolution of primates, fall through the cracks and are thus missing from the databases. The task now is to incorporate the largely ignored ORFs into the largest reference databases, because until now researchers who wanted to study them had to search the literature for them specifically.
The international research team began by gathering data on sequences discovered through ribosome profiling, a technique that determines which part of the messenger RNA (mRNA) the ribosome interacts with. They then compiled the information into a standardised catalogue. This was no easy task, because data obtained in a variety of ways from various laboratories cannot simply be combined. The consortium also worked on fundamental questions that define our understanding of the human genome: What exactly is a gene? What exactly is a protein? Are more flexible ideas needed about whether ribosomes always produce a protein or sometimes some other type of cellular output? The group is now calling for the revision of human genome databases used by scientists worldwide. Ensembl-GENCODE is integrating this ORF catalogue into its reference annotation database. Many others, including UniProt, HGNC, PeptideAtlas, and HUPO, will support the approach.
This research marks a huge step forward in understanding the genetic make-up of humans and the total number of human proteins. The consortium can now provide this new catalogue to the research community. It is too early to say whether all of the unexplored sections of DNA truly represent proteins, but it is clear that something unexplored is happening across the human genome and that the world should pay attention. The scientific community has been largely unaware of these ORFs. This is the point at which they will enter the mainstream of genomic and medical science, an effort that is expected to have far-reaching consequences. It is especially remarkable that the majority of these 7,200 ORFs are unique to primates and may represent evolutionary innovations unique to the species. This demonstrates how these elements can provide important clues about what makes us human.