SGJlab :: Research
RNA genes
The long-term aim of my research is to understand the complement of the genome that codes for functional RNA molecules rather than translated proteins. Some classes of these non-protein-coding RNA (ncRNA) genes are well known, for example ribosomal RNAs, transfer RNAs and spliceosomal RNAs. Until recently, ncRNA genes were essentially ignored by genome annotation projects, partly because many ncRNA genes conserve a base-paired secondary structure without significant sequence conservation, which makes computational identification of such sequences extremely difficult. However, comparative genomics and complementary experimental studies suggest that the number of ncRNAs in the eukaryotic genome may far outstrip previous expectations. Indeed, large and important classes of RNAs in eukaryotes have been discovered remarkably recently, including microRNAs (in 2001) and piwi-associated RNAs (in 2006).
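As a minimal illustration of why sequence similarity alone can miss ncRNA homologues, the Python sketch below checks two toy sequences against a single hairpin structure written in dot-bracket notation. The sequences, the structure and the function names are invented for illustration and are not taken from Rfam or from any real RNA family. Both sequences can form every base pair in the structure, yet their stem regions share no identical positions.

```python
# Toy example: two sequences that fit the same hairpin but differ at every stem position.
# All sequences and names here are hypothetical.

# Watson-Crick pairs plus the G-U wobble pair
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return (i, j) index pairs for each matched '(' and ')' in the structure."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def is_compatible(seq, structure):
    """True if every paired position in the structure can form an allowed base pair."""
    return all((seq[i], seq[j]) in ALLOWED_PAIRS
               for i, j in pairs_from_dot_bracket(structure))


def stem_identity(seq_a, seq_b, structure):
    """Fraction of paired (stem) positions at which the two sequences are identical."""
    stem_positions = [k for pair in pairs_from_dot_bracket(structure) for k in pair]
    same = sum(seq_a[k] == seq_b[k] for k in stem_positions)
    return same / len(stem_positions)


structure = "((((....))))"
seq_a = "GGCAGAAAUGCC"
seq_b = "CAUGGAAACAUG"

print(is_compatible(seq_a, structure))
print(is_compatible(seq_b, structure))
print(stem_identity(seq_a, seq_b, structure))
```

Compensatory changes of this kind (a G-C pair replaced by an A-U pair, for instance) preserve the fold while erasing the sequence signal, which is why methods that model structure as well as sequence are needed.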
Much of my work revolves around RNA database resources, and around providing methods and models for computational detection of ncRNA homologues. I co-founded and led the Rfam database of non-coding RNA families, and continue to collaborate in its development. I am also responsible for curating the nomenclature and classification of microRNA genes, and I run the miRBase database. Papers describing each of these resources have over 200 citations (Griffiths-Jones 2004, Griffiths-Jones et al. 2003, 2004, 2005, 2006, 2008).
This solid database groundwork means that the time is ripe for a wide range of studies of ncRNA genes, covering their structure, function and evolution. I use the best available computational techniques and databases to address fundamental questions such as:
- How many ncRNA genes are present in the eukaryotic genome?
- What are their structures and functions?
- How do ncRNA genes evolve?
We have recently analysed the genomic features that surround microRNAs in order to describe their primary transcripts in animals (Saini et al. 2007). We previously determined that over half of all mammalian microRNAs are expressed from introns of protein-coding and non-coding genes (Rodriguez et al. 2005).
The Rfam database provides a model for the computational identification of ncRNA homologues in complete genomes. Pushing these tools to their limits allows us to extend the taxonomic ranges of known ncRNA families and to shed light on novel biology. For example, we recently identified homologues of the selenocysteine insertion machinery in Apicomplexa (Mourier et al. 2004), in which selenocysteine incorporation was previously unknown. I also combine the large-scale use of computational tools, comparative genomics, and manual alignment and annotation to discover novel families of ncRNA genes and to investigate novel RNA function.
Sam Griffiths-Jones