Rat Genome Database

RGD
Content
Description: The Rat Genome Database
Organisms: Rattus norvegicus (rat)
Contact
Research center: Medical College of Wisconsin & Biomedical Engineering
Laboratory: Anne E. Kwitek
Authors: Ysabel Chen & the RGD Team
Primary citation: PMID 31713623
Access
Website: rgd.mcw.edu
Download URL: RGD Data Release

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. [1] [2] RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

RGD began as a collaborative effort between research institutions involved in rat genetic and genomic research. Its goal, as stated in the National Institutes of Health’s Request for Grant Application HL-99-013, is the establishment of a Rat Genome Database to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and to make this data widely available to the scientific community. A secondary but critical goal is to provide curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.

The rat continues to be extensively used by researchers as a model organism for investigating pharmacology, toxicology, general physiology and the biology and pathophysiology of disease. [3] In recent years, there has been a rapid increase in rat genetic and genomic data. The Rat Genome Database has also become a central point for information on the rat in research and now covers not just genetics and genomics but also physiology and molecular biology. Tools and data pages curated by RGD staff are available for all of these fields. [4]

Data

RGD’s data consists of manual annotations from RGD researchers as well as annotations imported from a variety of sources. RGD also exports its own annotations to share with others.
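As a rough illustration of how manual and imported annotations might be combined and then exported, the following Python sketch models an annotation as a small record. The field names, the rule that a manual annotation takes precedence over an imported duplicate, and the tab-separated export layout are all assumptions made for this example; they are not taken from RGD's actual schema or file formats.

```python
from dataclasses import dataclass
import csv
import io


@dataclass(frozen=True)
class Annotation:
    """One ontology annotation (all field names are hypothetical)."""
    object_id: str    # identifier of the annotated gene, QTL or strain
    object_type: str  # "gene", "qtl" or "strain"
    term_id: str      # ontology term accession
    source: str       # "manual" for in-house curation, else the import source


def merge_annotations(manual, imported):
    """Combine manual and imported annotations.

    Assumption for this sketch: an imported annotation is dropped when
    the same (object, term) pair has already been seen.
    """
    seen = {(a.object_id, a.term_id) for a in manual}
    merged = list(manual)
    for a in imported:
        key = (a.object_id, a.term_id)
        if key not in seen:
            seen.add(key)
            merged.append(a)
    return merged


def export_tsv(annotations):
    """Serialize annotations as tab-separated text for sharing."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["object_id", "object_type", "term_id", "source"])
    for a in annotations:
        writer.writerow([a.object_id, a.object_type, a.term_id, a.source])
    return buf.getvalue()
```

Calling `export_tsv(merge_annotations(manual, imported))` on two lists of `Annotation` records yields one header row followed by one row per distinct annotation.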

RGD's Data page [5] lists eight types of data stored in the database: Genes, QTLs, Markers, Maps, Strains, Ontologies, Sequences and References. Of these, six are actively used and regularly updated. The RGD Maps datatype refers to legacy genetic and radiation hybrid maps. This data has been largely supplanted by the rat whole genome sequence. The Sequences data type is not a full list of either genomic, transcript or protein sequences, but rather mostly contains PCR primer sequences which define simple sequence length polymorphism (SSLP) and expressed sequence tag (EST) Markers. Such sequences are useful primarily for researchers still using these markers for genotyping their animals and for distinguishing between markers of the same name. The six major data types in RGD are as follows:

Genome tools

RGD's Genome tools [9] include both software tools developed at RGD and tools from third party sources.

Genome tools developed at RGD

RGD develops web-based tools designed to use the data stored in the RGD database for analyses in rat and across species. These include:

Third party genome tools adapted for use with RGD data

RGD offers several third party software tools that have been adapted for use on the website utilizing data stored in the RGD database. These include:

Additional Data and Tools

Phenotypes and Models Portal

RGD's Phenotypes and Models portal [13] focuses on strains, phenotypes and the rat as a model organism for physiology and disease.

Common Name (Scientific Name), followed by what the species is a model for:

Chinchilla (Chinchilla lanigera)
  • The physiology, development, and function of the auditory system
  • The pathobiology of infections and development of vaccines
Thirteen-lined Ground Squirrel (Ictidomys tridecemlineatus)
  • Retinal function, metabolism, hypoxia/reperfusion, and longevity
Domestic Dog (Canis lupus familiaris)
  • Cancer, heart disease, autoimmune disorders, allergies, thyroid disease, cataracts, epilepsy, hip dysplasia, blindness, deafness, and more
Bonobo (Pan paniscus)
  • Cardiovascular disorders
Pig (Sus scrofa)
  • Renal function, vascular structure, obesity, cardiovascular disease, endocrinology, alcoholism, diabetes, nephropathy, and organ transplantation
Green Monkey (Chlorocebus sabaeus)
  • Neurodegeneration, diabetes and other metabolic syndromes
  • HIV transmission and AIDS
Naked Mole Rat (Heterocephalus glaber)
  • Biogerontology, aging and cancer resistance

Disease Portals

Disease Portals consolidate the data in RGD for a specific disease category and present it in a single group of pages. Genes, QTLs and strains annotated to any disease in the category are listed, with genome-wide views of their locations in rat, human and mouse (see Genome Viewer in Genome tools developed at RGD). Additional sections of the portal display data for phenotypes, biological processes and pathways related to the disease category. Further pages give users access to information about rat strains used as models for one or more diseases in the category, tools that can be used to analyze the data, and additional resources related to the disease category. RGD's Multi-Ontology Enrichment Tool (MOET) can also be accessed from the bottom of each individual disease portal.
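The consolidation step described above, listing every gene, QTL and strain annotated to any disease in a category, can be sketched in Python as a walk over the disease ontology followed by a filter over the annotations. This is only an illustrative sketch: the function names, the parent-to-children dictionary and the tuple layout of the annotations are hypothetical, and nothing here reflects how RGD implements its portals.

```python
from collections import defaultdict


def descendants(term, children):
    """Return `term` plus every term below it in the ontology.

    `children` maps a term to its direct child terms (hypothetical layout).
    """
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited; guards against shared sub-terms
        found.add(current)
        stack.extend(children.get(current, ()))
    return found


def portal_listing(category, children, annotations):
    """Group annotated objects by type for one disease category.

    `annotations` is an iterable of (object_id, object_type, term) tuples.
    """
    terms = descendants(category, children)
    listing = defaultdict(set)
    for object_id, object_type, term in annotations:
        if term in terms:
            listing[object_type].add(object_id)
    return {kind: sorted(ids) for kind, ids in listing.items()}
```

With a toy ontology in which "heart disease" is a child of "cardiovascular disease", an object annotated to "heart disease" would appear in the listing for the "cardiovascular disease" category, while objects annotated only to unrelated terms would be left out.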

As of May 2021, RGD has fifteen disease portals: [16] [17]

Pathways

RGD's Pathway resources [18] [19] include a Pathway Ontology [20] of pathway terms (developed and maintained at RGD, encompassing not only metabolic pathways but also disease, drug, regulatory and signaling pathways), as well as interactive diagrams of the components and interactions of selected pathways. Each diagram page includes a description, lists of pathway gene members and additional elements, tables of disease, pathway and phenotype annotations made to pathway member genes, associated references and an ontology path diagram. RGD also presents Pathway Suites and Suite Networks, i.e. groupings of related pathways that all contribute to a larger process such as glucose homeostasis or gene expression regulation, as well as Physiological Pathway diagrams, which display networks of organs, tissues, cells and molecular pathways at the whole-animal or systems level.

Knockouts

Until recently, direct, specific genomic manipulations in the rat were not possible. However, with the rise of technologies such as zinc finger nuclease- and CRISPR-based mutagenesis techniques, that is no longer the case. [21] Groups producing rat gene knockouts and other types of genetically modified rats include the Human and Molecular Genetics Center at MCW. RGD links to information about the rat strains produced in these studies via pages about the PhysGen Knockout project [22] and the MCW Gene Editing Rat Resource Center (GERRC), [23] accessed from RGD page headers. Funding for both the PhysGenKO project and the GERRC came from the National Heart, Lung, and Blood Institute (NHLBI). The stated goal of both projects is to produce rats with alterations in one or more specific genes related to the mission of the NHLBI. Genes were nominated by rat researchers, and nominations were adjudicated by an External Advisory Board. In the case of the PhysGenKO project, many of the rats produced by the group were phenotyped using a standardized high-throughput phenotyping protocol, and the data is available in RGD's PhenoMiner tool.

Community outreach and education

RGD reaches out to the rat research community in a variety of ways including an email forum, a news page, a Facebook page, a Twitter account, and regular attendance and presentations at scientific meetings and conferences. [24] Additional educational activities include the production of tutorial videos, both outlining how to use RGD tools and data, and on more general topics such as biomedical ontologies and biological (i.e. gene, QTL and strain) nomenclature. These videos can be viewed on several online video hosting sites including YouTube.

Funding

RGD is funded by grant R01HL64541 from the National Heart, Lung, and Blood Institute (NHLBI) on behalf of the National Institutes of Health (NIH). The principal investigator of the grant is Anne E. Kwitek, PhD, who took over this leadership position from Mary E. Shimoyama, PhD in March 2020. Melinda R. Dwinell, PhD is Co-Investigator. [25]

New Genome Assembly

The new rat genome assembly, mRatBN7.2, was generated by the Darwin Tree of Life Project at the Wellcome Sanger Institute and has been accepted into the Genome Reference Consortium. mRatBN7.2 was derived from a male BN/NHsdMcwi rat that is a direct descendant of the female BN rat previously sequenced. The new BN rat reference genome was created using a variety of technologies including PacBio long reads, 10X linked reads, Bionano maps and Arima Hi-C. Its contiguity is similar to that of the human and mouse reference assemblies. It is available at NCBI’s GenBank and at RefSeq, and it will be made the primary assembly at RGD in the near future. [26]

References

  1. Smith, Jennifer R.; Hayman, G. Thomas; Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hoffman, Matthew J.; Kaldunski, Mary L.; Tutaj, Monika; Thota, Jyothi; Nalabolu, Harika S.; Ellanki, Santoshi L. R.; Tutaj, Marek A. (2020-01-08). "The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform". Nucleic Acids Research. 48 (D1): D731–D742. doi:10.1093/nar/gkz1041. ISSN 1362-4962. PMC 7145519. PMID 31713623.
  2. Shimoyama M, De Pons J, Hayman GT, et al. (2015). "The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease". Nucleic Acids Res. 43 (Database issue): D743–50. doi:10.1093/nar/gku1026. PMC 4383884. PMID 25355511.
  3. Aitman TJ, Critser JK, Cuppen E, et al. (2008). "Progress and prospects in rat genetics: a community view". Nat. Genet. 40 (5): 516–22. doi:10.1038/ng.147. PMID 18443588. S2CID 22522876.
  4. RGD. "About RGD - Rat Genome Database". Rgd.mcw.edu. Retrieved 2013-02-17.
  5. "RGD Data - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
  6. Laulederkind SJ, Shimoyama M, Hayman GT, et al. (2011). "The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data". Database (Oxford). 2011: bar002. doi:10.1093/database/bar002. PMC 3041158. PMID 21321022.
  7. Shimoyama M, Hayman GT, Laulederkind SJ, et al. (2009). "The rat genome database curators: who, what, where, why". PLOS Comput. Biol. 5 (11): e1000582. Bibcode:2009PLSCB...5E0582S. doi:10.1371/journal.pcbi.1000582. PMC 2775909. PMID 19956751.
  8. RGD. "About RGD Ontologies - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
  9. RGD. "Genome Tools - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
  10. de la Cruz N, Bromberg S, Pasko D, et al. (2005). "The Rat Genome Database (RGD): developments towards a phenome database". Nucleic Acids Res. 33 (Database issue): D485–91. doi:10.1093/nar/gki050. PMC 540004. PMID 15608243.
  11. Smith RN, Aleksic J, Butano D, et al. (2012). "InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data". Bioinformatics. 28 (23): 3163–5. doi:10.1093/bioinformatics/bts577. PMC 3516146. PMID 23023984.
  12. Rachel L, Julie S, Daniela B, et al. (2015). "Cross-organism analysis using InterMine". Genesis. 53 (8): 547–60. doi:10.1002/dvg.22869. PMC 4545681. PMID 26097192.
  13. RGD. "Phenotypes & Models - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
  14. Laulederkind SJ, Liu W, Smith JR, et al. (2013). "PhenoMiner: quantitative phenotype curation at the rat genome database". Database (Oxford). 2013: bat015. doi:10.1093/database/bat015. PMC 3630803. PMID 23603846.
  15. RGD. "Phenotype Data - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-14.
  16. RGD. "RGD Disease Portals - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
  17. Twigger SN, Shimoyama M, Bromberg S, Kwitek AE, Jacob HJ (2007). "The Rat Genome Database, update 2007--easing the path from disease to data and back again". Nucleic Acids Res. 35 (Database issue): D658–62. doi:10.1093/nar/gkl988. PMC 1761441. PMID 17151068.
  18. Petri V, Shimoyama M, Hayman GT, et al. (2011). "The Rat Genome Database pathway portal". Database (Oxford). 2011: bar010. doi:10.1093/database/bar010. PMC 3072770. PMID 21478484.
  19. Hayman GT, Jayaraman P, Petri V, et al. (2013). "The updated RGD Pathway Portal utilizes increased curation efficiency and provides expanded pathway information". Hum. Genomics. 7: 4. doi:10.1186/1479-7364-7-4. PMC 3598722. PMID 23379628.
  20. Petri V, Jayaraman P, Tutaj M, et al. (2014). "The pathway ontology - updates and applications". J Biomed Semantics. 5 (1): 7. doi:10.1186/2041-1480-5-7. PMC 3922094. PMID 24499703.
  21. Flister MJ, Prokop JW, Lazar J, Shimoyama M, Dwinell M, Geurts A (2015). "2015 Guidelines for Establishing Genetically Modified Rat Models for Cardiovascular Research". J Cardiovasc Transl Res. 8 (4): 269–77. doi:10.1007/s12265-015-9626-4. PMC 4475456. PMID 25920443.
  22. RGD. "PhysGen Knockouts - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
  23. RGD. "Gene Editing Rat Resource Center". Rgd.mcw.edu. Retrieved 2015-07-21.
  24. RGD. "Rat Community - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
  25. NIH. "Project Information for grant HL64541: Rat Genome Database". nih.gov. Retrieved 2015-07-15.
  26. NIH. "Rnor_6.0 - Assembly - NCBI". nih.gov. Retrieved 2015-07-15.