Generic Model Organism Database

Generic Model Organism Database (GMOD)
Operating system	Windows, Mac OS X
Type	Bioinformatics
License	GPL v3
Website	gmod.org/wiki/Main_Page

Last updated April 27, 2024

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

History

The GMOD project was started in the early 2000s as a collaboration between several model organism databases (MODs) that shared a need to create similar software tools for processing data from sequencing projects. MODs, or organism-specific databases, describe genome and other information about important experimental organisms in the life sciences and capture the large volumes of data and information being generated by modern biology. Rather than each group designing its own software, four major MODs--FlyBase, Saccharomyces Genome Database, Mouse Genome Database, and [...] at or run off a Chado schema database.

Chado database schema

The Chado^[1] schema aims to cover many of the classes of data frequently used by modern biologists, from genetic data to phylogenetic trees to publications to organisms to microarray data to IDs to RNA/protein expression. Chado makes extensive use of controlled vocabularies to type all entities in the database. For example, genes, transcripts, exons, transposable elements, and similar entities are all stored in a single feature table, with the type of each row provided by the Sequence Ontology. When a new type is added to the Sequence Ontology, the structure of the feature table requires no modification; only the data in the database needs to be updated. The same is largely true of analysis data, which can also be stored in Chado.
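
The typing approach described above can be illustrated with a minimal sketch. It assumes a heavily simplified two-table layout in an in-memory SQLite database: the table and column names (feature, cvterm, type_id) follow Chado's naming, but this is not the actual Chado DDL, and the feature identifiers and helper functions are hypothetical. It shows that storing a feature of a previously unseen type only adds rows and never changes the table structure.

```python
import sqlite3

# Simplified stand-in for two Chado tables; the real schema has many more
# columns and constraints. This layout is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")


def add_feature(uniquename, type_name):
    """Store a feature, registering its type term on first use."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (type_name,))
    (type_id,) = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, type_id),
    )


def features_of_type(type_name):
    """Return the uniquenames of all features typed with the given term."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON t.cvterm_id = f.type_id "
        "WHERE t.name = ? ORDER BY f.uniquename",
        (type_name,),
    )
    return [row[0] for row in rows]


# Hypothetical identifiers; the type names are Sequence Ontology-style terms.
add_feature("gene0001", "gene")
add_feature("exon0001", "exon")
add_feature("te0001", "transposable_element")

# A new type needs only new rows: no ALTER TABLE, no new table.
add_feature("pg0001", "pseudogene")
```

In this sketch, calling features_of_type("pseudogene") returns the newly added feature even though the feature table was defined before that type existed.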

The existing core modules of Chado are:

sequence - for sequences/features
cv - for controlled-vocabs/ontologies
general - currently just dbxrefs
organism - taxonomic data
pub - publication and references
companalysis - augments sequence module with computational analysis data
map - non-sequence maps
genetic - genetic and phenotypic data
expression - gene expression
natural diversity - population data

Software

The full list of GMOD software components is found on the GMOD Components page.^[2] These components include:

GMOD Core (Chado database and tools)
- Chado: the Chado schema and tools to install it.^[3]
- XORT: a tool for loading and dumping chado-xml^[4]
- GMODTools: extracts data from a Chado database into common genome bulk formats (GFF, Fasta, etc.)^[5]
MOD website
- Tripal: a web front end based on Drupal.^[6]
Genome Editing and Visualization
- Apollo: a Java application for viewing and editing genome annotations^[7]^[8]
- GBrowse: a CGI application for displaying genome annotations^[9]^[10]
- JBrowse: a JavaScript application for displaying genome annotations^[11]
- Pathway Tools: a genome browser with a comparative mode
Comparative Genomics
- GBrowse_syn: a GBrowse based synteny viewer^[12]
- CMap: a CGI application for displaying comparative maps^[13]
Literature curation
- Textpresso: a text mining system for scientific literature^[14]
Database querying tools
- BioMart: a query-oriented data management system
- InterMine: open source data warehouse system
Biological Pathways
- Pathway Tools: tools for metabolic pathway information, and analysis of high-throughput functional genomics data
Regulatory Networks
- Pathway Tools: supports definition of regulatory interactions and browsing of regulatory networks
Analysis
- Galaxy^[15]
- MAKER^[16]
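
As a rough illustration of the kind of export mentioned for GMODTools above, the sketch below formats one feature location as a GFF3 line. It is not GMODTools code. It assumes the location is given in Chado-style interbase coordinates (fmin, fmax) with strand encoded as 1, -1, or 0, and converts these to the 1-based inclusive start/end and the +, -, or . strand symbols used by GFF3. The function name, the "example" source label, and the identifiers are hypothetical, and escaping of reserved characters in attribute values is omitted.

```python
def featureloc_to_gff3(seqid, source, so_type, fmin, fmax, strand, attributes):
    """Format one feature location as a nine-column, tab-separated GFF3 line.

    fmin/fmax are assumed to be interbase (0-based, half-open) coordinates,
    so the GFF3 start is fmin + 1 and the GFF3 end is fmax.
    """
    strand_symbol = {1: "+", -1: "-"}.get(strand, ".")
    # Attribute values are written as-is; a full exporter would escape
    # reserved characters such as ';', '=', and tabs.
    attribute_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [
        seqid,
        source,
        so_type,
        str(fmin + 1),   # GFF3 start (1-based, inclusive)
        str(fmax),       # GFF3 end (inclusive)
        ".",             # score: not provided in this sketch
        strand_symbol,
        ".",             # phase: only meaningful for CDS features
        attribute_text,
    ]
    return "\t".join(columns)


line = featureloc_to_gff3(
    "chr1", "example", "gene", 999, 2000, 1, {"ID": "gene0001"}
)
```

Here the variable line holds a single record whose fourth and fifth columns are 1000 and 2000, and whose strand column is +.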

Participating databases

The following organism databases are contributing to and/or adopting GMOD components for model organism databases.

ANISEED^[17]	AntonosporaDB^[citation needed]	Arabidopsis^[18]
BeeBase	BeetleBase^[19]^[20]	Bovine genome database (BGD)
BioHealthBase^[21]	Bovine QTL Viewer	Cattle EST Gene Family Database
CGD	CGL	ChromDB
Chromosome 7 Annotation Project	CSHLmpd	Database of Genomic Variants
DictyBase^[22]	DroSpeGe	EcoCyc
FlyBase	Fungal Comparative Genomics	Fungal Telomere Browser
Gallus Genome Browser	GeneDB	GrainGenes
Gramene	HapMap	Human 2q33
Human Genome Segmental Duplication Database	IVDB	MAGI
Marine Biological Lab Organism Databases	Mouse Genome Informatics	Non-Human Segmental Duplication Database
OMAP	OryGenesDB	Oryza Chromosome 8
Pathway Tools	ParameciumDB^[23]	PeanutMap
PlantsDB	PlasmoDB	PomBase
PseudoCAP	PossumBase	PUMAdb
Rat Genome Database	Saccharomyces Genome Database	SGD Lite
SmedDB	Sol Genomics Network	Soybase
Soybean Gbrowse Database	T1DBase	The Arabidopsis Information Resource
TGD	The Genome Institute	The Institute for Genomic Research
TIGR Rice Genome Browser	ToxoDB	TriAnnot BAC Viewer
VectorBase	wFleaBase^[24]	WormBase
XanthusBase	Xenbase

Related projects

Bioperl, BioJava, Biopython, BioRuby, etc.
Ensembl
Gene Ontology
DAS
Genomics Unified Schema
Manatee: Manual Annotation Tool
Biocurator.org
Open Biomedical Ontologies
Sequence Ontology Project

References

[1] Christopher J. Mungall; David B. Emmert; The FlyBase Consortium (2007). "A Chado case study: an ontology-based modular schema for representing genome-associated biological information". Bioinformatics. 23 (13): i337–i346. doi:10.1093/bioinformatics/btm189. PMID 17646315.
[2] "GMOD Components - GMOD". gmod.org.
[3] "Chado - Getting Started - GMOD". gmod.org.
[4] "XORT - GMOD". gmod.org.
[5] "GMODTools - GMOD". gmod.org.
[6] "Tripal". gmod.org.
[7] "Apollo - GMOD". gmod.org.
[8] "Apollo — Apollo 2.7.0 documentation". genomearchitect.readthedocs.io.
[9] "GBrowse - GMOD". gmod.org.
[10] Stein LD; Mungall C; Shu S; Caudy M; Mangone M; Day A; Nickerson E; Stajich JE; Harris TW; Arva A; Lewis S. (2002). "The generic genome browser: a building block for a model organism system database". Genome Res. 12 (10): 1599–610. doi:10.1101/gr.403602. PMC 187535. PMID 12368253.
[11] "JBrowse - GMOD". gmod.org.
[12] "GBrowse syn - GMOD". gmod.org.
[13] "CMap - GMOD". gmod.org.
[14] "Textpresso". gmod.org.
[15] Afgan, E.; Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Čech, M.; Chilton, J.; Clements, D.; Coraor, N.; Eberhard, C.; Grüning, B.; Guerler, A.; Hillman-Jackson, J.; Von Kuster, G.; Rasche, E.; Soranzo, N.; Turaga, N.; Taylor, J.; Nekrutenko, A.; Goecks, J. (8 July 2016). "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update". Nucleic Acids Research. 44 (W1): W3–W10. doi:10.1093/nar/gkw343. PMC 4987906. PMID 27137889.
[16] Cantarel, Brandi L.; Korf, Ian; Robb, Sofia M. C.; Parra, Genis; Ross, Eric; Moore, Barry; Holt, Carson; Sánchez Alvarado, Alejandro; Yandell, Mark (January 2008). "MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes". Genome Research. 18: 188–196. doi:10.1101/gr.6743907.
[17] Tassy, Olivier; Dauga, Delphine; Daian, Fabrice; Sobral, Daniel; Robin, François; Khoueiry, Pierre; Salgado, David; Fox, Vanessa; Caillol, Danièle; Schiappa, Renaud; Laporte, Baptiste; Rios, Anne; Luxardi, Guillaume; Kusakabe, Takehiro; Joly, Jean-Stéphane; Darras, Sébastien; Christiaen, Lionel; Contensin, Magali; Auger, Hélène; Lamy, Clément; Hudson, Clare; Rothbächer, Ute; Gilchrist, Michael J.; Makabe, Kazuhiro W.; Hotta, Kohji; Fujiwara, Shigeki; Satoh, Nori; Satou, Yutaka; Lemaire, Patrick (1 October 2010). "The ANISEED database: Digital representation, formalization, and elucidation of a chordate developmental program". Genome Research. 20 (10): 1459–1468. doi:10.1101/gr.108175.110. ISSN 1088-9051. PMC 2945195. PMID 20647237.
[18] Weems, Danforth; Miller, Neil; Garcia-Hernandez, Margarita; Huala, Eva; Rhee, Seung Y. (2004). "Design, Implementation and Maintenance of a Model Organism Database for Arabidopsis thaliana". Comparative and Functional Genomics. 5 (4): 362–369. doi:10.1002/cfg.408. ISSN 1531-6912. PMC 2447457. PMID 18629167.
[19] Wang L; Wang S; Li Y; Paradesi MS; Brown SJ. (2007). "BeetleBase: the model organism database for Tribolium castaneum". Nucleic Acids Res. 35 (Database issue): D476–9. doi:10.1093/nar/gkl776. PMC 1669707. PMID 17090595.
[20] "BeetleBase". www.bioinformatics.ksu.edu/Be. Archived from the original on 13 July 2006.
[21] Noronha, Antonio; Cui, Changhai; Harris, Robert Adron; Crabbe, John C. (2014). Neurobiology of Alcohol Dependence. Elsevier. ISBN 978-0-12-407155-1.
[22] Chisholm RL; Gaudet P; Just EM; Pilcher KE; Fey P; Merchant SN; Kibbe WA. (2006). "dictyBase, the model organism database for Dictyostelium discoideum". Nucleic Acids Res. 34 (Database issue): D423–7. doi:10.1093/nar/gkj090. PMC 1347453. PMID 16381903.
[23] Arnaiz O; Cain S; Cohen J; Sperling L. (2007). "ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data". Nucleic Acids Res. 35 (Database issue): D439–44. doi:10.1093/nar/gkl777. PMC 1669747. PMID 17142227.
[24] Colbourne JK; Singan VR; Gilbert DG. (2005). "wFleaBase: the Daphnia genome database". BMC Bioinformatics. 6: 45. doi:10.1186/1471-2105-6-45. PMC 555599. PMID 15752432.

External links

GMOD website

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
