BioGRID

Content
Description: BioGRID is a biomedical interaction repository with data compiled through comprehensive curation efforts.
Data types captured: Protein interactions, genetic interactions, chemical interactions, post-translational modifications
Organisms: 80
Contact
Research center: Université de Montréal, Princeton University, Mount Sinai Hospital (Toronto)
Laboratory: Institut de Recherche en Immunologie et en Cancérologie, Lewis-Sigler Institute for Integrative Genomics, Lunenfeld-Tanenbaum Research Institute
Authors: Lorrie Boucher, Ashton Breitkreutz, Bobby-Joe Breitkreutz, Christie Chang, Andrew Chatr-Aryamontri, Kara Dolinski, Sven Heinicke, Nadine Kolas, Lara O'Donnell, Sara Oster, Rose Oughtred, Jennifer Rust, Adnane Sellam, Chris Stark, Jean Tang, Chandra Theesfeld, Mike Tyers
Primary citation: Stark et al. (2006) [1]
Access
Data format: Custom flat files, PSI-MI, MITAB
Website: thebiogrid.org
Download URL: downloads.thebiogrid.org/BioGRID
Web service URL: Yes - wiki.thebiogrid.org/doku.php/biogridrest
Tools
Web: Advanced search, integrated network viewer, custom downloads, bulk retrieval/download
Miscellaneous
Versioning: Yes
Data release frequency: Monthly (4 weeks)
Version: 4.2.193 (1 January 2021)
Curation policy: Yes - manual; also focused curation efforts
Bookmarkable entities: Yes - both individual results and searches

The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions, chemical interactions, and post-translational modifications. It was created in 2003 (originally as simply the General Repository for Interaction Datasets, or GRID) [2] by Mike Tyers, Bobby-Joe Breitkreutz, and Chris Stark at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital. It strives to provide a comprehensive curated resource for all major model organism species while attempting to remove redundancy to create a single mapping of data. Users of the BioGRID can search for a protein, chemical, or publication of interest and retrieve annotation as well as curated data reported in the primary literature and compiled by in-house large-scale curation efforts. The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the Alliance of Genome Resources. It is funded by the NIH and CIHR and is an observer member of the International Molecular Exchange Consortium (IMEx).

History

The BioGRID was originally published and released as simply the General Repository for Interaction Datasets [2] but was later renamed the BioGRID [1] to describe the project more concisely and to help distinguish it from several grid computing projects with similar names. Originally separated into organism-specific databases, the newest version provides a unified front end that allows searches across several organisms simultaneously. The BioGRID was developed initially as a project at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital but has since expanded to include teams at the Institut de Recherche en Immunologie et en Cancérologie at the Université de Montréal and the Lewis-Sigler Institute for Integrative Genomics at Princeton University. The BioGRID originally focused on curation of binary protein-protein and genetic interactions, but over several updates [1] [3] [4] [5] [6] [7] [8] it has expanded to incorporate curated post-translational modification data, [9] [10] chemical interaction data, and complex multi-gene/protein interactions. Moreover, on a monthly basis, the BioGRID continues to expand its curated data, to develop and release new tools, [9] [10] [11] [12] to release data from comprehensive targeted curation projects, [13] and to perform targeted scientific analysis. [14]

Curation of Genetic, Protein, and Chemical Interactions

The Biological General Repository for Interaction Datasets (BioGRID) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of 18 October 2020, [15] the BioGRID contained 1.928 million interactions drawn from 63,083 publications representing 71 model organisms. At the start of 2021 it contained more than 2.0 million biological interactions, 29,023 chemical-protein interactions, and 506,485 post-translational modifications collectively curated from 75,988 publications for more than 80 species. [16] BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text mining approaches, and to enhance curation quality control. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for budding yeast (Saccharomyces cerevisiae), thale cress (Arabidopsis thaliana), and fission yeast (Schizosaccharomyces pombe).
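As a rough illustration of how downloaded interaction records might be read and how redundancy can be collapsed, the sketch below parses text in the 15-column PSI-MI TAB 2.5 (MITAB) layout and merges A-B and B-A records into one unordered pair with its supporting publications. The sample rows, the identifiers in them, and the helper names (`_row`, `parse_mitab`, `unique_pairs`) are invented for this example and are not taken from BioGRID; actual BioGRID files may carry additional columns or conventions not handled here.

```python
# Field order of the PSI-MI TAB 2.5 format (15 tab-separated columns).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication", "taxid_a", "taxid_b",
    "interaction_type", "source_db", "interaction_id", "confidence",
]


def _row(id_a, id_b, publication):
    """Build one made-up 15-column row; unused fields are '-' as in MITAB."""
    fields = [id_a, id_b, "-", "-", "-", "-", "-", "-",
              publication, "-", "-", "-", "-", "-", "-"]
    return "\t".join(fields)


# Invented sample data for illustration only (not real BioGRID records).
SAMPLE = "\n".join([
    "#hypothetical header line",
    _row("db:GENE1", "db:GENE2", "pubmed:1"),
    _row("db:GENE2", "db:GENE1", "pubmed:2"),  # same pair, reversed order
    _row("db:GENE1", "db:GENE3", "pubmed:1"),
])


def parse_mitab(text):
    """Return one dict per interaction row, skipping blank and '#' lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(MITAB25_COLUMNS):
            raise ValueError("expected at least 15 tab-separated fields")
        # zip() ignores any columns beyond the first 15.
        rows.append(dict(zip(MITAB25_COLUMNS, fields)))
    return rows


def unique_pairs(rows):
    """Map each unordered interactor pair to the set of supporting publications."""
    pairs = {}
    for row in rows:
        key = frozenset((row["id_a"], row["id_b"]))
        pairs.setdefault(key, set()).add(row["publication"])
    return pairs


pairs = unique_pairs(parse_mitab(SAMPLE))
for pair, publications in pairs.items():
    print(sorted(pair), sorted(publications))
```

Using a `frozenset` as the key treats an interaction as undirected, which is a simplification: some experimental systems distinguish bait from prey, and a real analysis may need to keep that direction.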

Themed Curation Projects

Due to the overwhelming size of the published scientific literature containing human (Homo sapiens) gene, protein, and chemical interactions, BioGRID has taken a targeted, project-based approach to curation of human interaction data in manageable collections of high-impact data. These themed curation projects represent central biological processes with disease relevance, such as chromatin modification, autophagy, and the ubiquitin-proteasome system, or diseases of interest, including glioblastoma, Fanconi anemia, and COVID-19. As of 18 October 2020, [15] BioGRID themed curation project efforts have resulted in the extraction of 424,631 interactions involving 2,361 proteins from more than 37,000 scientific articles.

Curation of Genome-Wide CRISPR Screens

CRISPR-based genetic screens have now been reported in numerous publications that link gene function to cell viability, chemical and stress resistance, and other phenotypes. To increase the accessibility of CRISPR screen data and facilitate assignment of protein function, BioGRID has developed an embedded resource called the Open Repository of CRISPR Screens (ORCS) [7] [15] to house and distribute manually curated, comprehensive collections of CRISPR screen datasets using Cas9 and other CRISPR nucleases. As of 18 October 2020, [15] BioGRID-ORCS contains more than 1,042 CRISPR screens curated from 114 publications, representing more than 60,000 unique genes across three species, human (Homo sapiens), fruit fly (Drosophila melanogaster), and house mouse (Mus musculus), in over 670 cell lines and 17 phenotypes.
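To make the shape of such screen data concrete, the following minimal sketch groups screen results by gene and reports the screens in which each gene was scored as a hit. The record layout `(screen_id, cell_line, gene, is_hit)`, the values, and the function name `hit_screens_by_gene` are hypothetical and chosen only for illustration; they do not reflect the actual ORCS file format.

```python
from collections import defaultdict

# Hypothetical records: (screen_id, cell_line, gene, is_hit). Invented values.
RECORDS = [
    ("screen_1", "cell_line_A", "GENE1", True),
    ("screen_1", "cell_line_A", "GENE2", False),
    ("screen_2", "cell_line_B", "GENE1", True),
    ("screen_2", "cell_line_B", "GENE2", True),
    ("screen_3", "cell_line_A", "GENE1", False),
]


def hit_screens_by_gene(records):
    """Map each gene to the set of screen IDs in which it was scored as a hit."""
    hits = defaultdict(set)
    for screen_id, _cell_line, gene, is_hit in records:
        if is_hit:
            hits[gene].add(screen_id)
    return dict(hits)


result = hit_screens_by_gene(RECORDS)
for gene in sorted(result):
    print(gene, sorted(result[gene]))
```

A gene that is never a hit does not appear in the result at all, so callers that need every screened gene should start from the full gene list rather than from this mapping.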

Supported Organisms

The following organisms are currently supported within the BioGRID, and each has curated interaction data available according to the latest statistics.

Funding for BioGRID

BioGRID is funded by grants from the National Institutes of Health and the Canadian Institutes of Health Research.

References

  1. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (Jan 2006). "BioGRID: A General Repository for Interaction Datasets". Nucleic Acids Research. 34 (90001): 535–539. doi:10.1093/nar/gkj109. PMC 1347471. PMID 16381927.
  2. Breitkreutz BJ, Stark C, Tyers M (Jan 2003). "The GRID: the General Repository for Interaction Datasets". Genome Biology. 4 (3): R23. doi:10.1186/gb-2003-4-3-r23. PMC 153463. PMID 12620108.
  3. Chatr-Aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, Stark C, Breitkreutz A, Kolas N, O'Donnell L, Reguly T, Nixon J, Ramage L, Winter A, Sellam A, Chang C, Hirschman J, Theesfeld C, Rust J, Livstone MS, Dolinski K, Tyers M (Jan 2015). "The BioGRID interaction database: 2015 update". Nucleic Acids Research. 43 (Database issue): 470–478. doi:10.1093/nar/gku1204. PMC 4383984. PMID 25428363.
  4. Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O'Donnell L, Reguly T, Breitkreutz A, Sellam A, Chen D, Chang C, Rust JM, Livstone MS, Oughtred R, Dolinski K, Tyers M (Jan 2013). "The BioGRID interaction database: 2013 update". Nucleic Acids Research. 41 (Database issue): 816–823. doi:10.1093/nar/gks1158. PMC 3531226. PMID 23203989.
  5. Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, Reguly T, Rust JM, Winter A, Dolinski K, Tyers M (Jan 2011). "The BioGRID Interaction Database: 2011 update". Nucleic Acids Research. 39 (Database issue): 698–704. doi:10.1093/nar/gkq1116. PMC 3013707. PMID 21071413.
  6. Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bähler J, Wood V, Dolinski K, Tyers M (Jan 2008). "The BioGRID Interaction Database: 2008 update". Nucleic Acids Research. 36 (Database issue): 637–640. doi:10.1093/nar/gkm1001. PMC 2238873. PMID 18000002.
  7. Chatr-Aryamontri, Andrew; Oughtred, Rose; Boucher, Lorrie; Rust, Jennifer; Chang, Christie; Kolas, Nadine K.; O'Donnell, Lara; Oster, Sara; Theesfeld, Chandra; Sellam, Adnane; Stark, Chris (2017-01-04). "The BioGRID interaction database: 2017 update". Nucleic Acids Research. 45 (D1): D369–D379. doi:10.1093/nar/gkw1102. ISSN 1362-4962. PMC 5210573. PMID 27980099.
  8. Oughtred, Rose; Stark, Chris; Breitkreutz, Bobby-Joe; Rust, Jennifer; Boucher, Lorrie; Chang, Christie; Kolas, Nadine; O'Donnell, Lara; Leung, Genie; McAdam, Rochelle; Zhang, Frederick (2019-08-01). "The BioGRID interaction database: 2019 update". Nucleic Acids Research. 47 (D1): D529–D541. doi:10.1093/nar/gky1079. ISSN 1362-4962. PMC 6324058. PMID 30476227.
  9. Stark C, Ting-Cheng Su, Breitkreutz A, Lourenco P, Dahabieh M, Breitkreutz BJ, Tyers M, Sadowski I (Jan 2010). "PhosphoGRID: a database of experimentally verified in vivo protein phosphorylation sites from the budding yeast Saccharomyces cerevisiae". Database. 2010: bap026. doi:10.1093/database/bap026. PMC 2860897. PMID 20428315.
  10. Sadowski I, Breitkreutz BJ, Stark C, Su TC, Dahabieh M, Raithatha S, Bernhard W, Oughtred R, Dolinski K, Barreto K, Tyers M (May 2013). "The PhosphoGRID Saccharomyces cerevisiae protein phosphorylation site database: version 2.0 update". Database. 2013: bat026. doi:10.1093/database/bat026. PMC 3653121. PMID 23674503.
  11. Winter AG, Wildenhain J, Tyers M (April 2011). "BioGRID REST Service, BiogridPlugin2 and BioGRID WebGraph: new tools for access to interaction data at BioGRID". Bioinformatics. 27 (7): 1043–1044. doi:10.1093/bioinformatics/btr062. PMC 3065694. PMID 21300700.
  12. Breitkreutz BJ, Stark C, Tyers M (January 2003). "Osprey: a network visualization system". Genome Biology. 4 (3): R22. doi:10.1186/gb-2003-4-3-r22. PMC 153462. PMID 12620107.
  13. Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskaya OG, Ideker T, Dolinski K, Batada NN, Tyers M (2006). "Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae". Journal of Biology. 5 (4): 11. doi:10.1186/jbiol36. PMC 1561585. PMID 16762047.
  14. Breitkreutz A, Choi H, Sharom JR, Boucher L, Neduva V, Larsen B, Lin ZY, Breitkreutz BJ, Stark C, Liu G, Ahn J, Dewar-Darch D, Reguly T, Tang X, Almeida R, Qin ZS, Pawson T, Gingras AC, Nesvizhskii AI, Tyers M (May 2010). "A global protein kinase and phosphatase interaction network in yeast". Science. 328 (5981): 1043–1046. Bibcode:2010Sci...328.1043B. doi:10.1126/science.1176495. PMC 3983991. PMID 20489023.
  15. Oughtred, Rose; Rust, Jennifer; Chang, Christie; Breitkreutz, Bobby-Joe; Stark, Chris; Willems, Andrew; Boucher, Lorrie; Leung, Genie; Kolas, Nadine; Zhang, Frederick; Dolma, Sonam (2020-10-18). "The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions". Protein Science. 30 (1): 187–200. doi:10.1002/pro.3978. ISSN 1469-896X. PMC 7737760. PMID 33070389.
  16. "Build Statistics (4.2.193) - January 2021 | BioGRID". wiki.thebiogrid.org. Retrieved 2021-01-26.