The Arabidopsis Information Resource

The Arabidopsis Information Resource (TAIR)
Content
Description: a community resource and online model organism database of genetic and molecular biology data for the model plant Arabidopsis thaliana, commonly known as mouse-ear cress
Organisms: Arabidopsis thaliana
Contact
Research center: Phoenix Bioinformatics
Access
Website: https://www.arabidopsis.org/

The Arabidopsis Information Resource (TAIR) is a community resource and online model organism database of genetic and molecular biology data for the model plant Arabidopsis thaliana, commonly known as mouse-ear cress. [1]

Contents

TAIR integrates information about the Arabidopsis genome, genes, gene products, natural variants, mutant alleles, plant phenotypes and the research literature. Data in TAIR can be retrieved through simple and advanced searches, through bulk query and download tools, and from collections of prepared text files. The Arabidopsis genome and its annotations can be visualized with the interactive SeqViewer and GBrowse tools. TAIR’s biocurators acquire and integrate data from the research literature (functional annotation) [2] and help the community use Arabidopsis data and tools. TAIR collaborates with the Arabidopsis Biological Resource Center (ABRC) [3] so that researchers can search, browse and order seed and DNA stocks. The ABRC's mission is to acquire, preserve and distribute seed and DNA resources that are useful to the Arabidopsis research community. As of 2015, TAIR’s community included over 28,000 registered users [4] and the website drew about 60,000 unique visitors per month. [5]
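
Genome annotations in prepared text files can be read with a few lines of code. Below is a minimal sketch, assuming the file uses the GFF3 layout: nine tab-separated columns, with semicolon-separated key=value attributes in the last column. The function `parse_gff3` and the sample rows are hypothetical illustrations, not an excerpt from a TAIR release. Percent-encoded attribute values are not decoded.

```python
from collections import Counter

def parse_gff3(lines):
    """Yield one dict per feature row of a GFF3 text stream (sketch)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # this sketch silently ignores malformed rows
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        # Column 9: "key=value;key=value"
        attributes = dict(
            field.split("=", 1) for field in attrs.split(";") if "=" in field
        )
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        }

# Hypothetical TAIR-style rows, for illustration only
SAMPLE = """##gff-version 3
Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010
Chr1\tTAIR10\tmRNA\t3631\t5899\t.\t+\t.\tID=AT1G01010.1;Parent=AT1G01010
Chr1\tTAIR10\tgene\t5928\t8737\t.\t-\t.\tID=AT1G01020;Name=AT1G01020
"""

features = list(parse_gff3(SAMPLE.splitlines()))
gene_ids = [f["attributes"]["ID"] for f in features if f["type"] == "gene"]
type_counts = Counter(f["type"] for f in features)
print(gene_ids)  # ['AT1G01010', 'AT1G01020']
print(type_counts)
```

Using a generator keeps memory use flat, which matters for whole-genome annotation files.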

TAIR is hosted by Phoenix Bioinformatics [6] and is funded by subscriptions. [7]

TAIR funding history

From its inception in 1999 until 2013, TAIR was funded primarily by the National Science Foundation (Grant No. DBI-0850219). When NSF funding ended, a core group of TAIR staff founded the non-profit organization Phoenix Bioinformatics [6] with the aim of finding creative approaches to long-term database sustainability. In September 2013, with Phoenix's support, TAIR transitioned to subscription funding. Subscription fees pay for continuous data curation and for improvements to TAIR’s database and tools. TAIR offers a variety of subscription options for access to the full, up-to-date resource. [8]

To maximize community access to data and to promote data reuse, subscriber-only data in TAIR is made available to the public one year after its initial release on the TAIR site. [9]
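
The one-year delay amounts to simple date arithmetic. The sketch below is hypothetical. It assumes "one year" means the same calendar date in the following year, and that a 29 February release becomes public on 28 February. TAIR's actual release schedule may work differently.

```python
from datetime import date

def public_release_date(initial_release: date) -> date:
    """Date on which subscriber-only data becomes public (assumed rule)."""
    try:
        # Same month and day, one year later
        return initial_release.replace(year=initial_release.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; assume 28 February
        return date(initial_release.year + 1, 2, 28)

def is_public(initial_release: date, today: date) -> bool:
    """True once the assumed one-year delay has elapsed."""
    return today >= public_release_date(initial_release)

print(public_release_date(date(2023, 3, 15)))           # 2024-03-15
print(public_release_date(date(2024, 2, 29)))           # 2025-02-28
print(is_public(date(2023, 3, 15), date(2024, 3, 14)))  # False
```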

Related Research Articles

Bioinformatics: computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Arabidopsis thaliana: model plant species in the family Brassicaceae

Arabidopsis thaliana, the thale cress, mouse-ear cress or arabidopsis, is a small plant from the mustard family (Brassicaceae), native to Eurasia and Africa. Commonly found along the shoulders of roads and in disturbed land, it is generally considered a weed.

A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query such as a DNA, RNA, or protein sequence or a keyword and search one or more databases for information related to it. They return summaries and aggregated results in a standardized format, compiling information that would otherwise require visits to many smaller sites or direct literature searches. Many sequence profiling tools are software portals or gateways that simplify finding information about a query in the large and growing number of bioinformatics databases. These tools are accessed either through the web or as locally downloadable executables.
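
A tool that accepts several kinds of query first has to decide what it was given before choosing which databases to search. The following is a hypothetical heuristic for that step. The alphabets and the 10-character cutoff are assumptions, real tools apply their own rules, and short or unusual inputs can be misclassified.

```python
import re

# Assumed alphabets: N allowed as an ambiguous base; B, Z, X and * as
# common extra symbols in protein sequences.
DNA = re.compile(r"[ACGTN]+")
RNA = re.compile(r"[ACGUN]+")
PROTEIN = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBZX*]+")

MIN_SEQUENCE_LENGTH = 10  # hypothetical cutoff; shorter queries are keywords

def classify_query(query: str) -> str:
    """Guess whether a query is a DNA, RNA or protein sequence, or a keyword."""
    seq = re.sub(r"\s+", "", query).upper()
    if len(seq) < MIN_SEQUENCE_LENGTH:
        return "keyword"
    # Every DNA string also matches the protein alphabet, so test DNA first.
    if DNA.fullmatch(seq):
        return "dna"
    if RNA.fullmatch(seq):
        return "rna"
    if PROTEIN.fullmatch(seq):
        return "protein"
    return "keyword"

for q in ["ATGGCGTACGTTAGC", "AUGGCGUACGUUAGC", "MKTAYIAKQRQISFVKSHFSRQ", "flowering time"]:
    print(q, "->", classify_query(q))
```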

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.
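
Enrichment analysis is often framed as a hypergeometric test. The background contains N genes, K of which are annotated to a GO term. The question is how surprising it is to find k or more annotated genes in a study list of n genes. Below is a minimal sketch using only the standard library. The variable names are illustrative, and the numbers in the example are invented.

```python
from math import comb

def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (for example, all annotated genes)
    K: background genes annotated to the GO term
    n: genes in the study list
    k: study-list genes annotated to the term
    """
    # Sum the probability of exactly i annotated genes for every i >= k
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)

# Invented numbers: 12 of 200 study genes carry a term that annotates
# 300 of 20,000 background genes (about 3 would be expected by chance).
print(enrichment_p_value(k=12, n=200, K=300, N=20_000))
```

If SciPy is available, the same upper-tail probability is given by `scipy.stats.hypergeom.sf(k - 1, N, K, n)`. When many GO terms are tested at once, the p-values also need a multiple-testing correction.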

UniProt: database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, United States.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

The completion of human genome sequencing in the early 2000s was a turning point in genomics research. Scientists have conducted a series of studies into the activities of genes and of the genome as a whole. The human genome contains around 3 billion nucleotide base pairs. This huge quantity of data requires accessible tools for exploring and interpreting the information, in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.
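
The scale is easy to estimate. With four possible bases, each nucleotide needs 2 bits, so about 3 billion base pairs take roughly 750 MB for the raw sequence of one haploid copy, before any annotation. Below is a minimal sketch of 2-bit packing, assuming the sequence contains only A, C, G and T. Ambiguous bases such as N would need separate handling.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; `length` is needed because the last byte may be partial."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))

genome_bp = 3_000_000_000
print(genome_bp * 2 / 8 / 1e6, "MB")  # 750.0 MB
```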

PHI-base

The Pathogen-Host Interactions database (PHI-base) is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016.

FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats.

Mouse Genome Informatics (MGI) is a free, online database and bioinformatics resource hosted by The Jackson Laboratory, with funding by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). MGI provides access to data on the genetics, genomics and biology of the laboratory mouse to facilitate the study of human health and disease. The database integrates multiple projects, with the two largest contributions coming from the Mouse Genome Database and Mouse Gene Expression Database (GXD). As of 2018, MGI contains data curated from over 230,000 publications.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

DAVID (Database for Annotation, Visualization and Integrated Discovery) is a free online bioinformatics resource developed by the Laboratory of Human Retrovirology and Immunoinformatics. All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies, e.g. microarray and proteomics studies. DAVID can be found at https://david.ncifcrf.gov/.

The Viral Bioinformatics Resource Center (VBRC) is an online resource providing access to a database of curated viral genomes and a variety of tools for bioinformatic genome analysis. This resource was one of eight BRCs funded by NIAID with the goal of promoting research against emerging and re-emerging pathogens, particularly those seen as potential bioterrorism threats. The VBRC is now supported by Dr. Chris Upton at the University of Victoria.

Pathema was one of the eight bioinformatics resource centers funded by the National Institute of Allergy and Infectious Diseases (NIAID), a component of the National Institutes of Health (NIH), which is an agency of the United States Department of Health and Human Services.

The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.

GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic, medical, and functional information on all known and predicted human genes. It is developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.

DNA annotation: the process of describing the structure and function of a genome

In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
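
Real annotation pipelines combine gene-prediction models, transcript evidence and homology searches. The most basic step, locating candidate coding regions, can still be sketched as an open reading frame (ORF) scan. This toy version has several limits:

- it assumes the standard genetic code;
- it scans only the three forward frames;
- it uses a hypothetical default minimum length of 30 codons.

It is not how any particular annotation project works.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Nested ATGs inside an already open frame are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2))  # [(2, 17, 2)]
```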

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements however mean that MODs are often required to customize capture, display, and provision of data.
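
GO annotations are commonly exchanged in the tab-separated GO Annotation File (GAF) format. Code that groups terms by gene and aspect therefore works the same way whichever database produced the file. Below is a minimal sketch, assuming the GAF 2.x column layout. The helper `gaf_row`, the sample rows and identifiers such as `EXAMPLE1` are hypothetical.

```python
from collections import defaultdict

# GAF aspect codes mapped to the three GO namespaces
ASPECTS = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}

def go_terms_by_gene(lines):
    """Group GO IDs by gene and aspect from GAF 2.x tab-separated lines.

    Assumes column 2 holds the object ID, column 5 the GO ID and
    column 9 the aspect code (P, F or C).
    """
    table = defaultdict(lambda: defaultdict(set))
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # '!' marks header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # skip truncated rows in this sketch
        gene, go_id, aspect = cols[1], cols[4], cols[8]
        table[gene][ASPECTS.get(aspect, aspect)].add(go_id)
    return table

def gaf_row(gene, go_id, evidence, aspect):
    """Build a hypothetical 17-column GAF row for demonstration only."""
    cols = ["TAIR", gene, gene, "", go_id, "PMID:0", evidence, "", aspect,
            "", "", "protein", "taxon:3702", "20240101", "TAIR", "", ""]
    return "\t".join(cols)

sample = [
    "!gaf-version: 2.1",
    gaf_row("EXAMPLE1", "GO:0005634", "IDA", "C"),
    gaf_row("EXAMPLE1", "GO:0003700", "ISS", "F"),
    gaf_row("EXAMPLE2", "GO:0005634", "IEA", "C"),
]

annotations = go_terms_by_gene(sample)
print(dict(annotations["EXAMPLE1"]))
```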

A plant genome assembly represents the complete genomic sequence of a plant species. The sequence is assembled into chromosomes and organelle genomes (such as the chloroplast and mitochondrial genomes) from DNA fragments obtained with different types of sequencing technology.
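
The idea of building a longer sequence from overlapping fragments can be illustrated with a greedy overlap merge. This is a toy sketch, not how production plant assemblers work. They typically use overlap or de Bruijn graphs, and they must handle sequencing errors, repeats and both DNA strands. The function names and the minimum overlap of 3 bases are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

reads = ["ATTAGACC", "AGACCTGC", "CCTGCCGGA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```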

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Lamesch, P; Berardini, TZ; Li, D; Swarbreck, D; Wilks, C; Sasidharan, R; Muller, R; Dreher, K; Alexander, DL; Garcia-Hernandez, M; Karthikeyan, AS; Lee, CH; Nelson, WD; Ploetz, L; Singh, S; Wensel, A; Huala, E (2012). "The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools". Nucleic Acids Research. 40 (Database): D1202–10. doi:10.1093/nar/gkr1090. PMC 3245047. PMID 22140109.
  2. Berardini, TZ; Mundodi, S; Reiser, L; Huala, E; Garcia-Hernandez, M; Zhang, P; Mueller, LA; Yoon, J; Doyle, A; Lander, G; Moseyko, N; Yoo, D; Xu, I; Zoeckler, B; Montoya, M; Miller, N; Weems, D; Rhee, SY (2004). "Functional annotation of the Arabidopsis genome using controlled vocabularies". Plant Physiology. 135 (2): 745–55. doi:10.1104/pp.104.040071. PMC 514112. PMID 15173566.
  3. "ABRC". abrc.osu.edu.
  4. "TAIR Community Search". TAIR. Retrieved 11 May 2015.
  5. "TAIR Google Analytics". Google Analytics. Retrieved 12 May 2015.
  6. "Phoenix Bioinformatics". www.phoenixbioinformatics.org.
  7. "TAIR Subscriptions".
  8. "TAIR subscription page". TAIR. Retrieved 12 May 2015.
  9. "TAIR Data Release News". TAIR. Retrieved 12 May 2015.