Echinobase

Content
Description: Echinobase: The Echinoderm Knowledgebase.
Data types captured: Literature, Nucleotide Sequence, RNA sequence, Protein sequence, Structure, Genomics, Morpholinos, Metabolic and Signaling Pathways, Human and other Vertebrate Genomes, Human Genes and Diseases, Microarray Data and other Gene Expression, Proteomics Resources, Other Molecular Biology, Organelle
Organisms: echinoderms
Contact
Research center: Carnegie Mellon University, University of Calgary
Laboratory: Ettensohn lab, Hinman lab, Vize lab
Primary citation: PMID 38262680
Release date: 2009
Access
Website: https://www.echinobase.org/
Download URL: https://download.echinobase.org/echinobase/
Tools
Standalone: BLAST, JBrowse
Miscellaneous
License: Public domain
Data release frequency: Continuous
Version: 6.x
Curation policy: Professionally curated
Bookmarkable entities: Yes

Echinobase is a Model Organism Database (MOD). It supports the international research community by providing a centralized, integrated, web-based resource for accessing the diverse functional genomics data on echinoderm evolution, development and gene regulatory networks. [1]

Contents

Genomic research data and tools are available for searching, browsing and bioinformatic analysis of genomes, genes, and transcripts.

Echinobase provides data-sharing infrastructure for other NIH-funded projects and makes echinoderm data more available and visible to the broader biomedical research community.


Supported Species

Echinobase offers two levels of integration for supported echinoderm species. Full (level one) support means the genome is completely integrated into the database, including gene pages, and is also available for BLAST searches, browsing via JBrowse, and download. Partial (level two) support provides BLAST, JBrowse and download options, but no gene page integration.

Current level one supported species (at various stages of integration) are:

Current level two supported species (at various stages of integration) are:

Software, Hardware and Platform

Echinobase runs in a cloud environment. [2] Its virtual machines run in a VMware vSphere environment on two servers, with automatic load balancing and fault tolerance. The software is built with Java, JSP, JavaScript, AJAX, XML and CSS, and runs on Apache Tomcat with an IBM Db2 database. Echinobase is developed in tandem with Xenbase.

Functional Genomics

Echinobase is a genomics research resource organized around gene models, each of which is represented by a gene page. Each gene page gathers a large amount of gene-specific information.

For genomics, search and BLAST tools are available directly or through the gene pages, which display HGNC-compliant gene model names, orthology, [3] and GO terms, and link to BLAST, the JBrowse genome browser and a gene expression plotting tool.
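Reference [3] describes integrating 1:1 orthology maps into Echinobase. One widely used way to call 1:1 orthologs is the reciprocal best hit (RBH) criterion on BLAST scores: two genes are paired only if each is the other's top hit. The Python sketch below illustrates that criterion alone; it is not necessarily the method Echinobase used, and the gene IDs and scores are made up.

```python
# Hypothetical BLAST results as (query, subject, bitscore) tuples.
# Sp_* stand for sea urchin gene IDs and Hs* for human gene IDs; all values are invented.
URCHIN_VS_HUMAN = [
    ("Sp_g1", "HsA", 250.0),
    ("Sp_g1", "HsB", 90.0),
    ("Sp_g2", "HsB", 180.0),
    ("Sp_g3", "HsB", 120.0),
]
HUMAN_VS_URCHIN = [
    ("HsA", "Sp_g1", 248.0),
    ("HsB", "Sp_g2", 175.0),
    ("HsB", "Sp_g3", 110.0),
]

def best_hits(hits):
    """Map each query to its highest-scoring subject (the first hit wins ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

With this sample data, `reciprocal_best_hits(URCHIN_VS_HUMAN, HUMAN_VS_URCHIN)` pairs Sp_g1 with HsA and Sp_g2 with HsB. Sp_g3 is left unpaired: its best hit is HsB, but HsB's best hit is Sp_g2, so the relationship is not 1:1.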

Tabs beyond the summary tab provide gene-specific literature, transcripts, expression data, protein sequences and interactants.

Genomic research tools support browsing, searching, analysis and visualization of genome sequence assemblies, annotations and features. Echinobase also collects gene expression data and provides tools to search and visualize it.

The Echinoderm Anatomical Ontology (ECAO) uses standardized terms for anatomical cell types and structures and relates them to developmental stages. The ontology covers numerous echinoderm species, so some terms apply to all echinoderms while others are species-specific. The ECAO contains thousands of anatomical terms for cell types, structures and tissues, as well as anatomical systems such as the nervous system or skeletal system. Relationships between terms are defined using the relations "develops_from", "develops_into", "is_a" and "part_of".
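The four relation types above turn the ontology into a labeled graph. The following Python sketch stores terms and relations in memory and follows "is_a" and "part_of" links to collect every broader term that a term belongs to. The class, its methods and the example terms are hypothetical; they are not ECAO identifiers or an Echinobase API. Treating "develops_from" and "develops_into" as inverses of each other is an assumption based on their names.

```python
from collections import defaultdict

class AnatomyOntology:
    """Tiny in-memory ontology of (subject, relation, object) triples."""

    RELATIONS = {"is_a", "part_of", "develops_from", "develops_into"}
    # Assumption: develops_from and develops_into are inverses of each other
    INVERSE = {"develops_from": "develops_into", "develops_into": "develops_from"}

    def __init__(self):
        self.edges = defaultdict(set)  # (term, relation) -> directly related terms

    def add(self, subject, relation, obj):
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[(subject, relation)].add(obj)
        if relation in self.INVERSE:
            # Record the inverse edge so either direction can be queried
            self.edges[(obj, self.INVERSE[relation])].add(subject)

    def related(self, term, relation):
        """Terms directly linked to `term` by `relation`."""
        return set(self.edges.get((term, relation), ()))

    def ancestors(self, term, relations=("is_a", "part_of")):
        """All terms reachable from `term` by repeatedly following `relations`."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            for rel in relations:
                for parent in self.related(current, rel):
                    if parent not in seen:
                        seen.add(parent)
                        stack.append(parent)
        return seen

# Illustrative terms only, not taken from the ECAO
onto = AnatomyOntology()
onto.add("neuron", "is_a", "cell")
onto.add("neuron", "part_of", "nervous system")
onto.add("nervous system", "is_a", "anatomical system")
onto.add("primary mesenchyme cell", "develops_from", "micromere")
```

Here `onto.ancestors("neuron")` returns cell, nervous system and anatomical system, and `onto.related("micromere", "develops_into")` finds the primary mesenchyme cell through the recorded inverse edge. Combining "is_a" and "part_of" in a single walk is a design choice for this sketch; real ontology tools usually let callers choose which relations to traverse, which the `relations` parameter mimics.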

Literature, Resources and Community

Echinobase collects literature by automatically searching published papers with echinoderm-related query terms; the retrieved articles are then manually curated.
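The automated search step can be pictured as a query that combines several echinoderm terms. The article does not name the literature index Echinobase searches, so this Python sketch assumes PubMed and builds a request URL for NCBI's E-utilities `esearch` service; the query terms and the helper names `build_query`, `esearch_url` and `new_for_curation` are hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. Searching PubMed is an assumption;
# the article does not say which literature source Echinobase queries.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical echinoderm query terms, for illustration only.
QUERY_TERMS = ["echinoderm", "sea urchin", "sea star", "sea cucumber"]

def build_query(terms):
    """OR the terms together, each restricted to titles and abstracts."""
    return " OR ".join(f'"{t}"[Title/Abstract]' for t in terms)

def esearch_url(terms, retmax=100):
    """URL for an esearch request that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": build_query(terms),
              "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

def new_for_curation(pmids, already_curated):
    """Keep only retrieved IDs that have not been curated yet, preserving order."""
    return [p for p in pmids if p not in already_curated]
```

Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` field lists matching PubMed IDs. `new_for_curation` then drops articles already handled, so only new papers reach the manual curation step.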

The data download site provides genome annotation files in GFF format, and Gene Page Reports provide files for bioinformatic analyses.
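GFF3 files are tab-separated text with nine columns per feature (seqid, source, type, start, end, score, strand, phase, attributes). The Python sketch below parses such lines and counts gene features per sequence. It is a minimal illustration, not an Echinobase tool. The sample records are hypothetical, and nothing is assumed about the names or contents of the files on the download site.

```python
from collections import Counter

# A few GFF3 lines for illustration; seqids, IDs and coordinates are hypothetical.
SAMPLE_GFF = """##gff-version 3
chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=geneA
chr1\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=rna1;Parent=gene1
chr2\texample\tgene\t200\t900\t.\t-\t.\tID=gene2;Name=geneB
"""

def parse_gff3(text):
    """Yield one dict per feature line, skipping comments and directives."""
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # the rest of the file is FASTA sequence, not features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines instead of failing
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        # Attribute values are kept as raw text (percent-escapes are not decoded)
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield {"seqid": seqid, "type": ftype, "start": int(start),
               "end": int(end), "strand": strand, "attributes": attributes}

def genes_per_sequence(text):
    """Count 'gene' features on each sequence (chromosome or scaffold)."""
    return Counter(f["seqid"] for f in parse_gff3(text) if f["type"] == "gene")
```

For the sample above, `genes_per_sequence(SAMPLE_GFF)` counts one gene on each of chr1 and chr2. The `Parent` attribute links the mRNA to its gene, which is how transcripts are grouped under gene models in GFF3.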

To support the community and enable interdisciplinary and collaborative studies, Echinobase hosts searchable research descriptions and contact information for community members, labs and organizations. Job openings are also posted on Echinobase.

Other Model Organism Databases (MODs)


References

  1. Telmer CA, Karimi K, Chess MM, Agalakov S, Arshinoff BI, Lotay V, Wang DZ, Chu S, Pells TJ, Vize PD, Hinman VF, Ettensohn CA (2024). Echinobase: a resource to support the echinoderm research community. Genetics. doi:10.1093/genetics/iyae002
  2. Karimi K, Vize PD (2014). The Virtual Xenbase: transitioning an online bioinformatics resource to a private cloud. Database. doi:10.1093/database/bau108
  3. Foley S, et al. (2021). Integration of 1:1 orthology maps and updated datasets into Echinobase. Database, Volume 2021, baab030.

Gene regulatory networks