Phylogenomics

Phylogenomics is the intersection of the fields of evolution and genomics. [1] The term has been used in multiple ways to refer to analyses that involve genome data and evolutionary reconstructions. [2] It is a group of techniques within the larger fields of phylogenetics and genomics. Phylogenomics draws information by comparing entire genomes, or at least large portions of genomes. [3] By contrast, phylogenetics compares and analyzes the sequences of single genes, or a small number of genes, as well as many other types of data. Four major areas fall under phylogenomics: prediction of gene function, prediction and retracing of lateral gene transfer, gene family evolution, and establishment of evolutionary relationships.

The ultimate goal of phylogenomics is to reconstruct the evolutionary history of species through their genomes. This history is usually inferred from a series of genomes by using a genome evolution model and standard statistical inference methods (e.g. Bayesian inference or maximum likelihood estimation). [4]
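As a minimal illustration of maximum likelihood estimation under an explicit evolution model, the sketch below estimates the distance between two aligned sequences under the Jukes-Cantor (JC69) substitution model. The choice of JC69, the function names, and the toy sequences are assumptions made for illustration; real phylogenomic analyses use richer models and dedicated software.

```python
import math


def jc69_log_likelihood(d, n_diff, n_sites):
    """Log-likelihood (up to an additive constant) of a distance d > 0 under JC69."""
    # Probability that a site differs after d expected substitutions per site.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)


def jc69_ml_distance(seq_a, seq_b):
    """Closed-form maximum likelihood estimate of substitutions per site.

    Assumes two gap-free sequences that are already aligned.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n_diff = sum(a != b for a, b in zip(seq_a, seq_b))
    p = n_diff / len(seq_a)
    if p >= 0.75:
        return math.inf  # saturated: no finite estimate exists under JC69
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences (hypothetical), differing at 1 of 10 sites.
    print(jc69_ml_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The closed-form estimate is the value of d that maximizes `jc69_log_likelihood`; with more complex models no closed form exists and the likelihood is maximized numerically.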

Prediction of gene function

When Jonathan Eisen originally coined the term phylogenomics, it applied to the prediction of gene function. Before the use of phylogenomic techniques, gene function was predicted primarily by comparing the gene sequence with the sequences of genes of known function. When several genes with similar sequences but differing functions are involved, this method alone is ineffective in determining function. A specific example is presented in the paper "Gastrogenomic delights: a movable feast". [5] Gene predictions based on sequence similarity alone had been used to predict that Helicobacter pylori can repair mismatched DNA. [6] This prediction was based on the fact that this organism has a gene whose sequence is highly similar to genes from other species in the "MutS" gene family, which includes many genes known to be involved in mismatch repair. However, Eisen noted that H. pylori lacks other genes thought to be essential for this function (specifically, members of the MutL family). Eisen suggested a solution to this apparent discrepancy: phylogenetic trees of genes in the MutS family revealed that the gene found in H. pylori was not in the same subfamily as those known to be involved in mismatch repair. [5] Furthermore, he suggested that this "phylogenomic" approach could be used as a general method for predicting the functions of genes. This approach was formally described in 1998. [7] For reviews of this aspect of phylogenomics, see Brown and Sjölander (2006) [8] and Sjölander (2004). [9]
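The idea of reading a function off a gene's position in a family tree, rather than off overall similarity, can be sketched as follows. This is a simplified illustration and not Eisen's published procedure: the nested-tuple tree, the gene names, and the function labels are all hypothetical, and the rule used (take the annotations found in the smallest clade that contains the query and at least one characterized gene) is an assumption.

```python
def leaves(tree):
    """Return the leaf names of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Return the set of functions annotated in the smallest clade that holds
    the query gene and at least one characterized gene, or None if there is none."""
    if isinstance(tree, str):
        return None
    for child in tree:
        if query in leaves(child):
            deeper = predict_function(child, query, known)
            if deeper:
                return deeper  # a smaller clade already gave an answer
            break
    else:
        return None  # the query gene is not in this subtree
    functions = {known[g] for g in leaves(tree) if g != query and g in known}
    return functions or None


if __name__ == "__main__":
    # Hypothetical family with two subfamilies; all names are invented.
    family_tree = ((("geneA1", "geneA2"), "geneA3"), (("query", "geneB1"), "geneB2"))
    annotations = {
        "geneA1": "function A", "geneA2": "function A", "geneA3": "function A",
        "geneB1": "function B", "geneB2": "function B",
    }
    print(predict_function(family_tree, "query", annotations))
```

A result containing more than one function signals that the tree alone does not resolve the prediction.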

Prediction and retracing lateral gene transfer

Traditional phylogenetic techniques have difficulty distinguishing genes that are similar because of lateral gene transfer from genes that are similar because the organisms share an ancestor. By comparing large numbers of genes or entire genomes among many species, it is possible to identify transferred genes, since these sequences behave differently from what is expected given the taxonomy of the organism. Using these methods, researchers were able to identify over 2,000 metabolic enzymes obtained by various eukaryotic parasites through lateral gene transfer. [10]
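One simple way to express "behaves differently from what is expected given the taxonomy" is to look for groupings in a gene tree that the accepted species tree does not contain. The sketch below does this for rooted trees written as nested tuples; the representation, the function names, and the four-taxon example are assumptions for illustration. Conflicts of this kind only mark candidates, since reconstruction error and other processes can also produce them.

```python
def clades(tree):
    """Return the leaf set of every internal node of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def conflicting_clades(gene_tree, species_tree):
    """Clades present in the gene tree but absent from the species tree."""
    return clades(gene_tree) - clades(species_tree)


if __name__ == "__main__":
    # Hypothetical taxa A-D; in the second gene tree, A groups with D.
    species_tree = ((("A", "B"), "C"), "D")
    congruent_gene = ((("A", "B"), "C"), "D")
    candidate_gene = ((("A", "D"), "B"), "C")
    print(conflicting_clades(congruent_gene, species_tree))  # no conflicts
    print(conflicting_clades(candidate_gene, species_tree))  # flags the A+D grouping
```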

Gene family evolution

The comparison of complete gene sets for a group of organisms allows the identification of events in gene evolution such as gene duplication or gene deletion. Often, such events are evolutionarily relevant. For example, multiple duplications of genes encoding degradative enzymes of certain families are a common adaptation in microbes to new nutrient sources. Conversely, loss of genes is important in reductive evolution, such as in intracellular parasites or symbionts. Whole genome duplication events, which potentially duplicate all the genes in a genome at once, are drastic evolutionary events with great relevance in the evolution of many clades, and their signal can be traced with phylogenomic methods.
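A common heuristic for locating duplications in a gene family tree is species overlap: a node whose child subtrees share at least one species is treated as a duplication, and otherwise as a speciation. The sketch below applies that heuristic to a nested-tuple tree. The `species_gene` leaf naming convention and the example family are assumptions, and gene losses are not inferred here.

```python
def species_of(leaf):
    """Extract the species from a leaf named 'species_gene' (assumed convention)."""
    return leaf.split("_")[0]


def infer_events(tree, events=None):
    """Label each internal node of a rooted gene tree as a duplication or a
    speciation by species overlap. Returns (species_in_subtree, events), where
    events lists (kind, sorted_species) in post-order."""
    if events is None:
        events = []
    if isinstance(tree, str):
        return {species_of(tree)}, events
    child_species = [infer_events(child, events)[0] for child in tree]
    seen = set()
    overlap = False
    for species in child_species:
        if seen & species:
            overlap = True  # the same species occurs under two children
        seen |= species
    events.append(("duplication" if overlap else "speciation", sorted(seen)))
    return seen, events


if __name__ == "__main__":
    # Hypothetical family: one duplication before the human/mouse split.
    family = ((("human_g1", "mouse_g1"), ("human_g2", "mouse_g2")), "fly_g")
    for kind, species in infer_events(family)[1]:
        print(kind, species)
```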

Establishment of evolutionary relationships

Traditional single-gene studies are effective in establishing phylogenetic trees among closely related organisms, but have drawbacks when comparing more distantly related organisms or microorganisms. This is because of lateral gene transfer, convergence, and varying rates of evolution for different genes. By using entire genomes in these comparisons, the anomalies created by these factors are overwhelmed by the pattern of evolution indicated by the majority of the data. [11] [12] [13] Through phylogenomics, it has been discovered that most of the photosynthetic eukaryotes are linked and possibly share a single ancestor. Researchers compared 135 genes from 65 different species of photosynthetic organisms. These included plants, alveolates, rhizarians, haptophytes and cryptomonads. [14] This has been referred to as the Plants+HC+SAR megagroup. Using this method, it is theoretically possible to create fully resolved phylogenetic trees, and timing constraints can be recovered more accurately. [15] [16] However, in practice this is not always the case. When the data are insufficient, different methods of analysis can support different trees from the same data. [17]
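One widely used way to pool the signal of many genes is to concatenate their alignments into a single "supermatrix" before inferring a tree. The sketch below shows only the concatenation step, padding taxa that lack a gene with gap characters. The dictionary-based alignment format, the function name, and the toy data are assumptions, and this is not presented as the method of any study cited here.

```python
def concatenate(alignments):
    """Join per-gene alignments (dicts mapping taxon -> aligned sequence) into
    one supermatrix, filling genes missing from a taxon with '-'."""
    taxa = sorted({taxon for alignment in alignments for taxon in alignment})
    supermatrix = {taxon: "" for taxon in taxa}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences within a gene alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += alignment.get(taxon, "-" * length)
    return supermatrix


if __name__ == "__main__":
    # Two hypothetical gene alignments; taxon C lacks gene 1, taxon B lacks gene 2.
    gene1 = {"A": "AC", "B": "AG"}
    gene2 = {"A": "TT", "C": "TA"}
    print(concatenate([gene1, gene2]))
```

Padding with gaps keeps every row the same length, which is what downstream tree-inference programs generally expect, at the cost of introducing missing data.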

Databases

  1. PhylomeDB

References

  1. BioMed Central.
  2. Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K (February 2012). "Statistics and truth in phylogenomics". Molecular Biology and Evolution. 29 (2): 457–472. doi:10.1093/molbev/msr202. PMC 3258035. PMID 21873298.
  3. Pennisi E (June 2008). "Evolution. Building the tree of life, genome by genome". Science. 320 (5884): 1716–1717. doi:10.1126/science.320.5884.1716. PMID 18583591. S2CID 206580993.
  4. Simion P, Delsuc F, Philippe H (2020). "2.1 To What Extent Current Limits of Phylogenomics Can Be Overcome?". Phylogenetics in the Genomic Era. pp. 2.1.1–2.1.34.
  5. Eisen JA, Kaiser D, Myers RM (October 1997). "Gastrogenomic delights: a movable feast". Nature Medicine. 3 (10): 1076–1078. doi:10.1038/nm1097-1076. PMC 3155951. PMID 9334711.
  6. Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, et al. (August 1997). "The complete genome sequence of the gastric pathogen Helicobacter pylori". Nature. 388 (6642): 539–547. doi:10.1038/41483. PMID 9252185.
  7. Eisen JA (March 1998). "Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis". Genome Research. 8 (3): 163–167. doi:10.1101/gr.8.3.163. PMID 9521918.
  8. Brown D, Sjölander K (June 2006). "Functional classification using phylogenomic inference". PLOS Computational Biology. 2 (6): e77. Bibcode:2006PLSCB...2...77B. doi:10.1371/journal.pcbi.0020077. PMC 1484587. PMID 16846248.
  9. Sjölander K (January 2004). "Phylogenomic inference of protein molecular function: advances and challenges". Bioinformatics. 20 (2): 170–179. doi:10.1093/bioinformatics/bth021. PMID 14734307.
  10. Whitaker JW, McConkey GA, Westhead DR (2009). "The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes". Genome Biology. 10 (4): R36. doi:10.1186/gb-2009-10-4-r36. PMC 2688927. PMID 19368726.
  11. Delsuc F, Brinkmann H, Philippe H (May 2005). "Phylogenomics and the reconstruction of the tree of life". Nature Reviews. Genetics. 6 (5): 361–375. CiteSeerX 10.1.1.333.1615. doi:10.1038/nrg1603. PMID 15861208. S2CID 16379422.
  12. Philippe H, Snell EA, Bapteste E, Lopez P, Holland PW, Casane D (September 2004). "Phylogenomics of eukaryotes: impact of missing data on large alignments". Molecular Biology and Evolution. 21 (9): 1740–1752.
  13. Jeffroy O, Brinkmann H, Delsuc F, Philippe H (April 2006). "Phylogenomics: the beginning of incongruence?". Trends in Genetics. 22 (4): 225–231. doi:10.1016/j.tig.2006.02.003. PMID 16490279.
  14. Burki F, Shalchian-Tabrizi K, Pawlowski J (August 2008). "Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes". Biology Letters. 4 (4): 366–369. doi:10.1098/rsbl.2008.0224. PMC 2610160. PMID 18522922.
  15. dos Reis M, Inoue J, Hasegawa M, Asher RJ, Donoghue PC, Yang Z (September 2012). "Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny". Proceedings. Biological Sciences. 279 (1742): 3491–3500. doi:10.1098/rspb.2012.0683. PMC 3396900. PMID 22628470.
  16. Kober KM, Bernardi G (April 2013). "Phylogenomics of strongylocentrotid sea urchins". BMC Evolutionary Biology. 13: 88. doi:10.1186/1471-2148-13-88. PMC 3637829. PMID 23617542.
  17. Philippe H, Delsuc F, Brinkmann H, Lartillot N (2005). "Phylogenomics". Annual Review of Ecology, Evolution, and Systematics. 36: 541–562. doi:10.1146/annurev.ecolsys.35.112202.130205.