Outgroup (cladistics)

A simple cladogram showing the evolutionary relationships between four species: A, B, C, and D. Here, Species A is the outgroup, and Species B, C, and D form the ingroup.

In cladistics or phylogenetics, an outgroup [1] is a more distantly related group of organisms that serves as a reference group when determining the evolutionary relationships of the ingroup, the set of organisms under study. The term is unrelated to the sociological concept of an outgroup. The outgroup is used as a point of comparison for the ingroup and, specifically, allows the phylogeny to be rooted. Because the polarity (direction) of character change can be determined only on a rooted phylogeny, the choice of outgroup is essential for understanding the evolution of traits along a phylogeny. [2]
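
The rooting step can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which an unrooted tree is an adjacency dictionary; the internal node labels `x` and `y` and the function name `root_on_outgroup` are illustrative only and do not come from any phylogenetics library. The root is placed on the branch joining the outgroup to the rest of the tree, which reproduces the four-species cladogram described in the caption above.

```python
def root_on_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to a single outgroup leaf.

    adjacency: dict mapping each node to a list of its neighbours
               (hypothetical representation chosen for this sketch).
    Returns a nested tuple whose first element is the outgroup.
    """
    # A leaf has exactly one neighbour; unpacking fails loudly otherwise.
    (neighbour,) = adjacency[outgroup]

    def subtree(node, parent):
        # Every neighbour except the one we arrived from is a child.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # leaf: return the taxon name
        return tuple(subtree(child, node) for child in children)

    return (outgroup, subtree(neighbour, outgroup))


# Unrooted four-taxon tree: A and B attach to internal node x, C and D to y.
unrooted = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"], "y": ["C", "D", "x"],
}

rooted = root_on_outgroup(unrooted, "A")
print(rooted)  # -> ('A', ('B', ('C', 'D')))
```

Rooting the same unrooted tree on a different taxon, for instance `root_on_outgroup(unrooted, "C")`, yields a different rooted tree, which is why the inferred direction of character change depends on the outgroup chosen.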

History

Although the concept of outgroups has been in use from the earliest days of cladistics, the term "outgroup" is thought to have been coined in the early 1970s at the American Museum of Natural History. [3] Prior to the advent of the term, various other terms were used by evolutionary biologists, including "exgroup", "related group", and "outside groups". [3]

Choice of outgroup

The chosen outgroup is hypothesized to be less closely related to the ingroup than the members of the ingroup are to one another. The evolutionary conclusion from these relationships is that the outgroup shares a common ancestor with the ingroup that is older than the most recent common ancestor of the ingroup itself. Choice of outgroup can change the topology of a phylogeny. [4] Therefore, phylogeneticists typically use more than one outgroup in cladistic analysis. The use of multiple outgroups is preferable because it provides a more robust phylogeny, buffering against poor outgroup candidates and testing the ingroup's hypothesized monophyly. [3] [5] [6]
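
The monophyly test mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an established implementation: rooted trees are written as nested tuples, and the taxon names `Out1` and `Out2` and the helper names `leaf_set` and `is_monophyletic` are hypothetical. With a single outgroup the ingroup is monophyletic by construction once the tree is rooted on that outgroup, so the check only becomes informative when a second outgroup is free to fall inside or outside the ingroup.

```python
def leaf_set(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, taxa):
    """True if some subtree contains exactly the given taxa and nothing else."""
    taxa = frozenset(taxa)
    if leaf_set(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)


ingroup = {"B", "C", "D"}

# Both outgroups fall outside the ingroup: monophyly is supported.
supported = ("Out1", ("Out2", ("B", ("C", "D"))))

# The second outgroup nests with B: ingroup monophyly is rejected.
rejected = ("Out1", (("Out2", "B"), ("C", "D")))

print(is_monophyletic(supported, ingroup))  # -> True
print(is_monophyletic(rejected, ingroup))   # -> False
```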

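To qualify as an outgroup, a taxon must satisfy the following two characteristics:

  1. It must not be a member of the ingroup.
  2. It must be related to the ingroup closely enough for meaningful comparisons with the ingroup.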

Therefore, an appropriate outgroup must be unambiguously outside the clade of interest in the phylogenetic study. An outgroup that is nested within the ingroup will, when used to root the phylogeny, result in incorrect conclusions about phylogenetic relationships and trait evolution. [7]

However, the optimal level of relatedness of the outgroup to the ingroup depends on the depth of phylogenetic analysis. Choosing an outgroup that is closely related to the ingroup is more useful when looking at subtle differences, while choosing an unduly distant outgroup can result in mistaking convergent evolution for a direct evolutionary relationship due to a common ancestor. [8] [9] For shallow phylogenetics (for example, resolving the evolutionary relationships of a clade within a genus), an appropriate outgroup would be a member of the sister clade. [10] For deeper phylogenetic analysis, less closely related taxa can be used. For example, Jarvis et al. (2014) used humans and crocodiles as outgroups while resolving the early branches of the avian phylogeny. [11]

In molecular phylogenetics, satisfying the second requirement typically means that DNA or protein sequences from the outgroup can be successfully aligned to sequences from the ingroup. Although there are algorithmic approaches to identify the outgroups with maximum global parsimony, they are often limited by failing to reflect the continuous, quantitative nature of certain character states. [12] Character states are traits, either ancestral or derived, that affect the construction of branching patterns in a phylogenetic tree. [13]

Examples

Ingroup                  Outgroup
Great apes [14]          Gibbons
Placental mammals [15]   Marsupials
Chordates [16]           Echinoderms
Angiosperms [17]         Gymnosperms

In each example, a phylogeny of organisms in the ingroup may be rooted by scoring the same character states for one or more members of the outgroup.
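
Scoring the same characters in the outgroup is what allows character change to be polarized. The Python sketch below shows a deliberately simple form of outgroup comparison: the state observed in the outgroup is taken as ancestral, and any other state in the ingroup is treated as derived. The character matrix, the taxon names, and the function names `polarize` and `derived_taxa` are invented for illustration, and the rule used when outgroups disagree (leave the character unpolarized) is an assumption of this sketch rather than a method described in the cited literature.

```python
def polarize(matrix, outgroups):
    """Infer the ancestral state of each character from the outgroup taxa.

    matrix: dict mapping taxon name to a string of character states.
    Returns a list with one ancestral state per character, or None where
    the outgroups disagree (an assumption made for this sketch).
    """
    n_chars = len(next(iter(matrix.values())))
    ancestral = []
    for i in range(n_chars):
        states = {matrix[taxon][i] for taxon in outgroups}
        ancestral.append(states.pop() if len(states) == 1 else None)
    return ancestral


def derived_taxa(matrix, outgroups):
    """For each polarized character, list ingroup taxa with a derived state."""
    result = {}
    for i, state in enumerate(polarize(matrix, outgroups)):
        if state is None:
            continue  # outgroups conflict: polarity left undetermined
        result[i] = sorted(
            taxon for taxon, row in matrix.items()
            if taxon not in outgroups and row[i] != state
        )
    return result


# Hypothetical binary matrix: three characters scored for four species.
matrix = {"A": "000", "B": "100", "C": "110", "D": "111"}

print(polarize(matrix, ["A"]))      # -> ['0', '0', '0']
print(derived_taxa(matrix, ["A"]))  # -> {0: ['B', 'C', 'D'], 1: ['C', 'D'], 2: ['D']}
```

In this invented matrix the derived states are nested: character 0 is shared by B, C, and D, character 1 by C and D, and character 2 is unique to D, which is consistent with a tree in which C and D are each other's closest relatives within the ingroup.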

References

  1. Grimaldi, David; Engel, Michael S. (2005-05-16). Evolution of the Insects. Cambridge University Press. ISBN 9780521821490.
  2. Farris, J. S. (1982). "Outgroups and Parsimony". Systematic Biology. 31 (3): 328–334. doi:10.1093/sysbio/31.3.328. ISSN 1063-5157.
  3. Nixon, Kevin; Carpenter, James (December 1993). "On Outgroups". Cladistics. 9 (4): 413–426. doi:10.1111/j.1096-0031.1993.tb00234.x. S2CID 221577454.
  4. Giribet, G.; Ribera, C. (June 1998). "The position of arthropods in the animal kingdom: a search for a reliable outgroup for internal arthropod phylogeny". Molecular Phylogenetics and Evolution. 9 (3): 481–488. doi:10.1006/mpev.1998.0494. PMID 9667996.
  5. Barriel, V.; Tassy, P. (June 1998). "Rooting with Multiple Outgroups: Consensus Versus Parsimony". Cladistics. 14 (2): 193–200. doi:10.1111/j.1096-0031.1998.tb00332.x. S2CID 84759858.
  6. de la Torre-Barcena, Jose Eduardo; Kolokotronis, S.O.; Lee, Ernest; Stevenson, Dennis; Brenner, Eric; Katari, Manpreet; Coruzzi, Gloria; DeSalle, Rob (2009). "The Impact of Outgroup Choice and Missing Data on Major Seed Plant Phylogenetics Using Genome-Wide EST Data". PLOS ONE. 4 (6): e5764. Bibcode:2009PLoSO...4.5764D. doi:10.1371/journal.pone.0005764. PMC 2685480. PMID 19503618.
  7. Maddison, Wayne; et al. (1984). "Outgroup Analysis and Parsimony" (PDF). Systematic Zoology. 33 (1): 83–103. doi:10.2307/2413134. JSTOR 2413134.
  8. Wilberg, Eric W. (2015-07-01). "What's in an Outgroup? The Impact of Outgroup Choice on the Phylogenetic Position of Thalattosuchia (Crocodylomorpha) and the Origin of Crocodyliformes". Systematic Biology. 64 (4): 621–637. doi:10.1093/sysbio/syv020. ISSN 1063-5157. PMID 25840332.
  9. O'Brien, Michael J.; Lyman, R. Lee; Saab, Youssef; Saab, Elias; Darwent, John; Glover, Daniel S. (2002). "Two Issues in Archaeological Phylogenetics: Taxon Construction and Outgroup Selection". Journal of Theoretical Biology. 215 (2): 133–150. doi:10.1006/jtbi.2002.2548. PMID 12051970.
  10. Baum, David A.; Smith, Stacey D. (2013). Tree Thinking: An Introduction to Phylogenetic Biology. Roberts. p. 175. ISBN 978-1-936221-16-5.
  11. Jarvis, E.; et al. (December 2014). "Whole-genome analyses resolve early branches in the tree of life of modern birds". Science. 346 (6215): 1320–1331. Bibcode:2014Sci...346.1320J. doi:10.1126/science.1253451. PMC 4405904. PMID 25504713.
  12. Stevens, P. F. (1991). "Character States, Morphological Variation, and Phylogenetic Analysis: A Review". Systematic Botany. 16 (3): 553–583. doi:10.2307/2419343. JSTOR 2419343.
  13. Rineau, Valentin; Grand, Anaïs; Zaragüeta, René; Laurin, Michel (May 1, 2015). "Experimental systematics: sensitivity of cladistic methods to polarization and character ordering schemes". Contributions to Zoology. 84 (2): 129–148. doi:10.1163/18759866-08402003.
  14. Prado-Martinez, Javier; Marques-Bonet, Tomas (2013). "Great ape genetic diversity and population history". Nature. 499 (7459): 471–475. Bibcode:2013Natur.499..471P. doi:10.1038/nature12228. PMC 3822165. PMID 23823723.
  15. Murphy, William; Pringle, Thomas; Crider, Tess; Springer, Mark; Miller, Webb (2007). "Using genomic data to unravel the root of the placental mammal phylogeny". Genome Research. 17 (4): 413–421. doi:10.1101/gr.5918807. PMC 1832088. PMID 17322288.
  16. Cameron, Chris; Garey, James; Swalla, Billie (2000). "Evolution of the chordate body plan: New insights from phylogenetic analyses of deuterostome phyla". PNAS. 97 (9): 4469–4474. doi:10.1073/pnas.97.9.4469. PMC 18258. PMID 10781046.
  17. Matthews, Sarah; Donoghue, Michael (1999). "The Root of Angiosperm Phylogeny Inferred from Duplicate Phytochrome Genes". Science. 286 (5441): 947–950. doi:10.1126/science.286.5441.947. PMID 10542147.