Long branch attraction

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. [1] LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to cause that lineage to appear similar (thus closely related) to another long-branched lineage, solely because they have both undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated, [1] [2] [3] and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference. [4] Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.

Causes

LBA was first recognized as problematic when analyzing discrete morphological character sets under parsimony criteria; however, maximum likelihood analyses of DNA or protein sequences are also susceptible. A simple hypothetical example can be found in Felsenstein (1978), where it is demonstrated that for certain unknown "true" trees, some methods can show bias for grouping long branches, ultimately resulting in the inference of a false sister relationship. [5] Often this is because convergent evolution of one or more characters included in the analysis has occurred in multiple taxa. Although they were derived independently, these shared traits can be misinterpreted in the analysis as being shared due to common ancestry.

In phylogenetic and clustering analyses, LBA is a result of the way clustering algorithms work: terminals or taxa with many autapomorphies (character states unique to a single branch) may by chance exhibit the same states as those on another branch (homoplasy). A phylogenetic analysis will group these taxa together as a clade unless other synapomorphies outweigh the homoplastic features to group together true sister taxa.

These problems may be minimized by using methods that correct for multiple substitutions at the same site, by adding taxa related to those with long branches (which contributes additional true synapomorphies to the data), or by using alternative, slower-evolving traits (e.g. more conservative gene regions).
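As a minimal sketch of what "correcting for multiple substitutions at the same site" can look like, the snippet below applies the Jukes-Cantor (JC69) distance correction to an observed proportion of differing sites. The choice of JC69 and the function name `jc69_distance` are assumptions made for illustration; the text above does not name a specific model.

```python
import math


def jc69_distance(p_observed: float) -> float:
    """Estimate substitutions per site from the observed proportion of
    differing sites, assuming the Jukes-Cantor (JC69) model.

    The correction accounts for sites that changed more than once, so the
    estimate is always at least as large as the raw proportion.
    """
    if not 0.0 <= p_observed < 0.75:
        # Under JC69 the expected proportion of differences saturates at 0.75.
        raise ValueError("JC69 correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_observed)


if __name__ == "__main__":
    for p in (0.05, 0.30, 0.60):
        print(f"observed p = {p:.2f}  corrected d = {jc69_distance(p):.3f}")
```

The gap between the raw and corrected values widens as divergence grows, which is why uncorrected comparisons tend to understate the amount of change on long branches.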

Results

The result of LBA in evolutionary analyses is that rapidly evolving lineages may be inferred to be sister taxa, regardless of their true relationships. For example, in DNA sequence-based analyses, the problem arises when sequences from two (or more) lineages evolve rapidly. There are only four possible nucleotides and when DNA substitution rates are high, the probability that two lineages will evolve the same nucleotide at the same site increases. When this happens, a phylogenetic analysis may erroneously interpret this homoplasy as a synapomorphy (i.e., evolving once in the common ancestor of the two lineages).
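The effect of a four-letter alphabet can be illustrated numerically. The sketch below assumes the Jukes-Cantor (JC69) model, which is not specified in the text, and computes the probability that two lineages starting from the same ancestral nucleotide independently end up with the same non-ancestral nucleotide at a site. The function name and the branch-length parameter `t` (expected substitutions per site on each lineage) are illustrative.

```python
import math


def same_derived_state_probability(t: float) -> float:
    """Probability that two independent lineages, each of length t
    (expected substitutions per site), both end in the same nucleotide
    that differs from their shared ancestral state, assuming JC69.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # JC69 probability of ending in one specific non-ancestral nucleotide.
    p_specific_change = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    # Three possible non-ancestral nucleotides; both lineages must match.
    return 3.0 * p_specific_change ** 2


if __name__ == "__main__":
    for t in (0.05, 0.5, 2.0):
        print(f"t = {t:<4}  P(same derived state) = "
              f"{same_derived_state_probability(t):.4f}")
```

Under these assumptions the probability rises from zero for very short branches toward a ceiling of 3/16 for very long ones, so chance matches become common exactly where branches are long.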

The opposite effect may also be observed, in that if two (or more) branches exhibit particularly slow evolution among a wider, fast evolving group, those branches may be misinterpreted as closely related. As such, "long branch attraction" can in some ways be better expressed as "branch length attraction". However, it is typically long branches that exhibit attraction.

The recognition of long-branch attraction implies that there is some other evidence that suggests that the phylogeny is incorrect. For example, two different sources of data (e.g. molecular and morphological) or even different methods or partition schemes might support different placements for the long-branched groups. [6] Hennig's Auxiliary Principle suggests that synapomorphies should be viewed as de facto evidence of grouping unless there is specific contrary evidence (Hennig, 1966; Schuh and Brower, 2009).

A simple and effective method for determining whether or not long branch attraction is affecting tree topology is the SAW method, named for Siddall and Whiting. If long branch attraction is suspected between a pair of taxa (A and B), simply remove taxon A ("saw" off the branch) and re-run the analysis. Then remove B and replace A, running the analysis again. If either of the taxa appears at a different branch point in the absence of the other, there is evidence of long branch attraction. Since long branches cannot attract one another when only one is in the analysis, consistent taxon placement between treatments would indicate long branch attraction is not a problem. [7]

Example

Figure: An example of long branch attraction. On this "true tree", branches leading to A and C might be expected to have a higher number of character state transformations than the internal branch or branches leading to B and D.

Assume for simplicity that we are considering a single binary character (it can be either + or -) distributed on the unrooted "true tree" shown in the figure, with branch lengths proportional to the amount of character state change. Because the evolutionary distance from B to D is small, we assume that in the vast majority of all cases, B and D will exhibit the same character state. Here, we will assume that they are both + (+ and - are assigned arbitrarily and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one taxon differs from the rest, and we cannot learn anything, as all trees have the same length. Similarly, A can be - and C can be +. The only remaining possibility is that A and C are both -. In this case, however, we view either A and C, or B and D, as a group with respect to the other (one character state is ancestral, the other is derived, and the ancestral state does not define a group). As a consequence, when we have a "true tree" of this type, the more data we collect (i.e. the more characters we study), the more of them are homoplastic and support the wrong tree. [8] Of course, when dealing with empirical data in phylogenetic studies of actual organisms, we never know the topology of the true tree, and the more parsimonious (AC) or (BD) might well be the correct hypothesis.
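The argument above can be checked with a small exact calculation. The sketch below is illustrative only and rests on assumptions not stated in the text: the true unrooted tree is taken to be ((A,B),(C,D)), the character follows a symmetric two-state model, and each branch has a fixed probability of ending in a different state from the one it started in (`p_long` for the branches to A and C, `p_short` for the branches to B and D and for the internal branch). It returns the probability that a character shows each of the three patterns that parsimony treats as informative.

```python
from itertools import product


def split_pattern_probabilities(p_long: float, p_short: float) -> dict:
    """Exact probabilities of the three two-versus-two character patterns
    on an assumed true tree ((A,B),(C,D)) with long branches to A and C.
    """
    def edge(start, end, p_change):
        # Probability of observing `end` at one end of a branch given `start`.
        return p_change if start != end else 1.0 - p_change

    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        # Sum over the unobserved states of the two internal nodes u and v;
        # u joins A and B, v joins C and D, and u starts in either state
        # with probability 0.5.
        site_prob = sum(
            0.5
            * edge(u, a, p_long) * edge(u, b, p_short)
            * edge(u, v, p_short)
            * edge(v, c, p_long) * edge(v, d, p_short)
            for u, v in product((0, 1), repeat=2)
        )
        if a == b and c == d and a != c:
            probs["AB|CD"] += site_prob  # supports the assumed true tree
        elif a == c and b == d and a != b:
            probs["AC|BD"] += site_prob  # groups the two long branches
        elif a == d and b == c and a != b:
            probs["AD|BC"] += site_prob
    return probs


if __name__ == "__main__":
    print(split_pattern_probabilities(p_long=0.4, p_short=0.05))
    print(split_pattern_probabilities(p_long=0.1, p_short=0.1))
```

With very unequal branches (for example `p_long=0.4`, `p_short=0.05`), the pattern grouping A with C is more probable than the pattern supporting the assumed true tree, so adding characters strengthens the wrong grouping under parsimony. With equal branches the pattern supporting the true tree is the most probable one.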

References

  1. Bergsten, Johannes (2005-04-01). "A review of long-branch attraction". Cladistics. 21 (2): 163–193. doi:10.1111/j.1096-0031.2005.00059.x. ISSN 1096-0031. PMID 34892859. S2CID 55273819.
  2. Anderson, F. E.; Swofford, D. L. (2004). "Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA". Molecular Phylogenetics and Evolution. 33 (2): 440–451.
  3. Huelsenbeck, J. P. (1997). "Is the Felsenstein zone a fly trap?". Systematic Biology. 46 (1): 69–74.
  4. Brower, A. V. Z. (2017). "Statistical consistency and phylogenetic inference: a brief review". Cladistics. 34 (5): 562–567. doi:10.1111/cla.12216.
  5. Felsenstein, J. (1978). "Cases in which parsimony or compatibility methods will be positively misleading". Systematic Biology. 27 (4): 401–410.
  6. Coiro, Mario; Chomicki, Guillaume; Doyle, James A. (August 2018). "Experimental signal dissection and method sensitivity analyses reaffirm the potential of fossils and morphology in the resolution of the relationship of angiosperms and Gnetales". Paleobiology. 44 (3): 490–510. doi:10.1017/pab.2018.23. ISSN 0094-8373. S2CID 91488394.
  7. Siddall, M. E.; Whiting, M. F. (1999). "Long-Branch Abstractions". Cladistics. 15: 9–24. doi:10.1111/j.1096-0031.1999.tb00391.x. S2CID 67853737.
  8. Huelsenbeck, J. P.; Hillis, D. M. (1993). "Success of phylogenetic methods in the four-taxon case". Systematic Biology. 42: 247–264.