The deep(er) roots of Eukaryotes and Akaryotes

[version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]
PUBLISHED 13 Feb 2020

Abstract

Background: Locating the root node of the “tree of life” (ToL) is one of the hardest problems in phylogenetics. The root-node or the universal common ancestor (UCA) divides descendants into organismal domains. Two notable variants of the two-domains ToL (2D-ToL) have gained support recently. One 2D-ToL posits that eukaryotes (organisms with nuclei) and akaryotes (organisms without nuclei) are sister clades that diverged from the UCA and that Asgard archaea are sister to other archaea, whereas the other proposes that eukaryotes emerged within archaea and places Asgard archaea sister to eukaryotes. Williams et al. (Nature Ecol. Evol. 4: 138–147; 2020) re-evaluated the data and methods that support the competing two-domains proposals and concluded that eukaryotes are the closest relatives of Asgard archaea.
Critique: We argue that important aspects of estimating evolutionary relatedness and assessing phylogenetic signal in empirical data were overlooked. We focus on the phylogenetic character reconstructions necessary to describe the UCA or its closest descendants in the absence of reliable fossils. It is well known that different character types present different perspectives on evolutionary history that relate to different phylogenetic depths. Which 2D-ToL is better supported depends on which kind of molecular feature, protein domains or their component amino acids, is better for resolving common ancestors at the roots of clades. In practice, this involves reconstructing the character compositions of ancestral nodes all the way back to the UCA. We believe the criticisms of the akaryote 2D-ToL focus on superficial aspects of the data and reflect common misunderstandings of phylogenetic reconstructions using protein domains (folds).
Clarifications: Models of protein domain evolution support more reliable phylogenetic reconstructions. In contrast, even the best available amino acid substitution models fail to resolve the archaeal radiation, despite employing thousands of genes. Therefore, the primary domains Eukaryotes and Akaryotes are better supported in a 2D-ToL.

Keywords

Asgard archaea, 2D, tree of life, LUCA, phylogenomics, nonstationary, rooting, eukaryogenesis

Background

Models of character evolution are essential to determine the evolutionary relationships of organisms. Phylogenetic models that employ protein structural domains as characters place Asgards as sister to other archaea (Figure 1a), and archaea sister to bacteria in the “tree of life” (ToL)13. In contrast, several analyses that employ amino acids as characters fail to resolve the archaeal radiation (Figure 1b) or to identify a distinct ancestor of archaea4. Conflicts between studies that employ different character types are often due to incompatible assumptions about the processes of character evolution5–7. In a recent study, Williams et al.4 compared the performance of several character-evolution models to evaluate which of the ToL hypotheses is better supported. The authors tested several substitution models for amino acid characters using empirical data, but tested models for protein-domain characters using simulated data4.


Figure 1. Different 2D “tree of life” (2D-ToL) variants supported by different types of molecular characters using the best-fitting probability models1,4.

(a) The rooted tree (phylogeny) inferred by estimating the evolution of species-specific changes in protein domain composition. Directional character-evolution models place the root between eukaryotes and akaryotes. Named groups of organisms, including Asgardarchaeota, are resolved as clades (i.e. each group descends from a single ancestor). The Asgard archaea are sister to all other archaea, with euryarchaea being the closest relatives. The phylogeny shown is a condensed form obtained after collapsing the clades of the full tree shown previously1. (b) The unrooted tree inferred by estimating the evolution of protein/gene-specific changes in amino acid composition. The unrooted tree is the same as that in Figure S8d in the article by Williams et al.4. Neither archaea as a whole nor Asgard archaea are resolved, and a distinct archaeal ancestor is absent. Time-reversible character-evolution models also cannot identify the root, that is, the universal common ancestor (UCA). Alternative rootings polarize the branching order in opposite directions, implying incompatible relationships among the major organismal clades. Regardless of the rooting, neither Asgard archaea nor archaea as a whole can be resolved as a monophyletic group. Further, Asgards do not share a unique common ancestor with other archaea. Even the best-fitting amino acid evolution models cannot resolve the archaeal radiation despite employing thousands of genes4. The poor resolution of archaea is seen in virtually all trees, with or without inclusion of the long branches of bacteria. In such ambiguous cases, “character polarization” as in (a) is likely to be more effective than the more commonly used “graphical polarization” of unrooted trees. Clade support is indicated for key groups as (a) Bayesian posterior probability and (b) bootstrap percentage.

The authors present a comprehensive analysis of protein sequence data and lucid arguments about the fit of amino acid substitution models to the relevant datasets. However, the protein domain characters (which they refer to as protein folds) and the relevant published analyses were either not adequately explained (see references 2 and 3) or overlooked (see references 1 and 8). Further, based on simple frequency distributions, they suspect that the identification of the UCA and of character compositions at the root node could be biased. Such simple frequency distributions can be misleading. Careful and rigorous analyses of empirical datasets1–3,8 that demonstrate the robustness of the rooting and tree topology against many potential biases were ignored4. Here, we clarify certain aspects of the published protein domain-based phylogenies to avoid further misunderstandings and to highlight their advantages for phylogenetic reconstruction.

Williams et al.4 rely on (i) simulated data to reject a robust phylogeny inferred from empirical data (Figure 1a) that supports the evolutionary kinship of eukaryotes and akaryotes (akaryote 2D-ToL)13; and (ii) the so-called bacterial rooting to interpret a partially resolved, unrooted-ToL (Figure 1b), asserting that Asgard archaea are the closest relatives of eukaryotes (eocyte 2D-ToL)4. Both assertions are questionable, since (i) simulated data neither reproduce nor represent empirical distributions, and (ii) poorly resolved trees obscure evolutionary relationships. We argue that Williams et al.4 have overlooked important aspects of assessing phylogenetic signal in empirical data, and that it may be premature to reject a well-supported phylogeny13 based on simulated data4.

Which molecular feature is a better phylogenetic character?

Because biochemical redundancy makes amino acid replacements reversible, character polarity is ambiguous, and so are the inferred character compositions of ancestral nodes. This has been a sticking point for locating a distinct archaeal common ancestor (CA) that would resolve the archaeal radiation. The problem routinely appears as the conspicuous absence of both the archaeal CA and the universal CA (UCA) in unrooted trees (e.g. Figure 1b) inferred using time-reversible models of character evolution4,9,10. Without a distinct node uniting the archaeal branches, the archaea remain unresolved, whereas eukaryotes and bacteria are resolved, with discernible CA nodes.

Protein structural domains, unlike amino acids, are biochemically non-redundant (see below) and have proven to be excellent “genomic characters”1,2 that support a robust akaryote 2D-ToL (Figure 1a). Though undervalued, they afford many conceptual and technical advantages over amino acids for reliable phylogenetic modeling1,7,11 and estimating ancestral compositions2,3,12:

  • Unlike amino acids, structural domains do not substitute for one another, since each domain defines a distinctive biochemical function1 (Figure 2a).

  • The natural bias in gain/loss rates, arising from the difficulty of parallel gains and the relative ease of parallel losses, is useful for implementing directional (rooted) character-evolution models3,12,13.
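To make the gain/loss asymmetry concrete, the following Python sketch uses a generic two-state continuous-time Markov chain (domain absent/present) with separate gain and loss rates. It is an illustrative stand-in, not the KVR model or any published implementation, and the rate values are hypothetical. It shows that when losses are faster than gains, independent (parallel) losses on two sister branches are more probable than parallel gains by the square of the loss/gain rate ratio.

```python
import math

def gain_loss_probs(t, gain_rate, loss_rate):
    """Return (P(absent -> present), P(present -> absent)) after branch length t
    for a two-state continuous-time Markov chain with asymmetric rates."""
    total = gain_rate + loss_rate
    saturation = 1.0 - math.exp(-total * t)
    p_gain = gain_rate / total * saturation
    p_loss = loss_rate / total * saturation
    return p_gain, p_loss

# Hypothetical rates: losses ten times more frequent than gains.
p_gain, p_loss = gain_loss_probs(t=0.5, gain_rate=0.1, loss_rate=1.0)

# Independent (parallel) events on two sister branches of equal length.
parallel_gain = p_gain ** 2
parallel_loss = p_loss ** 2
print(f"parallel gain: {parallel_gain:.4f}, parallel loss: {parallel_loss:.4f}")
# The ratio equals (loss_rate / gain_rate) ** 2 for any branch length t.
print(f"ratio loss/gain: {parallel_loss / parallel_gain:.1f}")
```

Because the branch-length term cancels in the ratio, the bias toward parallel losses holds for short and long branches alike under these assumptions.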


Figure 2. Compositions of unique protein-domains identify with organismal families whereas amino acid compositions of individual domains relate to gene families.

(a) Protein domains are considered to be independent evolutionary units, each with a distinct tertiary fold, amino acid sequence and biochemical function. The majority of proteins are multi-domain proteins formed by duplication and recombination of domain units. Covariation of protein-domain composition among the 125 species sampled by Williams et al.4 (top) was compared by principal component analysis (PCA). Each circle in the PCA projection (top left) is a distinct species, defined by a species-specific domain cohort. Asgards are highlighted as filled circles. The frequency distribution (top right) shows the number of distinct protein domains per species. Vertical lines in the histograms mark the median numbers of protein domains. Protein domain composition is characteristic of clades of species (top left). In contrast, covariation of amino acid composition (bottom) in a single-domain (super)family is not clade-specific but gene family-specific. Multiple sequence alignments of a single domain (c.37.1) shared by 5/50 concatenated orthologous gene families from 125 species were sampled for the PCA projection. (b) The effect of severely perturbing the domain composition on recovering clade-specific distributions was tested in a sample of 141 species. Although it is commonly suspected that the rooting between akaryotes and eukaryotes could be biased by the larger domain cohort in eukaryotes4, this is not the case2,3,12. Diversity of clade-specific domain composition (top right), measured simply as the number of protein domains4, is a poor descriptor of heterogeneity and can be misleading. Clades are grouped by covarying “protein-domain types”, not by numbers alone. The rooting is stable and the tree topology is virtually identical even after reducing the eukaryote cohort by one-third (middle) or two-thirds (bottom)8 of its original composition2. Descriptions of the PCA projections and frequencies are the same as in (a).

A key advantage of non-redundant characters is that estimating ancestral compositions and evolutionary paths of individual characters is much less ambiguous. In addition to identifying the root nodes, an added benefit of the built-in directionality is that mutually exclusive evolutionary fates of individual features – inheritance, loss or transfer – can be resolved efficiently using directional-evolution models1,8,13. Harish et al.13 demonstrated that difficult phylogenetic problems can be resolved efficiently by employing protein domain characters and directional evolution models.

To be clear, unrooted trees are not phylogenies per se, since the absence of root ancestor(s) obscures ancestor-descendant polarity and phylogenetic relatedness14,15. Identifying the closest relatives of extant groups amounts to determining which groups share the most recent common ancestor, and that requires a root; time-reversible models and unrooted trees therefore remain ineffective tools for this purpose (Figure 1b). Thus, regardless of the gene-aggregation and tree-reconciliation method used to estimate a consensus unrooted tree4, the location of the archaeal CA or UCA remains ambiguous (Figure 1b). Support from fossils or other sources is not reliable, despite claims to the contrary4. Likewise, inferring the origins of single domains or single genes by estimating amino acid (or nucleotide) compositions also remains ambiguous (reviewed in refs 1,6,7). A sobering conclusion is that some datasets and models may be of little use for resolving questions of deep-time evolution.

Will more complex models minimize uncertainties?

Williams et al.4 argue that (i) directional-evolution models12,13 may be unsuitable to predict the unique origin of homologous protein domains; and (ii) the akaryote 2D-ToL13 is an unsatisfactory explanation of the evolution of clade-specific compositions of protein domains (Figure 2). Their arguments seem to imply that phylogenetic signal can be recovered only by modeling evolution of amino acid composition. However, the fact that even the best-fitting substitution models are inadequate4, despite ever increasing model complexity to resolve conflicting signals (Figure 1b), suggests that different protein domain-families may require different but incompatible substitution models (Figure 2a). Further, such incompatibilities are likely to make estimating the absolute origins of single-domain families and single genes difficult, since a majority of genes are formed by duplication and recombination of distinct domains1. As a result, distinguishing between gene duplication and horizontal gene transfer, as well as quantifying the extent of duplications and transfers using primary sequences, is highly ambiguous13.

The KVR13 model for protein domain data1,2 is an extension of the Markov k-states (Mk) model16, a generic probability model for discrete-state characters. A variant with k ≥ 20 states is suitable for modeling the evolution of amino acids or of copy numbers of gene or protein domain families. Whereas time-reversible variants produce unrooted trees, directional models such as KVR consistently recover a 2D phylogeny (Figure 1a) in which akaryotes are the closest relatives of eukaryotes1,2,8. The KVR model assumes that the root ancestor has a different character composition from the rest of the tree, so the modeled process is essentially irreversible and acyclic. This is fully consistent with the idea that, on a grand scale, the “tree of life” describes broad generalizations of singular events and major transitions underlying striking differences between sister clades. Since parallel evolution of homologous protein domains or of distinct domain permutations is very rare, the KVR model adequately captures the evolution of unique features.
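Why the root composition matters can be sketched with the symmetric Mk model on the simplest possible tree: two taxa joined by a path of length d, with the root placed at distance x from taxon A. This is a hypothetical Python illustration, not the KVR implementation; `mk_prob` and `two_taxon_likelihood` are names introduced here. When the root frequencies equal the Mk stationary (uniform) distribution, the likelihood does not depend on where the root sits, which is why reversible models yield unrooted trees. When the root composition differs from the rest of the tree, as the KVR model assumes, root placement changes the likelihood.

```python
import math

def mk_prob(i, j, t, k, beta=1.0):
    """P_ij(t) under the symmetric Mk model: every pair of distinct states
    exchanges at the same instantaneous rate beta."""
    decay = math.exp(-k * beta * t)
    if i == j:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k

def two_taxon_likelihood(a, b, x, d, root_freqs, beta=1.0):
    """Likelihood of states a (taxon A) and b (taxon B) when the root sits at
    distance x from A on a path of total length d; root_freqs is the assumed
    character composition of the root ancestor."""
    k = len(root_freqs)
    return sum(
        root_freqs[r] * mk_prob(r, a, x, k, beta) * mk_prob(r, b, d - x, k, beta)
        for r in range(k)
    )

uniform = [0.5, 0.5]  # equals the Mk stationary distribution
skewed = [0.9, 0.1]   # hypothetical root composition differing from it
for x in (0.1, 0.5, 0.9):
    print(x,
          round(two_taxon_likelihood(0, 1, x, 1.0, uniform), 4),
          round(two_taxon_likelihood(0, 1, x, 1.0, skewed), 4))
```

Under the uniform root composition, the three likelihoods are identical (Felsenstein's "pulley principle" for reversible models). Under the skewed composition, placing the root nearer taxon A, whose state matches the more frequent root state, gives a higher likelihood, so the data carry information about the root position.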

The assumptions of the KVR model are also consistent with the idea that the idiosyncratic compositions of homologous protein domains (Figure 2) are characteristic of clades13. In contrast, amino acid compositions in single-domain families are not (Figure 2a). That is, patterns of covariation of species-specific protein-domain compositions clearly distinguish eukaryotes from akaryotes (and archaebacteria from eubacteria). The systematic covariation of homologous domains among the clades is best explained as a phylogenetic effect. Consequently, the akaryote 2D-ToL (Figure 1a) was consistently recovered with robust support for the major clades, regardless of the taxonomic and protein domain diversity sampled (Figure 2b) and of the model complexity1–3,8,12. By contrast, patterns of amino acid covariation do not discriminate among organismal families, although they efficiently identify gene families.

The KVR model is an optimal explanation of the evolution of the clade-specific composition of homologous features. Complex variants of the KVR model that account for rate variation among both characters and branches also consistently recovered the akaryote 2D-ToL (Figure 1a), despite significantly different model fits1. More complex models are available, such as the no-common-mechanism model17, an extremely parameter-rich model that allows each character to have its own rate, branch length and topology parameters. Even more complex models can be implemented, which assume that the tempo and mode of evolution change at each internal node along the phylogeny4. However, such over-specified models may not be optimal for generalizing the evolutionary process and may over-fit observed patterns, which is a form of model misspecification. For instance, empirical datasets are limited to a finite set of homologous protein domains that range between 2,000 and 10,000 characters, depending on the protein structure classification scheme1. By contrast, Williams et al.4 used 1,000,000 characters in their simulations to estimate the fit between the simulated data and over-complex models4. That said, it remains to be seen whether more complex models perform better with empirical datasets.
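The over-fitting concern can be illustrated with a simple parameter count. The Python sketch below assumes, for illustration only, that each character receives its own branch lengths on one fixed, fully resolved unrooted topology (2n − 3 branches for n taxa); it ignores the additional per-character topology and rate parameters mentioned above, so it understates the parameter load. The helper names are hypothetical.

```python
def unrooted_branch_count(n_taxa):
    """Number of branches in a fully resolved unrooted tree with n_taxa tips."""
    return 2 * n_taxa - 3

def ncm_branch_parameters(n_taxa, n_characters):
    """Branch-length parameters if every character gets its own branch lengths
    on one fixed topology (topology and rate parameters are not counted)."""
    return n_characters * unrooted_branch_count(n_taxa)

n_taxa = 125  # number of species sampled by Williams et al.
for n_chars in (2_000, 10_000):
    params = ncm_branch_parameters(n_taxa, n_chars)
    cells = n_taxa * n_chars  # observed character states in the data matrix
    print(n_chars, params, cells, round(params / cells, 2))
```

Under these assumptions, each character contributes 125 observed states but 247 branch-length parameters, so parameters outnumber observations by roughly two to one whether 2,000 or 10,000 characters are analyzed.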

Data and methods

Data sources

Proteome sequences (predicted protein cohorts from genome sequences) were obtained from recently published studies4,8. Homologous protein structural domains were identified using the homology assignment tools provided by the SUPERFAMILY database as in previous studies13. Briefly, each proteome was queried against the hidden Markov model (HMM) library of homologous protein-domains defined at the Superfamily level in the SCOP (Structural Classification of Proteins) hierarchy. The taxonomic diversity of sequenced genomes and the number of unique protein domains identified for each species is shown in Table 1.

Table 1. Taxonomic diversity and number of unique protein domains assessed.

Study | Number of species sampled (per clade) | Number of unique protein domains
--- | --- | ---
Williams et al.4 | 125 (Archaea: 39; Bacteria: 33; Eukarya: 52)4 | 1,720
Harish and Kurland8 | 141 (Archaea: 47; Bacteria: 47; Eukarya: 47)8 | 1,732
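Counts such as those in Table 1 can be derived from per-species domain assignments. The sketch below is a hypothetical Python helper (`domain_matrix`, fed toy input pairs rather than real SUPERFAMILY output) that collapses (species, superfamily) assignments into the presence/absence matrix used as phylogenetic characters, and then counts unique domains per species and overall. Parsing the actual SUPERFAMILY assignment files is not shown.

```python
from collections import defaultdict

def domain_matrix(assignments):
    """Build a species x domain presence/absence matrix from
    (species, superfamily_id) pairs, ignoring copy number."""
    per_species = defaultdict(set)
    for species, domain in assignments:
        per_species[species].add(domain)
    species = sorted(per_species)
    domains = sorted(set().union(*per_species.values()))
    matrix = [[1 if d in per_species[s] else 0 for d in domains] for s in species]
    return species, domains, matrix

# Hypothetical toy assignments; repeated hits count once per species.
toy = [
    ("sp_A", "c.37.1"), ("sp_A", "a.4.5"), ("sp_A", "c.37.1"),
    ("sp_B", "c.37.1"), ("sp_B", "b.40.4"),
]
species, domains, matrix = domain_matrix(toy)
print(domains)                                           # ['a.4.5', 'b.40.4', 'c.37.1']
print({s: sum(row) for s, row in zip(species, matrix)})  # {'sp_A': 2, 'sp_B': 2}
print(len(domains))                                      # unique domains across all species
```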

Data analysis

Descriptive statistics of protein-domain compositions for each taxonomic sampling, including the frequency distribution and median number of protein domains for each clade (Archaea, Bacteria and Eukarya), were estimated and visualized using the ggplot2 package (v 3.2.1) in R (v3.6.2). Covariation of clade-specific protein-domain composition, as well as domain-specific amino acid composition, was compared using principal component analysis (PCA). Components were generated by an eigenvector decomposition of the character matrix. PCA scores were based on percentage identity of character compositions.
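A minimal sketch of these two analyses in Python with NumPy follows (the authors used R and ggplot2). It assumes a species × domain presence/absence matrix, computes the median number of distinct domains per clade, and derives PCA scores by eigendecomposition of the covariance matrix. The toy profiles are invented for illustration, and the percentage-identity scaling used for the published PCA scores is not reproduced here.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project rows (species) of a character matrix onto the leading principal
    components obtained by eigendecomposition of the covariance matrix."""
    X = np.asarray(matrix, dtype=float)
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return X_centered @ eigvecs[:, order], eigvals[order]

# Hypothetical presence/absence profiles for six species from two clades.
clades = ["Archaea"] * 3 + ["Eukarya"] * 3
profiles = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
])

# Descriptive statistic: median number of distinct domains per clade.
counts = profiles.sum(axis=1)
for clade in sorted(set(clades)):
    mask = np.array([c == clade for c in clades])
    print(clade, np.median(counts[mask]))

scores, variances = pca_scores(profiles)
print(np.round(scores[:, 0], 2))  # sign of a principal component is arbitrary
```

With these toy profiles, the first component separates the two clades; the same function could be applied to a matrix of amino acid compositions for the comparison in Figure 2a.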

Data availability

Source data

The predicted protein cohorts from genome sequences taken from Williams et al.4 and Harish and Kurland8 were assessed.

How to cite this article
Harish A and Morrison D. The deep(er) roots of Eukaryotes and Akaryotes [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2020, 9:112 (https://doi.org/10.12688/f1000research.22338.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1
Reviewer Report 29 May 2020
John Gatesy, Division of Vertebrate Zoology, Sackler Institute for Comparative Genomics, American Museum of Natural History, New York City, NY, USA 
Not Approved
Harish and Morrison explore rooting the tree of Life given recently proposed hypotheses. They might consider the following in editing/improving their manuscript:
 
  1. First sentence of the background. Not sure I agree; it depends on what …
How to cite this report
Gatesy J. Reviewer Report For: The deep(er) roots of Eukaryotes and Akaryotes [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2020, 9:112 (https://doi.org/10.5256/f1000research.24642.r63265)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 22 Jun 2020
    Ajith Harish, Unaffiliated, Uppsala, 756 57, Sweden
    22 Jun 2020
    Author Response
    Response to reviewer

    We thank the reviewer for their suggestions. The comments helped us improve the clarity of the presentation. We revised the text extensively to address the issues raised.

    Suggestion:
    1. First …
Reviewer Report 04 May 2020
Jacob S. Berv, Department of Ecology and Evolutionary Biology and Museum of Paleontology, University of Michigan, Ann Arbor, MI, USA 
Stephen A. Smith, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA 
Approved with Reservations
Review of: The deep(er) roots of Eukaryotes and Akaryotes
 
Jacob S. Berv and Stephen A. Smith

Introduction
In the present article (Harish and Morrison, 2020), Harish and Morrison argue that prior work (Williams …
How to cite this report
Berv JS and Smith SA. Reviewer Report For: The deep(er) roots of Eukaryotes and Akaryotes [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2020, 9:112 (https://doi.org/10.5256/f1000research.24642.r60012)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 22 Jun 2020
    Ajith Harish, Unaffiliated, Uppsala, 756 57, Sweden
    22 Jun 2020
    Author Response
    Response to reviewers

    We thank the reviewers for their detailed review and suggestions.

    Suggestion:
    One issue not addressed by Harish and Morrison but that we feel warrants comment regards branch lengths.
     
    Response: We …
Reviewer Report 31 Mar 2020
Edward Braun, Department of Biology, Genetics Institute, University of Florida, Gainesville, FL, USA 
Approved
It is challenging to review a reply without considering the original paper carefully. In this case it is doubly challenging because the most relevant portion of Williams et al. (2020)1 is itself a reply to Harish and Kurland (2017)2. This …
How to cite this report
Braun E. Reviewer Report For: The deep(er) roots of Eukaryotes and Akaryotes [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2020, 9:112 (https://doi.org/10.5256/f1000research.24642.r60618)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 22 Jun 2020
    Ajith Harish, Unaffiliated, Uppsala, 756 57, Sweden
    22 Jun 2020
    Author Response
    Response to reviewer

    We thank the reviewer for their detailed review and thoughtful comments. We agree with the issues raised by the reviewer. We revised the text accordingly.
     
    Suggestion: I have written …