Independent accretion of TIM22 complex subunits in the animal and fungal lineages

Background: The mitochondrial protein import complexes arose early in eukaryogenesis. Most of the components of the protein import pathways predate the last eukaryotic common ancestor. For example, the carrier-insertase TIM22 complex comprises the widely conserved Tim22 channel core. However, the auxiliary components of fungal and animal TIM22 complexes are exceptions to this ancient conservation. Methods: Using comparative genomics and phylogenetic approaches, we identified precisely when each TIM22 accretion occurred. Results: In animals, we demonstrate that Tim29 and Tim10b arose early in the holozoan lineage. Tim29 predates the metazoan lineage, being present in the animal sister lineages, choanoflagellates and filastereans, whereas the erroneously named Tim10b arose from a duplication of Tim9 at the base of metazoans. In fungi, we show that Tim54 has representatives present in every holomycotan lineage including microsporidians and fonticulids, whereas Tim18 and Tim12 appeared much later in fungal evolution. Specifically, Tim18 and Tim12 arose from duplications of Sdh4 and Tim10, respectively, early in the Saccharomycotina. Surprisingly, we show that Tim54 is distantly related to AGK, suggesting that AGK and Tim54 are extremely divergent orthologues and that the origin of the AGK/Tim54 interaction with Tim22 predates the divergence of animals and fungi. Conclusions: We argue that the evolutionary history of the TIM22 complex is best understood as the neutral structural divergence of an otherwise strongly functionally conserved protein complex. This view suggests that many of the differences in structure/subunit composition of multi-protein complexes are non-adaptive. Instead, most of the phylogenetic variation of functionally conserved molecular machines, which have been under stable selective pressures for vast phylogenetic spans, such as the TIM22 complex, is most likely the outcome of the interplay of random genetic drift and mutation pressure.


Introduction
Mitochondria evolved from an ancient alphaproteobacterial endosymbiont (Martijn et al., 2018; Roger et al., 2017). The integration of the symbiont into host cell processes required the evolution of dedicated machinery for the import of host-encoded proteins (Roger et al., 2017). The establishment of symbiont protein import allowed the transfer of many genes from the symbiont to the host genome as well as the domestication of symbiont metabolic processes (e.g., via the evolution of mitochondrial carrier family proteins, MCFs) (Cavalier-Smith, 2006). Understanding how symbionts become organelles and integrate into host cell processes requires an understanding of how protein import machineries originate and diversify.
Proteins imported into the mitochondria require several dedicated protein complexes to ensure proper sorting and assembly into mitochondrial subcompartments, including the mitochondrial outer membrane (MOM), the mitochondrial inner membrane (MIM), the intermembrane space (IMS) and the matrix. All of these protein complexes are inferred to have been present in the last eukaryotic common ancestor (LECA) and the general phylogenetic profiles of their components have been recently reported (Fukasawa et al., 2017; Mani et al., 2015). Investigations have so far focused on the broad distribution of subunits across eukaryotes, leaving some details unexplored, such as the evolution of TIM22 complex components. Because animal and fungal TIM22 complexes are the best characterized, both structurally and functionally, these lineages offer an ideal case to dissect the fine-grained evolutionary history of multi-protein complexes. Apart from the central Tim22 subunit (Fukasawa et al., 2017; Žárský & Doležal, 2016), the origin and evolution of the TIM22 complex components in animals and fungi have not been extensively investigated. In this paper, we explore the evolutionary history of the TIM22 complex in animals and fungi, using homology searching and phylogenetic methods. We found that each lineage's TIM22 complex accreted subunits independently. We place our findings in a larger theoretical framework recently developed by Lynch (Lynch, 2020; Lynch & Trickovic, 2020). We argue that most of the structural variation seen in the functionally conserved TIM22 complex across the Holozoa and the Holomycota is non-adaptive. The evolutionary history of the TIM22 complex probably represents an example of effectively neutral divergence from an optimal mean phenotype, which has primarily been governed by the joint forces of drift and mutation.
Using the reciprocal best BLAST search method, we identified orthologues of Tim29 in the genomes of most animal species and of the unicellular eukaryotes (i.e., protists) most closely related to animals (Figure 1). This means that Tim29 originated prior to the origin of animals. We could not identify Tim29 in Gallus gallus; however, orthologous sequences were recovered from other birds and reptiles, suggesting that the loss of Tim29 is limited to chickens. We used our set of Tim29 sequences to search for orthologues across eukaryotes using the HMMer server (Finn et al., 2011) at EBI. When restricting our taxon searching to exclude holozoans, we retrieved no hits below our 0.01 e-value significance cut-off, strongly suggesting that no homologues of Tim29 exist outside Holozoa (Extended data, Supplemental Text File 1; Wideman et al., 2020).
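The taxon-restricted significance filter described above can be illustrated with a minimal Python sketch. It is not the HMMer server's interface: the `Hit` record, the function name and the example hits are hypothetical, and only the 0.01 e-value cut-off and the exclusion of Holozoa come from the text.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    target: str      # sequence identifier (hypothetical)
    lineage: tuple   # taxonomic lineage, broadest rank first
    evalue: float    # e-value reported for the hit

def significant_hits_outside(hits, excluded_clade, max_evalue=0.01):
    """Keep hits whose e-value is below the cut-off and whose lineage
    does not include the excluded clade."""
    return [
        h for h in hits
        if h.evalue < max_evalue and excluded_clade not in h.lineage
    ]

# Made-up hits, for illustration only.
hits = [
    Hit("seqA", ("Eukaryota", "Opisthokonta", "Holozoa", "Metazoa"), 1e-40),
    Hit("seqB", ("Eukaryota", "Opisthokonta", "Holomycota", "Fungi"), 0.5),
    Hit("seqC", ("Eukaryota", "Archaeplastida"), 3.2),
]

outside_holozoa = significant_hits_outside(hits, "Holozoa")
print(outside_holozoa)  # empty list: no significant hit outside Holozoa
```

An empty result for a search that excludes Holozoa corresponds to the pattern reported for Tim29.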

Results and discussion
To determine when in the holozoan lineages the TIM22 subunits AGK, Tim10b, and Tim8b first appeared, we collected sequences related to AGK and the small Tims from diverse holozoan genomes. We aligned sequences using MUSCLE (Edgar, 2004) and performed phylogenetic reconstructions using RAxML (Stamatakis, 2014) and MrBayes (Ronquist et al., 2012).
The phylogenetic reconstruction of AGK and related sequences clearly distinguishes putative clades of holozoan AGK, ceramide kinase, and sphingosine kinase, indicative of a pre-metazoan ancestry of these enzymes (Figure 2A). We did not include representatives from an outgroup, as the best BLAST hits of AGK outside the holozoan lineage were from cyanobacteria, oomycetes, and plants, suggesting that a detailed analysis of this gene family is required to understand its origin and evolutionary history in eukaryotes.
For the small Tims, the reconstructed phylogenies (Figures 2B and 2C) include well-supported Tim10a and Tim13 clades. This suggests that Tim10b was the result of a duplication of Tim9 at the base of animals (Figure 2B), whereas Tim8a and Tim8b are the result of a duplication at the base of chordates (Figure 2C). Although we did not recover Tim8a and Tim8b sequences from Branchiostoma floridae within the vertebrate clades of Tim8a and Tim8b (Figure 2C, asterisks), their best BLAST hits are clearly Tim8a and Tim8b from vertebrates, respectively. We therefore conclude that Tim8a and Tim8b arose prior to the divergence of chordates from the rest of animals. We were unable to recover small Tim sequences from the choanoflagellate Monosiga brevicollis, but this is likely due to an incomplete database, as Tim9 and Tim10 are probably essential in holozoans.
These results demonstrate that TIM22 complex subunits were accreted very early in the Holozoa. Tim29 and AGK (but see below) appear to have been gained shortly after the holozoan lineage diverged from the holomycotan lineage, Tim10b originated shortly after the origin of animals, and Tim8b originated after the origin of chordates. This means that Tim29 and AGK predate the origin of animals and have persisted in this lineage for about a billion years, and that Tim10b arose shortly thereafter.
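The reasoning used to place each gain on the tree can be sketched in Python. This is an illustration, not the procedure used in this study: it assumes a single gain with no horizontal transfer, so that a subunit is placed at the most recent common ancestor of the lineages in which it is found. The tree is a deliberately simplified opisthokont topology and the presence sets are paraphrased from the text above.

```python
# Child -> parent map for a simplified, illustrative opisthokont tree.
PARENT = {
    "Opisthokonta": None,
    "Holomycota": "Opisthokonta",
    "Holozoa": "Opisthokonta",
    "Filasterea": "Holozoa",
    "Choanozoa": "Holozoa",
    "Choanoflagellata": "Choanozoa",
    "Metazoa": "Choanozoa",
    "non-chordate Metazoa": "Metazoa",
    "Chordata": "Metazoa",
}

def path_to_root(node):
    """List the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def inferred_gain_node(taxa_with_gene):
    """Most recent common ancestor of all taxa carrying the gene."""
    paths = [path_to_root(t) for t in taxa_with_gene]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking upward from any one taxon, the first shared node is the MRCA.
    return next(n for n in paths[0] if n in shared)

presence = {
    "Tim29": ["Filasterea", "Choanoflagellata", "non-chordate Metazoa", "Chordata"],
    "Tim10b": ["non-chordate Metazoa", "Chordata"],
    "Tim8b": ["Chordata"],
}

gains = {gene: inferred_gain_node(taxa) for gene, taxa in presence.items()}
for gene, node in gains.items():
    print(gene, "->", node)
```

Under these assumptions the sketch places Tim29 at the base of Holozoa, Tim10b at the base of Metazoa and Tim8b within Chordata, matching the order of accretion described above.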
Tim54 is related to AGK and diverged at the base of fungi whereas Tim18 and Tim12 are the result of Saccharomycotina-specific gene duplications
Components of the TIM22 complex were identified in fungi much earlier than the recent discoveries in animals (Kerscher …). Using the reciprocal best BLAST method, we were able to identify Tim54 in representatives from every major fungal lineage as well as Fonticula alba, but in no other eukaryotic lineage (Figure 1). We were unable to identify Tim54 in Rozella allomycis or microsporidians, except Mitosporidium daphniae, a short-branching microsporidian with canonical mitochondria (Haag et al., 2014). We used our set of Tim54 sequences to search for orthologues across eukaryotes using the HMMer (Finn et al., 2011) server at EBI. When restricting our taxon searching to exclude fungi and Fonticula, we surprisingly retrieved AGK sequences from animals (141 hits above threshold) and Capsaspora as top hits (Extended data, Supplemental Text File 2; Wideman et al., 2020). These results indicate that Tim54 and AGK likely share a common ancestor; however, the diacylglycerol kinase (DAGK) domain is now virtually undetectable in fungal sequences.
To determine when Tim18 and Tim12 originated, sequences related to Tim18, Sdh4, Tim10, and Tim12 were collected from all Saccharomycotina in the Mycocosm database (Grigoriev et al., 2013). We trimmed long sequences and removed any spurious hits (as determined by reciprocal BLAST into the S. cerevisiae S288c genome). Sequences were aligned using MUSCLE (Edgar, 2004), manually trimmed, and phylogenies were reconstructed using RAxML (Stamatakis, 2014) and MrBayes (Ronquist et al., 2012) for likelihood and posterior probability calculations, respectively. The phylogenetic reconstruction of Tim18 indicates that a duplication occurred after the divergence of early-branching Saccharomycotina (e.g., Lipomyces and Yarrowia), but before the divergence of a major clade that includes Wickerhamomyces and Saccharomyces (Figure 3A). The phylogenetic reconstruction of Tim12 suggests that a duplication of the Tim10 protein occurred even earlier in Saccharomycotina (Figure 3B), as only the earliest-branching species lack clear Tim12 representatives (e.g., Lipomyces starkeyi).
In contrast to the animal TIM22 complex, which accreted subunits early in the evolution of animals, we demonstrate that a gradual accretion of TIM22 complex subunits occurred in the lineage leading to S. cerevisiae. Tim54 is likely a divergent fungal AGK that lost the DAGK domain after the divergence of Holomycota from Holozoa (Figure 4). Tim18 and Tim12 are derived from duplications of Sdh4 and Tim10, respectively, deep within the Saccharomycotina. It is unknown whether other fungal lineages have undergone similar expansions of the TIM22 complex.

Conclusions
The first elements of a larger theoretical framework to quantitatively understand the macroevolution of cells, their organelles, and molecular machines have been developed (Lynch, 2020; Lynch & Trickovic, 2020). Our findings are consistent with the predictions made by the theory of effectively neutral divergence of mean phenotypes across major phylogenetic lineages (Lynch, 2020). This theory assumes that the selective pressures on many molecular machines have remained relatively constant for long stretches of macroevolutionary time. This is most likely the case for many multi-protein complexes whose functions are strongly conserved across vast phylogenetic spans. Mitochondrial protein import complexes, such as TIM22, are good examples of such strongly functionally conserved systems. The TIM22 complex has to physically interact with dozens or even hundreds of substrates for their proper insertion into the MIM. This implies that its functional divergence is constrained (and/or buffered) by its many physical interactions. However, slight deviations from an optimal (functional) phenotypic mean could be expected as a consequence of a permissive population-genetic environment, i.e., stronger drift due to historical bottlenecks or smaller effective population sizes. As selective pressures are assumed to remain constant for many conserved multi-protein complexes in stable cellular environments, most phenotypic divergence would be dictated by the combined action of random genetic drift and mutation pressure. Most divergence seen in multi-protein complexes or molecular machines, therefore, would primarily be non-functional and non-adaptive but mostly structural (e.g., subunit composition) in character.
Many other observations appear to be compatible with this view. The common recruitment of paralogous subunits, as well as the presence of highly derived and lineage-specific subunits in multi-protein complexes, provide general examples. More specifically, kinetoplastids offer another example in a protein-import complex. Trypanosoma brucei lacks the TIM23 complex and instead contains a bifunctional Tim22 protein that acts both as a presequence and a carrier translocase complex.

Homology searching
Tim22 orthologues were identified previously (Žárský & Doležal, 2016). We used the reciprocal best hit method to identify Tim54 and Tim29 orthologues in fungi and holozoans, respectively. Briefly, S. cerevisiae Tim54 and Homo sapiens Tim29 were used as BLASTp (Altschul et al., 1997) queries into opisthokont predicted proteomes (see Figure 1 for organism list) using the NCBI BLAST server or Mycocosm database (Grigoriev et al., 2013). The top hit was retrieved and used as a BLASTp query into the S. cerevisiae (for Tim54) or H. sapiens (for Tim29) proteome. If the top hit was the original BLASTp query, the sequence was considered an orthologue.

Figure 4. The ancestral TIM22 complex comprises a single subunit, Tim22, which likely interacted with the small Tims, Tim9 and Tim10. The ancestral opisthokont TIM22 complex contained an additional subunit, AGK, which retains a diacylglycerol kinase (DAGK) domain in holozoans. In the holomycotan lineage, the DAGK domain was lost or diverged beyond recognition, but stabilized into Tim54, which is well-conserved across Holomycota. Early in the evolution of Saccharomycotina, Tim12 was gained via a duplication of Tim10, and Tim18 was gained via a duplication of Sdh4. Shortly after the divergence of Holozoa from Holomycota, Tim29 was gained as a subunit of the TIM22 complex. Tim10b was gained after the divergence of Metazoa from the unicellular holozoans from a duplication of Tim9. Tim8a and Tim8b result from a duplication of Tim8 in the lineage leading to chordates (not shown).
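The reciprocal best hit logic described under Homology searching can be sketched in a few lines of Python. This is a minimal illustration, not the scripts used in this study: it assumes each search has already been reduced to (query, subject, bit score) rows, and all sequence identifiers below are hypothetical.

```python
def best_hits(rows):
    """Map each query to its top-scoring subject.

    `rows` is an iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_rows, reverse_rows):
    """Return query -> subject pairs that are each other's top hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}

# Hypothetical forward search: reference protein against a target proteome.
forward_rows = [
    ("Hs_Tim29", "target_0001", 95.0),
    ("Hs_Tim29", "target_0002", 40.0),
]
# Hypothetical reverse search: the candidate against the reference proteome.
reverse_rows = [
    ("target_0001", "Hs_Tim29", 90.0),
    ("target_0001", "Hs_other", 30.0),
]

orthologues = reciprocal_best_hits(forward_rows, reverse_rows)
print(orthologues)  # {'Hs_Tim29': 'target_0001'}
```

A candidate is retained only when the reverse search returns the original query as its top hit; candidates whose best reverse hit is a different protein are discarded.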

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
… region (identified by inference from human SPHK1 crystal structure). The authors report that AGK orthologs are readily identifiable when searching metazoan databases and can likewise be identified in a few unicellular eukaryotes that are closely related to animals (e.g. choanoflagellates, Capsaspora). This is readily verified with simple BLASTP searches using human AGK as a query sequence. However, after reading the paper twice, I remain confused as to whether AGK orthologs have been identified in other protist databases. Specifically, how do the authors distinguish between AGK orthologs and the paralogous DAG kinases?
While it is tangential to the subject of this paper, I note that AGK-focused publications have a somewhat tortuous history (of which the authors are, understandably, probably unaware). As documented in their Figure 2A, AGK is most closely related to sphingolipid kinases. That group in turn is more distantly related to the diacylglycerol kinase (DAGK) family and the NAD+ kinases. The earliest AGK literature focused on its putative kinase activity. The authors reference the 2005 Bektas et al. paper in support of the contention that AGK is an acylglycerol kinase, as has been the common practice in AGK papers since 2005. However, there is little in the Bektas paper or elsewhere to validate this claim. Indeed, AGK appeared in the literature originally as a multisubstrate lipid kinase (PMID: 15252046) and later as a possible ceramide kinase in Drosophila (PMID: 22069480). Another group reported a failure to detect any kinase activity, i.e. AGK as an "orphan" kinase (PMID: 16269826). I recommend that the authors consider citing all of these mutually contradictory reports, since all save one have been neglected in the subsequent AGK literature, for no apparent reason except 'everybody else does it'. The more recent AGK literature includes the discovery that humans born deficient in the protein suffer from Sengers syndrome, which is characterized by mitochondrial insufficiency. Subsequent to that discovery, AGK was found to be a component of the Tim22 complex in human cells. Interestingly, a single amino acid change (G126E) in human AGK that would eliminate enzyme activity (by inference from the better characterized sphingosine kinases) does not inhibit the import function of the Tim22 complex in cultured cells.