Revisiting the phylogeny of phylum Ctenophora: a molecular perspective

The phylogenetic relationships of deep metazoans, specifically in the phylum Ctenophora, are not fully understood. Previous studies on this subject have mostly been based on morphology and single gene analyses (rRNA sequences). Several loci (protein coding and ribosomal RNA) from taxa belonging to this phylum are currently available in public databases (e.g. GenBank). Here we revisit Ctenophora molecular phylogeny using public sequences and probabilistic methods (Bayesian inference and maximum likelihood). To obtain more reliable results, multi-locus analyses were performed using 5.8S, 28S, ITS1, ITS2 and 18S, and IPNS and GFP-like proteins. The best topologies, consistent between both methods for each data set, are shown and analysed. Comparing the results of the phylogenetic reconstruction with previous research, most clades showed the same relationships as those found with morphology and single gene analyses, consistent with hypotheses made in previous research. There were also some unexpected relationships clustering species from different orders. This article is included in the Phylogenetics channel.


Introduction
The relationships among deep metazoans (Cnidaria and Ctenophora) and Parazoa (Porifera and Placozoa) are not totally clear 1. In this paper we try to reconstruct the phylogeny inside the phylum Ctenophora with state of the art methods and compare our results with previous work 2,3. Ctenophores are a key phylum for the understanding of the development of organ systems, triploblastic animals and bilateral symmetry 4. Our goal was to reconstruct the phylogeny of previous studies using as many sequences as possible available on GenBank. The use of these sequences allowed us to perform a multilocus analysis (MLSA), instead of the single gene analyses previously performed. The sequences selected for this study have never been used for phylogenetic analysis exclusively of this phylum 5,6.
Our research consists of 1) the analysis of ribosomal genes (5.8S; 28S; ITS1; ITS2 and 18S) and 2) the analysis of two ortholog genes found in ctenophores (a GFP-like non-fluorescent protein, and isopenicillin-N-synthase FYY1).
The ribosomal genes were analysed using partitioned nucleotide substitution models while the ortholog genes were analysed using partitioned amino acid substitution models. We compared the findings of our two approaches (ribosomal and ortholog genes) to each other and against the previously reported phylogenetic trees obtained from molecular data 3 and morphological data 2.

Methods
All sequences corresponding to Ctenophora (Taxonomy ID: 10197) were retrieved from GenBank's nucleotide database. Short sequences (those shorter than 150 base pairs) or ambiguously labeled sequences (those not assigned to a specific species) were discarded. This criterion was used to obtain an almost complete matrix including most loci available of reported taxa across the phylum.
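The filtering criterion above can be illustrated with a minimal sketch. It is not the authors' script: the `Record` class, the accession labels and the rule used to decide whether a label names a specific species are all assumptions made for illustration, and the sequences are toy data.

```python
from dataclasses import dataclass

MIN_LENGTH = 150  # base pairs; threshold stated in the text


@dataclass
class Record:
    accession: str  # hypothetical identifier, not a real GenBank accession
    organism: str   # organism label as it would appear in the record
    sequence: str


def has_specific_species(organism: str) -> bool:
    """Assumed heuristic: a usable label is a binomial whose second
    word is not a placeholder such as 'sp.' or 'cf.'."""
    words = organism.split()
    if len(words) < 2:
        return False
    return words[1].lower().rstrip(".") not in {"sp", "cf", "aff", "gen"}


def keep(record: Record) -> bool:
    """Keep records that are long enough and assigned to a species."""
    return (len(record.sequence) >= MIN_LENGTH
            and has_specific_species(record.organism))


# Toy records: only the first one passes both filters.
records = [
    Record("TOY1", "Beroe ovata", "A" * 200),
    Record("TOY2", "Beroe sp.", "A" * 200),             # no species assigned
    Record("TOY3", "Pleurobrachia pileus", "A" * 149),  # too short
    Record("TOY4", "Ctenophora", "A" * 300),            # phylum-level label
]
kept = [r.accession for r in records if keep(r)]
```

In practice the organism label and sequence would come from the parsed GenBank records rather than from hand-written objects.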
Seven loci were chosen for analysis; five corresponded to ribosomal RNA regions (5.8S; 28S; ITS1; ITS2 and 18S). The other two corresponded to ortholog genes: a putative non-fluorescent protein (GFP-like protein), and isopenicillin-N-synthase FYY1 (IPNS), a protein involved in the bioluminescence process. The sequences were extracted using the annotation of the retrieved records using Biopython 1.67 7.
The taxa present for each analysis are listed in Table 1 and Table 2. All sequences used (with corresponding accession numbers) and scripts used for analysis are available at http://doi.org/10.5281/zenodo.193080 16.
Given the phylogenetic distance between the different taxa of this phylum, for the protein coding genes, we decided to work at the amino acid sequence level due to high sequence saturation at the nucleotide level.
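Working at the amino acid level requires translating the coding sequences with the standard genetic code. The sketch below shows that step in plain Python; it is an illustration only, not the DNA2PEP tool, and the function name `translate` and the handling of ambiguous codons (reported as `X`) are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon,
    'X' marks a codon with ambiguous bases. A trailing partial codon
    is ignored."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


example = translate("ATGGCCAAGTAA")  # Met, Ala, Lys, stop
```

The sketch assumes the sequence is already in the correct reading frame, which the record annotation would normally provide.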
The sequences corresponding to the ortholog genes were translated in silico using DNA2PEP 1.1 8 with the standard genetic code, and aligned using MAFFT 7.222 9. A MLSA was performed using these two loci. Alignments were concatenated using Python scripts and partitioned by gene to be analyzed for amino acid model and best partition scheme using PartitionFinderProtein 1.1.1 10. Model adjustment was assessed using the Bayesian information criterion (BIC). The best model found by PartitionFinderProtein 1.1.1 for the IPNS partition was LG + G + I, and LG + G had better adjustment for the GFP-like partition. Phylogenetic reconstruction for the ortholog genes was carried out by maximum likelihood (ML) and Bayesian inference (BI) methods, using both Garli 2.01 11 and MrBayes 3.2.6 12, with the proper amino acid substitution model parameters for each partition.
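The concatenation and per-gene partitioning described above can be sketched as follows. This is not the authors' script: the function name `concatenate`, the use of `?` for taxa missing a locus, and the toy taxa and sequences are assumptions chosen for illustration.

```python
def concatenate(alignments, missing="?"):
    """Concatenate per-gene alignments into one matrix.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence
                (all sequences of one gene must have the same length).
    Returns (matrix, partitions): matrix maps taxon -> concatenated
    sequence; partitions maps gene -> (start, end) columns, 1-based and
    inclusive. Taxa lacking a gene are padded with the `missing` character.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences of {gene} are not aligned")
        length = lengths.pop()
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, missing * length)
        partitions[gene] = (start, start + length - 1)
        start += length
    return matrix, partitions


# Toy alignments with hypothetical taxa; not data from this study.
toy = {
    "IPNS": {"taxon_A": "MKV-A", "taxon_B": "MKI-A"},
    "GFP_like": {"taxon_A": "GYG", "taxon_C": "GFG"},
}
matrix, partitions = concatenate(toy)
```

The column ranges in `partitions` are the kind of information a partition-scheme search needs in order to fit a separate substitution model to each gene.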
For the ML analysis, using Garli, a total of 5 independent ML searches were performed and supported with 65 bootstrap pseudoreplicates. For BI analysis, using MrBayes, two independent MCMC runs (four chains for each) were carried out for 1,000,000 generations, using a relative burn-in discard of 35% of total sampled trees (sampling frequency of 100 generations).
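The MCMC settings above imply a specific number of retained trees, which the short calculation below makes explicit. It is a sketch under a stated assumption: the starting state of each run is not counted as a sample, so the exact numbers reported by a given program may differ by one per run.

```python
def mcmc_sample_counts(generations, sample_freq, burnin_percent, runs):
    """Return (sampled, discarded, kept) trees per run and the total kept
    across runs. Assumes the starting state is not counted as a sample."""
    sampled = generations // sample_freq
    discarded = sampled * burnin_percent // 100
    kept = sampled - discarded
    return sampled, discarded, kept, kept * runs


# Settings stated in the text: 1,000,000 generations, sampling every
# 100 generations, 35% relative burn-in, two independent runs.
sampled, discarded, kept, total_kept = mcmc_sample_counts(
    generations=1_000_000, sample_freq=100, burnin_percent=35, runs=2
)
```

Under this assumption each run samples 10,000 trees, discards 3,500 as burn-in and keeps 6,500, so posterior probabilities are summarised from 13,000 trees across the two runs.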
For the five rRNA loci, the automated pipeline PhyPipe 13 was used. For this analysis, MrBayes was executed under the following parameters: two independent MCMC runs, four chains, 1,000,000 generations, 35% of relative burn-in and sampling frequency of 100. For ML analysis, Garli was executed doing first a ML search (5 independent searches), then 1000 bootstrap pseudoreplicates were performed and mapped to the best ML topology using SumTrees from the DendroPy 4.1.0 package 15.
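Mapping bootstrap pseudoreplicates onto the best ML topology amounts to counting, for each group in the best tree, the fraction of replicate trees that contain the same group. The sketch below shows that idea only; it is not SumTrees and does not use the DendroPy API. As a simplification, trees are represented as sets of rooted clades (frozensets of taxon names) rather than the bipartitions of unrooted trees, and all names and toy trees are hypothetical.

```python
def clade_support(best_tree, replicates):
    """Fraction of replicate trees containing each clade of the best tree.

    best_tree:  set of clades, each clade a frozenset of taxon names.
    replicates: list of trees in the same representation.
    """
    if not replicates:
        raise ValueError("at least one replicate tree is required")
    n = len(replicates)
    return {
        clade: sum(clade in tree for tree in replicates) / n
        for clade in best_tree
    }


# Toy example with four hypothetical taxa A-D.
AB, AC, ABC = frozenset("AB"), frozenset("AC"), frozenset("ABC")
best = {AB, ABC}
bootstrap_trees = [{AB, ABC}, {AB, ABC}, {AC, ABC}, {AB, ABC}]
support = clade_support(best, bootstrap_trees)
```

Here the clade (A,B) is recovered in three of four replicates and (A,B,C) in all four, so their support values are 0.75 and 1.0.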

Results
The majority of the phylum analysed in this study show a standard grouping condition; the organisms that are related in one of the analyses are also related in the other. This is more evident comparing at family level, where the individuals of the same family, and in some cases order, grouped with other organisms of the same order. Exceptions are discussed below.
In the reported trees there are families represented by several species while complete orders are represented by just one species.
For the purpose of clarity, from now on the families represented by several species will be discussed at the family level while the orders represented by just one species will be discussed using the representing species.
Thalassocalyce inconstans and Lampocteis cruentiventer are unexpectedly grouped together in both analyses, but the position is not the same in comparison with the other clades. The support values in the ribosomal tree are very low compared to the ortholog genes tree.
The order Cydippida is divided in five different clades or subgroups, shown in different tones of blue in Figure 1. We confirm that this group is paraphyletic as reported previously in 2,3. The species Bathyctena chuni (Bathyctenidae, Cydippida) is grouped with Ocyropsis maculata (Ocyropsidae, Lobata) in amino acid analysis, but there is no information on the ribosomal sequences of Bathyctena chuni, so it was not possible to compare.
In the amino acid tree, Pleurobrachiidae, Mertensiidae, Lampeidae and Euplokamididae form a clade, but the relationships between them are not clear, and the bootstrap values and posterior probability are low in this group. In the ribosomal analysis we could include the Platyctenida order, which grouped with high support in the clade formed by the mentioned families and order. In the ribosomal analysis, we see some shared features with one of Harbison's trees 2. Also the Thalassocalyce-Lampocteis clade is related to this group, but the position varies depending on the analysis. Dryodora glandiformis groups with the clade formed by Lobata species, but this result is only evident in the ortholog genes tree, due to the lack of rRNA sequences for this particular taxon.
All the trees were rooted using Beroida as the outgroup, following the hypothesis that this is the most basal group. The same choice of root was made by Harbison 2. Additionally, the Beroe genus is a good outgroup because it belongs to the class Nuda while the other studied species belong to the class Tentaculata (our ingroup).
Bathocyroe fosteri is present in an unexpected position in the tree. It should have been included in the Lobata clade. Instead, it was placed outside the subgroup containing the Lobata, Pleurobrachiidae, Mertensiidae, Lampeidae and Thalassocalycidae families. This finding is not compared with rDNA loci analysis since ribosomal sequences for Bathocyroe fosteri were not available.
In the research performed by Harbison

Discussion
The Haeckelidae family preserves its position in the phylogenetic trees, placed as sister group of all the other Tentaculata taxa analysed with high support, in agreement with previous studies 2.
The lack of reported DNA sequences for a few groups, such as the orders Cryptolobiferida, Cambojiida and Ganeshida, several families, such as Eurhamphaeidae, and some records reported as Ctenophora incertae sedis (Tentaculata incertae sedis), makes it harder to obtain a complete view of the phylogenetic relationships inside the phylum. The order Ganeshida is grouped with Lobata, according to Harbison 2, and the lack of this group may have caused a misplacement of the Thalassocalycida representative.
To improve the results, Coeloplanidae should be included in the protein phylogenetic analysis. Further studies including more sequences from families such as Leucotheidae, Lampoctenidae and Thalassocalycidae are also needed to resolve the resulting polytomies and to obtain better support to confirm the relationship between Thalassocalyce inconstans and Lampocteis cruentiventer in the rRNA analysis.
Available ctenophore transcriptome data could be used to expand sampling of the protein-coding genes.
If that were done, a concatenated analysis of all of the markers used with only taxa sampled for 18S (so all taxa overlap for at least part of the alignment) with the addition of appropriate outgroups would be an interesting improvement.
I have read this submission. I believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Competing Interests: I am also actively studying the evolution of Ctenophora but I can honestly say that this does not impact my view on the present work.
This article performs a meta-analysis of molecular phylogenetics within the Ctenophores using published sequences. I have previously had amiable correspondence with the authors and sent them data. Although the protein-coding sequences and some of the ribosomal RNA data came from my lab, I nonetheless feel I can give an unbiased assessment of their subsequent use.
Regrettably, I do not see enough original intellectual contribution or additional scientific value to justify its publication. At best, it is a minor contribution, in which case the interpretation needs to be improved, and at worst, a good portion of it is a re-publishing of work already published by other authors (Simion [1] and Podar [4] in particular).
A large part of the analysis is building trees using previously published ribosomal RNA datasets. This recapitulation does not add anything to the discussion of ctenophore internal relationships, and in fact, by rooting the tree with Beroe, they obscure the true evolution of the group as shown repeatedly since at least 2001 [4]. They also fail to cite Simion et al. [1], which is the source of some of the data. Merely adding that citation would not solve the fundamental issue, which is that there is no added value to their re-building of the same phylogeny.
The "novel" aspect of the paper is building trees based on two protein-coding genes which were also published previously in two separate papers [2,3]. The trees based on their [our] protein datasets do not give any additional insights into ctenophore relationships except in that some species are present in those trees that are not represented in the 18S phylogeny. This taxonomic coverage does not reveal any particular insight. These data also already appeared in trees (albeit not limited to ctenophores only) in the original publication.
There is some confusion because there is no Hormiphora IPNS gene in their [our] dataset, yet it is listed in the table of genes and that species is present in the tree, apparently based on a GFP-like gene that was found. Furthermore, the IPNS genes are not single-copy [2], so are not reliable for phylogeny building.
There is a misspelling of Bathyctena in Table 1 and 2 and of Lampocteis in Table 2.
In summary, two gene trees, of which one gene which was found to be absent in a ctenophore lineage,
Introduction 1st paragraph:
-"deep metazoans" is an odd term; also "Parazoa" is no longer accepted as a valid name. All 4 taxa are clearly metazoans; they are best summarized as "non-bilaterian metazoans".
-The citation after the 1st sentence is from 1999; this should be replaced with something more recent as a lot of research in this area has been done since then.
-While the statement of the 1st sentence is true, reconstructing the phylogenetic relationships within Ctenophora does not help much to solve these issues, i.e. finding the position of Ctenophora in the animal tree of life is a separate issue that this study is unable to address.
-The "previous work" cited in the 2nd sentence is very old. There is a study from 2015 (Simion et al., Zoology 118: 102-114) that also reconstructed internal relationships of Ctenophora based on multigene analyses. It is crucial that the authors interpret their results in light of that study. It is actually quite puzzling that the paper is not cited, especially because the authors used sequences originally reported in Simion et al. (2015).
-The statement in the following sentence is highly debatable. As long as the phylogenetic position of ctenophores is not resolved (see e.
-In the next sentence, "previous studies" should be replaced with "ctenophores".
-The following 2 sentences suggest that this paper represents the first multilocus analysis addressing internal phylogeny of Ctenophora. As mentioned above, this is not true. In this study, the authors used 2 protein-coding genes and the 28S gene in addition to the 18S and ITS/5.8S markers already used by Simion et al. (2015). This is what sets their study apart from the previous paper, and this has to be clearly communicated. The paper should focus on discussing differences to the results of Simion et al. in light of expanding the set of markers (but also addressing the different taxon sampling in the 2 studies).
-The abbreviation "MLSA" is introduced for "multilocus analysis"; what does the "S" stand for? Maybe it should read "multilocus sequence analysis"?
2nd paragraph:
-"ribosomal genes" should read "ribosomal RNA genes" (also elsewhere in the MS), since there are also genes coding for ribosomal proteins.
-"ortholog" should be replaced with "protein-coding" (also elsewhere in the MS), since ribosomal RNA genes are also orthologs.
3rd paragraph:
-I think the taxonomic overlap between the protein-coding and the ribosomal RNA datasets is sufficient to conduct a combined analysis, to infer a tree based on all the evidence simultaneously. As far as I recall, using mixed nucleotide and amino-acid data is possible with RAxML and MrBayes; alternatively, the protein-coding partition could be analyzed on nucleotide level (possibly excluding 3rd codon positions if they are oversaturated).

Methods
-It is unclear how ambiguously alignable regions were treated. These have to be excluded prior to analysis, but a quick glance at the concatenated matrices provided in the data supplement (concat_matrix and concat_prot_corrected) suggests otherwise. Difficult-to-align regions can bias phylogenetic inference, so this is an important point to address.
-Information about the lengths of loci and concatenated alignments should be given.
-Information about how the trees were rooted should be given in this section. In the Results section it is mentioned that Beroida was used as the outgroup to all other ctenophores. However, this is poorly justified. For example, Simion et al. (2015)

Figure 1. Phylogenetic trees based on protein sequences of IPNS and GFP-like genes (on left) and rRNA loci (on right). Trees were constructed using Bayesian inference and maximum likelihood methods and consistent topologies were found within methods. Support values are shown at nodes in the form of posterior probability/bootstrap value.

Table 1. Protein coding genes used for phylogenetic reconstruction. (1) Sequence was available on GenBank. (-) Sequence was not available on GenBank or not reported.

Table 2. Ribosomal RNA genes used for phylogenetic reconstruction. (1) Sequence was available on GenBank. (-) Sequence was not available on GenBank or not reported.
2, the Lobata group was placed below the Cestida group. Later, this finding was not discussed by Podar et al. 2001 3 as in the ribosomal data they used both groups are in a polytomy. Our finding, using rDNA, is in concordance with the findings of Podar et al. 2001 3, but using the ortholog genes the finding is contrary to what was proposed by Harbison. We found, on the ortholog genes analysis, that Cestida group is the one derived from Lobata and not vice versa as suggested by Harbison. This finding has a high bootstrap value and posterior probability support.
found this group deeply nested within ctenophores. In general, I suggest following closely the methodological protocol of Simion et al. to make the 2 studies truly
While I am always glad to see new studies on ctenophore phylogeny, I am very surprised that you did not cite Simion et al. 2014 (of which I am the first author) for two reasons: You used all the data sequenced in that study. Both studies are very similar in topic and design, and should therefore be compared. Please find a link to the study: http://www.sciencedirect.com/science/article/pii/S0944200614000816