Phylommand - a command line software package for phylogenetics

Phylogenetics is an intrinsic part of many analyses in evolutionary biology and ecology, and as the amount of data available for these analyses is increasing rapidly, the need for automated pipelines to deal with the data also increases. Phylommand is a package of four programs to create, manipulate, and/or analyze phylogenetic trees or pairwise alignments. It is built to be easily integrated into software workflows, both directly on the command prompt and when executed from scripts. Input can be taken from standard input or a file, and the behavior of the programs can be changed through switches. By using standard file formats for phylogenetic analyses, such as newick, nexus, phylip, and fasta, phylommand is widely compatible with other software.


Introduction
The improvement of high throughput sequencing methods, and the ability to produce more and/or longer reads at a cheaper price, have allowed the use of these data in new areas of research (Ellison et al., 2011; Jumpponen & Jones, 2009; Lemmon et al., 2012). More or less automated software pipelines are often an intrinsic part of developing ways to process and analyze data for new types of research questions.
Phylogenetic analyses are an integral part of many biological studies. Phylommand is a package of four programs (treebender, treeator, contree, and pairalign) with capabilities in manipulation and analysis of phylogenetic trees. The functions include rooting, splitting, and comparing trees, calculating parsimony and likelihood scores, as well as performing parsimonious stepwise addition of taxa, nearest neighbour interchange branch swapping, and construction of neighbour joining trees. In addition, there are functions, such as calculating decisiveness (Sanderson et al., 2010), MAD scores (Smith et al., 2009), and matrix representation of trees, for use in supermatrix and supertree pipelines. Similar to the Newick utilities software suite (Junier & Zdobnov, 2010), which also acts on phylogenetic trees, the phylommand programs are designed to be used on the command line, without the overhead of a graphical interface. Phylommand also accepts input from a file or standard input, writes the results to standard output (i.e. usually the screen if not redirected), and works without any configuration files or user input (after execution has started). It is thus made to work in software pipelines.
Both phylommand and Newick utilities can use the newick file format and are therefore compatible, and they complement each other as most of the functions in phylommand are not included in the Newick utilities. Even though phylommand is made for pipelines, each program works independently of any other program (given appropriate input).
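To make the notion of a parsimony score concrete, the following minimal Python sketch applies Fitch's algorithm to a rooted binary tree. It illustrates the general technique only and is not phylommand's implementation; the function names, the nested-tuple tree encoding, the equal weighting of all changes, and the toy alignment are assumptions made for the example.

```python
def fitch_column(tree, states):
    """Return (state set, number of changes) for one character on a rooted binary tree."""
    if isinstance(tree, str):            # a tip: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], states)
    right_set, right_cost = fitch_column(tree[1], states)
    shared = left_set & right_set
    if shared:                           # children agree on at least one state
        return shared, left_cost + right_cost
    # no shared state: take the union and count one extra change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch changes over all alignment columns (all changes weighted equally)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_column(tree, {tip: seq[i] for tip, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment: tips a and c share one sequence, b and d another.
alignment = {"a": "AAG", "b": "ACT", "c": "AAG", "d": "ACT"}
print(parsimony_score((("a", "b"), ("c", "d")), alignment))  # 4
print(parsimony_score((("a", "c"), ("b", "d")), alignment))  # 2
```

In this toy case the second topology, which groups the identical sequences, needs fewer changes and would therefore be preferred under the parsimony criterion.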

Implementation
Phylommand is written in the C++ programming language and is primarily distributed as source code. Its core utilities depend only on standard libraries, to facilitate compilation and use on multiple platforms. However, in addition to standard libraries, treeator may be compiled with the option to optimize model parameters under the maximum likelihood criterion using the NLopt library (Steven G. Johnson, The NLopt nonlinear-optimization package, http://ab-initio.mit.edu/nlopt); pairalign can be compiled in a parallelized version using pthreads, which may be useful to decrease runtime when doing pairwise alignment. In addition to the four core programs, rudisvg, a rudimentary svg viewer that depends on the X11 library and an X11 server, is included with phylommand.

Operation
The core programs of phylommand, in their basic version, have been successfully compiled and used on OS X 10.10 and Ubuntu Linux 16.04 (including Ubuntu 14.04 on Windows [anniversary edition of Windows 10]) using GNU g++ and make, and on Windows 10 using the same tools in MinGW. The svg viewer rudisvg has been successfully compiled and used on OS X 10.10 and Ubuntu 16.04 (GNU/Linux; including Ubuntu on Windows). The behavior of the programs in phylommand is controlled by switches, and all programs have extensive help documentation that can be accessed with the switch --help. Phylogenies can be given in either newick or nexus format. Pairalign reads DNA sequences in fasta format, and treeator reads character matrices/sequences in fasta, phylip, or nexus format (as output by, for example, AliView 1.19; Larsson, 2014) or works on distance matrices. The output formats for phylogenies are newick and nexus, but treebender can also output trees as scalable vector graphics (svg). The svg output from treebender can be displayed by rudisvg. Treebender is faster at rooting trees than nw_reroot from Newick utilities (Table 1), but the difference is small compared to how much faster nw_reroot is for the same task compared with packages in interpreted languages such as BioPerl (Perl), APE (R), and ETE (Python; Junier & Zdobnov, 2010).

Use cases
In the examples, alignment_file.fst can be replaced by any fasta formatted alignment file, and tree_file.tree (tree_file.trees if including more than one tree) can be replaced by any newick or nexus formatted file. Treebender can easily be used to create an svg image that can be piped into a file. Treebender can also be used to create monophyletic operational taxonomic units in a tree based on branch lengths (cf. virtual taxa sensu Öpik et al., 2009):

    treebender --cluster branch_length \
        --cut_off 0.03 tree_file.tree

Phylommand can also be used to view a neighbour joining tree representation of a set of DNA sequences, and to obtain the parsimony score and trees of all nearest neighbour interchange swaps from a topology. Contree can also be used to get supported conflicts between trees. The output can be either text or svg formatted as html (Figure 1):

    contree -c 70 --html tree_file.trees

These and further examples, and an example of a bash script to do a search for the most parsimonious tree and a Perl script to find groups that are alignable according to MAD scores without a predefined taxonomy, are distributed with the source code.

Summary
Phylommand offers an efficient way of manipulating and analyzing phylogenetic trees without the overhead of a graphical interface or specialized command line interpreter. It can be used in both automated (through scripts) and manual workflows. Since it is made to be compilable with minimal reliance on non-standard libraries, it is possible to use it on most operating systems, including UNIX-like systems such as OS X and Linux, and Windows. This increases its utility for pipelines that need to work on different platforms.
The article provides a nice summary of what the tools in Phylommand do. However, the reader is referred to the manuals to learn about the exact features implemented in the toolkit; in the paper, only general descriptions of the functions included in each tool are included. Moreover, some of the operations (e.g. mid-point rooting) require non-trivial algorithms, and how the tool approaches these steps is unclear.
Appropriately, the authors make multiple references to their main competitor toolkit, namely Newick Utilities. There is one study comparing the two tools in terms of runtime, but this comparison is limited to a single task (rooting trees). Further, even though Newick Utilities is (to our knowledge) the only command line alternative, there are solutions to many of these problems outside of Newick Utilities in scripting languages (e.g. DendroPy for Python and APE in R). It can be argued that a command line tool has a different type of utility and is perhaps more usable. Nevertheless, since scripting languages are also relatively easy to use, comparisons to these platforms in terms of usability and speed would be informative. At a minimum, a mention would be needed.
The code is in C++ using standard libraries, making it simple to compile and run. We were able to compile it out-of-the-box with the "make" command on a Mac OS X laptop as well as on a CentOS and an Ubuntu server.
Regarding the algorithmic aspects of the toolkit, we have concerns about efficiency, mainly driven by our examination of the code related to the midpoint rooting (note that other potentially non-trivial functions may also have issues, but we only explored midpoint rooting). As mentioned before, the algorithmic details are not provided in the paper. After digging a bit into the code, we were able to understand the midpoint rooting function (tree.cpp lines 528 through 577), and it appears a heuristic is performed: the algorithm starts with the initial rooting of the tree, and it then iteratively shifts the root to the left or right child to minimize the imbalance between the maximum distance to a left descendant leaf vs. a right descendant leaf. Once the imbalance does not improve beyond a hardcoded threshold (average of the maximum left tip distance and the maximum right tip distance, divided by 10,000), the iterative search terminates and the left and right branches of the "optimal" root are adjusted.
The main problem is the following. This "heuristic" solution is designed for a problem that has a trivial exact linear-time solution. Thus, the heuristic is unnecessary. With one bottom-up traversal and one top-down traversal of the tree, one can find the exact correct midpoint root in linear time. A linear time implementation of the midpoint rooting algorithm can be found here: https://github.com/uym2/MinVar-Rooting (runs in under a minute for 200,000 leaves).
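The two-traversal idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, and neither phylommand's code nor the MinVar-Rooting implementation: the tree is passed as a hypothetical dictionary mapping each node to its (child, branch length) pairs, branch lengths are non-negative, a root with a single child is treated as a tip, and the function name midpoint_edge is ours.

```python
import math


def midpoint_edge(children, root):
    """Return (child, parent, distance_from_child, half_longest_path)."""
    # Iterative pre-order listing: every node appears before its descendants.
    order, parent, stack = [], {root: (None, 0.0)}, [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child, length in children.get(node, []):
            parent[child] = (node, length)
            stack.append(child)

    # Bottom-up pass: down[v] = longest distance from v to a tip below v.
    down = {}
    for node in reversed(order):
        kids = children.get(node, [])
        down[node] = max((length + down[c] for c, length in kids), default=0.0)

    # Top-down pass: up[v] = longest distance from v to a tip not below v.
    # Assumption: a root with exactly one child counts as a tip itself.
    up = {root: 0.0 if len(children.get(root, [])) == 1 else -math.inf}
    for node in order:
        kids = children.get(node, [])
        best = second = -math.inf        # two deepest child paths below this node
        for c, length in kids:
            depth = length + down[c]
            if depth > best:
                best, second = depth, best
            elif depth > second:
                second = depth
        for c, length in kids:
            sibling_best = second if length + down[c] == best else best
            up[c] = length + max(up[node], sibling_best)

    # The midpoint lies on the branch where both sides can be balanced.
    for c in order:
        p, length = parent[c]
        if p is None:
            continue
        x = (up[c] - down[c]) / 2.0      # distance from c towards its parent
        if 0.0 <= x <= length:
            return c, p, x, (up[c] + down[c]) / 2.0
    raise ValueError("tree needs at least two tips")


# The leaf-rooted test tree quoted in this report, in the assumed encoding.
example = {
    "A": [("F", 1e9)],
    "F": [("B", 2e9), ("E", 5e9)],
    "E": [("C", 3e9), ("D", 4e9)],
}
print(midpoint_edge(example, "A"))  # ('E', 'F', 1500000000.0, 5500000000.0)
```

Each node is visited a constant number of times, so the running time is linear in the number of nodes. For the leaf-rooted test tree, this sketch places the midpoint on the branch between E and F, at a distance of 1.5e9 from E, which is halfway along the longest tip-to-tip path (B to D, length 1.1e10).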
The midpoint rooting code raises some questions. First, why is the number 10,000 hardcoded here? There is no restriction on the unit of distance used by trees that can be passed to this tool, so what if the input tree used a unit that resulted in huge integers for branch lengths? It seems as though hard-coding 10,000 could potentially cause issues. Second, this heuristic solution is quadratic-time in the worst case, but as mentioned earlier, an exact solution can be trivially obtained in linear time. Third, when given a tree rooted at a leaf, the algorithm fails (and the tool segfaults); why? This is a valid edge case. Below is the tree we tested:

    ((B:2000000000,(C:3000000000,D:4000000000)E:5000000000)F:1000000000)A;

We did not investigate any other functions, but given our reservations about the correctness of the midpoint rooting function, we have reservations about the correctness of other non-trivial functions performed by Phylommand. Explanations of the algorithms behind the non-trivial functions seem essential to the article.

Competing Interests: No competing interests were disclosed.
Relating to this, the article in general does not compare phylommand to existing tools, neither conceptually, nor in terms of performance. Newick utilities are mentioned several times, but there is such a wealth of phylogenetic software available, and it remains unclear how phylommand positions itself in that list. Perl-based, R-based and Python-based solutions are mentioned in a half-sentence, but from the article alone, it remains unclear what the "niche" of phylommand among the many existing tools can be. Likewise, I believe that a full assessment of the tool's performance requires more benchmarks than a speed test of one individual function (tree rooting) relative to one competitor. For the core implemented functions, it would be great to get at least some speed and memory benchmarks (relative to a limited set of competing tools).
I was able to download and run phylommand on my OS X system. In general, I commend the author for developing and providing a potentially very useful tool for the community. However, as detailed above, I believe that the present article needs more information to act as a stand-alone reference. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Is the rationale for developing the new software tool clearly explained? Partly
Is the description of the software tool technically sound? Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

Is the rationale for developing the new software tool clearly explained? Partly
Is the description of the software tool technically sound? Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

Competing Interests: No competing interests were disclosed.

Table 1. Time to root 10 trees, using one tip as outgroup.
1 Mean and standard deviation in seconds based on 10 separate runs on a workstation with an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 24 GB RAM, running Ubuntu 16.04.1, with Linux kernel 4.4.0-34.