Comparative genomic analysis of crustacean hyperglycemic hormone (CHH) neuropeptide genes across diverse crustacean species

Background: Recent studies on bioactive peptides have shed light on the importance of these compounds in regulating a multitude of physiological, behavioral and biological processes in animals. Specifically, the neuropeptides of the crustacean hyperglycemic hormone (CHH) superfamily are known to control a number of important functions, ranging from energy metabolism, molting and osmoregulation to reproduction. Methods: Given the importance of this peptide family, we employed a conservative approach utilizing extant transcriptome datasets from 112 crustacean species, which include not only important food crop species from the order Decapoda but also other lower-order crustaceans (Branchiopoda and Copepoda), to identify putative CHH-like sequences. Results and conclusions: Here we describe 413 genes that represent a collection of CHH-like peptides in Crustacea, providing an important staging point that will facilitate the next stages of neuroendocrine research across the wider community.


Introduction
Crustaceans and insects from the phylum Arthropoda have longstanding histories in peptide biology research, principally in areas related to the roles of peptide hormones in physiology and neuroendocrine signaling. Early discoveries demonstrated that compounds in the crustacean nervous system were responsible for chromatophore control [1][2][3]. Four decades later, it was revealed that a compound known as the red pigment concentrating hormone was the first crustacean/invertebrate neuropeptide to be characterized 4. Since then, multiple studies have shed light on the highly pleiotropic functions of crustacean neuropeptides, which are implicated in the regulation of a myriad of physiological processes such as light adaptation, molt inhibition, carbohydrate metabolism, reproduction and ion transport 5-9.
The crustacean hyperglycemic hormone (CHH) represents a neuropeptide superfamily that is unique to arthropods 6,10-12. This superfamily is made up of peptides containing ~70 amino acids, originally isolated from the X-organ-sinus-gland system of the decapod Carcinus maenas 13. Given their high degree of structural similarity and the conservation of six cysteine residues, the molt-inhibiting hormone (MIH) and gonad-inhibiting hormone (GIH) were considered part of this family, collectively known as CHH/MIH/GIH. To date, at least 150 CHH peptides have been isolated and characterized, mainly in decapods, through comparative studies on endocrinology 5-7,14-21. Although there are reports on CHH peptides in other crustacean taxa such as Armadillidium vulgare (Isopoda) 22,23, Daphnia pulex (Cladocera) 24 and Daphnia magna 15, investigations beyond decapods have remained scant and the sequences of CHH/MIH/GIH genes in other crustacean taxa have remained elusive.
Here, we took advantage of the growing number of high-throughput crustacean datasets on public repositories to perform transcriptome mining of the CHH/MIH/GIH superfamily. To this end, we looked at crustacean species from three Classes (Figure 1) and annotated CHH/MIH/GIH genes. This high-confidence set of genes, identified using our in silico framework, provides an important basis for understanding the neuropeptide biology underpinning physiological adaptations across diverse crustacean species.

Transcriptome datasets and query sets
We retrieved complete transcriptome datasets for 112 crustacean species available at the time of manuscript preparation from the European Nucleotide Archive. Five non-crustacean arthropod proteomes were retrieved from UniProt. A complete list of accessions used in this study is provided in Supplementary Table 1. We retrieved a list of query sequences used in subsequent homology searches from UniProt and GenBank.

Identification of CHH/MIH/GIH peptides
To identify CHH/MIH/GIH gene orthologs, we used multiple Basic Local Alignment Search Tool (BLAST)-based approaches such as BLASTp and tBLASTn with varying Blocks Substitution matrices, based on a previously published workflow 25. The BLAST results were filtered by an e-value of < 10^-6 and by best reciprocal BLAST hits against the GenBank non-redundant (nr) database, and redundant contigs having at least 95% identity were collapsed using CD-HIT. We then utilized HMMER (version 3.1), employing hidden Markov model (HMM) profiles 26, to scan for the presence of CHH Pfam domains 27 on the best reciprocal nr hits.

Multiple sequence alignment and phylogenetic tree construction
Multiple sequence alignments of CHH protein sequences were performed using MAFFT (version 7) 30. A phylogenetic tree was built from the MAFFT alignment using RAxML with the WAG + G model to generate best-scoring maximum likelihood trees 31. Geneious (version 7) was used to generate multiple sequence alignment images as well as graphical representations of the Newick tree 32.
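
The e-value and reciprocal-best-hit filtering described under "Identification of CHH/MIH/GIH peptides" can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes BLAST was run with the default 12-column tabular output (-outfmt 6), it applies a strict reciprocal-best-hit rule between two result sets rather than a search against nr, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-6  # threshold stated in the Methods


def best_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject with e-value below the cutoff.

    Assumes the 12 default columns of BLAST -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= cutoff:
            continue  # discard hits that fail the e-value filter
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(forward: Iterable[str], reverse: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)
```

Here `forward` would hold the hits of the query peptides against a transcriptome and `reverse` the hits of the candidate contigs searched back against the reference set. Only pairs that agree in both directions are kept.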

Results and discussion
We have annotated CHH/MIH/GIH genes from 112 crustacean transcriptome datasets representing three Classes: Malacostraca (Amphipoda: 56 species, Decapoda: 14 species, Isopoda: 27 species, Euphausiacea: 2 species and Mysida: 1 species), Branchiopoda (3 species), and Copepoda (9 species) (Supplementary Table 1). We also looked at 5 non-crustacean species from Arthropoda: Insecta (3 species), Arachnida (1 species) and Chilopoda (1 species) (Supplementary Table 1). Using sequence- and motif-similarity-based approaches, we have conservatively identified a total of 413 genes from these transcriptomes (Figure 2; Dataset 1 28 and Dataset 2 29). Table 2). Within crustacean taxa, a range of sequence identities was observed: Branchiopoda (~25% to 93%), Copepoda (~12% to 30%) and Malacostraca (~10% to 98%) (Supplementary Table 2). This is reflected in the phylogeny, where CHH/MIH/GIH sequences from related individuals form distinct clusters (Figure 4). It was previously reported that multiple gene duplications of CHH family peptides occurred in the decapod lineage, leading to a high degree of genetic polymorphism 15, which provides an explanation for our current observation. Two separate clusters of CHH genes exhibiting antagonistic patterns of expression were identified in the decapod Metapenaeus ensis and were posited to represent an ancient gene duplication event 34. Although it is not possible to pinpoint the genomic loci of the CHH sequences identified in this study, it is likely that paralogous copies offer mechanisms for evolving new functions through functional divergence. CHH-like genes arising from duplication of the ancestral copy are subject to reduced selective pressure and may therefore lose their hyperglycemic activity to adopt more specialized roles 15. Further biochemical studies will be required to unravel the functions of the novel genes identified in this study.
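
Pairwise identity ranges such as those quoted above depend on how identity is defined. The sketch below shows one common convention, under stated assumptions: the two sequences are already aligned (for example, two rows taken from a MAFFT alignment), gaps are written as "-", and columns containing a gap in either sequence are excluded from the denominator. The source does not say which convention was used, so this is illustrative only, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # no overlapping residues to compare
    return 100.0 * matches / compared
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different values for the same pair, which matters most for highly divergent sequences.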

Conclusions
We have generated a high-confidence list of CHH/MIH/GIH sequences from distantly related crustaceans. As a fundamental step in a broader endeavor, these data are now available to the wider community to allow detailed functional analyses pertinent to the next stages of neuropeptide research. Given the paucity of CHH sequences beyond decapod crustaceans, our analysis forms a promising basis for studies ranging from biochemistry to the evolution of this elusive superfamily.

The manuscript presented by Chang and Lai describes a collection of CHH neuropeptide genes in over 100 crustacean species.

I read this short report with interest and appreciate that the majority of neuropeptide studies have previously been limited to decapod crustacean species. In addition, the presence of neuropeptide transcripts in other divergent crustaceans, including those from basal lineages (copepods and branchiopods), is presented.
As in their previous work (Chang and Lai, 2018, BMC Genomics), a broad sampling strategy was followed. Here the approach spanned multiple crustacean lineages, which allows conclusions to be drawn on the evolutionary trajectory of CHH genes and their potential functional role during life histories.
Although the paper describes a single finding, it is a valuable asset, as it provides an inventory of a high-quality list of carefully curated CHH neuropeptide transcripts for future functional studies.
The paper is well-written and concise, and definitely fits with the scope and eligibility guidelines of F1000Research. As a minor comment: it would be good to describe in the legend how the tree in Figure 1 was made.

BMC Genomics 19

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors collected and analyzed transcriptome datasets for 112 crustacean species (representing three Classes: Malacostraca, Branchiopoda, and Copepoda), as well as those for 5 non-crustacean arthropod species (Insecta, Arachnida and Chilopoda) (Supplementary Table 1; Fig. 1), retrieved from a public repository.
Collectively, 413 genes are annotated based on sequence and motif similarity as encoding peptides belonging to the so-called CHH/MIH/GIH family. The number of CHH/MIH/GIH genes identified from each crustacean species is shown as heat maps (Fig. 2). Protein sequences, annotation information, etc. are listed (Dataset 1 and 2). Between-species % identity is reported (Supplementary Table 2). A multiple sequence alignment of representative CHH/MIH/GIH sequences is presented to show the 6 conserved cysteine residues (Fig. 3). A phylogenetic tree is built based on the multiple sequence alignment data (Fig. 4).
The main drawback of this work is that it only results in a 413-sequence inventory of CHH/MIH/GIH family peptides, with little accompanying information (nucleotide length, amino acid sequence).
The authors did not describe how the tree presented in Fig. 1 was built, or what data were used. If they took a constructed tree from another source, this should be cited. There are obviously other current interpretations of the phylogenetic relationships of Crustacea, in addition to the one shown in the manuscript.
The annotation information (Dataset 2) should be more discerning: each peptide should be assigned as CHH, MIH/GIH, MOIH (mandibular organ-inhibiting hormone) or ITP (ion transport peptide), instead of only being given an inclusive description, "Crustacean CHH/MIH/GIH neurohormone family".
Results derived from further analysis of the sequence data (Figs. 2, 3, and 4) are not adequately discussed or lead to conclusions that have been previously established. For example, the only conclusion drawn from Fig. 3 is the presence of 6 highly conserved cysteine residues, a signature of the family that has already been extensively described. Figure 4 is left without discussion. Moreover, it is not mentioned which type of peptide (CHH, MIH/GIH, or ITP) the different clusters highlighted in the tree belong to.
Part of the discussion (left column, p. 4) starting with "It was previously reported that multiple gene duplications of CHH family peptides..." is not particularly relevant to the data being discussed and again makes a point that has already been extensively described and discussed. It should also be mentioned that genomic data are more adequate than transcriptomic data when discussing gene duplication and gene copy number (Fig. 2). Figure 2 is also left without discussion. In addition, the data presented in this figure are open to the criticism that the number of CHH/MIH/GIH genes assigned to each species is likely biased, as the number will certainly be influenced by the type of tissue used for sequencing (eyestalk ganglia, other tissue, or whole animal) and by sequencing depth. Accordingly, the relevant information should be given in Supplementary Table 1, and comparisons should only be made among species with the same sequenced tissue type (e.g., eyestalk ganglia) and comparable sequencing depth.
Overall, while the methods employed are largely (but not entirely) appropriate for the intended analysis, the goal of the study is not clearly set, and this work only produces a sequence inventory without novel findings or solid discussion. Additional analyses expected to yield novel findings should be added to the manuscript. For example, with the vast amount of sequence information that could be extracted from a set of 413 peptides from animals encompassing 3 Classes, would it not be possible to uncover some function-defining sequence characteristics or motifs? This information would be useful for functional analysis, which the authors consider an important aspect of the next stage of research.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

Referee Expertise: Endocrinology
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

The current study is an interesting "in silico" approach that mines for CHH/MIH/GIH neuropeptides in a wide range of crustaceans. The introduction is up to date and concise. However, there are some minor points that need to be addressed to provide clarity for readers. The materials and methods are sound and appropriate, although I raise some particular suggestions to further improve the quality of the MS. The results and discussion are good overall. One major point that needs to be fixed is the phylogenetic tree, in which the color code does not correspond between legend and figure.
In addition, I have specific comments (the absence of numbered pages and lines makes this a bit awkward, but see below):
Page 2
…known as CHH/MIH/GIH - Mandibular Organ-Inhibiting Hormone (MOIH) and Ion Transport Peptide (ITP) are also part of the superfamily. See Ohira (2016).
…from three Classes - Please elaborate on this. Are these Branchiopoda, Malacostraca and Copepoda?
(Figure 1) - How was Figure 1 generated? Is this a phylogeny-based tree built on mitochondrial genomes, or is it just an illustration?
GenBank nr database… What about the TSA (Transcriptome Shotgun Assembly) database? Using it would increase the chance of discovering more CHH/MIH/GIH sequences. Since you mention in the introduction "Here, we took advantage of the growing number of high-throughput crustacean datasets on public repositories to perform transcriptome mining of the CHH/MIH/GIH superfamily", adopting the TSA database would be better suited to the MS.
Page 4
Within crustacean taxa… Similarity was quite high in Branchiopoda and Malacostraca (up to 93% and 98%), but considerably lower in Copepoda. Can you propose an explanation for that?
It was previously reported that… In light of this, Veenstra (2016) carried out very intensive mining of CHH in multiple decapod species. Why not provide a section that compares the number of CHH/MIH/GIH genes predicted from your dataset with that study?