Mitochondrial genomes of Anopheles arabiensis, An. gambiae and An. coluzzii show no clear species division

Here we report the complete mitochondrial sequences of 70 individual field-collected mosquito specimens from throughout Sub-Saharan Africa. We generated this dataset to identify species-specific markers for the following Anopheles species and chromosomal forms: An. arabiensis, An. coluzzii (the Forest and Mopti chromosomal forms) and An. gambiae (the Bamako and Savannah chromosomal forms). The raw Illumina sequencing reads were mapped to the NC_002084 reference mitogenome sequence. A total of 783 single nucleotide polymorphisms (SNPs) were detected on the mitochondrial genome, of which 460 (58.7%) are singletons. None of these SNPs is suitable as a molecular marker to distinguish among An. arabiensis, An. coluzzii and An. gambiae or any of the chromosomal forms. The lack of species-specific or chromosomal form-specific markers is also reflected in the constructed phylogenetic tree, which shows no clear division among the operational taxonomic units considered here.

In our study we use the complete mitogenome for comparison, which should make the analysis more robust. In addition, we specifically included the different chromosomal forms in our analysis. These chromosomal forms are genetically diverged from each other, and the An. gambiae chromosomal forms display strong assortative mating (Touré et al., 1998). The An. coluzzii chromosomal forms differ from each other in their ecology: An. coluzzii-Mopti is found in dry areas, whereas An. coluzzii-Forest is restricted to wet climates (Lee et al., 2009).
In this study we wished to identify species-specific markers within the mtDNA for Anopheles arabiensis, An. coluzzii and An. gambiae, including among the chromosomal forms currently subsumed under the designations An. gambiae and An. coluzzii, with the goal of adding these to our existing Anopheles species detection assay (Lee et al., 2014). We sequenced the whole mitogenomes of 70 individual mosquito specimens collected throughout Sub-Saharan Africa. The raw Illumina sequencing reads were mapped to the AgamP4 reference sequence, which included both nuclear and mitochondrial sequences. We explore the relationship among the mitogenome sequences of An. arabiensis, An. coluzzii, An. gambiae and four of the sub-specific chromosomal forms.

Sample collection
Anopheles arabiensis raw Illumina sequencing reads were obtained from our previous study (Marsden et al., 2014). These included 20 female An. arabiensis mosquitoes, which were collected indoors in houses using mouth aspirators in three villages in Tanzania (Table 1). The An. quadriannulatus mosquito, used as an outgroup for the phylogenetic analysis, was collected as a larva in the Shingwidzi area (23.1160°S 31.3752°E) in South Africa in 2015 and was reared to adulthood.

Amendments from Version 1
The main difference between this version and the previous one is the analysis we performed to construct the phylogenetic tree. The newly created tree is shown in Figure 1. This approach is more in line with what previous studies of mitogenomes in Anopheles specimens have done. This did not change the conclusion of the paper. We also added a new table (Table 1) listing the chromosomal inversions of each specimen, as suggested by one of the reviewers. Furthermore, we added Supplementary Table S1 with all the SNPs detected on the mitogenome for the different Anopheles species and chromosomal forms. We also addressed most of the reviewers' comments and clarified the text where needed.

Results and Discussion
We identified a total of 783 single nucleotide polymorphisms (SNPs) over the entire mitogenome. The majority of these (58.7%) were singletons (found in only one of the 70 mitogenomes). We did not identify any SNPs unique to any of the species or chromosomal forms (Supplementary Table S1) and therefore conclude that mtDNA is not suitable for Anopheles gambiae complex species identification.
The lack of species-specific markers is also reflected in the phylogenetic tree (Figure 1). An. arabiensis, An. coluzzii and An. gambiae did not cluster separately, which is consistent with previous reports that compared mitochondrial genome sequence data from specimens originating from Kenya, Senegal  )), which could be the reason that we do not find any species-specific markers on the mitogenome. Our mitochondrial genome study does not provide conclusive evidence for hybridization and introgression among the taxa under study. However, our data suggest that this is a possibility, which would be consistent with results reported by Fontaine et al. (2015) and Besansky et al. (1997). Future modeling work may illuminate the likely contribution of the different evolutionary forces that shape mitogenome and nuclear genome evolution.
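The singleton and species-unique criteria used above can be illustrated with a minimal sketch. This is not the pipeline used in this study: the function name `classify_sites`, the definitions encoded in it (a singleton is a biallelic site whose rarer allele occurs in exactly one sample; a diagnostic site carries an allele fixed in one group and absent from all others) and the six-base toy sequences are assumptions made for illustration only.

```python
from collections import Counter

VALID = set("ACGT")  # ambiguous calls (N, gaps) are ignored at each site


def classify_sites(samples):
    """Classify alignment columns.

    samples: list of (name, group, aligned_sequence) tuples of equal length.
    Returns (snps, singletons, diagnostic) using 1-based positions.
    """
    length = len(samples[0][2])
    if any(len(seq) != length for _, _, seq in samples):
        raise ValueError("sequences must be aligned to the same length")

    groups = sorted({group for _, group, _ in samples})
    snps, singletons, diagnostic = [], [], []

    for pos in range(length):
        calls = [(group, seq[pos].upper())
                 for _, group, seq in samples
                 if seq[pos].upper() in VALID]
        counts = Counter(base for _, base in calls)
        if len(counts) < 2:
            continue  # invariant site
        snps.append(pos + 1)

        # Singleton: biallelic site, rarer allele seen in exactly one sample.
        if len(counts) == 2 and min(counts.values()) == 1:
            singletons.append(pos + 1)

        # Diagnostic: one allele fixed within a group and absent elsewhere.
        for grp in groups:
            inside = {base for g, base in calls if g == grp}
            outside = {base for g, base in calls if g != grp}
            if len(inside) == 1 and inside.isdisjoint(outside):
                diagnostic.append((pos + 1, grp, next(iter(inside))))

    return snps, singletons, diagnostic


# Hypothetical toy alignment, two samples per group.
toy = [
    ("s1", "arabiensis", "ACGTAC"),
    ("s2", "arabiensis", "ACGTAC"),
    ("s3", "coluzzii",   "ACGTAT"),
    ("s4", "coluzzii",   "ATGTAC"),
    ("s5", "gambiae",    "ATGTAC"),
    ("s6", "gambiae",    "ACGAAC"),
]

snps, singletons, diagnostic = classify_sites(toy)
print(snps)        # [2, 4, 6]
print(singletons)  # [4, 6]
print(diagnostic)  # []
```

In the toy alignment every variable site is either shared across groups or private to a single specimen, so no site qualifies as diagnostic. This is the same pattern, in miniature, as the result reported here. Note that with a group of only one sample, any private allele would trivially count as diagnostic, so the check is meaningful only for groups with several specimens.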

Data availability
Aligned sequences were submitted to the National Center for Biotechnology

Grant information
We thank the University of California-Irvine, Malaria Initiatives (UCIMI) for their support.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
gambiae and An. coluzzii were resting?
Authors: "Similarly, we did this for the An. gambiae Savannah and Bamako chromosomal forms. We used the same definitions and methods to characterize the chromosomal forms as in Lanzaro & Lee, 2013."
Comment: It is not clear to me if you examined the polytene chromosomes of each specimen you identified as the Savannah, Bamako, Forest and Mopti forms. Please clarify.

Genome sequencing
Authors: "For the An. coluzzii and An. gambiae samples we used the same methods as described in
Comment: Please add a short sentence to clarify whether you sequenced the whole genome and, from the full sequence data, obtained positions 1-13,470 of the mitogenome.

Data analysis
Authors: "The phylogenetic tree was generated using the Jukes-Cantor genetic distance model and Neighbor-Joining tree methods available in Geneious version 10.1.3."
Comment: Authors should clarify their choice for sequence analysis. The Geneious software has been developed for editing and aligning DNA / amino acid sequences. Several software packages have been widely used to infer phylogenetic relationships. I suggest that the authors refine and improve the phylogenetic analysis using appropriate programs and models chosen for the mitogenome data at hand.
some references to previous studies performed on mtDNA of the examined species (for example Besansky 1997) and why you expected to obtain different results compared to previous studies.
Please revise also: "morphologically identical species that can only be distinguished with molecular markers" (
Please insert a sentence about chromosomal forms of An. gambiae.

Methods
Please specify the method for collecting as you already described for (e.g.
Please insert a table with inversion polymorphism of chromosomal forms analyzed.
Please add the source of the specimens you included in the phylogenetic analysis.

Results
Study design is well explained and results are given concisely.
Please add in Table 2 also the number of specimens you included for each species in the analysis.
Please add in Figure 2 an explanation of what "lineage" means for specimens.

An. arabiensis
Please give results (also without table or figure) for each country separately.

Discussion
Discussion is very concise but deals with most major points of interest. We would just suggest explaining better the conclusion on possible introgression (the more plausible hypothesis) between taxa and evaluating other possible explanations for the absence of fixed differences between species (e.g. absence of divergent selection, or evolutionary characteristics of mitogenomes).

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes

Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.