Keywords
Mitogenome, species identification, Africa, malaria vector, mosquitoes, Anopheles, single nucleotide polymorphisms, phylogenomics
The main difference between this version and the previous one is the analysis we performed to construct the phylogenetic tree. The newly created tree is shown in Figure 1. This approach is more in line with what previous studies of mitogenomes in Anopheles specimens have done. It did not change the conclusion of the paper. We also added a new table (Table 1) listing the chromosomal inversions of each specimen, as suggested by one of the reviewers. Furthermore, we added Supplementary Table S1 with all SNPs detected on the mitogenome for the different Anopheles species and chromosomal forms. We also addressed most of the reviewers' comments and clarified the text where needed.
Historically, mtDNA sequence has been used in taxonomy as a source of species diagnostic markers (Cronin et al., 1991; De Barba et al., 2014; Pegg et al., 2006) or in population genetics and evolutionary studies (Fu et al., 2013; Harrison, 1989; Llamas et al., 2016). One advantage of using mitochondrial over nuclear DNA for such studies is that the mutation rate of mtDNA is about 10 times higher than that of nuclear DNA (Brown et al., 1979; Haag-Liautard et al., 2008), which amplifies the evolutionary trajectory of populations and species. In addition, mtDNA is easy to amplify because it is present in more copies than nuclear DNA, and universal primers can be applied to a wide range of species. Widely used universal primers target the cytochrome b and cytochrome oxidase 1 genes (Tahir et al., 2016), because both have conserved and highly variable regions. Other genes, as described in De Mandal et al. (2014), can also be used as markers. However, phylogenetic trees based on mtDNA can deviate from those derived from nuclear DNA (Phillips et al., 2013; Shaw, 2002; Sota & Vogler, 2001).
The Anopheles gambiae species complex consists of eight morphologically identical species that can only be distinguished with molecular markers (Scott et al., 1993; Coetzee et al., 2013) or, for some of the species, by cytological examination of polytene chromosomes (Green, 1972; Pombi et al., 2008). The molecular markers currently used to distinguish between An. coluzzii and An. gambiae (Lee et al., 2014) lie within genomic islands of divergence proximal to the centromeres (Turner et al., 2005). Monitoring additional species-specific markers on mitochondrial DNA (mtDNA) could increase the ease of application and accuracy of species detection assays. In addition, mtDNA markers could enhance our understanding of divergence times among taxa within the complex.
Previous studies showed a high level of interspecific gene flow in mtDNA between An. coluzzii, An. gambiae and An. arabiensis specimens (Besansky et al., 2003; Besansky et al., 1997; Donnelly et al., 2004). Although these data showed no evidence for clear division among the species, the studies focused only on the ND5 locus (Besansky et al., 2003; Donnelly et al., 2004) or additionally included the cytochrome b and ND1 loci (Besansky et al., 1997). In our study we use the complete mitogenome for comparison, which should make the analysis more robust. In addition, we specifically included the different chromosomal forms in our analysis. These chromosomal forms are genetically diverged from each other, and the An. gambiae chromosomal forms display strong assortative mating (Touré et al., 1998). The An. coluzzii chromosomal forms differ from each other in their ecology: An. coluzzii-Mopti is found in dry areas whereas An. coluzzii-Forest is restricted to wet climates (Lee et al., 2009).
In this study we aimed to identify species-specific markers within the mtDNA for Anopheles arabiensis, An. coluzzii and An. gambiae, including among the chromosomal forms currently subsumed under the designations An. gambiae and An. coluzzii, with the goal of adding these to our existing Anopheles species detection assay (Lee et al., 2014). We sequenced the whole mitogenomes of 70 individual mosquito specimens collected throughout Sub-Saharan Africa. The raw Illumina sequencing reads were mapped to the AgamP4 reference sequence, which included both nuclear and mitochondrial sequences. We explore the relationship among the mitogenome sequences of An. arabiensis, An. coluzzii, An. gambiae and four of the sub-specific chromosomal forms.
Anopheles arabiensis raw Illumina sequencing reads were obtained from our previous study (Marsden et al., 2014). These included 20 female An. arabiensis mosquitoes collected indoors using mouth aspirators in 2012 from three villages in the Kilombero Valley, Tanzania (Lupiro (-8.38000°N, 36.66912°W), Sagamaganga (-8.06781°N, 36.80207°W) and Minepa (-8.25700°N, 36.68163°W)), and 4 samples collected in Cameroon in 2005 (9.09957°N, 13.72292°W). DNA was extracted from the head and thorax of each mosquito, and An. arabiensis mosquitoes were identified using Scott primers (Scott et al., 1993). The adult An. gambiae and An. coluzzii samples were collected indoors using mouth aspirators in Kela, Mali (11.88683°N, -8.44744°W) in 2012 and Mutengene, Cameroon (4.0994°N, 9.3081°W) in 2011. We subdivided the An. coluzzii specimens into the Forest and Mopti chromosomal forms and, similarly, the An. gambiae specimens into the Savannah and Bamako chromosomal forms. We examined the polytene chromosomes to characterize the chromosomal forms as in Lanzaro & Lee (2013) and used the same definitions. The results of chromosome determination are listed in Table 1. The An. quadriannulatus mosquito, used as an outgroup for the phylogenetic analysis, was collected as a larva in the Shingwidzi area (23.1160°S, 31.3752°E) in South Africa in 2015 and was reared to adulthood.
Sequencing methods for An. arabiensis samples are as described in Marsden et al. (2014). In short, individually barcoded Illumina paired-end sequencing libraries with insert sizes of 320-400 base pairs (bp) were prepared using NEXTflex Sequencing kits (NOVA-5144) and barcodes (NOVA-514102) (Bioo Scientific, Austin, TX, USA) and sequenced on an Illumina HiSeq2000 (Illumina, San Diego, CA, USA) with 100-bp paired-end reads, using twelve samples per lane. For the An. coluzzii and An. gambiae samples we used the methods described in Norris et al. (2015) and Main et al. (2015). For these species, libraries were created using the Nextera DNA Sample Preparation Kit (FC-121-1031) and TruSeq dual indexing barcodes (FC-121-103) (Illumina), and the samples were sequenced on an Illumina HiSeq2500 with 100-bp paired-end reads. We sequenced the whole genome, but only mapped the raw sequences to the NC_002084 reference mitogenome sequence.
De-multiplexed raw reads were trimmed using Trimmomatic version 0.36 (Bolger et al., 2014) and mapped to the mitogenome reference sequence of An. gambiae (GenBank accession number NC_002084; Beard et al., 1993). FreeBayes v1.0.1 (Garrison & Marth, 2013) was used for mitochondrial variant calling, assuming single ploidy and without a population prior. Mapping statistics were calculated using Qualimap version 2.2 (Okonechnikov et al., 2016) and are presented in Table 2. Following the recommendation of Crawford & Lazzaro (2012), we used a minimum depth of 8 to call variants for each individual. Between positions 1 and 13,470 bp of the mitogenome, we obtained consistently high-quality reads for all samples, and this region was used for further analysis. An AT-rich region located between positions 13,471 and 15,388 suffers from low or zero coverage in sequences generated with the Nextera library preparation kit. We therefore excluded this region from further analysis. The Vcf2fasta program (Danecek et al., 2011) was used to convert mitogenome sequences from VCF to FASTA format. Geneious version 10.1.3 was used for mitogenome alignments. The phylogenetic tree was generated with PhyloBayes MPI (Lartillot et al., 2013) using the CAT-GTR model on the genomic sequences, which has been shown to give results similar to those from amino acid sequences (Foster et al., 2017). We ran the program twice for over 30,000 iterations. The maximum difference between the two runs was 0.045 and the minimum effective size was > 100. From these runs we created a consensus tree, which we visualized in Geneious version 10.1.3. We used scikit-allel v1.1.9, a software package for Python (Miles & Harding, 2017), to identify species-specific markers.
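The marker search described above can be illustrated with a minimal sketch. It is a plain-Python stand-in, not the scikit-allel code used in the study, and it assumes one working definition of a species-specific marker: a site where every called sample in a group carries the same allele and no sample outside the group carries it. The function names, the input layout (one list of haploid calls per site) and the use of `None` for calls below the depth threshold are illustrative assumptions.

```python
from collections import defaultdict

MIN_DEPTH = 8  # minimum read depth for accepting a call, as stated in the text


def mask_low_depth(calls, depths, min_depth=MIN_DEPTH):
    """Replace calls whose depth is below min_depth with None (missing)."""
    return [c if d >= min_depth else None for c, d in zip(calls, depths)]


def diagnostic_sites(genotypes, groups):
    """Return {group: [site indices]} for sites with a fixed, private allele.

    genotypes: list of sites; each site is a list of haploid allele calls,
               one per sample, with None for missing calls (assumed layout).
    groups:    list of group labels (species or chromosomal form), one per sample.
    """
    result = defaultdict(list)
    for site_index, calls in enumerate(genotypes):
        # Collect the set of alleles observed in each group at this site.
        alleles_by_group = defaultdict(set)
        for allele, group in zip(calls, groups):
            if allele is not None:
                alleles_by_group[group].add(allele)
        for group, alleles in alleles_by_group.items():
            if len(alleles) != 1:
                continue  # allele is not fixed within this group
            (allele,) = alleles
            # Alleles seen in any other group at this site.
            others = set().union(
                *(a for g, a in alleles_by_group.items() if g != group)
            )
            if allele not in others:
                result[group].append(site_index)
    return dict(result)
```

Under this definition an empty result for every group corresponds to the outcome reported in the Results: no allele is both fixed within and private to a single species or chromosomal form.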
Mapped reads indicates the reads that are mapped to the reference genome. Mean coverage indicates the average depth of reads on the mitochondrial DNA and standard deviation indicates the coverage deviation across the mitochondrial DNA.
We identified a total of 783 single nucleotide polymorphisms (SNPs) over the entire mitogenome. The majority of these (58.7%) were singletons (found in only one of the 70 mitogenomes). We did not identify any SNPs unique to the species or chromosomal forms (Supplementary Table S1) and therefore conclude that mtDNA is not suitable for Anopheles gambiae complex species identification.
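As an illustration of how a singleton fraction like the one above can be computed, the following sketch counts variable sites whose rarer allele occurs in exactly one sample. It is not the authors' code; the input layout (one list of haploid calls per site, `None` for missing calls) and the restriction to biallelic sites are assumptions made for the example.

```python
from collections import Counter


def is_singleton(calls):
    """True if the site is biallelic and its rarer allele occurs in one sample."""
    counts = Counter(c for c in calls if c is not None)
    return len(counts) == 2 and min(counts.values()) == 1


def singleton_fraction(genotypes):
    """Fraction of variable sites that are singletons (0.0 if none vary)."""
    variable = [
        calls for calls in genotypes
        if len({c for c in calls if c is not None}) > 1
    ]
    if not variable:
        return 0.0
    return sum(is_singleton(calls) for calls in variable) / len(variable)
```

Invariant sites are excluded from the denominator, so the fraction is taken over SNPs only, matching how the percentage is expressed in the text.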
The lack of species-specific markers is also reflected in the phylogenetic tree (Figure 1). An. arabiensis, An. coluzzii and An. gambiae did not cluster separately, which is consistent with previous reports that compared mitochondrial genome sequence data from specimens originating from Kenya, Senegal and South Africa (Besansky et al., 1997) and from Burkina Faso, Cameroon, Kenya, Mali, South Africa, Tanzania and Zimbabwe (Fontaine et al., 2015, Supplemental material).
The phylogenetic tree fails to reveal a clear division of the operational taxonomic units included in this analysis. Colors indicate the species or chromosomal form, and numbers at the branches indicate the accuracy of the inferred branches on a scale of 0–1, where 1 represents the highest confidence. The three An. arabiensis lineages were previously reported by Maliti and co-workers (Maliti et al., 2016).
Our data may indicate that there is no divergent selection on the mitogenome among members of the An. gambiae complex. Since mitochondrial genomes have a higher (1–10 times) substitution rate than nuclear genomes (Havird & Sloan, 2016; Lynch & Walsh, 2007), one might expect some divergence in the mitogenome even in the absence of selection if the taxa have been separated by reproductive barriers, even in sympatry, as has been observed in the nuclear genome. Therefore, the lack of species-specific markers on the mitogenome in our data may be the result of episodic hybridization between species. Of note, 36 of the samples used in our study originated from Kela (Mali). Kela is located near the village of Selinkenyi, where previous studies have shown a history of hybridization and introgression between An. gambiae and An. coluzzii (Lee et al., 2013; Main et al., 2015; Norris et al., 2015), which may have resulted in shared polymorphisms in their mitochondrial genomes. Shared mitochondrial polymorphisms also appear to have occurred in Mutengene (Cameroon), where An. gambiae and An. coluzzii occur sympatrically but where no such history has been reported. Hybridization of either An. coluzzii or An. gambiae with An. arabiensis yields sterile males (Slotman et al., 2004), but phylogenomic analysis of these species shows patterns of introgression among all of them (Fontaine et al., 2015), which could be the reason that we do not find any species-specific markers on the mitogenome. Our mitochondrial genome study does not provide conclusive evidence for hybridization and introgression among the taxa under study. However, our data suggest that this is a possibility, which would be consistent with results reported by Fontaine et al. (2015) and Besansky et al. (1997). Future modeling work may illuminate the likely contributions of the different evolutionary forces that shape mitogenome and nuclear genome evolution.
Aligned sequences were submitted to the National Center for Biotechnology Information (NCBI) under accession numbers MG930826 - MG930896.
Dataset 1. Aligned FASTA file of mitogenome samples. DOI: 10.5256/f1000research.13807.d192892 (Hanemaaijer et al., 2018)
We thank University of California - Irvine, Malaria Initiatives (UCIMI) for their support.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We thank Michelle Sanford for her assistance in the field collection in Cameroon in 2011. We thank Clare Marsden for providing the raw data of An. arabiensis samples.
Supplementary Table S1. List of SNP variants in the different Anopheles species and chromosomal forms.
Competing Interests: No competing interests were disclosed.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
References
1. Foster PG, de Oliveira TMP, Bergo ES, Conn JE, et al.: Phylogeny of Anophelinae using mitochondrial protein coding genes. R Soc Open Sci. 2017; 4 (11): 170758
Competing Interests: No competing interests were disclosed.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Version | Invited Reviewer 1 | Invited Reviewer 2 |
---|---|---|
Version 2 (revision), 15 Mar 19 | read | |
Version 1, 21 Mar 18 | read | read |