Intraspecific mitochondrial gene variation can be as low as that of nuclear rRNA

Background: Mitochondrial DNA (mtDNA) has long been used to date historical demographic events. The idea that it is useful for molecular dating rests on the premise that its evolution is neutral. Even though this idea has long been challenged, the evidence against clock-like evolution of mtDNA is often ignored. Here, we present a particularly clear and simple example to illustrate the implications of violations of the assumption of selective neutrality. Methods: DNA sequences were generated for the mtDNA COI gene and the nuclear 28S rRNA of two closely related rocky shore snails, and species-level variation was compared. Nuclear rRNA is not usually used to study intraspecific variation in species that are not spatially structured, presumably because this marker is assumed to evolve so slowly that it is more suitable for phylogenetics. Results: Even though high inter-specific divergence reflected the faster evolutionary rate of COI, intraspecific genetic variation was similar for both markers. As a result, estimates of population expansion times based on mismatch distributions differed between the two markers by millions of years. Conclusions: Assuming that 28S evolution is more clock-like, these findings can be explained by variation-reducing purifying selection in mtDNA at the species level, and an elevated divergence rate caused by diversifying selection between the two species. Although these two selective forces together make mtDNA suitable as a marker for species identifications by means of DNA barcoding because they create a ‘barcoding gap’, estimates of demographic change based on this marker can be expected to be highly unreliable. Our study contributes to the growing evidence that the utility of mtDNA sequence data beyond DNA barcoding is limited.


Introduction
Mitochondrial DNA (mtDNA) has long been a marker of choice for investigating concepts as diverse as estimating genetic diversity and effective population sizes, reconstructing species' evolutionary histories, exploring spatial genetic subdivisions, and identifying cryptic species. All these methods assume that mtDNA variation conforms to the neutral model of molecular evolution 1 , but violations of this premise have long been recognised 2 . Over the past decades, much evidence has accumulated that mtDNA can be strongly affected by selective sweeps and background selection [3][4][5][6] . As a result, the usefulness of the marker in assessing genetic diversity 7 and exploring spatial genetic structure in continuously distributed populations 8 has been questioned, and corrections of the mitochondrial molecular clock that account for selection have been proposed 9,10 .
The implications of reduced genetic diversity at the species or population levels due to purifying selection has so far received little attention. When mutations in mitochondrial genes occur at fewer sites than expected under the neutral model 11 , molecular dating of historical demographic events by means of evolutionary rate estimates that are typically based on interspecific divergence 12,13 will result in considerable underestimates. This is particularly likely because divergence between species can be strongly affected by diversifying selection that is driven by different environmental conditions 14,15 , resulting in a faster accumulation of mutations characterising each species than is expected under the neutral model.
Here, we explore this issue using mitochondrial and nuclear DNA sequence data from two common southern African snails of the genus Afrolittorina that show no spatial genetic structure throughout their ranges 16 . The finding that data from two genetic markers with mutation rates that are assumed to differ by at least an order of magnitude 17,18 have similar levels of intraspecific variation challenges the usefulness of mitochondrial DNA sequences for studying historical demographic changes.

Methods
Specimens of the snails Afrolittorina africana and A. knysnaensis were collected at 34 sites throughout South Africa (Table 1). DNA was extracted using the CTAB protocol 19 , amplified with universal COI primers 20 and 28S primers LSU5 21 and LSU1600 22 following Williams et al. 22 , and sequenced on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems) using Big Dye Terminator v3.1 chemistry. Sequences were edited using MEGA7 23 , and 28S sequences were phased in PHASE v2.1.1 24 using default settings. Genealogical relationships between COI haplotypes and 28S alleles were reconstructed using the median-joining algorithm 25 in popArt v1.7 26 . To explore the effect of using interspecific evolutionary rates to estimate species-level population size changes, we calculated population expansion time 27 using Arlequin v3.5 28 using each marker's slowest and fastest published rates for marine gastropods (Table 2).

Amendments from Version 1
This is a minor update. Some sentences have been modified (including in the abstract) and several additional references have been included.
Any further responses from the reviewers can be found at the end of the article Figure 1. Median-joining haplotype networks constructed from a) COI sequences and b) 28S rRNA sequences of Afrolittorina knysnaensis (grey) and A. africana (white). Low intra-specific variation and high inter-specific variation of COI potentially illustrate purifying and diversifying selection, respectively. The size of circles is proportional to the frequency of each haplotype, cross-bars represent nucleotide differences, and black dots are missing haplotypes not found in the samples.

Results
Species-specific genetic clusters reconstructed from COI sequences were highly distinct (Figure 1a), with a minimum number of 44 nucleotide differences between the two species' most closely related haplotypes. In contrast, differentiation between 28S sequences (Figure 1b) was an order of magnitude smaller (4 differences).
In contrast to the high inter-specific differentiation between COI haplotypes, intra-specific genetic differentiation was comparatively low for this marker, and similar to that of 28S. In A. knysnaensis, six COI haplotypes and seven 28S haplotypes were found, while the maximum differentiation between the COI haplotypes was only two nucleotide differences, but 10 for 28S. The number of haplotypes was greater for A. africana, where 14 were found for COI and 10 for 28S. Maximum nucleotide differences for this species were seven in the COI network and five for 28S.
The practical implications of two markers with very different evolutionary rates based on inter-specific divergence having similar levels of intraspecific variation are illustrated in Table 2.
Using published rates, estimates of population expansion times were more than an order of magnitude greater based on the 28S data than based on the COI data.

Discussion
The usefulness of the mtDNA COI gene to uncover overlooked biodiversity is undisputed because of the marker's tendency to have a well-defined barcoding gap, as was found here. The two study species' COI sequences were much more strongly differentiated than their 28S sequences, potentially reflecting diversifying selection as a result of adaption to different thermal environments 16 . In contrast, there was comparatively little genetic variation at the intraspecific level for either marker, which is likely due to the commonly reported strong purifying selection acting upon the COI gene 6,9 .
Many researchers explore their mtDNA sequence data for additional information, but the selective forces that together create the barcoding gap 29 make its utility for other applications questionable 7,8 . In the present study, we have highlighted a largely unexplored problem that likely arises from selection effects in mtDNA data: the fact that demographic events using gene regions under variation-reducing purifying selection are dated using molecular clock calibrations affected by variation-increasing diversifying selection. The finding that intraspecific mtDNA variation can be as low as that of nuclear rRNA cautions against the continued use of mtDNA for exploring demographic trends by means of mismatch distributions or Bayesian skyline plots 30 , a practice that continues to dominate the recent literature 31-34 .
In our opinion, it is time to discontinue the use of fixed mtDNA rates based on divergence dating of closely related taxa, such as the closure of the Central American Seaway to date phylogenies of marine species 12,13 or the 2% rule in birds 35 . The very large datasets generated using next-generation sequencing have considerable potential to facilitate more accurate dating by identifying nuclear markers that conform to the assumptions of the molecular clock but, curiously, fixed rates based on mtDNA data are still being used to calibrate such datasets when no suitable fossil calibration points exist 36 . A possible solution may involve the identification of a suite of neutral markers that can be used to assess divergence between the species used in the original molecular dating studies, and 28S rRNA may be a suitable candidate.

Data availability
DNA sequences generated in this study were submitted to GenBank (COI accession numbers: MT331645-MT331814; 28S rRNA accession numbers: MT329760-MT330099). This study aimed to explore the genetic variation of two common southern African snails of the genus Afrolittorina. The main finding of this study is that intraspecific variations of nuclear 28S rDNA sequences are higher than such value from the mitochondrial gene in the snail populations. However, many points need to concern before acceptance. I listed the major points for revising the manuscript. Introduction part, the introduction part seems to focus on the selection process in the population to increase/decrease the genetic variation accumulation in the mt genome. The authors did not emphasize on the research questions; for example, Why did you want to study in two snails of the genus Afrolittorina? Why did you use the 28S rDNA as another genetic marker? What is the research question or hypothesis of this study? What was the reason behind deciding to study on A. aficana and A. knysnaensis? Finally, what is the main objective of this work? Whether you aimed to study the genetic differentiation between species of the snail based on different kinds of markers or study on population genetic, the content of this study could not support both objectives. For example, if the authors want to present on comparing the genetic variations that came out from the COI gene and 28S rDNA gene, the analysis used in this study was not suitable, the haplotype network analysis may not fit enough for the genetic variations you want to get. If the author wants to analyze the population genetically, the weak point is the numbers of the snails collected? And why the authors decided to collect from the various sites? What is the hypothesis behind your sampling? 1.
I don't understand why the authors have to analyze the molecular clock by comparing different originated markers like COI from one of the protein-coding genes in the mitochondrial genome and LSU (RNA-specifying gene) from the nuclear genome. Of course, the rate of evolution acting on these two genomes is different.

2.
If the authors considered in Figure 1, you would realize that 28S rDNA is not good enough for species discrimination between A. aficana and A. knysnaensis comparing with the COI gene. Only 2 nucleotide differences between those two snail species, while the nucleotide variation in the intra-specific level of A. knysnaensis is higher than that.

3.
The authors didn't discuss the evidence in snails with regards to variation, which is the point that showed in the result.

4.
There is no analysis to estimate either positive or negative selection, but the authors discussed it as the condition forced on the snail populations. It becomes over speculation.

5.
The title of this study should mention snail species because the genetic variation patterns are various depending on the groups of organisms.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound?

Are sufficient details of methods and analysis provided to allow replication by others? No
If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Molecular systematics.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Author Response 19 Aug 2020
Peter Teske, University of Johannesburg, Auckland Park, South Africa

Reviewer Report: Abigail Hui En Chan and Urusa Thaenkham, Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand APPROVED WITH RESERVATIONS
COMMENT: This study aimed to explore the genetic variation of two common southern African snails of the genus Afrolittorina. The main finding of this study is that intraspecific variations of nuclear 28S rDNA sequences are higher than such value from the mitochondrial gene in the snail populations. RESPONSE: The statement that "intraspecific variations of nuclear 28S rDNA sequences are higher than such value from the mitochondrial gene" is not entirely correct, as in A. africana, COI has more haplotypes, whereas in A. knysnaensis, 28S has more haplotypes. We specifically state that "intra-specific genetic differentiation was comparatively low for this marker [i.e., COI], and similar to that of 28S." COMMENT: However, many points need to concern before acceptance. I listed the major points for revising the manuscript. Introduction part, the introduction part seems to focus on the selection process in the population to increase/decrease the genetic variation accumulation in the mt genome. The authors did not emphasize on the research questions; for example, Why did you want to study in two snails of the genus Afrolittorina? Why did you use the 28S rDNA as another genetic marker? What is the research question or hypothesis of this study? What was the reason behind deciding to study on A. aficana and A. knysnaensis? Finally, what is the main objective of this work?
RESPONSE: This article is a Brief Report, which has very strict word limits. For that reason, it is not possible to provide lengthy explanations for the issues raised. Briefly, the original purpose of the research was to determine whether each species is divided into regional populations (which was not found). The purpose was not to compare intraspecific variation between two markers with very different evolutionary rates, as that is an unexpected finding, and essentially a novel discovery that is reported in this paper. To answer the additional questions, we would like to refer the reviewers to the original PhD thesis on which this paper is based, which is now included among the references, and which is accessible via the following link: http://vital.seals.ac.za:8080/vital/access/manager/Repository/vital:5588?site_name=GlobalView COMMENT: Whether you aimed to study the genetic differentiation between species of the snail based on different kinds of markers or study on population genetic, the content of this study could not support both objectives. For example, if the authors want to present on comparing the genetic variations that came out from the COI gene and 28S rDNA gene, the analysis used in this study was not suitable, the haplotype network analysis may not fit enough for the genetic variations you want to get. If the author wants to analyze the population genetically, the weak point is the numbers of the snails collected? RESPONSE: The aim of the study was not to analyse each population genetically. We assume here that the reviewers equate 'populations' with sampling sites, please see the following paper for an explanation why this is problematic: Waples & Gaggiotti (2006) Molecular Ecology 15:1419-1439. As neither species shows regional sub-structure (see also the PhD thesis mentioned earlier) and each species thus comprises a single 'population', it is possible to pool data from all sites for intraspecific genetic analysis. The number of samples collected (93 and 82) is actually large compared to other papers of this nature. For example, in their highly cited paper on the barcoding gap (whose detection depends on both intraand inter-specific variation), Paulay & Meyer (2005) used an average number of 7.7 samples per cowrie species. The fact that neither species of Afrolittorina is genetically subdivided throughout its range is clearly important, and was removed from an earlier version of this article so as not to exceed the word count for Brief Reports. We have decided to briefly mention it in the last paragraph of the Introduction: "Here, we explore this issue using mitochondrial and nuclear DNA sequence data from two common southern African snails of the genus Afrolittorina that show no spatial genetic structure throughout their ranges." We now also cite the PhD thesis for anyone interested in the biology of these snails.
COMMENT: And why the authors decided to collect from the various sites? What is the hypothesis behind your sampling? RESPONSE: The original purpose of collecting samples from multiple sites that span much of each species' range was to determine whether or not these species are genetically subdivided into regional populations, which was not found. However, this is clearly not relevant to the present study, which focuses on intraspecific variation, not genetic structure.
Obtaining large numbers of samples from a few sites would be problematic as one cannot rule out that additional, genetically distinct 'populations' exist that were not sampled. Please note that Table 1 is not particularly important to understand this paper, and for that reason should not be the focus of a lengthy discussion related to sampling. We originally had it in an Extended Data section, but the editor requested us to move it to the main text.
COMMENT: I don't understand why the authors have to analyze the molecular clock by comparing different originated markers like COI from one of the protein-coding genes in the mitochondrial genome and LSU (RNA-specifying gene) from the nuclear genome. Of course, the rate of evolution acting on these two genomes is different. RESPONSE: The difference in evolutionary rates between these two markers is crucial to highlight the potential role of variation-reducing selection in COI. Without understanding this, it is not possible to understand the take-home message of this paper. We essentially point out that the known rate differences between these two markers manifest at the interspecific level. The novel finding of this study is that intra-specific levels of variation are surprisingly similar. To clarify, the statement "Of course, the rate of evolution acting on these two genomes is different" is only true between species, not within species.
COMMENT: If the authors considered in Figure 1, you would realize that 28S rDNA is not good enough for species discrimination between A. aficana and A. knysnaensis comparing with the COI gene. Only 2 nucleotide differences between those two snail species, while the nucleotide variation in the intra-specific level of A. knysnaensis is higher than that. RESPONSE: The purpose of this paper is not to suggest that 28S is a marker that is suitable for species discrimination, particularly when compared to COI. As we state in the first sentence of the Discussion, "The usefulness of the mtDNA COI gene to uncover overlooked biodiversity is undisputed because of the marker's tendency to have a well-defined barcoding gap, as was also found here." It is not clear what the reviewers mean by "not good enough", the two genetic clusters are clearly distinct, although inter-specific divergence is clearly much lower. The likely reason for this is that 28S evolves in a more clock-like fashion while COI is strongly affected by selective forces, and this is another important point to understand, as a marker that does not evolve in a clock-like fashion should not be used for intraspecific molecular dating. Please see the second paragraph in the Discussion: "In the present study, we have highlighted a largely unexplored problem that likely arises from selection effects in mtDNA data: the fact that demographic events using gene regions under purifying selection are dated using molecular clock calibrations affected by variation-increasing inter-specific divergence." Given the slow evolutionary rate of 28S, minimal differentiation between species is expected. What is unexpected are the similar levels of intraspecific variation (see title).
COMMENT: The authors didn't discuss the evidence in snails with regards to variation, which is the point that showed in the result. RESPONSE: The results report the simplest and most intuitive means of describing instraspecific variation: the number of haplotypes, and the maximum number of mutations between them. Several sentences in the Discussion deal with this, e.g. first paragraph: "In contrast, there was comparatively little genetic variation at the intraspecific level for both markers, which is likely due to the commonly reported strong purifying selection acting upon the COI gene." COMMENT: There is no analysis to estimate either positive or negative selection, but the authors discussed it as the condition forced on the snail populations. It becomes over speculation. RESPONSE: The paper originally included a test for selection, but as it was not informative  e1003527). Again, a detailed discussion of these issues would far exceed the word limit, and would not add much to the study. The simple dataset used here is clearly not sufficiently informative to conclude that purifying selection is present, and for that reason we have changed the earlier title "Purifying selection can reduce intraspecific mitochondrial gene variation to that of nuclear rRNA" (https://www.biorxiv.org/content/10.1101/2020.03.31.017764v1) to the present one, which merely focuses on intraspecific variation rather than selection. We nonetheless do not think that mentioning selection is this context is over-speculative. There is strong evidence that a finding of barcoding gaps in hundreds of studies is the result of selective forces, and we cite several important papers in this context, including two at the end of the following sentence in the Discussion: "In contrast, there was comparatively little genetic variation at the intraspecific level for both markers, which is likely due to the commonly reported strong purifying selection acting upon the COI gene." In terms of finding a COI barcoding gap, our study is merely an additional example, but the finding that intraspecific variation-reducing selection can be so significant that COI shows levels of variation similar to a marker that evolves much more slowly is novel.
COMMENT: The title of this study should mention snail species because the genetic variation patterns are various depending on the groups of organisms. RESPONSE: We respectfully disagree -the practice of including taxonomic information in the title is standard procedure in taxonomy journals, but it is undesirable in a multidisciplinary journal such as F1000Research. In general, it is well known that shorter titles that report results are more likely to get accessed ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351256/). In the present case, the snail species, their geographic structure, their biology etc. are far less important than are the two genetic markers, and anyone who does not work on littorinid snails may consider the paper irrelevant to their own work if species names are included in the title. However, the finding of this study is likely applicable to a wide range of organisms given the mounting evidence for variation-reducing selection in mtDNA, and giving the impression that it is unique to two rocky shore snails would be an obstacle to further investigation into this issue. To provide an example, had Stoeckle & Thaler 2014 mentioned in the title of their article "DNA barcoding works in practice but not in (neutral) theory" that these findings are based entirely on birds, we would probably not have cited their paper, even though it is highly relevant. For that reason, we believe that it should be sufficient to mention "rocky shore snails" in the Abstract.

Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Response 19 Aug 2020 Peter Teske, University of Johannesburg, Auckland Park, South Africa This is an interesting manuscript addressing the methodological issues with the continued trend of using mitochondrial DNA sequence data to infer genetic variation and population structure. The results are based on a very limited data set, however, they are only intended to further illustrate the issues highlighted by the authors. Apart from that, the study does not require much improvement. It is well written and structured and the results support the conclusions and therefore suggest indexing the article once the following issues have been considered and adequately resolved; COMMENT: The authors claim that this is the first study to use 28S ribosomal RNA at this taxonomic level. Do they mean in snails, as this gene has been used extensively for phylogenetics and some population level analysis? More info is needed. RESPONSE: We have conducted an extensive literature review on this issue, and have not found any comparable example. Even though intraspecific variation in 18S or 28S rRNA has been reported in various invertebrate taxa, upon closer inspection it becomes evident that this variation is either located in the more rapidly evolving ITS regions rather than in the rRNA, or the species studied have geographically distinct populations that may constitute cryptic species. However, as we cannot rule out the possibility that comparable examples exist, we have changed the sentence to: "Nuclear rRNA is not usually used to study intraspecific variation in species that are not spatially structured, presumably because this marker is assumed to evolve so slowly that it is more suitable for phylogenetics." The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com