South-East Asian strains of Plasmodium falciparum display higher ratio of non-synonymous to synonymous polymorphisms compared to African strains

Resistance to frontline anti-malarial drugs, including artemisinin, has repeatedly arisen in South-East Asia, but the reasons for this are not understood. Here we test whether evolutionary constraints on Plasmodium falciparum strains from South-East Asia differ from those on African strains. We find a significantly higher ratio of non-synonymous to synonymous polymorphisms in P. falciparum from South-East Asia compared to Africa, suggesting differences in the selective constraints on the P. falciparum genome in these geographical regions. Furthermore, South-East Asian strains showed a higher proportion of non-synonymous polymorphism at conserved positions, suggesting reduced negative selection. There was a lower rate of mixed infection by multiple genotypes in samples from South-East Asia compared to Africa. We propose that a lower mixed infection rate in South-East Asia reduces intra-host competition between the parasite clones, reducing the efficiency of natural selection. This might increase the probability of fixation of fitness-reducing mutations, including drug-resistant ones.

Artemisinin combination therapy (ACT) is the frontline treatment for malaria caused by Plasmodium falciparum and has played a major role in reducing malaria mortality from an estimated 840,000 deaths in the year 2000 to 440,000 deaths in the year 2015 1. The emergence and spread of artemisinin resistance in South-East Asia, however, poses a serious threat to malaria control, and the containment of artemisinin resistance is thus a global public health priority 2-8.
One of the most important unanswered questions in anti-malarial drug resistance is why it has repeatedly emerged in South-East Asia 3,5,6,9. Resistance to chloroquine was first reported in South-East Asia in 1957 before spreading to India and Africa, where it resulted in a significant increase in malaria child mortality, possibly killing millions of children 10-12. Resistance to sulphadoxine-pyrimethamine also emerged in South-East Asia in the late 1960s, following a similar route to India and Africa 9. Worryingly, resistance to artemisinin has emerged independently at multiple places in South-East Asia 13-17 and is now present 25 km from the Indian border 16, threatening to follow the same trajectory as resistance to previous anti-malarial drugs. Improved understanding of how and why anti-malarial drug resistance emerges in South-East Asia could provide critical information for developing strategies to prevent the spread of the current wave of artemisinin resistance.
Here we ask whether there are evolutionary constraints on P. falciparum strains from South-East Asia that differ from those on African strains and thus might explain the higher predisposition of South-East Asian strains to evolve drug resistance. To address this question we utilized a recent large global genome sequencing dataset from ~3,400 clinical samples, which identified nearly a million high-quality single nucleotide polymorphisms (SNPs) in the exonic regions of P. falciparum 18.

Amendments from Version 1
There are two major amendments from version 1: 1. The Discussion section has been extensively modified and expanded to provide more support and context for our model for the higher propensity of SEA populations of P. falciparum to acquire drug resistance.

Results
Higher ratio of non-synonymous to synonymous polymorphism in P. falciparum from South-East Asia
Resistance to anti-malarial drugs often involves changes in the amino-acid sequence within specific proteins. Thus, we tested whether the ratio of non-synonymous (amino acid changing) to synonymous polymorphism is higher in South-East Asia (SEA). Figure 1 shows a significantly higher ratio of non-synonymous to synonymous polymorphism (N/S) in SEA samples compared to African samples, with almost no overlap in their distributions. The mean and median N/S for samples from SEA were 2.33, compared to 2.06 for Africa (Wilcoxon test p-value 0, number of SEA samples 1,600, and number of African samples 1,647). The higher N/S in SEA compared to Africa was also evident at the gene level, with a larger number of genes showing higher N/S in SEA than in Africa (Figure 2). Mean and median N/S for genes in SEA samples were 2.1 and 1.9 respectively, while for African samples the mean and median N/S were 1.9 and 1.8 respectively (paired t-test p-value 1E-43, paired Wilcoxon test p-value 4E-27, n = 4792). There were 75 genes with more than 3-fold higher N/S in SEA samples relative to African samples and N/S of more than four in SEA. Interestingly, most of these genes were not related to antigenic variation (Supplementary Table 1), but perform basic housekeeping functions, suggesting that higher N/S of these genes in SEA might not be primarily driven by differential host immune selection. In addition to kelch13, the only gene known to be causally associated with artemisinin resistance, the list includes CRT (chloroquine-resistance transporter), which shows an 8-fold higher N/S in SEA samples compared to African samples and has previously been shown to be associated with artemisinin resistance in a genome-wide association study (GWAS) 14. In summary, P. falciparum strains from SEA show a higher ratio of non-synonymous to synonymous polymorphisms than African strains.
Higher non-synonymous changes at the conserved positions in South-East Asia
Highly conserved proteins in P. falciparum show a much lower N/S, indicating a lower tolerance for non-synonymous polymorphism 18. We tested whether the correlation between N/S and protein conservation might be different in SEA and Africa. The correlation between N/S and conservation was much weaker in SEA (Figure 3), with a Pearson correlation of -0.43 (95% CI: -0.41 to -0.46) compared to -0.69 (95% CI: -0.68 to -0.71) in Africa. The weaker correlation in SEA suggests a higher ratio of non-synonymous to synonymous changes at conserved positions.
Indeed, non-synonymous polymorphisms specifically observed in SEA are more likely to occur at conserved positions compared to those specific to Africa (Figure 4). Samples from SEA show higher N/S compared to Africa when considering only conserved positions (Figure 5). These results suggest a lower efficiency of negative selection in SEA in removing potentially deleterious mutations. This may be important for the acquisition of anti-malarial drug resistance, since drug-resistance mutations preferentially occur at conserved sites 19. For example, artemisinin resistance mutations in Kelch13 occur in the conserved region of the protein 18, and resistance mutations also occur in the conserved regions of DHFR (dihydrofolate reductase), DHPS (dihydropteroate synthase), and CRT (chloroquine-resistance transporter) 19. In summary, P. falciparum strains from SEA show a higher ratio of non-synonymous to synonymous polymorphisms at conserved sites in the protein sequences than African strains.

Lower mixed infection rate in South-East Asia
Blood samples may contain more than one haploid parasite clone due to mixed infections by multiple genotypes. The rate of mixed strain infection is generally lower in areas of low transmission such as SEA 20. The lower efficiency of negative selection in removing potentially deleterious mutations at conserved positions in SEA could result from lower competition between parasite clones in the hosts. Indeed, the estimated rate of mixed strain infections, detected by a high proportion of heterozygous calls in the sequencing data, was much lower in South-East Asia compared to Africa (Figure 6). We also confirmed that N/S in SEA samples was higher than in samples from Africa even when separately analysing predicted single strain and mixed strain samples (Supplementary Figure 1).

Discussion
Here we find a higher N/S ratio in strains from SEA compared to Africa. We also find that non-synonymous mutations are more likely to occur at conserved sites in SEA strains compared to African strains. In addition, we confirm a lower rate of mixed strain infection in SEA compared to Africa in the MalariaGEN dataset, the largest whole-genome dataset on P. falciparum to date.
Based on these three observations, we propose a model for the higher propensity of SEA populations to acquire drug resistance (Supplementary Figure 2). Lower mixed strain infections in SEA may allow even less-fit parasites to be transmitted to the next set of hosts due to a reduced level of intra-host competition. In contrast, the higher mixed strain infection rate in Africa may drive more intense intra-host competition, and may therefore reduce the probability of transmission of less-fit parasites. Thus, fitness-reducing mutations, including drug-resistance mutations, might have a higher chance of spreading in SEA compared to Africa in patients not taking drugs. Since Africa has a higher rate of asymptomatic infections as well as untreated patients, this would also result in higher competition between drug-resistant and drug-sensitive clones in the absence of drug, further decreasing the spread of drug-resistance mutations with a fitness cost.
This model is consistent with a number of previous studies. Our observation of a higher likelihood of fixation of potentially deleterious mutations in P. falciparum strains from SEA compared to African strains is consistent with the previous observation of a higher rate of potentially deleterious copy number variations in P. falciparum from SEA compared to Africa 21. These observations suggest relaxed negative selection on P. falciparum from SEA compared to Africa and that SEA strains would have lower fitness. Mixed strain infection by P. falciparum has recently been demonstrated to lead to within-host competition in patients 22, the possible mechanisms of which might include strain-transcending immunity, resource competition (e.g. RBCs) or direct interference between strains 23-26. While within-host competition seems to be the major explanation for lower N/S in African strains, mixed strain infection would also lead to a higher rate of recombination between gametes of different genotypes and more efficient removal of deleterious mutations in Africa. In any case, a higher rate of mixed strain infection is expected to increase the strength of purifying selection.
What are the implications of our model for the current wave of artemisinin resistance? The much larger population size of P. falciparum in Africa 21, as also evidenced by the high rate of mixed strain infection (Figure 6), should make it easier for resistance mutations to appear. Indeed, artemisinin resistance mutations in the kelch13 gene were observed in samples from Africa, including the most common artemisinin resistance mutation C580Y 18. The C580Y mutation is capable of generating artemisinin resistance in vitro in the NF54 parasite strain, considered to be of African origin 27. This raises an important question as to why artemisinin resistance is not spreading in Africa. Since artemisinin resistance is likely to incur a fitness cost in the drug-free environment 28-30, we propose that strains with these mutations are continuously arising in Africa but are competitively removed by the fitter drug-sensitive strains 30 in hosts not taking artemisinin. This effect might be more pronounced because of the greater proportion of asymptomatic and untreated patients in Africa. However, once a strain acquires compensatory mutations that reduce the fitness cost of the original mutation, it may be able to spread in the more competitive environment in Africa. While compensatory mutations can occur anywhere in the genome and may even spread in South-East Asia, these could be unlinked from the resistance mutation by recombination in areas with a high transmission rate such as Africa 31. Thus, compensatory mutations in the same gene might be more likely to spread in high-transmission areas. Indeed, drug-resistance genes often acquire multiple mutations before spreading to Africa, e.g. the pyrimethamine resistance gene dhfr acquired at least three different mutations in South-East Asia before it spread to Africa 9. All chloroquine-resistant strains have the K76T mutation in CRT (chloroquine-resistance transporter), accompanied by a number of other mutations in the same gene 32.
While at present kelch13 does not appear to have multiple mutations 33, it would be critical to monitor the acquisition of additional mutations in kelch13 which might compensate for the fitness cost of kelch13 resistance mutations in the drug-free environment. Resistance to chloroquine and sulphadoxine-pyrimethamine spread from SEA to India to Africa 3. Interestingly, we observed a higher mixed strain infection rate in Bangladesh than in neighboring SEA. The Indian subcontinent has areas with widely variable transmission rates 34. This might allow drug-resistant P. falciparum that evolved in low-transmission areas in SEA to gradually adapt to higher-transmission areas in the Indian subcontinent, and then to spread to the high-transmission areas in Africa. Therefore, it would be critical to track the spread of artemisinin resistance in the Indian subcontinent.
It is important to note that higher N/S in SEA populations does not necessarily imply a higher mutation rate. Brown et al. previously found similar substitution rates in samples from Africa and SEA 35. Mutation rate as measured by long-term in vitro culture was not higher in strains of SEA origin, either in the presence or absence of drug 36,37. Thus the mutation rate in the SEA population appears to be similar to that of the African population, but a higher fraction of mutations is observed at conserved non-synonymous positions in SEA. The MalariaGEN study from which we obtained the dataset reported a much higher density (per sample) of both synonymous and non-synonymous polymorphisms in Africa compared to SEA 18. It is also important to note that a higher density of SNPs per sample does not imply a higher substitution rate in Africa; rather, it reflects the higher rate of mixed strain infection in Africa, i.e. more SNPs are identified in samples from Africa because of the higher number of different parasite clones per sample (Figure 6). The authors of the MalariaGEN study also wrote that at the gene level "we found virtually identical distributions of the ratio of non-synonymous to synonymous mutations (N/S ratio) in the two regions" 18; however, no statistical test was performed by the authors. Furthermore, no comparison of N/S at the sample level was performed in the MalariaGEN study. Resistance to chloroquine and sulfadoxine-pyrimethamine appeared independently in SEA and South America 38. While there were few samples from South America in the MalariaGEN dataset (Figure 6), we find that these samples also display a lower mixed infection rate (Figure 6), and an N/S ratio in between those of the African and SEA samples (Figure 1). Further analyses of a larger number of samples from South America could shed light on whether the mechanism we propose for a higher rate of resistance emergence in SEA might be applicable to South America.
In summary, we propose that the lower transmission rates in SEA lead to a lower rate of mixed strain infection, which leads to reduced strength of natural selection. This, in turn, allows a higher rate of fixation of potentially deleterious mutations, including drug-resistance mutations. However, other factors such as drug usage, the level of immunity, and social factors 3,5,39 could also contribute towards the faster development of resistance in SEA. Given the basic difference in the transmission rate between SEA and Africa, which is not easy to control, we should expect that SEA will remain a source of drug-resistant malaria in the future.

Methods
The SNP data of P. falciparum were obtained from the MalariaGEN community webpage (https://www.malariagen.net/data/p-falciparum-community-project-jan-2016-data-release) 18. The SNP data consist of 939,687 filtered, high-quality exonic SNPs, with 631,715 non-synonymous and 307,972 synonymous SNPs. The data comprised 3,394 samples from 22 countries, with roughly equal numbers of samples from South-East Asia (1,600 samples) and Africa (1,647 samples). The N/S ratio for each sample was obtained by dividing the number of non-synonymous SNPs by the number of synonymous SNPs in that sample. Proteome sequences of P. falciparum, P. berghei, P. chabaudi, P. cynomolgi, P. knowlesi, P. reichenowi, P. vivax and P. yoelii were downloaded from the PlasmoDB database, and proteome sequences of S. cerevisiae, D. melanogaster, C. elegans and H. sapiens were downloaded from the European Bioinformatics Institute (EBI) database. Orthologous sequences were identified using the best bidirectional hit algorithm 40 and aligned using ClustalO 41. The conservation score for P. falciparum proteins was calculated as the percentage of positions identical across all orthologous proteins from Plasmodium species. The N/S ratio for each gene in South-East Asia and Africa was calculated by dividing the number of unique non-synonymous SNPs by the number of unique synonymous SNPs across samples from each of the two geographical areas. There were 136 genes with zero synonymous SNPs in SEA, which were thus excluded from the analyses. The Pearson correlation between N/S for each gene and the conservation score was calculated in R. All figures were created in R version 3.2.3. Mixed infection samples were defined as samples with >10% of SNP calls being heterozygous. This cut-off was determined from the distribution of heterozygous SNPs across the samples (Supplementary Figure 3).
It is important to note that this method is not likely to accurately classify each sample into a polyclonal (mixed infection) or a monoclonal sample, but the overall trend of higher rate of mixed infection in African samples compared to SEA samples is likely to be robust.
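The calculations described above can be illustrated with a minimal Python sketch. It is not the code used in this study (the analyses were run in R); all function, class and field names are hypothetical, and two details are assumptions made only for illustration: the heterozygous fraction is taken over all SNP calls in a sample, and the conservation score is computed over the non-gap positions of the first (P. falciparum) sequence in an alignment.

```python
"""Illustrative sketch of the Methods calculations; all names are hypothetical."""
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SampleCounts:
    """Per-sample SNP tallies (hypothetical container, not a MalariaGEN format)."""
    non_synonymous: int
    synonymous: int
    heterozygous_calls: int
    total_calls: int


def ns_ratio(non_synonymous: int, synonymous: int) -> Optional[float]:
    """N/S = non-synonymous SNP count / synonymous SNP count.

    Returns None when there are no synonymous SNPs, since the ratio is undefined.
    """
    if synonymous == 0:
        return None
    return non_synonymous / synonymous


def is_mixed_infection(sample: SampleCounts, threshold: float = 0.10) -> bool:
    """Flag a sample as mixed when more than 10% of its SNP calls are heterozygous."""
    if sample.total_calls == 0:
        raise ValueError("sample has no SNP calls")
    return sample.heterozygous_calls / sample.total_calls > threshold


def gene_ns_ratios(gene_counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Per-gene N/S from (unique non-synonymous, unique synonymous) SNP counts.

    Genes with zero synonymous SNPs are excluded, as described in Methods.
    """
    return {gene: n / s for gene, (n, s) in gene_counts.items() if s > 0}


def conservation_score(aligned: List[str]) -> float:
    """Percentage of reference positions identical across all aligned sequences.

    aligned[0] is the reference protein; '-' marks an alignment gap.
    Assumption: columns where the reference has a gap are not counted.
    """
    reference = aligned[0]
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    positions = [i for i, aa in enumerate(reference) if aa != "-"]
    if not positions:
        raise ValueError("reference sequence has no residues")
    identical = sum(
        1 for i in positions if all(seq[i] == reference[i] for seq in aligned)
    )
    return 100.0 * identical / len(positions)
```

Returning None for an undefined ratio, rather than infinity, keeps such genes from silently entering means or correlations; the per-gene helper drops them outright, mirroring the exclusion of the 136 genes noted above.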
Author contributions
G.P.S. and A.S. conceived and designed the study. G.P.S. performed the research. G.P.S. and A.S. wrote the manuscript. All authors reviewed the manuscript.

Competing interests
No competing interests were disclosed.

Grant information
The work is supported by a J. C. Bose Fellowship to A.S. from the Department of Science and Technology, Govt. of India, and by an Early Career Fellowship to G.P.S. from the Wellcome Trust/DBT India Alliance (IA/E/15/1/502297).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This revised article "South-East Asian strains of Plasmodium falciparum display higher ratio of non-synonymous to synonymous polymorphisms compared to African strains" addresses several of the criticisms listed by the reviewer in the recently submitted peer review. In particular, the authors have revised their Discussion section to better highlight the contrast between their conclusions and the conclusions made by a similar MalariaGEN analysis. The authors have also addressed errors highlighted by the reviewer in both the figure legends and the discussion concerning the effects of selection. Together, these changes create a stronger article worth being indexed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.

Title and Abstract
The title is appropriate.
The abstract makes some jumps in logic: e.g. "Furthermore, South-East Asian strains showed a higher proportion of non-synonymous polymorphism at conserved positions, suggesting reduced negative selection." This pattern can equally well be explained by stronger positive selection in SEA than in Africa.

F1000Research
Major points
Results differ from the prior analysis of this dataset. The article comes to different conclusions from those reported by the MalariaGEN consortium in eLife (ref 18), who write: "Accordingly, we found virtually identical distributions of the ratio of non-synonymous to synonymous mutations (N/S ratio) in the two regions (Figure 3c)" (page 6, bottom: MalariaGEN Plasmodium falciparum Community Project. eLife 2016;5:e08714. DOI: 10.7554/eLife.08714). It is unclear to me why this F1000 submission and the published eLife paper reach very different conclusions using the same dataset. I suggest that the authors should cite the conclusions of the eLife paper and explain why their analysis comes to a very different conclusion. This point urgently needs satisfactory resolution as this is the central conclusion of the paper.
Methodology/analysis biases. Reduced ability to score non-synonymous variants in African populations could generate the N/S ratios observed. Such a bias could be generated because infections are more complex in Africa. Non-synonymous mutations tend to be at lower frequency than synonymous mutations. In complex mixed African infections, low-frequency non-synonymous mutations may be filtered out of the dataset (because very few reads show the mutation) but scored in SE Asian samples, where infections are simpler (so multiple reads show non-synonymous mutations). To eliminate this potential methodological bias, the authors should filter their data to minimize mixed infections prior to analysis. This could be done by setting a threshold for numbers of mixed base calls. Again, it is important to understand why the conclusions reached by this analysis differ from the published eLife paper.
As pointed out in the eLife paper there is a dramatic difference in the allele frequency spectrum in the two locations. The excess of rare variants in Africa could also potentially alter the N/S ratios.
Population genetics explanation. Assuming the central result is correct, a simpler model explains these results: The effective population size (Ne) of parasite populations is lower in SEA than in Africa. Therefore purifying selection is weaker in SEA than in Africa, resulting in less efficient removal of deleterious mutations. Discussion of this simple alternative explanation would improve the paper.
Statistics used: The paper counts numbers of non-synonymous and synonymous mutations and estimates the ratio (N/S). This is an unusual way to present such data; it would be more informative to show dN/dS (non-synonymous changes per non-synonymous site / synonymous variants per synonymous site). This statistic is more useful because the expected ratio under neutrality is 1. This should not change the results but will be more easily interpretable with reference to a neutral model.
Related questions: How is the N/S ratio determined for genes in which there are NO synonymous mutations? Details of the methods for calculating this ratio, and how they differ from those used in the eLife paper, should be provided.
More support needed for intrahost competition explanation: The authors argue that the observed pattern may result from more within-infection competition in Africa relative to Asia. If competition is invoked to explain the results, then it would be useful to examine the categories of genes contributing to the excess of NS variations. This argument would imply that this effect should be seen in subsets of genes involved in within-host competition. On the other hand, if there is no particular enrichment of particular gene classes, then the simple population genetics explanation seems more likely.

Conclusions
Lack of balance. The authors argue that drug resistance may arise in SEA rather than Africa because intrahost competition prevents emergence of resistance mutations associated with fitness costs. I agree in part; this may certainly contribute. However, greater selection for resistance to drugs in SE Asia is likely to be a critical factor. Most infections in SEA are symptomatic and infected individuals seek treatment. However, most infections in Africa are asymptomatic, so people do not seek treatment. Discussion of the difference in selection strength between continents would add balance to this paper.
Other issues. I was very surprised to see this analysis of MalariaGEN data without involvement from MalariaGEN authors. My understanding is that access to these data requires a "Fort Lauderdale" type agreement with MalariaGEN (https://www.malariagen.net/data/terms-use/pf3k-terms-use). The authors may have already discussed this with relevant people at MalariaGEN; I would suggest that the authors directly contact Dominic Kwiatkowski, if they have not already done so.

I have read this submission. I believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Competing Interests: No competing interests were disclosed.

Author Response 17 Oct 2016
Gajinder Singh, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
We are grateful for your valuable comments, which we believe have significantly improved our manuscript.
The abstract makes some jumps in logic: e.g. "Furthermore, South-East Asian strains showed a higher proportion of non-synonymous polymorphism at conserved positions, suggesting reduced negative selection." This pattern can equally well be explained by stronger positive selection in SEA than in Africa.

Author response: Positive selection can indeed increase (though not in all cases) N/S for the gene under selection, as has been observed for the kelch13 gene in SEA. However, the pattern of higher N/S in strains in SEA, which can be observed at the genome-wide level, is much more consistent with relaxed negative selection in SEA than with higher positive selection in SEA, which typically acts at a few sites. Furthermore, reduced constraints at conserved sites at the genome-wide level support relaxed negative selection rather than positive selection. We thus believe that our use of the word "suggesting" is appropriate in the statement.

Major points
Results differ from the prior analysis of this dataset. The article comes to different conclusions from those reported by the MalariaGEN consortium in eLife (ref 18), who write: "Accordingly, we found virtually identical distributions of the ratio of non-synonymous to synonymous mutations (N/S ratio) in the two regions (Figure 3c)" (page 6, bottom: MalariaGEN Plasmodium falciparum Community Project. eLife 2016;5:e08714. DOI: 10.7554/eLife.08714). It is unclear to me why this F1000 submission and the published eLife paper reach very different conclusions using the same dataset. I suggest that the authors should cite the conclusions of the eLife paper and explain why their analysis comes to a very different conclusion. This point urgently needs satisfactory resolution as this is the central conclusion of the paper.
Author response: The authors of the MalariaGEN study did not perform any statistical test to support their statement that N/S in SEA and Africa is similar at the gene level. While the difference is subtle, it is highly statistically significant, as we report in our manuscript. Furthermore, the authors of the MalariaGEN study did not analyse differences in N/S at the sample level, where we clearly see the differences between SEA and African strains (Figure 1). We have now added these statements to the Discussion section of the revised manuscript.

Methodology/analysis biases. Reduced ability to score non-synonymous variants in African populations could generate the N/S ratios observed. Such a bias could be generated because infections are more complex in Africa. Non-synonymous mutations tend to be at lower frequency than synonymous mutations. In complex mixed African infections, low-frequency non-synonymous mutations may be filtered out of the dataset (because very few reads show the mutation) but scored in SE Asian samples, where infections are simpler (so multiple reads show non-synonymous mutations). To eliminate this potential methodological bias, the authors should filter their data to minimize mixed infections prior to analysis. This could be done by setting a threshold for numbers of mixed base calls. Again, it is important to understand why the conclusions reached by this analysis differ from the published eLife paper.
Author response: Separately analysing predicted monoclonal and polyclonal samples did not change the conclusions (Supplementary Figure 1 in the revised manuscript). This is not surprising given that we observed almost no overlap in the distributions of N/S in SEA and African samples (Figure 1), as we wrote in the manuscript. We would like to emphasize that the authors of the MalariaGEN study did not perform N/S analyses at the sample level.
As pointed out in the eLife paper there is a dramatic difference in the allele frequency spectrum in the two locations. The excess of rare variants in Africa could also potentially alter the N/S ratios.
Author response: As proposed by the reviewer in the previous comment, the excess of rare variants in Africa might increase N/S in Africa; thus the higher N/S in SEA could not be explained by excess rare variants in Africa.

Population genetics explanation. Assuming the central result is correct, a simpler model explains these results: The effective population size (Ne) of parasite populations is lower in SEA than in Africa. Therefore purifying selection is weaker in SEA than in Africa, resulting in less efficient removal of deleterious mutations. Discussion of this simple alternative explanation would improve the paper.
Author response: Indeed, relaxed negative selection in SEA due to lower effective population size (reflected in lower polyclonal infections) is exactly what we propose in our manuscript for our observation of higher N/S in SEA. Lower Ne (effective population size) is bound to be associated with a lower polyclonal infection rate, which has been shown to lead to within-host competition in P. falciparum. Thus we do not believe that the two explanations are alternative independent explanations, one of which needs to be discounted. Rather, the observations made by Cheeseman et al. and in our manuscript are convergent, which further supports reduced negative selection in P. falciparum in SEA, and we have added these statements to the Discussion section of the revised manuscript.
Statistics used: The paper counts numbers of non-synonymous and synonymous mutations and estimates the ratio (N/S). This is an unusual way to present such data; it would be more informative to show dN/dS (non-synonymous changes per non-synonymous site / synonymous variants per synonymous site). This statistic is more useful because the expected ratio under neutrality is 1. This should not change the results but will be more easily interpretable with reference to a neutral model.
Author response: As pointed out by the reviewer, dN/dS comparison would not change the results. P. falciparum is unusual in having very high N/S compared to other organisms, thus the usual expectation of dN/dS of 1 for neutral evolution is not applicable for P. falciparum.
Related questions: How is N/S ratio determined for genes in which there are NO synonymous mutations? Details of the methods for calculating this ratio, and how they differ from those used in the eLife paper should be provided.
Author response: There were 136 genes with zero synonymous SNPs in SEA (none in Africa), which were thus excluded from the analyses. We have added this statement to the Methods section of the revised manuscript.

More support needed for intrahost competition explanation: The authors argue that the observed pattern may result from more within-infection competition in Africa relative to Asia. If competition is invoked to explain the results, then it would be useful to examine the categories of genes contributing to the excess of NS variations. This argument would imply that this effect should be seen in subsets of genes involved in within-host competition. On the other hand, if there is no particular enrichment of particular gene classes, then the simple population genetics explanation seems more likely.
Author response: There is no dataset of genes known to be involved in within-host competition in P. falciparum. We do note, however, that many housekeeping genes show higher N/S in SEA than in Africa (Supplementary Table 1). As discussed in response to previous comments by the reviewer, the lower Ne, lower rate of polyclonal infections and within-host competition are not mutually exclusive explanations, one of which needs to be discounted.

Conclusions
Lack of balance. The authors argue that drug resistance may arise in SEA rather than Africa because intrahost competition prevents emergence of resistance mutations associated with fitness costs. I agree in part; this may certainly contribute. However, greater selection for resistance to drugs in SE Asia is likely to be a critical factor. Most infections in SEA are symptomatic and infected individuals seek treatment. However, most infections in Africa are asymptomatic, so people do not seek treatment. Discussion of the difference in selection strength between continents would add balance to this paper.
Author response: The higher rate of asymptomatic infections as well as untreated patients in Africa would mean higher competition between drug resistant and drug sensitive clones in the absence of drug, further decreasing the spread of drug resistance mutations with a fitness cost. Thus higher immunity, lower treatment rates, and higher polyclonal infections are likely to work synergistically. We have added these statements to the Discussion section in the revised manuscript.

I was very surprised to see this analysis of MalariaGEN data without involvement

Author response: We have taken permission from the authors of the MalariaGEN study, and their contribution has been appropriately acknowledged in our publication.
We would again like to thank you for your critical comments on our manuscript, and would be happy to address any further concerns that you may have in the revised version of the manuscript.

In this article the authors conduct a detailed analysis that focused on the frequency of non-synonymous mutations across an extensive genome set derived by the MalariaGEN consortium from an extensive set of P. falciparum isolates collected mainly in Africa and SE Asia. The analyses extend the work presented recently (DOI: 10.7554/eLife.08714) where a trend was noted for a higher level of non-synonymous mutations in the SE Asian parasites as compared to that in the African parasites. Specifically, the authors have presented the analysis for the parasites from individual countries and focused on SNPs that occur at residues that are highly conserved in orthologues within a variety of Plasmodium species and across eukaryotes in general. The authors have thus provided convincing evidence for statistically significant albeit relatively subtle differences between the parasites from SE Asia and Africa (it is unclear to me why parasites from Peru and Colombia were included in the analysis, all the more as their numbers were relatively low).
Whereas the quality of the data, the analyses and the results are not in question, I have some misgivings and indeed frustrations as to the conclusions put forward in an attempt to account for the differences observed. The scenario inspired by the observations that the authors propose as an explanation for the propensity of antimalarial drug resistance to emerge and spread in P. falciparum from SE Asia first before doing so in Africa is plausible and could be supported by the data presented. However, this scenario is based on a set of assumptions whose limitations are not discussed, without any justification as to why this particular scenario is the most likely.
What is the nature of the intra-host competition between the genotypes of P. falciparum that circulate in humans, and is there any evidence that this actually alters transmissibility? Immunity is the most obvious selective intra-host factor. To what extent do differences in the levels of immunity among the patients that contributed the P. falciparum isolates analysed here alter the multiplicity of the infection or indeed the likelihood of genetic crosses?
Is there any evidence that parasites with the higher levels of non-synonymous mutations across the genome are biologically less fit? There is evidence that this is the case for drug resistant parasites in vitro; that for parasites growing in vivo is less straightforward and varies for different drugs.
To what extent do the isolates collected from a limited number of febrile patients reflect the overall parasite population across a whole country? Could a bias be introduced simply because of differences in the degree of acquired immunity on admission, the time between the onset of the infection and the time treatment was sought, or the level of admission parasitaemia?
Exposure to drug is clearly a strong selective constraint for those genes implicated in overcoming the effects of the drug. What can be the nature of the selective constraints that maintain the higher levels of non-synonymous mutations across numerous genes spread throughout the genome? Clearly not all can be considered to provide some speculative compensatory effect to a potential reduction in fitness following drug selection.
It should be pointed out that in the 1950's resistance to chloroquine first appeared not only in SE Asia but also in Colombia. Similarly resistance to pyrimethamine did not first appear in SE Asia, but in all areas where the use of this drug became widespread and systematic (this was the case in both SE Asia and African countries in the 1950's). The "unanswered" question as to the reason why drug resistance has repeatedly emerged in SE Asia, might have more to do with drug usage than with some special property of the parasites.
Ultimately, I would suggest that the authors should present the limitations of their conclusions as well as alternative scenarios to account for their observations (and rank or discount them if they can). This will surely be welcomed by readers. Advances in knowledge are seeded by diversity in speculation (conservative or less so) and genome-wide analyses are a rich source for this.

We are grateful for your valuable comments, which we believe have significantly improved our manuscript.

In this article the authors conduct a detailed analysis that focused on the frequency of non-synonymous mutations across an extensive genome set derived by the MalariaGEN consortium from an extensive set of P. falciparum isolates collected mainly in Africa and SE Asia. The analyses extend the work presented recently (DOI: 10.7554/eLife.08714) where a trend was noted for a higher level of non-synonymous mutations in the SE Asian parasites as compared to that in the African parasites. Specifically, the authors have presented the analysis for the parasites from individual countries and focused on SNPs that occur at residues that are highly conserved in orthologues within a variety of Plasmodium species and across eukaryotes in general. The authors have thus provided convincing evidence for statistically significant albeit relatively subtle differences between the parasites from SE Asia and Africa (it is unclear to me why parasites from Peru and Colombia were included in the analysis, all the more as their numbers were relatively low).
Author response: We would like to clarify that the MalariaGEN authors did not write or suggest that the N/S ratios of P. falciparum strains from SEA and Africa are different. We have discussed the difference between the work reported in the MalariaGEN study and our manuscript in the Discussion section of the revised manuscript. We added data from Peru and Colombia in Figure 1, Figure 5 and Figure 6 only for completeness, but we have refrained from making any firm conclusions based on these samples. We have nevertheless commented upon the observations from South American samples in the Discussion section in response to your comment.
Whereas the quality of the data, the analyses and the results are not in question, I have some misgivings and indeed frustrations as to the conclusions put forward in an attempt to account for the differences observed. The scenario inspired by the observations that the authors propose as an explanation for the propensity of antimalarial drug resistance to emerge and spread in P. falciparum from SE Asia first before doing so in Africa is plausible and could be supported by the data presented. However, this scenario is based on a set of assumptions whose limitations are not discussed, without any justification as to why this particular scenario is the most likely.
What is the nature of the intra-host competition between the genotypes of P. falciparum that circulate in humans, and is there any evidence that this actually alters transmissibility? Immunity is the most obvious selective intra-host factor. To what extent do differences in the levels of immunity among the patients that contributed the P. falciparum isolates analysed here alter the multiplicity of the infection or indeed the likelihood of genetic crosses?
Author response: We thank the reviewer for pointing this out. There is indeed evidence of intra-host competition between genotypes of P. falciparum in humans that reduces parasite density. The association between parasite density and gametocyte density in humans has also been previously shown. The reasons for intra-host competition, however, remain unknown. Strain-transcending immunity, resource competition (e.g. RBCs), and direct interference between strains have been proposed as possible mechanisms responsible for within-host competition. It is also possible that lower recombination rates due to the lower rate of mixed infections in SEA may reduce the removal of deleterious mutations. We have added these sentences to the Discussion section of the revised manuscript.
Is there any evidence that parasites with the higher levels of non-synonymous mutations across the genome are biologically less fit? There is evidence that this is the case for drug resistant parasites in vitro; that for parasites growing in vivo is less straightforward and varies for different drugs.
Author response: There is currently no evidence that P. falciparum strains from SEA are biologically less fit than strains from Africa. Indeed, that is the prediction of our model. It would be fascinating to test this hypothesis. We have added these sentences to the Discussion section of the revised manuscript.
To what extent do the isolates collected from a limited number of febrile patients reflect the overall parasite population across a whole country? Could a bias be introduced simply because of differences in the degree of acquired immunity on admission, the time between the onset of the infection and the time treatment was sought, or the level of admission parasitaemia?
Author response: It is possible that the samples collected may not be representative of the whole country; however, the consistent results we obtain across countries in Africa and SEA suggest that our observation of higher N/S in SEA compared to Africa is robust to random variations introduced due to sampling.
It is possible that some biases in sampling may be introduced that could lead to differences in the observed rate of polyclonal infections between Africa and SEA. We separately analysed predicted monoclonal and polyclonal samples and find results similar to those obtained using all samples (Supplementary Figure 1 in the revised manuscript). We thus currently have no hypothesis of how biased sampling could actually lead to higher genome-wide N/S in SEA.
Exposure to drug is clearly a strong selective constraint for those genes implicated in overcoming the effects of the drug. What can be the nature of the selective constraints that maintain the higher levels of non-synonymous mutations across numerous genes spread throughout the genome? Clearly not all can be considered to provide some speculative compensatory effect to a potential reduction in fitness following drug selection.
Author response: As further explained in the Discussion section, the higher genome-scale N/S in SEA is consistent with relaxed negative selection. We hypothesise that relaxed negative selection could be due to the low level of polyclonal infection and thus lower within-host competition in SEA.
It should be pointed out that in the 1950's resistance to chloroquine first appeared not only in SE Asia but also in Colombia. Similarly resistance to pyrimethamine did not first appear in SE Asia, but in all areas where the use of this drug became widespread and systematic (this was the case in both SE Asia and African countries in the 1950's). The "unanswered" question as to the reason why drug resistance has repeatedly emerged in SE Asia, might have more to do with drug usage than with some special property of the parasites.
Author response: Indeed resistance to chloroquine and sulphadoxine-pyrimethamine also appeared independently in South America (but not in Africa). We have too few samples from South America (27 samples from 2 countries) to make robust conclusions. However, it is interesting that samples from South America do show a low rate of polyclonal infections and higher N/S compared to Africa (Figure 1 and Figure 6). We have added these sentences to the Discussion section of the revised manuscript.

[...] observed in the initial MalariaGen publication(s) that South East Asian (SEA) populations exhibited a general trend of higher non-synonymous to synonymous mutation ratio (N/S) compared to African populations. The authors replicate and expand this analysis and then compare this trend to a protein dataset comprised of eukaryotic sequences to show that even among the most conserved sites SEA continues to accrue more NS polymorphisms than African isolates. Lastly, the authors set out an argument for the mechanism behind this apparent relaxed negative selection within SEA Pf, attributing it to the lower rate of mixed genotype infections across SEA and the subsequent lack of competition within host. This would create the circumstances that would allow for multiple independent resistance mutations arising and fixing within the SEA population. Exploring general patterns of genome architecture within P. falciparum is vital to understanding the spread and prevention of resistance to anti-malarial drugs, and as a result this work is of interest to the malaria community. There are some clarifications required, and two significant areas of improvement for the paper:
Author response: We thank you for your review and comments. We would like to clarify that higher N/S in P. falciparum strains from SEA compared to African strains has not been stated or suggested in the MalariaGen manuscript (Elife. 2016). In fact the authors of the MalariaGEN study wrote that at the gene level "we found virtually identical distributions of the ratio of non-synonymous to synonymous mutations (N/S ratio) in the two regions"; however, no statistical test was performed by the authors. Furthermore, no comparison of N/S at the sample level was performed in the MalariaGEN study. We have added these statements to the Discussion section of the revised manuscript.

Major clarifications:
More detailed results/discussion sections, and deeper discussion of previous literature. While the authors cite the MalariaGen publication (citation 18) they should more explicitly describe the relationship between that publication and this analysis. Specifically, highlighting Figure 5 of citation 18 and its supplemental material, which shows a similar genome-wide analysis of N/S ratio between Africa/SEA. It would be good to directly acknowledge this article/figure as the first description of this African/SEA pattern. Furthermore, it would be constructive for the reader if the authors added a deeper justification for their extension of this work.
Author response: The aim of Figure 5 of the MalariaGen manuscript (Elife. 2016) was to test whether neutral evolution could account for the pattern of mutations in kelch13. Figure 5a plots genic N/S vs. protein conservation in Africa and SEA. The figure shows that the kelch13 gene is an outlier in the scatter-plot for SEA, but follows the general trend in Africa. Figure 5b plots the ratio of N/S ratios in SEA and Africa, for all genes (with ≥5 synonymous and ≥5 non-synonymous SNPs) and shows again that kelch13 is an outlier. Figure 5-Supplementary Figure 1 is the same as Figure 5a, but also highlights other drug resistance genes. [...] protein conservation in Africa and SEA and shows that kelch13 follows the normal trend among genes in Africa, but has far fewer synonymous SNPs than expected in SEA.
Thus the authors conclude that the high prevalence of non-synonymous SNPs in kelch13 in SEA is not explainable by neutral evolution, but is consistent with neutral evolution in Africa. We would like to [...]

[...] South-East Asia compared to Africa and propose that this could explain the emergence of resistance in Asia. However this observation (and also the higher multiplicity of infections in Africa) is not novel.
Furthermore the large number of K13 alleles, their dynamics and relative contributions to the clinical phenotype are likely to also be important. I think they could improve by showing that their analysis helps to understand the spread of a specific allele of K13 such as the C580Y.
It would be useful if the authors could provide more explanations as to why they believe their findings are innovative and how they help us to understand the evolution of artemisinin resistance, since the mechanism is clearly different to that of other antimalarials.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Competing Interests: No competing interests were disclosed.
Author Response 17 Oct 2016
Gajinder Singh, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
Thank you for your review and comments.
In this paper the authors analyze the data produced by the MalariaGen community that provides high quality data on over 900,000 SNP in more than 3000 samples from 22 countries. The authors report a significantly higher ratio of non-synonymous to synonymous polymorphisms in P. falciparum from South-East Asia compared to Africa and propose that this could explain the emergence of resistance in Asia. However this observation (and also the higher multiplicity of infections in Africa) is not novel.
Author response: We are not aware of any manuscript that reported a higher N/S ratio in SEA strains of P. falciparum compared to African strains. If the reviewer is referring to the MalariaGen study (Elife. 2016), the authors in that manuscript did not report or suggest these results. In fact the authors of the MalariaGEN study wrote that at the gene level "we found virtually identical distributions of the ratio of non-synonymous to synonymous mutations (N/S ratio) in the two regions"; however, no statistical test was performed by the authors. Furthermore, no comparison of N/S at the sample level was performed in the MalariaGEN study. We have added these statements to the Discussion section of the revised manuscript.
We agree with the reviewer that the general conclusion of a higher multiplicity of infection in Africa is not novel; indeed we wrote "The rate of mixed infection is generally lower in areas of low-transmission such as SEA" before we reported our results. However, we have confirmed the higher mixed infection rate in Africa compared to SEA in the MalariaGEN dataset, which is the largest whole-genome dataset on P. falciparum to date. We have added these statements to the Discussion section of the revised manuscript.
Furthermore the large number of K13 alleles, their dynamics and relative contributions to the clinical phenotype are likely to also be important. I think they could improve by showing that their analysis helps to understand the spread of a specific allele of K13 such as the C580Y.
Author response: Our manuscript is an attempt to understand why in general anti-malarial