Keywords
SARS-CoV-2 infection, D614G, epidemiology, SNP, Sanger sequencing, low-cost method, Quito-Ecuador
SARS-CoV-2 is the agent responsible for the COVID-19 pandemic, which has caused almost 4 million deaths worldwide (World Health Organization (WHO) 2021). The genome of this virus is approximately 30 kb long (Bar-On et al. 2020). Two open reading frames (ORFs 1a and 1b), which cover most of the genome, encode non-structural proteins needed for viral replication as well as an RNA-dependent RNA polymerase (RdRp). Structural proteins, such as the spike (S), nucleocapsid (N), membrane (M), and envelope (E) proteins, are encoded by short sub-genomic RNAs (sgRNAs) (Kim et al. 2020). The S protein recognizes and binds to the angiotensin-converting enzyme 2 (ACE2) receptor to gain entry into host cells.
Mutations in the gene encoding the S protein can alter its affinity for the ACE2 receptor and thereby enhance infectivity (Watanabe et al. 2020). The aspartate (D) to glycine (G) substitution caused by the non-synonymous A23403G single nucleotide polymorphism (SNP) has been associated with the rapid spread of the virus (Yuan et al. 2020).
The A23403G (D614G) mutation was first described in early 2020 and was rapidly detected in various regions around the globe (Isabel et al. 2020; Raghav et al. 2020; Dao et al. 2021; Elizondo et al. 2021; Molina-Mora et al. 2021; Pandey et al. 2021). Most studies of this mutation have relied on whole-genome sequencing (WGS). However, this approach is too expensive to be used in massive testing campaigns, especially in countries with limited resources. Moreover, the number of cases studied by WGS may not be representative of the number of cases detected by quantitative polymerase chain reaction (qPCR) at these locations.
As a result, several methodologies have been developed as affordable alternatives that provide rapid detection of not only relevant SNPs in the S protein, but also SNPs in variants of concern (Hashemi et al. 2020; Bezerra et al. 2021; Chakraborty et al. 2021; Vogels et al. 2021). Low-cost rapid technologies have been put forward to investigate the A23403G (D614G) mutation. These technologies rely on: Probe-based real-time reverse transcriptase PCR (rRT-PCR) (Chan et al. 2022), an amplification refractory mutation system (ARMS) (Islam et al. 2021), and restriction fragment length polymorphism (RFLP) (Niranji & Al-Jaf 2021). Arguably, it is urgent to establish and standardize as many molecular tools as possible in order to expand the screening capacity of SARS-CoV-2 mutations. Consequently, and as an effort to scale up molecular diagnosis, we have developed and tested a cost-effective method based on Sanger sequencing (Jørgensen et al. 2021; Park et al. 2021) to detect the A23403G (D614G) mutation.
This investigation was carried out in Quito, the capital city of Ecuador, using samples collected in March and October of 2020. The first SARS-CoV-2 case in the country was registered in March 2020, and since then the virus has been reported in all 24 provinces. As of 7 July 2021, the number of positive cases in Ecuador was 467,878 (Johns Hopkins University & Medicine 2021). However, to date, no studies have reported the prevalence of this mutation in the city or in the country, despite its predominance in various regions (Loureiro et al. 2021; Pandey et al. 2021; Viedma et al. 2021).
In the present study, we evaluated 1319 samples collected during March and 5032 collected during October of 2020 (all nasopharyngeal swabs) by Zurita & Zurita Laboratorios and Laboratorio Clínico Inmunolab from both symptomatic and asymptomatic patients (see Underlying data) (Sevillano 2022). The two laboratories carried out detection independently. These clinical laboratories do not perform quantification in routine samples; results were reported as positive or negative. Samples from the first time point (March 2020) were extracted using the High Pure Viral Nucleic Acid kit (Product No. 11858874001, Roche, Germany), and analyzed using the LightMix® Modular SARS and Wuhan CoV E-gene kits (Product No. 09155368001, Roche, Germany). Both genes were analyzed using a LightCycler® 480 Real-Time PCR system. Samples collected in October 2020 were extracted using CommaPrep® RNA Extraction Columns (Product No. RP20, Biocomma, China), following the provided guidelines, and analyzed using the STANDARD M nCoV Real-Time Detection kit (Product No. M-NCOV-01, SD Biosensor, Korea) on a Stratagene MX3005P Real-Time PCR System or the Allplex™ SARS-CoV-2 Assay (Product No. RV10248X, Seegene Inc, Korea) on a CFX96 Touch Real-Time PCR Detection System. Interpretation of results (positive/negative) was carried out based on the guidelines provided by the manufacturers. Both kits are currently approved by the ARCSA (National Agency for Health Regulation, Control and Surveillance of Ecuador) for use in routine diagnostics.
To estimate the prevalence of the A23403G (D614G) mutation, the sample size was first calculated based on the prevalence of SARS-CoV-2 positive cases, 0.2 for March and October (see Underlying data) (Sevillano 2022). The following equation, with a 10% margin of error and a confidence level of 95%, was used to calculate this value:
$$n = \frac{z^{2}\,P\,(1 - P)}{d^{2}} \qquad (1)$$
In equation (1), n is the sample size, z is the statistic corresponding to the level of confidence, P is the expected prevalence, and d is the precision (Pourhoseingholi et al. 2013). The estimated number was 61, although we analyzed a total of 96 samples per month. Selection of samples was carried out using a random number generator.
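As a rough check of the sample size calculation, the sketch below evaluates the standard prevalence formula n = z²P(1 − P)/d² with the values stated in the text (95% confidence, so z = 1.96; P = 0.2; d = 0.10) and then draws a random subset of samples. The helper names, the rounding rule, the sample identifiers, and the seed are assumptions made for illustration only; the study does not describe which random number generator was used.

```python
import math
import random


def sample_size(z, p, d):
    """Return the unrounded sample size n = z^2 * P * (1 - P) / d^2."""
    return (z ** 2) * p * (1 - p) / (d ** 2)


# Values stated in the text: 95% confidence (z = 1.96), P = 0.2, d = 0.10.
n_raw = sample_size(z=1.96, p=0.2, d=0.10)

# The text reports 61; rounding down is an assumption about how that was obtained.
n_reported = math.floor(n_raw)


def pick_samples(sample_ids, k, seed=None):
    """Draw k distinct sample IDs uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(list(sample_ids), k)


# Hypothetical IDs standing in for the 264 positive samples from March.
march_ids = [f"M{i:03d}" for i in range(1, 265)]
selected = pick_samples(march_ids, k=96, seed=2020)

print(f"n (unrounded) = {n_raw:.2f}, n (rounded down) = {n_reported}")
print(f"selected {len(selected)} of {len(march_ids)} samples")
```

Fixing the seed makes the selection reproducible, which can be useful when the list of sequenced samples has to be audited later.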
Initially, the sequence encoding the SARS-CoV-2 S protein (GenBank accession number: MT252819.1) was used to design primers with the Geneious Prime software Version 2019.2.3 (https://www.geneious.com/prime/). Primers flanked the region between nucleotide positions 23,301 and 23,511 of the WH-Human 1 coronavirus (MN908947) reference genome. Primers were aligned to genomes reported in Latin American countries in order to identify potentially disruptive mutations within the annealing sites of the primers. Forward sequence (D614G_F2): 5’-GATGCTGTCCGTGATCCACA-3’; reverse sequence (D614G_R2): 5’-AAACAGCCTGCACGTGTTTG-3’. The resulting amplicon was expected to be 230 bp long. In the A23403 (D614) variant, adenine is the nucleotide at position 23403, and the corresponding codon encodes the amino acid aspartate at position 614 of the polypeptide. When guanine is present instead of adenine, glycine replaces this aspartate; this variant is known as G23403 (G614).
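The relationship between genome position 23403 and residue 614 can be sketched in a few lines. The example below assumes that the S coding sequence starts at position 21563 of MN908947 and that the reference codon for residue 614 is GAT; both values are assumptions taken from the public reference annotation and should be verified against the record before reuse. The helper functions are hypothetical and are not part of the authors' workflow. The block also computes the reverse complement and GC fraction of the two primers given in the text.

```python
# Assumed 1-based start of the S coding sequence in MN908947 (verify before use).
S_CDS_START = 21563

# Partial codon table: only the codons needed for this illustration.
CODON_TO_AA = {
    "GAT": "D", "GAC": "D",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def locate_in_spike(genome_pos):
    """Map a 1-based genome position to (codon number, position within codon)."""
    offset = genome_pos - S_CDS_START
    return offset // 3 + 1, offset % 3 + 1


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


codon_number, codon_pos = locate_in_spike(23403)

reference_codon = "GAT"  # assumed reference codon for residue 614
mutant_codon = reference_codon[:codon_pos - 1] + "G" + reference_codon[codon_pos:]

forward_primer = "GATGCTGTCCGTGATCCACA"  # D614G_F2, from the text
reverse_primer = "AAACAGCCTGCACGTGTTTG"  # D614G_R2, from the text

print(codon_number, codon_pos)
print(reference_codon, CODON_TO_AA[reference_codon], "->",
      mutant_codon, CODON_TO_AA[mutant_codon])
print(reverse_complement(reverse_primer), gc_fraction(forward_primer))
```

Under these assumptions, position 23403 falls on the second base of codon 614, which is why an A to G change at that position converts an aspartate codon into a glycine codon.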
Next, PCR analyses were performed on the positive samples under the following conditions: 45 °C for 15 min and 95 °C for 2 min, followed by 45 cycles of 95 °C for 10 sec and 60 °C for 50 sec, and then the reading of dissociation curves at temperatures from 55 °C to 95 °C. The reaction used 10 μL GoTaq G2 Hot Start Colorless Master Mix (Product No. M7422, Promega, USA), 50 U of RocketScript RTase H Minus (Product No. BQ-042-101-04, Bioneer, Korea), 1 μL of EvaGreen 20x (Product No. 31000-31000-T, Biotium, USA), 0.75 μL of primers D614G_F2 and D614G_R2 10 mM, 10 ng RNAse Inhibitor (Product No. RB0478, BioBasic, Canada) and template up to a final volume of 15 μL. Samples were analyzed on a Stratagene MX3005P Real-Time PCR system, using EvaGreen dye, and melting curves (dissociation curves) were analyzed using the MxPro application. This analysis measures the temperature at which double-stranded DNA separates into single strands and thus provides the melting temperature (Tm), defined as the temperature at which 50% of the DNA is denatured.
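To illustrate how a melting temperature can be read from a dissociation curve, the sketch below takes the Tm as the temperature at which the negative derivative of fluorescence with respect to temperature (-dF/dT) is largest. This is a common convention for intercalating-dye melt curves, but it is not necessarily the algorithm implemented in MxPro. The curve is synthetic (a sigmoid centred at 81 °C over the 55 °C to 95 °C range described above) and the function name is hypothetical.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at which -dF/dT peaks, using central differences."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic dissociation curve: readings every 0.5 degrees from 55 to 95.
temps = [55 + 0.5 * i for i in range(81)]
# Fluorescence falls as the duplex melts; midpoint and width are illustrative.
signal = [1 / (1 + math.exp((t - 81.0) / 0.8)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} degrees C")
```

For a symmetric transition such as this one, the derivative peak coincides with the point at which half of the signal has been lost, which matches the 50% definition given in the text. Real curves are noisier and usually need smoothing before differentiation.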
Finally, all samples that produced an amplicon with a melting temperature around 81 °C were sent to be sequenced commercially using Sanger technology (Macrogen, Korea). The resulting sequences were aligned to the reference WH-Human 1 coronavirus (MN908947) genome using the Geneious Prime software Version 2019.2.3 (https://www.geneious.com/prime/).
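Once a Sanger read is available, the genotype at position 23403 can be called by locating the sequence immediately upstream of the SNP and reading the next base. The sketch below is a simplified stand-in for the alignment step performed in Geneious Prime, not a reproduction of it. The 10-nucleotide anchor is an assumption about the reference sequence ending at position 23402 and should be checked against MN908947; the function name is hypothetical, and only forward-strand reads are handled.

```python
# Assumed reference sequence for positions 23393-23402 (verify against MN908947).
UPSTREAM_ANCHOR = "CTTTATCAGG"

GENOTYPE_BY_BASE = {
    "A": "A23403 (D614)",
    "G": "G23403 (G614)",
}


def call_23403(read, anchor=UPSTREAM_ANCHOR):
    """Return the genotype label for a forward-strand read, or 'no call'."""
    read = read.upper()
    # Require exactly one anchor match so that the call is unambiguous.
    if read.count(anchor) != 1:
        return "no call"
    snp_index = read.find(anchor) + len(anchor)
    if snp_index >= len(read):
        return "no call"  # the read ends before the SNP position
    return GENOTYPE_BY_BASE.get(read[snp_index], "no call")


# Toy reads built around the assumed anchor, for illustration only.
print(call_23403("GTTCTTTATCAGGATGTTAACTGC"))
print(call_23403("GTTCTTTATCAGGGTGTTAACTGC"))
print(call_23403("ACGTACGTACGT"))
```

An exact-match anchor fails when the flanking region itself carries a mutation or a low-quality base call, so inspecting the electropherogram remains necessary for any read that returns "no call".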
At the first sampling point (March 2020), 20% of the samples (264 out of 1319) tested positive for SARS-CoV-2. At the second sampling point (October 2020), 15% of the samples (777 out of 5032) tested positive (Table 1).
Result | March: counts | March: % | October: counts | October: %
---|---|---|---|---
Positive | 264 | 20 | 777 | 15
Negative | 1055 | 80 | 4255 | 85
Total | 1319 | 100 | 5032 | 100
Per month, 96 samples were evaluated and, after sequencing, were assigned to their corresponding type (Table 2). Of these samples, 99% possessed a guanine at position 23403 (G614), while 1% contained an adenine at this position (D614). In both March and October, the G23403 (G614) variant represented more than 97% of the samples (see Underlying data) (Sevillano 2022). The A23403 (D614) variant was detected only in the samples collected in March.
Variant | March 2020: counts | March 2020: % | October 2020: counts | October 2020: %
---|---|---|---|---
A23403 (D614) | 2 | 2 | 0 | 0
G23403 (G614) | 93 | 97.89 | 96 | 100
Total | 95 | 100 | 96 | 100
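Because these proportions come from fewer than 100 samples per month, it can help to see how much uncertainty surrounds them. The sketch below recomputes the G23403 (G614) proportions from the counts in Table 2 and attaches a 95% Wilson score interval. The interval is an illustrative addition and was not reported by the authors; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# G23403 (G614) counts and totals taken from Table 2.
g614_counts = {"March 2020": (93, 95), "October 2020": (96, 96)}

intervals = {}
for period, (count, total) in g614_counts.items():
    lower, upper = wilson_interval(count, total)
    intervals[period] = (lower, upper)
    print(f"{period}: {100 * count / total:.2f}% "
          f"(95% CI {100 * lower:.1f} to {100 * upper:.1f}%)")
```

The Wilson interval is used here instead of the simpler normal approximation because the latter collapses to zero width when every sample carries the variant, as in October.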
The sequence of 230 bp (Figure 1A) was sufficient to effectively identify the A23403G (D614G) mutation (Figure 1B). Moreover, high resolution melting curve analysis (HRM) was helpful to confirm the amplicon identity before Sanger sequencing, as the Tm value of the expected amplicon ranged from 78 °C to 82 °C. However, this approach was not suitable for genotype differentiation since no marked differences were observed between the wildtype version and the A23403G (D614G) mutation (Figure 1C).
Figure 1. A) Adenine to guanine modification and its consequences for the S protein. Primers flanking the area of interest between nucleotide positions 23,301 and 23,511 of the WH-Human 1 coronavirus genome (MN908947), used as reference in the sequencing analysis. B) Segments of the electropherograms by Sanger sequencing indicating the A23403G mutation. C) Melting temperature analysis of selected samples for the missense mutation. All samples had a similar Tm of around 81 °C indicating the correct amplicon size, although not sufficient to discriminate between the G and D types.
Since its appearance in late 2019, SARS-CoV-2 has spread across the entire globe and has been constantly evolving into new variants with high genetic diversity. To study the G23403 (G614) variant, researchers commonly analyze whole-genome sequences available in public databases, e.g., GISAID (https://www.gisaid.org/), as well as sequences of their own.
WGS is the most reliable and widely used method, but it is costly and poorly suited to processing large numbers of samples. It is too expensive to be employed for massive testing, particularly in low-income countries. Consequently, several techniques have been put forward as alternatives for the economical and rapid detection of not only spike mutations, but also variants of concern (Rhee et al. 2019; Hashemi et al. 2020; Bezerra et al. 2021; Chakraborty et al. 2021; Islam et al. 2021; Park et al. 2021; Vogels et al. 2021; Chan et al. 2022).
In this investigation, the efficacy of a low-cost method, based on Sanger sequencing, to detect the A23403G (D614G) mutation was assessed. This approach proved to be affordable, user-friendly and consistent, which simplifies its implementation in common laboratory facilities worldwide, especially in regions with limited health systems such as Sub-Saharan Africa and Latin America (Kirby 2020; Okoi & Bwawa 2020).
The primers employed to produce the amplicon were straightforwardly designed and can be easily adjusted to detect novel variants without major protocol amendments. Arguably, other mutations of relevance could be studied in the same way. For instance, the evaluated technique might prove useful for detecting the N439K mutation, associated with an increased affinity to ACE2 and reduced sensitivity to SARS-CoV-2 antibodies (Thomson et al. 2021).
Other approaches have been proposed to study the A23403G (D614G) mutation. For example, one study successfully tested a combined point-of-care nucleic acid and antibody assay (Mlcochova et al. 2020). Similarly, another method effectively detected the mutation, although based on an engineered Cas12a guide RNA (Meng et al. 2021). Despite encouraging results, these techniques appear particularly laborious and expensive for laboratories with limited resources. Rapid and economic approaches have also been developed, which are based on (i) probe-based real-time reverse transcriptase PCR (rRT-PCR) (Chan et al. 2022), (ii) an amplification refractory mutation system (ARMS) RT (Islam et al. 2021) and (iii) restriction fragment length polymorphism (RFLP) (Niranji & Al-Jaf 2021). The latter method has proved particularly intricate to optimize, as the original results (Hashemi et al. 2020) had to be revised (Niranji & Al-Jaf 2021).
In the present research, a straightforward technique consisting of a simple PCR reaction followed by sequencing was tested, thus reducing the steps required for sample processing. This study also aimed to use melting curve analysis to identify the screened mutation. This approach has been used successfully to detect different viruses (Liu et al. 2018) and was recently applied to detect SARS-CoV-2 mutations (Barua et al. 2021; Sarkar et al. 2021).
Unfortunately, the melting curves obtained during amplification were not suitable to discriminate between variants. However, they did provide key information about the size of the amplicon, which is fundamental for sequencing the right samples. Clearly, this step must be optimized. Another limitation is that Sanger sequencing could be regarded as time consuming, although it brings several benefits as it is useful to detect developing mutations and even to differentiate variants of concern (Bezerra et al. 2021).
Data from the Emergency Operations Center of Pichincha (COE-Pichincha) showed that around 579 positive cases were registered in Quito by March 2020. In October 2020, the official number reported was 15,890 (CEO Provincial Pichincha 2021). However, no reports have been published with regard to the prevalence of the A23403G (D614G) mutation in Quito. This investigation determined that 190 out of 192 samples (99%) belong to the G23403 (G614) variant, which was dominant at both time points. Other studies have similarly reported a predominance of this SNP in different regions (Loureiro et al. 2021; Pandey et al. 2021; Viedma et al. 2021); these outcomes show the rapid spread of this variant, which might relate to a higher transmission rate, as suggested by certain authors (Arora et al. 2021; Ozono et al. 2021).
As mentioned above, no studies in Ecuador have reported the prevalence of this missense mutation. Hence, to estimate its prevalence in the months studied, this research referred to GISAID (https://www.gisaid.org/) to retrieve whole-genome sequences and metadata of SARS-CoV-2 from the period of study. Our in-silico study revealed that, of the sequences registered in March, five belonged to the G23403 (G614) variant and three to the A23403 (D614) variant. Likewise, in October, G23403 (G614) was predominant, with four sequences out of five being this type (Table 3). A major limitation of studying such datasets is that they represent a small fraction of the cases detected by RT-PCR, especially in the periods of study.
Variant | March 2020: counts | March 2020: % | October 2020: counts | October 2020: %
---|---|---|---|---
A23403 (D614) | 3 | 38 | 1 | 20
G23403 (G614) | 5 | 62 | 4 | 80
Total | 8 | 100 | 5 | 100
The results of this study corroborate what can be inferred from WGS analysis: the G23403 (G614) variant proved to be the most common in samples from Quito during March and October of 2020. This affordable method can therefore be used to expand the information drawn from databases.
This scheme was not intended as a replacement of WGS or other PCR-based approaches. Instead, it was devised as a supplementary technique that could be used, in combination with the aforementioned methods, to rapidly generate ample data on the prevalence and epidemiology of relevant SNPs around the globe, mainly in regions with limited facilities and equipment for detection.
The importance of the present approach lies in the speed and ease with which results can be obtained. This simple technique might help a large group of laboratories to report more results more quickly, which appears fundamental in the context of public health management during the ongoing pandemic.
The present outcomes demonstrate that the evaluated methodology produced reliable and reproducible information with regards to the A23403G (D614G) mutation. Using this approach, we showed that the G23403 (G614) variant was the most common, during the months of March and October of 2020, among samples from Quito, the capital city of Ecuador. This information contributes to expanding the results obtained by WGS, which will undoubtedly benefit public health decisions during novel outbreaks. Future work must place emphasis on testing a larger number of samples to accurately estimate the prevalence of this mutation in the area, and to assess its relationship with the transmissibility and severity of the disease.
Repository: A Sanger sequencing-based method for a rapid and economic generation of SARS-CoV-2 epidemiological data: A proof of concept study to assess the prevalence of the A23403G SNP (D614G) mutation in Quito, Ecuador.
DOI: 10.17605/OSF.IO/PMVGJ (Sevillano 2022).
This project contains the following underlying data:
Data file 1. March 2020 Positive SARS-CoV-2 Tests
Data file 2. October 2020 Positive SARS-CoV-2 Tests
Data file 3. Samples distribution by period
Data file 4. March SARS-CoV-2 Sanger sequencing results of samples
Data file 5. October SARS-CoV-2 Sanger sequencing results of samples
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
The ethical Committee of the General Coordination for Strategic Development in Health, as stipulated by the Ecuadorian Ministry of Public Health, approved the research described herein (Reference number: 074-2020). Data were anonymized for the purpose of protecting private information.
P.L-G.: methodology, editing, funding acquisition, methodology first draft; G.L-M.: conceptualization, editing, first draft; D.O-P.: conceptualization, methodology, supervision, final draft preparation, writing, editing; M.L-A.: conceptualization, final draft preparation, writing, editing; G.M-C.: methodology, editing, first draft; P.G-A.: methodology; G.S.: methodology, first draft; J.Z. and C.Z-S.: methodology, editing, funding acquisition.
Is the work clearly and accurately presented and does it cite the current literature?
No
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Virology, Molecular Biology, Sanger Sequencing, Next generation sequencing, PCR, Loop-Lamp amplification, Measles/Rubella Viruses, Polioviruses, Rotaviruses, SARS-CoV-2, Hepatitis viruses (HAV, HEV, HBV and HCV)
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular laboratory diagnosis
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | ||
---|---|---|
1 | 2 | |
Version 1 31 Mar 22 |
read | read |