Research Article
Revised

Prediction of the effects of the top 10 synonymous mutations from 26645 SARS-CoV-2 genomes

[version 3; peer review: 1 approved, 2 approved with reservations, 3 not approved]
Previously titled: Prediction of the Effects of Synonymous Variants on SARS-CoV-2 Genome
PUBLISHED 29 Feb 2024

This article is included in the Research Synergy Foundation gateway.

This article is included in the Emerging Diseases and Outbreaks gateway.

This article is included in the Coronavirus (COVID-19) collection.

Abstract

Background

The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019 has led to a global pandemic. SARS-CoV-2 is a single-stranded RNA virus, and RNA viruses generally mutate at a higher rate than DNA viruses. Many studies have examined nonsynonymous mutations, which change protein sequences. However, few studies have examined the effects of SARS-CoV-2 synonymous mutations, which may also affect viral fitness. This study aims to predict the effects of synonymous mutations on the SARS-CoV-2 genome.

Methods

A total of 26,645 SARS-CoV-2 genomic sequences retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) database were aligned using MAFFT. The mutations and their respective frequencies were then identified. Multiple RNA secondary structure prediction tools, namely RNAfold, IPknot++ and MXfold2, were applied to predict the effect of the mutations on RNA secondary structure, and their base pair probabilities were estimated using MutaRNA. Relative synonymous codon usage (RSCU) analysis was also performed to measure the codon usage bias (CUB) of SARS-CoV-2.

Results

A total of 150 synonymous mutations were identified. The synonymous mutation with the highest frequency is C3037U, located in nsp3 of ORF1a. Of the 10 most frequent synonymous mutations, C913U, C3037U, U16176C and C18877U show pronounced changes between wild type and mutant in all three RNA secondary structure prediction tools, suggesting that these mutations may have some biological impact on viral fitness. These four mutations also show changes in base pair probabilities. All of the top 10 mutations except U16176C change the codon to a more preferred codon, which may result in higher translation efficiency.

Conclusion

Synonymous mutations in SARS-CoV-2 genome may affect RNA secondary structure, changing base pair probabilities and possibly resulting in a higher translation rate. However, lab experiments are required to validate the results obtained from prediction analysis.

Keywords

COVID-19, SARS-CoV-2, Synonymous mutations, RNA secondary structure

Revised Amendments from Version 2

An introduction section with the relevant literature has been added. The methods section has been updated. An additional paragraph has been added to the discussion section to discuss the biological relevance of the findings. To make the comparison of RNA secondary structures generated by different prediction tools easier and more consistent, we reran the analysis using RNAfold and IPknot++ with Force Directed Graph Layout (FORNA) output, the same format as the output generated by MXfold2. We updated parts A, B and C of Figures 2-5. A new Extended data 1 has been added, which compares the RNA secondary structure of the SARS-CoV-2 5’ UTR (1-480 nt) predicted using RNAfold without SHAPE data, RNAfold with SHAPE data, IPknot++ and MXfold2. The original Extended data 1-4 have been renamed Extended data 2-5, respectively. The new Extended data 2 and 3 have been revised with RNAfold and IPknot++ FORNA graphs, respectively. Some citations have been removed, added or updated accordingly.

See the authors' detailed response to the review by Chandran Nithin
See the authors' detailed response to the review by Leyi Wang
See the authors' detailed response to the review by Takahiko Koyama
See the authors' detailed response to the review by Diego Forni

Introduction

In December 2019, coronavirus disease 2019 (COVID-19) cases first emerged in Wuhan, China1. Soon after, the rapid spread of COVID-19 resulted in a serious global outbreak. COVID-19 is an infectious and potentially lethal disease caused by a newly identified coronavirus strain, known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus causes clinical manifestations ranging from asymptomatic infection to severe pneumonia and, in the worst case, death2. SARS-CoV-2 seems to have a higher transmission rate3 but a lower mortality rate2 than Middle East respiratory syndrome coronavirus (MERS-CoV) and severe acute respiratory syndrome coronavirus (SARS-CoV).

SARS-CoV-2 is a single-stranded RNA virus with a genome size of 29,903 bases. In general, RNA viruses have a higher mutation rate than DNA viruses, which allows them to evolve rapidly and escape the host immune defence response4. Different SARS-CoV-2 variants with multiple synonymous and nonsynonymous mutations have been reported since the beginning of the outbreak5. Some variants are classified as variants of concern (VOCs) because their mutations are associated with changes in viral pathogenicity, such as higher disease severity, a higher transmission rate or a weaker immune response in the host6. However, most of the mutations in the SARS-CoV-2 genome are expected to be either neutral or mildly deleterious7. Numerous studies have been carried out to understand the molecular effects of nonsynonymous mutations on the functions of different SARS-CoV-2 proteins6. However, there are only a few studies on the synonymous mutations of the SARS-CoV-2 genome8,9.

In this study, we focus on synonymous mutations instead of nonsynonymous mutations because their biological importance is often overlooked. Synonymous mutations are also known as silent mutations because the nucleotide change alters the RNA sequence without altering the amino acid sequence10. Synonymous mutations have been suggested to have no functional consequence for the fitness of organisms or their long-term evolution11. However, numerous recent studies have shown that synonymous mutations may affect the folding and stability of RNA structures12. Interestingly, a large-scale study of synonymous mutations in multiple yeast genes has shown that most synonymous mutations are not neutral and affect the fitness of the cell13. For RNA viruses, even though synonymous mutations generally do not change pathogenicity directly, some studies reveal that synonymous mutations may affect the RNA secondary structure of the virus14 and also change the codon usage bias of viral genes15,16. The use of mRNA-based COVID-19 vaccines reduces the severity of the disease. However, the mRNA molecule is susceptible to degradation due to the presence of the 2’ OH group in the ribose. To improve the stability of mRNA vaccines, Zhang et al. (2023) designed a novel algorithm that optimizes codon usage and RNA secondary structure by using synonymous codons17.

Methods

Sequence retrieval

A total of 30,229 SARS-CoV-2 genomic sequences, collected between 31 December 2019 and 22 March 2021, were downloaded from the GISAID database (Global Initiative on Sharing All Influenza Data, RRID:SCR_018251)18. The download was restricted to sequences flagged as complete genome and high coverage. The sequences were further filtered to remove those with more than 0.1% unresolved “N” nucleotides and ambiguous letters. This filter removed 3,584 sequences, leaving 26,645 genomes for analysis. The reference sequence of the SARS-CoV-2 genome (NC_045512.2)19 was retrieved in FASTA format from the NCBI database (NCBI, RRID:SCR_006472). It is a Wuhan isolate with a complete genome comprising 29,903 bases.
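The ambiguity filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that "N" and all other non-ACGTU letters are counted together against the 0.1% threshold, which the text does not state explicitly.

```python
# Hypothetical sketch of the "more than 0.1% ambiguous letters" filter.
# Assumption: N and every other non-ACGTU character count toward the threshold.
UNAMBIGUOUS = set("ACGTU")


def ambiguous_count(seq):
    """Count characters that are not A, C, G, T or U (for example N, R, Y)."""
    return sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)


def keep_sequence(seq, max_fraction=0.001):
    """Return True when at most max_fraction of the sequence is ambiguous."""
    if not seq:
        return False
    # Compare counts rather than ratios to avoid rounding issues at the boundary.
    return ambiguous_count(seq) <= max_fraction * len(seq)


def filter_genomes(genomes, max_fraction=0.001):
    """Keep only the entries of a {name: sequence} dict that pass the filter."""
    return {name: seq for name, seq in genomes.items()
            if keep_sequence(seq, max_fraction)}
```

The sketch expects unaligned sequences, because gap characters would also be counted as ambiguous.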

Multiple sequence alignment

The rapid calculation option of the MAFFT online server (MAFFT, version 7.467, RRID:SCR_011811)20 was used to perform multiple sequence alignment (MSA) of the 26,645 SARS-CoV-2 genomes. This option supports the alignment of more than 20,000 sequences with approximately 30,000 sites. The alignment length was kept, which means that insertions relative to the reference were removed so that the alignment has the same length as the reference sequence. All other parameters were left at their default values.

Identification of mutations and their frequency in SARS-CoV-2 genomes

A simple Python script was written to identify the mutations in the 26,645 SARS-CoV-2 genomes. To determine whether the identified mutations are synonymous or nonsynonymous, MEGA X software, version 10.2.5 build 10210330 (MEGA Software, RRID:SCR_000667)21, was used to translate the sequences for inspection. Amino acid changes were identified by referring to the genomic position of each nucleotide mutation. The 10 synonymous mutations with the highest frequencies were then selected.
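The authors' script is not reproduced in the article, so the following is only a sketch of how mutation counting and synonymous classification could be done on an alignment that has the same length as the reference. The function names, the `cds_start` argument and the toy sequences are illustrative assumptions, and the codon lookup replaces the manual MEGA X inspection described above.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(reference, aligned_genomes):
    """Count (position, ref_base, alt_base) substitutions; positions are 1-based."""
    counts = Counter()
    for genome in aligned_genomes:
        for pos, (ref_base, base) in enumerate(zip(reference, genome), start=1):
            # Skip gaps and ambiguous letters; only count resolved substitutions.
            if base != ref_base and base in BASES and ref_base in BASES:
                counts[(pos, ref_base, base)] += 1
    return counts


def is_synonymous(reference, pos, alt, cds_start):
    """True if the substitution leaves the encoded amino acid unchanged.

    pos and cds_start are 1-based; cds_start is the first base of a codon
    in the reading frame that contains pos (supplied by the caller).
    """
    offset = (pos - cds_start) % 3
    codon_start = pos - 1 - offset  # 0-based index of the codon's first base
    ref_codon = reference[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Toy example (invented sequences): reference encodes Met-Phe-Gly.
reference = "ATGTTCGGA"
genomes = ["ATGTTTGGA", "ATGTTTGGA", "ATGTACGGA"]
counts = count_substitutions(reference, genomes)
```

Because ORF1b is read in a different frame from ORF1a after the ribosomal frameshift, a real analysis would need to pass the correct frame start for each coding region.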

SARS-CoV-2 RNA secondary structure prediction

The RNA secondary structures of wild type and mutant sequences were predicted using the RNAfold program, version 2.4.18 (Vienna RNA, RRID:SCR_008550)22, with the incorporation of SHAPE reactivity data obtained from the study by Manfredonia et al. (2020)23. The prediction was performed on a sequence spanning 250 nucleotides upstream and 250 nucleotides downstream of the mutation site. In addition to RNAfold, two other programs, IPknot++ version 2.2.1 (SCR_022557)24 and MXfold2 (SCR_022558)25, were used to predict the RNA secondary structures of SARS-CoV-2 wild type and mutant sequences.
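As a rough illustration of the windowing step, the sketch below cuts out the 250 nt flanks around a mutation site and builds the matching mutant sequence, giving the two 501 nt inputs that would be passed to a structure prediction tool. The function name and the toy genome are hypothetical; the clamping at the genome ends is an added assumption, since none of the reported sites lies within 250 nt of either end.

```python
def mutation_window(genome, pos, alt, flank=250):
    """Return (wild_type, mutant, position_in_window) for a 1-based genome position.

    The window spans `flank` bases on each side of the site and is clamped
    at the genome ends (an assumption, not taken from the article).
    """
    start = max(0, pos - 1 - flank)
    end = min(len(genome), pos + flank)
    wild_type = genome[start:end]
    idx = pos - 1 - start  # 0-based index of the mutated base inside the window
    mutant = wild_type[:idx] + alt + wild_type[idx + 1:]
    return wild_type, mutant, idx + 1


# Toy genome (invented): a single C at position 500 in a run of A.
toy_genome = "A" * 499 + "C" + "A" * 500
wild_type, mutant, site = mutation_window(toy_genome, 500, "U")
```

With the default flank, the mutated base sits at position 251 of the 501 nt window.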

Base pair probability estimation

To predict how the mutations affect local RNA folding, base pair probabilities were estimated using MutaRNA, version 1.3.0 (MutaRNA, RRID:SCR_021723)26. MutaRNA is a web-based tool that predicts and visualizes the structural changes induced by a single nucleotide polymorphism (SNP) in an RNA sequence. Its output includes the base pair probabilities within the RNA molecule for both wild type and mutant. The MutaRNA parameters were left at their defaults, except that the window size was changed to 501 nt.

Relative synonymous codon usage (RSCU)

Relative synonymous codon usage (RSCU) represents the ratio of the observed frequency of codons appearing in a gene to the expected frequency under equal codon usage. RSCU is calculated using the formula:

$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n}\sum_{i=1}^{n} X_i},$$

where $X_i$ denotes the number of occurrences of codon $i$ and $n$ is the number of synonymous codons encoding that particular amino acid.
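The formula can be illustrated with a short Python sketch. The function name is hypothetical, and the codon counts in the example are invented, chosen only so that the result matches the Asn values reported later in the RSCU table (AAC 0.65, AAU 1.35).

```python
def rscu(codon_counts, synonymous_families):
    """Compute RSCU for every codon in each synonymous family.

    codon_counts: {codon: number of occurrences}
    synonymous_families: iterable of codon lists, one list per amino acid
    """
    values = {}
    for family in synonymous_families:
        total = sum(codon_counts.get(codon, 0) for codon in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(family)  # count expected under equal usage
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Invented counts for the two Asn codons.
example = rscu({"AAC": 65, "AAU": 135}, [["AAC", "AAU"]])
```

By construction, the RSCU values within one amino acid family sum to the number of synonymous codons in that family.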

Results and discussion

A synonymous mutation is a nucleotide change that does not change the encoded amino acid. Synonymous mutations were previously considered to be less important, but they have since been shown to affect RNA folding, RNA stability, miRNA binding and translational efficiency27. Synonymous mutations may have significant effects on the adaptation, virulence and evolution of RNA viruses28. Another study indicated that synonymous mutations are associated with more than 50 human diseases, such as hemophilia B, tuberculosis (TB), cystic fibrosis (CF), Alzheimer's disease, schizophrenia and chronic hepatitis C29. Together, these studies show that synonymous mutations have been recognized as increasingly important in recent years. Hence, it is necessary to study the effects of synonymous mutations in the SARS-CoV-2 genome.

Identification of SARS-CoV-2 synonymous mutations

A total of 381 mutations were found in the SARS-CoV-2 genomes using the Python script, of which 150 are synonymous mutations. The distribution of synonymous mutations across 11 coding regions is shown in Figure 1. ORF1a and ORF1b have the highest numbers of synonymous mutations, at 76 and 33 respectively, which might be due to their longer sequence length. Our findings also show a high C to U mutation rate in the SARS-CoV-2 genome, and this mutational skew is in line with the studies by Rice et al. (2021) and Simmonds (2020)30,31. The high C to U mutation rate may be driven by the host APOBEC-mediated RNA editing system, and overexpression of APOBEC3 protein promotes viral replication and propagation in a human colon epithelial cell line32. These mutational skews need to be considered when inferring the selection acting on synonymous variants in SARS-CoV-2 evolution9. Synonymous mutations are assumed to be subject to lower selective pressure than nonsynonymous mutations, presumably because purifying selection has a stronger negative impact on the frequencies of nonsynonymous mutations. Interestingly, a few studies suggest that there may be some selection on synonymous mutations, so that these mutations are not random and neutral and may have some biological impact on viral fitness9,33,34.


Figure 1. Distribution of SARS-CoV-2 synonymous mutations in 11 coding regions.

The 10 synonymous mutations with the highest frequencies in the SARS-CoV-2 genomes are listed in Table 1. The most frequent synonymous mutation is C3037U, located in nsp3 of ORF1a, followed by C313U in nsp1 of ORF1a and C9286U in nsp4 of ORF1a. The higher-frequency mutations are mostly found in ORF1a and ORF1b. It is of great interest to find out the effect of these top 10 synonymous mutations on the SARS-CoV-2 genome. However, it is important to note that the high frequency of some mutations is not necessarily due to a positive effect. They may have emerged during the early stage of the pandemic and been transmitted to all of their descendants, even though they have little or no effect on viral fitness35.

Table 1. SARS-CoV-2 synonymous mutations with the top 10 highest frequency.

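| ORF | Protein | Position | Nucleotide variation | Frequency |
|---|---|---|---|---|
| 1a | nsp1 | 313 | C > U | 8212 |
| 1a | nsp2 | 913 | C > U | 2820 |
| 1a | nsp3 | 3037 | C > U | 24651 |
| 1a | nsp3 | 5986 | C > U | 2860 |
| 1a | nsp4 | 9286 | C > U | 3880 |
| 1b | nsp12 | 14676 | C > U | 2859 |
| 1b | nsp12 | 15279 | C > U | 2842 |
| 1b | nsp12 | 16176 | U > C | 2838 |
| 1b | nsp14 | 18877 | C > U | 2603 |
| M | | 26735 | C > U | 2432 |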

As in our companion paper, which focuses on the prediction analysis of nonsynonymous mutations of SARS-CoV-2 proteins36, this study used SARS-CoV-2 genome data from the GISAID database ranging from 1 January 2020 to 22 March 2021. The data collection period overlapped with the spread of the alpha variant5, whose frequency peaked around March–May 2021. Seven synonymous mutations have been identified as defining mutations of the alpha variant, and all of them except C241T are also reported in our study. Because the SARS-CoV-2 genome evolves rapidly, it is beyond the scope of our study to keep track of the SARS-CoV-2 mutational profile and to predict the consequences of every new mutation. Two independent studies reported that alpha or alpha-like SARS-CoV-2 variants were circulating among wild deer populations in North America in late 202137,38. Although there is no reported case of viral spillback from deer to humans, we cannot rule out this possibility yet. Hence, our findings remain relevant despite not using the latest genome dataset.

RNA secondary structure prediction and base pair probability estimation analysis

SARS-CoV-2 can form highly structured RNA elements, which may affect viral replication, discontinuous transcription and translation39,40. For example, SARS-CoV-2 forms a three-stemmed pseudoknot structure that promotes programmed -1 ribosomal frameshifting to increase the synthesis of the proteins required for viral replication39,40. Numerous high-throughput studies have characterized the RNA secondary structure of the SARS-CoV-2 genome23,41–44. In these studies, the RNA secondary structures were determined experimentally using chemical probing methods, such as SHAPE-MaP23,41, or proximity ligation methods, such as RIC-seq43 and COMRADES44. Although these data are very useful for determining the RNA secondary structures of SARS-CoV-2, very few studies have examined the effect of synonymous mutations on RNA secondary structure, which may be beneficial or deleterious to viral fitness. Therefore, we performed RNA secondary structure prediction and base pair probability estimation for the 10 most frequent synonymous mutations.

To improve the robustness of the study, multiple RNA secondary structure prediction tools, namely RNAfold with SHAPE reactivity data22, IPknot++24 and MXfold225, were applied. In addition, MutaRNA was used to estimate the base pair probabilities of the wild type and mutant sequences. RNAfold with SHAPE reactivity data uses a thermodynamic approach to calculate the minimum free energy structure, the most probable RNA secondary structure, while incorporating experimentally derived nucleotide reactivity data. If the reactivity value of a nucleotide is high, it is less likely to be paired, and vice versa. SARS-CoV-2 can form pseudoknot structures, which promote ribosomal frameshifting40. However, many RNA secondary structure prediction programs do not predict pseudoknots because the calculation is computationally demanding. IPknot++ is one of the few programs that can predict pseudoknot structures. MXfold2 predicts RNA secondary structure using a deep learning method trained on a large dataset.

Although different tools may produce similar results for identical RNA sequences, prediction outputs can vary due to differences in algorithms, parameter settings, handling of pseudoknots, incorporation of experimental data and the assumptions of each tool. RNAfold, IPknot++ and MXfold2 apply the nearest neighbour model, using different thermodynamic parameters. In addition, MXfold2 implements deep learning models with a max-margin framework45. Multiple experimental genome-wide mapping studies of RNA secondary structure showed that the 5’ UTR of the SARS-CoV-2 RNA genome forms seven conserved stem-loop (SL) structures, or eight in some studies, depending on the sequence length23,41–44,46. To demonstrate the usability of the prediction tools, we predicted the RNA secondary structure of the 5’ UTR sequence (1-480 nt) of SARS-CoV-2 (Extended data 1)47. The RNA secondary structures predicted by RNAfold with SHAPE data, IPknot++ and MXfold2 are similar, especially in the SL1 and SL5-8 regions, and they are comparable to most of the published experimental data23,41–44,46. Both RNAfold with SHAPE data and MXfold2 successfully predicted SL4, but IPknot++ predicted SL4 with a pseudoknot structure, which has not been reported in other studies. Interestingly, the result obtained from RNAfold without SHAPE data is quite different, possibly because of the missing experimental data. In addition, it has been shown that SARS-CoV-2 may adopt different RNA secondary structure conformations7,17,32,34,35,37. Our study aims to predict whether synonymous SNPs (sSNPs) may affect RNA secondary structure, and the outcomes allow us to prioritize variants for future experimental functional studies. Using multiple prediction tools may help to increase the reliability of the prediction results. The prediction results for all 10 synonymous mutations using these three tools and the base pair probability estimation results are summarized in Table 2 (✓ - changes, × - no change).
The results for all 10 synonymous mutations predicted with RNAfold, IPknot++ and MXfold2 are available in Extended data 2, 3 and 4, respectively47. The base pair probabilities for all 10 synonymous mutations are shown as circular plots in Extended data 547. The darker the edge, the more likely the two connected bases are to form a base pair. Of these 10 synonymous mutations, four mutants, all located in ORF1ab, namely C913U, C3037U, U16176C and C18877U, show pronounced changes between wild type and mutant in all three prediction tools, suggesting that these synonymous mutations may have some biological impact on viral fitness. Having said that, it is also possible that other mutants, for which only one or two of the analyses predicted changes, also affect RNA secondary structure and have some impact on viral fitness. It has been shown that SARS-CoV-2 can form elaborate RNA secondary structures at the 5’ and 3’ UTRs and at the frameshifting element (FSE), located at the boundary between ORF1a and ORF1b7,17,32,34,35,37. The 5’ UTR of SARS-CoV-2 is important for viral mRNA stability48 and protein translation49, while the 3’ UTR may be involved in viral proliferation in the host cell50. The FSE can form pseudoknot structures, which regulate the relative protein expression of ORF1a and ORF1ab during viral infection39,40.

Table 2. Summary of RNA secondary structure prediction and base pair probability estimation analysis of SARS-CoV-2 synonymous mutations.

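| Mutation | RNAfold (SHAPE) | IPknot++ | MXfold2 | MutaRNA |
|---|---|---|---|---|
| C313U | × | × | × | × |
| C913U | ✓ | ✓ | ✓ | ✓ |
| C3037U | ✓ | ✓ | ✓ | ✓ |
| C5986U | ? | ? | ? | ? |
| C9286U | ? | ? | ? | ? |
| C14676U | ? | ✓ | ? | ? |
| C15279U | × | ✓ | × | × |
| U16176C | ✓ | ✓ | ✓ | ✓ |
| C18877U | ✓ | ✓ | ✓ | ✓ |
| C26735U | ? | ? | ? | ? |

?: the column position of the mark could not be recovered. Among the cells marked ?, C5986U, C14676U and C26735U each have one × and C9286U has two ×; the remaining cells are ✓.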

Besides the 5’ and 3’ UTRs, Huston et al. (2021) found that the ORF1ab region forms an extensive RNA secondary structure network41. All four mutations reported in our study, C913U, C3037U, U16176C and C18877U, are located within ORF1ab. The C913U mutation is found in Nsp2, near the start of the Nsp2 coding region (position 806) in ORF1a of the SARS-CoV-2 genome. As shown in Figure 2, the wild type structures predicted by RNAfold and MXfold2 share some degree of similarity around positions 95-330 of the 501-base structure. The C913U mutation has a pronounced effect on the RNA secondary structure predicted by RNAfold. It results in the appearance or disappearance of multiple loops, not only near the mutated residue but also at sites further away, suggesting that this mutation may affect long-range RNA interactions. MXfold2 predicts that the U913 mutant forms a shorter stem and a larger hairpin loop than the C913 wild type. However, the structure predicted by IPknot++ is quite different from the others: here, C913U results in a change of pseudoknot structure. Figure 2D shows that the base pair interactions of the wild type RNA are changed substantially by the U913 mutation. It has previously been shown that the Nsp2 protein suppresses the host immune response by inhibiting the mRNA translation of the interferon gene51. Although the C913U mutation does not alter the amino acid sequence of the Nsp2 protein, it may be worthwhile to see whether it plays a direct or indirect role in the host immune response through the Nsp2 protein. Since C913U is near the boundary between Nsp1 and Nsp2, the altered RNA secondary structure may affect ribosome stalling, which in turn may affect the folding of nascent polypeptides and translation initiation. In addition, the Nsp1 protein facilitates viral propagation by inhibiting the host protein translation machinery52 and promoting host mRNA degradation53. It will be interesting to investigate whether the C913U mutation affects these functions.


Figure 2. The effect of C913U mutation on RNA secondary structure of nsp2 in ORF1a.

(A) RNA secondary structure of C913 wild type and U913 mutant predicted using RNAfold. (B) RNA secondary structure of C913 wild type and U913 mutant predicted using IPknot++. (C) RNA secondary structure of C913 wild type and U913 mutant predicted using MXfold2. (D) MutaRNA circular plots of base pairing probabilities of C913 wild type and U913 mutant. The black arrow indicates the position of WT and mutated nucleotides while the red arrow indicates the starting position of the query sequence.

The C3037U mutation is found in Nsp3 in ORF1a. As shown in Figure 3, both IPknot++ and MXfold2 predict that the U3037 mutant forms a longer stem and a smaller internal loop than the wild type. In contrast, RNAfold predicts that a small internal loop fuses into a bigger internal loop in the U3037 mutant. The MutaRNA circular plot shows a minor difference in base pair probabilities between the C3037 wild type and the U3037 mutant. Nsp3 is a papain-like protease that cleaves several Nsp proteins involved in viral replication54. Hence, the effect of this mutation on Nsp3 cleavage activity should be investigated, possibly through a change in the transcription or translation level of Nsp3.


Figure 3. The effect of C3037U mutation on RNA secondary structure of nsp3 in ORF1a.

(A) RNA secondary structure of C3037 wild type and U3037 mutant predicted using RNAfold. (B) RNA secondary structure of C3037 wild type and U3037 mutant predicted using IPknot++. (C) RNA secondary structure of C3037 wild type and U3037 mutant predicted using MXfold2. (D) MutaRNA circular plots of the base pairing probabilities of C3037 wild type and U3037 mutant. The black arrow indicates the position of WT and mutated nucleotides while the red arrow indicates the starting position of the query sequence.

The U16176C mutation is located in Nsp12, close to the boundary of the Nsp12 and Nsp13 genes in ORF1b. As shown in Figure 4, the U16176C mutation results in a drastic change in the RNA secondary structure predicted using RNAfold. IPknot++ predicts that the C16176 mutant forms new pseudoknot structures, which are absent in the wild type U16176. MXfold2, on the other hand, predicts that the C16176 mutant forms a larger multi-branched loop and a shorter stem than the wild type. Similarly, the MutaRNA result shows that the C16176 mutant affects base pair probabilities at multiple sites. Nsp12 is one of the subunits of RNA-dependent RNA polymerase (RdRp), which is required for RNA synthesis55. One study showed that a 1.4-kb-long SARS-CoV-2 RNA sequence (residues 15071-16451) located in the Nsp12 and Nsp13 regions is required to facilitate viral RNA packaging56. Since the U16176C mutation may affect RNA secondary structure, it will be interesting to see whether it affects viral RNA packaging. U16176C, C14676U and C15279U have very similar frequencies, as shown in Table 1. Interestingly, IPknot++ predicted that all three result in changes in pseudoknot structure, as shown in Extended data 3. We speculate that these three sSNPs may be functionally related. These mutations are located downstream of the frameshifting element (residues 13405–13488), which forms a pseudoknot to promote ribosomal frameshifting during viral replication57. It has been demonstrated that synonymous mutations affect both the RNA secondary structure of the ribosomal frameshift signal and the frameshifting efficiency in SARS-CoV58. Another study has shown that this ribosomal frameshifting structure in SARS-CoV-2 involves a long-range sequence interaction of 1.5 kb44. It remains to be seen whether the long-range sequence interaction for ribosomal frameshifting can extend beyond 1.5 kb.


Figure 4. The effect of U16176C mutation on RNA secondary structure of nsp12 in ORF1b.

(A) RNA secondary structure of U16176 wild type and C16176 mutant predicted using RNAfold. (B) RNA secondary structure of U16176 wild type and C16176 mutant predicted using IPknot++. (C) RNA secondary structure of U16176 wild type and C16176 mutant predicted using MXfold2. (D) MutaRNA circular plots of the base pairing probabilities of U16176 wild type and C16176 mutant. The black arrow indicates the position of WT and mutated nucleotides while the red arrow indicates the starting position of the query sequence.

The C18877U mutation is located in Nsp14 in ORF1b. As shown in Figure 5, RNAfold predicts that an additional internal loop is formed in the U18877 mutant. IPknot++ predicts that the U18877 mutant forms extra internal loops and a longer hairpin near the mutated residue, and that the mutation also affects the pseudoknot structure at two different sites further from the mutated residue. MXfold2 predicts that the U18877 mutant forms one hairpin with multiple loops instead of the single hairpin seen in the wild type. Changes at multiple base pairing sites due to the U18877 mutation are also observed in the MutaRNA circular plot. Nsp14 is important for maintaining high fidelity during viral RNA synthesis59.


Figure 5. The effect of C18877U mutation on RNA secondary structure of nsp14 in ORF1b.

(A) RNA secondary structure of C18877 wild type and U18877 mutant predicted using RNAfold. (B) RNA secondary structure of C18877 wild type and U18877 mutant predicted using IPknot++. (C) RNA secondary structure of C18877 wild type and U18877 mutant predicted using MXfold2. (D) MutaRNA circular plots of the base pairing probabilities of C18877 wild type and U18877 mutant. The black arrow indicates the position of WT and mutated nucleotides while the red arrow indicates the starting position of the query sequence.

RSCU analysis of SARS-CoV-2

Codon usage bias (CUB), which is non-random usage of synonymous codons, is common in all species. It is a phenomenon where some codons are preferred over others for a specific amino acid. SARS-CoV-2 replicates using host cell’s machinery and synthesizes its protein by utilizing host cellular components. Hence, codon usage bias may affect the replication of viruses60.

Relative synonymous codon usage (RSCU) is a widely used statistical approach61 that can be used to measure codon usage bias in coding sequences. The RSCU values of SARS-CoV-2 are shown in Table 3 and the most preferred codons for each amino acid are marked in bold. Stop codons (UAA, UAG, UGA) and codons which code for an amino acid uniquely (AUG, UGG) are excluded from RSCU analysis.

Table 3. RSCU values of SARS-CoV-2 genome.

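| Amino acid | Synonymous codon | RSCU |
|---|---|---|
| Ala | GCA | 1.09 |
| Ala | GCC | 0.58 |
| Ala | GCG | 0.16 |
| Ala | **GCU** | 2.17 |
| Arg | **AGA** | 2.67 |
| Arg | AGG | 0.81 |
| Arg | CGA | 0.29 |
| Arg | CGC | 0.58 |
| Arg | CGG | 0.19 |
| Arg | CGU | 1.46 |
| Asn | AAC | 0.65 |
| Asn | **AAU** | 1.35 |
| Asp | GAC | 0.72 |
| Asp | **GAU** | 1.28 |
| Cys | UGC | 0.45 |
| Cys | **UGU** | 1.55 |
| Gln | **CAA** | 1.39 |
| Gln | CAG | 0.61 |
| Glu | **GAA** | 1.44 |
| Glu | GAG | 0.56 |
| Gly | GGA | 0.82 |
| Gly | GGC | 0.71 |
| Gly | GGG | 0.12 |
| Gly | **GGU** | 2.34 |
| His | CAC | 0.61 |
| His | **CAU** | 1.39 |
| Ile | AUA | 0.92 |
| Ile | AUC | 0.56 |
| Ile | **AUU** | 1.53 |
| Leu | CUA | 0.66 |
| Leu | CUC | 0.59 |
| Leu | CUG | 0.30 |
| Leu | **CUU** | 1.75 |
| Leu | UUA | 1.63 |
| Leu | UUG | 1.06 |
| Lys | **AAA** | 1.31 |
| Lys | AAG | 0.69 |
| Phe | UUC | 0.59 |
| Phe | **UUU** | 1.41 |
| Pro | CCA | 1.59 |
| Pro | CCC | 0.29 |
| Pro | CCG | 0.17 |
| Pro | **CCU** | 1.94 |
| Ser | AGC | 0.36 |
| Ser | AGU | 1.43 |
| Ser | UCA | 1.67 |
| Ser | UCC | 0.46 |
| Ser | UCG | 0.11 |
| Ser | **UCU** | 1.96 |
| Thr | ACA | 1.64 |
| Thr | ACC | 0.38 |
| Thr | ACG | 0.20 |
| Thr | **ACU** | 1.78 |
| Tyr | UAC | 0.78 |
| Tyr | **UAU** | 1.22 |
| Val | GUA | 0.91 |
| Val | GUC | 0.56 |
| Val | GUG | 0.58 |
| Val | **GUU** | 1.95 |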

Based on the RSCU values, the synonymous codons can be classified into five groups: i) codons with an RSCU value equal to 1.0 are unbiased codons; ii) codons with an RSCU value > 1.0 are preferred in a genome; iii) codons with an RSCU value < 1.0 are less preferred in a genome; iv) codons with an RSCU value > 1.6 are over-represented in a genome; v) codons with an RSCU value < 0.6 are under-represented in a genome60. There are 15 preferred codons (1.0 < RSCU value ≤ 1.6) and 11 over-represented codons (RSCU value > 1.6) in the SARS-CoV-2 genome, as shown in Table 3. The preferred codons in the SARS-CoV-2 genome are GCA (Ala), CGU (Arg), AAU (Asn), GAU (Asp), UGU (Cys), CAA (Gln), GAA (Glu), CAU (His), AUU (Ile), UUG (Leu), AAA (Lys), UUU (Phe), CCA (Pro), AGU (Ser) and UAU (Tyr), while the over-represented codons are GCU (Ala), AGA (Arg), GGU (Gly), CUU (Leu), UUA (Leu), CCU (Pro), UCA (Ser), UCU (Ser), ACA (Thr), ACU (Thr) and GUU (Val). The presence of preferred and over-represented codons in a genome is presumed to increase the protein synthesis rate.
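The codon counts quoted above can be checked against Table 3 with a short script. This is an illustrative sketch rather than part of the original analysis: the function names are hypothetical, the RSCU values are copied from Table 3, and the under-representation cut-off of 0.6 is the commonly used threshold, taken here as an assumption.

```python
from collections import Counter

# RSCU values copied from Table 3: amino acid followed by codon/value pairs.
RSCU_TABLE = """
Ala GCA 1.09 GCC 0.58 GCG 0.16 GCU 2.17
Arg AGA 2.67 AGG 0.81 CGA 0.29 CGC 0.58 CGG 0.19 CGU 1.46
Asn AAC 0.65 AAU 1.35
Asp GAC 0.72 GAU 1.28
Cys UGC 0.45 UGU 1.55
Gln CAA 1.39 CAG 0.61
Glu GAA 1.44 GAG 0.56
Gly GGA 0.82 GGC 0.71 GGG 0.12 GGU 2.34
His CAC 0.61 CAU 1.39
Ile AUA 0.92 AUC 0.56 AUU 1.53
Leu CUA 0.66 CUC 0.59 CUG 0.30 CUU 1.75 UUA 1.63 UUG 1.06
Lys AAA 1.31 AAG 0.69
Phe UUC 0.59 UUU 1.41
Pro CCA 1.59 CCC 0.29 CCG 0.17 CCU 1.94
Ser AGC 0.36 AGU 1.43 UCA 1.67 UCC 0.46 UCG 0.11 UCU 1.96
Thr ACA 1.64 ACC 0.38 ACG 0.20 ACU 1.78
Tyr UAC 0.78 UAU 1.22
Val GUA 0.91 GUC 0.56 GUG 0.58 GUU 1.95
"""


def parse_table(text):
    """Return {codon: RSCU} from the whitespace-separated table above."""
    values = {}
    for line in text.strip().splitlines():
        fields = line.split()[1:]  # drop the amino acid name
        for codon, value in zip(fields[0::2], fields[1::2]):
            values[codon] = float(value)
    return values


def classify(value):
    """Assign one non-overlapping class per RSCU value (0.6 cut-off assumed)."""
    if value > 1.6:
        return "over-represented"
    if value > 1.0:
        return "preferred"
    if value == 1.0:
        return "unbiased"
    if value >= 0.6:
        return "less preferred"
    return "under-represented"


rscu_values = parse_table(RSCU_TABLE)
class_counts = Counter(classify(v) for v in rscu_values.values())
```

With the Table 3 values, `class_counts` contains 15 preferred and 11 over-represented codons, which agrees with the counts given in the text.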

Table 4 shows the RSCU analysis of the top 10 synonymous mutations. The codons in bold in the ‘codon change’ column are the codons with higher RSCU value, which means they are more preferred in SARS-CoV-2 genome. Most of the mutations change the codon to a more preferred codon as shown in Table 4. Since it is presumed that preferred codons have a higher translation rate compared to nonpreferred codons62, it is possible that most of the mutations may increase the translation efficiency of SARS-CoV-2, which may affect virus replication, transmission, and evolution.

Table 4. RSCU analysis of the top 10 synonymous mutations of SARS-CoV-2 genome.

ORF  nsp    Mutation  Codon change
1a   nsp1   C313U     CUC -> CUU
1a   nsp2   C913U     UCC -> UCU
1a   nsp3   C3037U    UUC -> UUU
1a   nsp3   C5986U    UUC -> UUU
1a   nsp4   C9286U    AAC -> AAU
1b   nsp12  C14676U   CCC -> CCU
1b   nsp12  C15279U   CAC -> CAU
1b   nsp12  U16176C   ACU -> ACC
1b   nsp14  C18877U   CUA -> UUA
M    -      C26735U   UAC -> UAU
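As an illustrative check on Table 4 (a sketch only, not part of the authors' pipeline), the Python snippet below confirms that each listed codon change differs at exactly one position, that the changed base matches the mutation name, and that both codons encode the same amino acid under the standard genetic code. The names `CODON_TABLE`, `TABLE_4` and `check_mutation` are hypothetical and introduced here for illustration.

```python
# Standard genetic code in the usual U, C, A, G ordering (first base major).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# (mutation, wild-type codon, mutant codon) copied from Table 4.
TABLE_4 = [
    ("C313U", "CUC", "CUU"),
    ("C913U", "UCC", "UCU"),
    ("C3037U", "UUC", "UUU"),
    ("C5986U", "UUC", "UUU"),
    ("C9286U", "AAC", "AAU"),
    ("C14676U", "CCC", "CCU"),
    ("C15279U", "CAC", "CAU"),
    ("U16176C", "ACU", "ACC"),
    ("C18877U", "CUA", "UUA"),
    ("C26735U", "UAC", "UAU"),
]


def check_mutation(name, ref_codon, alt_codon):
    """Return (single-base change matches name, synonymous, amino acid)."""
    ref_base, alt_base = name[0], name[-1]
    diffs = [(r, a) for r, a in zip(ref_codon, alt_codon) if r != a]
    single_change = diffs == [(ref_base, alt_base)]
    synonymous = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return single_change, synonymous, CODON_TABLE[ref_codon]


results = {name: check_mutation(name, ref, alt) for name, ref, alt in TABLE_4}
for name, (single, synonymous, amino_acid) in results.items():
    print(f"{name}: amino acid {amino_acid}, "
          f"single-base change={single}, synonymous={synonymous}")
```

Note that C18877U is the only entry whose change falls at the first codon position (CUA -> UUA); it remains synonymous because both codons encode leucine.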

Conclusions

This study examined the effects of SARS-CoV-2 synonymous mutations on RNA secondary structure and codon usage bias, even though these mutations do not change the amino acid sequence of the protein. The C913U, C3037U, U16176C and C18877U mutants show pronounced differences between wild type and mutant in all three RNA secondary structure prediction tools, suggesting that these mutations may have some biological impact on viral fitness. These four mutations also show changes in the base pair probabilities estimated by MutaRNA. All of the top 10 mutations except U16176C change the codon to a more preferred codon, which may result in higher translation efficiency. Because prediction tools have shortcomings, experimental studies such as protein translation assays and RNA packaging assays are needed to give a more comprehensive understanding of the biological consequences of synonymous mutations in SARS-CoV-2.

Ethics and dissemination

No ethical approval is required for data analysis in this study (EA2702021).

How to cite this article
Boon WX, Sia BZ and Ng CH. Prediction of the effects of the top 10 synonymous mutations from 26645 SARS-CoV-2 genomes [version 3; peer review: 1 approved, 2 approved with reservations, 3 not approved]. F1000Research 2024, 10:1053 (https://doi.org/10.12688/f1000research.72896.3)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Version 3 (published 29 Feb 2024, revised)

Reviewer Report, 11 Sep 2024
Tamar Schlick, Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, New York, New York, USA
Status: Approved with Reservations
In this work, synonymous mutations of the SARS-CoV-2 genome are explored, with the rationale that these mutations impact function through altered RNA folding, despite unaltered protein products. Specifically, the researchers find 150 synonymous mutations after performing multiple sequence alignment and ...
https://doi.org/10.5256/f1000research.162881.r304727

Reviewer Report, 27 Jul 2024
Diego Forni, Scientific Institute IRCCS E Medea, Bosisio Parini, Italy
Status: Not Approved
In this manuscript, Dr Boon et al. analyzed the most common synonymous mutations of early pandemic SARS-CoV-2 genomes. They identified substitutions that influence the RNA secondary structure surrounding these positions. The authors have already modified and updated their manuscript based ...
https://doi.org/10.5256/f1000research.162881.r304726
Author Response, 18 Sep 2024: Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia

Version 2 (published 05 Sep 2022, revised)

Reviewer Report, 18 Oct 2023
Roland Huber, Bioinformatics Institute, A*STAR, Singapore
Status: Not Approved
I agree with previous reviewers that the analyzed sequences represent only a limited sample of variation in SARS-CoV-2, specifically from early in the pandemic. This might introduce unexpected biases in the analysis. E.g. it would be more likely to observe ...
https://doi.org/10.5256/f1000research.137703.r208617

Reviewer Report, 06 Sep 2022
Chandran Nithin, Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
Status: Approved
Authors have addressed the comments. However, the ...
https://doi.org/10.5256/f1000research.137703.r149467

Version 1 (published 18 Oct 2021)

Reviewer Report, 28 Apr 2022
Chandran Nithin, Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
Status: Not Approved
The manuscript titled "Prediction of the Effects of Synonymous Variants on SARS-CoV-2 Genome" analyzes 30,299 SARS-CoV-2 genomes retrieved from GISAID for the period between 31st December 2019 and 22nd March 2021. The study identifies 381 mutations in the genome, of ...
https://doi.org/10.5256/f1000research.76505.r135257
Author Response, 05 Sep 2022: Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia

Reviewer Report, 20 Dec 2021
Leyi Wang, Veterinary Diagnostic Laboratory and Department of Veterinary Clinical Medicine, College of Veterinary Medicine, University of Illinois, Urbana, IL, USA
Status: Approved with Reservations
The manuscript entitled “Prediction of the Effects of Synonymous Variants on SARS-CoV-2 Genome” reports an analysis of the SARS-CoV-2 genome obtained in the GISAID database. The authors analyzed synonymous changes of the 30,229 SARS-CoV-2 sequences available online, and the effects ...
https://doi.org/10.5256/f1000research.76505.r101828
Author Response, 05 Sep 2022: Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia

Reviewer Report, 02 Nov 2021
Takahiko Koyama, IBM TJ Watson Research Center, Yorktown Heights, NY, USA
Status: Not Approved
Boon et al. made an investigation of synonymous mutations of SARS-CoV-2 genomes. Authors first identified common synonymous mutations such as 3037C>T followed by secondary structure predictions using RNAfold. First, the number of genomes authors used is too small and too old. Currently, nearly 5 million genomes are uploaded at GISAID; however, authors used only 30,229. The set does ...
https://doi.org/10.5256/f1000research.76505.r97299
Author Responses, 08 Nov 2021 and 30 Nov 2022: Chong Han Ng, Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, 75450, Malaysia
