<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.72904.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Prediction of the Effects of Nonsynonymous Variants on SARS-CoV-2 Proteins</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sia</surname>
                        <given-names>Boon Zhan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Boon</surname>
                        <given-names>Wan Xin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0424-5243</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Yap</surname>
                        <given-names>Yoke Yee</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kumar</surname>
                        <given-names>Shalini</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Ng</surname>
                        <given-names>Chong Han</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2926-9831</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:chng@mmu.edu.my">chng@mmu.edu.my</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>6</day>
                <month>1</month>
                <year>2022</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2022</year>
            </pub-date>
            <volume>11</volume>
            <elocation-id>9</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>21</day>
                    <month>12</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Sia BZ et al.</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/11-9/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> SARS-CoV-2 is a highly transmissible pathogen that causes COVID-19. The outbreak originated in Wuhan, China, in December 2019. Multiple studies have reported nonsynonymous mutations in different SARS-CoV-2 proteins. However, few computational studies have examined the biological impact of these mutations on the structure and function of the proteins.</p>
                <p>
                    <bold>Methods:</bold> Nonsynonymous mutations of the SARS-CoV-2 genome and their frequencies were identified from 30,229 sequences. The effects of the top 10 nonsynonymous mutations in different SARS-CoV-2 proteins were then analyzed using bioinformatics approaches, including co-mutation analysis, prediction of protein structure stability and flexibility, and prediction of protein function.</p>
                <p>
                    <bold>Results:</bold> A total of 231 nonsynonymous mutations were identified from 30,229 SARS-CoV-2 genome sequences. The top 10 nonsynonymous mutations, which affect nine amino acid residues, were ORF1a nsp5 P108S, ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, ORF3a Q57H, and N protein P151L, R203K and G204R. Many nonsynonymous mutations showed a high concurrence ratio, suggesting that these mutations may evolve together and interact functionally. Our results showed that the ORF1a nsp5 P108S, ORF3a Q57H and N protein P151L mutations may be deleterious to the function of SARS-CoV-2 proteins. In addition, ORF1a nsp5 P108S and S protein D614G may destabilize the protein structures, and S protein D614G may adopt a more open conformation than the wild type.</p>
                <p>
                    <bold>Conclusion:</bold> The biological consequences of these nonsynonymous mutations of SARS-CoV-2 proteins should be validated in future in vivo and in vitro experimental studies.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>SARS-CoV-2</kwd>
                <kwd>nonsynonymous mutation</kwd>
                <kwd>co-mutation</kwd>
                <kwd>COVID-19</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100012024">
                    <funding-source>Multimedia University</funding-source>
                    <award-id>IRFund2.0(65805)</award-id>
                </award-group>
                <funding-statement>This research was supported by Multimedia University, Malaysia, IRFund 2.0 (grant number MMUI/210119, awarded to Chong Han Ng). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>A new coronavirus disease known as COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in Wuhan, China in December 2019.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> SARS-CoV-2 is a positive-sense, single-stranded RNA virus with a helical nucleocapsid and a genome of about 30 kilobases. The genome contains 11 protein-coding genes: four structural genes (spike (S), envelope (E), membrane (M), and nucleocapsid (N)) and seven nonstructural genes (ORF1ab, ORF3a, ORF6, ORF7a, ORF7b, ORF8 and ORF10).
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup>
            </p>
            <p>The SARS-CoV-2 virus can rapidly mutate to bypass the immune response of the host.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> These mutations can be synonymous substitutions, nonsynonymous substitutions, deletions, insertions, or other changes. Nonsynonymous mutations are expected to have a greater impact than synonymous mutations because they alter the amino acid sequence of a protein and may therefore change its structure and function. According to Kim et al. (2020), a total of 767 synonymous and 1352 nonsynonymous mutations have been identified in SARS-CoV-2 genomes.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> In another study, 119 SNPs were identified in 11,183 SARS-CoV-2 genomes, of which 74 were nonsynonymous mutations and 43 were synonymous mutations.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> An analysis of nonsynonymous mutations in the structural proteins of SARS-CoV-2 showed that the S and N proteins have a higher mutation rate per gene than the E and M proteins.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> However, the biological consequences of these mutations for the functions and structures of SARS-CoV-2 proteins remain unclear. In this study, the nonsynonymous mutations of SARS-CoV-2 proteins were analyzed computationally using co-mutation analysis, protein structure stability and flexibility analysis, and protein function analysis to predict the effects of the mutations on protein structure and function.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <sec id="sec3">
                <title>Sequences and structures retrieval</title>
                <p>SARS-CoV-2 genome data were downloaded from the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.gisaid.org/">GISAID</ext-link> database (Global Initiative on Sharing All Influenza Data, RRID:SCR_018251).
                    <sup>
                        <xref ref-type="bibr" rid="ref7">7</xref>
                    </sup> In this study, 30,229 SARS-CoV-2 genomes with collection dates from 2020-01-01 to 2021-03-21 were retrieved. To ensure that only high-quality sequences were used, the GISAID filters for complete genome, high coverage, patient status, and exclusion of low coverage were applied. The reference strain 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/nuccore/1798174254">NC_045512.2</ext-link>, which is 29,903 bases long, was retrieved from the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/">NCBI</ext-link> database (NCBI, RRID:SCR_006472). The wild-type protein structures obtained from 
                    <ext-link ext-link-type="uri" xlink:href="https://www.rcsb.org/">RCSB PDB</ext-link> (Research Collaboratory for Structural Bioinformatics Protein Data Bank, RRID:SCR_012820) are listed in 
                    <xref ref-type="table" rid="T1">Table 1</xref>.
                    <sup>
                        <xref ref-type="bibr" rid="ref8">8</xref>
                    </sup> Because N protein residues R203 and G204 are located in a disordered region that lacks a well-defined three-dimensional structure, no experimental structural data were available for the prediction analysis. A predicted N protein model (
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/protein/1798172432">QHD43423</ext-link>, estimated TM-score = 0.97) generated with the 
                    <ext-link ext-link-type="uri" xlink:href="https://zhanggroup.org/COVID-19/">D-I-TASSER/C-I-TASSER pipeline</ext-link> was therefore used.
                    <sup>
                        <xref ref-type="bibr" rid="ref9">9</xref>
                    </sup>
                </p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>SARS-CoV-2 protein structures used in this study.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Protein</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Nucleotide changes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Amino acid changes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Template structure (PDB ID or predicted model)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF1a nsp5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C10376T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P108S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7KPH</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">ORF1b nsp12</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C14408T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P323L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6YYT</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">C14708T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">A423V</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6YYT</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">A23063T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N501Y</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7A92</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">A23403G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">D614G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">7A92</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF3a</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">G25563T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Q57H</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6XDC</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="4" valign="top">N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C28725T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P151L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">6VYO</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28881A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">QHD43423</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28882A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">QHD43423</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28883C</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">G204R</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">QHD43423</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
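                <p>The correspondence between the nucleotide and amino acid changes in Table 1 can be checked with simple coordinate arithmetic. The Python sketch below is illustrative only and is not part of the original analysis: the names <monospace>CDS</monospace> and <monospace>residue_of</monospace> are hypothetical, and the coding-sequence coordinates are assumed from the NC_045512.2 annotation. ORF1ab is omitted because nsp numbering also depends on the polyprotein cleavage sites and the ribosomal frameshift.</p>
                <code language="python">
```python
# 1-based (start, end) genomic coordinates of three coding sequences,
# assumed from the NC_045512.2 annotation. ORF1ab is deliberately omitted.
CDS = {
    "S": (21563, 25384),
    "ORF3a": (25393, 26220),
    "N": (28274, 29533),
}


def residue_of(gene, genome_pos):
    """Map a 1-based genomic position to (residue number, position within codon)."""
    start, end = CDS[gene]
    if genome_pos not in range(start, end + 1):
        raise ValueError(f"{genome_pos} lies outside the {gene} coding sequence")
    offset = genome_pos - start
    return offset // 3 + 1, offset % 3 + 1


# Check a few entries of Table 1; the middle of each label is the genomic position.
for gene, change in [("S", "A23403G"), ("ORF3a", "G25563T"), ("N", "G28881A")]:
    position = int(change[1:-1])
    print(gene, change, residue_of(gene, position))
```
                </code>
                <p>Under these assumptions, <monospace>residue_of("S", 23403)</monospace> returns <monospace>(614, 2)</monospace>, which agrees with A23403G producing D614G through a change at the second codon position.</p>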
            </sec>
            <sec id="sec4">
                <title>Multiple sequence alignment of SARS-CoV-2 genomes</title>
                <p>Multiple sequence alignment (MSA) was performed using rapid calculation in 
                    <ext-link ext-link-type="uri" xlink:href="https://mafft.cbrc.jp/alignment/server/">MAFFT</ext-link> (MAFFT, version 7.467, RRID:SCR_011811), which supports the alignment of more than 20,000 sequences.
                    <sup>
                        <xref ref-type="bibr" rid="ref10">10</xref>
                    </sup> After all SARS-CoV-2 sequences were aligned to the reference genome, the multiple sequence alignment file was visualized in MEGA X, version 10.2.5 build 10210330 (
                    <ext-link ext-link-type="uri" xlink:href="http://megasoftware.net/">MEGA Software</ext-link>, RRID:SCR_000667).</p>
            </sec>
            <sec id="sec5">
                <title>Identification and frequency of nonsynonymous mutations in SARS-CoV-2 proteins</title>
                <p>The 11 coding sequences were extracted from the 30,229 strains according to their genomic positions in the NCBI reference strain (FASTA format), 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/nuccore/1798174254">NC_045512.2</ext-link>. Sequences containing base-calling errors, unresolved nucleotides (&#x201c;N&#x201d;), or undefinable gaps were omitted. The number and frequency of nonsynonymous mutations in these 30,229 strains were then identified using a Python script.</p>
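                <p>The original Python script is not reproduced in this article. As a minimal sketch of how such a step might work, the example below compares each aligned coding sequence with the reference codon by codon, skips codons containing unresolved or gap characters, and counts amino acid changes. All function and variable names are hypothetical and the sequences are toy data. Unlike the nucleotide-level counts reported in this study, the sketch reports one change per affected codon.</p>
                <code language="python">
```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def nonsynonymous_changes(ref_cds, sample_cds):
    """List amino acid changes (e.g. 'D614G') between two aligned coding sequences."""
    changes = []
    for start in range(0, len(ref_cds) - 2, 3):
        ref_codon = ref_cds[start:start + 3].upper()
        alt_codon = sample_cds[start:start + 3].upper()
        # Codons with N, gaps, or other ambiguity codes are not in the table; skip them.
        if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa != alt_aa:
            changes.append(f"{ref_aa}{start // 3 + 1}{alt_aa}")
    return changes


def mutation_frequencies(ref_cds, samples):
    """Return {change: (count, percent of samples)} for all nonsynonymous changes."""
    counts = Counter()
    for sample_cds in samples:
        counts.update(nonsynonymous_changes(ref_cds, sample_cds))
    return {change: (n, 100.0 * n / len(samples)) for change, n in counts.items()}


# Toy data, not real SARS-CoV-2 sequence: the reference encodes Met-Asp-Leu.
reference = "ATGGATCTT"
toy_samples = ["ATGGGTCTT", "ATGGATCTC", "ATGGNTCTT", "ATGGGTCTT"]
print(mutation_frequencies(reference, toy_samples))
```
                </code>
                <p>In the toy data, the second sample carries only a synonymous change (CTT to CTC, both leucine) and the third contains an unresolved base, so only the D2G change is counted, in two of the four samples.</p>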
            </sec>
            <sec id="sec6">
                <title>Co-mutation analysis of SARS-CoV-2 proteins</title>
                <p>The concurrence ratio of each nonsynonymous mutation in the SARS-CoV-2 genome was determined using the 
                    <ext-link ext-link-type="uri" xlink:href="https://wan-bioinfo.shinyapps.io/GESS/">GESS database</ext-link> (The Global Evaluation of SARS-CoV-2/hCoV-19 Sequences, RRID:SCR_021847)
                    <sup>
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup> derived from the GISAID web server. The concurrence ratios among the top 10 nonsynonymous mutations obtained from the concurrence search are listed in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. Each SNV included in the concurrence search had a frequency greater than 0.1%. The chord diagram of co-mutations among nonsynonymous mutations in the SARS-CoV-2 genome was generated using the 
                    <ext-link ext-link-type="uri" xlink:href="http://mkweb.bcgsc.ca/tableviewer/">Circos table viewer</ext-link> (Circos, RRID:SCR_011798).
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup>
                </p>
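                <p>The exact formula used by GESS for the concurrence ratio is not given in this section. The Python sketch below therefore rests on an explicit assumption: the ratio for a pair of mutations is taken as the number of genomes carrying both, divided by the number of genomes carrying the rarer of the two, expressed as a percentage. This definition was chosen only because it is symmetric, like the values in Table 2; it may differ from what GESS computes. The function name and the toy genomes are hypothetical.</p>
                <code language="python">
```python
from collections import Counter
from itertools import combinations


def concurrence_ratios(genomes):
    """Return {(mut_a, mut_b): percent} for every pair observed together.

    ASSUMPTION (not the documented GESS formula):
    ratio = genomes carrying both / genomes carrying the rarer mutation.
    """
    single = Counter()
    paired = Counter()
    for mutations in genomes:
        single.update(mutations)
        # Sort so that each pair is always stored in the same order.
        paired.update(combinations(sorted(mutations), 2))
    return {
        pair: 100.0 * both / min(single[pair[0]], single[pair[1]])
        for pair, both in paired.items()
    }


# Toy genomes as sets of nucleotide changes; labels follow Table 2, counts are invented.
toy_genomes = [
    {"A23403G", "C14408T"},
    {"A23403G", "C14408T"},
    {"A23403G", "C14408T", "G25563T"},
    {"A23403G"},
    {"G25563T"},
]
for pair, ratio in sorted(concurrence_ratios(toy_genomes).items()):
    print(pair, round(ratio, 1))
```
                </code>
                <p>A matrix of such pairwise percentages is the kind of input that a chord diagram tool can display, with one ribbon per pair of mutations.</p>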
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Concurrence ratio of top 10 nonsynonymous mutations in SARS-CoV-2 proteins.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="2" rowspan="1" valign="top">Coding region and amino acid change</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">ORF1a nsp5 P108S</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">ORF1b nsp12 P323L</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">ORF1b nsp12 A423V</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">S protein N501Y</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">S protein D614G</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">ORF3a Q57H</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">N protein P151L</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">N protein R203K</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">N protein R203K</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">N protein G204R</th>
                            </tr>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Coding region and amino acid change</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Nucleotide change</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">C10376T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">C14408T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">C14708T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">A23063T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">A23403G</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">G25563T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">C28725T</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">G28881A</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">G28882A</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">G28883C</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>ORF1a nsp5 P108S</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>C10376T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">97.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.7</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>ORF1b nsp12 P323L</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>C14408T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>ORF1b nsp12 A423V</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>C14708T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>S protein N501Y</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>A23063T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">25.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">67.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">65.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">66.7</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>S protein D614G</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>A23403G</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>ORF3a Q57H</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>G25563T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">25.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>N protein P151L</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>C28725T</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">97.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>N protein R203K</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>G28881A</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">67.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>N protein R203K</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>G28882A</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">65.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>N protein G204R</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>G28883C</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">91.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">66.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">98.1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99.9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec id="sec7">
                <title>Prediction of mutation effect on protein stability and flexibility</title>
                <p>To predict the effects of the mutations on protein stability and flexibility, the protein structures were analyzed with the 
                    <ext-link ext-link-type="uri" xlink:href="http://biosig.unimelb.edu.au/dynamut/">DynaMut</ext-link> server (DynaMut, RRID:SCR_021849).
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> The free energy change between the wild-type and mutant protein structures (&#x0394;&#x0394;G) indicates the effect on protein stability: &#x0394;&#x0394;G values above zero indicate stabilization, whereas values below zero indicate destabilization. The difference in entropic energy between the wild-type and mutant structures (&#x0394;&#x0394;S
                    <sub>Vib</sub> ENCoM) indicates the effect on protein flexibility: &#x0394;&#x0394;S
                    <sub>Vib</sub> ENCoM values above zero indicate an increase in flexibility, whereas values below zero indicate a decrease in flexibility.</p>
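                <p>The sign conventions described above can be sketched as a pair of small Python functions. This is only an illustration of the stated rules, not part of DynaMut: the function names are hypothetical, and the label returned for a value of exactly zero is an assumption, because the text does not define that case.</p>
                <code language="python"><![CDATA[
```python
def classify_stability(ddg: float) -> str:
    """Map a ddG value (kcal/mol) to the stability label used in the text."""
    if ddg > 0:
        return "Stabilizing"
    if ddg < 0:
        return "Destabilizing"
    # Assumption: the text does not say how a value of exactly zero is labelled.
    return "No change"


def classify_flexibility(dds_vib: float) -> str:
    """Map a ddS_Vib ENCoM value to the flexibility label used in the text."""
    if dds_vib > 0:
        return "Increase"
    if dds_vib < 0:
        return "Decrease"
    # Assumption: a value of exactly zero is not covered by the text.
    return "No change"


if __name__ == "__main__":
    # Illustrative inputs only; these are not DynaMut outputs.
    for ddg, dds_vib in [(0.5, -0.2), (-0.3, 0.4)]:
        print(ddg, classify_stability(ddg), dds_vib, classify_flexibility(dds_vib))
```
]]></code>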
            </sec>
            <sec id="sec8">
                <title>Prediction of mutation effect on protein function</title>
                <p>
                    <ext-link ext-link-type="uri" xlink:href="https://sift.bii.a-star.edu.sg/sift4g/AboutSIFT4G.html">SIFT 4G</ext-link> (Sorting Tolerant From Intolerant For Genomes, RRID:SCR_021850)
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> and 
                    <ext-link ext-link-type="uri" xlink:href="http://provean.jcvi.org/index.php">PROVEAN</ext-link> (Protein Variation Effect Analyzer, RRID:SCR_002182)
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> were used to predict the deleteriousness of the nonsynonymous single nucleotide polymorphisms (nsSNPs) in SARS-CoV-2 proteins. SIFT 4G predicts the effect of a mutation from sequence conservation and amino acid properties. For the SIFT 4G analysis, gene annotation files (GTF), FASTA files containing the SARS-CoV-2 genome sequences, and a variant call format (VCF) file comprising all the SARS-CoV-2 SNPs were obtained. A SARS-CoV-2 genome database was then built with the SIFT 4G algorithm. Lastly, the SIFT 4G annotator was used to annotate the VCF file against this SARS-CoV-2 genome database. Mutations with a SIFT 4G score of less than 0.05 were considered deleterious. PROVEAN predicts the effect of a mutation from an alignment-based score. For the PROVEAN analysis, the amino acid sequence and the amino acid variation were submitted to the PROVEAN server to obtain the prediction. Mutations with a PROVEAN score of less than &#x2212;2.5 were considered deleterious.</p>
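                <p>The two cut-offs stated above can be summarized in a short Python sketch. It applies only the thresholds given in the text (strictly less than 0.05 for SIFT 4G, strictly less than &#x2212;2.5 for PROVEAN) and does not call either tool. The function names and the "not available" label for a missing SIFT 4G score are hypothetical choices made for illustration.</p>
                <code language="python"><![CDATA[
```python
from typing import Optional

SIFT_CUTOFF = 0.05      # scores below this are considered deleterious (from the text)
PROVEAN_CUTOFF = -2.5   # scores below this are considered deleterious (from the text)


def classify_sift(score: Optional[float]) -> str:
    """Label a SIFT 4G score; None stands for a score that could not be obtained."""
    if score is None:
        return "not available"  # hypothetical label for missing scores
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


def classify_provean(score: float) -> str:
    """Label a PROVEAN score using the cut-off stated in the text."""
    return "deleterious" if score < PROVEAN_CUTOFF else "neutral"


if __name__ == "__main__":
    # Illustrative inputs only; these are not outputs of either tool.
    print(classify_sift(0.01), classify_sift(0.40), classify_sift(None))
    print(classify_provean(-3.0), classify_provean(-1.0))
```
]]></code>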
            </sec>
        </sec>
        <sec id="sec9" sec-type="results">
            <title>Results</title>
            <sec id="sec10">
                <title>The statistics of nonsynonymous mutations in SARS-CoV-2 proteins</title>
                <p>From the multiple alignment analysis, we identified 231 nonsynonymous mutations from 30,229 SARS-CoV-2 genome sequences. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows the number of nonsynonymous mutations found in each of the 11 coding sequences of SARS-CoV-2 proteins. ORF1a has the highest number of nonsynonymous mutations, followed by the S protein and the N protein. The top 10 nonsynonymous mutations, which affect nine amino acid residues (ORF1a nsp5 P108S; ORF1b nsp12 P323L and A423V; S protein N501Y and D614G; ORF3a Q57H; and N protein P151L, R203K and G204R), are shown in 
                    <xref ref-type="table" rid="T3">Table 3</xref>.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>The numbers of nonsynonymous mutations in 11 coding sequences of SARS-CoV-2 proteins.</title>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76515/63ee62ed-ec33-4f44-b277-5da112c3fec0_figure1.gif"/>
                </fig>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>Table 3. </label>
                    <caption>
                        <title>Top 10 nonsynonymous mutations of SARS-CoV-2 proteins.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Protein</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Nucleotide changes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Amino acid changes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Frequency</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF1a nsp5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C10376T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P108S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4024</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">ORF1b nsp12</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C14408T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P323L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">27953</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">C14708T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">A423V</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3988</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">A23063T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N501Y</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4218</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">A23403G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">D614G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">28022</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF3a</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">G25563T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Q57H</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5274</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="4" valign="top">N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">C28725T</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P151L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4007</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28881A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">18116</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28882A</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">18092</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G28883C</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">G204R</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">18090</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
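                <p>To relate the frequencies in Table 3 to the size of the data set, each count can be expressed as a percentage of the 30,229 genome sequences. The Python sketch below does this arithmetic under the assumption that the Frequency column counts sequences carrying the mutation; the function and variable names are hypothetical. Under that assumption, only ORF1b nsp12 P323L and S protein D614G exceed 90% (about 92.5% and 92.7%, respectively).</p>
                <code language="python"><![CDATA[
```python
TOTAL_GENOMES = 30229  # number of SARS-CoV-2 genome sequences analysed (from the text)

# Counts copied from Table 3; keys combine protein, amino acid and nucleotide change.
TABLE3_COUNTS = {
    "ORF1a nsp5 P108S (C10376T)": 4024,
    "ORF1b nsp12 P323L (C14408T)": 27953,
    "ORF1b nsp12 A423V (C14708T)": 3988,
    "S N501Y (A23063T)": 4218,
    "S D614G (A23403G)": 28022,
    "ORF3a Q57H (G25563T)": 5274,
    "N P151L (C28725T)": 4007,
    "N R203K (G28881A)": 18116,
    "N R203K (G28882A)": 18092,
    "N G204R (G28883C)": 18090,
}


def prevalence_percent(count: int, total: int = TOTAL_GENOMES) -> float:
    """Return count as a percentage of total (assumes count is a number of sequences)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


if __name__ == "__main__":
    ranked = sorted(TABLE3_COUNTS.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        print(f"{name}: {prevalence_percent(count):.1f}%")
```
]]></code>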
            </sec>
            <sec id="sec11">
                <title>Co-mutation analysis of SARS-CoV-2 proteins</title>
                <p>Some nonsynonymous mutations may be random and have little or no biological impact on viral transmission and pathogenesis. If a nonsynonymous mutation co-occurs with other mutations, the mutations may evolve together and interact functionally. To study co-mutation among the nonsynonymous mutations, the concurrence ratios of co-mutations among the top 10 nonsynonymous mutations were retrieved from the GESS database website, as shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. The co-mutations among the top 10 nonsynonymous mutations, visualized with Circos table view, are shown in 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>. In this chord diagram, each ribbon connecting a row segment and a column segment represents a co-mutation, and its thickness represents the concurrence ratio between the two mutations. Each circularly arranged segment has a single colour that identifies one mutation, whereas the rainbow colours represent the co-mutations of each mutation. The size of each segment is proportional to the total concurrence ratio in the corresponding row or column. ORF3a Q57H (G25563T) has the smallest segment, meaning that its row or column has the lowest total concurrence ratio. A high concurrence ratio, shown as a thicker ribbon, indicates frequent co-mutation between two mutations; for example, the concurrence ratios between S protein D614G (A23403G) and all nine other nonsynonymous mutations were greater than 99%. Conversely, a low concurrence ratio, shown as a thinner ribbon, indicates infrequent co-mutation; for example, ORF3a Q57H (G25563T) had the lowest concurrence ratios overall and had a high concurrence ratio only with S protein D614G (A23403G) and ORF1b nsp12 P323L (C14408T), the top two nonsynonymous mutations, each of which was present in more than 90% of the reported sequences.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Visualization of co-mutation in top 10 nonsynonymous mutations in SARS-CoV-2 proteins.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76515/63ee62ed-ec33-4f44-b277-5da112c3fec0_figure2.gif"/>
                </fig>
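                <p>The link between segment size and row totals can be checked with a short Python sketch that sums two rows of Table 2, those for S protein D614G (A23403G) and ORF3a Q57H (G25563T). The values are copied from Table 2; the helper names are hypothetical, and the sketch assumes that the columns follow the same order as the rows, which the zero on each row's own position suggests. It does not reproduce how Circos draws the diagram.</p>
                <code language="python"><![CDATA[
```python
# Two rows of Table 2 (concurrence ratios, %), copied as printed.
# Assumption: columns are ordered like the rows, so index 1 is ORF1b nsp12 P323L
# and index 4 is S protein D614G.
ROWS = {
    "S protein D614G (A23403G)": [99.9, 99.8, 99.9, 99.2, 0, 99.9, 100, 99.6, 99.7, 99.7],
    "ORF3a Q57H (G25563T)": [4.6, 99.7, 0.8, 25.7, 99.9, 0, 1.1, 0.4, 0.3, 0.2],
}


def row_total(values):
    """Sum of the concurrence ratios in one row, rounded to one decimal place."""
    return round(sum(values), 1)


def partners_above(values, threshold):
    """Zero-based column indices whose concurrence ratio exceeds the threshold."""
    return [index for index, value in enumerate(values) if value > threshold]


if __name__ == "__main__":
    for name, values in ROWS.items():
        print(name, row_total(values), partners_above(values, 99))
```
]]></code>
                <p>For these two rows, the D614G total is several times larger than the Q57H total, and Q57H exceeds 99% only in the columns assumed to correspond to P323L and D614G.</p>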
            </sec>
            <sec id="sec12">
                <title>Prediction of mutation effect on protein stability and flexibility</title>
                <p>
                    <xref ref-type="table" rid="T4">Table 4</xref> summarizes the predicted effects of the mutations on protein stability and flexibility obtained from DynaMut. Only two mutations, namely ORF1a nsp5 P108S and S protein D614G, were predicted to be destabilizing, with &#x0394;&#x0394;G values of &#x2212;0.288 and &#x2212;0.072, respectively. For protein flexibility, only S protein D614G was predicted to increase flexibility, with a &#x0394;&#x0394;S
                    <sub>Vib</sub> ENCoM value of 0.523.</p>
                <table-wrap id="T4" orientation="portrait" position="float">
                    <label>Table 4. </label>
                    <caption>
                        <title>Prediction of nonsynonymous mutation effects on SARS-CoV-2 protein stability and flexibility.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Protein</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Mutation</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">&#x0394;&#x0394;G (kcal/mol)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Prediction outcome</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">&#x0394;&#x0394;S
                                    <sub>Vib</sub> ENCoM (kcal.mol
                                    <sup>&#x2212;1</sup>.K
                                    <sup>&#x2212;1</sup>)</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Molecule flexibility</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF1a nsp5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P108S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.288</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Destabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.208</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">ORF1b nsp12</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P323L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.784</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.432</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">A423V</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.776</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.348</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N501Y</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.013</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.088</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">D614G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.072</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Destabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.523</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Increase</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF3a</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Q57H</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.275</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.160</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P151L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.111</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.325</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.749</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.107</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G204R</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.064</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stabilizing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;2.522</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Decrease</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec id="sec13">
                <title>Prediction of mutation effect on protein function</title>
                <p>The prediction results of nonsynonymous mutations in the SARS-CoV-2 proteins using SIFT 4G and PROVEAN are shown in 
                    <xref ref-type="table" rid="T5">Table 5</xref>. SIFT 4G predicted the P108S mutation in ORF1a nsp5 to be deleterious (score 0.00), whereas four mutations, namely S protein D614G, ORF3a Q57H, and N protein R203K and G204R, were predicted to be tolerated (score &gt; 0.05). SIFT 4G results for ORF1b nsp12 P323L and A423V, S protein N501Y and N protein P151L could not be obtained because of missing data in the Ensembl database. PROVEAN predicted three nonsynonymous mutations, namely ORF1a nsp5 P108S, ORF3a Q57H and N protein P151L, to be deleterious (score &lt; &#x2212;2.5). The remaining six nonsynonymous mutations, namely ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, and N protein R203K and G204R, were predicted to be neutral (score &gt; &#x2212;2.5).</p>
                <table-wrap id="T5" orientation="portrait" position="float">
                    <label>Table 5. </label>
                    <caption>
                        <title>Prediction of nonsynonymous mutation effects on SARS-CoV-2 protein function.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Protein</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Mutation</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">SIFT 4G</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">PROVEAN</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF1a nsp5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P108S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.00 (deleterious)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;3.71 (deleterious)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">ORF1b nsp12</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P323L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.91 (neutral)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">A423V</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.21 (neutral)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="2" valign="top">S</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N501Y</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;0.09 (neutral)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">D614G</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.00 (tolerated)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.60 (neutral)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">ORF3a</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Q57H</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.61 (tolerated)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;3.29 (deleterious)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="3" valign="top">N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P151L</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;4.93 (deleterious)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">R203K</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.11 (tolerated)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;1.60 (neutral)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">G204R</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.08 (tolerated)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">&#x2212;1.66 (neutral)</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
        </sec>
        <sec id="sec14" sec-type="discussion">
            <title>Discussion</title>
            <p>The top 10 nonsynonymous mutations of SARS-CoV-2 identified from 30,229 genome sequences were further analyzed by co-mutation analysis, prediction of protein structural stability and flexibility, and prediction of effects on protein function. To determine whether two nonsynonymous mutations co-occur, the concurrence ratio was calculated. Many nonsynonymous mutations showed a high concurrence ratio, suggesting that these mutations may evolve together and interact functionally. The top two nonsynonymous mutations, S protein D614G and ORF1b nsp12 (also known as RNA-dependent RNA polymerase) P323L, showed very high concurrence ratios with other mutations, since they emerged in the early phase of the pandemic. It has previously been shown that S protein D614G co-evolved with ORF1b nsp12 P323L.
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup> The combination of both mutations may enhance viral fitness based on epidemiological data, although the molecular mechanisms of this evolutionary advantage remain elusive.
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup> In another study, it has been predicted that multiple SARS-CoV-2 genes may have epistatic interactions linked to viral fitness.
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> The effects of a mutation can be neutral, harmful, or beneficial to the virus. Most single mutations are expected to have only a small effect on viral fitness. Associating a specific phenotype with a single viral mutation remains difficult, since a phenotype may result from the combined effects of multiple mutations.</p>
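            <p>As an illustration of the co-mutation analysis discussed above, the following Python sketch computes a pairwise concurrence ratio. The definition used here (the number of genomes carrying both mutations divided by the number carrying the first mutation) is an assumption made for illustration and may differ from the formula applied in this study. The function name and the toy genome records are hypothetical.</p>
            <preformat preformat-type="code"><![CDATA[
```python
from itertools import permutations


def concurrence_ratio(genomes, first, second):
    """Fraction of genomes carrying `first` that also carry `second`.

    Assumed definition, for illustration only.
    Returns 0.0 when no genome carries `first`.
    """
    with_first = [g for g in genomes if first in g]
    if not with_first:
        return 0.0
    both = sum(1 for g in with_first if second in g)
    return both / len(with_first)


# Hypothetical toy data: each genome is the set of mutations it carries.
toy_genomes = [
    {"S:D614G", "nsp12:P323L"},
    {"S:D614G", "nsp12:P323L", "N:R203K", "N:G204R"},
    {"S:D614G"},
    {"ORF3a:Q57H", "S:D614G", "nsp12:P323L"},
]

# Ratio for every ordered pair of mutations seen in the toy data.
mutations = sorted(set().union(*toy_genomes))
ratios = {
    (a, b): concurrence_ratio(toy_genomes, a, b)
    for a, b in permutations(mutations, 2)
}
```
]]></preformat>
            <p>Under this assumed definition the ratio is asymmetric: in the toy data every genome with nsp12 P323L also carries S D614G, whereas the reverse does not hold. A symmetric measure could be used instead if the direction of co-occurrence is not of interest.</p>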
            <p>The SARS-CoV-2 genome contains a large number of single nucleotide polymorphisms (SNPs), so evaluating the biological functions of all SNPs experimentally is not feasible. Predicting the effects of SNPs therefore allows us to prioritize variants that are likely to have significant biological effects. Our study used a meta-prediction approach for the functional prediction of nonsynonymous mutations in order to minimize the false positive rate. When two or three tools are combined, prediction accuracy increases; however, sensitivity decreases as more tools are combined.
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup>
            </p>
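            <p>The trade-off described above can be sketched in Python by combining the two functional predictors into a consensus call. The cutoffs are assumptions based on commonly used defaults (SIFT 4G score below 0.05, PROVEAN score at or below &#x2212;2.5) and are consistent with the labels in the results table; the function name and the wording of the output categories are hypothetical. The scores are transcribed from the SIFT 4G and PROVEAN results table, with a missing SIFT 4G score represented as None.</p>
            <preformat preformat-type="code"><![CDATA[
```python
SIFT_CUTOFF = 0.05      # assumed default: scores below this are deleterious
PROVEAN_CUTOFF = -2.5   # assumed default: scores at or below this are deleterious


def consensus_call(sift, provean):
    """Combine two tool scores; None means the tool returned no score."""
    calls = []
    if sift is not None:
        calls.append(sift < SIFT_CUTOFF)
    if provean is not None:
        calls.append(provean <= PROVEAN_CUTOFF)
    if not calls:
        return "no prediction"
    if len(calls) == 2 and all(calls):
        return "deleterious (both tools)"
    if any(calls):
        return "deleterious (one tool)"
    return "neutral"


# (SIFT 4G score, PROVEAN score) transcribed from the results table.
scores = {
    "nsp5 P108S": (0.00, -3.71),
    "nsp12 P323L": (None, -0.91),
    "nsp12 A423V": (None, 1.21),
    "S N501Y": (None, -0.09),
    "S D614G": (1.00, 0.60),
    "ORF3a Q57H": (0.61, -3.29),
    "N P151L": (None, -4.93),
    "N R203K": (0.11, -1.60),
    "N G204R": (0.08, -1.66),
}

results = {name: consensus_call(*pair) for name, pair in scores.items()}
```
]]></preformat>
            <p>Requiring agreement between both tools keeps only the variants with the strongest support, which lowers the false positive rate, but variants flagged by a single tool are then missed, which lowers sensitivity.</p>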
            <p>Of the nine protein mutations analyzed, only two, ORF1a nsp5 P108S and S protein D614G, were predicted to reduce protein stability, and only S protein D614G was predicted to have a more flexible conformation than the wild type. The S protein binds to the human ACE2 receptor to gain access to the host cell.
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> The D614G mutation is located in the S1 domain, which is involved in receptor binding.
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup> Two independent cryo-electron microscopy studies of S protein D614G mutant structures have demonstrated that the G614 mutant adopts a more open conformation than the D614 wild type.
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup> Interestingly, an 
                <italic toggle="yes">in vitro</italic> study has shown that the S protein D614G mutation may enhance virus infectivity by promoting the packing of S protein into the virion rather than by enhancing the binding of S protein to the ACE2 receptor.
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> ORF1a nsp5, also known as 3C-like protease, is responsible for cleaving viral polypeptides during replication.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> Abe et al. (2021) showed that the ORF1a nsp5 P108S mutation diminished protease activity, possibly leading to a reduction in disease severity.
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup>
            </p>
            <p>Since protein function depends directly on the three-dimensional structure of the protein, we examined whether these mutations may affect protein function using the SIFT 4G and PROVEAN prediction tools. PROVEAN is applicable to all organisms. SIFT 4G was used instead of SIFT because it allows a SARS-CoV-2 genome database with variant annotation to be built. Interestingly, ORF1a nsp5 P108S was the only mutation predicted to be deleterious by both SIFT 4G and PROVEAN. Together with the DynaMut stability result, this suggests that the mutation may be harmful to the virus itself and less damaging to the human host, as reported by Abe et al. (2021).
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> In contrast, the ORF3a Q57H and N protein P151L mutations were predicted to be deleterious by PROVEAN only. ORF3a is an ion channel (viroporin) involved in viral egress through lysosomal trafficking.
                <sup>
                    <xref ref-type="bibr" rid="ref25">25</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup> The ORF3a Q57H mutation not only changes an amino acid in ORF3a but also produces a truncated ORF3b, owing to the overlapping protein-coding sequences shared by ORF3a and ORF3b.
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup> However, there are conflicting results about the effect of the ORF3a Q57H mutation on the human host immune response.
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup> The N protein is involved in liquid-liquid phase separation for viral genome packaging.
                <sup>
                    <xref ref-type="bibr" rid="ref29">29</xref>
                </sup> The N protein P151L mutation is located in the RNA-binding domain. It has been proposed that this mutation may disrupt protein-drug interactions.
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup> Although two other N protein mutations, R203K and G204R, were not predicted to be deleterious in our study, they have been identified in the alpha (B.1.1.7), gamma (P.1), lambda (C.37) and omicron (BA.1/B.1.1.529) variants.
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup> The N protein T205I mutation has been reported in the beta (B.1.351) and mu (B.1.621) variants.
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup> More recently, another N protein mutation, R203M, has been reported in the delta variant (B.1.617.2).
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup> Interestingly, mutants with the N protein S202R or R203M mutation can package more RNA than the wild type based on 
                <italic toggle="yes">in vitro</italic> studies.
                <sup>
                    <xref ref-type="bibr" rid="ref32">32</xref>
                </sup> These observations and experimental results suggest that N protein residues S202, R203, G204 and T205 may play a role in viral RNA replication.</p>
        </sec>
        <sec id="sec15" sec-type="conclusion">
            <title>Conclusion</title>
            <p>In this study, the ORF1a nsp5 P108S, S protein D614G, ORF3a Q57H and N protein P151L mutations were predicted to alter protein structure and/or function. Since all reported variants of concern contain multiple mutations across multiple SARS-CoV-2 proteins, it is necessary to evaluate the combined impact of these mutations on viral transmission and pathogenicity. The biological consequences of these nonsynonymous mutations of SARS-CoV-2 proteins should be further validated with 
                <italic toggle="yes">in vivo</italic> and 
                <italic toggle="yes">in vitro</italic> experimental studies in the future.</p>
        </sec>
        <sec id="sec16">
            <title>Ethics and dissemination</title>
            <p>No ethical approval is required for data analysis in this study (EA0802021).</p>
        </sec>
        <sec id="sec17">
            <title>Author contribution</title>
            <p>CHN contributed to the concept, design, and supervision of the project. SBZ, WXB, YYY and SK contributed to the design, methodology, and data collection, and to the analysis and interpretation of data.</p>
            <p>All authors were involved in drafting and revising the manuscript and approved the final version.</p>
        </sec>
        <sec id="sec18">
            <title>Data and software availability</title>
            <p>SARS-CoV-2 virus genome sequence data were downloaded from the 
                <ext-link ext-link-type="uri" xlink:href="https://www.gisaid.org/">GISAID Database</ext-link>. The additional multiple sequence alignment data can be obtained from Figshare.</p>
            <p>Figshare: MSA (SARS-CoV-2). 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.16681900.v4">https://doi.org/10.6084/m9.figshare.16681900.v4</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref33">33</xref>
                </sup>
            </p>
            <p>This project contains the following underlying data.
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>MSA_0 (31-12-2019 to 31-05-2020).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 31-12-2019 and 31-05-2020.</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>MSA_1 (01-06-2020 to 15-10-2020).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 01-06-2020 and 15-10-2020.</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>MSA_2 (16-10-2020 to 31-01-2021).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 16-10-2020 and 31-01-2021.</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>MSA_3 (01-02-2021 to 22-03-2021).fasta file contains multiple sequence alignment data of SARS-CoV-2 genome sequences ranging between 01-02-2021 and 22-03-2021.</p>
                    </list-item>
                </list>
            </p>
            <p>Data are available under the terms of the 
                <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            <p>The Python script used for the identification of SARS-CoV-2 genome mutations can be obtained through GitHub (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/wxboon98/Mutations-Identification">https://github.com/wxboon98/Mutations-Identification</ext-link>).</p>
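            <p>The repository script is not reproduced here. As an independent, minimal sketch of how substitutions might be read from a multiple sequence alignment in FASTA format, the following Python example compares each aligned sequence with a reference column by column. All function names and the toy alignment are hypothetical, and treating the first record as the reference is an assumption.</p>
            <preformat preformat-type="code"><![CDATA[
```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def substitutions(reference, query):
    """List substitutions such as 'C5T', numbered by ungapped reference position.

    Gaps and 'N' in the query are skipped; other ambiguity codes are
    reported as substitutions in this simplified sketch.
    """
    found, ref_pos = [], 0
    for ref_base, query_base in zip(reference, query):
        if ref_base == "-":
            continue  # insertion relative to the reference: no coordinate
        ref_pos += 1
        if query_base in "-N" or query_base == ref_base:
            continue
        found.append(f"{ref_base}{ref_pos}{query_base}")
    return found


# Hypothetical toy alignment; the first record is assumed to be the reference.
toy_alignment = """>reference
ATG-CCA
>sample1
ATGTCTA
>sample2
GTN-CCA
"""

records = read_fasta(toy_alignment)
ref_seq = records[0][1]
calls = {name: substitutions(ref_seq, seq) for name, seq in records[1:]}
```
]]></preformat>
            <p>This sketch reports nucleotide substitutions only. Deciding whether a substitution is synonymous or nonsynonymous requires codon-level translation, which is outside its scope.</p>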
            <p>The same set of SARS-CoV-2 genomic data was also used to perform multiple sequence alignment analysis to identify the SARS-CoV-2 genomic mutations for another paper, titled &#x201c;Prediction of the Effects of Synonymous Variants on SARS-CoV-2 Genome&#x201d;.
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup> MEGA-X software was used to determine whether each mutation was synonymous or nonsynonymous before the subsequent prediction and other analyses.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgments</title>
            <p>This research was supported by Multimedia University, Malaysia, IRFund 2.0 (grant number MMUI/210119 awarded to Chong Han, Ng). The funder had no role in study design, data analysis, decision to publish or manuscript preparation.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>Feb. 2020</year>;<volume>395</volume>(<issue>10223</issue>):<fpage>497</fpage>&#x2013;<lpage>506</lpage>.
                    <pub-id pub-id-type="pmid">31986264</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(20)30183-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mousavizadeh</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ghasemi</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Genotype and phenotype of COVID-19: Their roles in pathogenesis.</article-title>
                    <source>

                        <italic toggle="yes">J. Microbiol. Immunol. Infect.</italic>
</source>
                    <year>2020</year>;<volume>54</volume>(<issue>2</issue>):<fpage>159</fpage>&#x2013;<lpage>163</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jmii.2020.03.022</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Harvey</surname>
                            <given-names>WT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SARS-CoV-2 variants, spike mutations and immune escape.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Rev. Microbiol.</italic>
</source>
                    <year>2021</year>;<volume>19</volume>(<issue>7</issue>):<fpage>409</fpage>&#x2013;<lpage>424</lpage>.
                    <pub-id pub-id-type="pmid">34075212</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41579-021-00573-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>J-S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jang</surname>
                            <given-names>J-H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>J-M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Genome-Wide Identification and Characterization of Point Mutations in the SARS-CoV-2 Genome.</article-title>
                    <source>

                        <italic toggle="yes">Osong Public Health Res. Perspect.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>(<issue>3</issue>):<fpage>101</fpage>&#x2013;<lpage>111</lpage>.
                    <pub-id pub-id-type="pmid">32528815</pub-id>
                    <pub-id pub-id-type="doi">10.24171/j.phrp.2020.11.3.05</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yuan</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global SNP analysis of 11,183 SARS-CoV-2 strains reveals high genetic diversity.</article-title>
                    <source>

                        <italic toggle="yes">Transbound. Emerg. Dis.</italic>
</source>
                    <year>Nov. 2021</year>;<volume>68</volume>(<issue>6</issue>):<fpage>3288</fpage>&#x2013;<lpage>3304</lpage>.
                    <pub-id pub-id-type="pmid">33207070</pub-id>
                    <pub-id pub-id-type="doi">10.1111/tbed.13931</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Das</surname>
                            <given-names>JK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Roy</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>A study on non-synonymous mutational patterns in structural proteins of SARS-CoV-2.</article-title>
                    <source>

                        <italic toggle="yes">Genome.</italic>
</source>
                    <year>2021</year>;<volume>64</volume>(<issue>7</issue>):<fpage>665</fpage>&#x2013;<lpage>678</lpage>.
                    <pub-id pub-id-type="pmid">33788636</pub-id>
                    <pub-id pub-id-type="doi">10.1139/gen-2020-0157</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="other">
                    <collab>GISAID Initiative</collab>:
[Accessed: 23-Sep-2021].
                    <ext-link ext-link-type="uri" xlink:href="https://www.epicov.org/epi3/frontend#51c08f">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="other">
                    <collab>RCSB PDB: Homepage</collab>:
[Accessed: 01-Dec-2021].
                    <ext-link ext-link-type="uri" xlink:href="https://www.rcsb.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="other">
                    <collab>Modeling of the SARS-COV-2 Genome using D-I-TASSER</collab>:
[Accessed: 01-Dec-2021].
                    <ext-link ext-link-type="uri" xlink:href="https://zhanggroup.org/COVID-19/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="other">
                    <collab>MAFFT - a multiple sequence alignment program</collab>:
[Accessed: 23-Sep-2021].
                    <ext-link ext-link-type="uri" xlink:href="https://mafft.cbrc.jp/alignment/software/closelyrelatedviralgenomes.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fang</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>GESS: a database of global evaluation of SARS-CoV-2/hCoV-19 sequences.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>Jan. 2021</year>;<volume>49</volume>(<issue>D1</issue>):<fpage>D706</fpage>&#x2013;<lpage>D714</lpage>.
                    <pub-id pub-id-type="pmid">33045727</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkaa808</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Krzywinski</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Circos: An information aesthetic for comparative genomics.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>Sep. 2009</year>;<volume>19</volume>(<issue>9</issue>):<fpage>1639</fpage>&#x2013;<lpage>1645</lpage>.
                    <pub-id pub-id-type="pmid">19541911</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.092759.109</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rodrigues</surname>
                            <given-names>CHM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pires</surname>
                            <given-names>DEV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ascher</surname>
                            <given-names>DB</given-names>
                        </name>
</person-group>:
                    <article-title>DynaMut: Predicting the impact of mutations on protein conformation, flexibility and stability.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>Jul. 2018</year>;<volume>46</volume>(<issue>W1</issue>):<fpage>W350</fpage>&#x2013;<lpage>W355</lpage>.
                    <pub-id pub-id-type="pmid">29718330</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky300</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaser</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Adusumalli</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Leng</surname>
                            <given-names>SN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SIFT missense predictions for genomes.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Protoc.</italic>
</source>
                    <year>Dec. 2015</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="pmid">26633127</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nprot.2015.123</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Choi</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chan</surname>
                            <given-names>AP</given-names>
                        </name>
</person-group>:
                    <article-title>PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>16</issue>):<fpage>2745</fpage>&#x2013;<lpage>2747</lpage>.
                    <pub-id pub-id-type="pmid">25851949</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv195</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ilmj&#x00e4;rv</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Concurrent mutations in RNA-dependent RNA polymerase and spike protein emerged as the epidemiologically most successful SARS-CoV-2 variant.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>Jul. 2021</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41598-021-91662-w</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zeng</surname>
                            <given-names>H-L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dichio</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Horta</surname>
                            <given-names>ER</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci.</italic>
</source>
                    <year>Dec. 2020</year>;<volume>117</volume>(<issue>49</issue>):<fpage>31519</fpage>&#x2013;<lpage>31526</lpage>.
                    <pub-id pub-id-type="pmid">33203681</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.2012331117</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>New insights into the pathogenicity of non-synonymous variants through multi-level analysis.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>Feb. 2019</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Walls</surname>
                            <given-names>AC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Park</surname>
                            <given-names>YJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tortorici</surname>
                            <given-names>MA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2020</year>;<volume>181</volume>(<issue>2</issue>):<fpage>281</fpage>&#x2013;<lpage>292.e6</lpage>.
                    <pub-id pub-id-type="pmid">32155444</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2020.02.058</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Korber</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>Aug. 2020</year>;<volume>182</volume>(<issue>4</issue>):<fpage>812</fpage>&#x2013;<lpage>827.e19</lpage>.
                    <pub-id pub-id-type="pmid">32697968</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2020.06.043</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yurkovetskiy</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>Oct. 2020</year>;<volume>183</volume>(<issue>3</issue>):<fpage>739</fpage>&#x2013;<lpage>751.e8</lpage>.
                    <pub-id pub-id-type="pmid">32991842</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2020.09.032</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Benton</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The effect of the D614G substitution on the structure of the spike glycoprotein of SARS-CoV-2.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci.</italic>
</source>
                    <year>Mar. 2021</year>;<volume>118</volume>(<issue>9</issue>):<fpage>e2022586118</fpage>.
                    <pub-id pub-id-type="pmid">33579792</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.2022586118</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>Nov. 2020</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41467-020-19808-4</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Abe</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Pro108Ser mutant of SARS-CoV-2 3CLpro reduces the enzymatic activity and ameliorates COVID-19 severity in Japan.</article-title>
                    <source>

                        <italic toggle="yes">medRxiv.</italic>
</source>
                    <year>Feb. 2021</year>; p.<fpage>2020.11.24.20235952</fpage>.</mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Miao</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ORF3a of the COVID-19 virus SARS-CoV-2 blocks HOPS complex-mediated assembly of the SNARE complex required for autolysosome formation.</article-title>
                    <source>

                        <italic toggle="yes">Dev. Cell.</italic>
</source>
                    <year>Feb. 2021</year>;<volume>56</volume>(<issue>4</issue>):<fpage>427</fpage>&#x2013;<lpage>442.e5</lpage>.
                    <pub-id pub-id-type="pmid">33422265</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.devcel.2020.12.010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>&#x03b2;-Coronaviruses Use Lysosomes for Egress Instead of the Biosynthetic Secretory Pathway.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>Dec. 2020</year>;<volume>183</volume>(<issue>6</issue>):<fpage>1520</fpage>&#x2013;<lpage>1535.e14</lpage>.
                    <pub-id pub-id-type="pmid">33157038</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2020.10.039</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lam</surname>
                            <given-names>JY</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Loss of orf3b in the circulating SARS-CoV-2 strains.</article-title>
                    <source>

                        <italic toggle="yes">Emerg. Microbes Infect.</italic>
</source>
                    <year>2020</year>;<volume>9</volume>(<issue>1</issue>):<fpage>2685</fpage>&#x2013;<lpage>2696</lpage>.
                    <pub-id pub-id-type="pmid">33205709</pub-id>
                    <pub-id pub-id-type="doi">10.1080/22221751.2020.1852892</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chu</surname>
                            <given-names>DKW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Introduction of ORF3a-Q57H SARS-CoV-2 Variant Causing Fourth Epidemic Wave of COVID-19, Hong Kong, China.</article-title>
                    <source>

                        <italic toggle="yes">Emerg. Infect. Dis.</italic>
</source>
                    <year>May 2021</year>;<volume>27</volume>(<issue>5</issue>):<fpage>1492</fpage>&#x2013;<lpage>1495</lpage>.
                    <pub-id pub-id-type="pmid">33900193</pub-id>
                    <pub-id pub-id-type="doi">10.3201/eid2705.210015</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Savastano</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ib&#x00e1;&#x00f1;ez de Opakua</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rankovic</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>Nov. 2020</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>10</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41467-020-19843-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Azad</surname>
                            <given-names>GK</given-names>
                        </name>
</person-group>:
                    <article-title>Identification and molecular characterization of mutations in nucleocapsid phosphoprotein of SARS-CoV-2.</article-title>
                    <source>

                        <italic toggle="yes">PeerJ.</italic>
</source>
                    <year>Jan. 2021</year>;<volume>9</volume>:<fpage>e10666</fpage>.
                    <pub-id pub-id-type="pmid">33505806</pub-id>
                    <pub-id pub-id-type="doi">10.7717/peerj.10666</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <collab>CoVariants</collab>:
[Accessed: 03-Dec-2021].
                    <ext-link ext-link-type="uri" xlink:href="https://covariants.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Syed</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Rapid assessment of SARS-CoV-2 evolved variants using virus-like particles.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2021</year>;<volume>6184</volume>.</mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boon</surname>
                            <given-names>WX</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>CH</given-names>
                        </name>
</person-group>:
                    <article-title>MSA (SARS-CoV-2). figshare.</article-title>
                    <source>

                        <italic toggle="yes">Dataset.</italic>
</source>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.6084/m9.figshare.16681900.v4</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boon</surname>
                            <given-names>WX</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sia</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>CH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Prediction of the Effects of Synonymous Variants on SARS-CoV-2 Genome.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>Oct. 2021</year>;<volume>10</volume>:<fpage>1053</fpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.72896.1</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report125647">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76515.r125647</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lee</surname>
                        <given-names>In-Hee</given-names>
                    </name>
                    <xref ref-type="aff" rid="r125647a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8857-1355</uri>
                </contrib>
                <aff id="r125647a1">
                    <label>1</label>Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>3</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Lee IH</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport125647" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.72904.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors examined 10 selected nonsynonymous mutations in the SARS-CoV-2 genome for their predicted effect on protein stability as well as their co-mutations. Overall, the paper is written clearly enough to follow the experiments and analysis results.</p>
            <p> </p>
            <p> 
                <bold>Methods:</bold>
            </p>
            <p> </p>
            <p> 1. The authors mention that 231 nonsynonymous mutations were found in the 30,229 SARS-CoV-2 genome sequences used in the analysis. However, only 10 were investigated intensively throughout the paper. Can you explain the criteria by which these mutations were selected?</p>
            <p> </p>
            <p> 2. The number of nonsynonymous mutations in coding sequences (Figure 1) may need to be adjusted by the length of each coding sequence.</p>
            <p> </p>
            <p> 
                <bold>Results:</bold>
            </p>
            <p> </p>
            <p> 3. Given the diverse nature of the sequences collected by GISAID, it would be helpful if the authors could provide more details about the 30,229 sequences used in the study: geographic information for the origin of collection, and genetic nomenclature (Nextstrain clade, PANGO lineage, variants of concern or interest by WHO).</p>
            <p> </p>
            <p> 4. The co-mutation analysis was particularly intriguing because the co-mutation frequencies were high for most mutations. Can you discuss this further in the Discussion? I also wonder whether the pattern would persist if the co-mutation analysis were done by genetic nomenclature.</p>
            <p> </p>
            <p> 5. Mutations are shown as both nucleotide changes and amino acid changes in most figures and tables, but Figure 2 shows only nucleotide changes while Tables 4 and 5 show only amino acid changes. Can you add amino acid changes to Figure 2 for easy cross-matching with the other figures and tables?</p>
            <p> </p>
            <p> 
                <bold>Others:</bold>
            </p>
            <p> </p>
            <p> 6. Specifying 10 nonsynonymous variants in the title may keep readers from misinterpreting the paper as an intensive investigation of all possible nonsynonymous variants.</p>
            <p> </p>
            <p> 7. The findings reported in the paper may be limited to sequences collected during a period almost a year ago (2020-01-01 to 2021-03-21). A discussion of the impact of the study in light of the advent of Omicron variants would interest readers from a wide range of backgrounds.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment8210-125647">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ng</surname>
                            <given-names>Chong Han</given-names>
                        </name>
                        <aff>Multimedia University, Malaysia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>9</day>
                    <month>5</month>
                    <year>2022</year>
                </pub-date>
            </front-stub>
            <body>
                <p>The authors examined 10 selected nonsynonymous mutations in the SARS-CoV-2 genome for their predicted effect on protein stability as well as their co-mutations. Overall, the paper is written clearly enough to follow the experiments and analysis results.</p>
                <p> </p>
                <p> Methods:</p>
                <p> </p>
                <p> 1. The authors mention that 231 nonsynonymous mutations were found in the 30,229 SARS-CoV-2 genome sequences used in the analysis. However, only 10 were investigated intensively throughout the paper. Can you explain the criteria by which these mutations were selected?</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The 10 nonsynonymous mutations were selected because they had the 10 highest frequencies among the mutations identified in this study.</p>
                <p> </p>
                <p> 2. The number of nonsynonymous mutations in coding sequences (Figure 1) may need to be adjusted by the length of each coding sequence.</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Figure 1 shows the total number of mutations in each gene, e.g., D614G and N501Y in the S protein. We do not consider it relevant to include extra information.</p>
                <p> </p>
                <p> Results:</p>
                <p> </p>
                <p> 3. Given the diverse nature of the sequences collected by GISAID, it would be helpful if the authors could provide more details about the 30,229 sequences used in the study: geographic information for the origin of collection, and genetic nomenclature (Nextstrain clade, PANGO lineage, variants of concern or interest by WHO).</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> We included information on the primary lineages of past and present VOCs associated with the top 10 nonsynonymous mutations, along with their mutation frequencies, in Table 3. We also included information on the geographical distribution of the SARS-CoV-2 dataset, with the date range, summarized in a table as extended data. Although the genome dataset draws on diverse regions, we do not know whether the reported COVID-19 case numbers correlate well with the number of SARS-CoV-2 genomes deposited in the GISAID database. Genomic surveillance may differ between countries for reasons such as the quality of the sequencing data, access to research funding, socioeconomic status, and government policy. Therefore, this information is less relevant to our study, since we do not aim to monitor the SARS-CoV-2 mutation profile in different regions.</p>
                <p> </p>
                <p> 4. The co-mutation analysis was particularly intriguing because the co-mutation frequencies were high for most mutations. Can you discuss this further in the Discussion? I also wonder whether the pattern would persist if the co-mutation analysis were done by genetic nomenclature.</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The GESS database we used does not support co-mutation analysis by clade or lineage, and using other tools might give different results. Therefore, we did not perform co-mutation analysis by clade or lineage. However, we have expanded the discussion of the co-mutation analysis using the mutation frequency percentages by lineage derived from the COVID CG database.</p>
                <p> </p>
                <p> 5. Mutations are shown as both nucleotide changes and amino acid changes in most figures and tables, but Figure 2 shows only nucleotide changes while Tables 4 and 5 show only amino acid changes. Can you add amino acid changes to Figure 2 for easy cross-matching with the other figures and tables?</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Figure 2 has been revised with the additional information on amino acid changes.</p>
                <p> </p>
                <p> Others:</p>
                <p> </p>
                <p> 6. Specifying 10 nonsynonymous variants in the title may keep readers from misinterpreting the paper as an intensive investigation of all possible nonsynonymous variants.</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> To better reflect the scope of the study, the title of the paper has been revised to &#x201c;Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins.&#x201d;</p>
                <p> </p>
                <p> 7. The findings reported in the paper may be limited to sequences collected during a period almost a year ago (2020-01-01 to 2021-03-21). A discussion of the impact of the study in light of the advent of Omicron variants would interest readers from a wide range of backgrounds.</p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Different SARS-CoV-2 variants of concern have specific sets of defining mutations; some are common among these VOCs while others are unique. An additional paragraph has been added to the discussion section to discuss the impact of our study and to explain why the study of some of the identified mutations remains relevant for the newer variants.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report123168">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76515.r123168</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ranjit Bagal</surname>
                        <given-names>Ujwal</given-names>
                    </name>
                    <xref ref-type="aff" rid="r123168a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Nayak</surname>
                        <given-names>Vishal</given-names>
                    </name>
                    <xref ref-type="aff" rid="r123168a1">1</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6211-2767</uri>
                </contrib>
                <aff id="r123168a1">
                    <label>1</label>Centers for Disease Control and Prevention, Atlanta, GA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>28</day>
                <month>2</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Ranjit Bagal U and Nayak V</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport123168" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.72904.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors of this paper have used an extensive set of SARS-CoV-2 genomes to identify and select the top 10 non-synonymous mutations and to analyze their co-mutation effects, as well as their effects on the stability and flexibility of the protein structure. Overall, the paper is well written in terms of language and grammar, and the experiments were well conducted. The title suggests that all the variants were analyzed; hence, it could be modified to show that only the top 10 non-synonymous mutations were studied. The abstract is well written and gives all the details of the paper's findings.</p>
            <p> </p>
            <p> SARS-CoV-2 is an excellent example of a virus for which public repositories with highly curated whole-genome datasets and associated metadata are available for analysis. The authors need to use more data from 2020 to 2021, as the number of curated genomes available in GISAID is large (200,000+). This will make the analysis less obsolete.&#x00a0;</p>
            <p> </p>
            <p> The authors have applied meta-prediction methods for variant analysis. If relevant data on the genetic clade and the region where each sample was collected can be added to this analysis, the results will be more relevant and can be verified using laboratory techniques. Hence, we suggest that the authors try to incorporate these points, resolve the few issues mentioned below, and resubmit the paper with additional tables and content.</p>
            <p> </p>
            <p> Below are a few comments that we, as reviewers, suggest to the authors: 
                <list list-type="order">
                    <list-item>
                        <p>The introduction seems too short. More details about the virus, and about work showing how non-synonymous mutations affect viral efficacy, need to be mentioned.</p>
                    </list-item>
                    <list-item>
                        <p>In the Methods section 
                            <list list-type="order">
                                <list-item>
                                    <p>For the downloaded datasets, what was the threshold used for coverage? A table showing the number of genomes with date (range should do), coverage above threshold, genetic nomenclature (clade name), and geographical information will be helpful to understand the diversity within the dataset.</p>
                                </list-item>
                                <list-item>
                                    <p>You have performed a co-mutation analysis using the GESS database. Does it provide information about the mutation frequency and the genetic nomenclature under which each mutation was observed? If you can provide that information, it will be useful. The concurrence table is good, but with the above information it will become more relevant.</p>
                                </list-item>
                                <list-item>
                                    <p>What were the criteria used for &#x201c;top 10 nonsynonymous mutations&#x201d;?</p>
                                </list-item>
                                <list-item>
                                    <p>For prediction of mutation effect on protein function, where was the GTF file, as well as the VCF files, obtained from? There is no mention of whole genome SNP analysis. This part is a bit confusing. Clarification is required.</p>
                                </list-item>
                                <list-item>
                                    <p>&#x201c;Mutations with a value less than-2.5 were considered as deleterious&#x201d;. Can you provide a reference showing why -2.5 is used as a threshold? Same with the SIFT 4G score threshold of 0.05.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>In the Results section: 
                            <list list-type="order">
                                <list-item>
                                    <p>Figure 2: Is it possible to add the amino acid changes (e.g., D614G) instead of just nucleotide mutations for better understanding?</p>
                                </list-item>
                                <list-item>
                                    <p>Is it possible to show the genetic nomenclature associated with the top 10 nonsynonymous mutations?</p>
                                </list-item>
                                <list-item>
                                    <p>Also, if possible, can you add figures showing the domain or the position on a 3D protein structure? This is optional, as you have discussed the domains of a few proteins in the discussion section.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>In the Discussion section: 
                            <list list-type="order">
                                <list-item>
                                    <p>You write &#x201c;showed very high concurrence ratio with other mutations since they emerged in the early phase of the pandemic&#x201d;. How did you come to this conclusion? With reference to our comments in the results and methods section, if you can add this information in a tabular format it will be more informative.</p>
                                </list-item>
                                <list-item>
                                    <p>&#x201c;The combination of both mutations may enhance viral fitness based on epidemiological data&#x201d;. There is no mention of the epidemiological data in the results section. If it's in the supplementary files, mention it.</p>
                                </list-item>
                                <list-item>
                                    <p>&#x201c;P108S mutation diminished its activity, possibly leading to a reduction in disease severity.&#x201d; Can you mention in which genetic clade it was observed?</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Genomics &amp; Evolutionary Biology</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment8209-123168">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ng</surname>
                            <given-names>Chong Han</given-names>
                        </name>
                        <aff>Multimedia University, Malaysia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>9</day>
                    <month>5</month>
                    <year>2022</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <italic>"The authors of this paper have used an extensive set of SARS-CoV-2 genomes to identify and select the top 10 non-synonymous mutations and to analyze their co-mutation effects, as well as their effects on the stability and flexibility of the protein structure. Overall, the paper is well written in terms of language and grammar, and the experiments were well conducted. The title suggests that all the variants were analyzed; hence, it could be modified to show that only the top 10 non-synonymous mutations were studied. The abstract is well written and gives all the details of the paper's findings."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE: </bold>To better reflect the scope of the study, the title of the paper has been revised to &#x201c;Prediction of the effects of the top 10 nonsynonymous variants from 30229 SARS-CoV-2 strains on their proteins.&#x201d;</p>
                <p> </p>
                <p> 
                    <italic>"SARS-CoV-2 is an excellent example of a virus for which public repositories with highly curated whole-genome datasets and associated metadata are available for analysis. The authors need to use more data from 2020 to 2021, as the number of curated genomes available in GISAID is large (200,000+). This will make the analysis less obsolete."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> We used SARS-CoV-2 genome data ranging from 1 January 2020 to 22 March 2021. The first draft of the manuscript was prepared in late June 2021 and submitted in late August 2021. Due to some delay in the editorial review, the paper was only published on 6 January 2022. Five nonsynonymous mutations identified in this study, namely ORF1b nsp12 P323L, S protein N501Y, S protein D614G, N protein R203K and N protein G204R, are also among the defining mutations of the alpha variant. The mutational profile of the SARS-CoV-2 genome is changing very rapidly. However, we do not aim to monitor the mutational changes of the SARS-CoV-2 genome, since it is impossible to keep up with the exponential growth of these data. As of 26 April 2022, more than 10 million SARS-CoV-2 genome sequences have been deposited in the GISAID database. Although a more recent dataset would be useful, updating the data is outside the scope of our study, because both the identification of SARS-CoV-2 mutations and the prediction of their biological consequences are very time-consuming processes. An additional paragraph has been added to the discussion section to discuss the impact of our study and to explain why the study of some of the identified mutations remains relevant for the newer variants.</p>
                <p> </p>
                <p> 
                    <italic>"The authors have applied meta-prediction methods for variant analysis. If relevant data on the genetic clade and the region where each sample was collected can be added to this analysis, the results will be more relevant and can be verified using laboratory techniques. Hence, we suggest that the authors try to incorporate these points, resolve the few issues mentioned below, and resubmit the paper with additional tables and content."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE: </bold>We included information on the primary lineages of the past and present VOCs associated with the top 10 nonsynonymous mutations, together with their mutation frequencies, in Table 3. We also included information about the geographical distribution of the SARS-CoV-2 dataset, with the date range, summarized in a table as extended data. We are not performing additional experiments to verify the predictions, since our work focuses primarily on prediction analysis of the mutations. Laboratory work is outside the scope of this paper, as it is too time-consuming and labour-intensive.</p>
                <p> </p>
                <p> 
                    <italic>"Below are a few comments that we, as reviewers, suggest to the authors:</italic>
                </p>
                <p>
                    <italic> The introduction seems too short. More details about the virus, and about work showing how non-synonymous mutations affect viral efficacy, need to be mentioned."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Examples of the effect of non-synonymous mutations on viral efficacy have been added to the last paragraph of the introduction section.</p>
                <p> </p>
                <p> 
                    <italic>"In the Methods section</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> For the downloaded datasets, what was the threshold used for coverage? A table showing the number of genomes with date (range should do), coverage above threshold, genetic nomenclature (clade name), and geographical information will be helpful to understand the diversity within the dataset."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> For the downloaded dataset, the GISAID high-coverage filter was applied. According to GISAID, high coverage means only entries with &lt;1% Ns and &lt;0.05% unique amino acid mutations (not seen in other sequences in the database) and no insertions/deletions unless verified by the submitter. We included information on the primary lineages of the past and present VOCs associated with the top 10 nonsynonymous mutations, together with their mutation frequencies, in Table 3. We have summarized the geographical distribution of the SARS-CoV-2 dataset, with the date range, in a table as extended data. While we observe a diverse genome dataset coming from different regions, we don&#x2019;t know whether there is a good correlation between the number of reported COVID cases and the number of SARS-CoV-2 genomes deposited in the GISAID database. There may be some disparity in genomic surveillance between countries for reasons such as the quality of the sequencing data, access to research funding, socioeconomic status and government policy. Therefore, the geographical distribution is less relevant to our study, since we do not aim to monitor the SARS-CoV-2 mutation profile in different regions.</p>
                <p> </p>
                <p> 
                    <italic>"You have performed a co-mutation analysis using the GESS database. Does it provide information about the mutation frequency and the genetic nomenclature under which each mutation was observed? If you can provide that information, it will be useful. The concurrence table is good, but with the above information it will become more relevant."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The GESS database doesn&#x2019;t provide the frequency of the mutations by lineage or clade. However, in Table 3 we included information obtained from the COVID CG database on the primary lineages of the past and present VOCs associated with the top 10 nonsynonymous mutations, together with their mutation frequencies. In addition, we expanded the discussion of the co-mutation analysis based on the mutation frequency percentages by lineage.</p>
                <p> </p>
                <p> 
                    <italic>"What were the criteria used for &#x201c;top 10 nonsynonymous mutations&#x201d;?"</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The top 10 nonsynonymous mutations are the 10 mutations with the highest frequency identified in this study.</p>
                <p> </p>
                <p> 
                    <italic>"For prediction of mutation effect on protein function, where was the GTF file, as well as the VCF files, obtained from? There is no mention of whole genome SNP analysis. This part is a bit confusing. Clarification is required."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Additional information on the GTF and VCF files has been added to the methods. The GTF and VCF files are deposited in Figshare, and the related information is included in the Data and software availability section. It is a whole genome SNP analysis, but the SIFT 4G results for the ORF1b nsp12 P323L and A423V, S protein N501Y and N protein P151L mutations could not be obtained due to missing data in the Ensembl database.</p>
                <p> </p>
                <p> 
                    <italic>&#x201c;Mutations with a value less than-2.5 were considered as deleterious&#x201d;. Can you provide a reference showing why -2.5 is used as a threshold? Same with the SIFT 4G score threshold of 0.05.</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> References for the scoring methods of both SIFT 4G and PROVEAN have been included.</p>
                <p> In the Results section:</p>
                <p> </p>
                <p> 
                    <italic>"Figure 2: Is it possible to add the amino acid changes (e.g., D614G) instead of just nucleotide mutations for better understanding?"</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Figure 2 has been revised with the additional information on amino acid changes.</p>
                <p> </p>
                <p> 
                    <italic>"Is it possible to show the genetic nomenclature associated with the top 10 nonsynonymous mutations?"</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The information on the mutation frequency percentage in the primary lineages associated with the past and present variants of concern (VOCs) has been updated in Table 3.</p>
                <p> </p>
                <p> 
                    <italic>"Also, if possible, can you add figures showing the domain or the position on a 3D protein structure? This is optional, as you have discussed the domains of a few proteins in the discussion section."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Multiple SARS-CoV-2 proteins are mentioned in the paper, and we are not doing any work on protein structure modelling. Readers should refer to the Protein Data Bank for more specific information on the protein domains.</p>
                <p> </p>
                <p> 
                    <italic>"In the Discussion section:</italic>
                </p>
                <p>
                    <italic> You write &#x201c;showed very high concurrence ratio with other mutations since they emerged in the early phase of the pandemic&#x201d;. How did you come to this conclusion? With reference to our comments in the results and methods section, if you can add this information in a tabular format it will be more informative."</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> Table 3 shows that the S protein D614G and ORF1b nsp12 P323L mutations have the two highest frequencies. They are found in more than 90% of the 30229 SARS-CoV-2 genome sequences, which come from the early batch of SARS-CoV-2 genome data. A similar finding has been reported by S. Ilmj&#x00e4;rv et al., &#x201c;Concurrent mutations in RNA-dependent RNA polymerase and spike protein emerged as the epidemiologically most successful SARS-CoV-2 variant,&#x201d; Sci. Rep., vol. 11, no. 1, pp. 1&#x2013;13, Jul. 2021.</p>
                <p> </p>
                <p> 
                    <italic>&#x201c;The combination of both mutations may enhance viral fitness based on epidemiological data&#x201d;. There is no mention of the epidemiological data in the results section. If it's in the supplementary files, mention it.</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> We are referring to the study published by S. Ilmj&#x00e4;rv et al., &#x201c;Concurrent mutations in RNA-dependent RNA polymerase and spike protein emerged as the epidemiologically most successful SARS-CoV-2 variant,&#x201d; 
                    <italic>Sci. Rep</italic>., vol. 11, no. 1, pp. 1&#x2013;13, Jul. 2021.</p>
                <p> </p>
                <p> 
                    <italic>&#x201c;P108S mutation diminished its activity, possibly leading to a reduction in disease severity.&#x201d; Can you mention in which genetic clade it was observed?</italic>
                </p>
                <p> </p>
                <p> 
                    <bold>RE:</bold> The genetic clade associated with ORF1a nsp5 P108S is 20B-T (lineage B.1.1.284). However, it is unknown whether this mutation is associated with past or current variants of concern, since the data are unavailable from the COVID CG database.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
