Brief Report
Revised

Interactive SARS-CoV-2 mutation timemaps

[version 2; peer review: 3 approved]
PUBLISHED 03 Jun 2021

This article is included in the Pathogens gateway.

This article is included in the Bioinformatics gateway.

This article is included in the Emerging Diseases and Outbreaks gateway.

This article is included in the Coronavirus (COVID-19) collection.

Abstract

As the year 2020 came to a close, several new strains were reported of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the agent responsible for the coronavirus disease 2019 (COVID-19) pandemic that has afflicted us all this past year. However, it is difficult to comprehend the scale, in sequence space, geographical location and time, at which SARS-CoV-2 mutates and evolves in its human hosts. To convey how rapidly the coronavirus evolves, we built interactive scalable vector graphics (SVG) maps that show daily nucleotide variations in genomes from the six most populous continents, relative to the initial, ground-zero SARS-CoV-2 isolate sequenced at the beginning of the pandemic.
Availability: The tool used to perform the reported mutation analysis, ntEdit, is available from GitHub. Genome mutation reports are available for download from BCGSC. Mutation timemaps are available from https://bcgsc.github.io/SARS2/.

Keywords

SARS-CoV-2, COVID-19, Mutation time maps, GISAID, Interactive SVG

Revised Amendments from Version 1

We thank our Reviewers for their insightful comments and suggestions. In response to their reviews, we revised the Methods and the Results and Discussion sections, expanding on the potential utility of our maps and demonstrating how they may be used: by visually inspecting the timemaps, we show clear examples of SARS-CoV-2 mutation emergence and early detection in the GISAID genome catalogue. To that effect, we also present a revised, annotated figure (Fig. 1B) showing how SVG interactivity facilitates such queries. We now also situate our work in relation to that of others who use GISAID data to analyze SARS-CoV-2 variants, and cite some of the most relevant work. Finally, we have improved our SVG maps by adding built-in pan and zoom navigation to each, for a more interactive experience.

See the authors' detailed response to the review by Jale Moradi
See the authors' detailed response to the review by Ingo Ebersberger and Ruben Iruegas
See the authors' detailed response to the review by Takahiko Koyama

Introduction

In the last few weeks of 2020, new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mutations were reported in the United Kingdom (UK).1 Although coronavirus genome mutations had been discovered and announced throughout the year, including the widely discussed D614G missense change in the spike protein,2,3 the latest recurring surface protein mutations to be identified (e.g. N501Y, P681H) are cause for concern. The SARS-CoV-2 S gene encodes a surface glycoprotein which, upon interaction with host ACE-2 receptors, enables the coronavirus to enter host cells and propagate. The reported changes to its sequence may be associated with increased virulence,4 infectivity3 and overall fitness.5 The global response to those recent reports has been swift, with several countries suspending air travel from the UK. This highlights the severity of the situation and the importance of tracking genomic variations and their predicted effects over time and space.

The rapid evolution of the SARS-CoV-2 genome in human hosts prompted us to map all nucleotide changes that appeared in 2020, since the first genome sequence of a COVID-19 patient isolate from the outbreak epicentre in Wuhan, China was made public.6 For this, we leveraged the collaborative efforts of hundreds of institutions worldwide that, as of January 23rd 2021, had graciously shared over 260,000 SARS-CoV-2 genome sequences with the GISAID central repository since early January 2020.7 Our mutation timemaps show the staggering number of nucleotide variants that accumulated across the whole viral genome throughout the year, especially since fall 2020, in the six most populous continents. Here we present key features of these maps and how they may be useful to researchers.

Methods

We first downloaded all complete, high-coverage SARS-CoV-2 genomes (from samples collected from human hosts) from GISAID7 on January 23rd 2021. We then ran a genome polishing pipeline consisting of ntHits8 (v0.1.0 -b 36 -outbloom -c 1 -p seq -k 25) followed by ntEdit9 (v1.3.4 -i 5 -d 5 -m 1 -r seq_k25.bf), which required at most 0.5 GB of RAM and ran in ~1 s per genome on a single CPU. We used the first published SARS-CoV-2 genome isolate6 (WH-Human 1 coronavirus, GenBank accession: MN908947.3) as the reference, and each individual GISAID genome in turn as the source of k-mers to identify base variations relative to that reference. We parsed the variant call format (VCF) output files from ntEdit and tallied, for each submitted GISAID genome, the complete list of nucleotide variations. We next organized each nucleotide variant by sample collection date and continent of origin and, when applicable, evaluated its effect on the gene product harbouring the change, to output an interactive scalable vector graphics (SVG) file. The script we developed to generate the maps is written in Perl and distributed under GPLv3. Users wishing to generate custom maps can download the script from Zenodo.10 The full set of (unfiltered) SARS-CoV-2 nucleotide variations identified by this pipeline is updated weekly and is available for public download from https://www.bcgsc.ca/downloads/btl/SARS-CoV-2/mutations/.
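The parsing and tallying steps above can be sketched in Python. This is a minimal illustration, not the authors' Perl script: it assumes ntEdit's VCF output follows the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, ...), and the input tuples of collection date, continent and VCF text are a hypothetical format standing in for the join with GISAID metadata.

```python
from collections import Counter, defaultdict


def parse_vcf_variants(vcf_text):
    """Return the set of (pos, ref, alt) calls in one genome's VCF text.

    Assumes standard VCF column order (CHROM, POS, ID, REF, ALT, ...) with
    tab-separated fields; header lines start with '#'. Multi-allelic ALT
    fields are split on ','.
    """
    variants = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        pos, ref, alts = int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            variants.add((pos, ref, alt))
    return variants


def tally_by_day_and_continent(genomes):
    """Count, per (collection_date, continent), how many genomes carry each variant.

    `genomes` is an iterable of (collection_date, continent, vcf_text) tuples,
    a hypothetical input format. Each genome contributes at most once per
    variant because parse_vcf_variants returns a set.
    """
    tallies = defaultdict(Counter)
    for date, continent, vcf_text in genomes:
        tallies[(date, continent)].update(parse_vcf_variants(vcf_text))
    return tallies
```

Applying the five-genomes-per-day threshold used for the maps then amounts to keeping only the entries of each Counter with a count of at least 5.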

Results and discussion

We analyzed nucleotide variations over time in over 260,000 SARS-CoV-2 viral genomes submitted to the GISAID initiative7 from around the globe, relative to the ground-zero COVID-19 clinical isolate.6 We mapped each mutation observed in five or more genomes on a given day. The 2020 calendar year, from January 1st 2020 (day 1) to December 31st 2020 (day 366), is arranged in a circle where each radius represents a day (1 day = 360°/366 ≈ 0.98°), and data points represent mutations along the reference genome sequence, from position 1 (closest to the center) to 29,903 bp (near the outer rim). The size of each point scales with the log10 of the number of viral genomes collected on that day that carry the mutation, and its colour indicates the continent where the mutation is observed. Hovering the mouse over a data point reveals the collection date, the nucleotide variant, the continent and the associated number of contributing genome sequences (including the daily sample fraction) and, when applicable, the affected gene product and predicted amino acid change.
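The circular layout described above reduces to a polar-to-Cartesian mapping. The sketch below is a minimal illustration assuming a radius that is linear in genome position and a marker size proportional to log10 of the genome count; the canvas centre, inner and outer radii and the size scale factor are hypothetical parameters, not values taken from the published maps.

```python
import math
from datetime import date

GENOME_LENGTH = 29903  # length of reference MN908947.3 (bp), from the text
DAYS_IN_2020 = 366     # 2020 is a leap year: Jan 1 = day 1, Dec 31 = day 366


def day_of_year(d):
    """1-based day of the year for a datetime.date."""
    return d.timetuple().tm_yday


def layout_point(day, genome_pos, cx=500.0, cy=500.0, r_inner=50.0, r_outer=450.0):
    """Map (day, genome position) to SVG (x, y) coordinates.

    Angle: clockwise from the upper vertical, 360/366 (about 0.98) degrees per day.
    Radius: linear in genome position, base 1 at r_inner, last base at r_outer.
    cx, cy, r_inner and r_outer are hypothetical canvas parameters.
    """
    theta = math.radians((day - 1) * 360.0 / DAYS_IN_2020)
    r = r_inner + (genome_pos - 1) / (GENOME_LENGTH - 1) * (r_outer - r_inner)
    # SVG's y axis points down, so moving "up" from the centre means subtracting.
    return cx + r * math.sin(theta), cy - r * math.cos(theta)


def marker_radius(n_genomes, scale=2.0, min_count=5):
    """Marker radius proportional to log10 of supporting genomes, or None when
    the variant falls below the five-genomes-per-day threshold."""
    if n_genomes < min_count:
        return None
    return scale * math.log10(n_genomes)
```

With these defaults, a variant at base 1 on January 1st sits directly above the centre, and the same base half a year later (day 184) sits directly below it.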

From the SARS-CoV-2 genome mutation timemap (Figure 1A), we observe the first persistent mutations (≥5 genomes/day) appearing in late February 2020, including the prevalent D614G mutation in Europe on February 22nd (albeit present since late January in fewer samples, Figure 1B). From there, the original coronavirus genome sustained many changes over time (5,468 distinct variants mapped in 2020 as of January 23rd, 2021), including a sizeable proportion (56.8%) of missense mutations. It is immediately evident from Figure 1A that variations from Europe account for the majority (71.2%) of the variants mapped. Further, there appears to be a surge in variations identified on this continent in late summer and throughout fall 2020. This may be explained by a disproportionate number of submitted samples originating from Europe as the second wave hit hard; caution in interpreting the map is therefore warranted. Of note, the spike protein gene variant N501Y, observed on our maps in Wales, UK in late September 2020 (n = 2, 1.5% of Wales samples) (Figure 1B), is consistent with an earlier study reporting its recurrent emergence within this time frame.1 The map clearly shows its emergence as it increases in frequency by late December 2020 (n = 13, 15.5% of Wales samples) and spreads to other regions. We also note the emergence of several additional mutations in the spike protein gene, including D1118H, S982A, T716I, P681H and A570D, all visible in late 2020 as they rose to prominence in the GISAID genome catalogue (Figure 1B).
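Classifying variants as missense, and naming them in the protein notation used here (e.g. D614G, N501Y), requires locating each SNV within its codon and translating the reference and mutated codons. The sketch below is a minimal illustration using the standard genetic code and covering the spike gene only. The S coding-sequence start (base 21,563 of MN908947.3) comes from the reference annotation rather than from this article, and the caller is assumed to supply the reference codon read from the genome.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order
# at each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

S_CDS_START = 21563  # first base of the spike (S) coding sequence in MN908947.3


def codon_coordinates(genome_pos, cds_start=S_CDS_START):
    """Return (1-based codon number, 0-based offset within the codon)."""
    delta = genome_pos - cds_start
    return delta // 3 + 1, delta % 3


def classify_snv(ref_codon, offset, alt_base):
    """Translate the reference and mutated codons and classify the change."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-lost"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind
```

For example, A23403G falls in codon 614 at offset 1; changing the middle base of an aspartate codon such as GAT to G yields GGT (glycine), i.e. the D614G missense change.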

Fuelled by the raging COVID-19 pandemic, GISAID's data is enabling more than a dozen web-based SARS-CoV-2 variant visualizations, including those hosted by NextStrain,11 CovMT12 and CoVariants.13 These portals offer feature-rich and intuitive interfaces for navigating a comprehensive collection of graphs of SARS-CoV-2 variant lineages and compositions in key geographic locations. In some cases, the analysis results presented on these online portals are based on limited subsets of genome and nucleotide variation data, and the raw mutation calls for each sample are not readily available for download. With our project, we make public all nucleotide variation calls for each GISAID genome in the collection that is complete, high coverage, and has a complete associated sample collection date. We also provide mutation-centric tabulated analysis results (https://www.bcgsc.ca/downloads/btl/SARS-CoV-2/mutations/), which are useful for evaluating mutation frequency over time; we have used these data in other SARS-CoV-2 related work to monitor emergence (not shown). With our timemaps, we offer an alternative to the visual displays we have grown accustomed to this past year, and a perspective not already covered by the aforementioned tools. We accomplished this by generating a comprehensive bird's-eye view of all mutations that have accumulated in each GISAID genome since the beginning of the pandemic, to show the sheer scale of the viral genome transformation that has happened, and is still occurring, in human hosts. Admittedly, some of these displays have become dense as institutes worldwide submit new data to GISAID (the catalogue holds 7-fold more data than at initial manuscript submission) and more nucleotide variations are detected over time, but the maps still serve to illustrate the staggering accumulation of variations over time from around the globe, and to identify mutation hotspots.
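A mutation-centric table of the kind described above can be derived by pivoting per-genome calls by mutation and collection date, then dividing by the number of genomes sampled that day. This is a sketch under assumptions: the in-memory inputs below are hypothetical and do not reflect the layout of the files at the BCGSC download site.

```python
from collections import Counter, defaultdict


def mutation_frequency_over_time(samples, calls):
    """Pivot per-genome variant calls into a mutation-centric table.

    samples: {genome_id: collection_date} for every genome analyzed, so that
             genomes without any call still count in the daily denominator.
    calls:   iterable of (genome_id, mutation_label) pairs.
    Returns {mutation_label: {date: (n_carriers, fraction_of_that_day's_genomes)}}.
    """
    genomes_per_day = Counter(samples.values())
    carriers = defaultdict(lambda: defaultdict(set))
    for genome_id, mutation in calls:
        carriers[mutation][samples[genome_id]].add(genome_id)
    return {
        mutation: {
            day: (len(ids), len(ids) / genomes_per_day[day])
            for day, ids in sorted(by_day.items())
        }
        for mutation, by_day in carriers.items()
    }
```

Restricting `samples` to a single region (for example, only genomes collected in Wales) gives regional daily fractions of the kind reported for N501Y in the Results.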
Since our initial release of the maps, we have generated additional timemaps for all SARS-CoV-2 genes and for some of the emerging variants of concern (e.g. lineages B.1.1.7, B.1.351, B.1.617, P.1) that have come to dominate the landscape in certain jurisdictions due to the advantages conferred by their associated mutation signatures (available from: https://bcgsc.github.io/SARS2). These alternate views make it easier to identify new nucleotide variations arising over time and in certain jurisdictions, within specific variants. Taken as a whole, our timemaps offer a largely qualitative, but all-encompassing, view of SARS-CoV-2 genome evolution in human hosts, less than two years since the ground-zero strain genome was first characterized. Importantly, the maps also offer quantitative and actionable information, accessible through interactive navigation. Interactive features such as mouse hover reveal the variant effect/product, sample frequency and origin for a given mutation. The SVG platform used offers pan, zoom, tilt, highlight, click and drag functionality to inspect variants in detail, including detecting possible emergence at a specific time and geographical location. Further, the software built to make our SVG timemaps is freely available to scientists interested in generating custom and flexible views of SARS-CoV-2 genes not yet offered by our interface. With our periodically updated SARS-CoV-2 timemaps, totalling over 120 individual SVG displays, we offer unique longitudinal views of strain development in real time. Each timemap provides an extensive yearly panorama of SARS-CoV-2 nucleotide variations and the means to follow variant evolution in human hosts, over time and space.
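Hover details like those described above can be produced in plain SVG, without JavaScript, by giving each marker a `<title>` child element, which browsers display as a tooltip. The sketch below assumes that approach; it is not necessarily how the authors' Perl script implements interactivity, and the pan/zoom navigation wheel is not reproduced. The `svg_marker` and `svg_document` helpers are hypothetical names.

```python
from xml.sax.saxutils import escape, quoteattr


def svg_marker(x, y, r, colour, tooltip_lines):
    """Return an SVG <circle>; browsers show its <title> child as a hover tooltip."""
    tooltip = escape("\n".join(tooltip_lines))  # escape &, < and > for XML text
    return (
        f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{r:.1f}" fill={quoteattr(colour)} '
        f'fill-opacity="0.7"><title>{tooltip}</title></circle>'
    )


def svg_document(elements, width=1000, height=1000):
    """Wrap marker elements in a standalone SVG document."""
    body = "\n".join(elements)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">\n{body}\n</svg>'
    )
```

Each tooltip line could carry the collection date, variant, continent, genome count and predicted amino acid change, matching the fields listed in the Results.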


Figure 1. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution in human hosts.

ntEdit was used to map nucleotide variations between the first published coronavirus isolate from Wuhan, China in early January and over 260,000 SARS-CoV-2 genomes sampled from around the globe during the 2020 coronavirus disease 2019 (COVID-19) pandemic. The maps show missense mutations arising daily (A) worldwide across the whole viral genome, with the reference genome represented by the vertical axis from bases 1 to 29.9 kbp, and (B) in Europe within the spike protein gene. Alternating dark/light grey vertical rectangles and associated tracks depict, starting from the center, SARS-CoV-2 genes orf1ab, S, ORF3a, E, M, ORF6, ORF7a, ORF8, N, and ORF10. Mutations identified daily are represented by circles along a given radius, coloured by region and sized relative to the raw count (panel A) or ratio (panel B) of the daily samples. A stacked bar plot (center) shows sample count. The 2020 calendar year mutations are arranged clockwise from the upper vertical. Hovering the mouse cursor over each data point reveals additional information about the mutation, and each map offers a navigation wheel for panning in all directions and zooming in/out (panel B). Panel B shows an annotated timemap of Europe, highlighting the detection of the first D614G spike protein gene mutation on January 28th 2020 (Germany, n = 3, upper right). We also highlight the N501Y spike mutation, first observed on September 20th 2020 (panel B, inset) in only 1.7% (n = 2) of the Wales, UK daily genome samples, and at the end of the year on December 28th 2020 in 15.5% (n = 13) of the daily collected Wales, UK samples (data updated on May 24th, 2021). White arrows near the genome axis draw attention to the emergence of spike protein gene mutations (from top to bottom) D1118H, S982A, T716I, P681H, A570D and N501Y.

Data availability

Source data

The SARS-CoV-2 genome sequences can be accessed via the GISAID central repository. Processed single nucleotide variant (SNV) data is available from https://www.bcgsc.ca/downloads/btl/SARS-CoV-2/mutations/.

Maps availability

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Author contributions

Study design: RLW. Analysis: RLW. Both authors wrote the manuscript.

How to cite this article
Warren RL and Birol I. Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.12688/f1000research.50857.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2
PUBLISHED 03 Jun 2021
Reviewer Report 21 Jun 2021
Ingo Ebersberger, Applied Bioinformatics Group, Institute for Cell Biology and Neuroscience, Goethe-University Frankfurt, Frankfurt, Germany
Ruben Iruegas, Applied Bioinformatics Group, Institute for Cell Biology and Neuroscience, Goethe-University Frankfurt, Frankfurt, Germany
Approved
I have ...
How to cite this report
Ebersberger I and Iruegas R. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.57062.r86694)
Reviewer Report 14 Jun 2021
Jale Moradi, Department of Microbiology, Faculty of Medicine, Kermanshah University of Medical Sciences, Kermanshah, Iran
Approved
The new version of the manuscript is improved in all sections, more details are provided in the result part by some examples that show the utility of the maps. The results indicate that the script using to generate maps is ...
How to cite this report
Moradi J. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.57062.r86693)
Reviewer Report 08 Jun 2021
Takahiko Koyama, TJ Watson Research Center, IBM, Scarsdale, NY, USA
Approved
The tool is now more interactive with zooming to look into more details. Although I personally do ...
How to cite this report
Koyama T. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.57062.r86695)
Version 1
PUBLISHED 03 Feb 2021
Reviewer Report 01 Jun 2021
Jale Moradi, Department of Microbiology, Faculty of Medicine, Kermanshah University of Medical Sciences, Kermanshah, Iran
Approved with Reservations
The authors have geographically shown the nucleotide variations for global SARS-CoV-2 sequences in a time map. The sequences have been downloaded, polished and analyzed with ntHit and ntEdit. The Wuhan-Hu-1-NC_045512/MN908947 was set as the reference sequence, then the variations output ...
How to cite this report
Moradi J. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.53946.r85512)
  • Author Response 03 Jun 2021
    René Warren, Canada's Michael Smith Genome Sciences Centre, Canada
    We thank our Reviewer for their support of our work and insights. We also value the suggestions, as it helps us improve upon the work and broaden the interest.
    ...
Reviewer Report 17 May 2021
Takahiko Koyama, TJ Watson Research Center, IBM, Scarsdale, NY, USA
Approved with Reservations
Authors have developed a web based visualization tool for longitudinal evolution of SARS-CoV-2 genomes.

Although they have made unique representation of longitudinal strain developments, it is not clear the utility of the tool. For instance, while concentric ...
How to cite this report
Koyama T. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.53946.r85294)
  • Author Response 03 Jun 2021
    René Warren, Canada's Michael Smith Genome Sciences Centre, Canada
    Authors have developed a web based visualization tool for longitudinal evolution of SARS-CoV-2 genomes.

    Although they have made unique representation of longitudinal strain developments, it is not clear the ...
  • Author Response 03 Jun 2021
    René Warren, Canada's Michael Smith Genome Sciences Centre, Canada
    We wanted to add to our previous response to our Reviewer. Once again, we are grateful for your suggestions to improve upon interactivity of the maps. Since your Review, we ...
Reviewer Report 14 May 2021
Ingo Ebersberger, Applied Bioinformatics Group, Institute for Cell Biology and Neuroscience, Goethe-University Frankfurt, Frankfurt, Germany
Ruben Iruegas, Applied Bioinformatics Group, Institute for Cell Biology and Neuroscience, Goethe-University Frankfurt, Frankfurt, Germany
Approved with Reservations
The authors present interactive mutation time maps for SARS-CoV-2, which provide a highly resolving view of when, where and how frequent a particular mutation was detected in the sampled SARS-CoV-2 genome sequences provided via GISAID. The manuscript itself is rather ...
How to cite this report
Ebersberger I and Iruegas R. Reviewer Report For: Interactive SARS-CoV-2 mutation timemaps [version 2; peer review: 3 approved]. F1000Research 2021, 10:68 (https://doi.org/10.5256/f1000research.53946.r83795)
  • Author Response 17 May 2021
    René Warren, Canada's Michael Smith Genome Sciences Centre, Canada
    The authors present interactive mutation time maps for SARS-CoV-2, which provide a highly resolving view of when, where and how frequent a particular mutation was detected in the sampled SARS-CoV-2 ...
