Prediction of the effectiveness of COVID-19 vaccine candidates [version 1; peer review: 1 approved, 1 approved with reservations]

A safe and effective vaccine is urgently needed to bring the current SARS-CoV-2 pandemic under control. The spike protein (SP) of SARS-CoV-2 is the principal target of most vaccines currently under development. This protein is highly conserved, which indicates that a vaccine based on this antigen should be effective against all currently circulating SARS-CoV-2 strains. The present analysis of SP suggests, however, that the D614G mutation could significantly decrease the effectiveness of a COVID-19 vaccine by modulating the interaction between SARS-CoV-2 and its principal receptor, ACE2.


Introduction
SARS-CoV-2 is a novel, highly infectious human coronavirus which by May 2020 had infected 3 million and killed more than 200,000 people. Until immunity is induced in large populations throughout the world, this coronavirus is likely to become endemic, seasonally causing coronavirus disease 2019 (COVID-19) in millions worldwide. Scientists and drug companies around the world are working to develop a vaccine against the disease, with at least five candidate vaccines in clinical evaluation and another 71 in preclinical evaluation. The spike protein (SP) of SARS-CoV-2 is the principal target of most of these vaccine candidates. Analysis of 5,700 isolates collected between December 2019 and April 2020 revealed only one mutation that was found in more than 1% of currently circulating viruses 1 . This finding suggests that a single vaccine based on the consensus sequence of the highly conserved SP antigen should be efficacious against current global strains.
The human coronaviruses SARS-CoV and SARS-CoV-2 both recognize angiotensin converting enzyme 2 (ACE2) 2 as their natural receptor, but they present distinct binding interfaces to ACE2 and different networks of residue-residue contacts. SARS-CoV and SARS-CoV-2 have comparable binding affinity, but the SARS-CoV-2-ACE2 complex contains a higher number of contacts, a larger interface area, and decreased interface residue fluctuations relative to SARS-CoV 3 . These findings suggest that the receptor binding domain (RBD)-ACE2 interface of SARS-CoV-2 shares some properties of antibody-antigen interactions, which could allow accelerated evolution of spike protein (SP) binding to the ACE2 receptor, similar to the rapid evolution seen during antibody-antigen affinity maturation 3 . This raises the question of how effective a vaccine will be against SARS-CoV-2 carrying mutations that modulate its interaction with the receptor.
Previously, a novel bioinformatics approach for assessing the effectiveness of the seasonal influenza vaccine was proposed 4 . This approach, which is based on electronic biology, was successfully used to predict influenza vaccine effectiveness for two successive flu seasons 5,6 . Here, the same approach was used to assess the effectiveness of the COVID-19 vaccine. The analysis showed that the D614G mutation, which is spreading globally and is present in >50% of all circulating viruses, could significantly decrease the effectiveness of the COVID-19 vaccine.

Viruses
We analyzed subunit 1 of the S protein (SP1) from SARS-CoV-2 with the following mutations 7,8 : The SARS-CoV-2 S protein reference sequence (GenPept accession YP_009724390) was used in the analysis as the wild type (WT).

The informational spectrum method
The informational spectrum method (ISM), a virtual spectroscopy method, was developed for fast and simple structural analysis of proteins and their functionally important domains. The physical and mathematical basis of ISM is described in detail elsewhere 9 , and the method is presented here only briefly.
A sequence of N amino acid residues is represented as a linear array of N terms, with each term given a weight. The weight assigned to a residue is its electron-ion interaction potential (EIIP) 10,11 , which determines the electronic properties of amino acids that are responsible for their intermolecular interactions 12 . In this way, the alphabetic code is transformed into a sequence of numbers. The signal obtained is then decomposed into periodic functions by Fourier transformation. The initial information defined by the sequence of amino acids can thus be presented in the form of an informational spectrum (IS), which consists of a series of frequencies and their corresponding amplitudes. The IS frequencies correspond to the distribution of structural motifs with defined physicochemical characteristics that determine the long-range interaction properties of the protein.
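The encoding and transformation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the EIIP table holds the values commonly tabulated in the ISM literature (they are not listed in this article), the amplitude is assumed to be the energy density |X(n)|^2 of the discrete Fourier transform, and the function names and the toy peptide are hypothetical.

```python
import cmath
import math

# EIIP values (in Rydberg) as commonly tabulated in the ISM literature.
# ASSUMPTION: this article does not list them, so verify against refs 10,11.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(sequence):
    """Map a one-letter amino acid sequence to its numerical EIIP signal."""
    return [EIIP[residue] for residue in sequence.upper()]


def informational_spectrum(sequence):
    """Return (frequency, amplitude) pairs for n = 1 .. N // 2.

    The amplitude is |X(n)|^2, where X(n) is the discrete Fourier
    transform of the EIIP signal; the frequency is n / N, so it lies
    in the interval (0, 0.5].
    """
    signal = encode(sequence)
    n_residues = len(signal)
    spectrum = []
    for n in range(1, n_residues // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * n * m / n_residues)
            for m, value in enumerate(signal)
        )
        spectrum.append((n / n_residues, abs(coefficient) ** 2))
    return spectrum


# Hypothetical 20-residue toy peptide, used only to show the call pattern.
toy_spectrum = informational_spectrum("MFVFLVLLPLVSSQCVNLTT")
```

A direct summation is used instead of a fast Fourier transform to keep the sketch dependency-free; for full-length SP1 sequences an FFT routine would be the more practical choice.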

Phylogenetic analysis
Phylogenetic analysis of SARS-CoV-2 SP1 was performed with the ISM-based phylogenetic algorithm ISTREE, which was previously described in detail 13 . This phylogenetic approach, which allows the biological effect of mutations to be assessed, was previously applied in the analysis of influenza viruses 13-16 , Ebola virus 17 and Zika virus 18,19 . Figure 1 gives a schematic presentation of the algorithm. Here, we used an ISM distance measure d defined at the specific frequency F = 0.257, which characterizes the interaction between SP1 and the ACE2 receptor 20 .
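To make the idea of a distance defined at a single frequency concrete, the sketch below evaluates the spectral amplitude of a sequence at F = 0.257 and compares two sequences. It rests on stated assumptions: the exact ISTREE distance is defined in ref 13 and is not reproduced in this article, so the absolute amplitude difference used here is only an illustrative stand-in; the EIIP values are those commonly tabulated in the ISM literature; and the helper names and toy sequences are hypothetical.

```python
import cmath
import math

# EIIP values (in Rydberg) as commonly tabulated in the ISM literature.
# ASSUMPTION: not listed in this article.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_at(sequence, frequency):
    """Spectral amplitude |X(F)|^2 of the EIIP signal at one frequency F."""
    coefficient = sum(
        EIIP[residue] * cmath.exp(-2j * math.pi * frequency * position)
        for position, residue in enumerate(sequence.upper())
    )
    return abs(coefficient) ** 2


def ism_distance(sequence_a, sequence_b, frequency=0.257):
    """Illustrative distance: absolute amplitude difference at F.

    ASSUMPTION: a stand-in for the ISTREE measure d, whose exact
    definition is given in ref 13 rather than in this article.
    """
    return abs(amplitude_at(sequence_a, frequency)
               - amplitude_at(sequence_b, frequency))


def substitute(sequence, mutation):
    """Apply a point mutation written like 'D614G' (1-based position)."""
    original, replacement = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    if sequence[position - 1] != original:
        raise ValueError(f"expected {original} at position {position}")
    return sequence[:position - 1] + replacement + sequence[position:]


# Hypothetical toy sequence; a real analysis would use the SP1 region of
# the reference S protein (GenPept accession YP_009724390).
wild_type = "MFVFLVLLPLVSSQCVNLTTD"
mutant = substitute(wild_type, "D21G")
distance = ism_distance(wild_type, mutant)
```

With real data, the same call pattern would be `ism_distance(sp1_wt, substitute(sp1_wt, "D614G"))`, provided the position numbering of the supplied sequence matches that of the full-length S protein.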
For the development of the conventional phylogenetic tree, based on multiple sequence alignment (MSA), we used the MEGA5 21 software package. For the MSA calculation of sequences, the MUSCLE algorithm of MEGA5 software was applied.

Results and discussion
Figure 2a shows the ISM-based phylogenetic tree for mutant SP1 from SARS-CoV-2. These SP1 sequences are grouped in two separate clusters, A and B, which, according to the informational spectrum (IS) concept, have different interaction and immunological profiles 13-17 . In contrast, the homology-based phylogenetic tree shows no difference between the analyzed SP1 sequences (Figure 2b). These results suggest that most of the analyzed mutants (cluster A) will interact with ACE2 in a similar way to the WT virus. Only five mutations, V367F, R408I, G476S, D111N, and D614G (cluster B), could significantly affect the interaction of SP1 with ACE2. Three of these mutations (V367F, R408I, G476S) are located in the RBD of SP.
A previous ISM-based phylogenetic analysis of hemagglutinins from seasonal influenza viruses showed that viruses in the same cluster as the vaccine virus are responsive to the seasonal flu vaccine, whereas viruses grouped in separate clusters are resistant to it. This finding served as the basis for accurate prediction of the efficacy of the flu vaccine several months before the start of the flu season 5,6 . The same approach was also used for the design of antigens for a vaccine against Zika virus 18,19 . By analogy, it could be expected that SARS-CoV-2 viruses with the mutations V367F, R408I, G476S, D111N and D614G in SP would be resistant to COVID-19 vaccines that are based on WT SARS-CoV-2.
The D614G mutation is far more frequent than the others: it was found in 55% of sequences sampled globally as of April 10, 2020 1 , whereas the next most frequent mutation was found in only 0.87% of these sequences. A virus carrying the D614G mutation, which had not then been observed among sequences from China, was transmitted to Germany in January 2020 and became dominant in Europe and then globally within two months 1 .
The D614G mutation is located in the middle of one of the three epitopes of SP1 22 . This replacement of the large acidic residue D with the small hydrophobic residue G in the middle of the epitope would compromise the binding affinity of antibodies elicited by vaccines based on WT SP.

Conclusions
The present results suggest that the highly prevalent D614G mutation in SP, although located outside of the receptor binding site, could decrease the efficacy of the vaccine by modulating the interaction of SARS-CoV-2 with the ACE2 receptor. In addition, this mutation may cause antigenic drift, resulting in vaccine mismatches that could further reduce the efficacy of the vaccine. These possible obstacles should be considered in the future development of COVID-19 vaccines. Continued ISM-based monitoring of the evolution of SARS-CoV-2 is important for identifying other mutations that could affect the effectiveness of vaccines against this virus.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.

Response
The presented ISM results indicate that the D614G mutation modulates the recognition and targeting between antibodies and the SP antigen, and not their "direct interaction" (chemical binding). If the mutation prevents efficient recognition between the antigen and an antibody, neutralization will not be efficient even if the antibody could bind efficiently to the antigen. In other words, the antibody will not efficiently "find" its target in vivo.