Large-scale analysis of B-cell epitopes of envelope: Implications for Zika vaccine and immunotherapeutic development

Background: The re-emergence of Zika virus in 2015 was associated with severe neurologic complications, including Guillain-Barré syndrome in adults and congenital Zika syndrome in newborns. The major structural determinant of immunity to the Zika virus is the E protein. Although B-cell epitopes of Zika E protein were recently identified, data regarding epitope variations among Zika strains in pre-epidemic and epidemic periods are lacking. Methods: Here, we conducted systematic bioinformatics analyses of Zika strains isolated between 1968 and 2017. Multiple sequence alignment of E protein sequences and annotation of B-cell epitopes were performed. In addition, a homology-based approach was used to construct three-dimensional structures of monomeric E glycoproteins and to annotate epitope variations. Lastly, N-glycosylation patterns and protein stability upon mutation were predicted. Results: Our analyses indicate that epitopes recognized by human mAbs ZIKV-117, ZIKV-15, and ZIKV-19 were highly conserved, suggesting that they are attractive targets for the development of vaccines and immunotherapeutics directed against diverse Zika strains. In addition, the epitope recognized by the ZIKV-E-2A10G6 mAb derived from immunized mice was mostly conserved across Zika strains. Conclusions: Our data provide new insights regarding antigenic similarities between Zika strains circulating worldwide. These data are essential for understanding the impact of evolution on antigenic cross-reactivity between Zika lineages and strains. Further in vitro analyses are needed to determine how mutations at predefined epitopes could impact the development of vaccines that can effectively neutralize Zika viruses.


Introduction
Zika is a positive-sense, enveloped RNA virus of the Flaviviridae family 1 , which also includes dengue virus, West Nile virus (WNV), Japanese encephalitis virus (JEV), tick-borne encephalitis virus (TBEV), and yellow fever virus (YFV) 2 . Zika was originally discovered in a rhesus monkey in 1947 in Uganda 3 , and the first case of spread to humans was reported in 1952 4 . Since that time, the virus has spread globally, with Zika outbreaks reported in Micronesia in 2007 and in the Pacific islands in 2013-2014 5,6 . A recent outbreak that began in Brazil in 2015 eventually spread to countries in North America and the Caribbean 7,8 .
The Zika virus genome encodes three structural proteins (capsid [C], premembrane [PrM], and envelope [E]) and seven non-structural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5) 2 . Similar to other flaviviruses, the structural proteins and viral genome form virions that assemble as immature particles at the endoplasmic reticulum of infected cells 9 . The immature particles are composed of 60 PrM/E protein heterodimers that protrude from the viral surface 10 . In the Golgi apparatus, PrM is cleaved by a furin-like protease to produce the mature M protein and the Pr protein product 2 . After maturation, PrM and E are released and 90 E protein homodimers rearrange in a herringbone-like array, forming the mature Zika virus 11,12 .
The E protein is the major surface glycoprotein of flaviviruses and plays an essential role in virus attachment and fusion. Each E protein monomer consists of three domains: DI, DII, and DIII 13 , which undergo major rearrangements during the virus maturation cycle 14,15 . DI is a central beta-barrel domain; DII is a fingerlike dimerization domain; and DIII is an immunoglobulin-like domain 10,11 . DI, which connects DII to DIII, is essential for the conformational changes required for viral entry into cells 16 . DII contains a fusion loop (FL) that interacts with the endosomal membrane, whereas DIII contains the receptor-binding site and is thus essential for attachment of virus particles to the host cell 15,16 . DIII also plays an essential role in mediating the fusion of virus particles with the endosomal membrane after endocytosis 17 .
The global spread of Zika virus, in conjunction with the neurologic consequences of infection, has increased the urgency of efforts to develop Zika vaccines and immunotherapeutics. Humoral immunity is the major source of host protection against flaviviruses, in which neutralizing antibodies play an important role in virus clearance 18 . Antibodies generated against the E protein have been shown to block the entry of viruses into host cells 19 . Previous attempts to map the antigenic epitopes of the Zika E protein utilized antibodies specific for other flaviviruses, such as dengue virus 20-22 . Recently, several B-cell antigenic epitopes within an individual E domain were identified in studies of antibodies isolated from Zika virus-infected patients 23-25 and Zika-vaccinated mice 13,26,27 .
Although antigenic epitopes of the E protein have been characterized based on maps prepared using Zika-specific antibodies, no systematic analyses of the specificity of Zika monoclonal antibodies (mAbs) for available Zika E protein sequences have been conducted. Such data are particularly important, as RNA viruses exhibit high mutation rates and can generate mutations that enable them to evade the host immune system. Importantly, the structural stability of E protein is a key factor for antibody binding. Amino acid substitutions and their subsequent effects on structural stability and antibody binding can be assessed using a homology-based in silico approach applied to the three-dimensional (3D) structure of E proteins. Hence, analyzing sequence data to annotate mutations at key residues and subsequently predicting the effect of those mutations on protein structure stability is essential. Identifying novel amino acid mutations that are likely to contribute to immune evasion is therefore important.
In the present study, therefore, we extracted all of the available E protein sequences for Zika isolates obtained from 1968 to 2017 and constructed three-dimensional (3D) structures of E proteins from various Zika strains using homology modeling. We also investigated the patterns and conservation of E protein B-cell epitopes and assessed their structural stability upon mutation.

Data selection
Complete Zika polypeptide sequences for isolates identified between 1968 and 2017 were obtained from the National Center for Biotechnology Information (NCBI) Zika resource 28 . A total of 409 complete polypeptide sequences were retrieved, and duplicate sequences were removed. Multiple sequence alignment was performed using MUSCLE in the Geneious tool, version 11.0, and the E protein region was extracted 29 . Lastly, MUSCLE alignment was performed for E protein sequences and duplicate E sequences were subsequently removed.
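The duplicate-removal step can be illustrated with a minimal Python sketch. This is not the Geneious workflow used in the study; the function names and the toy FASTA input are hypothetical and serve only to show how identical sequences collapse into a set of unique sequences while the accession headers that share each sequence are retained.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unique_sequences(records):
    """Group headers by identical sequence; each key is one unique sequence."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq, []).append(header)
    return groups


# Hypothetical toy input: two identical sequences and one variant.
demo = ">strain_a\nMKT\n>strain_b\nMKT\n>strain_c\nMKA\n"
groups = unique_sequences(parse_fasta(demo))
print(len(groups), "unique sequences")
```

Keeping the grouped headers, rather than discarding duplicates outright, makes it possible to report how many isolates share each unique sequence.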

Phylogenetic analysis
A phylogenetic tree for all unique Zika virus E protein sequences was constructed using the maximum-likelihood method with the PhyML tool, version 3.0 30 , with 100 bootstrap replications. The LG substitution model was applied to determine the divergence of E protein sequences. Lastly, the phylogenetic tree was edited using the FigTree tool, version 1.4.3.


Homology modeling
The SWISS-MODEL server 31 was used to generate 3D structures of the E proteins of Zika isolates identified in 1968, 2007, 2013, 2015, and 2016. Chain A of the E protein structure (PDB: 5GZN) was used as a template for homology modeling. The best homology model was selected based on global model quality estimate (GMQE) and QMEAN statistics. Each homologous 3D structure was evaluated using Ramachandran plots prepared with PROCHECK 32 . Hydrogen bonds were added using MolProbity 33 . Each model was subjected to energy minimization using the ModRefiner server described by Xu and Zhang 34 .

Mapping of antigenic epitopes
E protein-specific antigenic epitopes of monoclonal antibodies (mAbs) isolated from Zika virus-infected humans and Zika-immunized mice were retrieved from the Immune Epitope Database (IEDB) 35 , which is a free resource funded by the National Institute of Allergy and Infectious Diseases devoted to disseminating antigenic epitope data. Linear and conformational B-cell epitopes with positive major histocompatibility complex ligands were selected. B-cell epitope regions mapped with B-cell receptor (BCR)-positive neutralizing antibodies were also selected. Epitopes that mapped with screening peptides and did not elicit an immune response were removed. A total of 7 human and 10 mouse (from mice immunized with E protein) B-cell epitopes were identified. The identified epitopes were annotated against aligned E sequences as well as the 3D structures of monomeric E proteins using the Chimera tool 36 . Potential sites of N-glycosylation were predicted using the NetNGlyc 1.0 server 37 . Potential N-glycosylation sites were defined by the sequence Asn-X-Ser/Thr, where X represents any amino acid except Pro. The default threshold of >0.5 was used as the predictor of an N-glycosylated residue.

Analysis of mutations on E protein stability
The effect of mutations on the stability of E protein was predicted using the mutation cutoff scanning matrix (mCSM) 38 , site-directed mutator (SDM) 39 , DUET 40 , and I-Mutant 2.0 41 tools. The mCSM is a machine-learning method that summarizes the 3D physicochemical environment of a residue as a graph-based signature. The SDM is a statistical potential energy function based on the propensity of amino acids in wild-type and mutant proteins to assume folded and unfolded conformations. DUET is an integrated computational approach that utilizes both SDM and mCSM to predict the effect of nonsynonymous single-nucleotide polymorphisms on protein stability. Lastly, the I-Mutant 2.0 webserver is a neural network-based tool for predicting mutation-associated free energy changes. The I-Mutant 2.0 tool enables prediction of free energy changes under differing conditions of pH, temperature, neighboring residues, and solvent accessibility. For predicting protein stability, a pH of 7.0 and a temperature of 25 °C were applied.

Strain frequencies
Antigenic variations among Zika strains were examined by first obtaining the complete sequences of Zika polypeptides from the NCBI Zika resource 19 . A total of 409 Zika polypeptide sequences were retrieved. Identical polypeptide sequences were removed, resulting in a final total of 257 sequences. Sequences were aligned by MUSCLE using Geneious software, E protein sequences were extracted, and duplicate E protein sequences were removed, resulting in a total of 75 unique sequences (Table 1). Of note, the majority of the 75 unique E protein sequences represented strains isolated in 2015 and 2016. Sequences from isolates collected in 2017 did not harbor any unique mutations in comparison to sequences from isolates collected in previous years; thus, 2017 sequences were removed after duplicate E protein sequences were removed.

Phylogeny
A phylogenetic tree of unique E protein sequences was constructed using the maximum-likelihood function in PhyML software, with 100 bootstrap replications (Figure 1). Zika strain accession numbers shown in the phylogenetic tree denote the year of isolation. Notably, E protein sequences from isolates collected in 1968, 2007, 2010, and 2012 clustered in one group, indicating that the associated strains are closely related (Figure 1). However, sequences from strains isolated in 2013 and 2014 exhibited divergence from the sequences of strains isolated in previous years (Figure 1).

Antigenic epitopes
A number of recent reports describe the isolation of mAbs specific for Zika E protein 23-28 . These antibodies bind preferentially to epitopes located in DII and DIII of the E glycoprotein (Table 2). A total of 7 neutralizing mAbs have been isolated from Zika virus-infected humans, and all of these mAbs bind to discontinuous epitopes of E protein. Of note, the ZIKV-117 and ZIKV-19 mAbs recognize epitopes located in DII (Table 2), whereas the ZIKV-12 and ZIKV-15 mAbs recognize epitopes located in the FL region, and the ZIKV-Z006, ZIKV-116, and ZKA 190 mAbs recognize epitopes located in DIII. Significant overlap between the epitopes recognized by ZIKV-Z006 and ZIKV-116 was observed, as three residues in the epitope recognized by the ZIKV-116 mAb are shared with the epitope recognized by ZIKV-Z006 (Table 2). In addition, the epitope recognized by the ZKA 190 mAb overlapped with that recognized by the ZIKV-Z006 mAb at two amino acid residues.
A total of 10 B-cell epitopes of Zika E protein were found to elicit humoral antibody responses in vaccinated mice (Table 2). Five of these epitopes were shown to be linear and elicited the production of neutralizing antibodies (Table 2). An additional five discontinuous epitopes have been characterized based on antibodies obtained from vaccinated mice (Table 2). The majority of those epitopes are located in DIII of the E protein.
To identify amino acid substitution mutations occurring in B-cell epitopes, we aligned the 75 unique E protein amino acid sequences among the 422 pre-epidemic and epidemic Zika strains identified. The sequence of Zika/Nigeria/9/9/1968 was used as a reference, and the E protein sequences were mapped against all of the mAbs from Zika infected humans or  Figure 2A). These amino acid substitutions were retained in all subsequent strains from 2007 to 2016. Interestingly, no unique mutations were observed in predefined B-cell epitopes for isolates collected between 2008 and 2014. However, in 2015, two additional substitutions of alanine residues for threonine residues appeared at amino acid positions 309 and 333 (Figure 2A). These mutations were not retained in subsequently isolated Zika strains, with the sequences quickly reverting to those of previously isolated strains. The greatest number of amino acid substitution mutations occurred in 2016; the majority of the mutations identified in 2016 involved several deletions in specific regions of the predefined B-cell epitopes (Figure 2A). Of note, the Dominican Republic/6/6/2016 strain exhibited significant deletions in B-cell epitopes recognized by the ZIKV-Z006 mAb (Figure 2A). Surprisingly, three B-cell epitopes were completely conserved in the pre-epidemic and epidemic strains and indeed have not changed for nearly 50 years (Figure 2A). These epitopes are recognized by the mAbs ZIKV-117, ZIKV-15, and ZIKV-19. Thus, as these mAbs could exhibit cross-reactivity against a wide range of Zika strains, they have potential for use in the development of vaccines and immunotherapeutics targeting Zika (Figure 2A).
Conversely, none of the 10 mouse B-cell epitopes were completely conserved among all Zika strains (Figure 2B, Figure 2C). Of note, the number of Zika strains exhibiting variations in the amino acid sequence at predefined antigenic epitopes of the E protein was higher for epitopes of mAbs characterized in vaccinated mice than for epitopes of mAbs characterized in infected humans (Figure 2B, Figure 2C). In addition, both the linear and discontinuous epitopes of vaccinated mice exhibited variations beginning in 2014 and continuing in subsequent years (Figure 2B, Figure 2C). The majority of mAbs in vaccinated mice recognized DIII of the E protein, and these epitopes exhibited higher rates of sequence variation than did the epitopes recognized by mAbs targeting DII (Figure 2B, Figure 2C). Of note, the ZIKV-E-2A10G6 mAb binds to a highly conserved discontinuous epitope in which a single amino acid deletion was observed in Zika strain ATG29285|Homo sapiens|Mexico|17/05/2016. Remarkably, the discontinuous epitope bound by the ZV-67 mAb exhibited the highest degree of variation in amino acid sequence among all the Zika B-cell epitopes examined (Figure 2B). Sequence variations were also observed in all of the Zika E protein linear epitopes (Figure 2C).
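The comparison of epitope residues across aligned strains against a reference sequence can be sketched in a few lines of Python. The example below is illustrative only: the function names, strain labels, and toy alignment are hypothetical, and epitope positions are assumed to be 1-based columns of the alignment rather than residue numbers from any published epitope map.

```python
def epitope_variants(alignment, reference_id, epitope_positions):
    """List differences from the reference at the given alignment columns.

    alignment: dict mapping strain name to an aligned sequence of equal length.
    epitope_positions: 1-based alignment columns belonging to one epitope.
    Returns a dict mapping strain name to (position, reference, observed)
    tuples; strains identical to the reference at the epitope are omitted.
    """
    ref = alignment[reference_id]
    variants = {}
    for strain, seq in alignment.items():
        if strain == reference_id:
            continue
        if len(seq) != len(ref):
            raise ValueError(f"{strain} is not aligned to the reference")
        diffs = [(p, ref[p - 1], seq[p - 1])
                 for p in epitope_positions
                 if seq[p - 1] != ref[p - 1]]
        if diffs:
            variants[strain] = diffs
    return variants


def is_conserved(alignment, reference_id, epitope_positions):
    """True when no strain differs from the reference at the epitope."""
    return not epitope_variants(alignment, reference_id, epitope_positions)


# Hypothetical toy alignment; "-" marks a deletion relative to the reference.
toy = {"reference": "MTKAV", "strain_1": "MTKAV", "strain_2": "MAKA-"}
print(epitope_variants(toy, "reference", [2, 5]))
print(is_conserved(toy, "reference", [1, 3]))
```

Treating a gap character as a difference means that deletions within an epitope are reported alongside substitutions.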

E protein homology modeling
The recently solved E protein structure of Zika enabled us to construct homology models of various E protein sequences. Amino acid substitutions in B-cell epitopes were also annotated on the 3D structure of the E protein. The majority of mutations in the E protein were found to be located within DIII, whereas the B-cell epitopes in DII were highly conserved (Figure 3). Zika strain Homo sapiens/French Polynesia/11/13 did not harbor any additional mutations in B-cell epitopes compared with Zika strain Homo sapiens/Micronesia/01/06/07 (Supplementary Figure 1).
It is known that N-glycosylation can mask antigenic epitopes. However, as the glycosylation site in the Zika E protein is located remotely from the predefined B-cell epitopes, glycosylation does not mask the B-cell antigenic epitopes (Figure 3). This is consistent with reports of glycosylation in the E proteins of WNV and JEV 18 . A recent report demonstrated that glycosylation at position 154 is critical for Zika infection of both mammalian and mosquito hosts 17 . Our analyses indicate that antigenic changes occur infrequently in Zika strains (Figure 3) and suggest that highly effective neutralizing Zika vaccines and immunotherapies for treating infections with known Zika strains are possible. Consequently, monitoring antigenic changes in E proteins over time would be useful for evaluating the cross-neutralizing potential of Zika vaccines against newly mutated strains.

Mutational stability
Mutational stability analysis was carried out to predict the effects of nonsynonymous variants on the stability of E protein and antibody binding. Here, we analyzed the stability of monomeric E protein upon substitutional mutations. Amino acid substitutions can affect the strength and number of interactions as well as protein folding and thus increase or decrease binding affinity. To investigate the effect of amino acid substitutions at antigenic epitopes on the stability of E protein and on antibody binding, four prediction tools for mutational stability were selected.
We predicted the stability of B-cell epitope mutations using the 3D structure of Zika E proteins. In both the T309A and T335A substitutions, a polar threonine residue was substituted with a hydrophobic alanine residue. No change in hydrophobicity was observed with the V391I mutation, and no change in charge was observed with the D393E mutation. In the R335T mutation, a basic residue was substituted with a polar uncharged residue. Overall, these results suggest that the defined substitutions in the E glycoprotein are potentially destabilizing. However, these mutations had only a moderate destabilizing effect, as the ∆∆G values ranged between -0.3 and -0.7 kcal/mol (Table 3).
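The physicochemical description of each substitution can be derived mechanically from the mutation label. The Python sketch below parses labels of the form used above (for example, T309A) and reports the residue class before and after the substitution. The grouping of amino acids is one common textbook scheme chosen here as an assumption, not the classification used by the stability prediction tools, and the function names are hypothetical.

```python
import re

# One common textbook grouping of the 20 amino acids (an assumption here).
GROUPS = {
    "nonpolar": "AVLIMGP",
    "aromatic": "FWY",
    "polar uncharged": "STNQC",
    "basic": "KRH",
    "acidic": "DE",
}
RESIDUE_CLASS = {aa: name for name, members in GROUPS.items() for aa in members}

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'T309A' into (wild type, position, mutant)."""
    match = MUTATION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def class_change(label):
    """Return the residue classes of the wild-type and mutant residues."""
    wild, _, mutant = parse_mutation(label)
    return RESIDUE_CLASS[wild], RESIDUE_CLASS[mutant]


for label in ["T309A", "V391I", "D393E", "R335T"]:
    before, after = class_change(label)
    print(f"{label}: {before} -> {after}")
```

Such a summary describes only the change in side-chain character; the predicted effect on stability still comes from the ∆∆G values reported by the prediction tools.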

Discussion
Attempts to control the spread of Zika virus via mosquito control have met with limited success. Indeed, within the past 3 years, a Zika pandemic occurred. There is a significant gap in knowledge regarding immunogenic cross-reactivity between Zika strains, even six decades after the first human infection was reported. Bioinformatics approaches can play vital roles in identifying rapidly evolving amino acid residues and thereby facilitate precise mapping of key residues that drive antigenic escape in response to the generation of host neutralizing antibodies. In the present study, we evaluated conserved versus rapidly evolving antigenic regions in predefined B-cell epitopes of the Zika E protein in pre-epidemic and epidemic periods.
The finger-like DII of the Zika E protein contains a FL that is inserted into the endosomal membrane as a result of pH-dependent conformational changes 14 . The FL is located within the beta-sheet structure in the terminal region of DII and contains a highly conserved hydrophobic peptide that triggers the structural changes required for fusion processes under conditions of low pH 15 . Our analysis demonstrated that B-cell epitopes in DII of Zika E protein are highly conserved. The mAbs that bind to this highly conserved region of DII are therefore attractive candidates in the design of Zika vaccines and immunotherapeutics.
The immunoglobulin-like DIII of the Zika E protein contains receptor-binding sites and plays an essential role in attachment and fusion of the virus to host cells 11,12 . Importantly, DIII reportedly induces the production of type-specific neutralizing antibodies 26 , as mAbs isolated from patients infected with either Zika or dengue are highly specific. In the present study, we found that epitopes within E protein DIII vary greatly among Zika strains.
While Zika is considered a single serotype, dengue is characterized by four distinct serotypes. In the present study, we compared mAbs of Zika E protein elicited in Zika virus-infected humans with mAbs induced by Zika vaccination in mice and identified several conserved epitope footprints. The conserved E protein epitopes could be useful in research aimed at developing vaccines that elicit the production of antibodies that provide protection against Zika strains but do not cross-react with dengue. For example, immunization with a peptide cocktail of antigenic DIII epitopes might provide broad protection against a variety of Zika strains yet demonstrate no cross-reactivity with dengue, thus eliminating the possibility of antibody-dependent enhancement (ADE) associated with anti-Zika antibodies.

Data availability
Zika protein sequence data can be found at the NCBI Zika resource.
Zika B-cell antigenic epitopes can be found at the Immune Epitope Database (IEDB).

In the present study, the authors extracted all of the available E protein sequences for Zika isolates obtained from 1968 to 2017 and constructed three-dimensional (3D) structures of E proteins from various Zika strains using homology modelling. Further, the authors investigated the patterns and conservation of E protein B-cell epitopes and assessed their structural stability upon mutation. The authors performed meticulous data analysis and discussed the results very thoroughly.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

valuable, which would provide guidance for later research in Zika vaccine development. Thus, I would recommend the indexing of this work in F1000Research.
The following are some suggestions and comments that might help to make the report more suitable for indexing:
In the 'Introduction' part, for the 2nd paragraph, "After maturation, pr is released from the host cell": is that "PrM" being released instead?
In 'Table 2', the first row, should "mAB" be "mAb"?
In the 'Abstract', for the 'Results' part, "Our analyses indicate that epitopes recognized by human mAbsZIKV-117, ZIKV-15, and ZIKV-119 were highly c Should be "ZIKV-19", please correct it. Also, since there is mutation as shown in 'Fig 2B' other than the strict conservation pattern in 'Fig 2A', please rewrite the claim that "ZIKV-E-2A10G6 mAb derived from immunized mice was highly conserved across Zika strains".
In the 'Results', for E protein homology modelling, the 2nd paragraph, "Glycosylation does not mask the B-cell epitope epitopes" should be "antigenic epitopes"?
In the 'Method' part, for 'Mapping of antigenic epitopes', "Potential N-glycosylation sites were defined by the sequence Asp/X/Ser/Thr, where X represents any amino acid except Pro. A threshold of >0.5 suggested an N-glycosylated residue." Please further explain what values >0.5 would suggest an N-glycosylated residue.
The authors described in the 'Methods' that "The I-Mutant2.0 tool enables prediction of free energy changes under differing conditions of pH, temperature, neighboring residues, and solvent accessibility." For the data shown in 'Table 3', what conditions did the authors apply to conduct the prediction?
In the 'Results', the last paragraph, "Overall, these suggest that defined substitutions in the E glycoprotein are potentially destabilizing." Could the authors further discuss what consequence or effect destabilization has on virus infection, since the authors said that the structural stability of E protein is a key factor for antibody binding?
In the 'Discussion' part, the authors claimed that "The conserved E protein epitopes could be useful in research aimed at developing vaccines that elicit the production of antibodies that provide protection against Zika strains but do not cross-react with dengue." Yet from the manuscript, it seems that no conserved epitopes in DIII have been identified. If so, the authors might need to rewrite the last part of the discussion.

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results?
Partly