Keywords
Coronavirus, adaptive immunity, immunogenicity, T cell cross-reactivity, vaccine development
This article is included in the Emerging Diseases and Outbreaks gateway.
This article is included in the Coronavirus (COVID-19) collection.
The emergence and rapid spread of the novel coronavirus known as 2019-nCoV has posed a serious global health threat [1] and has already caused a huge financial burden [2]. It has further challenged the scientific and industrial community to deliver rapid control measures and, equally importantly, to develop effective vaccines that prevent its recurrence. When facing a rapid epidemic outbreak of a novel and unknown pathogen, a key bottleneck for the thorough investigation that vaccine development requires is the limited (to almost no) access of the scientific community to samples from infected subjects. As such, in silico predictions of vaccine targets are of high importance and can guide medical and experimental experts towards the best and most timely use of limited resources.
In this regard, we report our recent effort to computationally identify immunogenic and/or cross-reactive peptides from 2019-nCoV. We provide a detailed screen of candidate peptides based on comparison with immunogenic peptides deposited in the Immune Epitope Database and Analysis Resource (IEDB), including those derived from severe acute respiratory syndrome-related coronavirus (SARS CoV), along with de novo prediction from 2019-nCoV 9-mer peptides. We found i) 28 SARS-derived peptides with exact matches in the 2019-nCoV proteome that were previously characterized as immunogenic by in vitro T cell assays, ii) 22 nCoV peptides with high sequence similarity to immunogenic peptides but with a greater predicted immunogenicity score, and iii) 44 + 19 nCoV peptides predicted to be immunogenic by the iPred algorithm and 1G4 TCR positional weight matrices, respectively.
We collected all peptides in IEDB [3] (as of 13-02-2020) that were reported positive in T cell assays and have human as the host organism. We then conducted a local sequence alignment of 10 2019-nCoV open reading frames (ORFs) against 35,225 IEDB peptides and found 28 exact matches. Surprisingly, all identical hits (for target peptides of length > 3) were from SARS-CoV (Table 1, Data Table 1 [4]). These peptides have been shown to bind various HLA alleles from both class I and class II, although with a higher tendency towards HLA-A*02:01, and can be targets for CD8+ and CD4+ T cells, respectively.
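The exact-match step can be illustrated with a minimal Python sketch. It swaps the local alignment used in the study for a plain substring search, which is only equivalent for finding identical hits. The function name, the `min_length` parameter and the toy ORF string are assumptions made for illustration; the peptides are taken from Table 1.

```python
def find_exact_matches(orfs, peptides, min_length=4):
    """Return (peptide, orf_name, start) for each peptide found verbatim in an ORF.

    Peptides shorter than `min_length` are skipped, mirroring the
    "target peptide length > 3" condition described in the text.
    """
    hits = []
    for peptide in peptides:
        if len(peptide) < min_length:
            continue
        for name, sequence in orfs.items():
            start = sequence.find(peptide)
            while start != -1:
                hits.append((peptide, name, start))  # 0-based start index
                start = sequence.find(peptide, start + 1)
    return hits


# Toy ORF (a made-up string, NOT a real 2019-nCoV sequence).
toy_orfs = {"toy_orf": "MKTLACFVLAAVYRGMSRIGMEVQ"}
toy_peptides = ["TLACFVLAAV", "GMSRIGMEV", "ALNTPKDHI", "MEV"]

exact_hits = find_exact_matches(toy_orfs, toy_peptides)
```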
| IEDB.peptide | 2019-nCoV.pattern | Antigen.Name | Allele.Name | 
|---|---|---|---|
| TLACFVLAAV | TLACFVLAAV | Membrane glycoprotein | HLA-A*02:01 | 
| AFFGMSRIGMEVTPSGTW | AFFGMSRIGMEVTPSGTW | N protein | |
| ALNTPKDHI | ALNTPKDHI | Nucleoprotein | HLA-A*02:01 | 
| AQFAPSASAFFGMSR | AQFAPSASAFFGMSR | nucleocapsid protein | HLA class II | 
| AQFAPSASAFFGMSRIGM | AQFAPSASAFFGMSRIGM | N protein | |
| GMSRIGMEV | GMSRIGMEV | Nucleoprotein | HLA-A*02:01 | 
| ILLNKHIDA | ILLNKHIDA | Nucleoprotein | HLA-A*02:01 | 
| IRQGTDYKHWPQIAQFA | IRQGTDYKHWPQIAQFA | N protein | |
| KHWPQIAQFAPSASAFF | KHWPQIAQFAPSASAFF | N protein | |
| LALLLLDRL | LALLLLDRL | Nucleoprotein | HLA-A*02:01 | 
| LLLDRLNQL | LLLDRLNQL | Nucleoprotein | HLA-A*02:01 | 
| LLNKHIDAYKTFPPTEPK | LLNKHIDAYKTFPPTEPK | N protein | |
| LQLPQGTTL | LQLPQGTTL | Nucleoprotein | HLA-A*02:01 | 
| RRPQGLPNNTASWFT | RRPQGLPNNTASWFT | nucleocapsid protein | HLA class I | 
| YKTFPPTEPKKDKKKK | YKTFPPTEPKKDKKKK | N protein | |
| ILLNKHID | ILLNKHID | Nucleoprotein | HLA-A*02:01 | 
| MEVTPSGTWL | MEVTPSGTWL | nucleocapsid protein | HLA-B*40:01 | 
| ALNTLVKQL | ALNTLVKQL | S protein | HLA-A*02:01 | 
| FIAGLIAIV | FIAGLIAIV | Spike glycoprotein precursor | HLA-A2 | 
| LITGRLQSL | LITGRLQSL | Spike glycoprotein precursor | HLA-A2 | 
| NLNESLIDL | NLNESLIDL | S protein | HLA-A*02:01 | 
| QALNTLVKQLSSNFGAI | QALNTLVKQLSSNFGAI | S protein | HLA-DRB1*04:01 | 
| RLNEVAKNL | RLNEVAKNL | Spike glycoprotein precursor | HLA-A*02:01 | 
| VLNDILSRL | VLNDILSRL | S protein | HLA-A*02:01 | 
| VVFLHVTYV | VVFLHVTYV | Spike glycoprotein precursor | HLA-A*02:01 | 
| GAALQIPFAMQMAYRF | GAALQIPFAMQMAYRF | S protein | HLA-DRA*01:01/DRB1*07:01 | 
| MAYRFNGIGVTQNVLY | MAYRFNGIGVTQNVLY | S protein | HLA-DRB1*04:01 | 
| QLIRAAEIRASANLAATK | QLIRAAEIRASANLAATK | S protein | HLA-DRB1*04:01 | 
In addition to the 28 identical hits against SARS CoV, we observed a long tail in the distribution of normalized alignment scores between the 10 2019-nCoV ORFs and the 35,225 IEDB peptides (Figure 1A). We therefore set out to further investigate potential vaccine targets among highly similar sequences.

A. Comparison of normalized sequence alignment score for peptides with exact and non-exact matches. B. Number of target peptides grouped by their source organism.
Taking the normalized alignment score of exact matches as a reference, we extracted 2019-nCoV peptides with a score greater than or equal to 4. As illustrated in Figure 1A, we observed 45 and 11 peptides with a normalized alignment score ≥ 4 and ≥ 5, respectively (Figure 1A inset). The target peptides originated from 10 different sources (Figure 1B), of which a total of 36 peptides were derived from strains associated with SARS CoV. Of interest, we also observed 7 hits with high sequence similarity to targets from Homo sapiens.
To investigate the extent to which differences between the source (2019-nCoV) and target (IEDB) peptides influence the immunogenicity of the source peptides, we used a recently published immunogenicity model [5] to predict and compare the immunogenicity of the source and target peptides (Data Table 2 [4]).
We observed similar (close to identical) immunogenicity scores for a number of IEDB and 2019-nCoV peptides, especially for those with high immunogenicity scores (Figure 2). While all 48 can be potential targets, of particular interest were those with a higher immunogenicity score than their IEDB counterparts. Here, we list 22 out of 48 2019-nCoV peptides that scored higher than their targets, which have been characterized as immunogenic (Table 2). In this list, 15 (68%) 2019-nCoV peptides have a score higher than 0.5, whereas only 11 (50%) of the IEDB peptides have an immunogenicity score greater than 0.5.

2019-nCoV peptides with high sequence similarity to immunogenic peptides, and their targets, were analysed for their immunogenicity potential by the iPred algorithm.
It is worth noting that predicting the immunogenicity of a given peptide is, in general, challenging and not a fully solved problem, and current models for predicting immunogenicity are therefore suboptimal. iPred is no exception. In fact, a substantial number of IEDB immunogenic peptides scored < 0.5 (the threshold used to classify immunogenic vs non-immunogenic peptides). This led us to ask whether we could gather any other evidence of either immunogenicity or cross-reactivity.
As a complementary reciprocal approach, we conducted a de novo search for immunogenic peptides in the 2019-nCoV proteome sequence. We scanned the 2019-nCoV proteome with a window of 9 amino acids and a step length of 1 amino acid, yielding 9613 9-mers in total. The immunogenicity of the 9-mer peptides was predicted using iPred, and MHC presentation scores were gauged using NetMHCpan 4.0 [6] for various HLA types. In this task, we focused on haplotypes common in Chinese and European populations, which include the HLA-A*02:01, HLA-A*01:01, HLA-B*07:02, HLA-B*40:01 and HLA-C*07:02 alleles (Data Table 3 [4]).
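The sliding-window scan described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R code; the function names and the toy ORF strings are assumptions. Each ORF of length n contributes n - 8 overlapping 9-mers, and ORFs shorter than 9 residues contribute none.

```python
def sliding_kmers(sequence, k=9, step=1):
    """Return every k-length window of `sequence`, moving `step` residues at a time."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


def scan_orfs(orfs, k=9):
    """Collect (orf_name, start_index, kmer) across all ORFs; start_index is 0-based."""
    records = []
    for name, sequence in orfs.items():
        for start, kmer in enumerate(sliding_kmers(sequence, k)):
            records.append((name, start, kmer))
    return records


# Toy ORFs (made-up strings, NOT real 2019-nCoV sequences).
toy_orfs = {"orf_a": "MKTLACFVLAAV", "orf_b": "GMSRIGMEVT", "orf_c": "MEV"}
toy_records = scan_orfs(toy_orfs)
```

In this toy input, `orf_a` (12 residues) yields 4 windows, `orf_b` (10 residues) yields 2, and `orf_c` is too short to yield any.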
For different alleles, 0 denotes non-binding and 1 denotes binding predicted for specific HLA allele.
Based on MHC presentation and immunogenicity prediction, we detected 5 peptides predicted to bind 4 different HLA alleles, of which 2 had strong immunogenicity scores (Figure 3). Of the 65 strong binders to 3 different HLA types, 39 had immunogenicity scores ≥ 0.5 (Table 3). Collectively, this analysis suggests a number of immunogenic 9-mer candidates for further experimental validation.
While our de novo candidates are appealing shortlisted targets for experimental validation, they do not provide information about target T cell receptors (TCRs). We therefore set out to interrogate the possibility of cross-reactivity with one well-studied TCR.
T cell cross-reactivity has been instrumental for T cell immunity against both tumor antigens and external pathogens. In that regard, a number of T cells have been extensively characterized, including the 1G4 CD8+ TCR, which is known to recognize the ‘SLLMWITQC’ peptide presented by HLA-A*02:01. We therefore set out to leverage the data from a recently published study [7] and explore the possibility of cross-reactivity of this TCR to any 2019-nCoV peptide.
Here, we scanned all 9-mers from the 2019-nCoV proteome (9613 peptides) with Binding, Activating and Killing Position Weight Matrices (PWMs; see the methods section) and associated each peptide with the geometric mean of these three scores as a measure of immunogenicity (Data Table 4 [4]). The distributions of binding, activation and killing scores, along with their multiplicative score and geometric mean, are illustrated in Figure 4. Based on the geometric mean, we observed 20 2019-nCoV peptides with a score > 0.8 and 516 peptides with a score > 0.7. The 9-mer peptides with a geometric mean > 0.7 and a positive HLA-A*02:01 binding prediction by NetMHCpan 4.0 are listed in Table 4.
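As a rough illustration of how such a shortlist could be derived from a table of binary binding calls (0 = non-binding, 1 = binding) and immunogenicity scores, consider the Python sketch below. The record layout, the function name, the "at least `min_alleles`" rule and all toy values are assumptions for illustration; they are not taken from the study's data tables.

```python
ALLELES = ("HLA-A*02:01", "HLA-A*01:01", "HLA-B*07:02", "HLA-B*40:01", "HLA-C*07:02")


def shortlist(rows, min_alleles=3, min_immunogenicity=0.5):
    """Keep peptides predicted to bind enough alleles and scoring above the cutoff."""
    picked = []
    for row in rows:
        # Count alleles with a positive (1) binding call; missing alleles count as 0.
        n_bound = sum(row["binds"].get(allele, 0) for allele in ALLELES)
        if n_bound >= min_alleles and row["immunogenicity"] >= min_immunogenicity:
            picked.append((row["peptide"], n_bound, row["immunogenicity"]))
    # Most broadly presented first, then highest immunogenicity.
    picked.sort(key=lambda item: (-item[1], -item[2]))
    return picked


# Toy rows with placeholder peptides and made-up scores.
toy_rows = [
    {"peptide": "PEPTIDEGG", "immunogenicity": 0.55,
     "binds": {"HLA-A*02:01": 1, "HLA-B*07:02": 1, "HLA-C*07:02": 1}},
    {"peptide": "PEPTIDEAA", "immunogenicity": 0.72,
     "binds": {"HLA-A*02:01": 1, "HLA-A*01:01": 1, "HLA-B*07:02": 1, "HLA-B*40:01": 1}},
    {"peptide": "PEPTIDECC", "immunogenicity": 0.41,
     "binds": {"HLA-A*02:01": 1, "HLA-A*01:01": 1, "HLA-B*40:01": 1}},
    {"peptide": "PEPTIDELL", "immunogenicity": 0.90,
     "binds": {"HLA-A*02:01": 1}},
]
toy_shortlist = shortlist(toy_rows)
```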

The positional weight matrices were obtained from [7], and the 9613 9-mers generated from the 10 2019-nCoV ORFs were scored for their TCR recognition potential.
We further analysed the MHC binding propensities and gathered peptides that were not only predicted positive by NetMHCpan but also have leucine (L) and valine (V) at anchor positions 2 and 9, respectively. This led to the identification of 44 2019-nCoV peptides, of which 2 peptides had an immunogenicity score > 0.7 and 12 peptides > 0.6 (Table 5). Thus, we provide here the list of peptides that are potential targets for 1G4 TCR recognition in subjects with the HLA-A*02:01 haplotype.
Peptides have a geometric mean ≥ 0.6 and ≤ 0.7 (for those ≥ 0.7, refer to Table 4) by the 1G4 TCR positional weight matrix and are predicted positive for HLA-A*02:01 binding by NetMHCpan 4.0 (Rank = NetMHCpan rank).
In this study, we provide a profile of computationally predicted immunogenic peptides from 2019-nCoV for functional validation and potential vaccine development. We are fully aware that effective vaccine development will require a very thorough investigation of the immune correlates of 2019-nCoV. However, given the emergency and severity of the outbreak, as well as the lack of access to samples from infected subjects, such approaches would not meet the urgency. Computational prediction is therefore instrumental in guiding biologists towards a quick and cost-effective solution to prevent the spread and ultimately help eliminate the infection from individuals.
With rising global concern over the novel coronavirus outbreak, numerous research groups have started to investigate and publish their findings. At the time of preparing this manuscript, we became aware of a similar study comparing the 2019-nCoV proteome with SARS CoV immunogenic peptides [8]. Our in silico approach takes the search beyond presenting only the immunogenic peptides common to SARS CoV and 2019-nCoV and provides the experimental community with a more comprehensive list that includes de novo and cross-reactive candidates. At the same time, because the two studies were carried out independently with distinct approaches, their agreement lends confidence to the reproducibility of the results. Reproducibility of computational prediction is always of high importance and becomes even more significant under urgent scenarios such as this outbreak.
Our study also suggests the need for further efforts to develop accurate predictive models and algorithms for the characterization of immunogenic peptides.
In this study, we provide potential immunogenic peptides from 2019-nCoV as vaccine targets that i) have been characterized as immunogenic by previous studies on SARS CoV, ii) have a high degree of similarity with immunogenic SARS CoV peptides and iii) are predicted to be immunogenic by a combination of NetMHCpan and iPred/1G4 TCR positional weight matrices. Given the limited time and resources, our work serves as a guide to save time and cost in further experimental validation.
2019-nCoV open reading frame sequences were downloaded from NCBI (MN908947.3). All sequences subjected to analysis are deposited in the GitHub repository.
The sequence similarity between 2019-nCoV open reading frames and previously characterized immunogenic peptides in IEDB was analysed by local alignment using the R ‘pairwiseAlignment’ function from the Biostrings v2.40.2 package. The local alignment used the BLOSUM62 matrix with gapOpening of 5 and gapExtension of 5. The alignment score was normalized by the length of the target peptide.
We used iPred [5] to predict the immunogenicity of each peptide. Briefly, iPred uses peptide length and the physicochemical properties of amino acids, modelled by sums of ten Kidera factors, and assigns each peptide a score reflecting its likelihood of recognition by a T cell.
To predict peptide binding to MHC, we used NetMHCpan 4.0 [6]. This version of NetMHCpan, which comes with a number of improvements, incorporates both eluted ligand and peptide binding affinity data into a neural network model to predict the MHC presentation of each given peptide.
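To give a feel for the scale of the length-normalized score, the Python sketch below computes it for the special case of an exact match, where the best local alignment is the ungapped identity alignment and the raw score is simply the sum of the BLOSUM62 diagonal entries of the peptide's residues. Only the diagonal of BLOSUM62 is included, so this is not a general aligner and does not reproduce the Biostrings call; the function name is an assumption.

```python
# Diagonal (self-substitution) scores of the BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def exact_match_normalized_score(peptide):
    """Length-normalized BLOSUM62 score of a peptide aligned to an identical copy of itself."""
    raw_score = sum(BLOSUM62_DIAGONAL[residue] for residue in peptide)
    return raw_score / len(peptide)


# 'GMSRIGMEV' (Table 1): 6 + 5 + 4 + 5 + 4 + 6 + 5 + 5 + 4 = 44, so 44 / 9 is about 4.89.
example_score = exact_match_normalized_score("GMSRIGMEV")
```

Because every BLOSUM62 diagonal entry is at least 4, an exact match always has a normalized score of 4 or more under this scheme, which is consistent with using 4 as the cutoff for highly similar peptides.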
To gauge the level of 1G4 TCR cross-reactivity to 2019-nCoV peptides, we leveraged the data from a recently published study [7]. 1G4, an NY-ESO-1-specific TCR, is a very well-studied and clinically efficacious TCR that recognizes the peptide ‘SLLMWITQC’ presented by HLA-A*02:01. Karapetyan et al. recently provided data from three experimental assays reflecting Binding, Activating and Killing upon each mutation at each position, and used these three datasets to score all possible 9-mers. In a similar way to the original paper, we trained three Position Weight Matrices (PWMs), named B, A and K, from the Binding, Activating and Killing assays, respectively. We defined the cross-reactivity score of a given 9-mer sequence as the geometric mean of its B, A and K scores.
We then scanned the 2019-nCoV protein sequences with each of the B, A and K PWMs and associated each of the 9613 9-mers with a cross-reactivity score. At the same time, we used NetMHCpan to associate each 9-mer with its presentation score. Our final list of cross-reactive candidate peptides comprised those with a cross-reactivity score >= 0.8 that were reported as strong binders by NetMHCpan and have ‘L’ and ‘V’ amino acids at the anchor positions. The custom R code is accessible from the GitHub repository (see software availability [4]).
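The geometric-mean definition can be sketched in Python as follows. The PWM representation (one residue-to-weight dictionary per position), the rule of averaging per-position weights into a single PWM score, and the toy matrices are all assumptions for illustration; the source does not specify how each PWM score is computed, and the real matrices are derived from the published assay data. Only the final step, the geometric mean of the three scores, follows the definition in the text.

```python
INDEX_PEPTIDE = "SLLMWITQC"          # 1G4 cognate peptide named in the text
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pwm(mismatch_weight):
    """Hypothetical PWM: weight 1.0 for the index residue, `mismatch_weight` otherwise."""
    return [
        {aa: (1.0 if aa == ref else mismatch_weight) for aa in AMINO_ACIDS}
        for ref in INDEX_PEPTIDE
    ]


def pwm_score(peptide, pwm):
    """Average the per-position weights of a 9-mer (assumed scoring rule)."""
    return sum(pwm[pos][aa] for pos, aa in enumerate(peptide)) / len(peptide)


def cross_reactivity_score(peptide, pwm_b, pwm_a, pwm_k):
    """Geometric mean of the Binding, Activating and Killing PWM scores."""
    b = pwm_score(peptide, pwm_b)
    a = pwm_score(peptide, pwm_a)
    k = pwm_score(peptide, pwm_k)
    return (b * a * k) ** (1.0 / 3.0)  # requires non-negative scores


# Made-up matrices in which killing is the most mutation-sensitive readout.
toy_b, toy_a, toy_k = toy_pwm(0.8), toy_pwm(0.5), toy_pwm(0.2)
index_score = cross_reactivity_score(INDEX_PEPTIDE, toy_b, toy_a, toy_k)
variant_score = cross_reactivity_score("SLLMWITQV", toy_b, toy_a, toy_k)
```

One property of the geometric mean is that a low score in any single assay pulls the combined score down more strongly than an arithmetic mean would, so a peptide needs to score reasonably in all three assays to rank highly.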
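The final selection step combines three conditions, which the following Python sketch makes explicit. The record layout, field names and toy values are hypothetical, and the strong-binder flag is assumed to have been derived beforehand from NetMHCpan output; this is not the authors' R implementation.

```python
def has_anchor_residues(peptide, p2="L", p9="V"):
    """True if a 9-mer carries the given residues at positions 2 and 9 (1-based)."""
    return len(peptide) == 9 and peptide[1] == p2 and peptide[8] == p9


def select_candidates(records, min_score=0.8):
    """Keep peptides passing the score cutoff, the binder flag and the anchor check."""
    return [
        record["peptide"]
        for record in records
        if record["score"] >= min_score
        and record["strong_binder"]
        and has_anchor_residues(record["peptide"])
    ]


# Toy records with made-up scores and binder calls.
toy_records = [
    {"peptide": "SLLMWITQV", "score": 0.85, "strong_binder": True},   # passes all three
    {"peptide": "SLLMWITQC", "score": 0.95, "strong_binder": True},   # C at position 9
    {"peptide": "ALAAAAAAV", "score": 0.75, "strong_binder": True},   # score below cutoff
    {"peptide": "GLAAAAAAV", "score": 0.90, "strong_binder": False},  # not a strong binder
]
toy_candidates = select_candidates(toy_records)
```

Keeping the cutoff as a parameter makes it straightforward to relax the threshold, for example to inspect peptides in a lower score band.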
Replication code: https://github.com/ChloeHJ/Vaccine-target-for-2019-nCoV.git
Archived source code at time of publication: http://doi.org/10.5281/zenodo.3676908 [4]
Zenodo: In silico identification of vaccine targets for 2019-nCoV (Data tables). http://doi.org/10.5281/zenodo.3676886 [9]
This project contains the following underlying data:
– Table1 nCoV peptides having exact match with immunogenic SARS CoV peptides.xlsx (Table of nCoV peptides having exact match with immunogenic SARS CoV peptides)
– Table2 nCoV peptides with high sequence similarity with immunogenic IEDB peptides.csv (Table of peptides with high sequence similarity with immunogenic IEDB peptides)
– Table3 de novo search on 9-mer nCoV for immunogenic peptides by NetMHCpan and iPred.csv (Table of results of de novo search on 9-mer nCoV for immunogenic peptides by NetMHCpan and iPred)
– Table4 de novo search on 9-mer nCoV for immunogenic peptides by NetMHCpan and PWM.xlsx (Table of results of de novo search on 9-mer nCoV for immunogenic peptides by NetMHCpan and PWM)
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
We further acknowledge and appreciate the assistance and computing support from Unit and WIMM Centre for Computational Biology at the MRC Weatherall Institute of Molecular Medicine. We thank G. Napolitani and M. Salio for insightful discussions about the project.