Keywords
SARS-CoV-2, S protein, peptides, variants, HLA alleles, Moroccan population
This article is included in the Bioinformatics gateway.
The coronavirus disease 2019 (COVID-19) is an infectious disease, caused by the new coronavirus known as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), and exhibits diverse clinical outcomes and symptoms in infected individuals, emphasizing the need to investigate how human genetic diversity influences the virus’s impact. This study aims to employ in silico methods to identify epitopes capable of eliciting an immune response, focusing on the most prevalent HLA-I and HLA-II alleles in the Moroccan population.
Our research consisted of predicting peptide-binding affinities between the most prevalent HLA Class I and Class II alleles in the Moroccan population and SARS-CoV-2 spike glycoprotein (S protein) peptides of variants isolated from strains of Moroccan patients. We performed the same analyses for the SARS-CoV-2 wild-type S protein to assess the ability of these HLA alleles to interact with peptides in the presence or absence of SARS-CoV-2 mutations.
In a broader sense, 12 distinct HLA Class I and Class II alleles in the Moroccan population have been identified as possibly interacting with 19 epitopes in the SARS-CoV-2 S protein. Findings of this study must be validated in both in vitro and in vivo models.
These data may help clarify the issue of host cell susceptibility and the outcome of SARS-CoV-2 infection, and may guide further research to uncover potential targets for the vaccination strategy.
The COVID-19 pandemic has caused unparalleled economic and social disruption across the world. COVID-19 is a respiratory illness that results from infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was initially identified in December 2019 in Wuhan, China, and quickly spread throughout the world.1,2 By December 2022, the World Health Organization (WHO) had reported over 651 million cases of COVID-19, with more than 6 million deaths attributed to the disease. The first case of COVID-19 in the Kingdom of Morocco was registered on March 2, 2020, in Casablanca. The patient had acute pneumonia, and the infection was imported from Europe.3 Between January 3, 2020, and August 16, 2023, the Ministry of Health in Morocco reported a total of 1 275 320 confirmed cases of COVID-19 with 16 297 deaths within the country. The mortality rate is stated at 1.3% (Ministry of Health, Morocco, CNOUSP report, April 2023).
A notable characteristic of COVID-19 that continually surprises us is the extensive range of clinical symptoms that patients exhibit across different populations. These symptoms range from mild to severe, with severe cases potentially leading to pneumonia, respiratory failure, multi-organ failure, and death.4 These differences highlight the significance of studying and understanding human genetic variation during infections. Previous studies have linked the susceptibility and outcomes of multiple infectious diseases to the genetic background of the host. The Human Leukocyte Antigen system is believed to be among the components that could explain differences in virus susceptibility and severity, and it has been recommended as a potential genetic factor that affects a person’s immune response to SARS-CoV-2.5,6
The human leukocyte antigen (HLA) system, a major component of the adaptive immune system, is responsible for recognizing and binding both endogenous and exogenous antigens.5 HLA molecules are classified into two classes: HLA class I and HLA class II. While HLA class II (DR, DQ, DP) molecules are primarily responsible for displaying peptides from external pathogens, HLA class I (A, B, C) molecules are crucial for immune protection against intracellular pathogens. The genes encoding the proteins of this system show an exceptionally high degree of polymorphism. This variability is considered an important factor in determining a person’s resistance or susceptibility to specific infectious illnesses.6,7
The SARS-CoV-2 genome encodes a number of structural proteins, including the spike protein (S), envelope protein (E), membrane glycoprotein (M), and nucleocapsid phosphoprotein (N), as in other coronaviruses. Additionally, it encodes nonstructural proteins, such as open reading frame 1ab (ORF1ab), ORF3a, ORF6, ORF7a, ORF8, and ORF10.8 This study focused on the spike protein because it is the primary antigen on the surface of the virus and facilitates SARS-CoV-2 entry into human host cells,9,10 and because its high mutation rate enables it to alter its shape and escape host immune responses.11 The spike protein, also known as the S protein, is made up of two subunits, S1 and S2, which are connected by a furin cleavage site. The S1 subunit comprises the receptor-binding domain (RBD), which is responsible for the virus’s ability to adhere to the host cell membrane and initiate infection. The S2 subunit has a hydrophobic fusion loop that facilitates membrane fusion. Therapeutic developments against SARS-CoV-2 often target the RBD.10 The spike is frequently studied for the development of neutralizing antibodies and vaccines, and is widely considered a successful target for detection purposes.12–17
Because they are fast and efficient, computational techniques offer clear advantages over laboratory tests in the early stages of drug development: they can predict the antigenic epitopes of specific viral proteins.18
In Morocco, like in other countries, the COVID-19 pandemic had a noticeable impact on the health system. Although many risk factors of COVID-19 severity have been described, data from North Africa are limited. This study used an immunoinformatics approach to predict the peptide-binding affinity between the most frequent HLA Class I and Class II alleles in the Moroccan population and SARS-CoV-2 S protein peptides of variants isolated from strains of Moroccan patients. The same analysis was also performed for the SARS-CoV-2 wild-type S protein to assess the ability of these HLA alleles to interact with peptides in the presence or absence of the SARS-CoV-2 mutation, and thus predict which epitopes would be most effective at acting as potent immunogens.
The most frequent HLA class I and class II alleles in the Moroccan population were obtained from the Allele Frequency Net Database (http://www.allelefrequencies.net/pop6001a.asp), which compiles data from studies conducted in different regions of Morocco. The average allelic frequency of each allele was then computed using data from multiple regions.
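The averaging step can be illustrated with a minimal Python sketch. It assumes an unweighted mean across regional studies; the source does not state whether frequencies were weighted by sample size. The function name, region labels, and frequency values below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from statistics import mean


def average_allele_frequencies(records):
    """Average the reported frequency of each allele across regional studies.

    `records` is an iterable of (allele, region, frequency) tuples.
    Assumption: a simple unweighted mean, not weighted by study size.
    """
    by_allele = defaultdict(list)
    for allele, _region, frequency in records:
        by_allele[allele].append(frequency)
    return {allele: mean(values) for allele, values in by_allele.items()}


# Hypothetical regional frequencies, for illustration only.
records = [
    ("DRB1*07:01", "Region A", 0.15),
    ("DRB1*07:01", "Region B", 0.17),
    ("DRB1*03:01", "Region A", 0.14),
]
averages = average_allele_frequencies(records)

# Rank alleles from most to least frequent on average.
most_frequent = sorted(averages, key=averages.get, reverse=True)
```

If the regional studies differ greatly in sample size, a weighted mean may be more appropriate than the unweighted one sketched here.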
To evaluate the HLA-peptide-binding affinity predictions, we obtained the mutated sequences of the SARS-CoV-2 S protein for each variant of concern, including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Kappa (B.1.617.1), Delta (B.1.617.2), and Omicron (B.1.1.529), from strains of Moroccan patients available on the GISAID databank.
The reference genome (Wuhan-Hu-1 strain) was obtained from the NCBI RefSeq database under accession number NC_045512.2. The amino acid sequence of the S protein from the reference genome was retrieved in FASTA format and used as a reference sequence for comparison with the sequences of the variants.
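As an illustration of how a variant sequence can be compared with the reference, the sketch below lists the differences between two sequences. It assumes the sequences have already been aligned to equal length, with `-` marking gaps; the alignment step itself is not shown, and the function name and the toy sequences are hypothetical rather than taken from the study.

```python
def list_differences(reference, variant):
    """List amino acid changes between two pre-aligned sequences.

    Assumption: both strings come from a pairwise alignment, have the
    same length, and use '-' for gaps. Positions are numbered on the
    reference sequence (1-based).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    ref_pos = 0
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa != "-":
            ref_pos += 1  # only reference residues advance the numbering
        if ref_aa == var_aa:
            continue
        if var_aa == "-":
            changes.append(f"{ref_aa}{ref_pos}del")
        elif ref_aa == "-":
            changes.append(f"ins{ref_pos}{var_aa}")
        else:
            changes.append(f"{ref_aa}{ref_pos}{var_aa}")
    return changes


# Toy aligned sequences, for illustration only.
toy_reference = "MFVFLVLLPL"
toy_variant = "MFVFLVLRP-"
toy_changes = list_differences(toy_reference, toy_variant)
```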
The NetMHCpan v4.0 and NetMHCIIpan v3.2 programs were used to predict HLA peptide-binding affinity for HLA class I and class II alleles, respectively. The FASTA sequences from both the GISAID databank and the NCBI database were imported and analyzed. Using these sequences, we predicted the binding affinity of each HLA allele to all potential overlapping 10-mer (HLA class I) and 15-mer (HLA class II) peptides.
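The enumeration of overlapping peptides can be sketched as a sliding window over the protein sequence. This is only an illustration of the idea: the prediction servers generate the peptides internally, and the helper names below are hypothetical. The example fragment is the 15-mer reported for the Delta variant in this article.

```python
def overlapping_peptides(sequence, length):
    """Return (1-based start, peptide) pairs for every window of `length`."""
    return [
        (start + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def peptides_covering(sequence, length, position):
    """Keep only the windows that contain the 1-based residue `position`."""
    return [
        (start, peptide)
        for start, peptide in overlapping_peptides(sequence, length)
        if start <= position < start + length
    ]


# Positions are relative to this fragment, not to the full S protein.
fragment = "SSQCVNLRTRTQLPP"
class_i_windows = overlapping_peptides(fragment, 10)   # 10-mers for class I
class_ii_windows = overlapping_peptides(fragment, 15)  # 15-mers for class II

# Windows that include residue 8 of the fragment (the arginine).
covering_r8 = peptides_covering(fragment, 10, 8)
```

Restricting the windows to those covering a mutated position is one simple way to focus on peptides that carry the mutation.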
We used Vaxijen v2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) with the virus model to assess the potential antigenicity of the predicted peptides; peptides scoring below the 0.5 threshold were eliminated as non-antigenic.
The antigenicity response assesses the capability of the proposed epitopes to prompt an immune response.
Toxicity of the predicted peptides was assessed using the ToxinPred server (https://webs.iiitd.edu.in/raghava/toxinpred/multi_submit.php). This server distinguishes toxic from non-toxic peptides in a large pool of submitted peptides by analyzing their key physico-chemical properties, such as hydrophobicity, hydropathicity, amphipathicity, molecular weight, pI, and charge.
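The two filtering steps (antigenicity, then toxicity) can be combined as in the following minimal sketch. It assumes the scores and toxicity calls have already been collected from the web servers; the `Candidate` type and `keep_candidates` function are hypothetical names, the two Vaxijen scores are those reported for these peptides in this article, and the toxicity flags are placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peptide: str
    vaxijen_score: float  # antigenicity score copied from the server output
    toxic: bool           # toxicity call copied from the server output


def keep_candidates(candidates, threshold=0.5):
    """Keep peptides that reach the antigenicity threshold and are non-toxic.

    Assumption: the threshold is a tunable parameter; 0.5 is used here
    as the default for illustration.
    """
    return [
        c for c in candidates
        if c.vaxijen_score >= threshold and not c.toxic
    ]


# Toxicity flags below are placeholders, not results from the study.
candidates = [
    Candidate("CNDPFLGVYH", 0.4600, toxic=False),
    Candidate("GGNYNYRYRL", 1.2747, toxic=False),
]
retained = keep_candidates(candidates)
```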
The HLA binding affinity frequencies of peptides from the mutated S protein were compared with those from the reference S protein (Wild-Type) to identify which HLA alleles had varying binding affinities in the presence of mutations.
This analysis aimed to identify mutated peptides that bind well to the most frequent HLA alleles in the Moroccan population.
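A wild-type versus mutated comparison of this kind can be sketched as below. The sketch assumes that binding is summarized as a percentile rank in which lower values mean stronger binding, and that a single cutoff separates binders from non-binders. The source does not state which score or cutoff was used, so the 2.0 value, the function name, and the example ranks are illustrative assumptions only.

```python
def compare_binding(wt_rank, mt_rank, cutoff=2.0):
    """Classify how a mutation changes predicted binding to one HLA allele.

    Assumptions: ranks are percentile ranks (lower = stronger binding)
    and `cutoff` is an illustrative threshold, not the study's criterion.
    """
    wt_binds = wt_rank <= cutoff
    mt_binds = mt_rank <= cutoff
    if mt_binds and not wt_binds:
        return "gained"
    if wt_binds and not mt_binds:
        return "lost"
    return "retained" if mt_binds else "non-binder"


# Hypothetical (wild-type rank, mutated rank) pairs, for illustration only.
examples = {
    "peptide_1": (5.0, 0.8),
    "peptide_2": (1.0, 6.0),
    "peptide_3": (0.4, 0.3),
    "peptide_4": (12.0, 9.0),
}
outcomes = {name: compare_binding(wt, mt) for name, (wt, mt) in examples.items()}
```

Applying the function to every allele and peptide pair yields a table of gained, lost, and retained binders per allele.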
Table 1 lists the most frequent HLA class I and class II alleles in the Moroccan population, according to the Allele Frequency Net Database. The data reveal a prevalence of HLA class I alleles over class II alleles, with a total of 10 HLA-A, 6 HLA-B, 7 HLA-C, and 7 HLA-DRB1 alleles identified.
HLA Class II allele (HLA-DRB1) | Frequency |
---|---|
DRB1*01:02 | 0.0355 |
DRB1*03:01 | 0.152 |
DRB1*04:02 | 0.059 |
DRB1*04:05 | 0.045 |
DRB1*07:01 | 0.163 |
DRB1*13:02 | 0.071 |
DRB1*15:01 | 0.105 |
In Table 2, mutations identified in the SARS-CoV-2 S protein isolated from strains of Moroccan patients are presented, sourced from the GISAID databank. A total of 23 mutations were identified on the spike protein across the different variants of SARS-CoV-2. The mutations highlighted in bold are the common ones found among all six variants, while each variant also includes unique mutations not observed in other variants of concern.
Our dataset for peptide-binding affinity prediction comprises only peptides with a mutation in their core. Two hundred and twenty peptides were identified based on the results from NetMHCpan servers. Non-immunogenic or toxic peptides were discarded from further analysis. Consequently, eighty-five immunogenic peptides in the SARS-CoV-2 S protein were retained for the assessment of binding affinities (as shown in Table 3).
The comparison of HLA class I binding repertoires between wild-type and mutated S protein peptides revealed that peptides carrying 10 mutations in the S protein had the potential to bind to HLA-A*02:01, HLA-A*29:02, HLA-A*68:02, HLA-B*45:01, and HLA-C*16:01.
Among these HLA alleles, HLA-A*02:01 and HLA-C*16:01 had one good binder each, with T547K and T376A, respectively. HLA-A*29:02 had three good binders, with H69-V70, P681R, and D614G. For HLA-A*68:02, four good binders were found, with H69-V70, Y144-145, L452R, and T547K. Similarly, HLA-B*45:01 had four good binders, with E484K, E484Q, E484A, and D614G.
Regarding HLA Class II, nine mutated peptides were found to be able to interact with HLA-DRB1*01:02, HLA-DRB1*03:01, HLA-DRB1*04:02, HLA-DRB1*04:05, HLA-DRB1*07:01, HLA-DRB1*13:02, and HLA-DRB1*15:01.
Among these, HLA-DRB1*01:02, HLA-DRB1*04:02, and HLA-DRB1*04:05 had a common good binder with V213G. HLA-DRB1*03:01 and HLA-DRB1*15:01 had four good binders each, which were (H69-V70, K417N, K417T, and D614G) and (Y144-145, K417N, K417T, and V213G), respectively. HLA-DRB1*07:01 and HLA-DRB1*13:02 had five good binders each, with (K417N, K417T, L452R, T19R, and V213G) and (N501Y, H69-V70, K417T, T19R, and V213G), respectively. Table 4 provides more details on the peptide sequences.
The HLA-DRB1*13:02 allele was the only one to exhibit good binding affinity to the N501Y mutation, which emerged in the Alpha, Beta, and Gamma lineages. The H69-V70, Y144-145, L452R, and D614G mutations were presented by both HLA class I and class II alleles, while the K417N, K417T, and V213G mutations were presented only by class II alleles.
NetMHCIIpan v3.2 did not predict any binding peptides for the most frequent HLA class II alleles in the Moroccan population for either the Spike V6F mutation or the corresponding wild-type spike sequence. All predicted peptides for the T478K mutation in the receptor-binding domain (RBD) of the Delta variant were deemed non-antigenic by the Vaxijen tool. This mutation is believed to help evade recognition by the immune system,19 particularly with regard to antibodies and neutralization, or to impair the interaction between the RBD and drugs.20 Tables 5 and 6 present the antigenicity scores of the selected binders.
Table 5. Vaxijen scores (antigenicity) of the predicted MHC class I allele binding peptides.

Variant | Peptide | Vaxijen score | Final decision |
---|---|---|---|
Alpha | NVTWFHAISG | 0.6203 | - |
Delta | TNSRRRARSV | 1.1716 | * |
Alpha/Beta/Gamma/Delta | YQGVNCTEVP | 1.0639 | * |
Alpha | CNDPFLGVYH | 0.4600 | - |
Delta | GGNYNYRYRL | 1.2747 | * |
Beta/Gamma/Delta | GVKGFNCYFP | 0.7415 | - |
Kappa | GVQGFNCYFP | 0.7874 | - |
Omicron | GVAGFNCYFP | 1.2211 | * |
Omicron | NFNFNGLKGT | 1.3565 | * |
Omicron | SFSAFKCYGV | 0.5764 | - |

Table 6. Vaxijen scores (antigenicity) of the predicted MHC class II allele binding peptides.

Variant | Peptide | Vaxijen score | Final decision |
---|---|---|---|
Alpha | FSNVTWFHAISGTNG | 0.8214 | - |
Beta | VRQIAPGQTGNIADY | 1.1378 | * |
Gamma | VRQIAPGQTGTIADY | 1.1046 | * |
Alpha/Beta/Gamma/Delta | NQVAVLYQGVNCTEV | 0.7970 | - |
Delta | DSKVGGNYNYRYRLF | 1.0304 | * |
Alpha/Beta/Gamma | QSYGFQPTYGVGYQP | 0.7291 | - |
Alpha | QFCNDPFLGVYHKNN | 0.7581 | - |
Delta | SSQCVNLRTRTQLPP | 1.2742 | * |
Omicron | PINLGRDLPQGFSAL | 0.9873 | * |
Because of their efficiency and speed, computational approaches have emerged as powerful alternatives for the development of diagnostic tools for infectious diseases, as well as new immunotherapies and vaccines.10 Computational tools that predict antigenic epitopes in viral or bacterial proteins are valuable in the development of new drugs.18–21 It has been recommended to employ these tools before performing laboratory experiments because they are more efficient and faster to use.22 Several studies have utilized immunoinformatic approaches on different SARS-CoV-2 proteins to design potential epitope vaccine candidates against SARS-CoV-2.8,23–31 In this study, we designed SARS-CoV-2 S protein peptides that bind to the most frequent HLA class I and class II alleles in the Moroccan population. The aim was to evaluate the potential of these peptides to elicit an immune response using in silico methods. The SARS-CoV-2 S protein was chosen as the target because it is the most mutated structural protein of SARS-CoV-2 and is frequently studied for its role in target recognition and cellular entry. This protein promotes viral infection and is essential for the development of neutralizing antibodies and vaccines.10,12 Additionally, the S protein is commonly acknowledged as a suitable target for detection purposes.16
In this study, we selected twenty-three mutations from five SARS-CoV-2 variants (Alpha, Beta, Gamma, Delta, and Omicron) as they are considered key mutations associated with higher transmission and reinfection rates.32,33 We evaluated the ability of class I and class II HLA molecules to present the mutated peptides of the SARS-CoV-2 spike protein. Our findings showed that HLA class I molecules had slightly more good binders than HLA class II molecules (10 versus 9).
The peptides SSQCVNLRTRTQLPP, VRQIAPGQTGNIADY, VRQIAPGQTGTIADY, DSKVGGNYNYRYRLF, and PINLGRDLPQGFSAL were selected as immunogens for HLA class II based on their antigenic scores (1.2742, 1.1378, 1.1046, 1.0304, and 0.9873, respectively), non-allergenicity, and lack of toxicity. For HLA class I, the peptides NFNFNGLKGT, GGNYNYRYRL, GVAGFNCYFP, TNSRRRARSV, and YQGVNCTEVP exhibited the best antigenic scores of 1.3565, 1.2747, 1.2211, 1.1716, and 1.0639, respectively. These scores suggest that the proposed epitopes could stimulate an immune response. Given their binding affinity with the most frequent HLA alleles in the Moroccan population and their antigenicity, these epitopes may be promising candidates for vaccine development.
Potential immunogenic peptides have been identified as prospective COVID-19 vaccine targets using in silico studies. Previous research has indicated that HLA-A*02:01, among other alleles, exhibited the strongest binding to COVID-19 epitopes.34,35 Our study revealed that the “NFNFNGLKGT” peptide of the T547K mutation in the Omicron variant showed a higher antigenicity score and had the highest affinity to this particular allele. Consequently, this epitope could be a promising candidate for the development of a COVID-19 vaccine.
Regarding HLA-C, a peptide from the Omicron variant with the sequence SFSAFKCYGV was predicted to have high binding affinity with HLA-C*16:01. However, it was not selected due to its Vaxijen score of 0.5764. HLA-C has been previously reported to have a less distinctive peptide repertoire when compared to HLA-A and HLA-B.36 HLA-C*16:01 was found to be more prevalent among individuals who had a mild form of COVID-19 compared with those with severe or critical forms of the disease in a cohort of Spanish Mediterranean Caucasians.37
HLA-A*68:02 was the MHC-I molecule that bound the immune epitope of the S protein L452R mutation of the Delta variant (GGNYNYRYRL). A study conducted in Tapachula, Chiapas, found that although the frequency of HLA-A*68 was lower in COVID-19 patients who were ill, the allele provided 3.3 times more protection against a fatal outcome from SARS-CoV-2 infection in mestizo individuals.38 Both HLA-A*68:01 and HLA-A*68:02 have the ability to bind to a large number of SARS-CoV-2 peptides with varying degrees of affinity. Furthermore, a systematic review and meta-analysis found that HLA-A*68:02, along with other HLA class I and class II alleles, was associated with COVID-19 severity.39
HLA-A, HLA-B, HLA-C, and HLA-DRB1 may serve as potential indicators of the severity and likelihood of death from COVID-19. However, further research on a larger scale is required to confirm this hypothesis.
A literature review has revealed that HLA class I alleles may be deemed as determinants of either resistance or susceptibility to COVID-19. This is due to the fact that these alleles have the ability to bind to SARS-CoV-2 peptides, leading to the modulation of the immune response against the virus.40
Several studies have employed a similar approach to design peptides for the S protein of SARS-CoV-2. Baruah et al. found five CD8+ T cell epitopes (YLQPRTFLL, GVYFASTEK, EPVLKGVKL, VVNQNAQAL, and WTAGAAAYY), along with eight B cell epitopes, that are more likely to bind MHC class I alleles commonly found in China.41 In a second study, Bhattacharya et al. identified thirteen potential MHC-I antigenic peptides (SQCVNLTTR, GVYYHKNNK, GKQGNFKNL, GIYQTSNFR, VSPTKLNDL, KIADYNYKL, KVGGNYNYL, EGFNCYFPL, GPKKSTNLV, SPRRARSVA, LGAENSVAY, FKNHTSPDV, and DEDDSEPVL) and three potential MHC-II antigenic peptides (IHVSGTNGT, VYYHKNNKS, and FKNHTSPDV).42
Joshi et al. proposed the MHC-I ITLCFTLKR peptide as a potential vaccine candidate,43 while another study conducted on the Brazilian population found 24 epitopes that bind to 17 different MHC-I alleles.44 Other studies have also identified B-cell epitopes on spike protein for developing a protective vaccine against SARS-CoV-2.23,24,45
The current study’s findings are novel and have not been previously published. These findings are valuable for the development of broadly accessible vaccine epitopes targeting SARS-CoV-2, and can also offer valuable insights for investigating T-cell responses.
Nevertheless, to confirm their immunogenicity against SARS-CoV-2, further in vitro experimental validation or in vivo studies are necessary.
This is the first Moroccan in silico study to assess potential immunogenic peptides within the S protein of various SARS-CoV-2 variants according to the most frequent HLA alleles in the Moroccan population, using an immunoinformatic approach. The findings of the current study have not been published previously. To sum up, we identified 19 epitopes in the SARS-CoV-2 S protein that can bind to 12 distinct HLA Class I and Class II alleles in the Moroccan population and that are predicted to be capable of triggering an immune response. However, to validate their immunogenicity against SARS-CoV-2, additional in vitro experimental validation or in vivo studies are essential.
MF: Methodology, Data curation, Formal analysis, Writing – original draft. BB: Data curation, Writing – Review & Editing. HO: Visualization, Writing – review & editing, Investigation. KS: Conceptualization, Methodology, Formal analysis, Supervision, Writing – review & editing.
Figshare: Dataset - Design of SARS-CoV-2 protein S peptides recognized by the most frequent HLA alleles in the Moroccan population using an immunoinformatics approach. https://doi.org/10.6084/m9.figshare.25737534.v1. 46
The dataset contains the following data:
• Data.docx: COVID-19 Wild Type Sequence and Selected Mutations from Various Variants
• Data.xlsx: Peptide-HLA Class I Binding Affinity Assessment Wild Type (WT) and Mutated (MT) Peptide Binding Scores
• Data.xlsx: Peptide-HLA Class II Binding Affinity Assessment Wild Type (WT) and Mutated (MT) Peptide Binding Scores
• Data.docx: SARS-CoV-2 Sequences of different variants retrieved from GISAID Databank
• Data.xlsx: Table 1. Averages frequency of most common HLA class I and II alleles in the Moroccan population
• Data.xlsx: Table 2. SARS-CoV-2 S protein mutations isolated from strains of Moroccan patients (n=23)
• Data.xlsx: Table 3. SARS-CoV-2 S peptides for Class I and II HLA alleles of wild and mutated types
• Data.xlsx: Table 4. Sequences of the S protein good binders with their HLA alleles
• Data.xlsx: Table 5. Vaxijen scores (antigenicity) of the predicted MHC class I allele binding peptides.
• Data.xlsx: Table 6. Vaxijen scores (antigenicity) of the predicted MHC class II allele binding peptides.
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Extended data is not applicable in this instance. All relevant data and materials utilized in our study have been comprehensively outlined in the ‘Materials and Methods’ section and provided in the Data Availability section. This includes the data uploaded to the designated repository, as specified in the Data Availability section, which encompasses all materials pertinent to our research. We encourage readers to refer to the repository for access to the complete set of data and materials used in this study. Additionally, all servers, databases and methods utilized in this research are outlined in detail within the manuscript. For further inquiries regarding the data or materials, please contact the corresponding author.
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
No
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Immunoinformatics
Version 1: 20 May 24