Keywords
Human leukocyte antigen, major histocompatibility complex, MHC, binding, allele, netMHCpan, immune system
This article is included in the Bioinformatics gateway.
Human leukocyte antigen, major histocompatibility complex, MHC, binding, allele, netMHCpan, immune system
Human leukocyte antigen (HLA) alleles are critical components of the immune system’s ability to recognize and eliminate tumors and infections.1 Infectious diseases in particular are thought to be a major source of selective pressure on the major histocompatibility complex (MHC) region which encodes HLA alleles and is one of the most diverse regions of the human genome.2–8 There is large diversity in the antigenic peptide sequences which individual HLA alleles can recognize and ultimately present to the adaptive immune system,9 with a positive correlation between increased sequence diversity recognition and fitness.10
Tools that can predict the extent to which a given HLA allele may have an affinity for a given peptide have critical implications for our ability to understand and translationally leverage antigen-specific immune response pathways. For instance, MHC binding affinity predictors have been – or otherwise have the potential to be – used to evaluate an individual or population’s susceptibility to viral infection,11 to develop an understanding of specific autoimmune conditions,12 to improve transplantation technologies,13 or even to assist in the development of personalized cancer vaccines.14–18 Numerous peptide-MHC binding prediction tools exist, and are key components in broader antigen prediction methodologies.19–22
The most widely adopted MHC binding prediction tools rely on neural network models trained on binding affinity (BA) and/or eluted ligand (EL) data. The most commonly cited tool, netMHCpan,23,24 uses both BA and EL data in a neural network architecture with a single hidden layer to predict allele-specific binding affinities. MHCflurry25 attempts to improve upon netMHCpan by increasing the number of hidden layers and augmenting BA and EL training data with unobserved decoys. MHCnuggets26 again trains on BA and EL data but uses a different architecture, with a long short-term memory layer and a fully connected layer to improve its predictions further across different peptide lengths. Lastly, HLAthena,27 while most similar in architecture to netMHCpan, relies on independently generated EL data from mono-allelic cell lines for training.
We sought to better characterize the outputs of these tools over a large and diverse set of peptides, across different tools and HLA alleles, as well as quantify the stability of these predictions. We also sought to measure allelic binding preferences and whether they may enrich for foreign v. self peptides. In this study, we performed a comprehensive in silico analysis of peptides from multiple viral proteomes, the human proteome, and randomly generated peptides across HLA class I alleles.
FASTA-formatted protein sequence data was retrieved from the National Center of Biotechnology Information (NCBI)28,29 using RefSeq as of 1-31-22 for BK, SARS-CoV-2, HHV-5, HHV-6, HSV-1, HSV-2, HSV-4, and Human. Protein sequence data was inputted into netchop v3.0 “C-term” model with a cleavage threshold of 0.1 to remove peptides that were not predicted to undergo canonical MHC class I antigen processing via proteasomal cleavage (of the peptide’s C-terminus). The results from netchop v3.0 were then kmerized sequentially into 8- to 12-mers. Code used for kmerization and netchop filtering can be found at: https://github.com/Boeinco/peptide-MHCassess. We additionally generated a set of 1 million random peptides of length 8-12 drawn uniformly at random. Peptide sets had negligible overlap (<1% shared between human vs viral vs random peptides).
MHC class I binding affinity predictions were performed for the peptides generated from the kmerization process above using four tools: netMHCpan v4.1,23 HLAthena v1.0,27 MHCflurry v2.0,25 and MHCnuggets v2.3.30 netMHCpan was run with default options with the ‘-l’ option to specify peptides of lengths 8–12. MHCflurry was run with default options. MHCnuggets was run with default options. HLAthena was run using the dockerized version of HLAthena with default options, which predicts peptides of length 8–11. MHC class I binding affinity predictions were performed for each of 24, 26, and 2, HLA-A, -B, and -C alleles, respectively. Only alleles that were in common between all four tools were used (52 total alleles in common between 2489 possible alleles). Binding affinity values were converted to binding probability values for MHCflurry and MHCnuggets using 1- log (binding affinity) /log(50000) in order to match HLAthena and netMHCpan binding probability predictions. Alleles were grouped into supertypes when applicable using the HLA class I revised classification.31
Peptides were converted into physical property matrices using amino acid sequence mapping into a 4*kmer length matrix containing each amino acid’s properties in sequence. The following physical properties of the amino acids were encoded: side chain polarity was recorded as its isoelectric point (pI),32 the molecular volume of each side chain was recorded as its partial molar volume at 37°C,33 the hydrophobicity of each side chain was characterized by its simulated contact angle with nanodroplets of water34 and conformational entropy was derived from peptide bond angular observations among protein sequences without observed secondary structure.35
Each dimensional reduction was performed on the pooled set of k-mers. UMAP dimensionality was performed using uwot UMAP R implementation v0.1.11. PCA was performed using default prcomp() functions in base R v4.1.3.
For each peptide source, binned matrices were computed using the bin2() function with 40×40 (1600) bins from the Ash v1.0.15 package36 in R v4.1.3. Bin values were then divided by the total number of peptides to create bins with the % of total peptides. In order to compare between two peptide sources, a matrix, called the difference matrix, is created by subtracting one matrix of a peptide source from another. Taking the absolute value of each bin in the difference matrix, then summing the values together, results in a single metric ranging from 0–2 measuring the difference in binned density between two peptide sources, the value 2 indicating that no peptides were shared between bins and the value 0 indicating the same percentage of peptides in every bin (Figure 1).
Bin values were then divided by the total number of peptides to create bins with the % of total peptides. In order to compare between two peptide sources, a matrix, called the difference matrix, is created by subtracting one matrix of a peptide source from another. Taking the absolute value of each bin in the difference matrix, then summing the values together, results in a single metric ranging from 0–2.
For each allele-peptide source combination, the percentage of peptides predicted to bind with a binding probability score of 0.5 or greater was calculated for all processed peptides.37 A binding score of 0.5 is estimated to be equivalent to 250–300 nM depending on the tool used. For each peptide source, alleles were ranked from best to worst binders (most to least peptides ≥ 0.5 score) t. In order to compute allele ordering similarity between two peptide sources for a single tool, Spearman’s Rank Correlation Coefficient was calculated between the two sets of allele ranks.
For the random group 1 vs random group 2 analysis, we conducted 100 replicates of dividing the randomly generated peptides into two random groups and performed a Spearman rank test of allele ordering between these groups for each of the tools.
Intraclass correlation coefficients (ICCs) were calculated using the ICC () function from the IRR v0.84.1 R package.38 Binding prediction scores for all one million randomly generated peptides were separated by tool and HLA allele, and an ICC was calculated as the interrater reliability metric between the four tools for each allele. ICC was also between the four tools on a per peptide basis, each peptide receiving a score across four tools using predictions separated by tool and peptide.
An earlier version of this article can be found on bioRxiv (https://doi.org/10.1101/2022.12.04.518984). Source code can be found on Github.39
We first assessed the consistency of peptide-specific MHC I binding affinity predictions across four tools (MHCnuggets, MHCflurry, HLAthena, netMHCpan) and 52 different HLA alleles. We found substantial disagreement in peptide-specific predictions between each tool, independent of allele (Figure 2A), with median intraclass correlation coefficient (ICC) of 0.207 and only 0.48% of peptides having ICC >0.75. On a per-allele basis, we found a wide range in consistency of predictions across tools, with a mean intraclass correlation as low as 0.12 for A02:07 and as high as 0.64 for A23:01 (Figure 2B). Among all of the peptides predicted by at least one tool to bind to at least one allele, only 7.9% were consistently predicted across all tools to bind to the same allele (Figure 2C).
A) Histogram of intraclass correlation coefficients (ICC) calculated for a set of 1 million random peptides across four tools (MHCnuggets, MHCflurry, HLAthena, netMHCpan), with ICC calculated as the overall correlation among tools across 52 HLA alleles. The dotted vertical line indicates the median ICC value (0.207) across all peptides. B) Histogram of ICCs for 52 HLA alleles between four tools (MHCnuggets, MHCflurry, HLAthena, netMHCpan). The number of alleles is shown on the y-axis and the ICC is shown on the x-axis. The dotted lines show the mean ICC for alleles belonging to each HLA class. Red, green, and blue colors represent data from -A, -B, and -C alleles, respectively. C) Detailed comparison of the complete set of random peptides predicted to bind (binding score ≥ 0.5) to HLA alleles according to each of four tools. Patterns of agreement or disagreement among groups of peptides predicted by different combinations of tools across 1 million random peptides are shown along each column (e.g. the first column corresponds to peptides predicted by HLAthena while the final column corresponds to peptides predicted by all tools). Each row indicates the predictions associated with the indicated tool. The number of peptides in each column (vertical bars) corresponds to the size of the subset predicted by the indicated combination of tools.
We next investigated aggregate peptide binding predictions across different HLA alleles according to each tool. As others have noted differential HLA allelic promiscuity in peptide presentation,31,40–42 we too found a wide range in the proportion of peptides a given allele was predicted to bind (Underlying data: Supplementary Figure 137). We uncovered significant inconsistencies in these predictions between tools (Figure 3). Note that this phenomenon is independent of binding affinity threshold (Underlying data: Supplementary Figure 237).
The lower left grouping of plots displays scatter plots of peptides predicted to bind (≥ 0.5 binding probability score) between 2 tools with each point representing the number of predicted binders for each HLA allele. The upper right grouping represents the Spearman correlation of the number of peptides predicted to bind to all alleles between tools. Note that MHCnuggets has a number of alleles with 0 random peptides predicted to bind. The diagonal panels show distribution of HLA allelic presentation from the random proteome for each tool. The number of peptides that putatively bind to each of the HLA alleles is shown along the x-axis as a series of horizontal bars with green, orange, and purple colors representing HLA-A, -B, and -C alleles, respectively, sorted in order of decreasing quantity of binders.
As each allele has a different amount of training data, we were next interested in exploring to what extent the quantity and quality of training data available to each tool might influence its allele-specific predictions. Indeed, some netMHCpan predictive models for some alleles are based on as few as 101 peptides, while others from MHCflurry are based on as many as 31,775 peptides (Underlying data: Supplementary Table 137). Note that we excluded from consideration the ~95% of alleles (4108) that were available for prediction but had no underlying allele-specific training data available (Underlying data: Supplementary Table 237). Ultimately, we found that the amount of training data available was not significantly related to the consistency of binding predictions between tools (Figure 4A), nor was it clearly related to the quantity of binding peptides predicted by tools (Figure 4B).
A) Scatterplot of ICC vs mean training data across 4 tools with each point representing data for a single HLA allele. The mean number of training peptides is shown on the x-axis while the ICC score is shown on the y-axis. B) Scatterplot of the relationship between training data and predicted peptide binding. The number of peptides used as training data for an allele is shown on the x-axis whereas the number of peptides predicted to bind for the same allele is shown on the y-axis. Each dot is a single allele with each color representing a different tool: red circles (HLAthena), green triangles (MHCflurry), blue squares (MHCnuggets), purple plus signs (netMHCpan). We note that netMHCpan does not make all of their training data available, thus the depicted quantity of training data represents an estimate.
According to the pathogen driven selection theory of MHC evolution, different HLA alleles are anticipated to be particularly attuned to foreign as opposed to self-antigens.3,8,43–46 We therefore sought to compare the predicted capacity of different HLA alleles to present different viral vs. self-antigens. Further, we wished to establish which specific alleles had the propensity to bind a larger fraction of peptides in general (allele promiscuity) by observing the relationship between an allele’s ability to bind random peptides versus peptides from a viral or human proteome.
We examined distribution of predicted allelic promiscuity across alleles for nine sets of peptides of viral, human, and random origin (See Methods). Confining attention to human and viral proteomes, we again found a wide range in the proportion of peptides a given allele was predicted to bind and also significant inconsistencies between tools (Underlying data: Supplementary Figure 337).
We found that the alleles with highest mean binding percentage for human and viral peptides were B15:03 (2.68%) and B15:02 (2.36%) and the allele lowest mean binding percentage were B18:01 (0.24%) and A01:01 (0.33%) (Underlying data: Supplementary Table 337). No alleles were predicted by any tool to preferentially present either viral or human peptides. Further, the distribution of predicted allelic promiscuity across alleles was highly consistent between human and viral proteomes, but not when applied to a set of random peptides (Figure 5). We noted that this phenomenon holds for closely related viruses across all tools and to a lesser extent for more distantly related viruses (Underlying data: Supplementary Figure 437).
A) Heatmap of spearman correlation between peptide sources for HLAthena-based predictions for human peptides, viral peptides, and randomly generated peptides. Numbers show Spearman correlation coefficients between each pair respectively, while color reflects the Spearman correlation with red approaching a Spearman correlation of 1. Analogous data is shown for netMHCpan, MHCflurry, and MHCnuggets in panels B, C, and D, respectively.
Confining attention to the nine alleles whose predictive models were likely most robust (based on a minimum of 2000 training peptides for every tool), we again found that the distribution of predicted allelic promiscuity across alleles was consistent between closely related viruses and to a lesser extent between more distantly related viruses (Underlying data: Supplementary Figure 537).
Reasoning that differences in peptide characteristics were the likeliest explanation for predicted differences in binding affinity between different alleles and peptide sources, we next studied the distribution of physical properties among different peptide sets. Human, viral, and random peptide sets all exhibited the same range of physical properties but were differentially enriched among different physical properties (Underlying data: Supplementary Figure 637). Between individual peptide sets, the differential enrichment ranged from 10% (CMV v. human) to 63% (BK v. random) of peptides (Underlying data: Supplementary Figure 737).
We next sought to discover the relationship between the peptide similarity in physical property space and distribution of predicted allelic promiscuity across alleles. Across all tools, there was a positive relationship between similarity in physical property space and distribution of predicted allelic promiscuity across alleles as evidenced by the negative correlation between peptide set difference and Spearman correlation coefficient (Figure 6).
A) Scatterplot for HLAthena-based predictions, where each point represents predictions for a species vs species pair. Peptide dissimilarity is shown on the x-axis, whereas Spearman correlation coefficients of predicted allelic promiscuity across alleles. Color represents the length of peptide, with 8-, 9-, 10-, and 11-mers shown in red, green, blue and purple, respectively. Analogous data is shown for netMHCpan, MHCflurr, and MHCnuggets in panels B, C, and D, respectively.
Next, we found that each allele has distinct preferences for different peptide physical properties, independent of peptide length (Figure 7A, Supplementary Figure 837). Some alleles (e.g. A01:01 and B08:01) have stronger preference for certain physical properties (Figure 7B, C), while others (B45:01) do not have as clear of a preference (Figure 7D).
A) The plotting coordinates represent the first two dimensions of a UMAP transform of peptide physical properties, which is divided into 1600 (40×40) equivalently-sized square bins (see Methods). For each bin where there is at least one HLA allele with >0.2% difference in proportion of all peptides predicted to bind v. non-binders, the identity of the most enriched allele is shaded in the color corresponding to that allele’s supertype as corresponding to the legend. B-D) Example plots of three different alleles (A01:01, B08:01, and B45:01) with different distributions of binders. Each box represents enrichment as the percent peptide difference between predicted binders and non-binders for the given allele. The color scale shows the percent of peptides difference in the given box, with red meaning a larger number of predicted binders and blue meaning a larger number of predicted non-binders.
To the best of our knowledge, this is the first study to examine the consistency of predictions of peptide-MHC binding across different tools, and to explore the quality and quantity of training data in this context. We note several limitations to this work. Firstly, we confined attention to MHC class I peptides and did not include predictions for MHC class II,47 of which there are numerous alleles. We also excluded from consideration any potential contributions of proteasomal cleavage or other antigen processing machinery to MHC binding.48–50 We did not seek to comprehensively assess all available tools for peptide-MHC binding affinity prediction, but rather confined our attention to four of the most widely used tools. The majority of our randomly generated peptides are not known to be found in nature and may not represent the optimal background distribution for measuring allele promiscuity or interrater reliability between tools primarily used for human and pathogenic peptides. While our analysis of peptides leveraged four essential and well-described amino acid physical properties, there may exist unassessed latent features that could capture additional variance and improve dimensionally-reduced comparisons. We did not assess the extent to which mass spectrometry biases in the training datasets might affect peptide-MHC predictions.51–54 Lastly, we did not evaluate individual tool performance based on known epitopes as this has been previously reported.23–27,55–59
Our work raises fundamental questions about the fidelity of peptide-MHC binding prediction tools. Why, for instance, can predictions be so discordant among tools for which training datasets are otherwise so similar? We especially worry about the real-world use of these prediction tools for alleles without any direct basis in training data. Why is the predicted range of allele promiscuity so substantial, and yet not demonstrative of any meaningful differences in enrichment between potential foreign versus self-antigens? Moreover, is this differential promiscuity a universal biological phenomenon, with certain alleles being generally poor functional presenters of antigen? If this is the case, what selective advantage might have evolutionarily maintained these alleles in the population? Evaluating more viruses – as well as bacteria, fungi, and other pathogens – and linking these analyses with metrics such as evolutionary distance may give greater insight into the relationship between HLA evolution and disease.
Zenodo: Underlying data for ‘Discordant results among MHC binding affinity prediction tools’, https://doi.org/10.5281/zenodo.7850939. 37
This project contains the following underlying data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Source code available from: https://github.com/Boeinco/peptide-MHCassess
Archived source code at time of publication: https://doi.org/10.5281/zenodo.7803676. 39
License: MIT
Views | Downloads | |
---|---|---|
F1000Research | - | - |
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular Immunology, Bioinformatics
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: I am a founder, consultant and equity holder of Kelonia Therapeutics and Abata Therapeutics, an equity holder in 3T Biosciences, hold patents on technologies related to pMHC-TCR discovery, and have received consulting fees from Repertoire Immune Medicines.
Reviewer Expertise: I am an immunologist studying pMHC-TCR interactions
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Bioinformatics, Computational Biology, Machine learning, Gene-targeted Therapies, RNA Biochemistry
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | |||
---|---|---|---|
1 | 2 | 3 | |
Version 1 07 Jun 23 |
read | read | read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
Comments on this article Comments (0)