In silico analysis of cross reactivity among phospholipases from Hymenoptera species [version 1; peer review: awaiting peer review]

Background: Phospholipases are enzymes that hydrolyze membrane lipids and have been characterized in several allergenic sources, such as Hymenoptera species. However, cross-reactivity among phospholipase allergens is poorly understood. The objective of this study was to identify potential antigenic regions involved in cross-reactivity among phospholipase allergens using an in silico approach. Methods: In total, 18 amino acid sequences belonging to the phospholipase family and derived from species of the order Hymenoptera were retrieved from the UniProt database, and a phylogenetic analysis was performed to determine the closest molecular relationships. A multiple alignment was done to identify conserved regions, which were matched with the antigenic regions predicted by the ElliPro server. 3D models were obtained by homology modeling and were used to locate cross-reactive antigenic regions. Results: Phylogenetic analysis showed that the 18 phospholipases split into four monophyletic clades (named here A, B, C and D). Phospholipases from clade A shared an amino acid sequence identity of 79%. Antigenic patches predicted by ElliPro were located in highly conserved regions, suggesting that they could be involved in cross-reactivity in this group (Ves v 1, Ves a 1 and Ves m 1). Conclusions: This work advances the characterization of potential antigenic sites involved in cross-reactivity among phospholipases. Inhibition assays are needed to confirm our findings.


Introduction
Allergic diseases have become a public health problem; the genetic background of patients (atopy) and environmental conditions are considered the causes of the increased risk of developing allergic diseases 1 . Exposure to allergens (typically harmless antigens in the environment) also promotes an immune response mediated by IgE. Over the last few years, species belonging to the order Hymenoptera have been characterized as potential allergenic sources. They represent a common source of sensitization, with more than 200,000 species including bees, wasps, and ants. Most members of this order are cosmopolitan species, but some sensitizing species have a more restricted distribution: Bombus sp. is found more frequently in central and northern Europe, whereas the yellowjacket (YJ) (Vespula spp.) and honeybee (HB) (Apis mellifera) are allergenic sources in North America. Other wasps such as Polistinae are found in southern Europe and America [2][3][4] .
The allergic immune response to hymenopteran allergens has been studied in detail because of the high incidence of sting reactions to these insects. Approximately 9.2% to 28.7% of the adult population is sensitized to hymenopteran venom 5 . Allergic response to Hymenoptera venom is one of the leading causes of anaphylaxis worldwide with a frequency of 27%, as compared to medications (41%) and foods (20%) 3,6 . Molecular, structural, and immunological characterization of hymenopteran venom allergens is advanced: in total, 75 allergens from 31 different species have been explored. Phospholipases are a family of allergens with clinical and biological relevance, and other proteins from this order, such as hyaluronidase and antigen V, are also considered relevant to sensitization to this allergenic source 2,7,8 . Exposure to Hymenoptera allergens is associated with bites and stings; it is estimated that 56.6-94.5% of the general population has been bitten or stung at least once in their life 9 .
Phospholipases (PLA) are a major component of the venom of these species, representing 75% of its total mass, and have been characterized as one of the main allergens in Hymenoptera 10 . They are also found in the venoms of other arthropods such as chelicerates, in the venom of ophidians, and in different mammalian tissues and fluids such as pancreatic juice and arthritic synovial fluid. The superfamily includes 42 groups distributed in four types: A, B, C, and D 11 . Phospholipases belonging to class A split into two groups: class A1 hydrolyzes the phospholipid ester bond between the first acyl group and glycerol (1-acyl-sn-glycerol phosphate), while class A2 hydrolyzes the bond between the second acyl group and glycerol (2-acyl-sn-glycerol phosphate). The family comprises enzymes with different molecular weights: PLA1 has a molecular weight of 28 kDa, while PLA2 are classified as high molecular weight cytosolic PLA2 (40-85 kDa) and low molecular weight secretory PLA2 (14-18 kDa). They can hydrolyze fatty acids present in the cell membrane and other types of lipophilic substances, or participate in the regulation of gene expression through the production of free fatty acids, from which cyclooxygenases synthesize prostaglandins [12][13][14] .
The structure, function, mechanisms, and cell signaling of PLA have been extensively studied; one important aspect of PLA is their capacity to induce allergic responses. Several epitopes involved in the co-sensitization of PLA that share structural homology and identity have been studied, which suggests a potential role in cross-reactivity. However, this role is poorly understood, and studies are needed to complement what has been reported. The aim of this work was to explore the cross-reactivity and antigenicity of allergenic PLA using an in silico approach based on bioinformatics tools; we identified several antigenic regions that may be involved in cross-reactivity among phospholipases.
The use of bioinformatics tools in science has grown considerably; they are often regarded as a first step before experimental studies because they provide functional predictions. Understanding and predicting an individual's clinical cross-reactivity to allergens is key to better management and treatment and to the development of new therapies for Hymenoptera allergy. Prediction can be performed by methods for the identification and computational mapping of specific IgE epitopes, or by using epitopes reported in the Immune Epitope Database and Analysis Resource, which can help identify the regions that may be affecting patients' health. Various studies have applied this methodology to predict food allergen epitopes [16][17] .
The in silico methodology has also been used in other work to report possible cross-reactivity between proteins based on structural or functional homology, using bioinformatics tools 18 .

Selection of phospholipases and alignment
The amino acid sequences of type A phospholipases (A1 and A2) from 18 Hymenoptera species were selected according to their reported allergenic capacity. The sequences were obtained from the UniProt database (see Table 1 for a list of accession numbers). All allergens reported by the WHO/IUIS Allergen Nomenclature Sub-Committee with a complete sequence were used; incomplete sequences were excluded from the analysis. Three sequences are not reported as allergenic but were chosen to observe the differences in identity and structure among several phospholipases. The degree of identity among phospholipases was determined using the PRALINE web server. The alignment was configured to use BLOSUM62 as the exchange matrix, with 3 iterations and an E-value of 0.001.
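As an illustration of what a degree of identity means here, the following minimal sketch computes pairwise percent identity from sequences that are already aligned. It is not the PRALINE procedure: the function names, the choice to ignore columns in which either sequence has a gap, and the toy fragments are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Map each unordered pair of sequence names to its percent identity."""
    return {
        (n1, n2): pairwise_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical aligned fragments, not real phospholipase sequences.
toy_alignment = {
    "seq_a": "GPKCPFNSDT",
    "seq_b": "GPKCPFNEDT",
    "seq_c": "GP-CLFN-DS",
}
identities = identity_matrix(toy_alignment)
```

Other definitions of identity divide by the full alignment length or by the length of the shorter sequence, so values from this sketch are not expected to reproduce the percentages reported by a given server.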

Phylogenetic analysis
The Molecular Evolutionary Genetics Analysis (MEGA) program, version X, was used to obtain phylogenetic trees using the maximum parsimony method, with bootstrap support (1000 replicates) as a measure of reliability and robustness under the assumption of minimum evolution. For the topology, this model uses a comparative matrix to find the similarity between the amino acids of the 18 sequences and establish the evolutionary proximity between the species. The matrix was constructed with all the amino acid sequences of the phospholipases retrieved from the UniProt database and reported to the WHO/IUIS. Therefore, the higher the identity between sequences, the closer their relationship and the closer they are placed in the tree. All positions containing gaps were eliminated (complete deletion). From the global comparison and the homologies, the sum of branch lengths (SBL) is reported, which determines the number of nodes and their position, including the "groups" of the evolutionarily closest sequences. Phylogenetic sub-analyses were carried out to identify the degree of identity within the groups formed. The multiple alignment for phylogenetic analysis was carried out using CLUSTAL W, with a gap opening penalty of 10.00, a gap extension penalty of 0.20, and a delay divergent cutoff of 30%.
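Two of the steps named above, complete deletion and bootstrap resampling, can be sketched in a few lines. This is only an illustration of the general ideas under simple assumptions (a dictionary of equal-length aligned strings, "-" as the gap symbol, hypothetical function names); it does not reproduce MEGA's implementation, and the tree search itself is not shown.

```python
import random


def complete_deletion(alignment, gap="-"):
    """Drop every alignment column in which at least one sequence has a gap."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement; the same columns are used for all sequences."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment.
toy = {"a": "AC-GT", "b": "ACTG-", "c": "A-TGT"}
gap_free = complete_deletion(toy)

# One pseudo-replicate; a bootstrap analysis would build a tree from each
# of many such replicates (1000 in the text) and count recurring clades.
replicate = bootstrap_replicate(gap_free, random.Random(0))
```

Bootstrap support for a clade is then the fraction of replicate trees in which that clade appears.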

Generation of 3D models
Phospholipases without a 3D structure reported in the Protein Data Bank were modeled by homology using the SWISS-MODEL server. Model quality was evaluated with several tools, including Ramachandran plots, WHATIF, and the QMEAN4 index (Qualitative Model Energy Analysis), using ProSA-web and the SWISS-MODEL server. The results were expressed as a number between 0 and 1; higher numbers indicate higher reliability and energy values (force field GROMOS96). ElliPro was used to predict linear and discontinuous epitopes on one representative phospholipase per group. Residues with larger scores are associated with greater solvent accessibility. Only residues with a score > 0.7 were selected.
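The score cutoff described above amounts to a simple filter over the predicted epitopes. The sketch below assumes the predictions have already been transcribed by hand into a small record type; the `Patch` class, its fields, and the example values are hypothetical and do not reflect ElliPro's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A predicted antigenic patch (hypothetical record, not ElliPro's format)."""
    residues: tuple  # residue numbers forming the patch
    score: float     # prediction score


def select_patches(patches, min_score=0.7):
    """Keep only patches whose score is strictly greater than the cutoff."""
    return [p for p in patches if p.score > min_score]


# Invented example values for illustration only.
predicted = [
    Patch(residues=(1, 2, 3, 4), score=0.82),
    Patch(residues=(10, 11), score=0.70),
    Patch(residues=(20, 21, 22), score=0.65),
]
selected = select_patches(predicted)
```

The comparison is strict, so a patch scoring exactly 0.70 is excluded, matching the "> 0.7" wording of the text.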

Phospholipases found and phylogenetic results
We selected 18 sequences of allergenic phospholipases and three non-allergenic ones to include in the analysis, with 361 positions in the final dataset. The sequences were derived from several biological sources: five from bees, six from wasps, three from ants, and three from sources not described as allergenic (mosquito, spider, and scorpion). The allergens of bees and wasps belong to group 1 and those of ants to groups 1 and 2 (Table 1).
The phylogenetic tree had a consistency index of 0.857256 with a retention index of 0.779682 and a composite index of

Identification of potential cross-reactive antigenic sites
Multiple alignments of the phospholipases in each of the groups obtained from the phylogenetic analyses were made.
We built 3D models for four of the 18 phospholipases: Ves s 1, Sol i 1, and those from Culex quinquefasciatus and Centruroides hentzi. The remaining proteins were reported in the UniProt database. Structures were used for better visualization of antigenic patches; the structural quality control parameters for the homology models are given in Table 3. To compare the ElliPro results, we chose the main antigenic patches with a score higher than 0.7 and more than three residues, taking as reference the epitopes of one phospholipase from each group: group A, Ves m 1; group B, Bom p 1; group C, Sol i 1; group D, Pol d 1 (Table 2). The constitutional antigenic patches are shown in Figure 2.
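Matching predicted patches against conserved alignment positions can be sketched as follows. This is an assumed procedure written for illustration, not the authors' pipeline: it treats a column as conserved only when all sequences share the same residue, and it assumes that residue numbering of the reference protein starts at 1 at the first residue of its aligned sequence. All names and the toy data are hypothetical.

```python
def conserved_columns(aligned_seqs, gap="-"):
    """Indices of columns where every sequence has the same non-gap residue."""
    return {
        i
        for i, column in enumerate(zip(*aligned_seqs))
        if gap not in column and len(set(column)) == 1
    }


def residue_to_column(aligned_ref, gap="-"):
    """Map 1-based residue numbers of the reference to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for col, char in enumerate(aligned_ref):
        if char != gap:
            residue += 1
            mapping[residue] = col
    return mapping


def conserved_fraction(patch_residues, aligned_ref, aligned_seqs):
    """Fraction of a patch's residues that fall in fully conserved columns."""
    conserved = conserved_columns(aligned_seqs)
    mapping = residue_to_column(aligned_ref)
    hits = sum(1 for r in patch_residues if mapping[r] in conserved)
    return hits / len(patch_residues)


# Hypothetical toy alignment; the first sequence is the reference.
group = ["GPK-CPF", "GPKACPF", "GPR-CPF"]
fraction = conserved_fraction([2, 3, 4, 5], group[0], group)
```

A patch with a high conserved fraction across a group would be a candidate shared antigenic region; confirming it would still require the inhibition assays mentioned in the text.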
Phospholipases from group A shared an identity of 79% between their amino acid sequences (Figure 3). A total of 704 residues were identified as conserved among the phospholipases analyzed, and for this group we used Ves m 1 to identify the possible epitopes. We found three common linear antigenic patches and two constitutive antigenic patches with a score greater than 0.7.
Group B shares an identity of 35% between its amino acid sequences, but when Api c 1 is excluded, the identity increases to 64%. In total, 259 identical residues were found among the sequences. We found and included three linear epitopes and two discontinuous antigenic patches in Bom p 1 with a score > 0.7 (Figure 4).
Group C, which includes allergens from ants, showed the lowest identity, with only 23%, and the highest number of gaps (600 residues missing). Sol i 1 was the protein furthest away from any of the Hymenoptera allergens and appears to be closely related to wasp allergens. No common antigenic patches were detected; however, Sol l 1 presents an interesting antigenic patch with 46 residues and a score of 0.711.
For group D, 1916 residues exhibit an identity among the five sequences of allergens. This group exhibit a high identity

Table 2. Residues conserved among phospholipases groups with antigenic potential.

Discussion
Phospholipases A1 and A2 are insect allergens that provide a diagnostic benefit for distinguishing genuine sensitization from cross-reactivity. However, the cross-reactivity of this group of allergens has scarcely been explored as a whole. In this study, we were able to predict possible antigenic regions that could explain the cross-reactivity of phospholipases in Hymenoptera through in silico analyses.
The 18 amino acid sequences of the allergens were aligned, and a phylogenetic analysis was carried out, which yielded four monophyletic groups (A, B, C, D). Group A showed the highest degree of identity among its amino acid sequences (79%). All the allergens of this group belong to the Vespula genus, one of the most studied sources of wasp allergens 7,19 .
In group B (Bom p 1, Bom t 1, Api m 1, Api c 1) two analyses were conducted: the first included the Api c 1 allergen and gave a degree of identity of 35%, and the second excluded it and gave a higher degree of identity of 64%. This showed that the alignment of these three species could explain a possible cross-reactivity. Group D (Pol a 1, Pol d 1, Poly p 1, Vesp c 1, Dol m 1.02) showed a level of identity of 64%. However, analysis of conserved and affected residues showed that Group A shares three antigenic regions that could contribute to their cross-reactivity.
IgE against cross-reactive carbohydrate determinants (CCD) is one of the main causes of double positivity; CCD are present in most hymenopteran venom allergens, more frequently in venom from HB and YJ, in patients who are allergic to insect stings 20 . CCD reactivity has been described in more than 20% of patients allergic to honeybee venom; approximately one in four HB venom allergens and one in 10 YJ venom allergens have been found to be CCD-sIgE-positive. The PLA2 structure contains the insect CCDs, which are defined by the presence of an α-1,3-linked core fucose 21 . Currently, CCD-free allergens are known to allow cross-reactivity between proteins to be detected without double positivity. Ves v 1, Api m 1, Dol m 1 and Pol d 1 are allergens that lack CCD-based cross-reactivity and allow diagnosis without interference 19,23,24 . However, it should be clarified that these are mostly of recombinant origin, because the purified natural forms carry CCD; for example, natural Api m 1 has CCD, which makes diagnosis difficult 4 . On the other hand, Sol i 1 is the only hymenopteran venom PLA1 known to have CCD, which could make the specific diagnosis of fire ant allergy difficult 25 .
Research on the allergenic capacity of Hymenoptera allergens has been characterized by individualized studies, with Api m 1, Sol i 1, Pol d 1 and Ves m 1 among those most studied so far, but the possible cross-reactivity between phospholipase A1 and A2 allergens has not been evaluated as a whole 2,24,26 .
No cross-reactivity between A. mellifera, S. invicta and V. vulgaris was detected, which supports our results, since there was no relationship between these allergens. However, when analyzed along with other allergens, it was observed that a certain degree of identity is maintained between these two proteins, suggesting a possible cross-reactivity without CCD. Group A (Ves m 1, Ves s 1 and Ves v 1) was the most representative: cross-reactivity among Vespula spp. is strong because of the similarities in venom composition and in the structure of the individual allergens 27 . Different studies have evaluated the identity among yellow jacket allergens; for example, a 1996 study reported that Ves v 1 had 95% identity with Ves m 1 and that both yellow jacket phospholipases have about 67% sequence identity with the hornet protein Dol m 1 7 . Other authors demonstrated that Ves v 1 shows an identity of 54% with Poly p 1, the lowest among the allergens studied. A study carried out in Spain with 59 patients previously diagnosed with vespid allergy found that there could be double sensitization to Ves v 1 and Pol d 1, because 31% of patients could not be clearly defined as sensitized only to Vespula or Polistes 28,29 . Consequently, the different Vespula venoms cross-react strongly, which is consistent with the high degree of identity found in this study for group A (79%). Of the three proteins, only Ves v 1 has been described as a CCD allergen, showing that this interaction between the Vespula phospholipases could be CCD-independent and related only to protein structure 19 .
The quaternary structure of the three Vespula phospholipases is also very similar, suggesting that both linear and conformational epitopes may be present (Figure 6A). Therefore, we suggest that fragment inhibition studies be carried out to confirm the possible antigenic peptides described in this study.
Group B showed a degree of identity of 35%; however, when the alignment was performed without the Api c 1 allergen, the degree of conservation between Api m 1, Bom p 1 and Bom t 1 increased to 64%. So far, we have found no further information about possible cross-reactivity among these allergens. In this group, Api m 1 is the best characterized allergen; it contains the insect cross-reactive carbohydrate determinants (CCD), which are defined by the presence of an α-1,3-linked core fucose 30 .
For years, the CCD of Api m 1 have made it difficult to differentiate HB allergy from YJ allergy. In vitro detection of immunoreactive sIgE to these insects showed double positivity in up to 59% of patients 24 . PLA2s are also important venom allergens in other members of the genera Apis and Bombus and have been shown to share homology.
A. cerana (Api c 1) has been little explored but has been described as having a high level of identity with other phospholipases, such as that of A. mellifera (95%) 26 . In our study, when the sequences of these phospholipases were compared with those of the genus Bombus, that identity was not preserved: the identity we found was very low, and when Api c 1 was excluded from the alignment, the sequences were more conserved 31 . Studies conducted on the genus Bombus found that the primary sequences of Bom t 1 and Api m 1 have an identity of 53% and that their three-dimensional structures show low conservation of protein surfaces 32 . However, the allergens selected from group B in our study showed high conservation and structural homology, which could lead to cross-reactivity (Figure 6B).
As for Group C, we highlight that it was the only group that included phospholipases A1 and A2 in the clade, so a low identity was expected. We found that the ant phospholipases Sol i 1, Sol i 2 and Sol s 2 showed a degree of alignment identity with the other phospholipases in the primary sequences of 23%. This low identity is not enough to explain cross-reactivity in silico, even though allergen Sol i 1 has been extensively analyzed and other studies suggest that it may have a possible reactivity with the Centruroides species 33,34 .
The phylogenetic analyses reported in this study revealed that Sol i 1 is the most divergent member among the currently identified hymenopteran venom PLA1 allergens. As noted, Sol i 1 is in a group (group C) completely isolated from the clade consisting of wasp allergenic PLA1 (group A) and showed no structural homology (Figure 6C). Furthermore, in multiple alignments, the fire ant allergen exhibits the lowest level of sequence identity. However, studies have shown cross-reactivity between Sol i 1 and its wasp counterparts, with amino acid sequence identity levels of 38% with Ves m 1, 36% with Ves v 1, 40% with Dol m 1, 35% with Pol d 1, and 36% with Poly p 1 35 . In contrast, a recent study suggests that peptide-based cross-reactivity between Sol i 1 and the PLA1 of Polistinae wasps does not occur, because the alignments and the phylogenetic and structural analyses showed that it is the allergen furthest from its counterparts, with the lowest level of identity among the sequences studied (36%) and the highest RMSD value (0.172) 29 .
Several works have attempted to demonstrate cross-reactivity between A1 phospholipases 29,36 . PLA1-based cross-reactivity among the venoms of eight Hymenoptera was analyzed, and the primary sequence of Poly p 1 was reported to share 36% identity with Sol i 1, 74% with Pol d 1 and 71% with Pol a 1. In our study, no relationship was found between Poly p 1 and Sol i 1. However, group D, which contains the different species of Polistes (Pol a 1 and Pol s 1), Poly p 1, Dol m 1 and Vesp c 1, showed a high degree of identity (64%) and structural homology (Figure 6D), enough to explain cross-reactivity 29 . Cross-reactivity between Dol m 1, Ves v 1 and Pol a 1 has also been investigated in mice; partial cross-reactivity in the T-cell epitopes of homologous vespid allergens was found, which supports our findings 7,29,36 .
Three non-allergenic phospholipases (from Centruroides hentzi, Parasteatoda tepidariorum and Culex quinquefasciatus) were included to adjust the phylogenetic analysis. In the results, these phospholipases separated into two clades and showed some affinity for certain allergenic phospholipases.
A study identified allergens in the venom of common striped scorpions. Eleven patients with scorpion venom allergy were assessed; four had a history of anaphylaxis (with positive skin test responses) to imported fire ant (IFA) venom and at least two others had a history of large local reactions, suggesting that there could be cross-reactivity between proteins of these species. This association would be clinically relevant 29 . This shows that, although these proteins have not been described as allergens, studies are needed to verify their capacity to trigger sensitization.
Bioinformatic studies are valuable tools. They are currently recognized as a first step in an investigation, since in silico analyses provide an approximation of the expected results, allow predictions or models to be made, and serve as a basis for larger projects. In our study, we show possible antigenic regions involved in cross-reactivity between phospholipases A1 and A2. Based on the in silico analysis, these are proteins with a high degree of identity, and three antigenic regions were found that could explain possible co-sensitization.

Conclusion
Potential antigenic sites that could give rise to cross-reactivity between the phospholipases analyzed in this study were identified. The identity between these proteins from different species is relatively high, which suggests that cross-reactivity between them is possible and may in most cases be frequent. These findings support component-resolved diagnostic testing for venom allergy; targeted mutagenesis tests are needed to confirm the relevance of these sites to the allergenic capacity of phospholipases.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.