Epitope specificity and protein signaling interactions driving epidemic occurrences of Ebola disease

The main hosts of Ebolavirus are humans and nonhuman primates, in which infection causes severe, often lethal, hemorrhagic fever. Despite great advances in deciphering the clinical course of the disease, the specific mechanisms favoring Ebolavirus pathogenicity and transmission, and the genomic structures that are most antigenic, remain to be clearly delineated. This study used functional protein phylogenetic analysis, pathway design and antigenic epitope prediction, respectively, to identify viral genomic regions closely related to host proteins, to predict protein/genetic interactions favoring viral pathogenesis, and to identify the frequency of MHC class I and II immune-related host peptide variants whose transmission intensity values favor disease epidemicity. The viral glycoprotein (VGP) presented the highest genetic variation and, although it was captured on the network together with the matrix protein (MXP), no direct interaction between them was observed. Most of the interacting host proteins had kinase functions; in particular, a protein-signaling role was observed for LCK, a tyrosine-protein kinase with the most dominant interactions and with viral-related functions implicated in disease shock events. The four main VGP and three main MXP antigenic epitopes identified showed differentially high frequencies for two MHC class I types. The same pattern was observed for VGP and MXP antigenic epitopes predicted for MHC class II allele variants, favoring high transmission intensity values within the host population and suggesting their involvement in Ebola epidemic upsurges. Related Ebola species with high transmission values were predominantly non-Zaire Ebolaviruses whose antigenic regions showed several repeats, implicating them in viral antigenic variation. Our analysis shows that VGP and MXP are both critical for viral entry and pathogenicity in the host and, given their species-specific occurrence, their combined role in drug/vaccine design is critical.
The antigenic epitopes identified in this study will be used in combination for drug/vaccine design and for a better understanding of related molecular targets in pathogenic pathways favoring the Ebola disease burden.


Introduction
Ebolavirus is responsible for explosive, though rare, outbreaks of haemorrhagic fever in equatorial Africa. The virus is transmitted from person to person through infected body fluids such as faeces, vomit and blood (Dowell et al., 1999). Ebolavirus comprises four main subtypes: Zaire, Sudan, Côte d'Ivoire and Reston. The ancestors of Ebola are not clearly defined, though Marburg virus, which belongs to the same family as Ebola (Filoviridae, a grouping of nonsegmented, negative-stranded RNA viruses), is a possible ancestor (Mayo & Pringle, 1998; Sanchez et al., 1993).
The average case fatality rate of Ebolavirus is 78% (Kuhn, 2008). The 2014 outbreak was considered the largest in history, with estimates of about 2,473 infections and 1,350 deaths (WHO). The 2014 outbreak was first identified in February in Guinea, West Africa (Burke, 2003); it spread into Liberia in March, to Sierra Leone in May and finally to Nigeria towards the end of July. This outbreak was considered the largest because of its exponential expansion, with case numbers doubling every 34.8 days.
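The doubling time quoted above translates directly into a daily growth rate under the standard exponential model N(t) = N0 · 2^(t/Td). The short sketch below makes that arithmetic explicit and also computes the crude deaths-to-cases ratio from the WHO counts cited in the same paragraph. The function names are illustrative only, and the exponential model is an assumption that holds only while growth remains unconstrained.

```python
import math

def growth_rate_per_day(doubling_time_days: float) -> float:
    """Exponential growth rate r implied by a doubling time T_d, r = ln(2) / T_d."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_days

def fold_increase(days: float, doubling_time_days: float) -> float:
    """Fold increase in case numbers after `days`, assuming N(t) = N0 * 2 ** (t / T_d)."""
    return 2 ** (days / doubling_time_days)

def crude_case_fatality(deaths: int, cases: int) -> float:
    """Reported deaths divided by reported cases at a single point in time."""
    return deaths / cases

DOUBLING_TIME_DAYS = 34.8  # doubling time quoted in the text

if __name__ == "__main__":
    print(f"growth rate r = {growth_rate_per_day(DOUBLING_TIME_DAYS):.4f} per day")
    print(f"fold increase over 90 days = {fold_increase(90, DOUBLING_TIME_DAYS):.1f}")
    print(f"crude CFR = {crude_case_fatality(1350, 2473):.1%}")
```

With the counts quoted above, the crude ratio is about 55%, which is below the 78% average. A ratio taken while an outbreak is still ongoing usually understates the final case fatality rate, because the outcomes of recent cases are not yet known.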
In major cities such as Conakry (Guinea), Monrovia (Liberia), Freetown (Sierra Leone) and Lagos (Nigeria), where viral emergence was observed, the spectre of local and international viral dissemination was raised.
Currently, no approved drugs against Ebolavirus disease have been identified, though rapid diagnosis and intensive supportive care play a major role in improving survival. Characteristics of Ebola disease include high fever, headache, body aches, intense weakness, stomach pain and lack of appetite. Manifestations in patients include vomiting, diarrhea, rash, impaired kidney and liver function and, in critical cases, external and internal bleeding (http://www.nih.gov/news/health/aug2014/od-29.htm).
The current Ebola epidemic raised awareness worldwide because of the international transmission observed in the US and Spain. Several genetic approaches to understanding the disease are being tested. Most of them investigate the genomic composition of the virus, pathogenic pathways within the host, and immune variants (MHC-HLA class I and II) commonly expressed by Ebola disease survivors and non-survivors (Sullivan et al., 2003). The HIV envelope glycoprotein (ENV, gp160) and the Ebolavirus envelope glycoprotein (VGP) are genetically closely related. Disruption of MHC class I at the cell surface through the glycoprotein transmembrane domain was observed to facilitate viral evasion of the immune system (Ploegh, 1998). This viral interaction at the cell surface is thought to involve the CD209 antigen (DC-SIGN), which makes VGP a prominent area of research for drug/vaccine development.
The Ebolavirus genome is 19 kb long and contains seven open reading frames. These encode the structural proteins, including the virion envelope glycoprotein (VGP), nucleoprotein (NP) and matrix proteins VP24 and VP40; the nonstructural proteins, including VP30 and VP35; and the viral polymerase (Sanchez et al., 2001). Unlike in Marburg virus, the GP open reading frame of Ebolavirus yields two gene products through transcriptional editing: a soluble 60- to 70-kDa protein (sGP) and a full-length 150- to 170-kDa protein (GP) that inserts into the viral membrane (Sanchez et al., 1996; Volchkov et al., 1995).
The adaptive immune and inflammatory systems respond to viral infection synchronously, so that some cell types (e.g., monocytes and macrophages) become principal targets for disease pathogenesis. This targeting pattern was first suggested by the in vivo immunohistochemical localization of Ebolavirus. That work identified mononuclear phagocytes (monocytes and macrophages), endothelial cells and hepatocytes as the main infection targets (Baskerville et al., 1978; Baskerville et al., 1985). Recognition of specific Ebolavirus VGP epitopes by monoclonal antibodies conferred immune protection in a murine model (Wilson et al., 2000) and in guinea pigs (Parren et al., 2002) infected with Ebolavirus.
Building on current knowledge of viral interaction with host genes and pathways favoring cytokine activation, this study analyzed the relationship between predicted viral antigenic epitopes and their population frequency as a measure of transmission risk. This species-epitope variation was used to decipher Ebolavirus species diversity and the related pathways, and to examine how this interplay favors disease intensity within the host. The viral epitopes identified in this study and their related immune peptides served as a means of predicting which viral species could be commonly found in a given Ebolavirus epidemic. This study opens up epidemiological aspects of viral pathogenesis and identifies possible targets for drug/vaccine design.

Phylogenetic analysis
The sequence alignment of all retrieved protein variants of the different Ebola sequences showed 1.3% identity across all sites and 14.8% pairwise identity across amino acid residues. These values improved when each genomic variant was analysed separately across all Ebola species. Nucleoproteins showed 23.3% identity across all sites and 68.8% pairwise identity, virion glycoproteins 17.5% and 57.2%, and matrix proteins 23.9% and 70.6%, respectively. The phylogenetic analysis showed two main clusters (Figure 1A). Nucleoproteins formed one arm (blue cluster). The other arm contained glycoproteins in one sub cluster (red) and matrix proteins in another sub cluster (green). The remaining sequences were mammalian immune-related protein sequences associated with viral antigenicity. The red sub cluster was subjected to further phylogenetic analysis, and four sub clusters were identified (Figure 1B): red, Zaire Ebolavirus (29 variants); black, Reston Ebolavirus (14 variants); green, Sudan Ebolavirus (11 variants); and blue, Bundibugyo and Taï Forest Ebolaviruses (8 variants).
Figure 1. Functional-phylogenetic analysis of Ebola species. 1A) 348 amino acid sequences of Ebolavirus species and mammals were aligned, and their phylogeny on the unrooted tree showed two distinct clades. The blue cluster contained the nucleoprotein sequences of the viral species. The other clade contained viral glycoprotein sequences (red sub cluster) and viral matrix proteins (green sub cluster), and the rest were predominantly mammalian proteins. 1B) The viral glycoproteins were subjected to further phylogenetic analysis, and the unrooted tree showed four main clusters belonging to different species: red, Zaire Ebolavirus (29 variants); black, Reston Ebolavirus (14 variants); green, Sudan Ebolavirus (11 variants); and blue, Bundibugyo and Taï Forest Ebolaviruses (8 variants).
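The two identity measures reported above behave very differently. "Identity across all sites" counts only the columns in which every sequence is identical, so it collapses quickly as divergent sequences are added. "Pairwise identity" averages agreement over every pair of sequences. The sketch below computes both for a toy alignment. It assumes a Geneious-style definition and a simple gap convention: columns containing gaps count as non-identical, and gapped positions are skipped when comparing a pair. Alignment tools differ on these conventions, so the functions and the toy sequences are illustrative and are not the study's pipeline.

```python
from itertools import combinations

GAP = "-"

def _check(alignment):
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")

def identical_site_fraction(alignment):
    """Fraction of alignment columns in which every sequence carries the same residue.
    Columns containing a gap are counted as non-identical (assumed convention)."""
    _check(alignment)
    identical = sum(
        1 for column in zip(*alignment)
        if GAP not in column and len(set(column)) == 1
    )
    return identical / len(alignment[0])

def mean_pairwise_identity(alignment):
    """Mean, over all sequence pairs, of the share of identical residues among
    positions where neither sequence has a gap (assumed convention)."""
    _check(alignment)
    scores = []
    for a, b in combinations(alignment, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
        if compared:
            scores.append(sum(x == y for x, y in compared) / len(compared))
    if not scores:
        raise ValueError("need at least two overlapping sequences")
    return sum(scores) / len(scores)

# Hypothetical aligned fragments, not sequences from the study.
toy_alignment = ["MKAL-V", "MKSLQV", "MRALQV"]
print(identical_site_fraction(toy_alignment))
print(round(mean_pairwise_identity(toy_alignment), 3))
```

For the toy alignment, only 3 of 6 columns are fully identical (0.5), while the mean pairwise identity is about 0.756. This gap between the two measures mirrors the contrast between the 1.3% and 14.8% figures reported for the full data set.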

Protein-protein interaction analysis
The predicted network had 69 nodes and 123 edges (Figure 2). Network analysis ranked genes (nodes) from most to least important, with importance depicted by name size (Figure 2A).

Figure 2. Ebolavirus protein interaction and pathway analysis. A) The 348 extracted sequences were used to predict a protein interaction map with 69 nodes and 123 edges. Several mammalian proteins captured on the network interacted directly with each other. One main protein, the tyrosine kinase LCK, was involved in virus interactions and signalling functions. Name size reflects the number of interactions each protein has within the network. Two viral proteins were captured in the network: VP40, a matrix protein, and ENV, a glycoprotein. The red node at the top of the network, which interacts with several other nodes, represents a cluster of microRNAs and the genes they regulate. B) Several regulatory elements interacted with the nodes of interest (red circles): miRNAs (yellow squares), drug targets (purple squares) and transcription factors (light green squares).
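In Figure 2A, node importance is shown by name size, which reflects the number of interactions each protein has, i.e. its degree in the interaction graph. The sketch below ranks nodes by degree from an undirected edge list. The edge list is a small hypothetical example loosely modelled on the LCK kinase cluster described in the Discussion. It is not the study's 69-node, 123-edge network, and the function name is illustrative.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Rank nodes of an undirected graph by degree (number of distinct neighbours),
    breaking ties alphabetically. Duplicate edges and self-loops are ignored."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop adds no interaction partner
        neighbours[a].add(b)
        neighbours[b].add(a)
    degrees = {node: len(partners) for node, partners in neighbours.items()}
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical edges for illustration only.
edges = [
    ("LCK", "SRC"), ("LCK", "FYN"), ("LCK", "YES"), ("LCK", "MERTK"),
    ("SRC", "FYN"), ("MERTK", "TYRO3"),
]

for node, degree in degree_ranking(edges):
    print(f"{node}: {degree}")
```

Degree is the simplest importance measure and matches a "number of interactions" reading of name size. Network tools also offer alternatives such as betweenness centrality, which would rank hub-connecting nodes differently.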

Epitope prediction and MHC HLA class I & II allele distribution among Ebolavirus species
The most antigenic epitopes of each viral structural protein (glycoproteins and matrix proteins) were evaluated statistically by comparing their peptide antigenic score with peptide frequency (count) for the HLA allele types (Figure 5A-F). For HLA class I, the t-test returned 0.24338 for VGP epitope peptides and 0.09190 for MXP. For HLA class II (HLA-DRB alleles), it returned 2.8506E-107 for VGP and 2.21965E-49 for MXP. For HLA class II (HLA-DQA-B alleles), it returned 8.96323E-08 for VGP and 0.046921205 for MXP. For HLA class I alleles, four main epitopes (FLLQLNETI, FLTNTIRGV, FQVDHLTYV and NMTNFLFQV) were predicted for VGP, and three main epitopes (AIMLASYTV, ALLTGSYTI and LTMVITPDY) were predicted for MXP. Both viral structural proteins showed high frequency values for the HLA*A02:01 and HLA*A02:02 alleles (Figure 6A, Table S2). VGP frequency was higher for the latter than the former, and the reverse held for MXP. Among the validated VGP epitopes, FLLQLNETI was specific to HLA*A02:02, while FQVDHLTYV and NMTNFLFQV were variably distributed across all selected HLA class I alleles (Figure 6B). Among the validated MXP epitopes, AIMLASYTV was variably distributed across the selected HLA class I alleles, while ALLTGSYTI and LTMVITPDY were allele specific (Figure 6C). FLLQLNETI was specific to Zaire Ebolavirus, with a unique occurrence. The other epitopes were either specific to or distributed among Bundibugyo and Taï Forest Ebolaviruses, with multiple occurrences (Figure 6D). AIMLASYTV was more specific to Reston and Sudan Ebolaviruses, with multiple occurrences, while the other epitopes were specific to either Sudan Ebolavirus or Marburg virus, with unique occurrences (Figure 6E).
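The species-level statements above ("unique occurrence" versus "multiple occurrences") reduce to counting exact matches of an epitope in each species' protein sequences. The sketch below shows one way to do this. It assumes exact string matching that counts overlapping hits. The species labels and sequence fragments are hypothetical placeholders, not the study's data, and the helper names are illustrative.

```python
def count_occurrences(sequence: str, epitope: str) -> int:
    """Count exact, possibly overlapping, occurrences of an epitope in one sequence."""
    if not epitope:
        raise ValueError("epitope must be non-empty")
    count, start = 0, 0
    while True:
        index = sequence.find(epitope, start)
        if index == -1:
            return count
        count += 1
        start = index + 1  # advance one residue so overlapping hits are counted

def occurrence_profile(epitope, sequences_by_species):
    """Total occurrences of the epitope per species, summed over that species' sequences."""
    return {
        species: sum(count_occurrences(seq, epitope) for seq in sequences)
        for species, sequences in sequences_by_species.items()
    }

def label(total: int) -> str:
    """Label an occurrence count as in the text: absent, unique or multiple."""
    if total == 0:
        return "absent"
    return "unique" if total == 1 else "multiple"

# Hypothetical species labels and sequence fragments, for illustration only.
toy_sequences = {
    "species_1": ["GGFLLQLNETIGG"],
    "species_2": ["KKAIMLASYTVKKAIMLASYTVKK"],
    "species_3": ["GGGGGGGGGG"],
}

for epitope in ("FLLQLNETI", "AIMLASYTV"):
    profile = occurrence_profile(epitope, toy_sequences)
    print(epitope, {species: label(n) for species, n in profile.items()})
```

Exact matching is the strictest criterion. A study could instead tolerate a few substitutions, and doing so would shift some epitopes from "absent" to "unique" or "multiple".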
Several epitopes were predicted for HLA class II DQA-B alleles, for VGP (n = 41) and MXP (n = 11). They showed high frequencies for HLA-DQA1_03_01-DQB1_03_01 and HLA-DQA1_05_04-DQB1_05_03 (Figure 8A, Table S4), with VGP higher for the former than for the latter. The four VGP epitopes with the highest frequencies, with their HLA-DQA-B allele coverage, were ETEYLFEVDNLTYVQ (13.55%; COV = 3/9), FHKEGAFFLYDRLAS (8.64%; COV = 1/9), RGVNFAEGVIAFLIL (7.24%; COV = 3/9) and WVIILFQRTFSIPLG (6.78%; COV = 1/9) (Figure 8C). The distribution of VGP epitopes and HLA-DQA-B alleles per Ebola species was as follows (Figure 8D). ETEYLFEVDNLTYVQ and FHKEGAFFLYDRLAS were predominantly from Zaire Ebolavirus, and RGVNFAEGVIAFLIL was predominantly from Sudan and Reston Ebolaviruses. WVIILFQRTFSIPLG was found solely in Zaire Ebolavirus. The distribution of MXP epitopes and HLA-DQA-B alleles per Ebola species was as follows (Figure 8E). YSFDSTTAAIMLASY was distributed relatively equally among the different Ebolaviruses, and HTPSGVASAFILEAT was predominantly from Reston Ebolavirus. IYSFDSTTAAIMLAS was found solely in Reston Ebolavirus, and HTPSGVASAFILEAK was equally distributed between Reston and Taï Forest Ebolaviruses.
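The percentages and "COV = k/9" values above combine two simple quantities. The sketch below makes both explicit. Neither is defined formally in the text, so two assumptions are made. First, the percentage is taken to be an epitope's share of the total predicted-binding count. Second, COV is taken to be the number of tested HLA-DQA-B alleles predicted to bind the epitope out of nine tested alleles, with the nine inferred from the "/9" denominator. The allele names, counts and function names are hypothetical.

```python
def frequency_shares(counts):
    """Percentage share of each epitope in the total predicted-binding count (assumed definition)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {epitope: 100.0 * n / total for epitope, n in counts.items()}

def allele_coverage(binding_alleles, tested_alleles):
    """Coverage written as k/n: tested alleles predicted to bind, out of all tested (assumed definition).
    The fraction is deliberately left unreduced, matching the '3/9' style in the text."""
    hits = set(binding_alleles) & set(tested_alleles)
    return f"{len(hits)}/{len(tested_alleles)}"

# Hypothetical allele panel and counts, not values from the study.
tested = [f"allele_{i}" for i in range(1, 10)]  # nine alleles, matching the '/9' denominator
counts = {"epitope_A": 30, "epitope_B": 10, "epitope_C": 60}

print(frequency_shares(counts))
print(allele_coverage({"allele_1", "allele_4", "allele_7"}, tested))
```

Keeping frequency and coverage separate matters for interpretation. A peptide can dominate the frequency ranking while binding only one allele (COV = 1/9), as is reported for FHKEGAFFLYDRLAS.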
Figure 5. Analysis of peptide antigenicity of viral sequences and their related frequency within a host population. Antigenic epitopes were predicted for MHC class I and MHC class II. For MHC class I alleles, four (A) VGP epitopes and three (B) MXP epitopes were validated. For MHC class II HLA-DRB1 alleles, 221 (C) VGP epitopes and 154 (D) MXP epitopes were validated. For MHC class II HLA-DQA-B alleles, 41 (E) VGP epitopes and 11 (F) MXP epitopes were validated. In general, peptide antigenicity was higher than host population frequency.
Figure 6. Frequency of antigenic peptides for both VGP and MXP was dominant for the HLA*A02:01 and HLA*A02:02 alleles (A). VGP frequency was higher for the latter than the former, and the reverse held for MXP. Epitope distribution across the immune alleles showed that some antigenic sites of each genomic structure (VGP and MXP) were recognised by particular immune alleles, while others were recognised by all (B, VGP; C, MXP). The identified epitopes occurred variably among species: some were specific to a given viral species and others were shared among species (D, VGP; E, MXP).
For the top four selected epitopes, the frequency distribution across the immune alleles showed three patterns for each genomic structure (VGP and MXP). Some antigenic sites were recognised by particular immune alleles, others by all, and some by none of the immune alleles (B, VGP; C, MXP). The identified epitopes occurred variably among species. Some were specific to a given viral species, several others were shared among species, and some species lacked the epitope site altogether (D, VGP; E, MXP).
Figure 8. Frequency of antigenic peptides for both VGP and MXP was dominant for the HLA-DQA1_05_04-DQB1_05_03 allele, with VGP higher than MXP, but only MXP presented a high frequency for the HLA-DQA1_03_01-DQB1_03_01 allele (A). For the top four selected epitopes, the frequency distribution across the immune alleles showed three patterns for each genomic structure (VGP and MXP). Some antigenic sites were recognised by particular immune alleles, others by all, and some by none of the immune alleles (B, VGP; C, MXP). The identified epitopes occurred variably among species. Some were specific to a given viral species, several others were shared among species, and some species lacked the epitope site altogether (D, VGP; E, MXP).

Prediction of transmission intensity (TRI) values in the wake of the Ebola epidemic
Based on the data collected, peptide scores and peptide frequencies expressed in the viral population (simulating the host population) were used to calculate the transmission intensity ratio (TRI), which predicts how fast or slow the virus will spread within its host population (Table 2).
Most MHC HLA class I peptides showed a TRI greater than 1, while most MHC HLA class II peptides showed a TRI less than 1. In general, TRI < 1 was associated with epitopes expressed multiple times within each species and widely distributed within the population, and TRI > 1 with the opposite pattern. Zaire and Reston Ebolaviruses showed the lowest TRIs in the selected data.
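The exact TRI formula is not stated in this section, and Table 2 is not reproduced here. The sketch below therefore assumes, hypothetically, that TRI is the ratio of a peptide's antigenic score to its frequency. This form at least reproduces the qualitative reading above: more widely distributed epitopes (higher frequency) give TRI < 1. Treat the formula, the function names and the example numbers as placeholders rather than the authors' method.

```python
def transmission_intensity_ratio(peptide_score: float, peptide_frequency: float) -> float:
    """HYPOTHETICAL form: TRI = antigenic peptide score / peptide frequency.
    The text states only that both quantities are used; this ratio is an assumption."""
    if peptide_frequency <= 0:
        raise ValueError("peptide frequency must be positive")
    return peptide_score / peptide_frequency

def interpret(tri: float) -> str:
    """Qualitative reading described in the text (TRI < 1 versus TRI > 1)."""
    if tri < 1:
        return "TRI < 1: epitope expressed multiple times and widely distributed"
    if tri > 1:
        return "TRI > 1: epitope less widely distributed"
    return "TRI = 1: boundary case"

# Hypothetical (score, frequency) pairs for illustration only.
for score, freq in [(0.8, 3.2), (1.2, 0.4)]:
    tri = transmission_intensity_ratio(score, freq)
    print(f"score={score}, frequency={freq}, TRI={tri:.2f} -> {interpret(tri)}")
```

If the study instead normalised frequency by population size or used a different ratio, the cut-off at 1 would still apply. However, the numerical TRI values would not be comparable to those produced by this sketch.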

Discussion
In this study, Ebolavirus amino acid sequence data were obtained for the Zaire, Sudan, Côte d'Ivoire (Taï Forest), Uganda (Bundibugyo) and Philippines (Reston) species. These sequences were subjected to phylogenetic analysis, and the tree was constructed taking into account all known functional motifs and domains of each sequence. Species variation was higher for VGP than for the nucleoprotein and matrix proteins of the viral genome. This observation could have resulted from the pairwise alignments, whose identity scores accounted for the species-specific clustering observed. (2002) showed that the mean amino acid distance between the same NP subtypes was about 45%, compared with 65-66% for VGP. These values were higher than those observed in this work, probably because we performed an amino acid sequence alignment whereas the authors carried out a nucleotide sequence alignment. The longest branch on the tree, associated with Marburg virus, suggests an ancestral lineage. This could be a consequence of evolutionary time, as Marburg virus and Ebolavirus were isolated in 1967 (Siegert et al., 1967) and 1976 (Bowen et al., 1977; Johnson et al., 1977), respectively. The two viruses also differ by ≥50% at the nucleotide level (Towner et al., 2006). The main genomic difference between them is the presence of one gene overlap in Marburg virus, compared with several in Ebolavirus (Kuhn, 2008).
Within the interactome, the functional relationship between related host (mammalian) and Ebolavirus proteins showed direct interactions among the host proteins. However, no direct interaction was observed between VP40 and ENV (gp160), or between these viral proteins and host proteins. Among the gag, pol and env (ENV) retroviral genes, the pol gene, which encodes reverse transcriptase (RT), is considered the most conserved among retroid elements (Simmons et al., 2002). The polypeptide encoded by the env gene is cleaved into two proteins: a surface protein for receptor recognition, and a transmembrane subunit that anchors the ENV complex to the membrane. The ENV transmembrane unit interacts directly with the cell membrane, favoring viral penetration into the cell (Bénit et al., 2001). This process has been elucidated for human immunodeficiency virus type 1 (HIV-1) (Connolly et al., 1999; Xu et al., 1998) and Ebolavirus (Sanchez et al., 2001). ENV-dependent pathways account for almost 50% of HIV antigen presentation, mediated principally by the lectin DC-SIGN (CD209). CD209 is expressed on dendritic cells, which are equipped with diverse molecules for HIV-1 capture, and it binds the viral envelope gp120 with high affinity. Efficient cross-presentation of HIV antigens from incoming virions activates anti-HIV CD8+ cytotoxic T lymphocytes. The major role of CD209 in viral entry into the host cell explains its presence in the predicted interactome. The functional roles associated with CD209 include intracellular virion transport, protein signaling and peptide identification, yet no direct interaction was observed with the viral structural genes predicted on the network. This could suggest a domain-specific function of CD209 in viral-host interactions. Although this function may be similar in Ebolaviruses, the functional pathway may differ.
The Ebolavirus VGP is synthesized in a secreted (sGP) or a full-length transmembrane form, and each form has distinct biochemical and biological properties. Multiple lines of evidence suggest that VGP plays a prominent role in the clinical manifestations of Ebolavirus infection. Inhibition of neutrophil activation by sGP could alter the immune response. The transmembrane form could contribute to the symptoms of hemorrhagic fever through viral attack on reticuloendothelial and blood vessel cells (Ströher et al., 2001; Yang et al., 1998). Because sGP and the transmembrane form have different host cell targets and roles, both are potential targets for controlling the Ebolavirus burden. The cell types preferred by each VGP form should determine the type of pathogenic manifestation and the transmission intensity observed within the population. Therefore, drug targets that are common across viral species, or specific to the most virulent forms, will be vital for rapid clearance of an infection.
The assembly and budding of Ebolaviruses from the cell membrane is directed by the viral matrix protein VP40 (Harty et al., 2000; Panchal et al., 2003). VP40 alone can assemble and bud filamentous virus-like particles from cells (Geisbert & Jahrling, 1995; Johnson et al., 2006; Noda et al., 2002). VP40 may also interact specifically with membrane-associated VGP and VP24 during budding. Bornholdt et al. (2013) showed that binding of VP40 to RNA forms a ring structure at a perinuclear location. These rings were observed only in infected cells and not in mature purified Ebolaviruses (Gomis-Rüth et al., 2003), which suggests that the ring structure is not necessary for matrix assembly. The VP40 RNA-binding rings could therefore have a critical function within infected cells (Gomis-Rüth et al., 2003; Hoenen et al., 2010). Previous data showed that recombinant viruses carrying an RNA-binding knockout mutation in VP40 (R134A) failed to replicate (Hoenen et al., 2005). This suggests a dual role for VP40, in viral transcription and in the assembly and budding that release new viral particles. Because VP40 is a structural protein with possible interactions with VGP, it may contribute to the host immune activation pathway. Although the two proteins direct different cellular processes, their immune interaction makes them potential targets for vaccine design and control of Ebolavirus disease.
Two sets of close relationships motivated our use of VP40 and VGP to predict the antigenic peptide epitopes required for generating potent host antibodies against the virus. The first links VP40, CD209 and ENV, and the second links HIV-1 and Ebolavirus. The interaction between ENV and CD209 could explain the downstream signaling reactions necessary for cytokine activation. Identifying the signaling molecules activated by the interaction of CD209 with Ebolavirus antigens could suggest a possible route for curbing the loss of body fluids during epidemic outbreaks. The presence of LCK and its kinase function suggests a protein-signaling role in driving the infection and fluid-loss pathway associated with Ebolavirus. Its involvement in CD4 and CD8 binding makes its pathway an interesting target for Ebolavirus disease control. T cell receptor (TCR) signaling, which results from peptide-MHC (pMHC) binding, requires phosphorylation of the immunoreceptor tyrosine-based activation motifs (ITAMs) of the CD3 complex, a step associated with the Src-family kinase (SFK) Lck (Palacios & Weiss, 2004). It is generally accepted that pMHC binding promotes Lck activation during T cell activation. Given the stable association of Lck with the CD4 and CD8 coreceptors (Rudd et al., 1988; Veillette et al., 1988), other interaction models propose that MHC binding activates Lck by trans-autophosphorylation. This information clarifies the role played by LCK within the network and its interaction with immune cells. The involvement of LCK in Ebolavirus disease intensity could be related to a hyperactivation cascade leading to a massive release of cytokines (cytokine storm), an event common in patients with septic shock (Mackenzie & Lever, 2007).
The Lck protein kinase pathway, identified in the network as a cluster, showed interactions with proteins including SRC (proto-oncogene tyrosine-protein kinase), MERTK (tyrosine-protein kinase), TYRO3 (tyrosine-protein kinase receptor), UFO (tyrosine-protein kinase receptor), ACK1 (activated CDC42 kinase 1), BMPR2 (bone morphogenetic protein receptor type-2), YES (tyrosine-protein kinase) and FYN (tyrosine-protein kinase). The UniProt Knowledgebase (www.uniprot.org) associates these proteins with downstream ligand binding and signal transduction, through which they activate other proteins in phosphorylation-related interactions. Several of them are involved in disease pathways, including SRC (colon carcinoma), MERTK (retinitis pigmentosa, MIM: 613862), UFO (thyroid tumorigenesis) and BMPR2 (primary pulmonary hypertension, PPH1, MIM: 178600). This cluster highlights the kinase function of LCK and suggests that it is involved in the epidemic levels observed for Ebolavirus disease.
MiRNAs can regulate many proteins (Grimson et al., 2007) and modulate their concentrations in a dose-dependent manner (Baek et al., 2008; Selbach et al., 2008).
Grigoryev et al. (2011) identified several miRNAs that were significantly differentially expressed in T lymphocytes and showed that the target genes of the upregulated miRNAs had decreased mRNA expression. They confirmed this by inhibiting two of these miRNAs (miR-221 and miR-155) in CD4+ cells. The inhibition released four gene targets (PIK3R1, FOS, IRS2 and IKBKE) and increased lymphocyte proliferation, thereby favouring proliferation and survival. MiR-155 overexpression has been shown to increase antiviral CD8+ T cell responses in vivo (Gracias et al., 2013). Of the miRNA gene targets highlighted above, IKKE (IKBKE) was the only one captured in this study, where it ranked among the top three proteins. The miRNAs, together with their related disease pathway hsa05131 (microbial infection, shigellosis) and cellular process hsa01430 (cell communication), could point to successful viral transmission and disease occurrence through gene regulation within each pathway. The network also contains other regulatory elements, such as transcription factors and drug targets for related genes, which suggests a complex regulatory platform for the control of Ebolavirus disease. Transcription factors for gene targets such as CD209, which is known for antigen binding and immune presentation, could be further analysed at the molecular level as potential targets for gene regulation and disease prevention within the host.
Gene products in the HLA region (chromosome 6) facilitate antigenic protein processing and peptide presentation to CD8+ cytotoxic T lymphocytes (CTLs) and CD4+ T helper cells (Coffin, 1996; Dyer et al., 1997). HLA allelic multiplicity has favoured great diversity, as individuals adapt differentially to a wide range of microbial agents such as HIV-1. Optimal immune defence against HIV-1 could depend on the combined intervention of CD4+ and CD8+ lymphocytes (Rosenberg et al., 1997), with each polymorphic class I and class II locus contributing separately to the resulting immune response. CD209 favors the specificity of HIV antigen presentation to MHC-II alleles, while dendritic cells promote the activation, and eventual infection, of HIV-specific CD4+ cells. HIV-1 and Ebolavirus both use antigen-immune interactions to penetrate the host cell, and this similarity is reflected in the related genes identified in the network. The components of the immune system that protect against Ebolavirus infection are not clearly known. The pathogen replicates at unexpectedly high rates and overwhelms infected host cells and immune defences through its protein synthesis mechanisms (Sanchez et al., 2001).
In our study, HLA class I peptide antigenicity and frequency values for VGP and MXP within the pathogen population showed no statistically significant difference. For all validated peptides, specificity was tied to the HLA*A02:01 and HLA*A02:02 alleles, unlike the differential selectivity observed for the other host alleles tested. This is consistent with host expression of the HLA*A02 allele against HIV-1 in the Botswana (Novitsky et al., 2001) and Rwandan (Tang et al., 1999) populations. The VGP epitope FLLQLNETI was specific to Zaire Ebolavirus, with a single occurrence, while the other VGP epitopes and all the MXP epitopes were related to other viral species, with multiple occurrences. The Zaire strain has been considered the most lethal (Feldmann et al., 1994; Sanchez et al., 1996). Nevertheless, the observed transmission intensity (TRI) for the FLLQLNETI epitope was relatively low compared with the VGP epitope FLTNTIRGV (Bundibugyo and Taï Forest), whose TRI was twice that of the former. This raises two questions: 1) Is the Ebolavirus epidemic caused mainly by the Zaire Ebola species? and 2) What causes the sudden change in antigenicity that results in the epidemic trends currently observed in West Africa (Liberia, Sierra Leone and Guinea)? One possible explanation is the high phylogenetic interspecies difference observed within the VGP genomic structure, which suggests a species-specific (pairwise identity) pathogenic occurrence. The Zaire Ebolavirus species is grouped with other related species, and not vice versa. This could suggest that the other species derived from the Zaire virus, so that host immunity may have adapted to keep the Zaire strain below epidemic levels. Introduction of highly antigenic variants from related strains could be the driving force behind the epidemic levels observed.
For HLA class II, antigenicity and frequency values of the identified epitopes differed with high statistical significance for both the VGP and MXP genomic structures. The validated peptides showed high antigenic values for HLA class II, thus favouring a high TRI. The most common host allele frequencies for VGP and MXP were for HLA-DRB1*01:01 and HLA-DRB1*01:04. These were consistent with the Botswana population carrying HIV-1, although they differed from other alleles observed at high frequencies in that population. The viral species identified for the selected HLA class II peptide epitopes were predominantly and specifically Zaire Ebolavirus. All epitopes presented low TRIs except QKMYSFDSTTAAIML, which had a TRI > 10 for the MXP genomic structure.
This contrasts with the high HLA-I TRI observed for the VGP genomic structure of Zaire Ebolavirus. Given that CD209 facilitates the specificity of antigen presentation to HLA-II immune proteins, MXP should contribute strongly to the immune response and to downstream signal transmission. MXP has not traditionally been associated with antigenic interactions, but its potential interaction with VGP at the viral surface makes its contribution to the immune response considerably important.
Antigenicity of peptide epitopes for HLA-DQA-B was statistically significant for VGP but not for MXP. The multiple occurrences of several antigenic epitopes within a given viral species could suggest repeated sequence motifs in the genome, reminiscent of the repeats associated with retrotransposons, which may favor high genome variation and transmission. Previous work on the similarity of reverse transcriptase (pol gene) sequences among retroelements, including retrotransposons and retroviruses, suggests that retroviruses could have evolved from long terminal repeat (LTR) retrotransposons by acquiring a functional env gene (McClure, 1991; Xiong & Eickbush, 1990). Although Ebolavirus is a negative-sense RNA filovirus rather than a retrovirus, its glycoprotein (VGP) plays a role in viral entry analogous to that of the retroviral env product; VGP should therefore be critical for viral transmission in its host and could be targeted with approaches analogous to the anti-retroviral technology currently used in HIV patients. Its identification in the interactome suggests an interaction with several human proteins involved in signal transduction. The interplay between MXP, VGP and their interacting human proteins makes all these genes potential targets for Ebolavirus control, through drug/vaccine design or through genetic technologies that interfere with viral replication.

Conclusion
To our knowledge, this study is the first to exploit the range of viral antigenic targets and possible systemic routes in the host that could be targeted individually or together for the control of Ebolavirus disease. The phylogenetic relationship distinguished in the viral envelope of Ebolaviruses, and its similarity to HIV-1 viral genomic structures, suggest that common pathways to pathogenic manifestations within the host are likely. A clear understanding of the cellular pathways driving pathogenicity is critical for arresting the rapid and abundant cytokine activation that leads to shock and organ malfunction. Although Zaire Ebolavirus has been identified as the most virulent species, its TRI generally indicates a lower transmission rate than that of the other species, which is consistent with viral-host cohabitation and adaptation. This suggests that epidemic occurrences of Ebolavirus disease could result from related species that are not genetically adapted to the host. Sequence repeats predicted within antigenic epitopes occurred more often in the non-Zaire Ebolavirus species, and the antigenic epitope targets identified for these species could be used for drug/vaccine design. Potential microRNAs, transcription factors and known drugs identified for most gene targets suggest points in the viral pathogenic pathway that could be the focus of modern genetic technology for arresting viral replication and epidemics.

Phylogenetic analysis of retrieved Ebola and related human protein sequences
The Geneious software (V.  (Table 3).
Peptides were selected on two criteria: a threshold affinity for strong binders (0.5 for percentage rank and 50 for IC50) and a threshold affinity for weak binders (1 for percentage rank and 100 for IC50). Only strong binders were considered for further analysis. Files containing the strong-binder epitopes for each Ebola structural genomic class variant and each species were generated in FASTA format and submitted to VaxiJen (v2.0) (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). The viral antigenicity prediction model was used with a threshold of 0.5, and all epitopes with an antigenicity value (peptide score) ≥ 0.5 were retained for further analysis, separately for MHC class I and class II alleles.
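The two-stage filter described above (strong-binder selection, FASTA export, then an antigenicity cutoff of 0.5) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `BindingPrediction` record, the function names and the peptides are hypothetical, and the rule that meeting either the percentage-rank or the IC50 cutoff is sufficient is an assumption, since the text does not say how the two criteria combine. Antigenicity scores are taken as already computed (for example, by VaxiJen); the sketch does not call any VaxiJen interface.

```python
from dataclasses import dataclass

# Cutoffs as stated in the text; how they combine is an ASSUMPTION.
STRONG_RANK = 0.5       # percentage-rank cutoff for strong binders
STRONG_IC50 = 50.0      # IC50 cutoff for strong binders (predictor's reported units)
ANTIGENIC_CUTOFF = 0.5  # antigenicity (peptide score) threshold


@dataclass
class BindingPrediction:
    """One hypothetical peptide-allele binding prediction."""
    peptide: str
    allele: str
    percent_rank: float
    ic50: float


def is_strong_binder(pred: BindingPrediction) -> bool:
    # Assumption: meeting either cutoff counts as a strong binder.
    return pred.percent_rank <= STRONG_RANK or pred.ic50 <= STRONG_IC50


def strong_binder_fasta(preds: list[BindingPrediction], label: str) -> str:
    """Write unique strong-binder peptides as FASTA, one record per peptide."""
    peptides = sorted({p.peptide for p in preds if is_strong_binder(p)})
    records = [f">{label}_{i}\n{pep}" for i, pep in enumerate(peptides, start=1)]
    return "\n".join(records) + "\n" if records else ""


def antigenic_epitopes(scores: dict[str, float]) -> list[str]:
    """Keep epitopes whose antigenicity score is >= the cutoff."""
    return sorted(pep for pep, score in scores.items() if score >= ANTIGENIC_CUTOFF)


if __name__ == "__main__":
    # Made-up inputs for illustration only; not data from this study.
    preds = [
        BindingPrediction("AAAKAAAAV", "HLA-A*02:01", 0.3, 120.0),
        BindingPrediction("GGGSGGGSL", "HLA-A*02:01", 1.5, 40.0),
        BindingPrediction("KKKRKKKRW", "HLA-A*02:02", 2.0, 300.0),
    ]
    print(strong_binder_fasta(preds, "vgp_zaire"), end="")
    print(antigenic_epitopes({"AAAKAAAAV": 0.62, "GGGSGGGSL": 0.41}))
```

Keeping one FASTA file per structural class variant and species, as the text describes, would amount to calling `strong_binder_fasta` once per group with a distinct label.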
Calculating Transmission Intensity (TRI) for selected epitopes
For every validated epitope in our data, the peptide score and peptide frequency were used to calculate a novel value called transmission intensity (TRI), which indicates how likely disease transmission is to develop into an epidemic within the population. TRI is the ratio of peptide score (PS) to peptide frequency (PF): TRI = PS/PF. If TRI is approximately 1, transmission is stable and could easily be contained to elimination. If TRI > 1, transmission poses a greater risk to infected individuals, because high antigenicity activates a strong kinetic and signalling response pathway (Yamashita et al., 1998). Transmission may then reach epidemic levels, with the host immune system generating high antibody levels against the virus, possibly leading to hypotensive shock characterized by fever, myalgia, malaise, severe bleeding and coagulation abnormalities, including gastrointestinal bleeding, rash and a range of hematological irregularities such as lymphopenia and neutrophilia (Colebunders & Borchert, 2000). If TRI < 1, transmission poses a low risk within the population and could easily be eliminated by the host immune system: antigenicity is low and peptide frequency is high, so the host quickly develops immunity against the pathogen. The resulting transmission intensities depend on the MHC HLA allele type and its frequency within the population.
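A minimal sketch of the TRI = PS/PF calculation and the three interpretation bands described above. The function names are hypothetical, and the `tolerance` parameter is an assumption: the text only says "around one", so the width of the "stable" band is not specified. Whether PS and PF are normalized before the ratio is taken is also not stated here; the sketch simply takes them as given.

```python
def transmission_intensity(peptide_score: float, peptide_frequency: float) -> float:
    """TRI = PS / PF, as defined in the text."""
    if peptide_frequency <= 0:
        raise ValueError("peptide frequency must be positive")
    return peptide_score / peptide_frequency


def interpret_tri(tri: float, tolerance: float = 0.1) -> str:
    # 'tolerance' is a HYPOTHETICAL parameter; the text only says "around one".
    if abs(tri - 1.0) <= tolerance:
        return "stable"      # transmission could be contained to elimination
    if tri > 1.0:
        return "high risk"   # high antigenicity relative to peptide frequency
    return "low risk"        # low antigenicity, high peptide frequency


if __name__ == "__main__":
    # Illustrative PS/PF pairs only; not values from this study.
    for ps, pf in [(0.75, 0.25), (1.0, 1.0), (0.5, 1.0)]:
        tri = transmission_intensity(ps, pf)
        print(f"PS={ps}, PF={pf}: TRI={tri:.2f} -> {interpret_tri(tri)}")
```

Because TRI is a plain ratio, doubling the peptide score at a fixed frequency doubles TRI, which is the kind of comparison made between FLTNTIRGV and FLLQLNETI in the Discussion.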

Statistical analysis
Node attributes of the network were evaluated by batch analysis, fitting the best line through the data points by the least squares method (Weisstein, www.mathworld.wolfram.com). GO terms for the predicted interactome were validated with the hypergeometric test after Benjamini & Hochberg false discovery rate (FDR) correction (Benjamini & Hochberg, 1995), at a significance level of 0.05. Files for each structural genomic class variant per species were used to evaluate the frequency distribution of epitopes against peptide scores, HLA allele frequency and frequency within each species' protein sequences (supporting Table S2, Table S3 and Table S4). After each value was normalized to the largest value in its dataset (epitope peptide antigenic score and epitope peptide frequency) across the different Ebola sequences, a one-tailed paired Student's t-test was performed to compare group means (α = 0.05). This evaluated how epitope targets were distributed across different Ebola species and what their epidemiological implications are.
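The two statistical steps above can be sketched with the standard library alone: normalizing each dataset to its largest value, computing the paired Student's t statistic on the normalized scores and frequencies, and applying the Benjamini-Hochberg adjustment used for the GO-term p-values. This is an illustrative sketch, not the authors' scripts; the function names and input values are hypothetical, and the sketch returns the t statistic and degrees of freedom rather than a p-value.

```python
import math
import statistics


def normalize_to_max(values: list[float]) -> list[float]:
    """Scale a dataset so that its largest value becomes 1.0."""
    top = max(values)
    if top <= 0:
        raise ValueError("largest value must be positive")
    return [v / top for v in values]


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired Student's t statistic and degrees of freedom for mean(a - b)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical peptide scores and frequencies for one species/class.
    scores = normalize_to_max([0.8, 1.2, 0.9, 1.6])
    freqs = normalize_to_max([3.0, 2.0, 4.0, 1.0])
    t, df = paired_t_statistic(scores, freqs)
    print(f"t = {t:.3f}, df = {df}")
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

For the one-tailed test, the p-value for the alternative mean(a - b) > 0 is the upper-tail probability of Student's t distribution at `t` with `df` degrees of freedom, for example `scipy.stats.t.sf(t, df)` if SciPy is available; GO terms would then be retained when their BH-adjusted p-value is below 0.05.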

Data availability
The complete list of UniProtKB protein sequence codes for the extracted Ebolavirus sequences used for analysis can be found in Table S1.

Author contributions
All authors contributed to the ideas that led to the development of the manuscript. A.D.O. and D.A.A. initiated the research idea; D.A.A. did the primary research, analysed the data and wrote the manuscript. A.D.O. proofread and edited the manuscript for submission.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.
Table S1. UniProtKB protein sequence codes for extracted Ebolavirus sequences used for analysis.
Table S2. Predicted antigenic epitopes evaluated for frequency distribution against MHC class I immune alleles and different Ebola species.
Table S3. Predicted antigenic epitopes evaluated for frequency distribution against MHC class II HLA-DRB1 immune alleles and different Ebola species.
Table S4. Predicted antigenic epitopes evaluated for frequency distribution against MHC class II HLA-DQA-B immune alleles and different Ebola species.