mimicINT: A workflow for microbe-host protein interaction inference

Sébastien A. Choteau; Kevin Maldonado; Aurélie Bergon; Marceau Cristianini; Mégane Boujeant; Lilian Drets; Christine Brun; Lionel Spinelli; Andreas Zanzoni

doi:10.12688/f1000research.160063.2


Software Tool Article

Revised

[version 2; peer review: 2 approved, 1 approved with reservations]

Sébastien A. Choteau¹, Kevin Maldonado¹, Aurélie Bergon¹, Marceau Cristianini¹, Mégane Boujeant¹, Lilian Drets¹, Christine Brun^1,2, Lionel Spinelli¹^*, Andreas Zanzoni¹^*

^* Equal contributors

PUBLISHED 28 Mar 2025

Author details

¹ Aix-Marseille University, Inserm, TAGC, UMR_S1090, Turing Centre for Living Systems, Marseille, France
² CNRS, Marseille, France

Sébastien A. Choteau
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing – Original Draft Preparation

Kevin Maldonado
Roles: Data Curation, Methodology, Software

Aurélie Bergon
Roles: Data Curation, Methodology, Software

Marceau Cristianini
Roles: Data Curation, Software

Mégane Boujeant
Roles: Investigation, Resources

Lilian Drets
Roles: Data Curation, Software

Christine Brun
Roles: Conceptualization, Funding Acquisition, Project Administration, Writing – Review & Editing

Lionel Spinelli
Roles: Conceptualization, Data Curation, Methodology, Project Administration, Software, Supervision, Writing – Review & Editing

Andreas Zanzoni
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Bioinformatics gateway.

This article is included in the Cell & Molecular Biology gateway.

Abstract

Background

The increasing incidence of emerging infectious diseases is posing serious global threats. Therefore, there is a clear need for developing computational methods that can assist and speed up experimental research to better characterize the molecular mechanisms of microbial infections.

Methods

In this context, we developed mimicINT, an open-source computational workflow for large-scale inference of protein-protein interactions between microbe and human proteins. It detects putative molecular mimicry elements that mediate the interaction with host proteins: short linear motifs (SLiMs) and host-like globular domains. mimicINT exploits these putative elements to infer interactions with human proteins by using known templates of domain-domain and SLiM-domain interactions. mimicINT also provides (i) robust Monte-Carlo simulations to assess the statistical significance of SLiM detection, which suffers from false positives, and (ii) an interaction specificity filter to account for differences between motif-binding domains of the same family. We have also made mimicINT available via a web server.

Results

In two use cases, mimicINT can identify potential interfaces in experimentally detected interactions between pathogenic Escherichia coli type-3 secreted effectors and human proteins, and infer biologically relevant interactions between Marburg virus and human proteins.

Conclusions

The mimicINT workflow can be instrumental to better understand the molecular details of microbe-host interactions.

Keywords

Protein-protein interactions, interaction inference, microbe-host interactions, molecular mimicry, short linear motifs

Corresponding author: Andreas Zanzoni

Competing interests: No competing interests were disclosed.

Grant information: This work was supported by: the European Union’s Horizon 2020 Research and Innovation Programme [Project ID 101003633, RiPCoN] to CB; the JPI HDHL-INTIMIC action co-funded by the Agence Nationale de la Recherche [ANR-17-HDIM-0001, DIME] to CB; and France 2030, the French Government program managed by the French National Research Agency [ANR-16-CONV-0001], and from Excellence Initiative of Aix-Marseille University - A*MIDEX [AMX-21-PEP-043] to AZ. SAC received funding from the “Espoirs de la recherche” program managed by the French Fondation pour la Recherche Médicale (FDT202106013072). Funding for article processing charges provided by INSERM.

Copyright: © 2025 Choteau SA et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Choteau SA, Maldonado K, Bergon A et al. mimicINT: A workflow for microbe-host protein interaction inference [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2025, 14:128 (https://doi.org/10.12688/f1000research.160063.2) First published: 27 Jan 2025, 14:128 (https://doi.org/10.12688/f1000research.160063.1) Latest published: 28 Mar 2025, 14:128 (https://doi.org/10.12688/f1000research.160063.2)

Revised Amendments from Version 1

A typo was fixed. We added Inserm as the funding agency for article processing charges.

See the authors' detailed response to the review by Ylva Ivarsson
See the authors' detailed response to the review by Sobia Idrees

Introduction

Most pathogens interact with their hosts to reach an advantageous niche and ensure their successful dissemination. For instance, viruses interfere with important host-cell processes through protein-protein interactions to coordinate their life cycle.¹ It has been shown that host cell networks subversion by pathogen proteins can be achieved through interface mimicry of endogenous interactions (i.e., interaction between host proteins).^2,3 This strategy relies on the presence in pathogen protein sequences of host-like elements, such as globular domains and short linear motifs (SLiMs), that can mediate the interaction with host proteins.^4–6

Over the last years, many computational methods have been developed to predict pathogen-host protein interactions, some of which are based on the detection of sequence or structural mimicry elements.^7–9 Such approaches made it possible, for instance, to suggest potential molecular mechanisms underlying the implication of gastrointestinal bacteria in human cancer^10,11 or to discriminate between viral strains with different oncogenic potentials,¹² thus showing that protein-protein interaction predictions can be instrumental in untangling microbe-host disease associations. Nevertheless, many of these tools do not make their source code freely available to the community (e.g., Refs. 11–13), provide their predictions only through a database (e.g., Ref. 12), or can only be used through a web interface,^14,15 thus limiting reproducibility and tool usability.

In this context, and inspired by our previous work,¹⁰ we have developed the mimicINT workflow, and its webserver companion mimicINTweb (https://mimicintweb.tagc.univ-amu.fr), to enable large-scale interaction inference between microbe and human proteins based on the detection of host-like elements and the use of experimentally identified interaction templates.^16,17

Methods

Implementation

mimicINT detects putative molecular mimicry elements in microbe sequences of interest that can mediate the interaction with host proteins (Figure 1). mimicINT is written in the Python (https://www.python.org/) and R (https://www.r-project.org/) languages and exploits the Snakemake workflow manager for automated execution.¹⁸ It consists of four main steps: (i) the detection of host-like elements in microbe sequences; (ii) the collection of domains on the host proteins; (iii) the inference of interactions between microbe and host proteins; and (iv) the functional enrichment analysis on the list of inferred host interactors.

Figure 1. Overview of the mimicINT workflow.

(A) Given a FASTA file of protein sequences of the query species (e.g., microbe sequences), mimicINT identifies both the (B) domain- and (C) SLiM-mediated interaction interfaces. (D) Using publicly available interaction templates, mimicINT infers the interactions between the proteins of the query and target (i.e., host) species. (E) Finally, it provides a list of functional annotations significantly enriched in inferred protein targets.

In the first step, mimicINT takes the FASTA-formatted sequences of microbe proteins (e.g., viral or other pathogen proteins likely to be found at the pathogen-host interface) as input to detect host-like elements: domains and SLiMs. The domain identification is performed by the InterProScan stand-alone version¹⁹ using the domain signatures from the InterPro database.²⁰ By default, mimicINT retains InterProScan matches with an E-value below 10⁻⁵, a common threshold value used for detecting profile-based domain signatures in protein sequences in the context of interaction inference.²¹ The host-like SLiM detection exploits the motif definitions available in the ELM database¹⁷ and is carried out by the SLiMProb tool from the SLiMSuite software package.²² As SLiMs are usually located in disordered regions,²³ SLiMProb uses the IUPred algorithm²⁴ to compute the disorder propensity of each amino acid in the query sequences and generates an average disorder propensity score for every detected SLiM occurrence. For SLiM detection, the default IUPred disorder propensity threshold is set to 0.2, a value commonly used to limit false negatives,^22,25 and the minimum size of the predicted disordered region is set to 5, which is the optimal size to detect true positive SLiM occurrences.²⁶ Nevertheless, the user can choose all running parameters for the host-like element detection in the mimicINT configuration file.
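The default filters of this first step can be illustrated with a minimal Python sketch. It is not the InterProScan or SLiMProb implementation: the function names, the tuple layouts and the toy motif pattern are hypothetical, and the sketch assumes that a residue counts as disordered when its IUPred score is at or above the threshold and that a motif must lie entirely within a disordered region of at least the minimum size.

```python
import re

def filter_domain_hits(hits, max_evalue=1e-5):
    """Keep (protein, signature, evalue) tuples with an E-value below the cutoff."""
    return [hit for hit in hits if hit[2] < max_evalue]

def disordered_regions(scores, threshold=0.2, min_len=5):
    """Return half-open (start, end) index pairs of runs with score >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions

def find_slims(sequence, scores, pattern, threshold=0.2, min_len=5):
    """Return (start, end, match) for motif matches lying inside disordered regions.

    Note: re.finditer reports non-overlapping matches only.
    """
    hits = []
    for r_start, r_end in disordered_regions(scores, threshold, min_len):
        for match in re.finditer(pattern, sequence[r_start:r_end]):
            hits.append((r_start + match.start(), r_start + match.end(), match.group()))
    return hits

# Toy input: a 12-residue sequence with one 5-residue disordered stretch.
sequence = "MKPLLPAAGGWW"
scores = [0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
slim_hits = find_slims(sequence, scores, r"P..P")  # hypothetical toy pattern
domain_hits = filter_domain_hits([("protA", "sigX", 1e-12), ("protA", "sigY", 1e-3)])
```

In the workflow itself these values are set in the mimicINT configuration file rather than passed as function arguments.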

In the second step, mimicINT gathers the domain annotations of the host proteins from the InterPro database through a REST API query.

In the third step, mimicINT infers the interactions between host and microbe proteins. This analysis takes as input the list of known interaction templates collected from two resources: (i) the 3did database,¹⁶ a collection of domain-domain interactions extracted from three-dimensional protein structures,²⁷ and (ii) the ELM database¹⁷ that provides a list of experimentally identified SLiM-domain interactions in Eukaryotes. The inference procedure checks whether any of the microbe proteins contain at least one domain or SLiM for which an interaction template is available. In this case, it infers the interaction between the given protein and all the host proteins containing the cognate domain (i.e., the interacting domain in the template). As motif-binding domains of the same group, like SH3 or PDZ, show different interaction specificities,²⁸ we have implemented a previously proposed strategy²⁹ to take these differences into account (see below the sub-section “Computation of the motif-binding domain similarity scores”). This approach assigns a “domain score” that can be used to rank or filter inferred SLiM-domain interactions. Once this step is completed, the inferred interactions are stored in both tab-delimited and JSON files to facilitate the import into other applications, such as Cytoscape.³⁰
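The template-based inference described above amounts to a join between microbe elements, interaction templates and host domain annotations. The following sketch shows the idea under stated assumptions: the function name, the data layout and all identifiers (MOTIF_1, DOM_A, and so on) are hypothetical and are not taken from 3did, ELM or the mimicINT code. Templates are treated as directed (element, cognate domain) pairs, so a symmetric domain-domain template would have to be listed in both directions.

```python
from collections import defaultdict

def infer_interactions(microbe_elements, host_domains, templates):
    """Infer microbe-host interactions from interaction templates.

    microbe_elements: dict mapping a microbe protein to its set of detected
                      elements (domain or SLiM identifiers).
    host_domains:     dict mapping a host protein to its set of domain identifiers.
    templates:        iterable of (element, cognate_domain) pairs.
    Returns a set of (microbe_protein, element, cognate_domain, host_protein).
    """
    # Index host proteins by the domains they contain.
    domain_to_hosts = defaultdict(set)
    for host, domains in host_domains.items():
        for domain in domains:
            domain_to_hosts[domain].add(host)

    inferred = set()
    for protein, elements in microbe_elements.items():
        for element, cognate in templates:
            if element in elements:
                # Every host protein carrying the cognate domain is a candidate partner.
                for host in domain_to_hosts.get(cognate, ()):
                    inferred.add((protein, element, cognate, host))
    return inferred

# Toy data with hypothetical identifiers.
microbe_elements = {"microbeA": {"MOTIF_1", "DOM_K"}, "microbeB": {"DOM_Z"}}
host_domains = {"hostX": {"DOM_A"}, "hostY": {"DOM_A", "DOM_K"}, "hostZ": {"DOM_U"}}
templates = [("MOTIF_1", "DOM_A"), ("DOM_K", "DOM_K")]

interactions = infer_interactions(microbe_elements, host_domains, templates)
```

Keeping the element and the cognate domain in each record preserves the putative interface, which is what allows later ranking or filtering by domain score.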

In the final step, to identify the host cellular functions potentially targeted by the pathogen proteins, mimicINT executes a functional enrichment analysis of host-inferred interactors. This analysis statistically assesses the over-representation of functional categories, such as Gene Ontology terms and biological pathways (e.g., KEGG and Reactome), using the g:Profiler R client.³¹
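As an illustration of what an over-representation test computes, the sketch below evaluates a one-sided hypergeometric P-value for a single functional category. It is a generic textbook calculation written for this example, not the g:Profiler client, and it leaves out the multiple-testing correction that an enrichment tool applies across all tested categories. The function and argument names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Probability of observing at least k annotated proteins.

    k: annotated proteins among the inferred interactors (intersection size)
    n: number of inferred interactors
    K: annotated proteins in the statistical background (term size)
    N: size of the statistical background
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 2 interactors, both annotated, in a background of 4 proteins
# of which 2 carry the annotation.
p_value = hypergeom_pvalue(2, 2, 2, 4)
```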

Given the degenerate nature of SLiMs,²³ their detection is prone to generate false positive occurrences. For this reason, we implemented an optional sub-workflow that, using Monte-Carlo simulations, assesses the probability of a given SLiM to occur by chance in query sequences and, thus, can be used to filter out potential false positives⁵ (see below the sub-section “Statistical significance of the SLiMs detected on the microbe sequences”).

To ease deployment and ensure reproducibility and scalability on high-performance computing infrastructures, mimicINT is provided as a containerized application based on Docker and Singularity.^32,33

Computation of the motif-binding domain similarity scores

To identify motif-binding domains that can be specifically associated with a given ELM motif class, we use the same strategy proposed by Weatheritt et al. in 2012,²⁹ which assumes that a domain significantly similar to a known motif-binding domain should also bind the same motif. We first compiled a list of experimentally identified motif-binding domains from the original list from Weatheritt et al. complemented by more recent annotations from the ELM database¹⁷ (August 2020). Obsolete ELM class identifiers from Weatheritt et al. were mapped to current ELM identifiers using the “Renamed ELM classes” file (http://elm.eu.org/infos/browse_renamed.tsv) and duplicated domain annotations were removed. In total, we collected 538 domains in 415 human proteins known to bind 212 ELM motif classes (73% of the 290 motif classes present in ELM, August 2020). The sequences of these 415 annotated proteins were fetched from UniprotKB.³⁴ We next fetched the sequences of 1452 reference Eukaryota proteomes (22,262,113 protein sequences in total) from UniprotKB (August 2020). We removed redundancy using the CD-HIT algorithm³⁵ to generate a database of 21,414,544 non-identical sequences. We used the GOPHER tool³⁶ from the SLiMSuite package²² to identify orthologous sequences of the annotated proteins in the database of non-identical eukaryotic sequences by reciprocal BLAST best hits. Selected orthologous proteins were aligned using the multiple sequence alignment algorithm Clustal Omega (v. 1.2.4).³⁷ Once the position of the motif-binding domain was identified within the alignment, we removed aligned domains with indels covering >10% of the annotated domain sequence. We iteratively realigned the sequences until a set of proteins was identified with <10% indels coverage. 
In total, we selected 701 multiple sequence alignments used as input for generating domain-specific HMM profiles with the hmmbuild program from the HMMER package v.3.1.1.³⁸ Subsequently, we scanned a representative set of the human proteome (20,350 “reviewed” sequences from UniprotKB) with the domain-specific HMMs using the hmmsearch program. We used an E-value cutoff of 0.01 to select the best hits and we rejected those hits covering less than 90% of the annotated motif-binding domain sequence length. Finally, the E-value of the best-scoring domain was converted into a domain similarity score using the iELM script downloaded from http://elmint.embl.de/program_file/.²⁹ Doing so, we computed at least one motif-binding domain similarity score for 1,461 human proteins.
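The hit selection applied to the HMM scan can be sketched as follows. This is a simplified illustration: the dictionary keys and the function name are hypothetical, coverage is assumed to be measured from alignment coordinates on the annotated domain, and the conversion of the retained E-value into a domain similarity score is deliberately left out because it is performed by the iELM script.

```python
def select_best_hits(hits, max_evalue=0.01, min_coverage=0.9):
    """Return the lowest E-value hit per (protein, hmm) among hits passing both filters."""
    best = {}
    for hit in hits:
        # Fraction of the annotated domain covered by the alignment (assumed definition).
        coverage = (hit["ali_end"] - hit["ali_start"] + 1) / hit["domain_length"]
        if hit["evalue"] > max_evalue or coverage < min_coverage:
            continue
        key = (hit["protein"], hit["hmm"])
        if key not in best or hit["evalue"] < best[key]["evalue"]:
            best[key] = hit
    return best

# Toy hits with hypothetical identifiers.
hits = [
    {"protein": "P1", "hmm": "hmmA", "evalue": 1e-20, "ali_start": 1, "ali_end": 58, "domain_length": 60},
    {"protein": "P1", "hmm": "hmmA", "evalue": 1e-30, "ali_start": 1, "ali_end": 30, "domain_length": 60},
    {"protein": "P2", "hmm": "hmmA", "evalue": 0.5, "ali_start": 1, "ali_end": 60, "domain_length": 60},
    {"protein": "P1", "hmm": "hmmA", "evalue": 1e-25, "ali_start": 3, "ali_end": 60, "domain_length": 60},
]
best_hits = select_best_hits(hits)
```

The second toy hit has the lowest E-value but is discarded because it covers only half of the domain, which shows why the coverage filter is applied before choosing the best hit.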

Statistical significance of the SLiMs detected on the microbe sequences

To assess the probability of a given motif to occur by chance in microbe sequences, we implemented a previously proposed approach⁵ to randomly shuffle the disordered regions of each sequence of a microbe of interest to generate a large set of randomized microbe proteins. The number of shuffled sequences to be generated by mimicINT can be chosen by the user in the corresponding configuration file (see the mimicINT online documentation for more details). By default, mimicINT creates two sets of 100,000 randomly shuffled proteins (one set for each IUPred disorder propensity prediction mode, i.e. short and long), with the assumption that the input sequences belong to the same microbe species or strain. Once the shuffled sequences are generated, the occurrences of each detected motif are compared in each microbe input sequence to the occurrences observed in the corresponding set of shuffled sequences. To compute the probability (P) of each detected motif to occur by chance, mimicINT counts the number of times (m) out of the shuffled sequences (N) where there is at least the same number of instances of the given motif in the input sequence:

P = \frac{m + 1}{N + 1}

For example, if a given motif occurs twice in the input sequence, the method counts how many times the same motif is detected at least twice in the corresponding set of randomly shuffled sequences. The lower the value of P, the less likely the observed instances are to occur by chance, suggesting that the given motif is likely to be functional. In this work, we set the significance threshold to 0.1, as reported in Ref. 5.
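The Monte-Carlo estimate of P can be sketched in a few lines of Python. This is an illustration under assumptions rather than the mimicINT sub-workflow: the function names are hypothetical, residues are shuffled independently within each disordered region, motifs are counted over the whole sequence with non-overlapping regular-expression matches, and the number of shuffles is kept small for readability (the default in mimicINT is 100,000).

```python
import random
import re

def count_motif(sequence, pattern):
    """Count non-overlapping matches of a motif pattern in a sequence."""
    return sum(1 for _ in re.finditer(pattern, sequence))

def shuffle_regions(sequence, regions, rng):
    """Shuffle residues within each half-open (start, end) region, leaving the rest intact."""
    residues = list(sequence)
    for start, end in regions:
        segment = residues[start:end]
        rng.shuffle(segment)
        residues[start:end] = segment
    return "".join(residues)

def motif_probability(sequence, regions, pattern, n_shuffles=1000, seed=0):
    """Estimate P = (m + 1) / (N + 1) for one motif in one sequence."""
    rng = random.Random(seed)
    observed = count_motif(sequence, pattern)
    # m: shuffled sequences with at least as many motif instances as the input.
    m = sum(
        1
        for _ in range(n_shuffles)
        if count_motif(shuffle_regions(sequence, regions, rng), pattern) >= observed
    )
    return (m + 1) / (n_shuffles + 1)

# Toy example with a hypothetical pattern and one disordered region.
p = motif_probability("MKPLLPAAGGWW", [(2, 7)], r"P..P", n_shuffles=200, seed=1)
```

Adding 1 to both m and N keeps the estimate strictly positive, so the smallest attainable value is 1/(N + 1).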

Webserver

The mimicINTweb server allows users not familiar with the command-line interface to run the mimicINT workflow through an easy-to-use web interface. The number of input sequences is limited to 50. A step-by-step tutorial is available on the mimicINTweb site (https://mimicintweb.tagc.univ-amu.fr/tutorial). The mimicINTweb server uses the Django framework (version 2.2.1 under Python 3.12.4) as web app core to manage URL routing, HTML rendering, authentication, administration, and backend logic. The Django component has been complemented with two additional application layers to guarantee server performance and security: Gunicorn (version 22.0.0) as web server gateway interface, and Nginx (version 1.25) as reverse proxy server.

Operation

The mimicINT workflow can be run on a Linux-based computer with at least 32 GB RAM and it has been successfully used on Ubuntu (16.04 and higher) and CentOS (7.4) distributions. The following software is required: Python (3.6 or higher), Snakemake (6.5 or higher), Docker (18.09 or higher) and IUPred (version 1.0). The workflow can be also deployed on high-performance computing (HPC) clusters. In this case, the Singularity application (2.5 or higher) is required. More detailed information can be found on the mimicINT GitHub repository (https://github.com/TAGC-NetworkBiology/mimicINT). The mimicINTweb server can be accessed from Linux, Windows or Mac OS based systems, and it has been tested with the following browsers: Chrome, Firefox and Safari.

Results

We sought to evaluate the ability of mimicINT to correctly infer SLiM-domain interactions, as this inference can generate many false positives,²⁹ using the default parameters for SLiM detection (see Implementation). To do so, we used as controls two datasets of established motif-mediated interactions (MDI) from the ELM database¹⁷: (i) 103 interactions between 87 viral and 44 human proteins (vMDI); (ii) 31 interactions between 16 bacterial and 23 human proteins (bMDI). We were able to correctly infer most of these interactions (91 vMDI, true positive rate = 88.3%; 21 bMDI, true positive rate = 67.7%). Notably, almost all the correctly inferred interactions have a domain score above 0.4 (87 out of the 91 vMDI, 19 out of 21 bMDI). As the availability of negative SLiM-mediated interaction datasets is very limited,^17,29,39 we estimated the false positive rate (FPR) by applying mimicINT to two randomly generated, degree-controlled interaction sets (vMDI_rnd and bMDI_rnd, respectively). Thirty-four vMDI_rnd and 7 bMDI_rnd interactions were inferred as motif-mediated (FPR = 33% and FPR = 23%, respectively). We next annotated the human proteins in the two random sets with domain scores. We kept only interactions for which the domain score was above 0.4,²⁹ thereby reducing the number of random interactions predicted as motif-mediated to 9 (FPR = 8.7%) for vMDI_rnd and 2 (FPR = 6.4%) for bMDI_rnd. Finally, we tested mimicINT on two sets of experimentally verified negative protein interactions from the Negatome 2.0 database⁴⁰: 37 viral-human and 4 bacterial-human interactions. Only two virus-human negative interactions (5.4%) were inferred as motif-mediated by mimicINT.
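The rates quoted in this paragraph are simple proportions of the counts given in the text. The short sketch below recomputes them; the dictionary labels and the function name are chosen for this example only, and the random sets are assumed to have the same size as the corresponding control sets (103 and 31 interactions).

```python
def percentage(count, total):
    """Return count / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total

# (count, total) pairs taken from the text above.
benchmark = {
    "vMDI true positive rate": (91, 103),
    "bMDI true positive rate": (21, 31),
    "vMDI_rnd false positive rate": (34, 103),
    "bMDI_rnd false positive rate": (7, 31),
    "vMDI_rnd false positive rate, domain score > 0.4": (9, 103),
    "bMDI_rnd false positive rate, domain score > 0.4": (2, 31),
    "Negatome virus-human inferred as motif-mediated": (2, 37),
}

rates = {name: percentage(count, total) for name, (count, total) in benchmark.items()}
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```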

In light of these results, we used mimicINT in two tasks: (i) the identification of putative interfaces in experimentally identified interactions between secreted effectors from the enterohemorrhagic Escherichia coli serotype O157:H7 (EHEC) and human proteins; (ii) the inference of interactions between human proteins and those of the Marburg virus (MARV), an emerging infectious agent for which experimental protein interaction data are scarce.

Interface identification in the EHEC-human protein interaction network

We collected 83 interactions between 24 EHEC secreted effectors and 74 human proteins by querying (January 2022) the IMEx consortium databases⁴¹ via the PSICQUIC interface.⁴² We gathered the sequences of EHEC effectors from Ref. 43 and ran mimicINT with default parameters. We computed the motif probabilities using the dedicated sub-workflow by performing 100,000 randomizations. We were able to identify a putative interaction interface for 26 of the 83 experimental EHEC-human interactions (31.3%) (Figure 2A), which is higher than the number of interactions with identified putative interfaces in a degree-controlled randomized network (3 interactions, 3.6%). Most of the putative interfaces were identified using motif-domain interaction templates (MDI), namely 24 interactions, whereas the putative interfaces for 9 interactions were identified with domain-domain interaction templates (DDI). Interestingly, we identified putative interfaces with both MDI and DDI templates for 7 interactions (Figure 2A). Among the interactions with MDI interfaces, almost all have a motif probability below 0.1 (23 interactions, see Supplementary File 1). Seven interactions have a domain score above 0.4 (29.2%) and their cognate motifs show a motif probability lower than 0.1 (Figure 2B). This suggests that most of the identified putative interfaces can be considered as high confidence. To further support these inferences, we sought to verify whether the 26 putative interfaces corresponded to experimentally identified binding regions. To do so, we collected the biological features (i.e., “binding-associated region”, “necessary binding region”, “sufficient binding region”)⁴³ reported in the interaction records downloaded via the PSICQUIC interface, and we found that for half of the interactions with an inferred interface (13 interactions, 11 MDI and 2 DDI) there is supporting experimental evidence for at least one of the interaction partners (Figure 2C). 
For 7 interactions (27%), mimicINT correctly inferred the interface elements of both EHEC effectors and human proteins. For the other 4 interactions, the experimental evidence supports the EHEC effector interface element only (see Supplementary File 1). Importantly, the 11 MDI inferences can be considered of high confidence as they have either a motif probability < 0.1 or a domain score > 0.4. Overall, these results indicate that high-confidence mimicINT inferences can identify bona fide interaction interfaces.
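The high-confidence criterion used above (motif probability below 0.1 or domain score above 0.4) can be written as a small predicate. The function name, its arguments and the `require_both` switch are hypothetical and only illustrate how the two thresholds may be combined; missing values are assumed to fail the corresponding test.

```python
def is_high_confidence(motif_probability, domain_score,
                       max_probability=0.1, min_domain_score=0.4,
                       require_both=False):
    """Flag a motif-domain inference as high confidence.

    By default either criterion is sufficient; set require_both=True to
    demand that both thresholds are met. None means "score not available".
    """
    motif_ok = motif_probability is not None and motif_probability < max_probability
    domain_ok = domain_score is not None and domain_score > min_domain_score
    if require_both:
        return motif_ok and domain_ok
    return motif_ok or domain_ok

# Toy records: (motif probability, domain score).
records = [(0.02, 0.7), (0.02, None), (0.5, 0.2)]
flags = [is_high_confidence(mp, ds) for mp, ds in records]
strict_flags = [is_high_confidence(mp, ds, require_both=True) for mp, ds in records]
```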

Figure 2. Application of the mimicINT workflow to identify potential interaction interfaces.

(A) Proportion of experimentally determined interactions between EHEC secreted effectors and human proteins with at least one putative interface inferred by mimicINT (left). Split proportion of EHEC-human interactions with a putative interface according to the interaction templates: motif-domain (MDI) and domain-domain (DDI). (B) Proportion of EHEC-human interactions with high-confidence MDI-inferred interfaces based on the computed motif probability (mp) and domain score (ds). See main text for more details. (C) Network representations of the interactions between EHEC secreted effectors (circle nodes) and human proteins (square nodes). Edges represented as parallel lines indicate interactions with experimentally identified binding regions in at least one of the interaction partners. Coloured edges represent interactions with at least one putative interface inferred by mimicINT. The network was generated using Cytoscape.³⁰

MARV-human protein interaction inference

We downloaded MARV protein sequences (7 proteins, Proteome ID: UP000180448, January 2022) from UniprotKB in FASTA format and ran mimicINT with default parameters. We also computed the motif probabilities using the dedicated sub-workflow by performing 100,000 randomizations.

In total, we inferred 11,431 interactions between 7 MARV and 2,757 human proteins (see Supplementary File 2). Most of the inferred interactions, namely 10,101, are motif-domain interactions (MDI) between 7 MARV and 2,324 human proteins, and the remaining 1,339 are domain-domain interactions (DDI) between 5 MARV and 479 human proteins (9 interactions were inferred with both MDI and DDI templates). The functional enrichment analysis performed by mimicINT on the full list of inferred host interactors returned 975 enriched annotations at FDR<0.01 (see Supplementary File 2). We further filtered out the functional categories annotating fewer than 5 or more than 500 proteins, obtaining a list of 763 enriched annotations (241 GO biological processes, 63 GO cellular components, 6 CORUM complexes, 130 KEGG and 237 Reactome pathways, see Supplementary File 2), which points towards cellular processes and pathways related to viral infection and the immune system (Table 1). By applying the default thresholds on motif probabilities and domain scores to the inferred MDI, we defined a set of 535 high-confidence MDI between 7 MARV and 419 human proteins. We combined this set with the interactions inferred using DDI templates and ran a functional enrichment analysis on a list of 891 human interactors, which returned 908 enriched annotations at FDR<0.01. As above, after filtering on the size of functional categories, we obtained 743 enriched annotations (287 GO biological processes, 57 GO cellular components, 1 CORUM complex, 141 KEGG and 257 Reactome pathways, see Supplementary File 2). Interestingly, 27% of the enriched GO biological process annotations (77 out of 287) are related to infection and immunity,⁴⁴ including 8 of the 10 most enriched terms.

Overall, these results reinforce the biological relevance of the inferred interactions, particularly those considered of higher confidence.

Table 1. Summary results of the functional enrichment analysis performed by mimicINT on the 2685 human proteins inferred as interactors of MARV proteins.

The top 10 most enriched terms are shown for Gene Ontology Biological Process (BP) and Cellular Component (CC) terms. For each enriched term the following information is reported: term identifier, term name, adjusted P-value, number of human proteins annotated with the given term in the statistical background (term size), number of inferred interactors annotated with the given term (intersection size). The terms reported in italic are related to viruses, infection and immunity according to Garcia-Moreno and colleagues.⁴⁴

Annotation source	Term ID	Term name	Adjusted P-value	Term size	Intersection size
Gene Ontology (BP)
	GO:0046777	protein autophosphorylation	1.86E-107	226	181
	GO:0018105	peptidyl-serine phosphorylation	1.45E-75	312	187
	GO:0018209	peptidyl-serine modification	2.71E-69	335	188
	GO:0018108	peptidyl-tyrosine phosphorylation	1.28E-51	371	178
	GO:0018212	peptidyl-tyrosine modification	5.82E-51	374	178
	GO:0002768	immune response-regulating cell surface receptor signaling pathway	5.45E-43	326	154
	GO:0002757	immune response-activating signal transduction	4.21E-41	296	143
	GO:0002429	immune response-activating cell surface receptor signaling pathway	4.21E-41	296	143
	GO:0018107	peptidyl-threonine phosphorylation	8.20E-40	112	81
	GO:0002764	immune response-regulating signaling pathway	1.28E-39	477	189
Gene Ontology (CC)
	GO:0019814	immunoglobulin complex	1.54E-90	147	132
	GO:0042101	T cell receptor complex	5.66E-79	120	111
	GO:0098802	plasma membrane signaling receptor complex	8.64E-45	287	145
	GO:0015629	actin cytoskeleton	6.65E-32	493	180
	GO:1902911	protein kinase complex	4.78E-31	125	78
	GO:0005911	cell-cell junction	3.81E-29	497	176
	GO:1902554	serine/threonine protein kinase complex	5.15E-28	108	69
	GO:0042571	immunoglobulin complex, circulating	3.29E-27	62	50
	GO:0000307	cyclin-dependent protein kinase holoenzyme complex	2.36E-22	52	42
	GO:0061695	transferase complex, transferring phosphorus-containing groups	2.90E-21	267	107

Discussion

We have developed mimicINT, an open-source computational workflow enabling large-scale interaction inference between microbe and host proteins. In the first use case presented here, we show that mimicINT can identify bona fide interaction interfaces in an experimentally generated interaction network between secreted pathogenic bacterial effectors and human proteins. Notably, we also successfully used it to identify interaction interfaces between commensal bacterial effectors and human proteins in a large-scale interaction dataset generated by yeast two-hybrid.⁴⁵ In the second use case, we used mimicINT to infer interactions between viral and human proteins; the results of the functional enrichment analysis suggest that these interactions are biologically relevant.

Although we developed mimicINT as a tool to infer protein interactions between microbe and human proteins, it can be used on any organism whose proteins bear either domains or motifs with known interaction templates (e.g., human, mouse or fruit fly). For instance, we have recently used mimicINT to generate the first interactome of small human peptides encoded by short Open Reading Frames (sORFs).⁴⁶ Nevertheless, the main limitation of the workflow is the availability of motif-domain and domain-domain templates, which depends on the curation efforts of the teams maintaining the corresponding source databases (i.e., ELM and 3did).

Finally, compared to other similar tools,⁴⁷ mimicINT provides two functionalities to define high-confidence inferred interactions based on motif-domain templates, that is the computation of (i) motif probabilities and of (ii) motif-binding domain similarity scores. As shown in the use cases, the application of these two strategies supports the identification of bona fide interaction interfaces in the EHEC-human interaction network and the biological relevance of the inferred MARV-human interactions.

All in all, given the increasing frequency of (re-)emerging infectious diseases and the accumulating evidence on the fundamental role played by microbes in chronic diseases,^48–50 we expect that mimicINT will be useful to better understand the molecular details of microbe-host relationships.

Data availability

Extended data

Supplementary data

The data analyzed and produced in this manuscript, including the protein sequences mentioned in the use cases, are available from:

Zenodo: mimicINT workflow: Use cases for interaction interface identification and protein interaction inference. DOI: https://doi.org/10.5281/zenodo.14614802.⁵¹

Underlying data

Sequence data

MARV protein sequences available from https://www.uniprot.org/proteomes/UP000180448

Short linear motif data

ELM motif class definitions are available from http://elm.eu.org/downloads.html#classes

ELM renamed class names are available from http://elm.eu.org/infos/browse_renamed.tsv

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

Workflow source code available from: https://github.com/TAGC-NetworkBiology/mimicINT

Archived workflow source code at time of publication: https://doi.org/10.5281/zenodo.12078119 ⁵²

Webserver available from: https://mimicintweb.tagc.univ-amu.fr/

Webserver source code available from: https://github.com/TAGC-NetworkBiology/mimicINTweb

Archived webserver source code at time of publication: https://doi.org/10.5281/zenodo.14548031

The iELM script is available from: http://elmint.embl.de/program_file/

License: GNU General Public License, V3.

Acknowledgements

The authors thank Paul de Boissier for helping in the early development of the workflow and Fabrice Lopez for technical advice. The authors are also grateful to the members of the DIME project for fruitful scientific discussions. Centre de Calcul Intensif d’Aix-Marseille is acknowledged for granting access to its high-performance computing resources.

References

1. Yamauchi Y, Helenius A: Virus entry at a glance. J. Cell Sci. 2013 Mar 15; 126(Pt 6): 1289–1295.
2. Franzosa EA, Xia Y: Structural principles within the human-virus protein-protein interaction network. Proc. Natl. Acad. Sci. USA. 2011 Jun 28; 108(26): 10538–10543.
3. Garamszegi S, Franzosa EA, Xia Y: Signatures of pleiotropy, economy and convergent evolution in a domain-resolved map of human-virus protein-protein interaction networks. PLoS Pathog. 2013; 9(12): e1003778.
4. Davey NE, Travé G, Gibson TJ: How viruses hijack cell regulation. Trends Biochem. Sci. 2011 Mar; 36(3): 159–169.
5. Hagai T, Azia A, Babu MM, et al.: Use of host-like peptide motifs in viral proteins is a prevalent strategy in host-virus interactions. Cell Rep. 2014 Jun; 7(5): 1729–1739.
6. Via A, Uyar B, Brun C, et al.: How pathogens use linear motifs to perturb host cell networks. Trends Biochem. Sci. 2015 Jan; 40(1): 36–48.
7. Arnold R, Boonen K, Sun MGF, et al.: Computational analysis of interactomes: current and future perspectives for bioinformatics approaches to model the host-pathogen interaction space. Methods. 2012 Aug; 57(4): 508–518.
8. Nourani E, Khunjush F, Durmuş S: Computational approaches for prediction of pathogen-host protein-protein interactions. Front. Microbiol. 2015; 6: 94.
9. Andrighetti T, Bohar B, Lemke N, et al.: MicrobioLink: An Integrated Computational Pipeline to Infer Functional Effects of Microbiome-Host Interactions. Cells. 2020 May 21; 9(5): E1278.
10. Zanzoni A, Spinelli L, Braham S, et al.: Perturbed human sub-networks by Fusobacterium nucleatum candidate virulence proteins. Microbiome. 2017 Aug 10; 5(1): 89.
11. Guven-Maiorov E, Tsai CJ, Ma B, et al.: Prediction of Host-Pathogen Interactions for Helicobacter pylori by Interface Mimicry and Implications to Gastric Cancer. J. Mol. Biol. 2017 Dec 8; 429(24): 3925–3941.
12. Lasso G, Mayer SV, Winkelmann ER, et al.: A Structure-Informed Atlas of Human-Virus Interactions. Cell. 2019 Sep 5; 178(6): 1526–1541.e16.
13. Becerra A, Bucheli VA, Moreno PA: Prediction of virus-host protein-protein interactions mediated by short linear motifs. BMC Bioinformatics. 2017 Mar 9; 18(1): 163.
14. Guven-Maiorov E, Hakouz A, Valjevac S, et al.: HMI-PRED: A Web Server for Structural Prediction of Host-Microbe Interactions Based on Interface Mimicry. J. Mol. Biol. 2020 May 15; 432(11): 3395–3403.
15. Reys V, Pons JL, Labesse G: SLiMAn 2.0: meaningful navigation through peptide-protein interaction networks. Nucleic Acids Res. 2024 Jul 5; 52(W1): W313–W317.
16. Mosca R, Céol A, Stein A, et al.: 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2014 Jan; 42(Database issue): D374–D379.
17. Kumar M, Gouw M, Michael S, et al.: ELM-the eukaryotic linear motif resource in 2020. Nucleic Acids Res. 2020 Jan 8; 48(D1): D296–D306.
18. Köster J, Rahmann S: Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2018 Oct 15; 34(20): 3600.
19. Jones P, Binns D, Chang HY, et al.: InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014 May 1; 30(9): 1236–1240.
20. Blum M, Chang HY, Chuguransky S, et al.: The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021 Jan 8; 49(D1): D344–D354.
21. Schleker S, Garcia-Garcia J, Klein-Seetharaman J, et al.: Prediction and comparison of Salmonella-human and Salmonella-Arabidopsis interactomes. Chem. Biodivers. 2012 May; 9(5): 991–1018.
22. Edwards RJ, Paulsen K, Aguilar Gomez CM, et al.: Computational Prediction of Disordered Protein Motifs Using SLiMSuite. Methods Mol. Biol. 2020; 2141: 37–72.
23. Davey NE, Van Roey K, Weatheritt RJ, et al.: Attributes of short linear motifs. Mol. BioSyst. 2012 Jan; 8(1): 268–281.
24. Dosztányi Z: Prediction of protein disorder based on IUPred. Protein Sci. 2018 Jan; 27(1): 331–340.
25. Edwards RJ, Palopoli N: Computational prediction of short linear motifs from protein sequences. Methods Mol. Biol. 2015; 1268: 89–141.
26. Paulsen K: Optimising intrinsic disorder prediction for short linear motif discovery. UNSW Sydney; 2019 [cited 2022 Aug 24]. [Thesis].
27. Rose PW, Bi C, Bluhm WF, et al.: The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res. 2013 Jan; 41(Database issue): D475–D482.
28. Gfeller D, Butty F, Wierzbicka M, et al.: The multiple-specificity landscape of modular peptide recognition domains. Mol. Syst. Biol. 2011 Apr 26; 7: 484.
29. Weatheritt RJ, Luck K, Petsalaki E, et al.: The identification of short linear motif-mediated interfaces within the human interactome. Bioinformatics. 2012 Apr 1; 28(7): 976–982.
30. Shannon P, Markiel A, Ozier O, et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003 Nov; 13(11): 2498–2504.
31. Raudvere U, Kolberg L, Kuzmin I, et al.: g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019 Jul 2; 47(W1): W191–W198.
32. Merkel D: Docker: lightweight linux containers for consistent development and deployment. Linux J. 2014; 2014(239): 2.
33. Kurtzer GM, Sochat V, Bauer MW: Singularity: Scientific containers for mobility of compute. PLoS One. 2017; 12(5): e0177459.
34. UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019 Jan 8; 47(D1): D506–D515.
35. Fu L, Niu B, Zhu Z, et al.: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012 Dec 1; 28(23): 3150–3152.
36. Davey NE, Edwards RJ, Shields DC: The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res. 2007 Jul; 35(Web Server issue): W455–W459.
37. Sievers F, Wilm A, Dineen D, et al.: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011 Oct 11; 7: 539.
38. Eddy SR: Profile hidden Markov models. Bioinformatics. 1998; 14(9): 755–763.
39. Idrees S, Pérez-Bercoff Å, Edwards RJ: SLiMEnrich: computational assessment of protein–protein interaction data as a source of domain-motif interactions. PeerJ. 2018 Oct 31; 6: e5858.
40. Blohm P, Frishman G, Smialowski P, et al.: Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis. Nucleic Acids Res. 2014 Jan; 42(Database issue): D396–D400.
41. Orchard S, Kerrien S, Abbani S, et al.: Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods. 2012 Apr; 9(4): 345–350.
42. Aranda B, Blankenburg H, Kerrien S, et al.: PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat. Methods. 2011 Jul; 8(7): 528–529.
43. Vieira MFM, Hernandez G, Zhong Q, et al.: The pathogen-encoded signalling receptor Tir exploits host-like intrinsic disorder for infection. Commun Biol. 2024 Feb 13; 7(1): 179.
44. Hermjakob H, Montecchi-Palazzi L, Bader G, et al.: The HUPO PSI’s molecular interaction format--a community standard for the representation of protein interaction data. Nat. Biotechnol. 2004 Feb; 22(2): 177–183.
45. Garcia-Moreno M, Järvelin AI, Castello A: Unconventional RNA-binding proteins step into the virus-host battlefront. Wiley Interdiscip. Rev. RNA. 2018 Aug 9; 9: e1498.
46. Young V, Dohai B, Hitch TCA, et al.: A gut meta-interactome map reveals modulation of human immunity by microbiome effectors. bioRxiv. 2023 [cited 2025 Jan 8]; p. 2023.09.25.559292.
47. Slivak M, Choteau SA, Pierre P, et al.: InteractORF, predictions of human sORF functions from an interactome study. bioRxiv. 2024 [cited 2025 Jan 8]; 2024.06.10.598216.
48. Zhang Y, Thomas JP, Korcsmaros T, et al.: Integrating multi-omics to unravel host-microbiome interactions in inflammatory bowel disease. Cell Rep. Med. 2024 Sep 17; 5(9): 101738.
49. O’Connor SM, Taylor CE, Hughes JM: Emerging infectious determinants of chronic diseases. Emerg. Infect. Dis. 2006 Jul; 12(7): 1051–1057.
50. Gargano LM, Hughes JM: Microbial origins of chronic diseases. Annu. Rev. Public Health. 2014; 35: 65–82.
51. Zanzoni A: mimicINT workflow: Use cases for interaction interface identification and protein interaction inference. [Dataset]. Zenodo. 2025.
52. Choteau S, Maldonado K, Boujeant M, et al.: TAGC-NetworkBiology/mimicINT: mimicINT, a computational workflow to infer protein-protein interactions (Version v1). [Dataset]. Zenodo. 2024.

Competing interests

No competing interests were disclosed.

Grant information

This work was supported by: the European Union’s Horizon 2020 Research and Innovation Programme [Project ID 101003633, RiPCoN] to CB; the JPI HDHL-INTIMIC action co-funded by the Agence Nationale de la Recherche [ANR-17-HDIM-0001, DIME] to CB; and France 2030, the French Government program managed by the French National Research Agency [ANR-16-CONV-0001], and the Excellence Initiative of Aix-Marseille University - A*MIDEX [AMX-21-PEP-043] to AZ. SAC received funding from the “Espoirs de la recherche” program managed by the French Fondation pour la Recherche Médicale (FDT202106013072). Funding for article processing charges was provided by INSERM.

Article versions

Version 2 (revised). Published: 28 Mar 2025, 14:128. https://doi.org/10.12688/f1000research.160063.2

Version 1. Published: 27 Jan 2025, 14:128. https://doi.org/10.12688/f1000research.160063.1

© 2025 Choteau SA et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Choteau SA, Maldonado K, Bergon A et al. mimicINT: A workflow for microbe-host protein interaction inference [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Research 2025, 14:128 (https://doi.org/10.12688/f1000research.160063.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2 (revised), published 28 Mar 2025

Reviewer Report 10 Apr 2025

Leandro Simonetti, Uppsala University, Uppsala, Sweden

Approved with Reservations

https://doi.org/10.5256/f1000research.179574.r365844

The article by Choteau et al. describes "mimicINT", a new open-source software tool aimed at helping researchers identify human-pathogen interactions at large scale. The tool can predict domain-domain and domain-motif interactions between a user-provided list of protein sequences from pathogens and human proteins by predicting instances where the pathogen's proteins mimic the human interaction interfaces. MimicINT predicts interactions using experimental data hand-curated into the 3did and the ELM databases for domain-domain and domain-motif interactions, respectively. The authors benchmark mimicINT against publicly available experimental host-pathogen interaction data from bacteria and viruses, and observe a robust performance in correctly identifying these interactions, especially when using the domain and motif probability scores calculated by mimicINT to filter the output. Overall, mimicINT is shown to be a useful tool for exploring the putative interactomes of pathogens with a human host, which can be useful in the current landscape of emerging new diseases.

Comments:

In Methods > Implementation, there's a misplaced semicolon: "[...] host protein (iii); the interaction inferences [...]" > "[...] host protein; (iii) the interaction inferences [...]"
In Methods > Implementation, it is not clear to me how the domain annotations gathered in step (ii) are used by mimicINT down the pipeline.
The first section of Results would benefit from having a sub-title, since it's not a general description of the section but a specific set of results.
The output from the Web Server example provides a series of output files to explore. I think it would be useful to include the calculated domain scores and motif probability p-values for the relevant interaction types in case the user wants to use them for filtering the output. It would also be useful to have the predicted motif instance sequence included in the output table.
The web server was not accessible today (8th of April 2025) because of an expired SSL certificate, so I could not check the description of the downloaded files in the help section, though the output folder structure was easy to understand.

I found the manuscript to be clearly written and interesting, and mimicINT to be a useful tool for the field of host-pathogen interactions, since it offers a thorough first look into the putative host hijacked interactome and provides specific interaction regions (domains and/or motifs), offering experimentalists targetable interfaces. The manuscript is also very careful to state that the interactions are based on the curated data made available in the databases 3did and ELM, which constitutes a limitation but also shows the importance and usefulness of the effort put into those resources.

Is the rationale for developing the new software tool clearly explained? Yes

Is the description of the software tool technically sound? Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Short Linear Motifs, Protein-Protein Interactions, Host-Pathogen Interactions, Bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 09 Apr 2025

Ylva Ivarsson, Department of Chemistry, Uppsala University, Husargatan, Sweden

Approved

https://doi.org/10.5256/f1000research.179574.r373844

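The manuscript is fine now with the corrections made. So I recommend that it is accepted.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Expert on the experimental identification and characterization of SLiM-based interactions.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.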

Version 1, published 27 Jan 2025

Reviewer Report 25 Feb 2025

Sobia Idrees, University of Technology, Sydney, Australia

Approved

https://doi.org/10.5256/f1000research.175876.r365838

The manuscript by Choteau et al. introduces mimicINT, a computational workflow designed to facilitate large-scale inference of host-pathogen protein-protein interactions. The workflow focuses on detecting molecular mimicry elements: short linear motifs and host-like globular domains to predict interactions between microbial and human proteins.

The authors have clearly described the methodology behind mimicINT, making it a valuable resource for studying host-pathogen interactions.

One of the manuscript's strengths is the application of mimicINT in two use cases: the interaction between Escherichia coli type-3 secreted effectors and human proteins and Marburg virus interactions. These examples demonstrate the biological relevance of the workflow and highlight its potential for uncovering insights into diverse host-pathogen systems.

The authors have developed an easy-to-use interactive server, making it more straightforward for users to run the workflow. Moreover, the authors have made mimicINT codes available on GitHub, giving access to the workflow and associated code. This will allow others to review and potentially build upon the work, contributing to the tool's transparency and reproducibility.

Is the rationale for developing the new software tool clearly explained? Yes

Is the description of the software tool technically sound? Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: SLiM mediated interactions.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response 28 Feb 2025

Andreas Zanzoni, Aix-Marseille University, Inserm, TAGC, UMR_S1090, Turing Centre for Living Systems, Marseille, France

We thank the reviewer for these positive comments.

Competing Interests: No competing interests were disclosed.

Reviewer Report 24 Feb 2025

Ylva Ivarsson, Department of Chemistry, Uppsala University, Husargatan, Sweden

Approved with Reservations

https://doi.org/10.5256/f1000research.175876.r363112

The manuscript by Choteau et al., describes a useful workflow for bioinformatic analysis of microbe-host protein interactions.

The manuscript reads well and the method is clearly described. The limitations set by the use of the currently annotated ELM instances is clearly described, and highlights the importance of maintaining such curation efforts.

As an experimentalist I appreciate in principle the on-line server. However, when testing it, it did not run (“OperationalError at /prediction/run; could not extend file "base/16384/16619_fsm": No space left on device; HINT: Check free disk space.”) so I could not really evaluate it. Even when running the text example provided the outcome was the same, which was a bit unfortunate. The authors should ensure that the server is running, or if the issue is on the user side, then please explain the limitations of the usability of the online version to the non-expert.

Is the rationale for developing the new software tool clearly explained? Yes

Is the description of the software tool technically sound? Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Expert on the experimental identification and characterization of SLiM-based interactions.

Author Response 28 Feb 2025

Andreas Zanzoni, Aix-Marseille University, Inserm, TAGC, UMR_S1090, Turing Centre for Living Systems, Marseille, France

First of all, we thank the reviewer for her positive remarks.

Regarding the mimicINTweb issue raised by the reviewer, we found a misconfigured logging option that filled the server disk space. We have now fixed the problem. This should allow the reviewer to test the webserver.

Competing Interests: No competing interests were disclosed.
