DOCLASP - Docking ligands to target proteins using spatial and electrostatic congruence extracted from a known holoenzyme and applying simple geometrical transformations

The ability to accurately and effectively predict the interaction between proteins and small drug-like compounds has long intrigued researchers for pedagogic, humanitarian and economic reasons. Protein docking methods (AutoDock, GOLD, DOCK, FlexX and Glide, to name a few) rank a large number of possible conformations of protein-ligand complexes using fast algorithms. Previously, it has been shown that structural congruence leading to the same enzymatic function necessitates the congruence of electrostatic properties (CLASP). The current work presents DOCLASP (Docking using CLASP), a methodology for docking a ligand into a target protein, provided that there is at least one known holoenzyme with the ligand bound. The contact points of the ligand in the holoenzyme define a motif, which is used to query the target enzyme using CLASP. If there are significant matches, the holoenzyme and the target protein are superimposed based on congruent atoms. The same linear and rotational transformations are also applied to the ligand, thus creating a unified coordinate framework containing the holoenzyme, the ligand and the target enzyme. In the current work, the dipeptidyl peptidase-IV inhibitor vildagliptin was docked to the PI-PLC structure complexed with myo-inositol using DOCLASP. Also, corroboration of the docking of phenylthiourea to the modelled structure of polyphenol oxidase (JrPPO1) from walnut is provided based on the subsequently solved structure of JrPPO1 (PDBid:5CE9). Analysis of the binding of the antitrypanosomial drug suramin to nine non-homologous proteins in the PDB database shows a diverse set of binding motifs, and multiple binding sites in the phospholipase A2-like proteins from the Bothrops genus of pitvipers. The conformational changes in the suramin molecule on binding highlight the challenges in docking flexible ligands into an already ’plastic’ binding site.
Thus, DOCLASP presents a method for ’soft docking’ ligands to proteins with low computational requirements.

The ability to computationally predict protein-ligand interactions with accuracy is an invaluable asset, since it allows for large scale screening at minimal costs 1,2 . Consequently, computational methods that predict the favorable conformation of a protein-ligand complex have been the focus of intense research over the last few decades 3 . Protein docking methods are a subset of these methods, characterized by their ability to score a large number of possible conformations using fast algorithms. Among these, five programs (AutoDock 4 , GOLD 5 , DOCK 6 , FlexX 7 and Glide 8 ) are the most cited, although it should be noted that 'the number of citations of a given paper is no measure of quality of the corresponding protein-ligand docking software program' 9 . Typically, a protein-ligand docking program has two distinct phases: conformational sampling (or searching) and scoring 10 . Despite the significant progress in the field, several challenges remain to be addressed, arising from protein or ligand flexibility, entropic considerations or the presence of water molecules 11 .
Previously, the conservation of spatial and electrostatic properties in cognate pairs of residues in the catalytic site of proteins with the same functionality has been used to develop a computational method (CLASP) for detecting binding and catalytic sites [12][13][14][15] . In the current work, this methodology has been extended by proposing a method for docking ligands into target proteins: DOCLASP (Docking using CLASP). DOCLASP takes as input a set of proteins with known structures which bind a particular ligand, and a target protein into which the ligand is to be docked. Each of these holo structures is used to define a motif consisting of the first four residues making non-hydrophobic interactions. These motifs are used to query the target protein, using an enhanced version of the search engine used by CLASP that uses precompiled databases 16 , and significant congruent matches are identified. These significant matches in the target protein are then superimposed on the binding residues (the motif) in the corresponding holoenzyme(s), thus creating a unified coordinate framework formed by the holoenzyme, the ligand and the target enzyme. This gives the ligand docked to the target protein, which is written out as a PyMOL-formatted file. In essence, the holoenzyme is replaced with the target enzyme by aligning the congruent atoms, provided that the contact points have a good spatial and electrostatic match in the target enzyme. Thus, DOCLASP leverages the implicit search and scoring functions in CLASP to rank possible conformations.
The native activity of phosphoinositide-specific phospholipase C (PI-PLC) was previously shown, using in vitro experiments based on CLASP analysis, to be inhibited by two dipeptidyl peptidase-IV (DPP4) inhibitors: vildagliptin (LAF-237) at micromolar concentrations, and K-579 at nanomolar concentrations 15 . Since 'comparing docking programs can be difficult' 9 , the DOCLASP methodology is validated by docking vildagliptin to the PI-PLC structure in complex with myo-inositol 17 . The docked ligand is free from steric clashes and interacts with the exact side chain residues that bind myo-inositol, providing corroboration of the validity of the proposed methodology. Next, an inhibitor of polyphenol oxidase (PPO) 18 was docked to the solved structure of a PPO from walnut 19 , corroborating previous docking results from a modelled structure of the same protein 20 . Finally, the promiscuous binding of suramin, a well-known antitrypanosomial drug 21 , to nine non-homologous proteins in the PDB database revealed diverse binding motifs, and multiple binding sites even within phospholipase A2-like proteins from the Bothrops genus of pitvipers 22 . Also, the conformational changes in suramin upon binding underscore the complexity faced by docking algorithms, which must sample a much larger conformational space created by both the changing binding site residues and ligand 23 . Thus, the current work presents a fast methodology for docking ligands into protein structures based on spatial and electrostatic congruence of known binding sites to putative binding targets.
DOCLASP uses the basic hypothesis of CLASP, namely the non-triviality of the spatial and electrostatic congruence in cognate pairs seen across different structures with the same catalytic function, which is extended to the related concept of ligand binding 12 .
It takes as input a set of Z proteins with known structures (Equation 1) which bind a particular ligand (Lig), and a target protein into which Lig is to be docked (P_target). Each of these Z holo structures is used to define a motif consisting of N (=4) residues (Equation 2), taking the four closest non-hydrophobic interactions into account (Algorithm 1).

Amendments from Version 2
The current version includes changes suggested by two referees, and has two additional tables and three additional figures.
Secondly, a comprehensive analysis of the binding of the antitrypanosomial drug suramin to different non-homologous proteins from the PDB database has been presented, and DOCLASP was used to dock suramin to a phospholipase A2-like protein.
This highlighted the non-specific binding of some ligands, and also the complexity of modelling a flexible ligand like suramin.

Each position of the motif has a set of amino acids specified to allow for stereochemically equivalent matches at that particular position (Equation 3), such that, while matching, the amino acid type of r_i must belong to GROUP_i.
Previously, the K sets of N residues were obtained in P_target using an exhaustive search procedure similar to the one used in SPASM 24 . An enhanced algorithm now precompiles all possible motifs of a given size (N=4 in this case) of predefined amino acid residues from a protein structure that occur within a specified distance 16 , and selects the appropriate ones based on each motif (Equation 4). Any match below a user defined threshold score (S_thresh) is discarded.
If the set of matches for holoenzyme P_i (Φ_matches^{P_i}) is null, the ligand Lig cannot be docked to the target protein P_target. M_1^{P_i}, the first element, has the minimum CScore and represents the putative binding site in P_target based on the holoenzyme P_i. The set of putative binding sites Φ_bindsite is thus defined (Equation 5).
In order to compute CScore, the 3D distances and potential differences (PD) between pairs of motif atoms are computed, and these values are then compared with the corresponding feature values obtained from the holoenzyme motif. The distance scores are normalized, since the same pairwise distance deviation should count more when the reference distance is smaller. While scoring electrostatic potential differences, deviations < 100 are ignored. Similarly, for higher potential differences, the deviations are more loosely constrained than for lower potential differences. All differences are taken as absolute values.
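The two scoring ideas described above can be illustrated with a small sketch. This is not the CScore function used by CLASP, whose exact form is not given here: the function names, the use of a simple relative deviation for normalization, and the treatment of the 100-unit cutoff as a hard threshold are all assumptions made for illustration. The looser constraint for higher potential differences is not modelled.

```python
def distance_score(ref_distances, target_distances):
    """Sum of relative pairwise-distance deviations.

    Dividing by the reference distance makes the same absolute
    deviation count more when the reference distance is small
    (an assumed form of the normalization described in the text).
    """
    return sum(abs(t - r) / r for r, t in zip(ref_distances, target_distances))


def potential_score(ref_pd, target_pd, ignore_below=100.0):
    """Sum of absolute potential-difference deviations.

    Deviations smaller than `ignore_below` contribute nothing
    (hypothetical reading of 'deviations < 100 are ignored').
    """
    total = 0.0
    for r, t in zip(ref_pd, target_pd):
        deviation = abs(t - r)
        if deviation >= ignore_below:
            total += deviation
    return total


# Made-up feature values for two atom pairs, for illustration only.
d_score = distance_score([4.0, 8.0], [5.0, 9.0])        # 1 A off in both pairs
p_score = potential_score([10.0, -50.0], [60.0, 120.0])  # deviations of 50 and 170
print(d_score, p_score)
```

In this toy input the same 1 Å deviation contributes twice as much for the 4 Å reference distance as for the 8 Å one, and only the 170-unit potential deviation is counted.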
Each element of Φ_bindsite (M_1^{P_i}) is now superimposed on the corresponding holoenzyme, based on the motif binding Lig in P_i. In order to superimpose these motifs, linear and rotational transformations are applied to all atoms such that the first three atoms lie on the same plane (Z=0), the first atoms are at the origin of the coordinate axes and the second atoms lie on the Y axis. This creates a unified coordinate framework containing the holoenzyme, the ligand and the target enzyme, thus providing the docked ligand in the target enzyme. Essentially, the holoenzyme is replaced with the target enzyme by aligning the congruent atoms, provided that the contact points have a good spatial and electrostatic match in the target enzyme. This docked ligand is then written out as a PyMOL-formatted file.
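The superimposition step can be sketched as a change of coordinates defined by three anchor atoms. The sketch below is a minimal pure-Python illustration and not the DOCLASP Perl implementation. The function name `canonical_frame` and the tuple representation of atoms are hypothetical, and the code follows the convention stated above (first atom at the origin, second atom on the Y axis, third atom in the Z=0 plane).

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(_dot(u, u))
    return tuple(a / norm for a in u)


def canonical_frame(a, b, c):
    """Return a rigid transformation that maps atom `a` to the origin,
    atom `b` onto the Y axis and atom `c` into the Z=0 plane.

    The three atoms must not be collinear (division by zero otherwise).
    """
    e_y = _unit(_sub(b, a))
    ac = _sub(c, a)
    along_y = _dot(ac, e_y)
    # Component of (c - a) perpendicular to the Y direction defines X.
    e_x = _unit(tuple(x - along_y * y for x, y in zip(ac, e_y)))
    e_z = _cross(e_x, e_y)  # right-handed basis, so no mirroring

    def transform(p):
        d = _sub(p, a)
        return (_dot(d, e_x), _dot(d, e_y), _dot(d, e_z))

    return transform


# Toy coordinates (not taken from any PDB entry).
holo_anchors = [(1.0, 1.0, 1.0), (1.0, 3.0, 1.0), (2.0, 2.0, 1.0)]
to_common = canonical_frame(*holo_anchors)
ligand_atom = (1.0, 1.0, 4.0)
print([to_common(p) for p in holo_anchors], to_common(ligand_atom))
```

Under these assumptions, one such transformation would be built from the three motif atoms of the holoenzyme and applied to the holoenzyme and its ligand, and a second one built from the matched atoms of the target protein and applied to the target. Both structures then share one coordinate frame, and the transformed ligand coordinates give the docked pose.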
The DOCLASP package is written in Perl on Ubuntu. Hardware requirements are modest: all results here are from a simple workstation (2 GB RAM), and runtimes were a few minutes at most. The Adaptive Poisson-Boltzmann Solver (APBS) and PDB2PQR packages were used to calculate the potential difference between the reactive atoms of the corresponding proteins 25,26 . The APBS parameters and electrostatic potential units were set as described previously 12 . All protein structures were rendered using PyMOL (http://www.pymol.org/). Previous CLASP analysis of the spatial and electrostatic properties of active site residues in PI-PLC from B. cereus indicated that it is a prolyl peptidase, which was also validated by in vitro experiments 13 . Subsequently, it was shown that PI-PLC is inhibited by two dipeptidyl peptidase-IV (DPP4) inhibitors: vildagliptin (LAF-237) at micromolar concentrations, and K-579 at nanomolar concentrations. Since no DPP4 structure in complex with K-579 has been solved, a DPP4 protein structure in complex with vildagliptin (PDBid:3W2TA) 28 provided the five closest atoms in the protein (E205, E206, S630, Y662 and Y547) (see Methods) that make non-hydrophobic interactions with the ligand (Table 1).
A subset of these atoms might be sufficient to bind vildagliptin in the target protein. Thus, $\binom{5}{4} = 5$ motifs, each with four atoms, were created using the five closest atoms in the protein. The binding of ligands is known to induce electrostatic and spatial perturbations in the binding site. The spatial and electrostatic perturbations induced by vildagliptin binding are shown by comparing the apo (PDBid:2OQIA) and the holoenzyme (PDBid:3W2TA) in Table 2. Hence, the electrostatic and spatial profile of the motif was obtained from the apo DPP4 enzyme (PDBid:2OQIA), and then used for querying the PI-PLC apo structure (PDBid:1PTDA).
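The enumeration of four-residue partial motifs from the five contact residues can be sketched as follows. The residue labels are those listed in the text; the ordering produced by `itertools.combinations` is an artifact of this sketch and need not correspond to the motif numbering used in Table 3.

```python
from itertools import combinations

# The five DPP4 residues closest to vildagliptin, as listed in the text.
contact_residues = ["E205", "E206", "S630", "Y662", "Y547"]

# Every way of leaving one residue out gives one four-residue partial motif.
partial_motifs = list(combinations(contact_residues, 4))

for number, motif in enumerate(partial_motifs, start=1):
    print(number, motif)
```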
These motifs were used to query the PI-PLC structure using an enhanced algorithm (PREMONITION) that precompiles all motifs in a database 16 . Table 3 shows the best matches obtained in the PI-PLC structure for the five partial motifs. All these matches have significant electrostatic congruence. The root mean square deviation (RMSD) values are low; however, this is a deceptive metric, since individual deviations are averaged out. The maximum pairwise distance deviation is another metric for discriminating spatial congruence, and should be used in combination with the RMSD value. All of the above-mentioned matches have a significant maximum pairwise distance deviation. Also, three matches do not comprise active site residues (motifs 2, 3 and 4). However, it can be seen that the first and fifth matches comprise active site residues (involved in the binding of myo-inositol in PDBid:1PTGA), and have three residues (Asp67, Asp198 and Trp178) in common.
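The point about RMSD averaging out deviations can be made concrete with a small sketch. It assumes that both metrics are computed over the deviations between corresponding pairwise distances in the reference motif and the match; the function name and the numbers are made up for illustration and are not values from Table 3.

```python
import math


def rmsd_and_max_deviation(ref_distances, match_distances):
    """Return (RMSD, maximum deviation) over corresponding pairwise distances."""
    deviations = [abs(m - r) for r, m in zip(ref_distances, match_distances)]
    rmsd = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return rmsd, max(deviations)


# Six pairwise distances for a four-atom motif (illustrative numbers only):
# five pairs match exactly, one pair is off by 3 A.
reference = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
match = [4.0, 4.0, 4.0, 4.0, 4.0, 7.0]

rmsd, max_dev = rmsd_and_max_deviation(reference, match)
print(round(rmsd, 3), max_dev)
```

Here a single pair that is off by 3 Å yields an RMSD of only about 1.2 Å, which is why the maximum deviation is reported alongside it.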
Thus, these three residues from PI-PLC (Asp67, Asp198 and Trp178) and the corresponding three residues from DPP4 (Glu205, Glu206 and Tyr662) were used to superimpose DPP4 and PI-PLC. Table 4 shows the congruence of these residues. The corresponding holo structures (PDBid:3W2TA for DPP4, and PDBid:1PTGA for PI-PLC) were used for the superimposition. This superimposition applies geometric transformations such that Asp67OD1 and Glu205OE2 are at the origin of the coordinate axes (coordinates = [0,0,0]), Asp198OD1 and Glu206OE2 lie on the X axis (i.e. Y coordinate is 0) and Tyr662CZ and Trp178CZ2 lie on the X-Y plane (i.e. Z coordinate is 0). Figure 1 shows the superimposed proteins. It is observed that (Asp67, Asp198 and Trp178) overlap well with (Glu205, Glu206 and Tyr662).

Table 2. Changes in the conformation of the binding site due to vildagliptin (a DPP4 inhibitor) binding. We compare the pairwise distance and electrostatic potential difference (EPD) changes in the apo (PDBid:2OQIA) and holo (PDBid:3W2TA) enzymes. Note that the pairwise distances between these atoms change in the ligand free PDB (2OQIA) as compared to the protein with bound inhibitor (2BUBA). For example, the distance between E205OE2 and E206OE2 (pair ab) changes from 3.9 Å to 5.6 Å. Also, there is a definite change in the EPD between E205OE2 and E206OE2 (pair ab). D = Pairwise distance in Å. PD = Pairwise potential difference. The electrostatic potentials are in dimensionless units of kT/e, where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.

Table 3. Querying PI-PLC using partial motifs derived from the atoms in vildagliptin that make contact with the DPP4 enzyme (PDBid:3W2TA). The comparison is done using apo enzymes (PDBid:1PTDA for PI-PLC and PDBid:2OQIA for DPP4), since the binding of a ligand induces spatial and electrostatic changes in the active site. The fifth motif has the lowest RMSD, and comprises active site residues (involved in the binding of myo-inositol in PDBid:1PTGA). Out of the best matches in the other four motifs, three do not comprise active site residues (motifs 2, 3 and 4). Motif 1 has a reasonably significant match, and has three residues (Asp67, Asp198 and Trp178) in common with the best match for Motif 5. These three residues from PI-PLC (Asp67, Asp198 and Trp178) and the corresponding three residues from DPP4 (Glu205, Glu206 and Tyr662) were used to superimpose DPP4 and PI-PLC. N = Motif number. D = Pairwise distance in Å. PD = Pairwise potential difference. Rmsd = Root mean square deviation. Max = maximum pairwise distance deviation. APBS writes out the electrostatic potential in dimensionless units of kT/e, where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.

Table 4. Spatial and electrostatic congruence of a three residue partial motif. The match is significant and comprises active site residues (involved in the binding of myo-inositol in PDBid:1PTGA). Pair 'ab' is considered to be electrostatically congruent since the PD values are close to zero, and can be considered almost equipotential. This is expected for atoms of the same type from the same residue (GLU205OE1/GLU206OE1 and ASP67OD1/ASP198OD1). These three residues from PI-PLC (Asp67, Asp198 and Trp178) and the corresponding three residues from DPP4 (Glu205, Glu206 and Tyr662) were used to superimpose DPP4 and PI-PLC. D = Pairwise distance in Å. PD = Pairwise potential difference. APBS writes out the electrostatic potential in dimensionless units of kT/e, where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.

These transformations were also applied to the vildagliptin molecule, which resulted in a docked structure for this molecule in the PI-PLC protein. Figure 2 shows vildagliptin docked into the PI-PLC structure which is complexed with myo-inositol (PDBid:1PTGA). The distances between the interacting atoms of vildagliptin and myo-inositol (excluding hydrophobic interactions) and the ten closest residues in the PI-PLC structure are shown in Table 5. It is interesting to note that the residues shown in Table 5 are all side chain residues in close contact with the myo-inositol ring (shown in Figure 7 of reference 13). Further validation was obtained by observing that both Arg69/NH1 and His32/NE2 interact with atom O4 in vildagliptin, and Arg69/NH2 and His32/NE2 interact with O2 in myo-inositol in the PI-PLC structure. The PyMOL script for visualizing the docking (Supplementary Pymol.p1m) and a movie (SupplementaryMovie.avi) are also provided as Supplementary information.
It is important to comment on the previous hypothesis of a nucleophilic serine being responsible for the inhibition of PI-PLC by vildagliptin. The electrostatic and spatial profile of motif 4 from DPP4 is compared to the electrostatic and spatial profile of the matching active site residues in PI-PLC in Table 6, including serine in the comparison. It can be seen that Ser630 in DPP4 has a significant spatial difference as compared to Ser234 in PI-PLC (pair 'bc' has a difference of 5 Å), and also a reasonable electrostatic difference (pair 'ac' has a difference of 144 PD units). The relatively large distance over which Ser234 in PI-PLC interacts with myo-inositol (4.8 Å) indicates that Ser234 is not directly involved in the binding of the ligand. However, it is responsible for creating the electrostatic milieu that is required for other interacting residues to attain their appropriate potential. Even for DPP4, many inhibitors do not interact with the nucleophilic Ser630 (manuscript in preparation), although the vildagliptin molecule does 28 . Thus, the previous conjecture of a nucleophilic serine being directly responsible for the binding of DPP4 inhibitors to PI-PLC, as implied by the catalytic triad congruence, is incorrect 13 . However, this serine is indirectly responsible for driving the neighboring residues to an appropriate state. Spatial constraints are an additional discriminator.

Table 5. Atoms of myo-inositol and vildagliptin that make contact with the residues of the PI-PLC structure (PDBid:1PTGA). These interactions exclude hydrophobic interactions, and the ten closest atoms are chosen. Out of ten, seven residues obtained by docking vildagliptin using DOCLASP are seen to be equivalent to those that are known to bind myo-inositol to the PI-PLC structure 13 , while two more have the same amino acid type (marked by asterisks).

Figure 1. Three residues from PI-PLC (Asp67, Asp198 and Trp178 in yellow) were superimposed on the corresponding three residues from DPP4 (Glu205, Glu206 and Tyr662 in red). Asp67OD1 and Glu205OE2 are at the origin of the coordinate axes (coordinates = [0,0,0]) (in black), Asp198OD1 and Glu206OE2 lie on the X axis (i.e. Y coordinate is 0) and Tyr662CZ and Trp178CZ2 are on the X-Y plane (i.e. Z coordinate is 0). Asp67, Asp198 and Trp178 in the PI-PLC protein overlap well with Glu205, Glu206 and Tyr662 from DPP4, but the Ser234-Ser630 pair is not spatially congruent.

Figure 2. Docking vildagliptin to the PI-PLC structure in complex with myo-inositol (PDBid:1PTGA). It can be seen that vildagliptin fits into the binding site of PI-PLC. It also makes non-hydrophobic contacts with the residues in the protein similar to those made by myo-inositol (Table 5).

Table 6. Potential and spatial congruence of the residues binding vildagliptin in the DPP4 structure (PDBid:2OQIA) to the putative binding site in the PI-PLC structure (PDBid:1PTDA). Both structures are apo enzymes, since the binding of a ligand induces spatial and electrostatic changes in the active site. Ser630 in DPP4 has a significant spatial difference as compared to Ser234 in PI-PLC (pair 'bc' has a difference of 5 Å), and also a reasonable electrostatic difference (pair 'ac' has a difference of 144 PD units). D = Pairwise distance in Å. PD = Pairwise potential difference. APBS writes out the electrostatic potential in dimensionless units of kT/e, where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron.

DOCLASP was also used recently 29 to dock human karyopherin to the VP24 30 protein of the Reston Ebola strain using the VP24 from Zaire Ebola 31 as a template, and demonstrate that a single mutation might be one of the critical factors responsible for the non-pathogenic nature of Reston Ebola in humans 32,33 .

Docking phenylthiourea to polyphenol oxidase from walnut
Polyphenol oxidases (PPO/tyrosinases/catechol oxidases) are copper enzymes implicated in the biosynthesis of quinones 34 . The recently sequenced walnut genome revealed the presence of two PPO genes (JrPPO1/2) 20 , only one of which was previously known (JrPPO1) 18 . Since there were no known structures of JrPPO1/2 at the time of writing 20 , SWISSMODEL was used to model JrPPO1 (modelledJrPPO1) based on the structure of the homologous PPO from Ipomoea batatas (PDBid:1BUG, sweet potato). The structure of JrPPO1 was recently solved (solvedJrPPO1, PDBid:5CE9) 19 . DOCLASP docked phenylthiourea (URS) to modelledJrPPO1 and solvedJrPPO1 by aligning conserved copper binding histidines (His87, His108 and His117 in JrPPO1, Figure 3) (see modelledJrPPO1Docked.pdb and solvedJrPPO1Docked.pdb in Dataset1). The binding pose of URS showed a similar configuration in both modelledJrPPO1 and solvedJrPPO1 (Table 7).

Analysis of suramin binding to a set of nine non-homologous proteins
Human African Trypanosomiasis (HAT), endemic to sub-Saharan Africa, is caused by the parasite Trypanosoma brucei and is transmitted via the tsetse fly 35 . Suramin, a hexasulfonated naphthylurea, is an antitrypanosomial drug which has been in clinical use for decades, and has more recently been used for the treatment of malignant tumors 36-39 . It has been known for a while now that suramin binds to several human proteins including cullin-RING E3 ubiquitin ligases 40 , serum albumin 41 , P2X receptors 42 , neutrophil elastases 43 ,   (Table 8). However, all possible hydrogen bonds for negatively charged residues are with the backbone atoms (O or N), while the positively charged residues hydrogen bond to suramin using the sidechain atoms. Furthermore, the non-polar residues (Gly, Ala and Pro) also have (possible) hydrogen bonds to the backbone atoms only. This corroborates the fact that suramin binds to positively charged parts of the protein 56 . Each of these binding sites provides a motif to DOCLASP when suramin is to be docked to a target protein.
The extensive data available on suramin binding to different proteins highlights the complex nature of ligand docking. Three phospholipase A2-like proteins (PLA2) from poisonous vipers exist in the current PDB database which have suramin as a ligand: (i) PLA2-X, PDBid:1Y4L from Bothrops asper (pit viper), (ii) PLA2-Y, PDBid:3BJW from Echis carinatus (saw scaled viper) and (iii) PLA2-Z, PDBid:4YV5 from B. moojeni (Brazilian lancehead viper). Suramin has different binding sites in the two homologous subunits (Table 8) of PLA2-Y, which is significantly different from PLA2-X and PLA2-Z (Figure 4a). Interestingly, in spite of conserved residues in PLA2-X and PLA2-Z, suramin binds to different sites in these proteins (Table 8). DOCLASP docking of suramin to PLA2-Z based on the binding residues from PLA2-X (Lys53, Lys69 and Tyr52) shows that this is a reasonable binding site (see PLA2dockedsuramin.p1m in Dataset 1, Figure 4b).
The suramin molecule itself undergoes a significant conformational change on binding, and different proteins induce different conformational changes. Two molecules of suramin bound to different non-homologous proteins (PDBid:1Y4LB and PDBid:1Y8EA, Table 1) were superimposed using DECAAF 57 by aligning three atoms: S17, S21 and S31 (Figure 5). Different atoms of suramin are involved in ligand binding for different proteins, and binding induces significant conformational changes in the drug. For example, the maximum distance between any two atoms in SVR:PDBid:1Y8EA is 30.211 Å (between O80 and O29), while the maximum distance in SVR:PDBid:1Y4LB is 26.4 Å (between O82 and O30). This underscores the complexity faced by docking algorithms, which must sample a much larger conformational space created by both the flexible binding residues and ligand 58 .
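The maximum interatomic distance quoted above as a measure of ligand extent can be computed with a brute-force scan over all atom pairs. The sketch below is illustrative only: the function name is hypothetical, and the atom names and coordinates are toy values, not suramin coordinates from any PDB entry.

```python
import math
from itertools import combinations


def max_extent(atoms):
    """Return (distance, atom1, atom2) for the two atoms farthest apart.

    `atoms` maps an atom name to its (x, y, z) coordinates in angstroms.
    """
    return max(
        (math.dist(p, q), name1, name2)
        for (name1, p), (name2, q) in combinations(atoms.items(), 2)
    )


# Toy three-atom "ligand" with made-up coordinates.
toy_ligand = {
    "A": (0.0, 0.0, 0.0),
    "B": (3.0, 4.0, 0.0),
    "C": (0.0, 0.0, 12.0),
}

distance, first, second = max_extent(toy_ligand)
print(distance, first, second)
```

Comparing this value for the same ligand taken from two different complexes gives a quick, if coarse, indication of how much the ligand conformation differs between them.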
The number of docking methods available is such that even a detailed review could only provide a partial list of currently available docking methods 9 . The current work presents a template based, static method that leverages the spatial and electrostatic properties of the binding site. The definitive advantage of a static method can only be highlighted by emphasizing the known limitations of conformational sampling of the protein structure 59,60 . The problem is indeed exacerbated by the plasticity of the drug itself 61,62 . While DOCLASP is completely ineffectual in the absence of such a database, unlike de novo methods, it benefits from the burgeoning database of protein-ligand structures 63 . The conservation of electrostatic properties, extracted using APBS/PDB2PQR, is the strongest argument in favor of DOCLASP.
There are several limitations in the method. Firstly, it can be applied only to those compounds which are bound to proteins whose structures have been solved. Additionally, it requires the structure of the apoenzyme, as this is used to extract the query motif considering the structural and electrostatic changes induced by ligand binding. However, with an ever increasing number of protein structures being solved, this is not a severe limitation since most proteins with ligands also have their apo structures solved. Furthermore, the lack of congruent matches leads DOCLASP to return a null result. This can be overcome by relaxing constraints, for example by checking for spatial congruence only. Finally, an energy function needs to be developed that can discriminate poorly docked structures that either have significant steric clashes or are docked on the surface of the protein.

Table 8. Suramin binding residues in nine non-homologous proteins from the PDB database: Interactions sorted based on the distance. R/A/LA/D: Residue number/Atom of the residue/Atom of ligand/distance between the interacting atoms (in Å). For example, 'LYS53/NZ/O80/2.6' means that the atom NZ from Lys53 is at 2.6 Å from the O80 of the suramin drug in PDBid:1Y4LB. The non-specific binding of suramin to phospholipase A2-like proteins (PLA2) is demonstrated by different binding sites for the homologous PLA2-X and PLA2-Z. Also, suramin binds to different sites even within the homologous subunits of PLA2-Y (marked with asterisks).

To summarize, this work presents an implicit method for docking ligands to proteins, in which the search and scoring are implicit in the CLASP algorithm. One significant limitation of this method is the requirement of template protein structures in complex with the given compound. As future work, I intend to incorporate the flexibility of the ligand and protein to add further discrimination.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.
aligning two proteins to match 4 cognate pairs of residues ought to identify the binding site in a novel/target protein. The alignment process additionally provides the pose of the ligand in the target protein, if the matching is conducted with a holoenzyme.
As a proof of concept, the article is scientifically sound.
Major observations: It would be desirable to provide more testing sets. That is, expand the evaluation to more than binding of vildagliptin to PI-PLC. It is trivial to do so. The authors can use known protein-ligand complex cases, treat them as unknown effectively, and test whether they are able to reproduce the binding pose for the ligand.
Expanding the test set will also improve the evaluation of the proposed DOCLASP protocol. The selected case study is presented in detail, but how would one determine the effectiveness of the method at a large scale? What would be summary results on application of DOCLASP on a set of test cases?
The author states that a limitation of the method, as a template-based one, is the availability of protein-ligand pairs with known structures. Given the growth in structural data, this is bound to be less of a limitation now as opposed to ten years ago. However, what needs to be discussed is the sensitivity of the method. Is the author reliant on structures with good resolution? What would this be? If an X-ray structure has 2.5A or lower resolution as opposed to 1.5, should the results be trusted less?
Following on above, if two different poses are obtained by matching with two different holoenzymes, and one of the holoenzymes has resolution 2.5 but the other has resolution 1.5, should this be considered in what pose is recommended for the ligand in the binding protein? What is the reliance of predictions on quality of input?
In case of pose discrepancies from diversity of quality of holoenzymes, would energetic minimization help converge to the right pose? This would be an interesting experiment to conduct. In particular, the experiment can suggest that a specific threshold in quality is needed for the minimization to converge to the right pose. Minor: Consider using a different symbol to denote the number of holoenzymes, M, from the symbol used to denote motif, M. In particular, this can cause confusions regarding equation (5). P1...PM relates to the set of holoenzymes, whereas M1^{PM} relates to best-scoring motif extracted from the last holo-enzyme, PM.
Given that there are M holoenzymes, it is unclear whether the method produces M possible poses for the ligand, or if these poses are filtered or joined in some way to provide a super pose.
Consider using passive form rather than active "I" form when summarizing future work.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

Competing Interests:
Author Response 06 Jun 2016 lower resolution as opposed to 1.5, should the results be trusted less? 4. Following on above, if two different poses are obtained by matching with two different holoenzymes, and one of the holoenzymes has resolution 2.5 but the other has resolution 1.5, should this be considered in what pose is recommended for the ligand in the binding protein? What is the reliance of predictions on quality of input?
Empirically, I would assume that the sensitivity of this method would improve with lower resolution, but I do not know how to establish that scientifically. DOCLASP also depends on APBS for computing electrostatic potential, and that is known to improve with lower resolution.

5.
In case of pose discrepancies from diversity of quality of holoenzymes, would energetic minimization help converge to the right pose? This would be an interesting experiment to conduct. In particular, the experiment can suggest that a specific threshold in quality is needed for the minimization to converge to the right pose.
Energetic minimization would certainly help in resolving steric clashes. Since there are several well-established methods that can be applied to the output PDB file from DOCLASP, I have not discussed these in the current paper.
Minor:
1. Consider using a different symbol to denote the number of holoenzymes, M, from the symbol used to denote the motif, M. In particular, this can cause confusion regarding equation (5). P1...PM relates to the set of holoenzymes, whereas M1^{PM} relates to the best-scoring motif extracted from the last holoenzyme, PM.
Done (the number of holoenzymes is now denoted by Z).

2. Given that there are M holoenzymes, it is unclear whether the method produces M possible poses for the ligand, or if these poses are filtered or joined in some way to provide a super pose.
If there are multiple holoenzymes, DOCLASP will generate a pose for each (assuming there is a significant match of the motif from the holoenzyme). Currently, there is no method to create a single 'super pose' from these.
3. Consider using passive form rather than active "I" form when summarizing future work.
Done.
I appreciate your consideration of the revised manuscript, and look forward to hearing back from you.

best wishes, Sandeep
Competing Interests: No competing interests were disclosed.
Referee Response 20 Jun 2016
Amarda Shehu, George Mason University, USA
I will be keen to see the progress of this work, particularly with regard to an integrative setting that
time taken to find another reviewer. Please find my detailed responses to your comments below.
The author has presented a novel docking algorithm based on the principle that spatial and electrostatic properties in cognate pairs of residues are conserved in the catalytic site of proteins endowed with the same function. Thus, if the crystal structure of a protein-ligand complex is available, it can be used to dock the same ligand into the active site of a homologous protein (with identical catalytic activity). The author has tested the algorithm by docking vildagliptin into the active site of PI-PLC.
1. Examination of Figure 2 and Table 5 shows quite a few short contacts between the docked ligand and amino acid chains. Under such circumstances the author could consider the inclusion of an energy minimization protocol to relieve the short contacts.
2. The author considers the method validated by the above-mentioned docking exercise. However, the author could consider a case where the actual crystal structure of the solution exists, so that the RMSD between the experimental solution and the solution obtained by DOCLASP could be provided to give some estimate of the accuracy of the docked poses. Given that the DOCLASP solution abounds in short contacts, further improvements could be made in the pose.
Since DOCLASP uses templates from holoenzymes, re-liganding them would be trivially correct every time. Energy minimization would certainly help in resolving the steric constraints. Such methods are standard and can be applied to the output PDB of DOCLASP, but this has not been done in the current version. I will consider including this in the future. Instead, I have included two additional test cases. First, DOCLASP was previously applied to the binding of phenylthiourea to polyphenol oxidase (PPO) from walnut (JrPPO1) (http://onlinelibrary.wiley.com/doi/10.1111/tpj.13207/). Incidentally, the structure of this protein was solved while the walnut genome manuscript was in review (PDBid:5CE9). Here, phenylthiourea was docked to this solved structure, resulting in almost the same pose as the one in which DOCLASP docked phenylthiourea to the SWISSMODEL-modelled structure of JrPPO1. Secondly, a comprehensive analysis of the binding of suramin to different non-homologous proteins from the PDB database has been presented, and DOCLASP was used to dock suramin to a phospholipase A2-like protein. This highlights the non-specific binding of some ligands, and also the complexity of modelling a flexible ligand like suramin.
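The RMSD comparison raised in the referee's second point can be illustrated with a minimal sketch. This is not part of DOCLASP: the function name `pose_rmsd` and the coordinates below are hypothetical, and the sketch assumes that both poses list the same ligand atoms in the same order and already share one coordinate frame, so no re-superposition is performed.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two ligand poses.

    Assumes both poses list the same atoms in the same order and are
    already expressed in a common coordinate frame (no fitting is done).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical coordinates (in angstroms) for illustration only.
experimental_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

print(f"RMSD = {pose_rmsd(experimental_pose, docked_pose):.2f}")
```

In this toy case every atom is shifted by the same distance along z, so the RMSD equals that shift. A real comparison would read the heavy-atom coordinates of the ligand from the two PDB files after placing both complexes in the same frame.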

On page 3, two lines below Equation 4, a more explicit account of CScore could be given in the manuscript.
I have modified the methods section to give a more explicit explanation of CScore.
I appreciate your consideration of the revised manuscript, and look forward to hearing back from you.

best wishes, Sandeep
Competing Interests: No competing interests were disclosed.