ABS-Scan: In silico alanine scanning mutagenesis for binding site residues in protein-ligand complex

Most physiological processes in living systems are fundamentally regulated by protein-ligand interactions. Understanding the process of ligand recognition by proteins is a vital activity in molecular biology and biochemistry. It is well known that the residues present at the binding site of a protein form pockets that provide a conducive environment for the recognition of specific ligands. In many cases, the boundaries of these sites are not well defined. Here, we provide a web-server that systematically evaluates the binding site residues contributing to ligand recognition through in silico alanine-scanning mutagenesis experiments. Each residue present at the binding site is computationally mutated to alanine. The ligand interaction energy is computed for each mutant, and the corresponding ΔΔG value is calculated by comparison with the wild-type protein, thus evaluating individual residue contributions towards ligand interaction. The server thus provides the user with a ranked list of residues that can be targeted to obtain loss-of-function mutations. This web-tool can be freely accessed at: http://proline.biochem.iisc.ernet.in/abscan/.

Currently (as of April 3, 2014) 1, there exist more than 72,000 experimentally determined protein structures complexed with small-molecule ligands, providing an extensive data resource on protein binding sites. These binding sites range in size from six to thirty residues, depending on the size and nature of the ligand. In most cases, the contribution of individual amino acids towards the binding of a given ligand is not well understood. A well-established method of demonstrating the importance of a residue at the site is to create point mutants through site-directed mutagenesis 2. Efforts to characterize an entire functional site include techniques such as alanine scanning mutagenesis (ASM) 3, where each residue is mutated to alanine and its effect on function is evaluated. ASM is widely used in experimental biology and has been successfully applied to problems of protein folding and stability 4, protein-protein 5,6 and protein-ligand 7 interactions. The experimental success of this technique has led to further developments, including high-throughput and low-cost variants 8, greatly expanding its reach. Yet, given the time, cost and effort required for experimental biochemistry, the large majority of proteins have yet to be studied with this method.
Owing to the availability of a variety of structural bioinformatics tools, it is now feasible to carry out alanine scanning mutagenesis computationally 9. Spurred by the success and widespread adoption of ASM, various computational resources now exist for in silico alanine scanning. Prominent examples include Modeller 10 and the Rosetta software suite 11. However, most packages are command-line oriented and therefore out of reach for many researchers. Alanine scanning web-servers with intuitive user interfaces, such as the Robetta web-server 12, the Rosetta Design web-server 13, ROSIE 14, FOLDX 15, BeATMuSiC 16 and DrugScore PPI 17, exist for problems of protein folding, protein stability and protein-protein interactions. Although there are workflows that evaluate ligand-binding energetics through free-energy calculations with the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) method 18-20, they require significant computational time and setup, and no intuitive web-tool is available for analyzing alanine-scanning mutations of small-molecule binding site residues in real time. A common requirement for an experimental biochemist is to identify which amino acids to mutate in a protein to generate loss-of-function mutants. A web-tool catering to that specific need will therefore be highly useful. The analysis can also reveal residues critical for interaction and residue pairs or sets that, when mutated, would abolish ligand binding; it can further support lead refinement in drug discovery and help in understanding drug resistance caused by mutations.
We present a computational workflow and webserver, Alanine Binding Site-Scan (ABS-Scan), for automated alanine-scanning mutagenesis of protein-ligand interface residues. The workflow combines the libraries of widely used software packages including Modeller 10 for site-specific alanine mutagenesis and Autodock 21 for energetic evaluation of protein-ligand complexes.

Workflow
This workflow allows a user to submit a protein-ligand complex of interest (Figure 1). The user can select a distance cut-off that defines the binding site around a specific ligand, for which in silico alanine scanning mutagenesis is then carried out. Once the input parameters are obtained, the Modeller library is used to perform site-specific mutagenesis on all selected residues, coupled with steps of energy minimization 22. This consists of initial conjugate gradient steps (200 iterations with a minimum atom shift of 0.001 Å), followed by 200 steps of molecular dynamics simulation with steepest descent carried out at different temperatures. The initial restraints for the mutated model are derived from the wild-type protein structure. The analysis and results derived from alanine scanning mutagenesis rely on two assumptions: (a) the introduced point mutation does not drastically change the structure of the protein, and (b) the mode of ligand interaction in the point mutant is the same as in the wild-type complex. Care is taken to ensure that there are no steric clashes between protein and ligand atoms during minimization. The quality of the generated protein structures is estimated through the Discrete Optimized Protein Energy (DOPE) score 23, a statistical potential calculated for each mutant. This scoring scheme is based on an improved reference state of non-interacting atoms in a homogeneous sphere whose radius depends on a sample native structure. The score therefore reflects the feasibility of interactions and the compactness of the modeled structure.
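The binding-site selection step can be sketched in a few lines of Python (3.8 or later). This is a minimal illustration, not the ABS-Scan code: it assumes a standard fixed-column PDB file, treats every HETATM record on one chain within a residue-number range as the ligand (so a multi-residue ligand such as cellotetraose, residues 401-404 in PDB ID 1J84, is handled as one entity), and returns protein residues with any atom within the cut-off (4.5 Å by default, as in the Input section). The names `parse_pdb_atoms`, `parse_residue_range`, `binding_site_residues` and the `skip` parameter are hypothetical; skipping alanine by default is our assumption, since the text does not state how alanine, glycine or proline residues are treated.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    record: str                      # "ATOM" or "HETATM"
    resname: str                     # three-letter residue name, e.g. "TRP"
    chain: str                       # chain identifier
    resseq: int                      # residue sequence number
    xyz: Tuple[float, float, float]  # Cartesian coordinates in angstroms


def parse_pdb_atoms(lines: Iterable[str]) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, TER, END and other records
        atoms.append(Atom(record, line[17:20].strip(), line[21], int(line[22:26]),
                          (float(line[30:38]), float(line[38:46]), float(line[46:54]))))
    return atoms


def parse_residue_range(text: str) -> Tuple[int, int]:
    """Turn '401-404' (or a single number such as '400') into an inclusive range."""
    first, _, last = text.partition("-")
    return int(first), int(last or first)


def binding_site_residues(atoms: List[Atom], ligand_chain: str,
                          ligand_range: Tuple[int, int], cutoff: float = 4.5,
                          skip: Tuple[str, ...] = ("ALA",)) -> List[Tuple[str, int, str]]:
    """Protein residues with any atom within `cutoff` angstroms of any ligand atom.

    The ligand is every HETATM on `ligand_chain` whose residue number lies in
    `ligand_range`; residues named in `skip` are not returned.
    """
    lo, hi = ligand_range
    ligand = [a.xyz for a in atoms
              if a.record == "HETATM" and a.chain == ligand_chain and lo <= a.resseq <= hi]
    site = {(a.chain, a.resseq, a.resname) for a in atoms
            if a.record == "ATOM" and a.resname not in skip
            and any(math.dist(a.xyz, q) <= cutoff for q in ligand)}
    return sorted(site)
```

Only ATOM records are treated as protein here, so modified residues stored as HETATM records would be ignored by this sketch.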

Amendments from Version 1
The 'Workflow' and 'Validation' sections have been revised in this version of the manuscript to address all the queries raised by the reviewers.
1. The ABS-Scan workflow now reports DOPE (Discrete Optimized Protein Energy) scores to assess the quality of the generated mutant protein structures. The evaluation of protein-ligand energetics is now explained in the 'Workflow' section.
2. A residue-range option has been provided to deal with complex ligands that contain more than one moiety/residue (e.g., 401-404 for PDB ID: 1J84).
3. In addition to the CSAR dataset, the 'Validation' section has been updated to include an analysis of the 'PDBbind' dataset, to assess the significance of the reported ΔΔG scores.
4. The 'Validation' section now also includes case studies describing three different examples to evaluate the predictions of ABS-Scan.
5. The 'Input' section of the web-server also provides an advanced option to deal with water molecules in protein-ligand energetic calculations. This is illustrated with an example in the manuscript.
6. In addition to the citations of 'Modeller' and 'AutoDock Tools' in the manuscript, the source code 'alanine_scanning_v2.py' on GitHub has been updated to include all the original citations and references for the tools used.

Each mutated structure is then scored with the Autodock 4.1 force field 21 to calculate the energetics of the protein-ligand complex. The force field is used here only to score the pose of the protein-ligand interaction; no docking is performed. By default, the 'check_hydrogens' flag is kept 'on' while preparing the receptor, and Gasteiger charges are used for the protein and the ligand. The contribution of a protein residue is determined by the difference between the interaction scores of the mutant and the wild-type protein (the ΔΔG value). These results are presented graphically to the user, along with a ranked list of residues in the given site that could be explored experimentally by site-directed mutagenesis. A Jmol applet displays the protein-ligand interactions with residues colored according to their computed contribution to the interaction, while a table simultaneously displays the inter-molecular energy scores. A help section explains the results and provides selected examples.
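The ΔΔG ranking described above can be illustrated with a short sketch. It assumes that ΔΔG is the mutant interaction score minus the wild-type score, using Autodock-style energies in which more negative means more favourable, so a positive ΔΔG flags a residue whose truncation to alanine weakens the interaction. The 0.5 cut-off is the one adopted in the Validation section; the function name `rank_residues` is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_residues(wild_type_score: float,
                  mutant_scores: Dict[str, float],
                  cutoff: float = 0.5) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Rank alanine mutants by ddG = E(mutant) - E(wild type).

    Returns the full ranking (largest ddG first) and the mutants whose
    ddG is at or above `cutoff`.
    """
    ddg = {res: mut - wild_type_score for res, mut in mutant_scores.items()}
    ranked = sorted(ddg.items(), key=lambda item: item[1], reverse=True)
    hits = [res for res, value in ranked if value >= cutoff]
    return ranked, hits
```

Sorting by ΔΔG rather than by raw mutant score keeps the ranking independent of the absolute wild-type energy, which varies between complexes.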

Validation and case studies
We evaluate the significance of the ΔΔG score used to assess the contribution of individual binding site residues by systematically analyzing two different datasets. The first dataset was derived from the Community Structure-Activity Resource (CSAR, www.csardock.org/). Decoys in this dataset are artificially docked complexes of proteins with ligands that have chemical properties similar to the native ligands but are known not to interact with the protein. The protocol could be successfully applied to 288 of the 343 protein-ligand native and decoy complexes. The distribution of average ΔΔG scores obtained through ABS-Scan for binding site residues in the decoy dataset differs from that of the native protein-ligand complexes (Figure 2A & B). An average ΔΔG score of 0.395 was obtained for the native protein-ligand complexes. The second dataset, used to obtain an independent estimate of the ΔΔG score, is derived from the PDBbind database 24 and comprises 195 protein-ligand complexes (the PDBbind core dataset). Around 135 of these complexes could be successfully processed with the ABS-Scan workflow. In this case, an average ΔΔG score of 0.387 was observed per mutated binding site residue. Since both averages lie below 0.5, a more stringent cut-off of 0.5 was chosen to determine the sensitivity of ABS-Scan. ABS-Scan is seen to discriminate between the decoy and native complexes of the CSAR dataset (p-value ~0.004, Student's t-test) in ~67% of the cases (ΔΔG ≥ 0.5). This indicates that residues important for ligand interaction can be identified through this protocol (Figure 2C). The detailed ΔΔG scores obtained for each mutation at the binding site for both datasets can be accessed from the web-resource: http://proline.biochem.iisc.ernet.in/abscan/validation.
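The statistics in this paragraph can be sketched with Python's standard library. The text does not specify exactly how residues and complexes were aggregated, so this sketch assumes one average ΔΔG per complex, compares the native and decoy averages with a pooled-variance (Student's) two-sample t statistic, and reports the share of complexes whose average reaches the 0.5 cut-off. The function names are hypothetical; a p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`, which is not used here.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def fraction_at_or_above(values: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of values (e.g. per-complex average ddG) that reach the cutoff."""
    return sum(v >= cutoff for v in values) / len(values)
```

Here `a` would hold the per-complex averages for native complexes and `b` those for decoys; a positive t indicates larger ΔΔG values for the natives.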
A suitable dataset for validation would be one that reports binding affinities for both wild-type and mutant proteins with the same ligand, measured in a uniform experimental environment, for a large number of proteins. Although such a dataset exists for protein-protein alanine scanning mutagenesis 12,25, none has been reported for protein-ligand interactions. In order to compare the predictions of ABS-Scan with experimentally reported alanine-scanning mutations, a methodical search was carried out to mine all the experimental results available in the literature on alanine-scanning mutagenesis of binding site residues. The advanced search option of the PDB was used for this purpose, and all the PubMed abstracts were scanned for the term "alanine scanning". This search yielded 126 structure hits with 56 citations. The list of entries obtained was further pruned to remove biologically irrelevant ligands, metal ions and modified residues. The final list of 79 entities/binding sites can be accessed at http://proline.biochem.iisc.ernet.in/abscan/validation. Alanine scanning could be successfully undertaken for 54 of these structures. On average, at least two residues per binding site were predicted to have a ΔΔG score ≥ 0.5. The details of the dataset and the ranked lists of residues, ordered by their contribution to ligand binding, for all the complexes are made available to the community: http://proline.biochem.iisc.ernet.in/abscan/validation. A study by Heredia et al. 26 on the testosterone binding site of rat 3-alpha-hydroxysteroid dehydrogenase (PDBID: 1AFS) reports that binding site residues in direct contact with the ligand influence the rate-determining step of the enzymatic reaction. In this case, the alanine scanning experiments performed on the binding site residues that recognize progesterone and testosterone report Kd values.
The ABS-Scan analysis performed on 3-alpha-hydroxysteroid dehydrogenase in complex with testosterone and with progesterone also predicted residues W227 (ΔΔG score = 1.43; Kd = 10.7 ± 1.2), Y310 (ΔΔG score = 1.31; Kd = 9.20 ± 0.94) and L54 (ΔΔG score = 0.5696; Kd = 7.24 ± 0.79) to be important for ligand recognition. Good correlations (0.829 for testosterone and 0.704 for progesterone) were observed between the reported Kd values of the mutants and the corresponding predicted ΔΔG scores.
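A correlation of this kind can be computed as below. The text does not name the coefficient, so Pearson's r is an assumption here, and `pearson_r` is a hypothetical helper. The three (ΔΔG, Kd) pairs quoted above are used only to exercise the function; the reported values of 0.829 and 0.704 were obtained over the full set of mutants, which this paragraph does not list.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Only the three mutants quoted in the text (W227, Y310, L54), for illustration.
ddg_scores = [1.43, 1.31, 0.5696]
kd_values = [10.7, 9.20, 7.24]
r = pearson_r(ddg_scores, kd_values)
```

A positive r is the expected sign here, since a larger Kd means weaker binding and a larger ΔΔG means a larger predicted loss of interaction.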
Two-dimensional alanine scanning mutagenesis was performed by Shimizu et al. 27 to understand the structure-function relationship between the vitamin-D receptor (PDBID: 1IE9) and vitamin-D analogs.
Since no structural information was available for the analogs complexed to the vitamin-D receptor, four of the vitamin-D analogs were docked into the native vitamin-D binding site of the receptor using the Rosetta 3.4 docking protocol 28. All the poses obtained were analyzed using ABS-Scan to determine the residues crucial for the interaction of each ligand. Since this is a nuclear receptor, the original study used a transcriptional activity assay to evaluate the effect of the mutants. The effect of each vitamin-D receptor mutant was measured by a downstream transactivation assay that quantifies luciferase activity under the control of a VDR (vitamin-D receptor) promoter sequence. In this assay, if a mutation affects ligand binding, luciferase expression is correspondingly reduced by a quantifiable factor. A good negative correlation with the transactivation data was also observed for all four analogs complexed to the vitamin-D receptor, and at least four residues important for interaction with all the analogs, L233, W286, R274 and H397, had ΔΔG scores > 0.5. L233 and W286, present in H3 (helix 3) and the β-sheet, are reported to form hydrophobic interactions with the B and C rings of the ligand, whereas R274 in H4 (helix 4) is observed to form a hydrogen bond with the 1α-OH group of the ligand.
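Finding residues that exceed the cut-off for every analog amounts to a set intersection. The sketch below assumes one dictionary of per-residue ΔΔG scores for each docked analog; `shared_hotspots` is a hypothetical helper, and the strict `> 0.5` comparison follows the wording of this paragraph.

```python
from typing import Dict, Set


def shared_hotspots(ddg_by_ligand: Dict[str, Dict[str, float]],
                    cutoff: float = 0.5) -> Set[str]:
    """Residues whose ddG exceeds `cutoff` for every ligand analysed.

    `ddg_by_ligand` maps a ligand (or docked pose) name to {residue: ddG}.
    """
    per_ligand = [{res for res, value in scores.items() if value > cutoff}
                  for scores in ddg_by_ligand.values()]
    return set.intersection(*per_ligand) if per_ligand else set()
```

Residues missing from any analog's dictionary are excluded automatically, because they cannot appear in that analog's above-cutoff set.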
A similar study was carried out on the human trimethylguanosine synthase enzyme (Tgs1), which converts m7G (7-methylguanosine) caps to 2,2,7-trimethylguanosine (TMG) caps. In the original study 29, around 37 point mutations were introduced into human Tgs1 (PDBID: 3GDH) to study its interaction profile with m7GTP (7-methylguanosine triphosphate) and AdoMet (S-adenosyl methionine). The activity of the mutants was evaluated with a methyltransferase assay that determines the percentage of methylation by quantifying the levels of m7GDP and m2,7GDP. Residues R807 and K646, reported to be the most affected mutants, are also predicted by ABS-Scan to be essential, with the highest predicted ΔΔG scores of 3.63 and 3.39, respectively. These positively charged residues are observed to interact with the α and β phosphate groups of m7GTP. The π-cation stacking observed between W766 and m7G was also predicted to be crucial (ΔΔG score of 2.66), and correspondingly no methylated products were detected for this mutant in the methyltransferase assay. The details of the case studies described above, along with the results of the analysis, can be accessed in the example section of the web-tool: http://proline.biochem.iisc.ernet.in/abscan/examples.

Implementation
The web-server was implemented using PHP (Hypertext Preprocessor). Autodock, Modeller and Pymol libraries are used for modeling the mutations and evaluating the energetics. Integration of these back-end libraries into a functional and intuitive user interface is accomplished using Shell, Python, Java, HTML and PHP scripts. The web-server is platform independent and can be used from any machine with internet access and a web browser. For advanced users, a command-line interface in the form of a single Python script is available from the GitHub repository (https://github.com/praveeniisc/ABS-Scan). The script has been tested on an Intel 2.83 GHz quad-core system running 32-bit Linux (Ubuntu 12.04) with Modeller 10, MGL AutodockTools 30 and Pymol (http://pymol.org) installed. On the web-server, the d3.js library is used for displaying the plots and the Jmol applet is used to visualize the protein-ligand interaction.

Input
The input required by the server is the structure of a protein-ligand complex in PDB format. Users can either provide the four-letter PDB ID or upload the PDB structure file of the complex. Options are provided to define the cut-off distance and select the ligand, which together define the binding site residues to be mutated to alanine for evaluating the interaction energetics. A default distance cut-off of 4.5 Å is set, selecting all residues with any atom within this distance of any ligand atom. In some cases, metal ions 31 and water molecules play a crucial role in stabilizing the interactions 32. A major problem in incorporating a ligand-bound metal ion into the ABS-Scan workflow is fixing its charge parameter, since metal atoms can adopt different ionic states (e.g., Fe2+, Fe3+), which matters for evaluating the energetics. Enumerating all the important structural water molecules involved in ligand interaction is also highly dependent on the resolution of the crystal structure. Hence, an advanced option is provided for uploading the ligand in PDBQT format, to account for cases where the ligand contains unusual atom types or metal ions, or interacts through bridging water molecules. For practical purposes, bridging water molecules can be considered part of the ligand and incorporated into the ligand PDBQT file. As an example, ABS-Scan analysis was carried out on protein lysine methyltransferase (PDB: 3S7B), which interacts with S-adenosyl methionine 33 through four bridging water molecules. These four water molecules can be incorporated into the ligand PDBQT file and uploaded through the advanced option on the server. The protocol correctly identified GLU135 and ASN182 as significant contributors to ligand binding through the formation of water bridges. The output can be accessed through the example section of the web-server.
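The text leaves the choice of bridging waters to the user. One way to shortlist candidates is a purely geometric screen that keeps water oxygens lying within hydrogen-bonding distance of both a ligand atom and a protein atom. The sketch below assumes a 3.5 Å cut-off and coordinates already extracted from the PDB file; the function `bridging_waters` and the cut-off are hypothetical and not part of ABS-Scan. Shortlisted waters would still need to be inspected before being added to the ligand PDBQT file for upload.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def bridging_waters(waters: Dict[str, Coord],
                    ligand_xyz: Sequence[Coord],
                    protein_xyz: Sequence[Coord],
                    cutoff: float = 3.5) -> List[str]:
    """Water oxygens within `cutoff` angstroms of both a ligand atom and a
    protein atom, i.e. candidates for a protein-water-ligand bridge.

    `waters` maps an identifier such as "HOH 501" to the oxygen position.
    """
    def near(point: Coord, coords: Sequence[Coord]) -> bool:
        return any(math.dist(point, other) <= cutoff for other in coords)

    return [wid for wid, xyz in waters.items()
            if near(xyz, ligand_xyz) and near(xyz, protein_xyz)]
```

A distance-only screen ignores hydrogen-bond angles and donor/acceptor chemistry, so it can only propose candidates, consistent with the text's point that water assignment depends on structure resolution.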

Output
All the results produced by ABS-Scan can be visualized interactively on the web-server. The Jmol applet is used to visualize the contribution of residues towards ligand interaction (Figure 3). The d3.js library is used to plot the predicted ΔΔG values and the subcomponents of the energy scores reported by Autodock4 (Figure 4). Publication-quality images can be downloaded in SVG, PDF or PNG format. The Twitter Bootstrap front-end framework is used for developing the web-server interface. The raw files can also be downloaded, containing the individual mutants in PDB format and the ΔΔG scores in CSV format, along with the Autodock energy scores.

Conclusions
The ABS-Scan web-server can provide valuable insights into molecular recognition involving protein-ligand interactions. Experimentally determined protein-ligand structures can be studied to understand individual residue contributions towards ligand binding. Modeled complexes can also be submitted to infer the feasibility of the interaction. We believe that ABS-Scan will add a further dimension to the analysis of protein binding sites and the comparison of ligand interactions, and will be useful to researchers performing ASM studies.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declare that no special grants were sanctioned for this project. PA was supported by a Bristol-Myers Squibb fellowship while carrying out this work.
No more comments. This version of the manuscript has been improved in several respects and the replies by the authors are acceptable. The paper is approved.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: B. Offmann is founder of and holds shares in the private company PEACCEL Inc (USA) and its affiliate PEACCEL SAS (France), whose business purpose is to propose predictive services for protein engineering.
In their manuscript entitled "ABS-Scan: alanine scanning mutagenesis for binding site residues In silico in protein-ligand complex", the authors have discussed the development of a workflow for the in silico prediction of residues important for ligand recognition in a given protein-ligand complex. In our opinion, the ABS-Scan tool has been designed and validated satisfactorily. The online web-server is intuitive, fast and easy to use. In the present version of the paper and the software tool, the authors have incorporated the recommendations provided by the previous referees, many of which are, in our opinion, relevant. We have the following suggestions.
The authors state that 288 of the 343 protein-ligand complexes from the CSAR dataset, 135 of the 195 protein-ligand complexes from the PDBbind dataset and 54 of the 79 experimental datasets could be processed by ABS-Scan. Although the authors mention in their responses to the referee comments on version 1 of the manuscript that ABS-Scan rejected some protein-ligand complexes due to "unusual atom types or missing protein/ligand atoms or unusual convention for ligand atoms", it would help users determine which kinds of protein-ligand complexes are suitable or unsuitable for prediction with ABS-Scan if the authors could discuss this aspect in sufficient detail. Was there anything in particular or in common among the rejected complexes from the three datasets used for validation? It would be great if they could provide the list of rejected complexes and the reasons for rejection, similar to what they have done for the protein-ligand complexes used for validation of the ABS-Scan tool.
Short peptide ligands are usually present as a separate chain in protein structure files. Particularly for these cases, there is no option in the ABS-Scan web interface to specify the other chain as the ligand. Can you please elaborate on how your software tool handles such protein-ligand complexes?
While providing the ΔΔG values for the mutated residues, we think it would be useful if the ABS-Scan server also provided some indication of the evolutionary conservation of each residue. This would allow users to translate ABS-Scan results from known or docked protein-ligand complexes to other homologs of the same protein family.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.
The authors have now incorporated all the suggestions I made in my first review. In fact, the revised web application is able to include solvent molecules. I think that the revised version is now ready to be indexed in PubMed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.
This paper reports an attempt to develop an original tool that simulates alanine scanning mutagenesis to probe residues involved in ligand recognition in proteins.

7.
More precisely, the work describes the development of a workflow that implements known methodologies for homology modeling of alanine single-point mutants of a protein and for molecular docking. Even though this can be viewed as a methodological paper, we have some serious concerns regarding this work.
The authors claim that they performed a "validation" of their tool on a dataset that comprises "79 entries" carefully selected from PDB (also cf. point 2 below). Their evaluation is based on finding a correlation between docking scores and experimentally determined binding affinities. In their paper, the authors provide evidence of this validation by presenting an "Experimental correlation" for only one example (Figure 2), which relates to binding of rat 3-alpha-hydroxysteroid dehydrogenase (PDB: 1AFS) to testosterone and progesterone. Since they must have these data, the authors should clearly provide their evaluation of this correlation on all "79 entries". I would expect at least a new Figure 2 that comprises all data points from these "79 entries" to sustain their claim and help readers evaluate the global performance of their tool. They attempted to provide a few additional results on their website (http://proline.biochem.iisc.ernet.in/abscan/validation). This is more confusing, because the results provided for the vitamin D receptor (PDB: 1IE9) are not about binding affinities but "translational activity". I suggest that detailed data for all mutations taken from all "79" entries be provided to the community in the form of a table or a downloadable flat or Excel-type file.
The number of independent PDB entries in their dataset is not 79. In fact, in some of the PDB entries, multiple ligands were observed. Surprisingly, they consider these as separate entries. So their data is redundant with respect to the proteins.
When generating homology models for protein variants, even single point mutants, assessment of model quality is a critical step. Selecting the best models may not be trivial. The authors need to clarify how their workflow assesses the quality of the models and, consequently, what criteria they use for selecting the best models (and how many of them) that will be subjected to molecular docking.
Regarding the alanine scanning procedure, there are issues with the treatment of alanine and proline. Both should be discarded from the alanine scanning protocol: alanine is already present in the structure, while proline is not suitable for mutation because of the major backbone rearrangements needed to mutate it properly.
For such a tool, it is essential to evaluate its performance using different homology modeling and molecular docking methods. The rationale behind the choice of Modeller over other methods such as Rosetta is not given. Likewise, the reason for choosing Autodock over DOCK or even Autodock Vina is not explained.
The efficiency of molecular docking using AutoDock also depends on the docking protocol used. In such an automated "screen", care should be taken with the preparation of the receptor, the ligand and the grid. For example, are the ligands kept flexible? The manuscript gives no indication of how the authors dealt with this central issue. The authors are encouraged to describe and discuss their docking protocol precisely.
According to the AutoDock 4.0 article, the median error range in energy estimation for any protein-ligand evaluation is 1.5-2.0 kcal/mol. In their study, the ∆∆G differences for ligand binding between mutant and native forms of the proteins are far below 2.0 kcal/mol. Thus, it is difficult to rank the mutants. Also, how the authors chose the 0.5 kcal/mol ∆∆G threshold is not clear. There is no discussion of how this threshold compares with the intrinsic limits in precision of AutoDock.
The definition of the ligand in the tool is problematic. In the case of oligo- or polysaccharides, the carbohydrate residues are erroneously considered separately. For example, in PDB entry 1J84, the carbohydrate-binding module (CBM) is bound to cellotetraose, a 1,4-β-D-glucan composed of four β-D-glucose residues joined by β-1,4 glycosidic linkages. When this PDB entry is submitted to ABS-Scan, it erroneously splits the oligomer into smaller entities that correspond to the chemical IDs of its constituents (BGC 401, 402, 403, 404). This is a serious flaw in their software.
While it is common for people to reuse available code, the authors do not properly cite the sources of the code they posted on GitHub and used to provide a complete service to the community: at least 80% of the "alanine_scanning.py" code comes from either MODELLER examples (http://salilab.org/MODELLER/wiki/Mutate_model) or AutoDock code (http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.Utilities24.compute_AutoDock41_score-pysrc.htm).
We have read this submission. We believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.
We thank the reviewers for their time and effort. There were some useful suggestions, which we have incorporated, but we do not agree with all the points raised. A detailed point-by-point response is given below.

Competing Interests: B. Offmann is founder of and holds shares in the private company PEACCEL Inc
The authors claim that they performed a "validation" of their tool on a dataset that comprises "79 entries" carefully selected from PDB (also cf point 2 below). Their evaluation is based on finding a correlation between docking scores with experimentally determined binding affinities. In their paper, the authors provide evidence of this validation by providing results of "Experimental correlation" for only one example (Figure 2) which relates to binding of rat 3-alpha-hydroxysteroid dehydrogenase (PDB: 1AFS) to testosterone and progesterone. Since they must have it, clearly, the authors should provide their evaluation of this correlation on all "79 entries". I would expect at least that they provide a new Figure 2 that comprises all data points coming from these "79 entries" to sustain their claim and help readers to evaluate the global performance of their tool.
A suitable dataset for validation would be one that reports binding affinities for both wild-type and mutant proteins with the same ligand, measured in a uniform experimental environment, for a large number of proteins. Although such a dataset exists for protein-protein alanine scanning mutagenesis (e.g., Rosetta alanine scanning), none has been reported for protein-ligand interactions.
Since no such dataset was available to us, we systematically extracted PDB entries of ligand-bound complexes, and the corresponding binding sites in them, that contained information about experimental alanine-scanning mutagenesis. However, the manner in which the effects of mutagenesis are reported in these entries differs significantly. While differences in ligand binding strengths (K or K values) are reported for some, changes in catalytic efficiencies are reported for others. For still others, reporter assays are given, which indicate the capability of the downstream process more qualitatively. Hence it is difficult to perform a systematic comparison of these with the ∆∆G values calculated by our tool. Nevertheless, some examples were hand-picked from this dataset; the corresponding primary literature was read, and the known residue importances were compared with those predicted by our tool. In any case, ABS-Scan analysis has been successfully performed on 54 complexes (the remaining 25 could not be processed by the default steps due to unusual atom types in proteins/ligands), providing the extent of each residue's contribution to ligand binding in each site as a ranked list of residue-wise ∆∆G values. All this information has been made available to the community through our webserver (http://proline.biochem.iisc.ernet.in/abscan/validation). Besides this, given the lack of systematic reports of experimental data, validation can only be performed to understand the significance of the ∆∆G scores calculated by our tool. For this, we have taken two large datasets: (a) protein complexes with native versus decoy ligands from CSAR, and (b) a well-curated list of known protein-ligand complexes with precise binding site definitions, used for benchmarking docking algorithms. From both of these, ∆∆G scores in the range of 0.5 were found to be significant.
a) A fresh dataset derived from the PDBbind core dataset, consisting of 195 protein-ligand complexes, which has been developed for benchmarking docking algorithms (Kim et al., 2004; Huang et al., 2008). Of the 195, 135 could be processed successfully for preparation of the protein-ligand complexes for analysis. (The others probably contain unusual atom names or types, missing protein or ligand atoms, or an unusual naming convention for ligand atoms, and hence could not be processed.)
b) A dataset of 343 protein-ligand complexes, each with a native and a decoy ligand, of which 288 could be evaluated successfully. (Here again, the others were omitted because of difficulties in automatic protein/ligand preparation.)
Since ABS-Scan has been run on all these complexes, information about the key contributing residues has been generated for each of them and made available through the webserver. Residue-wise contributions are presented in ranked order for each complex, providing a ready resource of residues important for ligand binding.
The results can be accessed from the validation section of the webserver (http://proline.biochem.iisc.ernet.in/abscan/validation).
The number of independent PDB entries in their dataset is not 79. In fact, multiple ligands were observed in some of the PDB entries. Surprisingly, the authors consider these as separate entries, so their data are redundant with respect to the proteins.
These reflect independent binding sites (with bound ligands). As can be expected, some proteins have multiple sites with different ligands, making it necessary to consider them separately. Hence, the 79 sites are unique and come from 46 PDB entries. In the original manuscript, the dataset of 79 was never meant to reflect 'unique PDB entries'. In any case, we now refer to them as 'binding site entities' to reflect this more clearly.
When generating homology models for protein variants, even single point mutants, assessing the quality of the models is a critical step, and selecting the best models may not be trivial. The authors need to clarify how the assessment of model quality is implemented in their workflow and, consequently, what criteria they used to select the best models (and how many of them) to be subjected to molecular docking.
Model quality has been considered as part of the modelling pipeline itself. Given the scale of the study, it is practical to generate one model for each mutant, but care is taken to ensure that it is optimal and free of errors such as bad contacts or atomic clashes. The optimization protocol consists of 200 iterations of conjugate gradient minimization, followed by molecular dynamics simulation for 4 fs and simulated annealing with 200 iterations at different temperatures (this is the default protocol suggested in Model_mutate.py of Modeller; http://salilab.org/modeller/wiki/Mutate%20model). The initial restraints for generating the model are derived from the wild-type structure itself. Modelling point mutations introduced through the alanine-scanning protocol at binding sites rests on two assumptions: (a) the mutations are unlikely to change the overall structure of the protein drastically, and (b) the ligand roughly retains the same conformation as in the wild-type complex when interacting with the mutated structure.
Since modelling protocols have been well established for a long time, we did not see the need to add this information explicitly to the original manuscript. In any case, based on the reviewer's suggestion, this information has been added to the revised version. Normalized DOPE scores are reported for both the native and mutant structures. DOPE ('Discrete Optimized Protein Energy') is a statistical potential that assesses the feasibility of the observed interactions. Protein structures with lower DOPE scores (typically in the negative range of -1.5 to -2.5 for experimentally solved structures) can be considered to be of good quality (Shen and Sali, 2006).
Regarding the alanine scanning procedure, there are issues with the treatment of alanine and proline. Both should be excluded from the alanine scanning protocol: alanine is already present in the structure, while proline is not suitable for mutation because of the major backbone rearrangements that would be required to mutate it properly.
This required the addition of simple screens to exclude these residues from alanine scanning, which has now been done. Changes have been made to both the source code and the web tool. Glycine residues are also filtered out.
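The screening step described above can be sketched in a few lines of Python. This is a minimal illustration, not the ABS-Scan source: the function name `residues_to_scan` and the (chain ID, residue number, residue name) tuple format for binding-site residues are assumptions.

```python
# Residue types excluded from in silico alanine scanning, as stated in the
# response (alanine, proline and glycine). Names are three-letter PDB codes.
EXCLUDED = {"ALA", "PRO", "GLY"}


def residues_to_scan(site_residues):
    """Return the binding-site residues that should be mutated to alanine.

    site_residues: iterable of (chain_id, res_seq, res_name) tuples; this
    tuple layout is a hypothetical choice made for this sketch.
    """
    return [res for res in site_residues if res[2].upper() not in EXCLUDED]


site = [("A", 45, "TYR"), ("A", 47, "ALA"), ("A", 88, "PRO"),
        ("A", 90, "GLY"), ("A", 112, "ASP")]
print(residues_to_scan(site))
```

Filtering before modelling means no mutant model is built or scored for these positions at all.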
For such a tool, it is essential to evaluate its performance using different homology modelling and molecular docking methods. The rationale behind choosing Modeller over other methods such as Rosetta is not indicated. Likewise, the reason for choosing AutoDock rather than DOCK, or even AutoDock Vina, is not explained.
The goal of our study is not to develop a modelling algorithm or new parameters for building models. Modeller, the most widely used tool for homology modelling and the one currently included in our workflow, has about 1500 citations. There are currently more than 50 tools for homology modelling (http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software) and roughly the same number of tools for protein-ligand docking (http://en.wikipedia.org/wiki/Docking_%28molecular%29). We chose Modeller and AutoDock largely because of our own experience with these tools, together with the availability of extensive documentation and tutorials and their ease of implementation. Moreover, both libraries have Python bindings and hence could be combined into a single Python script. In future, we plan to develop a PyMOL plugin for the same purpose. A simple bash script that processes the protein-ligand complex to determine the interaction energy using ROSETTA force fields has also been included in the GitHub repository. This, again, is intended only for advanced users, and we may incorporate it in future versions of the pipeline.
The efficiency of molecular docking using AutoDock also depends on the docking protocol used. In such an automated "screen", care should be taken over the preparation of the receptor, the ligand and the grid.
We would like to clarify that no docking is performed at any stage of this exercise. We only score the complex in its given conformation using the force field. By default, Gasteiger charges and polar hydrogens are added through prepare_receptor4.py and prepare_ligand4.py when evaluating the interaction energy. This has been mentioned in the manuscript: "Each mutated structure, will then be scored by using Autodock 4.1 force field, to calculate the energetics of a protein-ligand complex. The contribution from the residue is then determined by calculating the difference in interaction score of the mutant and the wild-type protein (∆∆G value)."
According to the AutoDock 4.0 article, the median error in energy estimation for any protein-ligand evaluation is 1.5-2.0 kcal/mol. In their study, the ∆∆G differences in ligand binding between the mutant and native forms of the proteins are far below 2.0 kcal/mol, which makes it difficult to rank the mutants. Also, it is not clear how the authors chose the 0.5 kcal/mol ∆∆G threshold, and there is no discussion of how this threshold compares with the intrinsic limits on the precision of AutoDock.
The median error reported in the AutoDock 4.0 article refers to the total ∆G, i.e. the difference between experimental and predicted values, whereas here it concerns individual residue contributions. The distribution of ∆∆G values obtained for the decoy and cognate ligands from the CSAR dataset (http://www.csardock.org/) was used to define a cut-off of 0.5 kcal/mol. This cut-off has also been validated on the PDBbind core dataset (http://www.pdbbind-cn.org/). Figures 3A and 3B, with explanations, have been added.
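Applying the 0.5 kcal/mol cut-off to a residue-wise ∆∆G list can be sketched as follows. The function name, the dictionary input and the choice of an inclusive comparison (∆∆G ≥ 0.5) are assumptions made for illustration; the ∆∆G values are invented and are not results from the CSAR or PDBbind analyses.

```python
# Cut-off value taken from the response; treating it as inclusive (>=)
# is an assumption of this sketch.
DDG_CUTOFF = 0.5  # kcal/mol


def significant_residues(ddg, cutoff=DDG_CUTOFF):
    """Return (mutant, ddG) pairs with ddG >= cutoff, largest first.

    ddg: dict mapping a mutant label (e.g. "Y45A") to its ddG in kcal/mol.
    """
    hits = [(mut, val) for mut, val in ddg.items() if val >= cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Invented values, for illustration only.
example = {"Y45A": 1.30, "W88A": 0.50, "D112A": 0.30, "S150A": -0.10}
print(significant_residues(example))
```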
We believe that the intrinsic limits on the precision of AutoDock scoring are not a major concern, because the wild type and the mutant are evaluated using the same scoring scheme and the cut-off was chosen on the basis of native protein-ligand complexes in the CSAR and PDBbind datasets.
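The residue-wise ∆∆G described in the quoted manuscript passage (mutant score minus wild-type score) can be sketched as below, together with a ranking step. The sketch assumes AutoDock-style scores, where more negative means tighter binding, so a positive ∆∆G indicates weaker binding in the mutant; mutant labels and scores are hypothetical. Because both terms come from the same scoring function, the difference cancels any error that is identical for the wild-type and mutant scores, such as a constant offset.

```python
def ddg_per_residue(wt_score, mutant_scores):
    """ddG = score(mutant) - score(wild type), in kcal/mol.

    wt_score: interaction score of the wild-type complex.
    mutant_scores: dict mapping a (hypothetical) mutant label such as
    "Y45A" to the interaction score of that mutant complex.
    With AutoDock-style scores (more negative = tighter binding), a
    positive ddG means the mutation weakens ligand binding.
    """
    return {mut: score - wt_score for mut, score in mutant_scores.items()}


def rank_residues(ddg):
    """Sort mutants by decreasing ddG, largest loss of binding first."""
    return sorted(ddg.items(), key=lambda item: item[1], reverse=True)


# Made-up scores, for illustration only.
wild_type = -8.2
mutants = {"Y45A": -6.9, "D112A": -7.9, "S150A": -8.3}
for label, value in rank_residues(ddg_per_residue(wild_type, mutants)):
    print(f"{label}: {value:+.2f} kcal/mol")
```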
The definition of ligand in the tool is problematic. In the case of oligo- or polysaccharides, the carbohydrate residues are erroneously considered separately. For example, in PDB entry 1J84, the carbohydrate-binding module (CBM) is bound to cellotetraose, a 1,4-β-D-glucan composed of four β-D-glucose residues linked by β-1,4-glycosidic linkages. When this PDB entry is submitted to ABS-Scan, the tool erroneously splits the oligomer into smaller entities that correspond to the chemical IDs of its constituents (BGC 401, 402, 403, 404). This is a serious flaw in their software.
This is not really a 'problem'; it is an established work-around to avoid long computations and hence long waiting times for the user. All this does is split peptides or oligosaccharides into individual moieties (typically, each amino acid of a peptide and each monosaccharide of an oligosaccharide is considered a moiety), as per the convention currently followed by the PDB. How can this be a 'serious flaw'? It does not, in any manner, influence the results. Many other tools for protein-ligand interaction analysis, such as LPC (Ligand-Protein Contacts), LigPlot+ and Ligplus, also track ligands through such residue identifiers.
However, an advanced option has now been added to specify a range of ligand residue numbers to be treated as a single moiety throughout the protocol. For example, a residue range of 401-404 can now be provided for 1J84 instead of a single residue number, so that the whole oligosaccharide is considered a single ligand. The script on GitHub has been modified accordingly.
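One way such a residue-range option could work is to collect all HETATM records of a chain whose residue numbers fall within the given range and treat them as one ligand. The sketch below is hypothetical (the function name and arguments are not taken from the ABS-Scan code) and relies only on the fixed-column PDB format: chain identifier in column 22 and residue number in columns 23-26.

```python
def extract_ligand_range(pdb_lines, chain, start, end):
    """Collect HETATM records of one chain with start <= resSeq <= end.

    pdb_lines: iterable of PDB-format lines (e.g. an open file handle).
    The returned records can then be written out and handled as a single
    ligand, e.g. the four BGC residues 401-404 of 1J84.
    """
    selected = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        chain_id = line[21]          # column 22 (1-based)
        res_seq = int(line[22:26])   # columns 23-26 (1-based)
        if chain_id == chain and start <= res_seq <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

For 1J84 the call would look like `extract_ligand_range(handle, chain_id, 401, 404)`, where `handle` is an open PDB file and `chain_id` is a placeholder for the chain carrying the oligosaccharide.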
While it is common to see people reuse available code, the authors do not properly cite the sources of the code they posted on GitHub and used to provide a complete service to the community: at least 80% of the "alanine_scanning.py" code comes from either the MODELLER examples (http://salilab.org/MODELLER/wiki/Mutate_model) or AutoDock code (http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.Utilities24.compute_AutoDock41_score-pysr).
We have indeed already cited in the manuscript all the tools whose source code is linked. In any case, these references are now also highlighted in the source code. Both AutoDock and Modeller are released under the GNU public license, making their source code freely usable by all interested parties. Moreover, these are the primary sources; the code was not extracted from any third-party tools. The purpose of putting it on GitHub is to be completely open about the details of the protocol and to make our work fully accessible to anyone interested.
We would again like to remind the reviewer that the source code is used only by advanced users. The reviewer may be aware of the time and effort involved in producing a web-application interface with embedded visualization features. This has been done
Bridging water molecules do play an important role in protein-ligand interactions. The resolution of the protein structure has to be taken into account to determine the confidence in the placed water molecules. Hence, an advanced option is provided whereby water molecules present at the site can be considered part of the corresponding ligand moiety. Users can upload their own PDBQT file for the ligand with the appropriate water molecules added to it. An example of protein lysine methyltransferases complexed with S-adenosyl methionine has been described in the manuscript. The corresponding PDBQT file of the ligand can be downloaded from the example section, where the results can also be accessed.
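Before building such a ligand file, a user might first shortlist candidate bridging waters. The sketch below is one hypothetical way to do this and is not part of ABS-Scan: it keeps water records that lie within a distance cut-off of at least one ligand atom and at least one protein atom. The 3.5 Å cut-off and the selection criterion are assumptions, and, as noted above, the resolution of the structure should still be weighed before trusting a placed water.

```python
import math


def _coords(line):
    """Orthogonal x, y, z from PDB columns 31-38, 39-46, 47-54 (1-based)."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def bridging_waters(protein_lines, ligand_lines, water_lines, cutoff=3.5):
    """Return water records within `cutoff` angstroms of at least one
    ligand atom AND at least one protein atom (hypothetical criterion)."""
    protein = [_coords(line) for line in protein_lines]
    ligand = [_coords(line) for line in ligand_lines]
    bridges = []
    for water in water_lines:
        xyz = _coords(water)
        near_ligand = any(math.dist(xyz, atom) <= cutoff for atom in ligand)
        near_protein = any(math.dist(xyz, atom) <= cutoff for atom in protein)
        if near_ligand and near_protein:
            bridges.append(water)
    return bridges
```

The selected waters could then be merged with the ligand atoms and converted to PDBQT with the usual AutoDockTools preparation step before upload.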
Competing Interests: No competing interests were disclosed.