Introduction
Primitive life presumably had minimal gene content and a minuscule arsenal of enzymes at its disposal. Unfettered from selection pressures by gene duplication, a few select enzymes gained new advantageous functions1–3. Nonetheless, the vestiges of secondary activities under neutral drift4 possess the potential to reemerge under changing selection pressures5,6. This ability of an enzyme to catalyze diverse activities using the same active site, termed promiscuity, is the cornerstone of the evolution of complex organisms from pristine life7. In the human context, compound promiscuity plays a major role in drug discovery and in the therapeutic efficacy of drugs8. Databases are a crucial medium for cataloging known aspects of drug promiscuity9,10.
Ever since Jensen emphasized the role of promiscuity or ‘substrate ambiguity’ in evolution through ‘fortuitous error and gain of multistep pathways’, promiscuity in proteins has been the subject of intense and detailed research7. It was demonstrated in 1976 that replacing the zinc metal ion with copper in Carboxypeptidase A introduced oxidase catalysis properties11. Dioxygenases promiscuously hydrolyze esters12, while the enolase superfamily is also known to catalyze numerous reactions13,14. Alkaline phosphatases (AP), one of the key proteins in our research, are among the most widely researched promiscuous enzymes15. APs are known to have sulfate monoesterase, phosphate diesterase, and phosphonate monoesterase activities16–18. A phosphite-dependent hydrogenase activity was also found in Escherichia coli AP (ECAP), but was absent in APs from other organisms19. Interestingly, proteins from the AP superfamily show cross activity - Pseudomonas aeruginosa arylsulfatase (PAS), whose primary activity is hydrolyzing sulfate monoesters, also catalyzes the hydrolysis of phosphate monoesters20,21.
The evolution of species through sequence mutations leaves a trail via the conservation of fragments or repeats that have been honed to achieve specific functions with remarkable efficiency22–24. The sequence-to-structure-to-function paradigm facilitates the functional characterization of new proteins by applying a ‘guilt by association’ logic, and has essentially revolutionized the field through its easy-to-use model25. However, occasionally nature achieves the same solution to an enzymatic problem through a completely different sequence, arriving at the same spatial conformation required for catalysis. For example, the catalytic Ser-His-Asp triad has virtually the same geometry in the major families of serine proteases (chymotrypsin and subtilisin), which have no sequence or structural homology26 - a classical example of convergent evolution27,28. Such convergently evolved proteins, and those redesigned from chiseled scaffolds through exon shuffling, remain beyond the scope of sequence analysis methods. Structure-based methods have therefore evolved to detect such relationships29,30. The choice of methods for binding site comparison, binding site detection, and function prediction has recently been reviewed in detail31. Notably, most of these methods are based on structural properties of the binding or the active site. We have demonstrated that such structural conservation leading to the same function necessitates the conservation of electrostatic properties as well (CLASP - www.sanchak.com/clasp)32. The ability of finite difference methods to quickly obtain consistent electrostatic properties from peptide structures provides an invaluable tool for investigating other innate properties of protein structures33. Furthermore, using a database of known active sites in proteins (http://www.ebi.ac.uk/thornton-srv/databases/CSA/)34, we have proposed a methodology to quantify promiscuity in a wide range of proteins35.
In an endeavor to establish the validity of the computational predictions made by CLASP, we have undertaken several in vitro initiatives using different enzymes. The results of these experiments have provided several insights regarding promiscuous functions in proteins. Foremost amongst them is corroboration of the intuitive notion that inhibition is inherently simpler to predict than true catalysis. For example, we detected the presence of the serine protease (SPASE) catalytic triad motif (Ser195, His57, Asp102) in alkaline phosphatases (AP) from various organisms based on spatial and electrostatic congruence, and validated this by inhibiting the native phosphatase activity with inhibitors (AEBSF/PMSF)32 that are known to act on many serine proteases by reacting with the nucleophilic serine36. However, true SPASE activity was limited to shrimp AP. Recently, the crown domain in the E. coli expressed rat intestinal AP protein was shown to be prone to protease cleavage, which the authors have ascribed to self-cleavage37. Another recent review summarizes the various computational approaches applied to the AP superfamily in order to gain insights into the promiscuous functions observed in proteins belonging to the superfamily15. The therapeutic potential of AP inhibitors has also seen increased interest from medicinal researchers38.
In a similar experiment, we detected a SPASE motif in a phosphoinositide-specific phospholipase C (PI-PLC) from Bacillus cereus using CLASP39. Once again, although we easily established the inhibition of the native activity of PI-PLC using serine protease inhibitors, we struggled to establish proteolysis based on known protease substrates. Fortuitously, we observed protease activity of PI-PLC on UVI31+, a protein under investigation in our group for different reasons40. We thus concluded that one should exert caution before ruling out protease activity in an enzyme, since theoretically proteases have a large number of possible substrates due to the possible variation in residues flanking the scissile bond, and the corresponding folds that harbor a recognition site for a particular protease39. Thus, it is possible that we have not found the ideal proteolytic substrate for APs32. We also tested the proteolytic functions, and their inhibition by protease inhibitors, of the non-toxic B. cereus phosphatidylcholine-specific phospholipase C (PC-PLC) and the closely related, highly toxic Clostridium perfringens α-toxin (CPA) (which possesses an additional C-terminal domain demonstrated to be responsible for its sphingomyelinase, hemolytic, and lethal activities41,42). CPA and PC-PLC activities on phospholipids were unaffected by the addition of serine protease inhibitors, in concurrence with the CLASP analysis, which fails to detect a SPASE scaffold in these proteins39. While CPA and PC-PLC did have a metallo-protease motif based on CLASP analysis, and both showed protease activity in vitro, the observed proteolytic activity can be attributed to metallo-protease contamination, which is difficult to remove in spite of the purification steps. Inhibition of CPA activity using a metallo-protease inhibitor was attempted, but no inhibition was observed. However, such a lack of inhibition by a single compound is not sufficient grounds to rule out the existence of a metallo-protease scaffold.
Based on predictions from CLASP, we also demonstrated the inhibition of the native phosphatase activity of a cold active alkaline phosphatase from Vibrio strain G15-21 AP (VAP)43 by a specific β-lactam compound (only imipenem, and not ertapenem, meropenem, ampicillin or penicillin G)44. CLASP analysis detected a spatial and electrostatic congruence of the active site of a Class B2 CphA metallo-β-lactamase (MBL) from Aeromonas hydrophila45 to the active site of VAP. Several β-lactam compounds failed to inhibit E. coli or shrimp AP, as expected from the lower congruence indicated by CLASP for these proteins as compared to VAP. While all APs contain three metal ion binding sites essential for catalysis43, MBLs have either one or two metal binding sites46. It would be interesting to imagine the existence of a protein (possibly evolved from VAP) that is an MBL and requires three metal binding sites.
Another desired aspect in the search for promiscuous motifs is the ability to search for partial scaffolds, as has been implemented in the DECAAF methodology47,48. The search for an elastase-like motif in a plant protein47 led us to the pathogenesis-related protein P14a49. Although the complete motif was missing - stated previously as, ‘While Ser195, His57, and Gly193 from the input motif have a highly matching scaffold in P14a, the spatial position of the elastase Asp102 is close to Asn35 and Ser39 in P14a when the proteins are superimposed based on the matching scaffolds48’ - the structural similarity of the P14a protein to a snake venom protein with a known elastase function50 strongly suggested the possibility of pre-existing elastase functionality, or at least a fair chance of endowing elastase activity through directed evolution techniques.
Another fascinating aspect of enzymes, although not strictly defined as promiscuity, is their ability to catalyze the reaction of a range of similar substrates of the same class51. We have hypothesized that duplicate residues, each of which results in slightly modified replicas of the active site scaffold, are responsible for the broad substrate specificity of proteins52,53.
It might appear that the presence of a motif like a SPASE catalytic triad in a protein structure is trivial, and one could expect any randomly chosen protein with a large number of residues to have such a structural motif. However, the absence of a spatially congruent SPASE catalytic triad in a reasonably large tyrosine phosphatase CD45 (PDBid: 1YGR, sequence length 610) highlights the fact that the SPASE motif is not present ubiquitously (Table 1). Even the presence of a spatially congruent motif, as in the human translation initiation factor (PDBid: 2E9H), does not imply potential congruence (Table 1).
Table 1. Non-triviality of the potential and spatial congruence of the active site residues in proteins from the serine protease motif.
The serine protease catalytic triad has been taken from a non-psychrophilic trypsin from a cold-adapted fish species (PDBid: 1A0J). The reasonably large tyrosine phosphatase CD45 (PDBid: 1YGR, sequence length 610) does not contain the spatially congruent catalytic triad. Although a motif spatially congruent to the catalytic triad is present in the human translation initiation factor (PDBid: 2E9H), it lacks electrostatic potential congruence. D = Pairwise distance in Å. PD = Pairwise potential difference. SLen = sequence length. APBS writes out the electrostatic potential in dimensionless units of kT/e, where k is Boltzmann’s constant, T is the temperature in K and e is the charge of an electron.
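To make the D and PD quantities in the Table 1 legend concrete, the following Python sketch compares a three-residue reference motif with candidate motifs using pairwise distances and pairwise potential differences. It is a minimal illustration only, not CLASP's actual scoring function: the coordinates, potentials, tolerance values and function names are all hypothetical placeholders and are not taken from 1A0J, 1YGR, 2E9H or any other structure.

```python
from itertools import combinations
from math import dist

# All numbers are made-up placeholders for illustration only; they are NOT
# taken from any PDB entry or APBS run.
REF_COORDS = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # one atom per residue, in Å
REF_POTENTIALS = [-100.0, 50.0, -200.0]                            # one value per residue, in kT/e

# Hypothetical tolerances (assumptions, not values from the paper).
D_TOL = 1.0     # maximum allowed deviation in any pairwise distance (Å)
PD_TOL = 100.0  # maximum allowed deviation in any pairwise potential difference (kT/e)


def pairwise_distances(coords):
    """D: distance between every pair of residues, in a fixed pair order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def pairwise_potential_differences(potentials):
    """PD: signed potential difference for every pair, in the same pair order."""
    return [a - b for a, b in combinations(potentials, 2)]


def max_deviation(reference, candidate):
    """Largest absolute element-wise difference between two feature lists."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def congruence(cand_coords, cand_potentials):
    """Return (spatially_congruent, potential_congruent) against the reference."""
    d_dev = max_deviation(pairwise_distances(REF_COORDS),
                          pairwise_distances(cand_coords))
    pd_dev = max_deviation(pairwise_potential_differences(REF_POTENTIALS),
                           pairwise_potential_differences(cand_potentials))
    return d_dev <= D_TOL, pd_dev <= PD_TOL


# Same geometry as the reference (only translated), similar potentials.
shifted = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0)]
print(congruence(shifted, [-90.0, 60.0, -210.0]))    # (True, True)

# Same geometry, but a very different electrostatic environment.
print(congruence(shifted, [100.0, 50.0, 200.0]))     # (True, False)

# Residues too far apart to match the reference geometry.
stretched = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 9.0, 0.0)]
print(congruence(stretched, [-90.0, 60.0, -210.0]))  # (False, True)
```

The second case mirrors the situation described for 2E9H, in which a spatial match exists without potential congruence. Pairwise features are used because they do not change when the structure is translated or rotated, so no superposition step is needed in this sketch.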
The biggest challenge in detecting promiscuous motifs is to be able to endow the function using rational steps54–56. However, the non-additive nature of active site residues makes this a non-trivial task even when a very close partial match exists57. For example, in a catalytic site consisting of n residues, the existence of a congruent n−1 motif does not imply that it is easy or even possible to add another residue in the structure and obtain the n residue motif. This complexity is best exemplified by the failure to induce β-lactamase activity in a penicillin-binding protein (PBP-5) from E. coli58,59 by generating the L153E mutant of this protein, as proposed by our previous analysis47 (unpublished results). Although many directed evolution experiments have tried to enhance deacylation in PBPs60,61, the catalytic step that β-lactamases use to hydrolyze β-lactams62, very few have been successful. Even the successful attempts have reported low gains in β-lactamase activity (110-fold in ref. 60 and 90-fold in ref. 61).
In spite of the inherent difficulty in rationally designing proteins, we believe that the fast maturing field of protein structure prediction might soon allow us to quickly iterate over in silico mutations63. A method like CLASP may be used to discriminate the predicted structures in order to select the mutations that achieve the desired congruence with a reference scaffold - setting up the flow to mimic the natural ‘evolutionary walk’ in vitro, and accelerate this ‘random walk’ into a ‘resolute sprint’.