Keywords
protein fragments, protein design, fragments database, structural query, triangular inequality
This article is included in the Japan Institutional Gateway gateway.
Nowadays, a large number of protein structures are available (122,761 as of July 2017 at RCSB) and protein fragments are frequently used in structural bioinformatics. Protein structure prediction methods such as Rosetta1, QUARK2 and EdaFold3,4 use protein fragments as building blocks. Protein fragments are also used in crystallographic phasing5–7 and model rebuilding8. The quality of protein models can be improved by combining protein fragments with molecular dynamics9. Other applications include the curation of unresolved loops in crystal structures10,11, grafting of loop sequences on protein scaffolds and other protein design algorithms12,13.
Some fragment pickers14,15 and protein fragment databases16,17 are currently available. Of particular interest is the Super method15 that uses the lower bound of RMSD18 to screen the whole fragment space. However, our research on protein design and refinement of protein decoys for crystallographic phasing required specific options and therefore a new fragment picker.
Algorithm 1
Input: db: fragment set to query
Input: rf: reference fragment set
Input: q: query fragment
Input: dq: RMSD threshold
Output: mf: matching fragment set
mf ← db
{fuzzy query: prune the fragment space}
for rj in rf do
    d ← distance(q, rj)
    dinf ← d − dq
    dsup ← d + dq
    mf ← {fi ∈ mf | distance(fi, rj) ∈ [dinf, dsup]}
    {distance(fi, rj) comes from the database index}
end for
{exact query: refine the result of pruning}
mf ← {fi ∈ mf | distance(fi, q) ≤ dq}
return mf
Fragger exploits the triangular inequality of RMSD19 to prune the fragment space (Figure 1 and Algorithm 1). RMSDs are computed efficiently via the QCP method20. Fragger is written in OCaml21, except for backbone RMSD computations, which are performed with a new version of the C++ ranker tool from Durandal22. Computations are parallelized on multi-core computers via the Parmap library23.
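The two stages of Algorithm 1 can be illustrated with a minimal Python sketch. It is not Fragger's OCaml implementation: the names `build_index` and `query` are hypothetical, fragments are stood in for by plain coordinate tuples, and `distance` can be any function that satisfies the triangular inequality (the Euclidean `math.dist` is assumed for toy data in place of RMSD).

```python
import math


def build_index(db, refs, distance):
    """Precompute distance(f, r) for every database fragment f and reference r.

    db and refs map identifiers to fragments (a hypothetical layout).
    """
    return {(fid, rid): distance(f, r)
            for fid, f in db.items()
            for rid, r in refs.items()}


def query(db, refs, index, q, dq, distance):
    """Return identifiers of fragments within dq of q, nearest first."""
    candidates = set(db)
    # Fuzzy query: prune using only precomputed distances from the index.
    for rid, r in refs.items():
        d = distance(q, r)
        d_inf, d_sup = d - dq, d + dq
        candidates = {fid for fid in candidates
                      if d_inf <= index[(fid, rid)] <= d_sup}
    # Exact query: compute the true distance only for the survivors.
    hits = sorted((distance(db[fid], q), fid) for fid in candidates)
    return [fid for d, fid in hits if d <= dq]
```

With three well-separated references, most of the toy database is discarded by index lookups alone, and the exact distance to the query is evaluated only for the remaining candidates. A production version would likely add a small tolerance to the interval bounds to guard against floating-point rounding.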
Fragger allows a database to be queried with a fragment and an RMSD threshold. Matching fragments are ranked by RMSD to the query. Fragger’s ranker tool computes the backbone RMSD of a single fragment against many others. Fragger can deal with residue gaps or a selection of residues from the query, create a fragment database from a set of Protein Data Bank (PDB) files, work with all fragment lengths and extract specific or randomly chosen fragments from a database.
Figure 1. Left: the query q is at distance d1 (resp. d2) from reference fragment r1 (resp. r2). Only fragments which are both within d1 ± dq of r1 and d2 ± dq of r2 will undergo an RMSD calculation. Middle: 13-residue loops that can connect residue ALA 98 to GLY 110 in chain A of PDB 1MEL. The query loop is shown in red. Only its first and last three residues were used to rank the retrieved fragments. Right: backbone of PDB 1BKR covered with ten-residue fragments from non-homologous proteins retrieved with Fragger.
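The interval test in the left panel follows directly from the triangular inequality. The short derivation below is a sketch in our own notation, where d denotes the RMSD distance, f a database fragment, r a reference fragment and d_q the RMSD threshold.

```latex
% Triangular inequality applied twice to the triple (q, f, r):
\begin{align}
d(q, r) &\le d(q, f) + d(f, r), \\
d(f, r) &\le d(f, q) + d(q, r).
\end{align}
% Combining both bounds, using the symmetry d(q, f) = d(f, q):
\begin{equation}
\lvert d(f, r) - d(q, r) \rvert \le d(f, q).
\end{equation}
% Hence any match, i.e. any f with d(f, q) \le d_q, satisfies
\begin{equation}
d(q, r) - d_q \;\le\; d(f, r) \;\le\; d(q, r) + d_q .
\end{equation}
```

By contraposition, a fragment whose indexed distance to any reference falls outside this interval cannot be within d_q of the query, so it can be discarded without computing its RMSD to q.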
Compared to existing fragment pickers, Fragger provides some specific functionalities required by users, including:
Outputting only the N best or N first found fragments matching a query (this can make a query terminate faster)
Constraining the amino acid sequences allowed to match a query (for loop grafting)
Reading and writing PDB fragments from/to a binary format (faster than reading/writing regular PDB files)
Preventing a list of PDB codes from matching a query
Automatically varying the RMSD threshold to the query until a given number of fragments is reached
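Three of the options listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Fragger's command-line interface or code: the function names, the `(pdb_code, fragment_index)` key layout, and the default threshold, step and upper bound are all hypothetical, and `distance` stands in for the backbone RMSD.

```python
import heapq
from itertools import islice


def n_best(db, q, n, distance, excluded_pdbs=frozenset()):
    """Return the n (distance, key) pairs closest to q.

    db maps (pdb_code, fragment_index) keys to fragments; fragments whose
    PDB code is in excluded_pdbs are never allowed to match.
    """
    scored = ((distance(f, q), key) for key, f in db.items()
              if key[0] not in excluded_pdbs)
    return heapq.nsmallest(n, scored)


def first_n(db, q, n, dq, distance):
    """Return the first n keys found within dq of q, then stop scanning."""
    matches = (key for key, f in db.items() if distance(f, q) <= dq)
    return list(islice(matches, n))


def widen_until(db, q, wanted, distance, dq=0.5, step=0.25, dq_max=5.0):
    """Raise the threshold dq until at least `wanted` fragments match."""
    dists = sorted((distance(f, q), key) for key, f in db.items())
    while True:
        hits = [(d, key) for d, key in dists if d <= dq]
        if len(hits) >= wanted or dq >= dq_max:
            return dq, hits
        dq += step
```

`first_n` shows why returning the first N matches can terminate early: the generator is consumed lazily, so scanning stops as soon as N fragments have been found, whereas `n_best` has to score every candidate.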
Tests were performed on one core of a 2.4GHz Intel Xeon workstation with 12GB of RAM running Ubuntu Linux 12.04. The PDB dataset is composed of all proteins determined by X-ray crystallography, without highly similar sequences (30% sequence identity cutoff), in order to create a challenging set of fragments to benchmark a protein design algorithm. It contains 13,554 PDB files, which were retrieved from the Protein Data Bank website using the advanced search tab and ticking the "Retrieve only representatives at 30% sequence identity" box.
Querying with a three-residue (resp. nine-residue) fragment takes at least 6.75s (resp. 5.2s). Query time varies with the query fragment, the reference fragments and the RMSD threshold to the query. Reference fragments can be chosen randomly. Pruning of the search space is better if there are at least three reference fragments and they are far from each other.
For one-time tasks, it is not necessary to create RMSD indices and actually query a database, as fragment extraction and RMSD computations are fast enough. For example, it takes only 15s to generate all (41,200) fragments of 13 residues starting with alanine and ending with glycine (middle of Figure 1). Ranking them to the query takes 1.5s.
When working on PDB files, the ranker tool included with Fragger can compute 66,580 (resp. 23,784) RMSD/s on the backbone of three-residue (resp. nine-residue) fragments. These numbers become 304,149 (resp. 138,744) RMSD/s when working on Fragger’s binary-encoded PDBs. In the future, it might be possible to improve the performance of Fragger by incorporating a faster score than RMSD, such as BCscore24.
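The one-time extraction task described above (all 13-residue fragments starting with alanine and ending with glycine) amounts to a sliding window with a constraint on the terminal residues. The sketch below illustrates the idea on a one-letter amino acid sequence only. The function name `windows` is hypothetical, and the sketch ignores coordinates and chain breaks, which a real fragment extractor has to handle.

```python
def windows(sequence, length, first=None, last=None):
    """Yield (start, fragment) for each window of `length` residues.

    first and last are optional one-letter residue codes that the first
    and last residue of the window must match.
    """
    for start in range(len(sequence) - length + 1):
        frag = sequence[start:start + length]
        if first is not None and frag[0] != first:
            continue
        if last is not None and frag[-1] != last:
            continue
        yield start, frag
```

For instance, `list(windows(seq, 13, first="A", last="G"))` would enumerate the candidate loop positions in a chain sequence `seq` before any RMSD is computed.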
Fragger can be useful for protein design, loop grafting and retrieval of candidates to rebuild low-confidence regions of protein models.
Fragger can be downloaded from: https://github.com/UnixJunkie/fragger or http://www.riken.jp/zhangiru/software/fragger.tgz.
Archived source code at the time of publication: https://zenodo.org/record/877320 (ref. 25)
Software license: LGPL.
This work was supported by the “Initiative Research Unit” program from RIKEN, Japan, the Japan Society for the Promotion of Science (JSPS) and computing resources on the RIKEN Integrated Cluster of Clusters (RICC). FB is a JSPS international fellow.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Is the rationale for developing the new software tool clearly explained?
Partly
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: No competing interests were disclosed.
Is the rationale for developing the new software tool clearly explained?
Partly
Is the description of the software tool technically sound?
Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: No competing interests were disclosed.
Article versions:
Version 2 (revision): 10 Apr 18
Version 1: 22 Sep 17