Software Tool Article
Revised

Fragger: a protein fragment picker for structural queries

[version 2; peer review: 2 approved]
PUBLISHED 10 Apr 2018

This article is included in the Japan Institutional Gateway.

Abstract

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.

Keywords

protein fragments, protein design, fragments database, structural query, triangular inequality

Revised Amendments from Version 1

Thanks to the referees’ recommendations, we have improved our manuscript in this revised version. An extra paragraph was added in the introduction to describe the rationale underlying Fragger. Several citations to web servers and protein fragment databases were added to provide a broader overview of similar approaches. Some variables in the algorithm have been renamed to facilitate easy understanding. A new paragraph and one reference about choosing good reference fragments were added. Several changes were made to clarify the choice of reference fragments, and the impact of the effective cutoff values on search speed.

See the authors' detailed response to the review by Charlotte Deane and Saulo de Oliveira
See the authors' detailed response to the review by Pierre Tuffery

Introduction

Nowadays, a large number of protein structures are available (122,761 as of July 2017 at RCSB) and protein fragments are frequently used in structural bioinformatics. Protein structure prediction methods such as Rosetta1, QUARK2 and EdaFold3,4 use protein fragments as building blocks. Protein fragments are also used in crystallographic phasing5–7 and model rebuilding8. The quality of protein models can be improved by combining protein fragments with molecular dynamics9. Other applications include the curation of unresolved loops in crystal structures10,11, grafting of loop sequences on protein scaffolds and other protein design algorithms12,13.

When there are too many fragments to search from, an efficient strategy is necessary to reach sub-linear search times. This problem is well-known to the chemoinformatics community, which has developed several efficient strategies to screen large databases of small molecules. For example, geometric embedding and locality sensitive hashing14, kd-trees15, a tree data structure (called µ-tree) with a heuristic16, bounds of similarity scores for chemical fingerprints17 and a proximity filter based on the logical exclusive or operator18 have all been developed to this end.

Currently, several fragment pickers19–22 and protein fragment databases23–28 are available. Of particular interest is the Super method20 that uses the lower bound of RMSD29 to screen the whole fragment space. However, our research on protein design and refinement of protein decoys for crystallographic phasing required specific options and therefore a new fragment picker.

Methods

Implementation

Algorithm 1. Query with a fragment and an RMSD threshold. Comments are enclosed in braces.

Input: D: fragment set to query

Input: R: reference fragment set

Input: q: query fragment

Input: dq: RMSD threshold

Output: M: matching fragment set

   M ← D

   {fuzzy query: prune the fragment space}

   for rj in R do

      d ← distance(q, rj)

      dinf ← d − dq

      dsup ← d + dq

      {distance(fi, rj) comes from the database index}

      M ← {∀fi ∈ M | distance(fi, rj) ∈ [dinf, dsup]}

   end for

   {exact query: refine the result of pruning}

   M ← {∀fi ∈ M | distance(fi, q) ≤ dq}

   return M

Fragger exploits the triangular inequality of RMSD30 to prune the fragment space (Figure 1 and Algorithm 1). RMSDs are computed efficiently via the QCP method31. Fragger is written in OCaml32, except backbone RMSD computations which are performed with a new version of the C++ ranker tool from Durandal33. Computations are parallelized on multi-core computers via the Parmap library34.
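The pruning step of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Fragger's OCaml implementation: the function names are hypothetical, fragments are flat lists of coordinates, and `distance` is a superposition-free RMS deviation chosen only because it is a true metric and needs no external library. Fragger itself uses optimal-superposition backbone RMSD computed with QCP.

```python
import math
import random

def distance(a, b):
    """Stand-in metric (assumption): RMS deviation between two equal-length
    coordinate lists, without superposition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def build_index(fragments, references):
    """Precompute distance(f_i, r_j) for every fragment and every reference."""
    return [[distance(f, r) for f in fragments] for r in references]

def query(fragments, references, index, q, dq):
    """Return (distance, fragment id) pairs within dq of q, closest first."""
    candidates = set(range(len(fragments)))
    # Fuzzy query: by the triangle inequality, a fragment within dq of q
    # must lie within d - dq and d + dq of each reference fragment.
    for r, row in zip(references, index):
        d = distance(q, r)
        d_inf, d_sup = d - dq, d + dq
        candidates = {i for i in candidates if d_inf <= row[i] <= d_sup}
    # Exact query: compute the real distance only for the survivors.
    hits = [(distance(fragments[i], q), i) for i in candidates]
    return sorted(h for h in hits if h[0] <= dq)

# Toy data: 200 random "fragments" of 9 coordinates, 3 reference fragments.
rng = random.Random(0)
fragments = [[rng.uniform(0.0, 10.0) for _ in range(9)] for _ in range(200)]
references = fragments[:3]
index = build_index(fragments, references)
matches = query(fragments, references, index, fragments[5], dq=2.0)
```

Because the stand-in distance obeys the triangle inequality, pruning does not discard true matches; it only reduces how many exact distances have to be computed.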


Figure 1. Left: pruning the fragment space for query distance dq and query fragment q.

q is at distance d1 (resp. d2) from reference fragment r1 (resp. r2). Only fragments which are both within d1 ± dq of r1 and d2 ± dq of r2 will undergo an RMSD calculation. Middle: 13-residue loops that can connect residue ALA 98 to GLY 110 in chain A of PDB 1MEL. The query loop is shown in red. Only its first and last three residues were used to rank the retrieved fragments. Right: Backbone of PDB 1BKR covered with ten-residue fragments from non-homologous proteins retrieved with Fragger.

Fragger allows a database to be queried with a fragment and an RMSD threshold. Matching fragments are ranked by RMSD to the query. Fragger’s ranker tool computes the backbone RMSD of a single fragment versus many. Fragger can deal with residue gaps or a selection of residues from the query, create a fragment database from a set of Protein Data Bank (PDB) files, work with all fragment lengths and extract specific or randomly-chosen fragments from a database.
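As a rough illustration of a gapped query, the sketch below compares two 13-residue fragments using only the first and last three residue positions, as in the middle panel of Figure 1. The assumptions are loud ones: one 3D point per residue, no superposition before measuring the deviation, and hypothetical function names; none of this is taken from Fragger's source code.

```python
import math

def select(fragment, positions):
    """Keep only the residues at the given positions."""
    return [fragment[p] for p in positions]

def rms_deviation(a, b):
    """RMS deviation over paired 3D points, without superposition
    (a simplification; Fragger superposes backbones before measuring)."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(total / len(a))

def gapped_distance(query, candidate, positions):
    """Compare two fragments on the selected residue positions only."""
    return rms_deviation(select(query, positions), select(candidate, positions))

# First and last three residues of a 13-residue loop; positions 3..9 are the gap.
positions = [0, 1, 2, 10, 11, 12]
query = [(float(i), 0.0, 0.0) for i in range(13)]
candidate = list(query)
candidate[6] = (6.0, 5.0, 5.0)  # differs from the query only inside the gap

anchored = gapped_distance(query, candidate, positions)
full = rms_deviation(query, candidate)
```

Here the candidate differs from the query only inside the gap, so the anchored comparison ignores the difference while the full comparison does not.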

Compared to existing fragment pickers, some of the specific functionalities required by users include:

  • Outputting only the N best or N first found fragments matching a query (this can make a query terminate faster)

  • Constraining the amino acid sequences allowed to match a query (for loop grafting; such filtering is applied after RMSD pruning of the fragment space)

  • Reading and writing PDB fragments from/to a binary format (faster than reading/writing regular PDB files)

  • Preventing a list of PDB codes from matching a query

  • Automatically varying the RMSD threshold to the query until a given number of fragments is reached.
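Some of the options listed above can be sketched as a post-processing step on the fragments that survive RMSD pruning. The sketch assumes hits are `(rmsd, pdb_code, sequence)` tuples and uses Python's `re` syntax for the sequence constraint; the function name, the tuple layout and the PDB codes are hypothetical, and the regular expression dialect Fragger accepts may differ.

```python
import re

def filter_hits(hits, pattern, n_best=None, excluded_pdb=()):
    """Filter hits that already passed the RMSD threshold.

    hits: list of (rmsd, pdb_code, sequence) tuples (assumed layout).
    pattern: regular expression over one-letter amino acid codes.
    n_best: keep only the n_best closest hits, or all of them if None.
    excluded_pdb: PDB codes that must not match the query.
    """
    regex = re.compile(pattern)
    kept = [h for h in sorted(hits)  # sorted by RMSD first
            if h[1] not in excluded_pdb and regex.fullmatch(h[2])]
    return kept if n_best is None else kept[:n_best]

# Made-up hits for a five-residue query.
hits = [
    (0.8, "1abc", "AGSTG"),
    (0.3, "2xyz", "ALKPG"),
    (0.5, "3def", "GLKPA"),
    (0.4, "4ghi", "AVVVG"),
]
# Loop must start with alanine and end with glycine; skip entry 4ghi.
best = filter_hits(hits, r"A[A-Z]{3}G", n_best=2, excluded_pdb={"4ghi"})
```

Applying the sequence filter after pruning, as the text describes, keeps the regular expression from being evaluated on fragments that are already too far from the query.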

Operation

Users need to install OPAM and the pdbset command from CCP4 in order to use Fragger.

Details on how to install Fragger and usage examples are provided in the README file of the released software.

Results and discussion

Tests were performed on one core of a 2.4 GHz Intel Xeon workstation with 12 GB of RAM running Ubuntu Linux 12.04. The PDB dataset is composed of all proteins determined by X-ray, without highly similar sequences (30% sequence identity cutoff) in order to create a challenging set of fragments to benchmark a protein design algorithm. It contains 13,554 PDB entries. Entries were extracted from the Protein Data Bank website using the advanced search tab and ticking the "Retrieve only representatives at 30% sequence identity" box. Querying with a three-residue (resp. nine-residue) fragment takes at least 6.75 s (resp. 5.2 s).

Query times vary with the query fragment, reference fragments, indexed proteins and RMSD tolerance to the query. In general, the longer the required fragment length and the smaller the RMSD tolerance, the faster the query.

Reference fragments can be chosen randomly. Pruning of the search space is better if there are at least three reference fragments, far from each other. Once an RMSD index has been computed for a randomly chosen fragment (fi), taking the furthest fragment from it (fj) and the median fragment (fk) would give three acceptable reference fragments. For interested contributors, some good heuristics can be found in the literature but were not implemented in Fragger, like Brin’s greedy algorithm35.
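The selection of three reference fragments described above can be sketched as follows. Two assumptions are made here: "median fragment" is read as the fragment at the median distance from fi, and `distance` is a superposition-free RMS deviation standing in for backbone RMSD. The function names are hypothetical.

```python
import math
import random

def distance(a, b):
    """Stand-in metric (assumption): RMS deviation without superposition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def pick_references(fragments, rng):
    """Return indices (i, j, k): a random fragment f_i, the fragment
    furthest from it (f_j) and the fragment at median distance (f_k)."""
    i = rng.randrange(len(fragments))
    # Index of f_i: its distance to every fragment in the set.
    dists = [distance(fragments[i], f) for f in fragments]
    order = sorted(range(len(fragments)), key=dists.__getitem__)
    j = order[-1]               # furthest from f_i
    k = order[len(order) // 2]  # median distance from f_i
    return i, j, k

# Toy data: 50 random "fragments" of 9 coordinates each.
rng = random.Random(1)
fragments = [[rng.uniform(0.0, 10.0) for _ in range(9)] for _ in range(50)]
i, j, k = pick_references(fragments, rng)
```

Only one pass over the fragment set is needed, since the distances to fi double as the first reference index.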

For one-time tasks, it is not necessary to create RMSD indices and actually query a database, as fragment extraction and RMSD computations are fast enough. For example, it takes only 15s to generate all (41,200) fragments of 13 residues starting with alanine and ending with glycine (middle of Figure 1). Ranking them against the query takes 1.5s. When working on PDB files, the ranker tool included with Fragger can compute 66,580 (resp. 23,784) RMSD/s on the backbone of three (resp. nine) residue fragments. These numbers become 304,149 (resp. 138,744) RMSD/s when working on Fragger’s binary-encoded PDBs. In the future, it might be possible to improve the performance of Fragger by incorporating a faster score than RMSD, such as BCscore36.

Fragger can be useful for protein design, loop grafting and retrieval of candidates to rebuild low-confidence regions of protein models6.

Data availability

All data underlying the results are available as part of the article and no additional source data are required.

Software availability

Fragger can be downloaded from: https://github.com/UnixJunkie/fragger

Archived source code at the time of publication: https://zenodo.org/record/877320

Software license: LGPL.

How to cite this article
Berenger F, Simoncini D, Voet A et al. Fragger: a protein fragment picker for structural queries [version 2; peer review: 2 approved]. F1000Research 2018, 6:1722 (https://doi.org/10.12688/f1000research.12486.2)

Open Peer Review

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1, published 22 Sep 2017
Reviewer Report 18 Jan 2018
Charlotte Deane, Department of Statistics, University of Oxford, Oxford, UK 
Saulo de Oliveira, Department of Statistics, Oxford University, Oxford, UK 
Approved
In this paper, the authors describe Fragger, a web server for retrieving fragments from a database of structures based on a query fragment. Their program allows for customisation in terms of number of fragments output and in terms of sequence ...
Deane C and de Oliveira S. Reviewer Report For: Fragger: a protein fragment picker for structural queries [version 2; peer review: 2 approved]. F1000Research 2018, 6:1722 (https://doi.org/10.5256/f1000research.13520.r28334)
  • Author Response 10 Apr 2018
    Kam Zhang, Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, Yokohama, Kanagawa, Japan
    > The Super method (mentioned in the introduction of the manuscript) seems
    > to fulfil the same purpose as Fragger. No comparison is provided between
    > the two methods in ...
Reviewer Report 11 Oct 2017
Pierre Tuffery, Molécules Thérapeutiques In Silico (MTi) (UMR-S 973), French National Institute of Health and Medical Research (INSERM), Sorbonne Paris Cité, Paris Diderot University, Paris, France 
Approved
This paper describes an approach to quickly scan large collection of proteins to identify fragments similar to a request. Not considering indels, this approach is, as stated by the authors, in the context of fragment grafting, loop modeling, protein design ...
Tuffery P. Reviewer Report For: Fragger: a protein fragment picker for structural queries [version 2; peer review: 2 approved]. F1000Research 2018, 6:1722 (https://doi.org/10.5256/f1000research.13520.r26264)
  • Author Response 10 Apr 2018
    Kam Zhang, Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, Yokohama, Kanagawa, Japan
    > - The introduction could benefit from a better description of the
    > rationale underlying Fragger, including its use in different contexts. For
    > instance, such a strategy has also ...