Software Tool Article

shiny-pred: a server for the prediction of protein disordered regions

[version 1; peer review: 1 approved with reservations, 1 not approved]
PUBLISHED 28 Feb 2019

This article is included in the RPackage gateway.

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are proteins, or segments within a protein chain, that lack a stable three-dimensional structure under normal physiological conditions.
Accurate prediction of IDRs is challenging because of their genome-wide occurrence and the low proportion of disordered residues, which make them a difficult target for traditional classification techniques. Most existing computational methods rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive.
shiny-pred is an ab initio, sequence-only disorder predictor implemented in R using the Shiny framework. It makes predictions with convolutional neural network models trained on sequence data from the PDB. It can be installed and run locally on any operating system that supports R. A public version of the web application is available at https://gmu-binf.shinyapps.io/shiny-pred.

Keywords

Disordered proteins, machine learning, convolutional neural networks, R, Shiny

Introduction

Experimental structure determination of intrinsically disordered proteins and regions (IDPs/IDRs) is complex, lengthy and expensive, which has led to the development of a variety of computational approaches (He et al., 2009). More than 60 computational protein disorder prediction servers are currently available, although not all of them publicly. Methods can be classified into one of the following categories (Atkins et al., 2015): (i) ab initio or sequence-based, (ii) clustering, (iii) template-based, and (iv) meta or consensus.

shiny-pred is an ab initio predictor, meaning that it relies exclusively on amino acid sequence information to predict disorder. It uses prediction models based on convolutional neural networks and reduced protein alphabets. Three models are currently available; all were built using the same training data from the PDB (Berman et al., 2000) but differ in their convolutional neural network architecture. Because it does not rely on sequence profiles, it is fast enough for proteome-wide disorder prediction. It performs on par with or better than other state-of-the-art sequence-only methods, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available CASP10 dataset (Monastyrskyy et al., 2014), while running faster.
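The article does not specify which reduced alphabet shiny-pred uses. As a minimal sketch of the general idea, the Python snippet below maps the 20 standard amino acids onto a hypothetical six-letter alphabet and one-hot encodes the result, producing the kind of per-residue matrix a convolutional network can consume. The grouping, the symbols and the names `REDUCED_GROUPS`, `reduce_sequence` and `one_hot` are illustrative assumptions, not part of shiny-pred.

```python
from typing import Dict, List

# Hypothetical six-letter reduced alphabet (illustrative grouping only; the
# alphabet actually used by shiny-pred is not specified in this article).
REDUCED_GROUPS: Dict[str, str] = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # conformationally special
}
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}
GROUP_ORDER = sorted(REDUCED_GROUPS)  # fixed column order for the encoding


def reduce_sequence(seq: str) -> str:
    """Map each residue to its reduced symbol; non-standard residues become 'x'."""
    return "".join(AA_TO_GROUP.get(aa, "x") for aa in seq.upper())


def one_hot(reduced: str) -> List[List[int]]:
    """One row per residue, one column per group; 'x' yields an all-zero row."""
    return [[1 if sym == g else 0 for g in GROUP_ORDER] for sym in reduced]


print(reduce_sequence("MKDEW"))  # h+--a
```

Collapsing 20 residue types into a handful of physicochemical groups shrinks the input dimension, which is one common motivation for reduced alphabets in sequence-only predictors.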

Methods

Implementation

shiny-pred is written in the R programming language (R Core Team, 2017), and its web interface is built with the Shiny web application framework, using the Shiny R package v1.1.0 (Chang, 2018).

Currently, our application provides three convolutional neural network models:

  • (i) cnn-64-ker-local: one convolutional layer (step size 1, window size 32) with 64 kernels and local max pooling;

  • (ii) cnn-128-ker-local: one convolutional layer (step size 1, window size 32) with 128 kernels and local max pooling;

  • (iii) cnn-2-conv-local: two convolutional layers (64 and 32 kernels) with local max pooling.
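To make the layer descriptions concrete, the following pure-Python sketch implements one convolutional layer with step size 1 and window size 32 over an encoded sequence, followed by local max pooling. It is not the shiny-pred code (which uses the keras R package); the ReLU activation, the pool size of 2 and the function names `conv1d` and `local_max_pool` are assumptions, since the article does not state them.

```python
import random
from typing import List

Matrix = List[List[float]]  # rows = sequence positions, columns = channels/kernels


def conv1d(x: Matrix, kernels: List[Matrix], window: int = 32, step: int = 1) -> Matrix:
    """'Valid' 1D convolution of x (length x channels) with each kernel
    (window x channels); a ReLU is applied to every output (assumed activation)."""
    n_channels = len(x[0]) if x else 0
    n_pos = (len(x) - window) // step + 1
    out = []
    for start in range(0, max(n_pos, 0) * step, step):
        row = []
        for k in kernels:
            s = sum(k[i][c] * x[start + i][c]
                    for i in range(window) for c in range(n_channels))
            row.append(max(0.0, s))
        out.append(row)
    return out


def local_max_pool(fmap: Matrix, pool: int = 2) -> Matrix:
    """Non-overlapping max pooling along the sequence axis, separately per kernel."""
    return [[max(fmap[i + j][k] for j in range(pool)) for k in range(len(fmap[0]))]
            for i in range(0, len(fmap) - pool + 1, pool)]


# Toy example: a random binary input of 40 residues x 6 channels and
# 64 random kernels (mimicking the kernel count of cnn-64-ker-local).
random.seed(0)
x = [[float(random.random() < 1 / 6) for _ in range(6)] for _ in range(40)]
kernels = [[[random.uniform(-0.1, 0.1) for _ in range(6)] for _ in range(32)]
           for _ in range(64)]
features = local_max_pool(conv1d(x, kernels))
print(len(features), len(features[0]))  # prints: 4 64
```

With a window of 32 and step size 1, a 40-residue input yields 9 convolution positions per kernel; pooling pairs of positions keeps the strongest response in each local neighbourhood.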

The models were created, trained and accessed using the keras R package v2.1.6 (Allaire & Chollet, 2018).

Operation

Our tool has two operation modes: predicting disordered residues in protein sequences (prediction mode) and benchmarking the predictor's performance against sequences with known disorder annotations (benchmark mode). The mode is selected automatically based on the format of the input sequences. Users can upload a sequence file, type or paste sequences into the text area, or select pre-loaded examples from a list.

In prediction mode, amino acid sequences are expected in FASTA format (Figure 1). In benchmark mode, each FASTA record is expected to have an additional line containing the per-residue disorder annotation (D = disordered, O = ordered). Multiple sequences can be submitted at once, and examples of both types of submission (prediction and benchmark modes) are provided. In both modes, the application shows a results panel in which the per-residue probability of disorder is plotted for each input sequence (Figure 2).
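The automatic mode selection can be sketched as follows, assuming each record consists of a FASTA header, the sequence, and (in benchmark mode only) a final line made up solely of D and O characters with the same length as the sequence. The parsing rules, the fallback to prediction mode for mixed inputs, and the names `parse_input` and `DISORDER_LINE` are hypothetical; the actual parser in the shiny-pred repository may differ.

```python
import re
from typing import List, Optional, Tuple

# (header, sequence, disorder annotation or None)
Record = Tuple[str, str, Optional[str]]

# Assumed format of the extra benchmark line: only D (disordered) / O (ordered).
DISORDER_LINE = re.compile(r"^[DO]+$")


def parse_input(text: str) -> Tuple[str, List[Record]]:
    """Parse FASTA-like input and pick the mode from its format."""
    records: List[Record] = []
    for block in text.strip().split(">")[1:]:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        header, body = lines[0], lines[1:]
        annotation = None
        # Treat the last line as annotation only if it is D/O-only and
        # exactly as long as the sequence lines that precede it.
        if len(body) >= 2 and DISORDER_LINE.match(body[-1]):
            if len("".join(body[:-1])) == len(body[-1]):
                annotation, body = body[-1], body[:-1]
        records.append((header, "".join(body).upper(), annotation))
    # Benchmark mode only when every record carries an annotation (assumption).
    annotated = bool(records) and all(r[2] is not None for r in records)
    return ("benchmark" if annotated else "prediction"), records


mode, recs = parse_input(">t1\nMKVLA\nDDOOO\n>t2\nPEP\nOOD\n")
print(mode, recs[0])  # benchmark ('t1', 'MKVLA', 'DDOOO')
```

The length check guards against mistaking a short sequence line that happens to contain only D (aspartate) and O (pyrrolysine) for an annotation.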


Figure 1. Input sequence format (prediction mode).


Figure 2. Prediction results.

(1) Prediction mode

The workflow for protein disorder prediction is:

  • (i) Input the target sequences (in FASTA format) in the text area;

  • (ii) Select the model to use for the prediction (default is cnn-128-ker-local) and submit the sequence for prediction;

  • (iii) Visualize and download results.

(2) Benchmark mode

In benchmark mode, input sequences are expected to have an extra line with the actual disorder annotation, which is used as the benchmark. The result tables gain two extra columns: the actual class, taken from the disorder annotation, and a match column indicating whether the prediction for that residue is correct. An additional panel (Benchmark) shows the ROC curve along with other common binary classification metrics (sensitivity, specificity, balanced accuracy and Matthews correlation coefficient).
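A sketch of how a per-residue result table with the extra benchmark columns might be assembled. The 0.5 probability threshold for calling a residue disordered, the column names and the function `result_table` are assumptions for illustration; the article only states that the table gains actual class and match columns in benchmark mode.

```python
from typing import Dict, List, Optional, Sequence, Union

Row = Dict[str, Union[int, str, float, bool]]


def result_table(seq: str, probs: Sequence[float], actual: Optional[str] = None,
                 threshold: float = 0.5) -> List[Row]:
    """Build one row per residue; benchmark mode adds 'actual' and 'match'."""
    if len(probs) != len(seq) or (actual is not None and len(actual) != len(seq)):
        raise ValueError("sequence, probabilities and annotation lengths differ")
    rows: List[Row] = []
    for i, (aa, p) in enumerate(zip(seq, probs), start=1):
        predicted = "D" if p >= threshold else "O"  # assumed 0.5 cut-off
        row: Row = {"position": i, "residue": aa, "probability": p,
                    "predicted": predicted}
        if actual is not None:
            row["actual"] = actual[i - 1]
            row["match"] = predicted == actual[i - 1]
        rows.append(row)
    return rows


for r in result_table("MKV", [0.91, 0.22, 0.64], actual="DDO"):
    print(r)
```

In prediction mode the `actual` argument is simply omitted, so the same function produces the shorter table.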

Use cases

We used shiny-pred to predict disordered regions in the publicly available CASP10 benchmark dataset. The dataset contains 94 target sequences, each annotated with order/disorder information at the residue level. The annotated dataset is provided as an example (‘CASP_all’) and can be selected from the example list on the ‘Sequence Input’ tab. Figure 3 shows the input panel after the dataset has been selected and loaded. Per-sequence predictions can be viewed and downloaded from the ‘Results’ tab, while the ‘Benchmark’ tab summarizes performance using binary and statistical metrics. Figure 4 shows the server's performance on this dataset: an AUC of 0.85 and a balanced accuracy of 0.75.
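The benchmark metrics mentioned above can be computed from residue-level labels as in the sketch below, which treats D (disordered) as the positive class. Pooling all residues of all sequences before computing the metrics, and computing the AUC through the rank-sum (Mann-Whitney) formulation rather than by tracing the ROC curve, are assumptions; the function names are hypothetical and the server may aggregate differently.

```python
import math
from typing import Dict, Sequence


def binary_metrics(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Residue-level metrics with 'D' (disordered) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "D" and p == "D" for a, p in pairs)
    tn = sum(a == "O" and p == "O" for a, p in pairs)
    fp = sum(a == "O" and p == "D" for a, p in pairs)
    fn = sum(a == "D" and p == "O" for a, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2, "mcc": mcc}


def roc_auc(actual: Sequence[str], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity; tied scores get average ranks."""
    ranked = sorted(zip(scores, actual), key=lambda t: t[0])
    n = len(ranked)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and ranked[j + 1][0] == ranked[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j + 1) if ranked[k][1] == "D")
        i = j + 1
    n_pos = sum(1 for a in actual if a == "D")
    n_neg = sum(1 for a in actual if a == "O")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both disordered and ordered residues")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(binary_metrics("DDOO", "DODO"))
print(roc_auc("DDOO", [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Under the pooling assumption, a dataset-level summary is obtained by concatenating the actual labels, predicted labels and probabilities of all sequences before calling these functions. Balanced accuracy and MCC are preferred over plain accuracy here because disordered residues are a small minority.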


Figure 3. Input sequence format (benchmark mode).


Figure 4. Predictor benchmarking.

Summary

This article presents shiny-pred, a sequence-only ab initio web application for predicting protein disorder. Because it is based on reduced amino acid alphabets and convolutional neural networks and does not require sequence profiles, it is fast and accurate, making it suitable for large proteome-wide experiments.

Software availability

Software available from: https://gmu-binf.shinyapps.io/shiny-pred

Source code available from: https://github.com/mauricioob/shiny-pred

Archived source code as at time of publication: https://doi.org/10.5281/zenodo.2567259 (Mauricio, 2019).

License: GNU General Public License v3.0 (GPL-3)

How to cite this article
Oberti M and Vaisman I. shiny-pred: a server for the prediction of protein disordered regions [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:230 (https://doi.org/10.12688/f1000research.17669.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1
Reviewer Report 08 Jul 2019
Appadurai Rajeswari, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 
Anand Srivastava, Molecular Biophysics Unit, Biological Sciences Division, Indian Institute of Science, Bangalore, India 
Approved with Reservations
The authors presented yet another neural network-based disorder prediction tool written in R, trained on PDB data and benchmarked on CASP10 dataset and they claim that the tool outperforms other existing tools in terms of both calculation speed and performance.
...
How to cite this report:
Rajeswari A and Srivastava A. Reviewer Report For: shiny-pred: a server for the prediction of protein disordered regions [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:230 (https://doi.org/10.5256/f1000research.19321.r50204)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 11 Apr 2019
Jinbo Xu, Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA 
Not Approved
This manuscript describes a new protein disorder prediction web server that makes use of (shallow) convolutional neural networks.
There are already many disorder predictors, some of which are based upon deep convolutional neural network and can do prediction directly on ...
How to cite this report:
Xu J. Reviewer Report For: shiny-pred: a server for the prediction of protein disordered regions [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:230 (https://doi.org/10.5256/f1000research.19321.r46176)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
