ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Software Tool Article

RBP-Tar – a searchable database for experimental RBP binding sites

[version 1; peer review: 2 approved with reservations]
PUBLISHED 27 Jun 2023
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

This article is included in the Bioinformatics gateway.

Abstract

Background: RNA-binding proteins (RBPs) play a critical role in regulating gene expression by binding to specific sites on RNA molecules. Identifying these binding sites is crucial for understanding the many functions of RBPs in cellular function, development and disease. Current experimental methods for identifying RBP binding sites, such as ultra-violet (UV) crosslinking and immunoprecipitation (CLIP), and especially the enhanced CLIP (eCLIP) protocol, were developed to identify authentic RBP binding sites experimentally.
Methods: To make this data more accessible to the scientific community, we have developed RBP-Tar (https://ncbr.muni.cz/RBP-Tar), a web server and database that utilises eCLIP data for 167 RBPs mapped on the human genome. The web server allows researchers to easily search and retrieve binding site information by genomic location and RBP name.
Use case: Researchers can produce lists of all known RBP binding sites on a gene of interest, or produce lists of binding sites for one RBP on different genomic loci.
Conclusions: Our future goal is to continue to populate the web server with additional experimental datasets from CLIP experiments as they become available and processed, making it an increasingly valuable resource for the scientific community.

Keywords

RNA Binding Proteins, CLIP, RBP, Web-server

Introduction

RNA binding proteins (RBPs) are key players in a broad spectrum of RNA regulation, including all stages of the RNA lifecycle (Gerstberger et al. 2014; Gebauer et al. 2021). Eukaryotic genomes typically encode hundreds of RBPs. For example, over 1500 human RBPs involved in the maturation, transport, stability and translation of coding and non-coding RNA were recently characterised and manually curated (Gerstberger et al. 2014). Each RBP can typically target hundreds of RNAs in a complex coordinated fashion (Hogan et al. 2008). The general transcriptomic locations of thousands of RNA binding sites corresponding to hundreds of RBPs have been identified using a family of experimental techniques based on RBP CrossLinking, ImmunoPrecipitation and Sequencing (CLIP-Seq) (Chi et al. 2009; Licatalosi et al. 2008; Moore et al. 2014; Hafner et al. 2010). A thorough exploration of tens of RBPs binding characteristics in vitro has shown that RBPs can differentiate their binding sites with context preferences beyond narrowly defined binding sequence motif and secondary structure often involving complex binding configurations (Dominguez et al. 2018). Among the experimental techniques available to date, the enhanced CLIP (eCLIP) protocol (Van Nostrand et al. 2016) is particularly important, as it significantly reduces required amplification and increases specificity in identifying authentic binding sites. This improves the efficiency and accuracy of RBP binding site identification and allows for a deeper understanding of the role of RBPs in gene regulation.

The availability of a large amount of data points produced from the same experimental technique can be very beneficial for applications such as machine learning, as it allows researchers to train and test models with more confidence. Here we present RBP-Tar (Tomáš Raček 2023), a centralised and searchable database of experimentally identified RBP binding sites that can significantly facilitate the study of the RBP mediated gene regulation. Using RBP-Tar, researchers can quickly and cleanly retrieve RBP binding sites constrained by both genomic location and associated RBP for hundreds of RBPs.

Methods

Implementation

Pipeline for reproducible data download and annotation

We have developed a reproducible and easy-to-use pipeline for downloading and annotating RBP eCLIP data from the ENCODE (Luo et al. 2020) database.

Metadata for eCLIP experiments, as well as additional files containing genomic coordinates are downloaded from the ENCODE database with the following parameters:

“status= released&

internal_tags=ENCORE&

assay_title=eCLIP&

biosample_ontology.term_name=K562&

biosample_ontology.term_name=HepG2&

files.file_type=bed+narrowPeak&

type=Experiment”.

Following the download, information about the chromosome, start position, end position and strand of reads are extracted for each RBP binding site. Binding sites are filtered by length, excluding ones shorter than 20 and longer than 100 nucleotides. As a last step, genomic sequences of binding sites are retrieved.

The described pipeline is implemented as a set of python scripts and is freely available at GitHub.

Using this pipeline, data for 167 RBPs on two cell lines (K562, HepG2) were downloaded. In total, 1.8 Gb of data, representing more than 25 million binding sites, were thus processed.

Web server

Here we present RBP-Tar, a web server that can access the above curated dataset of RBP binding sites (https://ncbr.muni.cz/RBP-Tar) and was built with Python (RRID:SCR_008394), the web development framework Flask, and a simple SQLite database (RRID:SCR_017672). The application’s source codes can be found on the project’s GitHub page, along with the requirements and instructions for the deployment if a user wants to run the application locally.

The web user interface allows searching and filtering based on the start and end position of the binding site, strand, chromosome, and protein name. It offers the download of the filtered data based on the search done by the user. Due to the size of the dataset, the view is limited to 10 000 results. However, the whole dataset can be conveniently downloaded as a gzipped CSV (579 MB) (here).

Operation

The RBP-Tar web server can take as input any of the following user-provided parameters: [Start min, Start max] denote the limits of the low genomic coordinate of the locus of interest. Similarly, [End min, End max] denote the limits of the high coordinate. [Strand] and [Chromosome] can be used to narrow down the search to only one strand and a specific chromosome. Using these combinations of parameters, a user can easily search for binding sites on their favourite gene, exon, or even a whole chromosome. The last parameter is [Protein name], which brings out a drop-down menu of all the RBPs in the database. If this parameter is not set, all RBPs are queried.

Results are shown as a table with the [Chromosome, Start, End, Strand] genomic location of the binding site, followed by the [Protein name] of the associated RBP and the [Sequence] of the binding site contains the genomic sequence of the binding site. Results can be seen online in a table format or downloaded as a CSV file with one button click. We expect most users to download the results and use them for further downstream analyses.

Use cases

Use case 1: all known RBP binding sites on the gene of interest

The first and potentially most common use case would be the query of all known RBP binding sites on a gene of interest. For example, we can query our web server with the coordinates of the Fused in Sarcoma (FUS) gene (chr16: 31180139-31191605, +) and leave the protein field empty. After this search, all 5 045 known RBP binding sites on this gene are returned and can be easily downloaded in a CSV file for further analysis (Figure 1).

5bb83aab-c8b5-4de9-b285-d18b4e3c29d3_figure1.gif

Figure 1. Extracting binding sites of all RNA-binding proteins on one locus (use case 1).

In fact, the gene we used encodes the RBP FUS, which plays important roles, among others, in neurodegeneration and cancer progression. We can use the filter [Protein Name] to identify the potential self-targeting of FUS on itself. Indeed, we can thus identify 72 potential FUS self-targeting binding sites identified via eCLIP (Grešová & Raček 2023) (Figure 2). Of course, the biological relevance of this type of finding is left to the users, as is the further validation of high-throughput derived RBP binding sites.

5bb83aab-c8b5-4de9-b285-d18b4e3c29d3_figure2.gif

Figure 2. Fused in Sarcoma (FUS) self-targeting identified by filtering by gene location and protein name.

Use case 2: training/testing dataset for one RBP

A potential user may want to develop an RBP binding site machine learning tool that would be able to predict binding sites based on a sequence. It is important to make sure that training and testing sets for their machine learning method are not overlapping. Using our web server, they can download all binding sites for a specific RBP, for example, on chromosome 1, and use them as a training set (Figure 3). Then do the same for chromosome 2 and use it as an independent testing set, thus ensuring that training and testing set do not overlap.

5bb83aab-c8b5-4de9-b285-d18b4e3c29d3_figure3.gif

Figure 3. Extracting all binding sites of a single protein on a single chromosome (use case 2).

Conclusion

Recent advancements in experimental techniques, such as eCLIP, have greatly expanded our understanding of RBP binding preferences and their role in gene regulation. The development of centralised and searchable databases of experimentally identified RBP binding sites allows researchers to access and analyse the binding preferences of RBPs easily. This information can be used to identify known binding sites on genes of interest and aid in training machine-learning models for RBP binding site prediction. This paper presents RBP-Tar, a centralised web server that can retrieve RBP Target sites with location and RBP constraints. RBP-Tar has been designed to be easily accessible by non-experts. It is still confined to a single source of data, which is helpful for avoiding experimental design effects, but makes its scope limited. We plan to expand the web server with other sources of data, as well as ways for the user to be able to take into account provenance and experimental variation.

Comments on this article Comments (0)

Version 3
VERSION 3 PUBLISHED 27 Jun 2023
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Gresova K, Racek T, Martinek V et al. RBP-Tar – a searchable database for experimental RBP binding sites [version 1; peer review: 2 approved with reservations]. F1000Research 2023, 12:755 (https://doi.org/10.12688/f1000research.131014.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 1
VERSION 1
PUBLISHED 27 Jun 2023
Views
13
Cite
Reviewer Report 18 Dec 2023
Clifford A Meyer, Harvard T.H. Chan School of Public Health, Dana-Farber Cancer Institute, Boston, USA 
Approved with Reservations
VIEWS 13
This paper describes a database of RNA binding protein binding sites based on 2 ENCODE cell lines. The website allows uses to search for data by querying a gene or genomic interval.

The scope of the database ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Meyer CA. Reviewer Report For: RBP-Tar – a searchable database for experimental RBP binding sites [version 1; peer review: 2 approved with reservations]. F1000Research 2023, 12:755 (https://doi.org/10.5256/f1000research.143817.r224195)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Jul 2024
    Panagiotis Alexiou, Department of Applied Biomedical science, University of Malta, Msida, Malta
    12 Jul 2024
    Author Response
    This paper describes a database of RNA binding protein binding sites based on 2 ENCODE cell lines. The website allows uses to search for data by querying a gene or ... Continue reading
COMMENTS ON THIS REPORT
  • Author Response 12 Jul 2024
    Panagiotis Alexiou, Department of Applied Biomedical science, University of Malta, Msida, Malta
    12 Jul 2024
    Author Response
    This paper describes a database of RNA binding protein binding sites based on 2 ENCODE cell lines. The website allows uses to search for data by querying a gene or ... Continue reading
Views
21
Cite
Reviewer Report 18 Dec 2023
Xiaoyong Pan, Shanghai Jiao Tong University, Shanghai, Shanghai, China 
Approved with Reservations
VIEWS 21
  1. ​​​​​In the paper, the author claim they download narrowPeak metadata from https://www.encodeproject.org/metadata/?status=released&internal_tags=ENCORE&assay_title=eCLIP&biosample_ontology.term_name=K562&biosample_ontology.term_name=HepG2&files.file_type=bed+narrowPeak&type=Experiment&files.analyses.status=released&files.preferred_default=true,and the link summary is provided in the file  https://github.com/ML-Bioinfo-CEITEC/rbp_encode_eclip/blob/main/csv/coord_links.txt. In the ENCODE dataset, the majority of RNA-binding proteins (RBPs) have biological replicate(s) or technical replicate(s),
... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Pan X. Reviewer Report For: RBP-Tar – a searchable database for experimental RBP binding sites [version 1; peer review: 2 approved with reservations]. F1000Research 2023, 12:755 (https://doi.org/10.5256/f1000research.143817.r224199)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Aug 2024
    Panagiotis Alexiou, Department of Applied Biomedical science, University of Malta, Msida, Malta
    12 Aug 2024
    Author Response
    Reviewer Comment:
    ​​​​​In the paper, the author claim they download narrowPeak metadata from https://www.encodeproject.org/metadata/?status=released&internal_tags=ENCORE&assay_title=eCLIP&biosample_ontology.term_name=K562&biosample_ontology.term_name=HepG2&files.file_type=bed+narrowPeak&type=Experiment&files.analyses.status=released&files.preferred_default=true,and the link summary is provided in the file  https://github.com/ML-Bioinfo-CEITEC/rbp_encode_eclip/blob/main/csv/coord_links.txt. In the ENCODE dataset, the majority of ... Continue reading
COMMENTS ON THIS REPORT
  • Author Response 12 Aug 2024
    Panagiotis Alexiou, Department of Applied Biomedical science, University of Malta, Msida, Malta
    12 Aug 2024
    Author Response
    Reviewer Comment:
    ​​​​​In the paper, the author claim they download narrowPeak metadata from https://www.encodeproject.org/metadata/?status=released&internal_tags=ENCORE&assay_title=eCLIP&biosample_ontology.term_name=K562&biosample_ontology.term_name=HepG2&files.file_type=bed+narrowPeak&type=Experiment&files.analyses.status=released&files.preferred_default=true,and the link summary is provided in the file  https://github.com/ML-Bioinfo-CEITEC/rbp_encode_eclip/blob/main/csv/coord_links.txt. In the ENCODE dataset, the majority of ... Continue reading

Comments on this article Comments (0)

Version 3
VERSION 3 PUBLISHED 27 Jun 2023
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.