Software Tool Article

YODEL: Peak calling software for HITS-CLIP data

[version 1; peer review: 2 approved with reservations]
PUBLISHED 18 Jul 2017

Abstract

YODEL is peak calling software for analyzing RNA sequencing data generated by High-Throughput Sequencing of RNA isolated by Crosslinking Immunoprecipitation (HITS-CLIP; also known as CLIP-SEQ), a method to identify RNA-protein interactions genome-wide. We designed YODEL to analyze HITS-CLIP experiments in which Argonaute proteins are immunoprecipitated, followed by sequencing of the associated RNA in order to identify bound microRNAs and their mRNA targets. The HITS-CLIP sequenced reads are mapped to the genome, and read peaks are then visualized where clustered sets of reads map to the same region. Several peak calling algorithms have been developed to define the boundaries of these peaks. In contrast to other peak callers for HITS-CLIP data, such as Piranha, YODEL does not map the starts of reads to fixed interval bins, but instead uses a heuristic approach to iteratively find the tallest point within a set of clustered reads and examine bases upstream and downstream of that point until a peak has been determined. This allows the peak boundary to be defined more precisely than coordinates that are multiples of the bin size. YODEL also generates per-sample peak counts, which allows downstream differential representation analysis to be started quickly. YODEL is available at https://github.com/LancePalmerStJude/YODEL/.

Keywords

HITS-CLIP, CLIP-SEQ, peak caller

Introduction

A peak caller that could accurately define a single peak amongst several samples was required to analyze High-Throughput Sequencing of RNA isolated by Crosslinking Immunoprecipitation (HITS-CLIP) (Darnell, 2010) data from fetal liver red blood cell precursors of miR-144/451-/- and wild-type mice (Paralkar et al., 2014; Paralkar VR, Palmer LE, Xu P, Lechauve C, Zhao G, Yao Y, Luan J, Wu G, Vourekas A, Mourelatos Z, Scheutz JD and Weiss MJ; unpublished study). Piranha (Uren et al., 2012) is software commonly used to identify peaks generated by HITS-CLIP. However, Piranha bins the starts of reads and does not fully define a peak. Consequently, a large bin size may result in multiple peaks being combined into one peak. We found that the identification and resolution of peaks using Piranha was highly dependent on the background threshold (-a) and binSize (-b) parameters, and it was unclear how these parameters should be set in order to obtain the most biologically relevant information. We also found that running Piranha on multiple samples separately results in peak boundaries that may differ considerably from sample to sample. Generating initial peak calls from a combined sample dataset creates a single standard set of peak boundaries for all samples, which simplifies downstream analysis. We therefore developed a peak calling algorithm, named YODEL, with the following properties: 1) it incorporates strand specificity (Piranha does this, but many other CHIP-SEQ peak callers do not); 2) it generates per-sample read counts for each peak; 3) its parameters have easily understandable implications when changed.

Methods

Input

The main input for the peak caller is a BED file generated by clusterBed from the BEDtools suite (Quinlan & Hall, 2010) with the -s option used (see Supplementary material: Input file formats). If multiple samples are to be analyzed simultaneously, the name field must contain the sample name or ID before the first colon, followed by the read ID or other descriptive text. In addition, a sample list must be designated (with the YODEL parameter -sampleList) (see Supplementary material: Input file formats). The sample list will identify which samples are to be included for peak calling. After peak calling, read counts for each peak in all samples will be calculated. If no sample list is provided, all reads will be treated as one sample. As an example of how to process HITS-CLIP FASTQ files to generate the input clustered BED file, see Supplementary material: Analysis of HITS-CLIP data from Chi et al., 2009.
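
The naming convention above can be illustrated with a short sketch. This is not YODEL's own code: the function names, the sample IDs (WT1, KO1), the "all" label and the example lines are hypothetical, and the column layout (BED6 plus a trailing cluster ID) is an assumption about the clustered BED file. It is written for Python 3.

```python
from collections import Counter


def sample_of(name_field):
    """Return the sample ID: the text before the first colon of the BED name."""
    return name_field.split(":", 1)[0]


def count_reads_per_sample(bed_lines, sample_list=None):
    """Count reads per sample; with no sample list, pool all reads as one sample."""
    counts = Counter()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip lines without a name column
        if sample_list is None:
            counts["all"] += 1  # "all" is an illustrative label, not a YODEL name
            continue
        sample = sample_of(fields[3])
        if sample in sample_list:
            counts[sample] += 1  # reads from unlisted samples are ignored
    return counts


# Hypothetical clustered BED lines (assumed layout: BED6 plus a cluster ID).
lines = [
    "chr1\t100\t130\tWT1:read7\t0\t+\t1",
    "chr1\t105\t135\tKO1:read3\t0\t+\t1",
    "chr1\t110\t140\tWT1:read9\t0\t+\t1",
]
print(count_reads_per_sample(lines, {"WT1", "KO1"}))
print(count_reads_per_sample(lines))
```

Splitting on the first colon only means that read IDs which themselves contain colons do not disturb the sample assignment.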

Implementation

YODEL was written in Python and tested with Python version 2.7. YODEL processes each read cluster as it is encountered within the input clustered BED file. For each cluster, the base coverage at each position is determined across all samples included in peak calling. Position-specific counts for each individual sample are calculated as well. Once the counts are tallied, the program iteratively identifies peaks until no additional peaks are found.

In each iteration, the program identifies the position with the highest read count. If the read count is less than the minimum peak height (mph), then no further peak calling is performed for the cluster. From the position with the highest read count, bases upstream are analyzed one at a time. The lowest read count (lowestPoint) up to the current base is tracked. With dipTolerance (dt) and peakDipBuffer (pdb) as input parameters, if at any position the count is 0 or the count >= (lowestPoint + peakDipBuffer) * dipTolerance, then the peak start has been determined and is recorded as the base position where the count was 0 or the base position of lowestPoint. This is repeated for bases downstream of the highest point to find the peak end.

The peak summit is defined as the median of all the positions with the highest count. Two sets of peak boundaries (25% and 50%) are defined as the positions where the coverage is 25% and 50% of the highest point, and the maximum peak heights per sample are determined. The numbers of reads per peak are calculated (for both the 25% and 50% boundaries) by determining the number of reads that overlap the peak by at least the input parameter binSize (bs). Typically, we have used the 25% peak boundary for downstream calculations. The peak counts are determined on a per-sample basis, and for all the samples used in peak calling combined.

Once a peak has been determined, the base coverage for positions within the full peak is set to 0 so that no further peaks are called overlapping it. The peak finding process (starting with finding the position with the highest read count) is repeated until no more peaks are identified. After peaks are identified by YODEL, several filtering steps can be applied to remove low quality peaks (see Supplementary material: Peak filtering, Supplementary Figure 2 to Supplementary Figure 6).
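
A minimal Python 3 sketch of one round of this procedure on a single coverage array is given below. It is an illustration of the description above, not YODEL's source code. All function names are hypothetical, and several details are assumptions: the lower median is used when an even number of positions share the highest count, the walk stops at the edge of the array if no boundary condition is met, and the 25% and 50% boundaries are found by walking outward from the summit while coverage stays at or above the given fraction of the peak height.

```python
from statistics import median_low


def find_summit(cov):
    """Return the (lower) median of all positions sharing the highest count."""
    height = max(cov)
    return median_low([i for i, c in enumerate(cov) if c == height])


def extend(cov, summit, step, dip_tolerance, peak_dip_buffer):
    """Walk from the summit (step=-1 upstream, +1 downstream) to a boundary."""
    lowest, lowest_pos = cov[summit], summit
    pos = summit + step
    while 0 <= pos < len(cov):
        count = cov[pos]
        if count == 0:
            return pos  # boundary is the zero-coverage base
        if count < lowest:
            lowest, lowest_pos = count, pos  # new lowest point so far
        elif count >= (lowest + peak_dip_buffer) * dip_tolerance:
            return lowest_pos  # coverage rose again: cut at the lowest point
        pos += step
    return pos - step  # assumption: stop at the edge of the cluster


def fraction_bounds(cov, summit, start, end, fraction):
    """Assumed rule: extend while coverage >= fraction * summit height."""
    threshold = cov[summit] * fraction
    left = right = summit
    while left - 1 >= start and cov[left - 1] >= threshold:
        left -= 1
    while right + 1 <= end and cov[right + 1] >= threshold:
        right += 1
    return left, right


def call_tallest_peak(cov, min_peak_height=5, dip_tolerance=1.5,
                      peak_dip_buffer=1):
    """Call the single tallest peak, or return None if it is below mph."""
    summit = find_summit(cov)
    if cov[summit] < min_peak_height:
        return None
    start = extend(cov, summit, -1, dip_tolerance, peak_dip_buffer)
    end = extend(cov, summit, +1, dip_tolerance, peak_dip_buffer)
    return {
        "summit": summit,
        "full": (start, end),
        "bounds25": fraction_bounds(cov, summit, start, end, 0.25),
        "bounds50": fraction_bounds(cov, summit, start, end, 0.50),
    }


coverage = [0, 2, 5, 9, 9, 4, 1, 1, 5, 5, 3, 0]  # hypothetical cluster
print(call_tallest_peak(coverage))
print(call_tallest_peak(coverage, dip_tolerance=2, peak_dip_buffer=2))
```

With the hypothetical coverage above, the more sensitive settings (dipTolerance=1.5, peakDipBuffer=1) end the peak at the dip before the second rise, whereas the less sensitive settings (dipTolerance=2, peakDipBuffer=2) carry the full peak through to the zero-coverage base. The sketch omits the per-sample counts and the repeated search after coverage is set to 0.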

Operation

YODEL has been tested on both Windows and Linux running Python 2.7 with standard libraries. Some of the tools (e.g. BEDtools) used to generate input files are Linux or OS X specific. There is no minimum memory requirement for YODEL, but the size of BED file that can be sorted with the Linux sort command may depend on system memory. See ‘Supplementary material: Analysis of HITS-CLIP data from Chi et al., 2009’ for instructions on how to preprocess data and run YODEL.

Use case

The output of YODEL is described in Table 1. Figure 1 shows a comparison of YODEL and Piranha output from HITS-CLIP analysis of wild-type and miR-144/451-/- fetal liver erythroblasts. Results from two different YODEL parameter settings are shown in blue in the lower half of the figure. Cab39 (Figure 1A) (Godlewski et al., 2010) and Ywhaz (Figure 1B) (Yu et al., 2010) are two known miR-451a target mRNAs. For Cab39, the largest peak contains a miR-451a seed match, and this peak is not present in the knockout sample. Because Piranha bins the starts of reads, the peak defined by Piranha may not actually include the seed match location, as is observed with the Cab39 seed match. The failure of a peak to cover a seed match may prevent a microRNA from being assigned to a peak and subsequently interfere with downstream analysis. Also, the peak calling of Piranha is greatly influenced by bin size. A bin size of 32 (the default parameter) is not able to resolve many individual peaks. In Ywhaz mRNA (Figure 1B), YODEL detects three peaks around the miR-451a seed match. One Piranha setting (a=0.98, b=16) identified the three individual peaks, but compromised detection of smaller peaks upstream of the predicted miR-451a binding site. Therefore, for our miR-144/451-/- data set, YODEL was superior to Piranha in defining HITS-CLIP peaks.
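
The effect of binning read starts can be shown with a toy example. The sketch below is a simplified model of start-binning in general, not Piranha's implementation, and the read coordinates, seed-match interval and function names are all hypothetical.

```python
def start_bin(read_start, bin_size):
    """Half-open bin [lo, hi) that contains a read start."""
    lo = (read_start // bin_size) * bin_size
    return lo, lo + bin_size


def contains(interval, site):
    """True if the site lies entirely within the interval."""
    return interval[0] <= site[0] and site[1] <= interval[1]


reads = [(100, 140), (104, 144), (108, 148)]  # hypothetical (start, end) reads
seed = (130, 137)  # hypothetical seed-match interval

# Intervals obtained when only the read starts are binned (bin size 32).
bins = {start_bin(start, 32) for start, _ in reads}

# Interval actually covered by the reads themselves.
covered_span = (min(s for s, _ in reads), max(e for _, e in reads))

print(bins)
print(any(contains(b, seed) for b in bins))
print(contains(covered_span, seed))
```

In this toy case all three read starts fall in a single bin whose end lies upstream of the seed match, although every read extends across it.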

Table 1. Descriptions of output files.

File | Description
prefix.bins.cov.full.bed | Contains start and end coordinates for the entire peak, with the thickStart and thickEnd fields containing the 25% start and end coordinates, respectively
prefix.bins.cov.bed | Contains the 25% peak coordinates in the start and end fields and the 50% coordinates in thickStart and thickEnd fields.
prefix.binCountsRC25.txt | Read counts per peak defined by 25% peak
prefix.binCountsRC50.txt | Read counts per peak defined by 50% peak
prefix.binCountsPH.txt | Read counts covering position of maximum peak height for each individual sample
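
As an illustration of how the two BED outputs encode nested boundaries, the following Python 3 sketch reads one line of a prefix.bins.cov.full.bed file. It assumes the standard BED column order (chrom, start, end, name, score, strand, thickStart, thickEnd). The function name and the example line are hypothetical and are not taken from YODEL output.

```python
def parse_full_peak(line):
    """Split one BED line into full-peak and 25% boundary coordinates."""
    f = line.rstrip("\n").split("\t")
    return {
        "chrom": f[0],
        "full": (int(f[1]), int(f[2])),    # start and end of the entire peak
        "strand": f[5],
        "peak25": (int(f[6]), int(f[7])),  # thickStart and thickEnd
    }


# Hypothetical line in the assumed standard BED column order.
example = "chr1\t1000\t1090\tpeak_1\t0\t+\t1020\t1065"
peak = parse_full_peak(example)
print(peak["full"], peak["peak25"])
```

The same function could be applied to prefix.bins.cov.bed, where the start and end fields would instead hold the 25% boundaries and the thick fields the 50% boundaries.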

Figure 1. YODEL and Piranha peak calling comparisons in the Cab39 and Ywhaz genes.

IGV browser (Thorvaldsdóttir et al., 2013) images showing YODEL output for HITS-CLIP data analyzing the 3’ untranslated regions of Cab39 (A) and Ywhaz (B) mRNAs in wild-type and miR-451a/miR-144-/- mouse fetal livers at embryonic day 14.5. The coverage tracks (WT in blue, KO in red, and combined coverage in magenta) show combined sequencing reads from three animals of each genotype mapped to the mouse mm10 genome. The seedMatches track shows microRNA seed matches for miR-451a, miR-144-3p and the three most abundant erythroid microRNAs besides miR-451a (miR-16-5p, miR-486a-5p and miR-122-5p) (Paralkar VR, Palmer LE, Xu P, Lechauve C, Zhao G, Yao Y, Luan J, Wu G, Vourekas A, Mourelatos Z, Scheutz JD and Weiss MJ; unpublished results). Average peak counts per sample are shown for wild-type and miR-451a/miR-144-/- erythroblasts using the more sensitive YODEL parameters (see below). The lower panels show YODEL (blue) and Piranha (green) peak boundaries with the indicated parameter settings. For YODEL, full boundaries are shown by thin lines and 25% boundaries by thick lines. YODEL less sensitive parameter settings: dipTolerance=2, peakDipBuffer=2; more sensitive settings: dipTolerance=1.5, peakDipBuffer=1; both parameter settings: binSize=16, minPeakHeight=5. Note that the ability of Piranha to resolve different peaks representing distinct microRNA binding sites is highly dependent on parameter settings. Piranha parameters: a=background threshold, b=binSize. The default for Piranha is a=0.99 and b=32.

We have also tested the YODEL software on a publicly available HITS-CLIP data set. HITS-CLIP reads from mouse neocortex Argonaute immunoprecipitations (Chi et al., 2009) were retrieved from http://ago.rockefeller.edu/rawdata.php. The reads were pre-processed and run through YODEL, as described in Supplementary material: Analysis of HITS-CLIP data from Chi et al., 2009. We examined the first four genes listed in Table 1 of Chi et al. as potential targets of microRNAs. Figure 2 shows a comparison of peak calling in the 3’ UTR of the Plod3 gene. Again, binning the starts of reads can cause potential microRNA seed matches to be missed. Figure 3 shows the 3’ UTR of the Cd164 gene. See Supplementary Figure 1 for browser images of Ctdsp1 and Itgb1.


Figure 2. YODEL and Piranha peak calling comparisons in the Plod3 gene.

IGV browser image of a portion of the Plod3 3’ UTR. Mouse neocortex HITS-CLIP read coverage is shown along with peak calls from YODEL (lower panel blue bars) and Piranha (lower panel green bars). Relevant microRNA seeds (for seeds mapped see Supplementary material: Seeds mapped for brain neocortex data) are shown in black. For YODEL, full boundaries are shown by thin lines and 25% boundaries by thick lines. YODEL parameters: dipTolerance=1.5, peakDipBuffer=1, binSize=16, minPeakHeight=5. Piranha parameters: a=background threshold, b=binSize. The default for Piranha is a=0.99 and b=32.


Figure 3. YODEL and Piranha peak calling comparisons in the Cd164 gene.

IGV browser image of a portion of the Cd164 3’ UTR. Mouse neocortex HITS-CLIP read coverage is shown along with peak calls from YODEL (lower panel blue bars) and Piranha (lower panel green bars). Relevant microRNA seeds (for seeds mapped see Supplementary material: Seeds mapped for brain neocortex data) are shown in black. For YODEL, full boundaries are shown by thin lines and 25% boundaries by thick lines. YODEL parameters: dipTolerance=1.5, peakDipBuffer=1, binSize=16, minPeakHeight=5. Piranha parameters: a=background threshold, b=binSize. The default for Piranha is a=0.99 and b=32.

Conclusions

We have designed a new peak-caller, termed YODEL, for analysis of RNA-seq data generated by HITS-CLIP-type experiments. Advantages of YODEL compared to Piranha, a program commonly used for the same purpose, include standardization of peak calls for comparative analysis of multiple samples, improved resolution of peak boundaries, and more consistent overlap between peak calls and microRNA seed matches.

Software and data availability

YODEL is a Python script and is available at: https://github.com/LancePalmerStJude/YODEL/

Archived source code as at time of publication: https://doi.org/10.5281/zenodo.820635 (Palmer, 2017).

License: GPLv3

Input BED files for peak calling with the miR-144/451 KO HITS-CLIP can be found in the Cab39_Ywhaz.allSamples.fullCollapsed.clusters.bed file within the archived source code listed above. This file contains the clustered BED file used for YODEL input (only around Cab39 and Ywhaz).

Sample HITS-CLIP data from the 130kD band from mouse neocortex samples was downloaded from http://ago.rockefeller.edu/rawdata.php (130kD Brain A-E samples).

The pre-processing BASH pipeline used to generate the clustered BED files, as well as some accessory scripts, can be found in the Ch2009 directory within the archived source code listed above.

how to cite this article
Palmer LE, Weiss MJ and Paralkar VR. YODEL: Peak calling software for HITS-CLIP data [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1138 (https://doi.org/10.12688/f1000research.11861.1)

Open Peer Review

Reviewer Report 16 Aug 2017
Neelanjan Mukherjee, Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA 
Approved with Reservations
The authors present an alternative peak caller for HITS-CLIP data. While the idea is interesting and the examples are compelling, there is not sufficient analysis presented to determine the utility of YODEL.

Mukherjee N. Reviewer Report For: YODEL: Peak calling software for HITS-CLIP data [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1138 (https://doi.org/10.5256/f1000research.12817.r24815)
Reviewer Report 25 Jul 2017
Michele Trabucchi, C3M (Mediterranean Centre for Molecular Medicine), INSERM (French National Institute of Health and Medical Research), University of Côte d'Azur, Nice, France 
Silvia Bottini, C3M (Mediterranean Centre for Molecular Medicine), INSERM (French National Institute of Health and Medical Research), University of Côte d'Azur, Nice, France
Approved with Reservations
The authors developed a new tool called YODEL to identify peaks from Ago2 HITS-CLIP data using a novel approach based on the identification of the peak summit of reads cluster and estimation of the size of the peaks based on ...
Trabucchi M and Bottini S. Reviewer Report For: YODEL: Peak calling software for HITS-CLIP data [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1138 (https://doi.org/10.5256/f1000research.12817.r24325)