miND (miRNA NGS Discovery pipeline): a small RNA-seq analysis pipeline and report generator for microRNA biomarker discovery studies

Andreas Diendorfer; Kseniya Khamina; Marianne Pultar; Matthias Hackl

doi:10.12688/f1000research.94159.2

Software Tool Article

Revised

[version 2; peer review: 3 approved]

Andreas Diendorfer¹, Kseniya Khamina¹, Marianne Pultar¹, Matthias Hackl¹

PUBLISHED 09 May 2026

¹ TAmiRNA GmbH, Vienna, Austria

Andreas Diendorfer
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Software, Validation, Visualization, Writing – Original Draft Preparation

Kseniya Khamina
Roles: Conceptualization, Investigation, Writing – Review & Editing

Marianne Pultar
Roles: Methodology, Software, Visualization, Writing – Review & Editing

Matthias Hackl
Roles: Conceptualization, Data Curation, Funding Acquisition, Methodology, Project Administration, Resources, Supervision, Writing – Review & Editing

This article is included in the Bioinformatics gateway.

This article is included in the Cell & Molecular Biology gateway.

Abstract

In contrast to traditional methods like real-time polymerase chain reaction, next-generation sequencing (NGS), and especially small RNA-seq, enables the untargeted investigation of the whole small RNAome, including microRNAs (miRNAs) but also a multitude of other RNA species. With the promising application of small RNAs as biofluid-based biomarkers, small RNA-seq is the method of choice for an initial discovery study. However, the presentation of specific quality aspects of small RNA-seq data varies significantly between laboratories and is lacking a common (minimal) standard.

The miRNA NGS Discovery pipeline (miND) aims to bridge the gap between wet-lab scientists and bioinformaticians with an easy-to-set-up configuration sheet and an automatically generated, comprehensive report that contains all essential qualitative and quantitative results that should be reported. Besides standard steps such as preprocessing, mapping, visualization, and quantification of reads, the pipeline also performs differential expression analysis when given the appropriate information about sample groups.

Although miND focuses on miRNAs, other RNA species such as tRNAs, piRNAs, snRNAs, and snoRNAs are included, and mapping statistics are available for further analysis. miND has been developed and tested on a multitude of data sets with various RNA sources (tissue, plasma, extracellular vesicles, urine, etc.) and different species.

miND is a Snakemake-based pipeline and thus benefits from the advantages of a flexible workflow management system. Reference databases are downloaded, prepared, and built with an included (but separate) workflow, so they can easily be updated to the most recent version and also archived for reproducibility.

In conclusion, the miND pipeline aims to streamline the bioinformatics processing of small RNA-seq data by standardizing the processing from raw data to a final, comprehensive and reproducible report.

Keywords

microRNA, Next-Generation Sequencing, differential expression, smallRNA sequencing, biomarkers, spike-in, discovery study

Corresponding author: Matthias Hackl

Competing interests: No competing interests were disclosed.

Grant information: The TransBioLine project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 821283. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. This communication reflects the authors’ view and neither IMI nor the European Union or EFPIA are responsible for any use that may be made of the information contained therein.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2026 Diendorfer A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Diendorfer A, Khamina K, Pultar M and Hackl M. miND (miRNA NGS Discovery pipeline): a small RNA-seq analysis pipeline and report generator for microRNA biomarker discovery studies [version 2; peer review: 3 approved]. F1000Research 2026, 11:233 (https://doi.org/10.12688/f1000research.94159.2) First published: 24 Feb 2022, 11:233 (https://doi.org/10.12688/f1000research.94159.1) Latest published: 09 May 2026, 11:233 (https://doi.org/10.12688/f1000research.94159.2)

Revised Amendments from Version 1

In this revised version, we address the reviewers' requests for a systematic comparison with existing tools and a clearer articulation of miND's strengths and limitations. A feature comparison table has been added, covering miND, miRDeep2, miRge 3.0, Oasis 2.0, sRNAbench, Prost!, sRNAPipe, and the UEA sRNA Workbench across dimensions such as runtime environment, supported input formats, miRNA and other ncRNA quantification, differential expression support, and report generation. The Introduction now provides a broader overview of the available tool landscape for small RNA sequencing analysis. New paragraphs in the Discussion section position miND in the context of these tools and discuss its limitations, including the requirement for bioinformatics expertise, the miRNA-centric scope of the differential expression analysis, the dependency on miRBase, and hardware requirements. The text has been clarified to better describe how miND quantifies multiple small RNA species through RNAcentral mapping while focusing its differential expression analysis on miRNAs by design. The README on GitHub has been rewritten with installation instructions and usage guidance. Several sentences throughout the manuscript have been shortened for improved readability. New references have been added for the tools included in the comparison.

See the authors' detailed response to the review by Francisco J. Enguita
See the authors' detailed response to the review by Kristian Almstrup, Nina Mørup and Ailsa Maria Main

Introduction

Small RNA-seq is a well-established tool for the quantification of short RNA molecules like microRNAs (miRNAs) in various biofluids (Murillo et al., 2019). These short RNA molecules (17 to 25 nt) play an important role in the cellular regulation of gene expression by interacting with specific complementary sites in targeted messenger RNAs (mRNAs). mRNAs that contain these target sites are then either down- or (rarely) up-regulated, resulting in a regulatory effect on the downstream translation of the mRNA (O’Brien et al., 2018). In this context, miRNAs are part of a complex regulatory network in which they not only affect mRNAs, but their own expression is also tightly controlled (Lee & Ambros, 2001). Thus, the levels of miRNAs can be indicators of a cell’s regulatory state and correlate with an organism’s health status. For example, the liver-specific miR-122-5p was shown to be a suitable marker for liver injury when measured in serum or plasma (Llewellyn et al., 2021) and, as part of a miRNA expression signature, can even be used to predict recovery after liver resection (Starlinger et al., 2019).

This makes them interesting targets as biomarkers in liquid biopsy (Larrea et al., 2016). The search for miRNAs or miRNA signatures suitable as biomarkers requires a specialized computational approach, and next-generation sequencing (NGS) is frequently used in the discovery phase of such studies (de Ronde et al., 2018). A number of tools for small RNA sequencing analysis are available, ranging from command-line tools such as miRDeep2 (Friedländer et al., 2012) and miRge 3.0 (Patil & Halushka, 2021), to web-based platforms like Oasis 2.0 (Rahman et al., 2018) and sRNAbench (Aparicio-Puerta et al., 2022), and desktop applications such as the UEA sRNA Workbench (Stocks et al., 2018). These tools address different aspects of small RNA analysis, including read mapping, miRNA quantification, novel miRNA prediction, and in some cases differential expression. However, many focus on individual analysis steps and leave the integration of results and their presentation in an accessible format to the user.

To address the need for a standardized and integrated analysis workflow, we developed miND, a small RNA-seq processing pipeline that combines all steps from raw data to differential expression in a single reproducible workflow. The pipeline produces a comprehensive interactive HTML report designed to support interpretation by both bioinformaticians and biologists. Experimental metadata, including sample grouping and statistical contrasts, is provided through an Excel-based contrast sheet that serves as a structured interface between wet-lab scientists and bioinformaticians.

We developed a robust and portable analysis pipeline for small RNA NGS data with a focus on biomarker discovery, targeting three goals: (1) standardized data inputs, (2) reproducible analysis, and (3) accessible results for both bioinformaticians and study statisticians, including publication-ready figures and an intuitive representation of results.

The miND pipeline can be used on many operating systems and in various setups; the only requirement is the ability to run Snakemake workflows (Köster & Rahmann, 2012). Wrapper scripts for starting the pipeline on Linux-based systems are provided and can be adapted for use on other platforms.

Methods

Implementation

The pipeline is based on Snakemake (Köster & Rahmann, 2012), a scalable bioinformatics workflow engine that incorporates many features needed for reproducible computational analysis (Mölder et al., 2021). These include the installation and provisioning of software tools via conda (“Anaconda Software Distribution,” 2020) and bioconda (Grüning et al., 2018), as well as the orchestration of the individual pipeline steps to optimize the use of limited resources such as central processing unit (CPU) cores and memory. Configuration files in YAML (.yml) format contain settings, including multithreading options, to adapt the pipeline to various computing platforms (Diendorfer et al., 2022).

Use case

An example protocol demonstrating the analysis of a public data set is available at protocols.io under the name miND pipeline AWS EC2 installation and setup V.2. It can serve not only as a guide for subsequent data analysis but also as instructions to set up the pipeline and data repository. The protocol describes the setup on an Amazon Web Services EC2 (Amazon Web Services, Inc, 2015) instance, but the pipeline has also been developed and tested on other platforms and systems. Only operating-system-specific parts have to be adapted (e.g., tools like git or wget would be installed via apt on Debian-based Linux distributions). Scientists interested in running the miND pipeline themselves are strongly encouraged to follow the provided protocol with the example data before analyzing their own data sets.

The generated miND report for this example data set is available on GitHub.

Operation

The miND pipeline was developed and tested on Debian Linux (v11.2) running Snakemake (v6.0.5) and conda (v4.10.3). The hardware requirements depend on the size of the analyzed datasets, but in general at least 4 CPU cores and 8 GB of memory are recommended. The pipeline will scale according to the available resources.

Data repository

The pipeline requires data from three reference data sets: (1) host genomes from ENSEMBL (Zerbino et al., 2018), (2) RNA sequences from RNAcentral (Sweeney et al., 2019), and (3) miRNA mature and precursor sequences from miRBase (Griffiths-Jones, 2004).

In order to download and prepare these datasets in the formats and structures required, miND provides separate workflows to build the data repository. These workflows can be executed with a shell script that will read configurations for each data source and then download, format and build the reference databases based on Snakemake workflows.

The data repository only has to be built once and will then provide the data needed for all future miND analysis runs. In case of updates of reference data sets, the repository can be rebuilt or extended by adding sources to the configuration files and running the build script again.

NGS raw data and metadata file

The miND pipeline requires two types of data for each experiment: raw NGS data and a metadata file with additional sample information. Raw data can be supplied as FASTQ (fastq), gzip-compressed FASTQ (fastq.gz), or BAM (without alignments) files. The format is detected from the file extension.
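As an illustration of extension-based format detection, the following is a minimal sketch in Python. It is not taken from the miND source code; the function name `detect_input_format` and the returned labels are hypothetical, and only the three formats named above are assumed to be supported.

```python
from pathlib import Path

# Hypothetical lookup table; the extensions follow the formats named in the text.
SUPPORTED_SUFFIXES = {
    ".fastq.gz": "fastq.gz",
    ".fastq": "fastq",
    ".bam": "bam",
}


def detect_input_format(filename):
    """Return the raw-data format implied by the file extension."""
    name = Path(filename).name.lower()
    # Test longer suffixes first so that compound extensions are matched as a whole.
    for suffix in sorted(SUPPORTED_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return SUPPORTED_SUFFIXES[suffix]
    raise ValueError(f"Unsupported raw data file: {filename}")
```

Raising an error for unknown extensions, rather than guessing, keeps a mislabeled input file from silently entering the analysis.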

Experimental metadata and details about the samples are provided in an XLS file containing three sheets: (1) The project details sheet holds general information about the project. This includes the project title and comments but also settings relevant for data processing, such as the sample species, adapter sequences, and cutoff levels for significance and quality filtering. (2) The sample group matrix sheet lists all samples that are part of the experiment and links them to additional group information. Up to five grouping variables can be set, each with an unlimited number of levels. (3) The contrast selection sheet allows the selection of groups and group combinations based on the data provided in the sample group matrix sheet. The contrasts selected here are used for the differential expression analysis.
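The three-sheet structure can be pictured as a small data model. The sketch below is purely illustrative: the class name `ContrastSheet`, its field names, and the validation rules are assumptions made for this example and do not reflect the actual layout of the miND XLS file. Only the limit of five grouping variables is taken from the text.

```python
from dataclasses import dataclass, field

MAX_GROUPING_VARIABLES = 5  # limit stated in the text


@dataclass
class ContrastSheet:
    # Sheet 1 (hypothetical keys): title, species, adapter sequence, cutoffs
    project: dict
    # Sheet 2: sample name -> {grouping variable: level}
    sample_groups: dict
    # Sheet 3: list of (grouping variable, level A, level B) tuples
    contrasts: list = field(default_factory=list)

    def validate(self):
        """Check group limits and that every contrast refers to known levels."""
        for sample, groups in self.sample_groups.items():
            if len(groups) > MAX_GROUPING_VARIABLES:
                raise ValueError(f"{sample}: more than five grouping variables")
        for variable, level_a, level_b in self.contrasts:
            known = {g.get(variable) for g in self.sample_groups.values()}
            for level in (level_a, level_b):
                if level not in known:
                    raise ValueError(f"Contrast level '{level}' not found in '{variable}'")
```

Checking contrasts against the sample group matrix before the run starts would catch typos in group names early, which is one practical benefit of a structured metadata interface.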

Pipeline analysis steps

The overall flow of data through the pipeline is shown in Figure 1. This flow diagram outlines the most important steps of data processing in the miND pipeline, especially the quality control steps with FastQC (Andrews, 2010) and multiQC (Ewels et al., 2016), followed by hierarchical mapping using bowtie1 (Langmead et al., 2009) and miRDeep2 (Friedländer et al., 2012), where either the mapped or the unmapped reads are passed on to the next step. The final “R scripts processing” step comprises multiple scripts that preprocess and analyze the data (including mapping statistics, unsupervised analysis methods, and differential expression analysis) and then generate an interactive HTML report based on R markdown.

Figure 1. Flowchart representing the high-level steps of data processing through the pipeline.

Reference data is downloaded and processed by the repository build process (yellow area; top right) and is then available to the miND pipeline in the repository/ subfolder. Raw next-generation sequencing (NGS) data (blue area) is first adapter- and quality-trimmed, then handled by quality control (QC) tools and processed through hierarchical mapping steps (green area). These steps produce a set of mapping files that are ingested and analyzed by R scripts, which finally produce the miND report.

In a first step, the hierarchical mapping uses genome datasets from the prepared data repository (generated once before the initial run as described in the “Data repository” subsection) to filter out reads that do not map to the host organism’s genome (bowtie1, allowing for two mismatches). The genome-mapped reads are further processed by miRDeep2 to accurately quantify miRNAs. To identify the remaining (genome-mapping but non-miRNA) reads, bowtie1 is used to map first against the RNAcentral database and then against complementary DNA sequences (to assign mRNA reads), with both steps allowing for one mismatch. Reads that remain unassigned after these hierarchical mapping steps are classified as either “unknown genomic” (if they mapped against the host genome) or “unmapped” (if they did not map against the host genome and were thus filtered out in the first mapping step). The generated mapping files are processed by R scripts to prepare mapping statistics for the different RNA species in each sample.

The mapping process focuses on miRNAs and prioritizes them by using the specialized mapping tool miRDeep2 directly after the initial genome mapping step. miRDeep2 uses bowtie1 to map the reads but performs a more sophisticated assignment of miRNA IDs to the reads. This includes detailed information on isomiRs (mature miRNAs with highly similar sequences), which is prepared for further analysis steps.

RNAcentral is used for the identification of other RNA species. This comprehensive database contains non-coding RNA (ncRNA) sequences from a broad range of species. This step focuses on the classification of reads and uses bowtie1 (allowing for one mismatch), reporting only the first (best) hit. Because reads may map to multiple references and these additional hits are not reported (mainly for performance reasons), the resulting mapping data should only be used for read classification.
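The decision logic of the hierarchical mapping can be summarized in a few lines. The following Python sketch is a simplification under stated assumptions: it takes the outcome of each mapping step as a boolean flag per read rather than running bowtie1 or miRDeep2, and the function name `classify_read` and the label “other ncRNA” are hypothetical. The labels “unknown genomic” and “unmapped” follow the text.

```python
from collections import Counter


def classify_read(maps_genome, is_mirna, maps_rnacentral, maps_cdna):
    """Assign a read to the first category it matches in the mapping hierarchy."""
    if not maps_genome:
        return "unmapped"          # filtered out in the first (genome) step
    if is_mirna:
        return "miRNA"             # quantified by miRDeep2
    if maps_rnacentral:
        return "other ncRNA"       # first (best) RNAcentral hit
    if maps_cdna:
        return "mRNA"              # complementary DNA reference
    return "unknown genomic"       # genome-mapped but otherwise unassigned


# Toy input: one tuple of step outcomes per read (illustrative values only).
reads = [
    (True, True, False, False),
    (True, True, True, False),    # miRNA takes priority over the RNAcentral hit
    (True, False, True, False),
    (True, False, False, True),
    (True, False, False, False),
    (False, False, False, False),
]
summary = Counter(classify_read(*flags) for flags in reads)
```

The order of the checks encodes the prioritization described above: a read that would match both a miRNA and an RNAcentral entry is counted as a miRNA because that step comes first.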

Differential expression and independent filtering

The miND pipeline uses the popular R package edgeR (Robinson et al., 2009) for differential expression analysis (DEA) with the quasi-likelihood negative binomial generalized log-linear model functions provided by the package.

A key step in differential expression analysis is the removal of lowly expressed features, which would otherwise increase noise and inflate false positive rates. Fixed RPM-based cutoff values (e.g., filtering miRNAs below 10 RPM) do not account for variation in library size and miRNA content and are therefore arbitrary. The DEA package DESeq2 (Love et al., 2014) implements an independent filtering method, which was adapted in miND for use with edgeR. Assuming that most false positives stem from low-abundance miRNAs, the algorithm removes successive quantiles of miRNAs from the low-abundance end and checks whether the number of significant miRNAs after false discovery rate (FDR) adjustment increases. Such an increase is expected when mostly false positives have been removed: the FDR adjustment then has to correct for fewer tests and retains more true positives, which raises the overall number of significant results. This method works reliably when true positives are present. If no true positives exist, removing low-abundance miRNAs will not increase the number of significant results after FDR adjustment. For this case, the miND implementation includes a fallback: miRNAs with RPM values below 10 divided by the smallest library size, in at least half the samples of the smaller group, are pre-filtered before DEA and FDR adjustment. These miRNAs carry negligible biological and statistical relevance (Chen et al., 2016).

An exemplary relation between a given quantile cut-off and the resulting number of differentially expressed miRNAs after FDR is shown in Figure 2.

Figure 2. DESeq2's false discovery rate (FDR) based independent filtering method.

Each point represents the number of differentially expressed micro ribonucleic acids (miRNAs) after false discovery rate (FDR) adjustment at one step of increasingly stringent quantile-based read filtering. As more low-read-count miRNAs are removed from the differential expression analysis, the number of significantly differentially expressed (DE) miRNAs increases up to the point where true positives start to be removed, after which the total number of DE miRNAs decreases. This point appears in the graph as the maximum of the red line, and the optimal quantile cutoff is determined by finding this maximum.
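The quantile search described above can be sketched in plain Python. This is a simplified illustration, not the miND or DESeq2 implementation. It assumes that raw p-values are computed once and only the multiple-testing adjustment is repeated for each quantile, that the Benjamini-Hochberg procedure is used for FDR adjustment, and that the first maximum is taken as the optimum. The function names and the default quantile grid are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filtering(abundance, pvalues, alpha=0.05, quantiles=None):
    """Return (best quantile, number of significant features, all results)."""
    if quantiles is None:
        quantiles = [q / 20 for q in range(19)]  # 0.00, 0.05, ..., 0.90
    ranked = sorted(abundance)
    n = len(ranked)
    results = []
    for theta in quantiles:
        # Abundance value at the current quantile of the low-abundance end.
        cutoff = ranked[min(int(theta * n), n - 1)]
        kept = [p for a, p in zip(abundance, pvalues) if a >= cutoff]
        n_significant = sum(q <= alpha for q in bh_adjust(kept))
        results.append((theta, n_significant))
    # max() returns the first maximum, i.e. the least stringent optimal cutoff.
    best_theta, best_n = max(results, key=lambda r: r[1])
    return best_theta, best_n, results
```

In a toy data set where low-abundance features carry only large p-values, removing them reduces the number of tests and lets borderline features pass the adjusted threshold, which reproduces the rising part of the curve in Figure 2.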

For differential expression, the contrasts of interest can be selected in the experiment metadata XLS file (last sheet of the SampleContrastSheet.xlsx). Either groups or combinations of groups can be selected based on the group information provided for the samples. Each selected contrast will be part of the final interactive HTML report. In addition, a blocking factor can be selected if applicable. This blocking factor is included in the differential expression model as an additive factor and can thus be used, e.g., for a paired experimental design or to account for batch effects.
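To make the role of the additive blocking factor concrete, the model can be written in the standard form of a negative binomial log-linear model. The notation below is ours and is only a sketch of such a model under the assumption of one group variable and one blocking factor; it is not quoted from the miND code. Here $\mu_{gi}$ is the expected read count of miRNA $g$ in sample $i$, $N_i$ is the library size of sample $i$, and $\mathrm{block}(i)$ and $\mathrm{group}(i)$ denote the block and group of sample $i$.

```latex
\log \mu_{gi} = \beta_{g,0} + \beta_{g,\mathrm{block}(i)} + \beta_{g,\mathrm{group}(i)} + \log N_i
```

A contrast between groups $A$ and $B$ then corresponds to testing $\beta_{g,A} - \beta_{g,B} = 0$ for each miRNA $g$, while the block term absorbs differences between pairs or batches.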

Interactive HTML report and statistical analysis

Although DEA is a central point of biomarker discovery studies, other statistical methods are needed to put this analysis into context and ensure valid results. The miND pipeline report contains a series of additional graphs and tables to present the data in a way that is interactive and easy to browse. The main sections (see Figure 3) are (1) introduction, (2) data exploration (including a sample table, reads classification plots, miRNA mapping tables, heatmaps, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE) plots), (3) differential expression results, and (4) an appendix (references and run information).

Figure 3. Outline of the interactive miND report.

The main sections (1) introduction, (2) data exploration, (3) differential expression, and (4) appendix each contain multiple subsections. The standardized structure of the report allows for the quick assessment and comparison of experiment results. t-distributed stochastic neighbor embedding (t-SNE), micro ribonucleic acids (miRNAs), differentially expressed (DE).

Reads classification plots

The reads classification plots (see Figure 4) present the amounts of reads mapped to different RNA species (miRNAs, tRNAs, piRNA, rRNA, lncRNA, etc.) based on the hierarchical mapping done by the miND pipeline. This is plotted as absolute reads but also as relative ratios (percent) to get a quick impression of the RNA classes that are present in the data set. Especially for samples with low numbers of miRNAs present (e.g. extracellular vesicles) these two graphs give important information about the success of library preparation and sequencing. While the differential expression analysis in miND focuses on miRNAs, the quantification and visualization of other small RNA species in the reads classification provides a useful overview of the small RNA composition in each sample.

Figure 4. Reads classification of all samples scaled to 100% of total reads.

Each bar represents an individual sample, while the colors within each bar indicate the ribonucleic acid (RNA) species to which the reads were mapped. This representation helps to quickly identify library preparation or sequencing issues if the ratios or the total number of reads are not as expected.

The data on which the reads classification plots are based is also included in the HTML report. Like all tables and figures, it can either be browsed directly in the HTML file or exported in various data formats (CSV or XLS for tabular data and PNG for graphics) for further analysis or publications.

miRNA mappings table

The miRNA mappings table contains read counts for each miRNA that was found in at least one of the samples. The table is available with raw read counts but also as RPM (normalized to the total number of miRNAs mapped in each sample). Group information is included in this table, if provided by the experiment metadata XLS file.
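The RPM normalization described above divides each miRNA read count by the total number of miRNA-mapped reads in the same sample and multiplies by one million. A minimal Python sketch, with a hypothetical function name and toy counts:

```python
def rpm_normalize(counts):
    """Convert raw miRNA read counts of one sample to reads per million (RPM).

    The denominator is the total number of miRNA-mapped reads in the sample,
    as stated in the text, not the total number of sequenced reads.
    """
    total = sum(counts.values())
    if total == 0:
        return {mirna: 0.0 for mirna in counts}
    return {mirna: count * 1_000_000 / total for mirna, count in counts.items()}


# Toy example with made-up counts for one sample.
sample_counts = {"miR-a": 750, "miR-b": 250}
sample_rpm = rpm_normalize(sample_counts)
```

Because the denominator is restricted to miRNA reads, RPM values remain comparable between samples even when the share of other RNA species differs.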

A visualization of the miRNA mapping statistics helps in comparing the number of identified miRNAs across the samples (see Figure 5). For each sample, the number of distinct miRNAs with a read count above 0 and above 10 is plotted to give an impression of the abundance of distinct miRNAs and their read counts in the samples.

Figure 5. Distinct mapped micro ribonucleic acids (miRNAs) for each sample.

The number of identified miRNAs with either a read count above 0 (red) or 10 (green) is plotted for each sample.
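The two values plotted per sample can be derived from the raw counts as follows. This is a small illustrative sketch with a hypothetical function name; the thresholds 0 and 10 are those named in the text.

```python
def count_detected(counts, thresholds=(0, 10)):
    """Number of distinct miRNAs with a raw read count above each threshold."""
    return {t: sum(1 for c in counts.values() if c > t) for t in thresholds}


# Toy counts for one sample (illustrative values only).
detected = count_detected({"miR-a": 0, "miR-b": 5, "miR-c": 11, "miR-d": 10})
```

Note that the comparison is strict, so a miRNA with exactly 10 reads counts toward the first threshold only.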

Heatmaps, PCA and t-SNE plots

The heatmaps, PCA, and t-SNE plots are part of the unsupervised clustering methods that are applied by the miND pipeline and included in the report. To aid the understanding of underlying group relationships, any grouping information available in the metadata file is included in the graphs. Two heatmaps are generated in the interactive HTML report. The first includes only the top 50 miRNAs based on the coefficient of variation (see Figure 6), while the second contains all miRNAs that were detected in all samples. Both heatmaps are based on RPM-normalized reads and scaled using the unit variance method for visualization. Clustering uses complete linkage with Euclidean distances, as these methods are applicable to most experimental setups. The group association of each sample is shown in the heatmaps with colored bars at the top to visualize the clustering of samples based on the provided grouping information. Multiple groups are supported for heatmaps (no limit on the number of groups) and for PCA/t-SNE (a maximum of two groups, shown by colors and shapes).

Figure 6. Heatmap of top 50 miRNAs.

Group information provided with the experiment meta data XLS file is included if available.
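The selection and scaling steps behind the first heatmap can be sketched as follows. This Python example is a simplified illustration, not the R code used by miND: it assumes RPM values are held as one list per miRNA, uses the sample standard deviation, and does not apply any log transformation (whether miND applies one before scaling is not stated in the text). Function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 for an all-zero row)."""
    mu = mean(values)
    return stdev(values) / mu if mu > 0 else 0.0


def top_variable(rpm, n=50):
    """Names of the n miRNAs with the highest coefficient of variation."""
    ranked = sorted(rpm, key=lambda m: coefficient_of_variation(rpm[m]), reverse=True)
    return ranked[:n]


def unit_variance_scale(values):
    """Center a row to mean 0 and scale it to standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Toy RPM matrix: miRNA -> values across three samples (illustrative only).
rpm = {"miR-a": [10, 10, 10], "miR-b": [1, 10, 100], "miR-c": [9, 10, 11]}
selected = top_variable(rpm, n=2)
scaled = {m: unit_variance_scale(rpm[m]) for m in selected}
```

Unit variance scaling places every selected miRNA on the same scale, so that highly abundant miRNAs do not dominate the Euclidean distances used for clustering.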

Conclusions

The miND pipeline was developed and optimized for miRNA-focused analysis of small RNA-seq data, with particular emphasis on biomarker discovery studies. While other available tools focus on specific aspects of the analysis (e.g., miRDeep2 on quantification of miRNAs and annotation of possible novel miRNAs and sRNAbench on differential expression), miND generates an extensive and standardized report suitable for the discovery phase of biomarker studies. The prepared HTML report provides a solid basis for further research and communicates the most important results in a structured and accessible way. Especially parameters relevant to quality control of the whole sequencing experiment (from library preparation to the in-silico analysis) are reported in standardized formats, to allow for a reliable and quick analysis of the overall quality of the experiment.

Besides the results, the generated HTML report includes descriptions, hints, and details about the methods used. This allows the results to be interpreted and understood easily by readers who are neither statisticians nor bioinformaticians. In addition, it ensures that the final HTML report contains all information needed for reproducibility and documentation of the analysis.

Data input and experimental setup of the miND pipeline can be adjusted with the given meta data file, making it possible to use the miND pipeline for various species, sample matrices and library preparation protocols.

With the availability of the source code of the pipeline under the GNU General Public License, additional analysis steps can be integrated into the R markdown report if needed, allowing the pipeline to be tailored to other specialized applications.

While miND provides an extensive set of analyses for early-phase biomarker discovery, no standardized pipeline can cover every study’s specific requirements. The results generated are meant to be a starting point for further analysis and optimization, as the parameters (for example, for differential expression or heatmaps) are chosen to give good results in most use cases but might not be optimal for an individual project.

Comparison with existing tools

Several tools for small RNA sequencing analysis have been published in recent years, each with different strengths and design goals (Table 1). Some focus on specific steps of the analysis, such as miRDeep2, which provides miRNA quantification and novel miRNA prediction through a probabilistic hairpin model. miND builds directly on miRDeep2 for these tasks and extends its output into a complete analysis workflow that includes differential expression analysis and a structured interactive report.

Table 1. Feature comparison of small RNA-seq analysis tools.

miRDeep2 is included as the foundational miRNA quantification tool integrated within the miND pipeline.

Feature	miND	miRDeep2	miRge 3.0	Oasis 2.0	sRNAbench	Prost!	sRNAPipe	sRNAWorkbench
Reference	Diendorfer et al., 2022	Friedländer et al., 2012	Patil & Halushka, 2021	Rahman et al., 2018	Aparicio-Puerta et al., 2022	Desvignes et al., 2019	Pogorelcnik et al., 2018	Stocks et al., 2018
Runtime environment	Snakemake, conda (Linux CLI)	Perl (Linux CLI)	Python CLI; Electron GUI available	Web platform	Web server; standalone JAR; Docker	Python 2.7 CLI (+ BBMap/Java)	Galaxy	Java desktop application (GUI + CLI)
Raw data processing	QC, adapter trimming, size and quality filtering (FASTQ, FASTQ.gz, BAM)	Adapter clipping, read collapsing	Adapter trimming via Cutadapt (FASTQ)	QC, adapter trimming (FASTQ)	QC, adapter trimming (FASTQ, SRA, Google Drive, Dropbox)	No; expects pre-trimmed FASTA input	No; expects pre-trimmed FASTQ input	Adapter removal, filtering (built-in)
Sample metadata input	Excel sheet with sample groups and contrast selection for DE analysis	Config file (sample ID mapping)	CSV file with group assignment	Web form for groups and covariates	Web form; Excel/text annotation for DE grouping (sRNAde)	Text file mapping filenames to sample names	Via Galaxy interface	Via GUI workflow
miRNA quantification	miRDeep2 (bowtie1)	bowtie1 with hairpin-aware assignment	bowtie1	bowtie (via miRDeep2)	bowtie1	BBMap against user-defined annotation	BWA	PatMaN/bowtie (via miRProf)
Other ncRNA quantification	RNAcentral mapping (tRNA, piRNA, rRNA, snRNA, snoRNA, lncRNA, yRNA)	No	tRNA fragments, snoRNA, rRNA, mRNA	snRNA, snoRNA, rRNA, piRNA	tRNA, snoRNA, snRNA, rRNA, yRNA; custom annotation upload	Depends on user-provided annotation	tRNA, rRNA, snRNA, gene transcripts, transposable elements; piRNA/siRNA by size class	miRNA, siRNA, ta-siRNA
Novel miRNA prediction	Yes (via adaptions and integrated miRDeep2)	Yes (core feature; Bayesian scoring model)	Yes (SVM-based)	Yes (via miRDeep2)	Yes (random forest classifier)	No (genome-first approach facilitates manual discovery)	No	Yes (miRCat2)
Differential expression	edgeR with adapted independent filtering	No	DESeq2 (optional, built-in)	DESeq2	DESeq, DESeq2, edgeR, NOISeq (via sRNAde module)	No	No	Yes (custom LOFC method with multiple normalizations)
Summary report	Comprehensive interactive HTML report with interpretation guidance and data export	HTML overview with PDF hairpin structure plots	Interactive HTML report with charts and tables	Web dashboard; downloadable interactive HTML reports	Individual result files; interactive web summaries	Excel workbook (7 sheets)	HTML report with plots, count tables, BAM/bedgraph files	Interactive visualizations via GUI
Open source, self-hosted	Yes (GPL-3.0)	Yes (GPL-3.0)	Yes (MIT)	No (web service only)	Yes (MIT)	Academic license (non-commercial use)	Yes (AFL-3.0)	Yes (MIT)

Other tools offer broader analysis capabilities through different interfaces. Oasis 2.0 (Rahman et al., 2018) provides an accessible web-based platform that requires no local installation, making it straightforward to use for researchers without command-line experience. sRNAbench (Aparicio-Puerta et al., 2022) is available both as a web service and a standalone application and offers one of the most comprehensive sets of analysis modules, including multiple methods for differential expression via its sRNAde module. The UEA sRNA Workbench (Stocks et al., 2018) provides a Java desktop application with tools for miRNA discovery and differential expression, originally developed for plant small RNA research but now supporting both plant and animal datasets. sRNAPipe (Pogorelcnik et al., 2018) integrates small RNA analysis into the Galaxy platform and includes specialized support for piRNA analysis including ping-pong signature detection. Prost! (Desvignes et al., 2019) focuses on accurate read quantification against user-defined annotations and is well suited for organisms with limited existing small RNA annotation. miRge 3.0 (Patil & Halushka, 2021) provides fast miRNA profiling with support for tRNA fragment analysis and UMI-based deduplication.

miND is specifically designed for miRNA biomarker discovery studies. Its distinguishing feature is the combination of structured metadata input through an Excel-based contrast sheet, an integrated Snakemake workflow that handles all processing steps from raw FASTQ files to differential expression results, and a comprehensive interactive HTML report. The contrast sheet allows biologists to define sample groups and statistical comparisons in a familiar spreadsheet format, while the HTML report presents quality control metrics, RNA class distributions, unsupervised clustering, and differential expression results in a single document with interactive elements and export functions for publication-ready figures. This design supports collaboration between wet-lab scientists and bioinformaticians: while biologists prepare the metadata and interpret the report, bioinformaticians handle pipeline execution.

The miND pipeline was developed as part of the Translational Safety Biomarker Pipeline (TransBioLine) project of the IMI2 consortium, which focuses on the discovery of miRNAs as novel biomarkers in the context of drug safety. Within this project, the miND pipeline provides a standardized yet extensive first analysis of NGS data. In addition, the pipeline includes a module for miND spike-ins, which enable the absolute quantification of microRNAs as recently described by Khamina et al. (2022).

In another recent article, Gutmann et al. (2021) used the pipeline in the discovery phase of their study to identify miRNAs associated with COVID-19 severity and mortality. The miRNAs reported by the miND pipeline were manually selected and evaluated on the basis of the HTML report and then confirmed with RT-qPCR, which showed a high level of reproducibility of the NGS data.

We will continue to work on the pipeline and release updates to the public version as needed. In particular, we expect to release an updated version soon with regard to the miND spike-ins, which allow the absolute quantification of miRNAs in biofluids.

Limitations

The execution of the miND pipeline requires bioinformatics expertise, including familiarity with the Linux command line. No graphical user interface or web-based access is provided. However, the contrast sheet and interactive HTML report are designed to be accessible without bioinformatics training, so that the preparation of experimental metadata and the interpretation of results can be handled by biologists directly.

The differential expression analysis in miND is focused on miRNAs. Other small RNA species (including tRNAs, piRNAs, rRNAs, snRNAs, snoRNAs, and lncRNAs) are quantified through RNAcentral mapping and their distributions are visualized in the report, but they are not included in the differential expression analysis. It should be noted that the mapping against other RNA species is less specific than the miRNA mapping: miRNAs are mapped early in the pipeline in a targeted step, whereas the RNAcentral mapping is performed in a single step, which can lead to inaccuracies for reads that map to multiple targets or to multiple reference databases within RNAcentral. This reflects the pipeline’s focus on miRNA biomarker discovery. The quantification framework could be adapted to support differential expression of other small RNA species in the future.
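The difference between a targeted first mapping step and a single combined step can be illustrated with a toy Python example. It is a sketch under stated assumptions, not the miND implementation: the read identifiers, the per-read sets of matching RNA classes, the priority order, and both function names are hypothetical.

```python
from collections import Counter


def assign_sequential(read_hits, priority):
    """Assign each read to the first class in `priority` that it matches (targeted, ordered mapping)."""
    counts = Counter()
    for hits in read_hits.values():
        for rna_class in priority:
            if rna_class in hits:
                counts[rna_class] += 1
                break
        else:
            # No class in the priority list matched this read.
            counts["unassigned"] += 1
    return counts


def assign_one_step(read_hits):
    """Map against all classes at once; reads matching several classes cannot be resolved."""
    counts = Counter()
    for hits in read_hits.values():
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif hits:
            counts["ambiguous"] += 1
        else:
            counts["unassigned"] += 1
    return counts


# Hypothetical reads and the RNA classes their sequences match.
read_hits = {
    "read1": {"miRNA"},
    "read2": {"tRNA", "piRNA"},
    "read3": {"miRNA", "piRNA"},
    "read4": set(),
}
sequential = assign_sequential(read_hits, priority=["miRNA", "tRNA", "piRNA"])
one_step = assign_one_step(read_hits)
```

In this toy data, the ordered approach resolves every matching read by construction, while the one-step approach leaves the two multi-class reads ambiguous, which mirrors the loss of specificity described above.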

miRNA annotation relies on miRBase as the reference database. Results are therefore dependent on the completeness and accuracy of miRBase entries for the organism under study. Finally, the hardware requirements for running miND are modest: small RNA sequencing datasets are typically compact, and analysis of dozens of samples is feasible on standard desktop hardware with four or more CPU cores and 8 GB of memory.

Data availability

Source data

Mature and hairpin sequences of miRBase are available at: https://www.mirbase.org/ftp/22.1

Genome sequences (DNA and cDNA) are available at Ensembl (for human): http://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens

Non-coding RNA sequences are available at RNAcentral: http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release

Data associated with the example use case are not owned by the authors. Requirements for accessing these datasets are given in the protocol (https://dx.doi.org/10.17504/protocols.io.b3f6qjre).

Software availability

Source code available from: https://github.com/tamirna/miND

Archived source code available from: https://doi.org/10.5281/zenodo.6080470 (Diendorfer et al., 2022)

License: GNU GPL 3.0

References

Anaconda Software Distribution: Anaconda Documentation. Anaconda Inc.; 2020. Reference Source
Andrews S: FastQC: A quality control tool for high throughput sequence data. 2010. Reference Source
Aparicio-Puerta E, Gómez-Martín C, Giannoukakos S, et al.: sRNAbench and sRNAtoolbox 2022 update: accurate miRNA and sncRNA profiling for model and non-model organisms. Nucleic Acids Res. 2022; 50(W1): W710–W717. PubMed Abstract | Publisher Full Text | Free Full Text
Chen Y, Lun ATL, Smyth GK: From reads to genes to pathways: Differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; referees: 5 approved]. F1000Res. 2016; 5: 1–49. Publisher Full Text
Desvignes T, Batzel P, Sydes J, et al.: miRNA analysis with Prost! reveals evolutionary conservation of organ-enriched expression and post-transcriptional modifications in three-spined stickleback and zebrafish. Sci. Rep. 2019; 9: 3913. PubMed Abstract | Publisher Full Text | Free Full Text
Diendorfer A, Khamina K, Pultar M, et al.: miND (miRNA NGS Discovery pipeline): a small RNA-seq analysis pipeline and report generator for microRNA biomarker discovery studies (v1.2RC2). Zenodo. 2022. Publisher Full Text
Ewels P, Magnusson M, Lundin S, et al.: MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32(19): 3047–3048. PubMed Abstract | Publisher Full Text
Friedländer MR, MacKowiak SD, Li N, et al.: MiRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012; 40(1): 37–52. PubMed Abstract | Publisher Full Text
Grüning B, Dale R, Sjödin A, et al.: Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods. 2018; 15(7): 475–476. PubMed Abstract | Publisher Full Text
Griffiths-Jones S: The microRNA registry. Nucleic Acids Res. 2004; 32(Database issue): D109–D111. PubMed Abstract | Publisher Full Text
Gutmann C, Khamina K, Theofilatos K, et al.: Association of cardiometabolic microRNAs with COVID-19 severity and mortality. Cardiovasc. Res. 2021; 118: 461–474. PubMed Abstract | Publisher Full Text
Köster J, Rahmann S: Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2012; 28(19): 2520–2522. Publisher Full Text
Khamina K, Diendorfer AB, Skalicky S, et al.: A MicroRNA Next-Generation-Sequencing Discovery Assay (miND) for Genome-Scale Analysis and Absolute Quantitation of Circulating MicroRNA Biomarkers. Int. J. Mol. Sci. 2022; 23(3): 1226. PubMed Abstract | Publisher Full Text
Langmead B, Trapnell C, Pop M, et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10(3): R25. PubMed Abstract | Publisher Full Text
Larrea E, Sole C, Manterola L, et al.: New Concepts in Cancer Biomarkers: Circulating miRNAs in Liquid Biopsies. Int. J. Mol. Sci. 2016; 17(5): 627. PubMed Abstract | Publisher Full Text
Lee RC, Ambros V: An Extensive Class of Small RNAs in Caenorhabditis elegans. Science. 2001; 294(5543): 862–864. PubMed Abstract | Publisher Full Text
Llewellyn HP, Vaidya VS, Wang Z, et al.: Evaluating the Sensitivity and Specificity of Promising Circulating Biomarkers to Diagnose Liver Injury in Humans. Toxicol. Sci. 2021; 181(1): 23–34. PubMed Abstract | Publisher Full Text
Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12): 521–550. PubMed Abstract | Publisher Full Text
Mölder F, Jablonski KP, Letcher B, et al.: Sustainable data analysis with Snakemake. F1000Res. 2021; 10: 33. Publisher Full Text
Murillo OD, Thistlethwaite W, Rozowsky J, et al.: exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids. Cell. 2019; 177(2): 463–477.e15. PubMed Abstract | Publisher Full Text
O’Brien J, Hayder H, Zayed Y, et al.: Overview of microRNA biogenesis, mechanisms of actions, and circulation. Front. Endocrinol. 2018; 9(AUG): 1–12. PubMed Abstract | Publisher Full Text
Patil AH, Halushka MK: miRge3.0: a comprehensive microRNA and tRF sequencing analysis pipeline. NAR Genom. Bioinform. 2021; 3(3): lqab068. Publisher Full Text
Pogorelcnik R, Vaury C, Pouchin P, et al.: sRNAPipe: a Galaxy-based pipeline for bioinformatic in-depth exploration of small RNAseq data. Mob. DNA. 2018; 9: 25. PubMed Abstract | Publisher Full Text | Free Full Text
Rahman RU, Gautam A, Bethune J, et al.: Oasis 2: improved online analysis of small RNA-seq data. BMC Bioinformatics. 2018; 19: 54. Publisher Full Text
Robinson MD, McCarthy DJ, Smyth GK: edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009; 26(1): 139–140. PubMed Abstract | Publisher Full Text
de Ronde MWJ, Ruijter JM, Moerland PD, et al.: Study Design and qPCR Data Analysis Guidelines for Reliable Circulating miRNA Biomarker Experiments: A Review. Clin. Chem. 2018; 64(9): 1308–1318. PubMed Abstract | Publisher Full Text
Ronen R, Gan I, Modai S, et al.: miRNAkey: a software for microRNA deep sequencing analysis. Bioinformatics. 2010; 26(20): 2615–2616. PubMed Abstract | Publisher Full Text
Starlinger P, Hackl H, Pereyra D, et al.: Predicting Postoperative Liver Dysfunction Based on Blood-Derived MicroRNA Signatures. Hepatology. 2019; 69(6): 2636–2651. PubMed Abstract | Publisher Full Text
Stocks MB, Mohorianu I, Beckers M, et al.: The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs. Bioinformatics. 2018; 34(19): 3382–3384. PubMed Abstract | Publisher Full Text | Free Full Text
Sweeney BA, Petrov AI, Burkov B, et al.: RNAcentral: A hub of information for non-coding RNA sequences. Nucleic Acids Res. 2019; 47(D1): D221–D229. PubMed Abstract | Publisher Full Text
Wang W-C, Lin F-M, Chang W-C, et al.: miRExpress: Analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinformatics. 2009; 10(1): 328. PubMed Abstract | Publisher Full Text
Zerbino DR, Achuthan P, Akanni W, et al.: Ensembl 2018. Nucleic Acids Res. 2018; 46(D1): D754–D761. PubMed Abstract | Publisher Full Text

Competing interests

No competing interests were disclosed.

Grant information

The TransBioLine project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 821283. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. This communication reflects the authors’ view and neither IMI nor the European Union or EFPIA are responsible for any use that may be made of the information contained therein.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (2)

version 2

Revised

Published: 09 May 2026, 11:233

https://doi.org/10.12688/f1000research.94159.2

version 1

Published: 24 Feb 2022, 11:233

https://doi.org/10.12688/f1000research.94159.1

© 2026 Diendorfer A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Diendorfer A, Khamina K, Pultar M and Hackl M. miND (miRNA NGS Discovery pipeline): a small RNA-seq analysis pipeline and report generator for microRNA biomarker discovery studies [version 2; peer review: 3 approved]. F1000Research 2026, 11:233 (https://doi.org/10.12688/f1000research.94159.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2

Published 09 May 2026 (revised)

Reviewer Report 14 May 2026

Iddo Magen, Weizmann Institute of Science, Rehovot, Israel

Approved

https://doi.org/10.5256/f1000research.198035.r483661

The authors explained clearly the pipeline and its features, with proper visualizations and diagrams, and included reference databases and comparison to previous pipelines. They addressed very well the reviewers' concerns. I do not have any specific comments for this manuscript ...

Is the rationale for developing the new software tool clearly explained? Yes

Is the description of the software tool technically sound? Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: biomarkers in neurodegeneration.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 13 May 2026

Ailsa Maria Main, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Nina Mørup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Kristian Almstrup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Approved

https://doi.org/10.5256/f1000research.198035.r483190

The revision has clearly improved the paper, and the comparison with ...

Reviewer Report 13 May 2026

Francisco J. Enguita, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal

Approved

https://doi.org/10.5256/f1000research.198035.r483191

Dear Editor

I have checked the answers and modifications introduced in the manuscript by the ...

Version 1

Published 24 Feb 2022

Reviewer Report 05 Oct 2022

Francisco J. Enguita, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal

Approved with Reservations

https://doi.org/10.5256/f1000research.101122.r151316

The manuscript by Diendorfer and coworkers describes a pipeline for NGS data processing specially devoted to the analysis of small non-coding RNAs, mainly focused on miRNAs.

The manuscript is well written, but the authors would need to give further details in order to compare their pipeline with the already existing ones. It is not clear for the reader what are the main advantages of miND pipeline in comparison with the already available ones.

I would advise to perform a small benchmarking study using a test dataset that could be any one existing in public databases. The authors would need to answer the questions: what are the advantages of miND, its weaknesses and why the user should give a try to this new pipeline.

Is the rationale for developing the new software tool clearly explained? Partly

Is the description of the software tool technically sound? Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: non-coding RNAs; miRNAs; lncRNAs; circRNAs; structural biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 09 May 2026

Andreas B. Diendorfer, TAmiRNA GmbH, Vienna, Austria

We thank both reviewers for their constructive comments and suggestions. We have revised the manuscript to address all points raised. Below we provide a point-by-point response. All changes are marked in the revised manuscript.

Comment 1: Comparison with existing pipelines

"The authors would need to give further details in order to compare their pipeline with the already existing ones. It is not clear for the reader what are the main advantages of miND pipeline in comparison with the already available ones."

Response: We have added a feature comparison table that systematically compares miND against seven other small RNA analysis tools. We have also expanded the Discussion to position miND in the context of these tools and clearly articulate its design goals and distinguishing features: the combination of an Excel-based contrast sheet for metadata input, an integrated Snakemake workflow building on miRDeep2, and a comprehensive interactive HTML report tailored for biomarker discovery studies.

Comment 2: Benchmarking study

"I would advise to perform a small benchmarking study using a test dataset that could be any one existing in public databases."

Response: Rather than a data-driven benchmark, we have provided a feature-level comparison that we believe is more informative for readers selecting a tool. The choice of a small RNA analysis pipeline typically depends on the required analysis scope, interface preferences, and reporting needs. The manuscript already demonstrates miND's output using a public dataset (PRJEB27261/E-MTAB-6885), and the generated example report is available on GitHub.

Comment 3: Advantages, weaknesses, and rationale

"The authors would need to answer the questions: what are the advantages of miND, its weaknesses and why the user should give a try to this new pipeline."

Response: We have addressed this through three changes: (1) the expanded Introduction now provides a broader context of existing tools, (2) the new "Comparison with existing tools" paragraphs in the Discussion articulate miND's specific strengths, and (3) the new Limitations paragraph honestly discusses its constraints. miND's primary value lies in its end-to-end integration from raw data to an interactive report, the structured contrast sheet that facilitates collaboration between biologists and bioinformaticians, and its focus on the specific needs of miRNA biomarker discovery studies.

Competing Interests: none

Reviewer Report 14 Apr 2022

Kristian Almstrup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Nina Mørup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Ailsa Maria Main, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Approved with Reservations

https://doi.org/10.5256/f1000research.101122.r129126

In the manuscript by Diendorfer et al., a bioinformatic pipeline for analysis of data from small RNA sequencing is presented. The pipeline, named miND, allows identification and annotation of small RNA reads as well as differential expression analysis.

I have the following major concerns about the study as it is:

Several other small RNA sequencing pipelines, like Oasis2.0 (https://oasis.dzne.de/index.php), sRNAWorkbench (https://sourceforge.net/projects/srnaworkbench/), sRNAPipe (https://github.com/GReD-Clermont/sRNAPipe), miRge3.0 (https://sourceforge.net/projects/mirge3/) already exist and some support both identification and differential expression analysis. It is hence unclear what novelty miND brings compared to other pipelines. To allow the reader to make an informed choice about which pipeline to choose for analysis, miND should be benchmarked against some of the already existing pipelines. What are the differences when the same dataset (PRJEB27261/E-MTAB-6885) is analysed with e.g. Oasis2.0 (Rahman et al., 2018)?

The authors argue that miND “bridges the gap between biologists and bioinformaticians”, and this is also evident from the easy-to-use Excel files. However, the pipeline is based on Snakemake workflows and a conda install and hence require a priori knowledge of conda, which would not be common knowledge to biologists. I encourage the authors to make miND available as a standalone app or web portal (as is the case for the similar pipeline Oasis2.0). Since, at least, parts of the miND pipeline are based on R-scripts it might be easy to make a Shiny app or similar.

The pipeline focuses on miRNAs. This reviewer encourages the authors also to include analysis of other small RNA species as these are likely to be equally important as biomarkers in a liquid biopsy. Furthermore, on a whole, the authors do not discuss in detail the possible problems and downsides of their pipeline. A more in-depth and critical discussion of strengths and limitations is warranted.

The readme file on github should contain instructions on how to install miND in a language that biologists can understand.

Finally, in some places the authors should consider shortening sentences/simplifying statements so, again, it is easier for the readers to follow.

Is the rationale for developing the new software tool clearly explained? Partly

Is the description of the software tool technically sound? Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

References

1. Rahman R, Gautam A, Bethune J, Sattar A, et al.: Oasis 2: improved online analysis of small RNA-seq data. BMC Bioinformatics. 2018; 19 (1). Publisher Full Text

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Endocrinology, small RNAs, genetics

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Author Response 09 May 2026

Andreas B. Diendorfer, TAmiRNA GmbH, Vienna, Austria

We thank both reviewers for their constructive comments and suggestions. We have revised the manuscript to address all points raised. Below we provide a point-by-point response. All changes are marked in the revised manuscript.

Comment 1: Benchmarking against existing pipelines

"Several other small RNA sequencing pipelines, like Oasis2.0, sRNAWorkbench, sRNAPipe, miRge3.0 already exist and some support both identification and differential expression analysis. It is hence unclear what novelty miND brings compared to other pipelines. To allow the reader to make an informed choice about which pipeline to choose for analysis, miND should be benchmarked against some of the already existing pipelines."

Response: We agree that a systematic comparison with existing tools is valuable and have added a feature comparison table to the manuscript. The table covers miND, miRDeep2, miRge 3.0, Oasis 2.0, sRNAbench, Prost!, sRNAPipe, and sRNAWorkbench across dimensions including runtime environment, raw data processing, metadata input, miRNA and other ncRNA quantification, novel miRNA prediction, differential expression analysis, summary reporting, and licensing. We note that miRDeep2 is included as the foundational quantification tool that miND builds upon, rather than as a competing pipeline.

Rather than a data-driven benchmark on a single dataset, which would primarily reflect differences in read mapping and counting strategies, we opted for a feature-level comparison. We believe this is more useful for readers choosing a tool, as the choice typically depends on the required analysis scope, interface, and reporting capabilities rather than marginal differences in miRNA detection rates.

We have also expanded the Discussion to contextualize miND's positioning among these tools (see revised Discussion section "Comparison with existing tools").

Comment 2: Web portal / standalone app / Shiny app

"The pipeline is based on Snakemake workflows and a conda install and hence require a priori knowledge of conda, which would not be common knowledge to biologists. I encourage the authors to make miND available as a standalone app or web portal."

Response: We appreciate this suggestion. miND is intentionally designed as a self-hosted, open-source pipeline. This ensures that users and institutions are not dependent on third-party infrastructure, which can become unavailable when hosting organizations discontinue support. The open-source license (GPL-3.0) enables any institution to deploy the pipeline freely.

We would like to clarify that miND does not aim to be directly operated by biologists without bioinformatics support. The contrast sheet and interactive HTML report are designed to serve as the interface between wet-lab scientists and bioinformaticians: biologists prepare the experimental metadata in a familiar Excel format and interpret the report, while bioinformaticians handle pipeline installation and execution. We have revised the text to make this distinction clearer.

To lower the barrier for bioinformaticians setting up the pipeline, we have rewritten the README on GitHub with step-by-step installation instructions and usage guidance. A detailed setup protocol is also available on protocols.io (doi: 10.17504/protocols.io.b3f6qjre).

Comment 3: Other small RNA species

"The pipeline focuses on miRNAs. This reviewer encourages the authors also to include analysis of other small RNA species as these are likely to be equally important as biomarkers in a liquid biopsy."

Response: miND does quantify other small RNA species. Through RNAcentral mapping, the pipeline classifies reads into multiple RNA categories including tRNAs, piRNAs, rRNAs, snRNAs, snoRNAs, and lncRNAs. These distributions are visualized in the reads classification plots of the HTML report. However, the differential expression analysis is focused on miRNAs, reflecting the pipeline's primary use case in miRNA biomarker discovery. The quantification framework could be adapted to support DE analysis of other small RNA species in the future.

We have clarified this in the revised manuscript (see reads classification section and the new Limitations paragraph in the Discussion).
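
As an illustration of the read classification described in the response to Comment 3, the following is a minimal sketch of how per-category read fractions could be tallied once each read carries an RNA-type label. It is not taken from the miND code base: the function name `classify_read_fractions`, the input format (one label per read), and the example labels are assumptions made for this sketch only.

```python
from collections import Counter


def classify_read_fractions(read_labels):
    """Return the fraction of reads assigned to each RNA category.

    `read_labels` is a hypothetical input: one RNA-type label per mapped
    read (for example "miRNA", "tRNA", "piRNA"). How miND derives these
    labels from RNAcentral is not shown here.
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        # No reads: return an empty mapping rather than dividing by zero.
        return {}
    return {category: n / total for category, n in counts.items()}


# Made-up labels for demonstration; real data would come from the mapping step.
labels = ["miRNA", "miRNA", "tRNA", "piRNA", "miRNA", "rRNA", "snoRNA", "tRNA"]
fractions = classify_read_fractions(labels)

# Print categories from most to least abundant.
for category, fraction in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{category}\t{fraction:.3f}")
```

A table of this kind is the sort of summary that a reads classification plot would visualize per sample.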

Comment 4: Discussion of problems and downsides

"The authors do not discuss in detail the possible problems and downsides of their pipeline. A more in-depth and critical discussion of strengths and limitations is warranted."

Response: We have added a dedicated Limitations paragraph to the Discussion section. It addresses the requirement for bioinformatics expertise to run the pipeline, the miRNA-centric scope of the DE analysis, the dependency on miRBase for miRNA annotation, and the hardware requirements.

Comment 5: README on GitHub

"The readme file on github should contain instructions on how to install miND in a language that biologists can understand."

Response: We have rewritten the README on GitHub (https://github.com/TAmiRNA/miND) with a description of the pipeline, its features, hardware requirements, installation steps, usage instructions, and output descriptions. The README also references the detailed protocols.io protocol for step-by-step setup guidance.

Comment 6: Sentence length and readability

"In some places the authors should consider shortening sentences/simplifying statements so, again, it is easier for the readers to follow."

Response: We have revised the manuscript for clarity throughout, shortening long sentences and simplifying phrasing where possible. Specific changes include the Introduction (paragraph on existing tools), the independent filtering method description, and parts of the Conclusions.

Competing Interests: none

Version 2

VERSION 2 PUBLISHED 24 Feb 2022

Open Peer Review

Reviewer Status

Reviewer Reports

Invited Reviewers	1	2	3
Version 2 (revision), 09 May 26	read	read	read
Version 1, 24 Feb 22	read	read

Ailsa Maria Main, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Nina Mørup, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Kristian Almstrup, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark
Francisco J. Enguita, Universidade de Lisboa, Lisboa, Portugal
Iddo Magen, Weizmann Institute of Science, Rehovot, Israel

Reviewer Report

14 May 2026 | for Version 2

Iddo Magen, Weizmann Institute of Science, Rehovot, Israel

Approved

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

biomarkers in neurodegeneration.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

13 May 2026 | for Version 2

Ailsa Maria Main, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Nina Mørup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Kristian Almstrup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Approved

The revision has clearly improved the paper, and the comparison with other similar tools makes it of higher value to the readers.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Endocrinology, small RNAs, genetics

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

13 May 2026 | for Version 2

Francisco J. Enguita, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal

Approved

Dear Editor

I have checked the answers and modifications introduced in the manuscript by the Authors, and I sincerely think that they have substantially improved the text. I would recommend it for publication.

Sincerely

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

non-coding RNAs; miRNAs; lncRNAs; circRNAs; structural biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

05 Oct 2022 | for Version 1

Francisco J. Enguita, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal

Approved With Reservations

Is the rationale for developing the new software tool clearly explained?

Partly
Is the description of the software tool technically sound?

Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

non-coding RNAs; miRNAs; lncRNAs; circRNAs; structural biology

Author Response

09 May 2026

Andreas B. Diendorfer, TAmiRNA GmbH, Vienna, Austria

Competing Interests

none

Reviewer Report

14 Apr 2022 | for Version 1

Kristian Almstrup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Nina Mørup, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Ailsa Maria Main, Department of Growth and Reproduction, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark

Approved With Reservations

Is the rationale for developing the new software tool clearly explained?

Partly
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

References

1. Rahman R, Gautam A, Bethune J, Sattar A, et al.: Oasis 2: improved online analysis of small RNA-seq data. BMC Bioinformatics. 2018; 19 (1). Publisher Full Text

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Endocrinology, small RNAs, genetics

Author Response

09 May 2026

Andreas B. Diendorfer, TAmiRNA GmbH, Vienna, Austria

We thank both reviewers for their constructive comments and suggestions. We have revised the manuscript to address all points raised. Below we provide a point-by-point response. All changes are marked in the revised manuscript.

Comment 1: Benchmarking against existing pipelines

"Several other small RNA sequencing pipelines, like Oasis2.0, sRNAWorkbench, sRNAPipe, miRge3.0 already exist and some support both identification and differential expression analysis. It is hence unclear what novelty miND brings compared to other pipelines. To allow the reader to make an informed choice about which pipeline to choose for analysis, miND should be benchmarked against some of the already existing pipelines."

Response: We agree that a systematic comparison with existing tools is valuable and have added a feature comparison table to the manuscript. The table covers miND, miRDeep2, miRge 3.0, Oasis 2.0, sRNAbench, Prost!, sRNAPipe, and sRNAWorkbench across dimensions including runtime environment, raw data processing, metadata input, miRNA and other ncRNA quantification, novel miRNA prediction, differential expression analysis, summary reporting, and licensing. We note that miRDeep2 is included as the foundational quantification tool that miND builds upon, rather than as a competing pipeline.

Rather than a data-driven benchmark on a single dataset, which would primarily reflect differences in read mapping and counting strategies, we opted for a feature-level comparison. We believe this is more useful for readers choosing a tool, as the choice typically depends on the required analysis scope, interface, and reporting capabilities rather than marginal differences in miRNA detection rates.

We have also expanded the Discussion to contextualize miND's positioning among these tools (see revised Discussion section "Comparison with existing tools").

Comment 2: Web portal / standalone app / Shiny app

"The pipeline is based on Snakemake workflows and a conda install and hence require a priori knowledge of conda, which would not be common knowledge to biologists. I encourage the authors to make miND available as a standalone app or web portal."

Response: We appreciate this suggestion. miND is intentionally designed as a self-hosted, open-source pipeline. This ensures that users and institutions are not dependent on third-party infrastructure, which can become unavailable when hosting organizations discontinue support. The open-source license (GPL-3.0) enables any institution to deploy the pipeline freely.

We would like to clarify that miND does not aim to be directly operated by biologists without bioinformatics support. The contrast sheet and interactive HTML report are designed to serve as the interface between wet-lab scientists and bioinformaticians: biologists prepare the experimental metadata in a familiar Excel format and interpret the report, while bioinformaticians handle pipeline installation and execution. We have revised the text to make this distinction clearer.

To lower the barrier for bioinformaticians setting up the pipeline, we have rewritten the README on GitHub with step-by-step installation instructions and usage guidance. A detailed setup protocol is also available on protocols.io (doi: 10.17504/protocols.io.b3f6qjre).

Comment 3: Other small RNA species

"The pipeline focuses on miRNAs. This reviewer encourages the authors also to include analysis of other small RNA species as these are likely to be equally important as biomarkers in a liquid biopsy."

Response: miND does quantify other small RNA species. Through RNAcentral mapping, the pipeline classifies reads into multiple RNA categories including tRNAs, piRNAs, rRNAs, snRNAs, snoRNAs, and lncRNAs. These distributions are visualized in the reads classification plots of the HTML report. However, the differential expression analysis is focused on miRNAs, reflecting the pipeline's primary use case in miRNA biomarker discovery. The quantification framework could be adapted to support DE analysis of other small RNA species in the future.

We have clarified this in the revised manuscript (see reads classification section and the new Limitations paragraph in the Discussion).
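As an illustration of the reads classification described above, the following is a minimal sketch of how per-class read counts could be scaled to 100% of total reads for each sample. The counts, sample names, and the function name `class_percentages` are hypothetical and chosen for illustration only; this is not the pipeline's actual implementation.

```python
# Minimal sketch: scale per-class read counts to percentages of total reads.
# All counts and names below are made-up illustration values, not miND output.

def class_percentages(counts):
    """Return each RNA class as a percentage of the sample's total reads."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for samples without classified reads
        return {rna_class: 0.0 for rna_class in counts}
    return {rna_class: 100.0 * n / total for rna_class, n in counts.items()}


# Hypothetical read counts per RNA class for two samples
samples = {
    "sample_A": {"miRNA": 600, "tRNA": 200, "piRNA": 50, "rRNA": 100,
                 "snRNA": 10, "snoRNA": 20, "lncRNA": 20},
    "sample_B": {"miRNA": 300, "tRNA": 500, "piRNA": 40, "rRNA": 120,
                 "snRNA": 10, "snoRNA": 20, "lncRNA": 10},
}

percentages = {name: class_percentages(c) for name, c in samples.items()}

for name, pct in percentages.items():
    # One line per sample, classes sorted by descending share
    summary = ", ".join(f"{k}: {v:.1f}%" for k, v in
                        sorted(pct.items(), key=lambda kv: -kv[1]))
    print(f"{name}: {summary}")
```

Scaling each sample to 100% makes the class composition comparable across samples with different sequencing depths, which is the purpose of the stacked classification plot in the report.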

Comment 4: Discussion of problems and downsides

"The authors do not discuss in detail the possible problems and downsides of their pipeline. A more in-depth and critical discussion of strengths and limitations is warranted."

Response: We have added a dedicated Limitations paragraph to the Discussion section. It addresses the requirement for bioinformatics expertise to run the pipeline, the miRNA-centric scope of the DE analysis, the dependency on miRBase for miRNA annotation, and the hardware requirements.

Comment 5: README on GitHub

"The readme file on github should contain instructions on how to install miND in a language that biologists can understand."

Response: We have rewritten the README on GitHub (https://github.com/TAmiRNA/miND) with a description of the pipeline, its features, hardware requirements, installation steps, usage instructions, and output descriptions. The README also references the detailed protocols.io protocol for step-by-step setup guidance.

Comment 6: Sentence length and readability

"In some places the authors should consider shortening sentences/simplifying statements so, again, it is easier for the readers to follow."

Response: We have revised the manuscript for clarity throughout, shortening long sentences and simplifying phrasing where possible. Specific changes include the Introduction (paragraph on existing tools), the independent filtering method description, and parts of the Conclusions.

Competing Interests

none

[1] Anaconda Software Distribution: Anaconda Documentation. Anaconda Inc.; 2020.

[2] Andrews S: FastQC: A quality control tool for high throughput sequence data. 2010.

[3] Aparicio-Puerta E, Gómez-Martín C, Giannoukakos S, et al.: sRNAbench and sRNAtoolbox 2022 update: accurate miRNA and sncRNA profiling for model and non-model organisms. Nucleic Acids Res. 2022; 50(W1): W710–W717.

[4] Chen Y, Lun ATL, Smyth GK: From reads to genes to pathways: Differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; referees: 5 approved]. F1000Res. 2016; 5: 1–49.

[5] Desvignes T, Batzel P, Sydes J, et al.: miRNA analysis with Prost! reveals evolutionary conservation of organ-enriched expression and post-transcriptional modifications in three-spined stickleback and zebrafish. Sci. Rep. 2019; 9: 3913.

[6] Diendorfer A, Khamina K, Pultar M, et al.: miND (miRNA NGS Discovery pipeline): a small RNA-seq analysis pipeline and report generator for microRNA biomarker discovery studies (v1.2RC2). Zenodo. 2022.

[7] Ewels P, Magnusson M, Lundin S, et al.: MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016; 32(19): 3047–3048.

[8] Friedländer MR, Mackowiak SD, Li N, et al.: miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012; 40(1): 37–52.

[9] Grüning B, Dale R, Sjödin A, et al.: Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods. 2018; 15(7): 475–476.

[10] Griffiths-Jones S: The microRNA registry. Nucleic Acids Res. 2004; 32(Database issue): D109–D111.

[11] Gutmann C, Khamina K, Theofilatos K, et al.: Association of cardiometabolic microRNAs with COVID-19 severity and mortality. Cardiovasc. Res. 2021; 118: 461–474.

[12] Köster J, Rahmann S: Snakemake-a scalable bioinformatics workflow engine. Bioinformatics. 2012; 28(19): 2520–2522.

[13] Khamina K, Diendorfer AB, Skalicky S, et al.: A MicroRNA Next-Generation-Sequencing Discovery Assay (miND) for Genome-Scale Analysis and Absolute Quantitation of Circulating MicroRNA Biomarkers. Int. J. Mol. Sci. 2022; 23(3): 1226.

[14] Langmead B, Trapnell C, Pop M, et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10(3): R25.

[15] Larrea E, Sole C, Manterola L, et al.: New Concepts in Cancer Biomarkers: Circulating miRNAs in Liquid Biopsies. Int. J. Mol. Sci. 2016; 17(5): 627.

[16] Lee RC, Ambros V: An Extensive Class of Small RNAs in Caenorhabditis elegans. Science. 2001; 294(5543): 862–864.

[17] Llewellyn HP, Vaidya VS, Wang Z, et al.: Evaluating the Sensitivity and Specificity of Promising Circulating Biomarkers to Diagnose Liver Injury in Humans. Toxicol. Sci. 2021; 181(1): 23–34.

[18] Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12): 521–550.

[19] Mölder F, Jablonski KP, Letcher B, et al.: Sustainable data analysis with Snakemake. F1000Res. 2021; 10: 33.

[20] Murillo OD, Thistlethwaite W, Rozowsky J, et al.: exRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present across Human Biofluids. Cell. 2019; 177(2): 463–477.e15.

[21] O’Brien J, Hayder H, Zayed Y, et al.: Overview of microRNA biogenesis, mechanisms of actions, and circulation. Front. Endocrinol. 2018; 9(AUG): 1–12.

[22] Patil AH, Halushka MK: miRge3.0: a comprehensive microRNA and tRF sequencing analysis pipeline. NAR Genom. Bioinform. 2021; 3(3): lqab068.

[23] Pogorelcnik R, Vaury C, Pouchin P, et al.: sRNAPipe: a Galaxy-based pipeline for bioinformatic in-depth exploration of small RNAseq data. Mob. DNA. 2018; 9: 25.

[24] Rahman RU, Gautam A, Bethune J, et al.: Oasis 2: improved online analysis of small RNA-seq data. BMC Bioinformatics. 2018; 19: 54.

[25] Robinson MD, McCarthy DJ, Smyth GK: edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009; 26(1): 139–140.

[26] de Ronde MWJ, Ruijter JM, Moerland PD, et al.: Study Design and qPCR Data Analysis Guidelines for Reliable Circulating miRNA Biomarker Experiments: A Review. Clin. Chem. 2018; 64(9): 1308–1318.

[27] Ronen R, Gan I, Modai S, et al.: miRNAkey: a software for microRNA deep sequencing analysis. Bioinformatics. 2010; 26(20): 2615–2616.

[28] Starlinger P, Hackl H, Pereyra D, et al.: Predicting Postoperative Liver Dysfunction Based on Blood-Derived MicroRNA Signatures. Hepatology. 2019; 69(6): 2636–2651.

[29] Stocks MB, Mohorianu I, Beckers M, et al.: The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs. Bioinformatics. 2018; 34(19): 3382–3384.

[30] Sweeney BA, Petrov AI, Burkov B, et al.: RNAcentral: A hub of information for non-coding RNA sequences. Nucleic Acids Res. 2019; 47(D1): D221–D229.

[31] Wang W-C, Lin F-M, Chang W-C, et al.: miRExpress: Analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinformatics. 2009; 10(1): 328.

[32] Zerbino DR, Achuthan P, Akanni W, et al.: Ensembl 2018. Nucleic Acids Res. 2018; 46(D1): D754–D761.
