Assembly and quantification of transcripts from noisy long reads with NIFFLR

Alina Guo; Mihaela Pertea; Aleksey V Zimin

doi:10.12688/f1000research.164583.1

Software Tool Article

[version 1; peer review: 1 approved with reservations, 2 not approved]

Alina Guo¹, Mihaela Pertea¹⁻³, Aleksey V Zimin¹,²

PUBLISHED 20 Jun 2025

¹ Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
² Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, 21205, USA

Alina Guo
Roles: Formal Analysis, Investigation, Methodology, Software, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

Mihaela Pertea
Roles: Investigation, Methodology, Validation, Writing – Review & Editing

Aleksey V Zimin
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Cell & Molecular Biology gateway.

This article is included in the Nanopore Analysis gateway.

Abstract

Background

Long-read RNA sequencing technologies can produce complete or near-complete transcript sequences. Recently introduced methods for direct RNA and cDNA sequencing can provide a high-throughput strategy for the discovery of novel and rare gene isoforms. However, the high error rates in ONT sequences limit the ability to exactly pinpoint splice site boundaries when aligning reads to the genome.

Methods

In this paper, we present a novel tool called NIFFLR (Novel IsoForm Finder using Long Reads) that identifies and quantifies both known and novel isoforms using long-read RNA sequencing data. NIFFLR recovers known transcripts and assembles novel transcripts present in the data by aligning exons from a reference annotation to the long reads.

Results

NIFFLR effectively recovers correct transcripts from simulated reads based on known transcript annotations, achieving higher sensitivity and precision compared to several previously published tools. On real data, NIFFLR shows high accuracy as measured by concordance of isoform counts with the counts computed from Illumina data for the same sample. We applied NIFFLR to a set of 92 GTEx long-read samples and produced transcript counts for both novel and known isoforms. In total, we identified and quantified 121,155 isoforms present in the RefSeq annotation of GRCh38 and 106,667 high-confidence novel isoforms across 32,875 genes present in two or more samples in these data, more than previous studies identified in this data set.

Conclusions

NIFFLR is an effective tool for assembling and quantifying transcripts from long, high-error transcriptome reads. NIFFLR is released under an open-source license (GPL 3.0) and is available on GitHub at https://github.com/alguoo314/NIFFLR/releases.

Keywords

transcriptome, quantification, assembly, discovery, annotation

Corresponding author: Aleksey V Zimin

Competing interests: No competing interests were disclosed.

Grant information: This work was supported by National Science Foundation grant IOS-2432298 to Johns Hopkins University (PI Zimin, Co-PI Salzberg), and by National Institutes of Health grants to Johns Hopkins University R01-HG006677 (PI Salzberg) and R35-GM130151 (PI Salzberg). Zimin is a member of the Salzberg lab at JHU.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Guo A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Guo A, Pertea M and Zimin AV. Assembly and quantification of transcripts from noisy long reads with NIFFLR [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2025, 14:608 (https://doi.org/10.12688/f1000research.164583.1) First published: 20 Jun 2025, 14:608 (https://doi.org/10.12688/f1000research.164583.1) Latest published: 20 Jun 2025, 14:608 (https://doi.org/10.12688/f1000research.164583.1)

Introduction

Direct RNA and cDNA sequencing technologies from Oxford Nanopore Technologies (ONT) produce long transcriptome reads with high yields at relatively low cost. However, the per-base error rates of ONT reads are still much higher than those of Illumina reads. Several computational tools have recently been developed to assemble transcripts and quantify isoforms in samples sequenced using ONT reads, including FLAIR (Tang AD et al., 2020), ESPRESSO (Gao Y et al., 2023), and IsoQuant (Prjibelski AD et al., 2023). All these tools begin by mapping the long reads to the genome using the Minimap2 (Li H, 2018) aligner in spliced alignment mode. However, the high error rate of ONT reads makes it challenging to precisely identify splice sites through spliced alignment alone. Therefore, these tools incorporate additional information to locate the splice sites accurately. FLAIR can correctly identify splice sites by either using alignments of short-read RNA-seq data or by using a reference annotation. ESPRESSO accepts novel splice junctions only if at least one read aligns perfectly to the reference genome within 10 nucleotides (nt) of the splice site, a stringent criterion that limits its ability to discover novel junctions. IsoQuant replaces novel splice sites with nearby annotated sites within a user-defined distance and restores short, skipped exons according to the reference annotation. For all these programs, misalignments can lead to incorrect identification of splice junctions, which may subsequently result in inaccurate transcript reconstruction.

Here, we present NIFFLR (Novel IsoForm Finder using Long Reads), a tool designed to construct and quantify both annotated and novel isoforms using a reference annotation and long RNA sequencing reads. Unlike other isoform identification tools, NIFFLR does not rely on a spliced aligner to map reads onto the reference genome. Instead, it extracts exons from the given annotation and aligns them directly to the long reads. NIFFLR then constructs transcripts by identifying an optimal path through the mapped exons for each long read, removes redundant transcripts that are contained within others, filters out transcripts with low read support, compares the predicted transcripts to the reference annotation, and finally quantifies both annotated and novel isoforms. For efficient exon-to-read alignment, NIFFLR uses a custom aligner based on a partial suffix array adapted from the MaSuRCA assembler (Zimin et al., 2013).

Methods

Implementation

We designed the NIFFLR algorithm to build transcripts (i.e., sequences of exons) by computing the optimal tiling of every long read using exons and transcripts provided as input. We require the following inputs: long RNA sequencing reads in FASTQ format, a reference genome sequence file in FASTA format, and a reference annotation file in GTF format.

First, we extract the exon sequences from the reference genome using the annotation and output them into a FASTA file. The name of each exon encodes the chromosome name, start and end position on the chromosome, the name of the gene to which the exon belongs, and its orientation. We reverse complement all exon sequences that are on the reverse strand.
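The exon extraction step can be illustrated with a minimal Python sketch. It is not NIFFLR's actual code: the function names, the in-memory genome dictionary, and the colon-separated exon name layout are assumptions made for illustration; the text only states which pieces of information the exon name encodes.

```python
# Hypothetical sketch of exon extraction; names and ID layout are illustrative only.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def exon_records(genome, gtf_lines):
    """Yield (name, sequence) for each distinct exon feature in GTF lines.

    genome: dict mapping chromosome name to its sequence (assumed in memory).
    """
    seen = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        gene = "unknown"
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id"):
                gene = attr.split(" ", 1)[1].strip('"')
        # Assumed name layout: chromosome, start, end, gene, orientation.
        name = f"{chrom}:{start}:{end}:{gene}:{strand}"
        if name in seen:
            continue  # exons shared by several transcripts are emitted once
        seen.add(name)
        seq = genome[chrom][start - 1:end]  # GTF coordinates are 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        yield name, seq

genome = {"chr1": "AAACCCGGGTTTACGT"}
gtf = [
    'chr1\tsrc\texon\t4\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t10\t12\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
records = list(exon_records(genome, gtf))
```

In this toy input the forward-strand exon is written as is, while the reverse-strand exon is reverse complemented before output.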

We then use a version of a technique first utilized in the MaSuRCA assembler (Zimin et al., 2013) to efficiently compute approximate alignments of exons to the long reads. This alignment technique, which we refer to as psa_aligner, is based on a partial suffix array (PSA). The PSA is designed to efficiently compute approximate alignments, or alignment intervals, between two sets of DNA sequences. The psa_aligner first builds a partial suffix array from a concatenated string S containing the sequences of all exons, separated by the letter ‘N’ (note that no ‘N’ characters are allowed in the reference sequence). We also record the starting position of each exon in S. Unlike a traditional suffix array, the PSA limits the suffix size to a predefined value K. The suffix array allows us to quickly locate all occurrences of a given subsequence of length K (a K-mer) within S, and thus identify all exons and positions where a particular K-mer occurs. We then examine each K-mer in a given long read and compute all the longest common subsequences (LCS) of K-mers between the read and the exons, using a default value of K = 12. The approximate alignment coordinates are then determined by calculating the best linear fit between the positions of K-mers belonging to the LCS in the read and on the exon. We only retain alignments where matching K-mers cover at least 35% of the bases within the match interval. Each alignment provides alignment start and end positions, along with the exon and read overhangs, as shown in Figure 1. For each exon, we record the number of K-mers in the LCS, the alignment start and end positions, and the implied start and end on the read. The implied start is calculated as alignment_start − a_overhang, and the implied end is calculated as alignment_stop + b_overhang.
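The following Python sketch illustrates the idea of K-mer seeding, the 35% coverage filter, and the implied coordinates. It is a simplified stand-in, not the psa_aligner: a dictionary replaces the partial suffix array, the LCS computation and linear fit are replaced by the minimum and maximum matching positions, and we assume that a_overhang and b_overhang are the unmatched exon prefix and suffix. All function names are hypothetical.

```python
K = 5  # small K for the toy example; the text uses K = 12 by default

def kmer_index(seq, k):
    """Map every K-mer of seq to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index

def shared_kmers(read, exon, k):
    """Return (read_pos, exon_pos) pairs for read K-mers occurring once in the exon."""
    index = kmer_index(exon, k)  # dictionary stand-in for the partial suffix array
    pairs = []
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], [])
        if len(hits) == 1:
            pairs.append((i, hits[0]))
    return pairs

def approximate_alignment(pairs, exon_len, k, min_cover=0.35):
    """Derive approximate and implied coordinates from matching K-mer pairs."""
    if not pairs:
        return None
    read_start = min(r for r, _ in pairs)
    read_end = max(r for r, _ in pairs) + k      # exclusive end on the read
    exon_start = min(e for _, e in pairs)
    exon_end = max(e for _, e in pairs) + k      # exclusive end on the exon
    covered = set()
    for r, _ in pairs:
        covered.update(range(r, r + k))
    coverage = len(covered) / (read_end - read_start)
    if coverage < min_cover:
        return None                              # too few matching bases in the interval
    a_overhang = exon_start                      # assumed: unmatched exon prefix
    b_overhang = exon_len - exon_end             # assumed: unmatched exon suffix
    return {
        "alignment_start": read_start,
        "alignment_stop": read_end,
        "implied_start": read_start - a_overhang,
        "implied_end": read_end + b_overhang,
        "coverage": coverage,
    }

exon = "ACGTTGCATCGGATCCTAGA"
read = "TTTTT" + exon[2:18] + "CCCCC"   # the read contains the exon minus 2 bases per end
pairs = shared_kmers(read, exon, K)
aln = approximate_alignment(pairs, len(exon), K)
```

Here the matching interval on the read is extended by the two unmatched bases on each side of the exon, giving the implied interval that a full-length exon alignment would occupy.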

Figure 1. Definitions of alignment coordinates.

After building the alignments, we assign each long read to a gene locus using a “majority vote” approach. Specifically, for each read, we compute the total number of K-mers in all LCSs for all matching exons from different gene loci and assign the read to the locus L whose exons have the highest total number of matching K-mers. Alignments of any exons that belong to different gene loci are then discarded. Next, we build the transcript matching the read by finding the best tiling of the read using exons that belong to locus L. The best tiling maximizes coverage of the read while minimizing gaps or overlaps in the implied alignment coordinates. The long read defines a 5’ to 3’ forward direction, specifying a topological order. We sort the aligning exons in the order of their “alignment start” coordinates if aligned in the forward direction, or “alignment end” coordinates if aligned in the reverse direction. Because we keep only alignments of exons that belong to a single gene locus L, the exons must all align in either the forward or the reverse direction. For simplicity, below we describe the algorithm assuming all exons are aligned in the forward direction; the reverse case is treated the same way, by reversing the long read.
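The majority vote can be sketched in a few lines of Python. The tuple layout and function name are hypothetical, and the tie-breaking behavior (the first locus encountered wins) is an assumption, since the text does not specify how ties are resolved.

```python
from collections import defaultdict

def assign_locus(alignments):
    """Pick the locus with the most matching K-mers and keep only its alignments.

    alignments: list of (locus, exon_name, n_kmers) tuples for one read.
    """
    votes = defaultdict(int)
    for locus, _, n_kmers in alignments:
        votes[locus] += n_kmers
    best = max(votes, key=votes.get)  # ties: first locus seen wins (assumption)
    kept = [a for a in alignments if a[0] == best]
    return best, kept

read_alignments = [
    ("geneA", "exonA1", 40),
    ("geneB", "exonB1", 55),
    ("geneA", "exonA2", 30),
]
locus, kept = assign_locus(read_alignments)
```

Although the single best exon belongs to geneB, geneA collects more matching K-mers in total, so the read is assigned to geneA and the geneB alignment is discarded.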

We represent the exon tiling problem as a graph, where nodes represent exons and edges are defined by gaps or overlaps of 20 bases or less between the implied end of an exon and the implied start of the following exon in the topological order. Next, we choose the “starting” nodes that are not connected on the left. A starting node must be connected on the right and have an “alignment start” closest to the 5’ end of the long read or fully cover it. If multiple exons share the same “alignment start” coordinate due to alternative splicing, we select the exon with the smallest 5’ “overhang”. If the 5’ overhang is the same for more than one exon, we use all such exons as alternative start nodes. We solve the exon tiling problem by finding the longest path through the graph, starting from any start node that minimizes the penalty, defined as the average gap/overlap size between connected exons in the path. In case of a tie, we select the path that maximizes the sum of exon matching lengths minus the sum of the overhangs of the first and last exons. Figure 2 illustrates an example of such a path. Once the longest path is identified, we examine the genomic coordinates of the exons, which are encoded in their sequence IDs. We eliminate the path if there is an overlap between the genomic coordinates of the exons in the path, which could indicate that the long read is chimeric or that there is a significant local genome rearrangement that NIFFLR cannot handle.

Figure 2. An illustration of the optimal path of exons through a long transcriptomic read (shown in green).

Shading shows the alignment regions. Arrows indicate links. The best path shown in red is the longest path that minimizes the gap/overlap/overhang penalty. Exon1 is chosen as the start exon because exon1 + exon3 have a longer alignment than exon2. Exon5 is alternatively spliced compared to exon6 and exon7, and its longest match is the same as exon6’s, shorter than exon6 and exon7 combined, and hence not selected for the optimal path. Exon2 is alternatively spliced as well.
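A toy version of the exon tiling search is sketched below in Python. It reflects one possible reading of the description, not the NIFFLR implementation: start nodes are simply exons with no incoming link, paths are enumerated exhaustively (suitable only for small graphs), and candidate paths are ranked by number of exons, then by lowest average gap/overlap, then by total match length. Overhang handling and the genomic-overlap check are omitted, and all names are hypothetical.

```python
MAX_GAP = 20  # maximum gap or overlap (in bases) allowed between linked exons

def best_tiling(exons):
    """Return the exon names on the best path.

    exons: list of dicts with keys name, implied_start, implied_end, match_len,
    sorted by alignment start on the read.
    """
    n = len(exons)
    succ = {i: [] for i in range(n)}
    has_pred = set()
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(exons[j]["implied_start"] - exons[i]["implied_end"])
            if gap <= MAX_GAP:
                succ[i].append((j, gap))
                has_pred.add(j)

    best_key, best_path = None, []

    def walk(path, gaps):
        nonlocal best_key, best_path
        if not succ[path[-1]]:  # the path cannot be extended further
            avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
            match = sum(exons[i]["match_len"] for i in path)
            key = (len(path), -avg_gap, match)  # assumed ranking of candidate paths
            if best_key is None or key > best_key:
                best_key, best_path = key, list(path)
            return
        for j, gap in succ[path[-1]]:
            walk(path + [j], gaps + [gap])

    for s in range(n):
        if s not in has_pred:  # simplified rule for choosing start nodes
            walk([s], [])
    return [exons[i]["name"] for i in best_path]

exons = [
    {"name": "e1", "implied_start": 0, "implied_end": 100, "match_len": 100},
    {"name": "e2", "implied_start": 98, "implied_end": 200, "match_len": 100},
    {"name": "e3b", "implied_start": 201, "implied_end": 300, "match_len": 95},
    {"name": "e3a", "implied_start": 205, "implied_end": 300, "match_len": 90},
]
path = best_tiling(exons)
```

In this example e3a and e3b are alternative exons; both extend the path to three exons, and e3b is preferred because it leaves a smaller average gap.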

We convert the best path of exons for each read into a plausible transcript and then group reads that yield the same transcript. For each transcript, we record the reads contributing to it, along with the minimum of the average gap/overlap penalty (A_min) and the minimum of the maximum gap/overlap penalty (G_min) across all paths of reads that yielded the transcript. In subsequent steps, we use only those transcripts where A_min < 5 and G_min < 15. These values are empirically obtained parameters and they yielded the best performance in our experiments with simulated reads.
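Grouping reads by transcript and applying the A_min and G_min thresholds could look like the Python sketch below. The input tuple layout and the function name are hypothetical; only the thresholds (A_min < 5 and G_min < 15) come from the text.

```python
from collections import defaultdict

def group_and_filter(read_paths, max_avg=5, max_gap=15):
    """Group reads by exon chain and keep chains with A_min < max_avg and G_min < max_gap.

    read_paths: list of (read_id, exon_chain, avg_gap, worst_gap) tuples, where
    exon_chain is a tuple of exon names and the gaps describe that read's path.
    """
    groups = defaultdict(
        lambda: {"reads": [], "A_min": float("inf"), "G_min": float("inf")}
    )
    for read_id, chain, avg_gap, worst_gap in read_paths:
        group = groups[chain]
        group["reads"].append(read_id)
        group["A_min"] = min(group["A_min"], avg_gap)    # minimum of average penalties
        group["G_min"] = min(group["G_min"], worst_gap)  # minimum of maximum penalties
    return {
        chain: group
        for chain, group in groups.items()
        if group["A_min"] < max_avg and group["G_min"] < max_gap
    }

read_paths = [
    ("r1", ("e1", "e2"), 1.0, 3),
    ("r2", ("e1", "e2"), 6.0, 20),
    ("r3", ("e1", "e3"), 7.5, 12),
]
kept = group_and_filter(read_paths)
```

Because the minima are taken across all reads that yield a transcript, one clean read (r1) is enough to retain the transcript e1-e2 even though r2 tiles it poorly, while e1-e3 is dropped.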

We then use the GffCompare tool to create a set of maximal transcripts by removing those whose intron chains are contained in longer assembled transcripts. We call this set of transcripts “non-redundant”. Next, we perform the first round of transcript quantification, using all originally assembled transcripts to assign reads to the non-redundant transcripts based on containment. Reads from assembled transcripts, which are contained in multiple maximal transcripts, are distributed proportionally to the size of the container transcripts. For each maximal transcript, we calculate the following:

1. The number of reads supporting the transcript.
2. The minimum read coverage across all intron junctions.
3. The total number of junctions covered by at least one read.
4. The portion of the transcript covered by reads.
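The containment test and the proportional read assignment described above can be sketched in Python as follows. This is an illustration under stated assumptions: intron chains are represented as lists of (start, end) tuples, containment means that one chain appears as a contiguous run inside another, and the “size” of a container transcript is taken to be its length in bases. The function names are hypothetical.

```python
def is_contained(chain, container):
    """True if the intron chain occurs as a contiguous run inside the container chain."""
    n = len(chain)
    return n > 0 and any(
        container[i:i + n] == chain for i in range(len(container) - n + 1)
    )

def distribute_reads(n_reads, container_lengths):
    """Split n_reads among container transcripts in proportion to their lengths."""
    total = sum(container_lengths.values())
    return {name: n_reads * length / total for name, length in container_lengths.items()}

short_chain = [(100, 200)]
long_chain = [(50, 80), (100, 200), (300, 400)]
contained = is_contained(short_chain, long_chain)
shares = distribute_reads(10, {"T1": 3000, "T2": 1000})
```

With these inputs, ten reads from a contained transcript are split 7.5 to 2.5 between a 3,000-base and a 1,000-base container.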

By design, all intron junctions are covered in the maximal set. After quantification, we perform a transcript recovery step where we attempt to recover reference annotation transcripts that are likely present in the sample, but their intron chains are not completely covered by any long read. If a maximal transcript is contained within a reference transcript, we tentatively replace the contained transcript with the containing transcript from the reference. We then perform quantification again and eliminate multi-exon reference transcripts where none of the intron junctions are spanned by long reads, which means that only one exon had reads aligned to it. These reference transcripts are unlikely to be present in the sample. This procedure is designed to eliminate computed isoforms whose intron chains are contained in the reference transcripts, as these are unlikely to represent genuinely novel isoforms and are likely sequenced from known transcripts. Next, we identify novel transcripts (i.e., those not present in the reference) and apply stricter filtering criteria, requiring the minimum average gap in the exon paths to be less than 2 and the minimum of the maximum gap to be less than 5. This yields the final set of transcripts, containing both novel and known transcripts, which we then again quantify to produce the final set of quantified transcripts.

Operation

NIFFLR is designed to run under a 64-bit Linux operating system. It requires at least 16 GB of RAM and supports multi-core, multi-threaded hardware environments. The NIFFLR code consists of shell and Python scripts and C++ code. We provide installation instructions for NIFFLR on GitHub: https://github.com/alguoo314/NIFFLR. Basic usage of NIFFLR is as follows: /path/nifflr.sh -r genome.fasta -f reads.fastq -g genome.gtf.

Results

In this section, we compare NIFFLR to other similar methods such as FLAIR2, IsoQuant, and ESPRESSO, and discuss the results of applying NIFFLR to ONT data from the Genotype-Tissue Expression (GTEx) project (Glinos et al., 2022). We performed two evaluations to compare NIFFLR to the existing methods. First, we assessed the performance of each program on a set of simulated ONT direct RNA sequencing reads. Next, we tested all programs on a sample from the GTEx project that was sequenced using both Illumina and ONT technologies.

Comparison on simulated long reads

We simulated reads using NanoSim software (Yang et al., 2017) from the human reference genome GRCh38.p14 and its corresponding RefSeq genome annotation (RS_2024_08). We derived read error profiles from ONT reads of GTEx sample 1192X, which was sequenced with both Illumina RNA-seq and ONT technologies. We used the Illumina reads from the same sample to generate an expression profile for the simulation. Our simulated data set contained approximately 7.8 million reads with an average error rate of 8.7% and an N50 read length of 944 bp. According to the NanoSim output, the simulated set contained 50,748 unique expressed transcripts.

All programs in this comparison allow the use of a reference annotation to identify and correct splice junctions, and we provided such annotation in all our experiments. Note that FLAIR and IsoQuant have options allowing them to run without annotation, but their accuracy is higher if annotation is provided. To make the evaluation more realistic, we split the reference annotation into a “core” set of transcripts, which is the set with the smallest number of transcripts where each exon was present at least once (referred to as the known set), and the rest of the transcripts (referred to as the novel set). By design, the core set contained every reference donor and acceptor splice site at least once. We provided the core set but not the novel set to all programs. This way we ensured that some portion of the expressed transcripts were not present in the input set of the reference transcripts, enabling us to measure the programs’ ability to discover and quantify novel transcripts in addition to the known transcripts. Our simulated set consisted of reads simulated from 50,748 transcripts, of which 33,686 comprised the core set and the remaining 17,062 comprised the novel set. In our experiments, we measured the number of novel and known transcripts correctly recovered by the programs, as well as the number of false positive transcripts, using the GffCompare tool (Pertea & Pertea, 2020). False positives were defined as any transcripts output by the programs that did not have a complete intron chain match to a transcript in the known or novel set. Table 1 shows the comparison of the programs on the simulated data. NIFFLR has the best sensitivity in recovering known, novel, and all isoforms, and the best overall F1 score, while only losing to IsoQuant in precision. NIFFLR recovers the most isoforms from both the known and novel sets while keeping the number of spurious isoforms relatively low. This result suggests that when novel isoform discovery and quantification are the primary goals, NIFFLR is the best of the tools compared here.

Table 1. Performance of the assembly and quantification pipelines on simulated data.

The best values are in bold. NIFFLR recovers the most novel isoforms and the most isoforms total (32,711) while keeping the number of erroneous isoforms lower than FLAIR2 and ESPRESSO, resulting in the best sensitivity and F1 score for isoform recovery. IsoQuant is the most conservative and the least sensitive for known isoform discovery.

	# of novel isoforms	Sn for novel isoforms	# of known isoforms	Sn for known isoforms	# of all correct isoforms	Sn for all isoforms	Pr for all isoforms	F1 for all isoforms	# of spurious isoforms
All simulated transcripts	17062	100.0%	33686	100.0%	50748	100.0%	100.0%	100	0
FLAIR2	4988	29.2%	15529	46.1%	20517	40.4%	54.8%	46.5	41777
IsoQuant	1926	11.3%	19629	58.3%	21555	42.5%	98.1%	59.3	964
NIFFLR	5153	30.2%	27558	81.8%	32711	64.5%	73.5%	68.7	7961
ESPRESSO	1490	8.7%	20750	61.6%	22240	43.8%	67.7%	53.2	24198
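As a quick consistency check of Table 1, the short Python sketch below recomputes sensitivity from the isoform counts and the F1 score as the harmonic mean of sensitivity (Sn) and precision (Pr). We assume the standard F1 definition; the table does not state the formula explicitly. The Sn and Pr values are copied from the table.

```python
def sensitivity(correct, total):
    """Percentage of simulated transcripts that were recovered."""
    return 100.0 * correct / total

def f1_score(sn, pr):
    """Harmonic mean of sensitivity and precision (both in percent)."""
    return 2 * sn * pr / (sn + pr)

# (Sn for all isoforms, Pr for all isoforms) as reported in Table 1
table1 = {
    "FLAIR2": (40.4, 54.8),
    "IsoQuant": (42.5, 98.1),
    "NIFFLR": (64.5, 73.5),
    "ESPRESSO": (43.8, 67.7),
}
f1 = {tool: round(f1_score(sn, pr), 1) for tool, (sn, pr) in table1.items()}
nifflr_sn = round(sensitivity(32711, 50748), 1)
```

The recomputed F1 values agree with the F1 column of the table for all four tools.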

We compared the read counts computed by each program for every transcript to the actual counts from the simulation. Figure 3a presents box-and-whisker plots of the ratios (expressed as base-2-logarithms) of the actual and computed counts for each transcript. The box spans the upper and lower quartile of the ratios and the whiskers represent the range for 95% of the values, with individual outliers outside of the 95% interval shown as dots. NIFFLR has a tighter distribution than FLAIR and ESPRESSO, though it is slightly outperformed by IsoQuant. ESPRESSO shows the worst overall performance, both in terms of the distribution’s tightness and bias. Figure 3b shows a more detailed comparison of the ratios between the computed counts from NIFFLR and IsoQuant, compared to the actual counts for the subset of 18,686 isoforms quantified by both tools. We observe that in this comparison the accuracy is nearly identical, with NIFFLR counts showing less overall bias. This figure suggests that the reason for the slightly lower accuracy (wider whiskers) of NIFFLR compared to IsoQuant in panel (a) is the inclusion of counts for many more transcripts by NIFFLR, capturing less reliable lower-count transcripts, which IsoQuant discards. In the simulated data comparison, NIFFLR demonstrates superior quantification accuracy and sensitivity overall.

Figure 3. (a) Box and whisker plots of the log2 ratios (y-axis) of the actual and computed read counts for each transcript for simulated reads.

The box spans the upper and lower quartile of the log2 ratios, and the whiskers represent 95% of the values, with individual outliers outside of the 95% interval shown as dots. IsoQuant and NIFFLR show the least variation from the true counts in the simulated data. (b) Box and whisker plots of the log2 ratios of the actual and computed read counts for each transcript from the set of 18,686 simulated transcripts quantified by both NIFFLR and IsoQuant. IsoQuant and NIFFLR show the same accuracy on this set of transcripts (the boxes and whiskers have the same height); however, NIFFLR counts have a smaller bias (the mean and the median for NIFFLR are closer to zero) and fewer outliers.
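The quantity plotted in Figure 3 can be computed as in the following Python sketch. The transcript names and counts are made up for illustration, and the function name is hypothetical; a log2 ratio of zero means the computed count equals the actual count.

```python
import math
import statistics

def log2_ratios(actual, computed):
    """log2(actual / computed) for transcripts with positive counts in both sets."""
    return {
        t: math.log2(actual[t] / computed[t])
        for t in actual
        if t in computed and actual[t] > 0 and computed[t] > 0
    }

actual = {"T1": 8, "T2": 4, "T3": 10, "T4": 5}
computed = {"T1": 4, "T2": 4, "T3": 20}  # T4 was not quantified, so it is skipped
ratios = log2_ratios(actual, computed)
median_ratio = statistics.median(ratios.values())
```

The median of the ratios measures bias, while the spread of the ratios (box and whisker heights in the figure) measures variation around the true counts.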

Comparison on a real data sample sequenced with both Illumina and ONT technologies

For this experiment, we selected the GTEX-1192X sample, which was sequenced with both Illumina and Oxford Nanopore instruments. The ONT data contained 7.6 million long reads with an N50 of 872 bp and a total sequence of 5.3 Gbp. In this dataset, the exact expression of existing and novel transcripts is unknown. However, we can estimate the number and abundances of the known transcripts from the Illumina RNA-seq data, which provides much deeper coverage of the sample. We used StringTie2 (Kovaka et al., 2019) in reference-guided mode to assemble the Illumina data, and this yielded 51,909 distinct transcript variants. The reference-guided mode of StringTie does not output any novel isoforms. Table 2 shows the number of total isoforms and known isoforms found by the four long-read quantification programs when using the ONT data. NIFFLR identified and quantified 43,093 transcripts that matched the reference, which was more than twice as many as any of the other pipelines. To evaluate the accuracy of the quantification, we compared the read counts computed by the programs to the transcript coverage values computed by StringTie on the Illumina data from the same sample. To adjust for the overall coverage difference, we multiplied the coverage values for the Illumina data by 1.59, corresponding to the ratio of the number of bases in the Illumina reads (8.5 Gbp) divided by the number of bases in the ONT reads (5.33 Gbp). Figure 4 presents box-and-whisker plots of the ratios (expressed as base-2-logarithms) of the scaled transcript coverages computed with StringTie from Illumina RNA-seq reads and the read counts computed with long-read pipelines from Oxford Nanopore reads for the same sample. The box spans the upper and lower quartile of the ratios and the whiskers represent the range for 95% of the values, with individual outliers outside of the 95% interval shown as dots. The quantification estimates produced by NIFFLR were the second most consistent with those of StringTie, outperformed slightly by IsoQuant. NIFFLR was the most sensitive, quantifying 26,312 isoforms found in the Illumina RNA-seq data by StringTie.
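The scaling factor and the per-transcript comparison can be reproduced with the short Python sketch below. The base totals are taken from the text; the function name and the example coverage values are hypothetical.

```python
import math

illumina_bases = 8.5e9   # total bases in the Illumina reads, from the text
ont_bases = 5.33e9       # total bases in the ONT reads, from the text
scale = illumina_bases / ont_bases  # rounds to the 1.59 factor used in the text

def log2_consistency(illumina_coverage, ont_read_count, factor=scale):
    """log2 ratio of scaled Illumina coverage to the long-read count for one transcript."""
    return math.log2(illumina_coverage * factor / ont_read_count)

example = log2_consistency(10.0, 10.0 * scale)
```

A value near zero indicates that the long-read count agrees with the scaled StringTie coverage for that transcript.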

Table 2. Performance of long-read transcriptome assembly and quantification methods on GTEx ONT data. NIFFLR recovers the largest number of reference isoforms.

	# of reference isoforms	# of total isoforms
FLAIR2	14,957	75,557
IsoQuant	17,183	17,183
NIFFLR	43,093	58,377
ESPRESSO	21,026	26,222

Figure 4. Comparison of scaled transcript coverages computed with StringTie from Illumina RNA-seq reads and the read counts computed with long-read pipelines from Oxford Nanopore reads for the same sample.

NIFFLR quantified 26,312 reference transcripts that were also quantified with StringTie, far more than the competing pipelines. IsoQuant counts are the most consistent with StringTie counts derived from Illumina data for the same sample, and NIFFLR counts are the second closest.

Isoform discovery with NIFFLR on 92 GTEx samples

We applied NIFFLR to identify and quantify isoforms in 92 ONT GTEx samples described in (Glinos et al., 2022), using the RefSeq annotation of GRCh38.p14 as the reference. Across all samples, we identified 135,343 known isoforms and 316,284 novel isoforms in 35,686 genes. Our high-confidence set comprises isoforms identified in two or more samples and includes 106,667 novel isoforms and 121,155 known isoforms across 32,875 genes. The number of isoforms identified by NIFFLR far exceeds the number reported with FLAIR (Glinos et al., 2022), which identified 93,718 transcripts across 21,067 genes, of which 77% were novel. Figure 5 illustrates the distribution of counts of novel isoforms across all samples. Interestingly, NIFFLR identified 13 novel isoforms that were present in all 92 samples. Three of these 13 isoforms are annotated in the CHESS annotation version 3.0.1 (Varabyou et al., 2023), or in the GENCODE annotation release 47, with one isoform present in both annotations. Table 3 shows the breakdown of novel and known transcripts found by NIFFLR in GTEx long-read data by tissue. As expected, the percentage of novel isoforms increases with the number of samples for a given tissue, as more rare isoforms are observed.

Figure 5. The number of novel isoforms discovered by NIFFLR vs. the number of samples in which these isoforms were found.

The total number of isoforms (known and novel) identified by NIFFLR in the 92 GTEx samples was 451,627. Of these, 223,805 were seen in only a single sample, and 13 novel isoforms were identified in all 92 samples.
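Selecting the high-confidence set (isoforms seen in two or more samples) amounts to counting, for each isoform, the number of samples in which it occurs. A minimal Python sketch follows; the sample and isoform identifiers are invented, and the function names are hypothetical.

```python
from collections import Counter

def sample_counts(per_sample_isoforms):
    """Count in how many samples each isoform was observed."""
    counts = Counter()
    for isoforms in per_sample_isoforms.values():
        counts.update(set(isoforms))  # count each isoform once per sample
    return counts

def high_confidence(counts, min_samples=2):
    """Isoforms observed in at least min_samples samples."""
    return {iso for iso, n in counts.items() if n >= min_samples}

per_sample = {
    "sample1": ["isoA", "isoB", "isoB"],
    "sample2": ["isoA", "isoC"],
    "sample3": ["isoA"],
}
counts = sample_counts(per_sample)
confident = high_confidence(counts)
```

The histogram of these per-isoform sample counts, restricted to novel isoforms, is what Figure 5 displays.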

Table 3. Breakdown of novel and known transcripts found by NIFFLR in GTEx long-read data by tissue.

The share of novel isoforms increases with the increase in the number of samples for a given tissue. We used all isoforms identified by NIFFLR for the counts shown in this table.

Tissue	# Samples	Novel Transcripts	Known Transcripts	Percent Novel Transcripts
Adipose	1	9,273	30,159	23.5
Brain	22	113,294	103,644	52.2
Breast	1	8,391	32,940	20.3
Cultured Fibroblasts	22	156,097	103,354	60.2
Heart	16	71,024	86,431	45.1
K562 (Human Chronic Myelogenous Leukemia cell line)	4	22,056	33,677	39.6
Liver	8	46,781	68,622	40.5
Lung	8	73,414	84,373	46.5
Muscle	9	76,409	75,407	50.3
Pancreas	1	9,313	32,101	22.5
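The last column of Table 3 follows directly from the two transcript counts. The Python sketch below recomputes it, assuming the percentage is novel / (novel + known) rounded to one decimal place; the counts are copied from the table.

```python
def percent_novel(novel, known):
    """Share of novel transcripts among all transcripts, in percent."""
    return round(100.0 * novel / (novel + known), 1)

# (tissue, number of samples, novel transcripts, known transcripts) from Table 3
table3 = [
    ("Adipose", 1, 9273, 30159),
    ("Brain", 22, 113294, 103644),
    ("Breast", 1, 8391, 32940),
    ("Cultured Fibroblasts", 22, 156097, 103354),
    ("Heart", 16, 71024, 86431),
    ("K562", 4, 22056, 33677),
    ("Liver", 8, 46781, 68622),
    ("Lung", 8, 73414, 84373),
    ("Muscle", 9, 76409, 75407),
    ("Pancreas", 1, 9313, 32101),
]
percentages = {tissue: percent_novel(novel, known) for tissue, _, novel, known in table3}
```

Tissues represented by a single sample (Adipose, Breast, Pancreas) have the lowest shares of novel transcripts, and the two tissues with 22 samples have the highest.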

Discussion

In this manuscript, we describe a novel approach for the discovery and quantification of isoforms from long-read RNA sequencing data produced by Oxford Nanopore sequencing technology. The key difference between NIFFLR and other published programs with similar functionality is that NIFFLR aligns exons from the reference annotation directly to the reads, rather than performing spliced alignment of the reads to the genome. This approach works best for well-annotated genomes, such as the human genome, offering superior sensitivity in this case. However, NIFFLR can still be applied to genomes whose annotation is less reliable, after inferring potential exons from Illumina RNA-seq data using transcriptome assemblers such as StringTie.

Timings comparison

NIFFLR is generally fast enough for research use. As shown in Table 4, NIFFLR was slower than FLAIR2 and IsoQuant, but much faster than ESPRESSO on both simulated and real datasets. Most of the runtime for NIFFLR was spent on aligning exons to the long reads.

Table 4. Timings for the quantification software measured on the simulated and real data.

We ran all experiments on a 24-core Intel Xeon Gold server with 1 TB of RAM, using 24 threads. Time is in hours.

	IsoQuant	FLAIR2	NIFFLR	ESPRESSO
Simulated reads	0.7	1.3	1.9	45
GTEx sample	1.2	2.1	3.2	106

NIFFLR is written in shell script, Python, and C++ (the psa_aligner code). To simplify installation, we provide an install script that performs system checks and compiles all necessary executables. We have tested the installation on several popular Linux distributions including RedHat 7, 8, and 9, as well as Ubuntu 18, 20, and 22 LTS.

Software availability

• Software available from: https://github.com/alguoo314/NIFFLR
• Source code available from: https://github.com/alguoo314/NIFFLR
• Archived source code at time of publication: Zenodo doi 10.5281/zenodo.15585584
• License: GNU General Public License v3.0

Ethical considerations

Ethics and consent are not required.

Data availability

• The supplementary materials, transcript assembly and quantification results computed by NIFFLR from GTEx data are available on Zenodo.
• Zenodo: Supplementary information and transcripts assembled by NIFFLR software for 92 GTEx long-read transcriptome sequencing samples. doi 10.5281/zenodo.15585443.
• The project contains the following underlying data: Transcripts assembled by NIFFLR software for 92 GTEx long-read transcriptome sequencing samples along with the number of samples the transcripts were observed in. Supplementary materials: commands we used to run NIFFLR and competing software for comparisons are listed in the Supplementary Information.
• combined92.combined.chr.gtf – GTF format file (9-column tab separated text) containing assembled transcripts on human GRCh38 assembly, chromosomes identified with chromosome names.
• combined92.combined.gtf – GTF format file (9-column tab separated text) containing assembled transcripts on human GRCh38 assembly, chromosomes identified with NCBI RefSeq chromosome IDs.
• combined92.combined.min2sampl.gtf – GTF format file (9-column tab separated text) containing assembled transcripts found in at least two samples, on human GRCh38 assembly, chromosomes identified with NCBI RefSeq chromosome IDs.
• Supplementary materials.pdf – Supplementary materials for the manuscript titled “Assembly and quantification of transcripts from noisy long reads with NIFFLR.”

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgements

We thank Steven L. Salzberg, Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science and Biostatistics at Johns Hopkins University for help with editing the manuscript and obtaining funding for this project.

References

Gao Y, Wang F, Wang R, et al.: ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Science Advances. 2023 Jan 20; 9(3): eabq5072.
Glinos DA, Garborcauskas G, Hoffman P, et al.: Transcriptome variation in human tissues revealed by long-read sequencing. Nature. 2022 Aug 11; 608(7922): 353–359.
Kovaka S, Zimin AV, Pertea GM, et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec; 20: 273–278.
Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15; 34(18): 3094–3100.
Pertea G, Pertea M: GFF utilities: GffRead and GffCompare. F1000Res. 2020; 9.
Prjibelski AD, Mikheenko A, Joglekar A, et al.: Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 2023 Jul; 41(7): 915–918.
Tang AD, Soulette CM, van Baren MJ, et al.: Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 2020 Mar 18; 11(1): 1438.
Varabyou A, Sommer MJ, Erdogdu B, et al.: CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol. 2023 Oct 30; 24(1): 249.
Yang C, Chu J, Warren RL, et al.: NanoSim: nanopore sequence read simulator based on statistical characterization. GigaScience. 2017 Apr; 6(4): 1–6.
Zimin AV, Marçais G, Puiu D, et al.: The MaSuRCA genome assembler. Bioinformatics. 2013 Nov; 29(21): 2669–2677.

Competing interests

No competing interests were disclosed.

Grant information

This work was supported by National Science Foundation grant IOS-2432298 to Johns Hopkins University (PI Zimin, Co-PI Salzberg), and by National Institutes of Health grants to Johns Hopkins University R01-HG006677 (PI Salzberg) and R35-GM130151 (PI Salzberg). Zimin is a member of the Salzberg lab at JHU.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Published: 20 Jun 2025, 14:608

https://doi.org/10.12688/f1000research.164583.1

Copyright

© 2025 Guo A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Guo A, Pertea M and Zimin AV. Assembly and quantification of transcripts from noisy long reads with NIFFLR [version 1; peer review: 1 approved with reservations, 2 not approved]. F1000Research 2025, 14:608 (https://doi.org/10.12688/f1000research.164583.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Reviewer Report 04 Aug 2025

Yuan Gao, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, Beijing, China

Not Approved

https://doi.org/10.5256/f1000research.181114.r395536

Guo et al. proposed NIFFLR, a tool for assembling and quantifying transcripts using long-read RNA-seq data. However, the current manuscript does not provide sufficient evidence to demonstrate the novelty or efficiency of NIFFLR in analyzing long-read data. Their evaluation and conclusions are not convincing.
1. Most readers would be confused about the novelty or advances of NIFFLR. It is based on the alignment of constructed exon-exon junction sequences, a strategy used by many tools long ago. Does this strategy work for mini-exons that are shorter than 10 nt or 20 nt? This strategy also depends heavily on annotated exons. Can it identify any novel splice donors or acceptors? For tissues or organisms with incomplete annotation, how will this strategy be affected? All of the above need to be carefully evaluated and described in detail.
2. The aligner (psa_aligner) used by the authors needs comprehensive assessment, e.g. whether psa_aligner is suitable for noisy long-read data. An evaluation and comparison with the commonly used long-read aligner Minimap2 should be included.
3. The NIFFLR programming script is poorly written and difficult to install and use for analysis. For example, when I tried to run NIFFLR, I received an error message from Python (not NIFFLR) and could not finish the analysis. The error, “IndexError: list index out of range”, occurred during the “performing filtering and quantification” step.
4. The authors used simulated data to evaluate NIFFLR and other tools. However, this evaluation may produce biased results. Many reports have noted that existing simulators are not suitable for evaluation. For example, according to a recent paper published by Dr. Hagen Tilgner and colleagues (Mikheenko A, et al., 2022 [Ref 1]), NanoSim randomly selects a starting position in a transcript to simulate truncation based on a uniform distribution. However, a uniform distribution is not observed in real long-read data.
5. A direct comparison of the tools for transcript identification and quantification using real long-read data with ground truth needs to be included. Many previous studies used SIRV E2 for evaluation, which contains 69 synthesized transcript isoforms with different abundances. I actually tried to run NIFFLR to analyze SIRV data, but it failed. NIFFLR produced an "exon extraction failed" error when processing the GTF annotation of SIRV, while I did not encounter any errors when using other tools. This is another example of poor programming of NIFFLR.
6. I am stunned that the authors used transcript abundance from short-read data to evaluate the performance of long-read tools. The bias of short-read data has been widely reported. For example, a Nature Methods paper published by Chen et al. 2025 [Ref 2] provided important evidence on this. In addition to the bias, novel transcript isoforms in the data that are similar to annotated isoforms would also confuse short-read quantification.
7. The authors need to provide sufficient evidence before arbitrarily judging existing tools. For example, they described ESPRESSO’s strategy for identifying novel splice junctions as “a stringent criterion that limits its ability to discover novel junctions”. However, they did not provide any evidence regarding the sensitivity of detecting novel splice junctions. Novel splice junctions can be further divided into junctions with novel splice sites and junctions formed by novel combinations of annotated splice sites. Both types need to be compared between ESPRESSO and NIFFLR before the authors can draw such a conclusion.
8. Why was Bambu (Chen Y, et al., 2023 [Ref 3]) not included in the evaluation and comparison? Bambu was published online more than two years ago and has been widely used for long-read data analysis. To demonstrate the advantages of their tool, the authors will need to compare it with Bambu.
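
The distinction drawn in point 7 between the two kinds of novel junction can be made concrete with a short Python sketch. It assumes a junction is represented as a `(chrom, strand, donor, acceptor)` tuple and that the annotation is available as a set of such tuples; this representation, the function name, and the coordinates are hypothetical and do not come from ESPRESSO or NIFFLR.

```python
def classify_junction(junction, annotated_junctions):
    """Classify a splice junction relative to an annotation.

    junction: (chrom, strand, donor, acceptor) tuple (assumed representation).
    annotated_junctions: set of tuples in the same form.
    Returns "annotated", "novel_combination" (both splice sites are
    annotated, but not as a pair), or "novel_splice_site".
    """
    chrom, strand, donor, acceptor = junction
    if junction in annotated_junctions:
        return "annotated"
    # Collect every annotated donor and acceptor site separately.
    donors = {(c, s, d) for c, s, d, _ in annotated_junctions}
    acceptors = {(c, s, a) for c, s, _, a in annotated_junctions}
    if (chrom, strand, donor) in donors and (chrom, strand, acceptor) in acceptors:
        return "novel_combination"
    return "novel_splice_site"


# Made-up annotation with two junctions on one gene.
ANNOTATED = {
    ("chr1", "+", 100, 200),
    ("chr1", "+", 300, 400),
}

if __name__ == "__main__":
    for j in [("chr1", "+", 100, 200),   # known pair
              ("chr1", "+", 100, 400),   # known sites, new pairing
              ("chr1", "+", 100, 250)]:  # acceptor not annotated
        print(j, classify_junction(j, ANNOTATED))
```

Reporting sensitivity separately for the two novel classes would be one way to make the requested comparison between tools.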

Is the rationale for developing the new software tool clearly explained? Partly
Is the description of the software tool technically sound? No
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? No

References

1. Mikheenko A, Prjibelski A, Joglekar A, Tilgner H: Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns. Genome Research. 2022; 32 (4): 726-737
2. Chen Y, Davidson N, Wan Y, Yao F, et al.: A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025; 22 (4): 801-812
3. Chen Y, Sim A, Wan Y, Yeo K, et al.: Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nature Methods. 2023; 20 (8): 1187-1195

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics, Computational Biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 04 Aug 2025

Fairlie Reese, University of California, Irvine, California, USA

Not Approved

https://doi.org/10.5256/f1000research.181114.r393925

The authors present NIFFLR, a minimap2-free tool for the assembly and quantification of known and novel transcripts from long-read RNA-seq data. In the paper, they describe the method, which uses partial suffix arrays and k-mer matching of annotated exons to the long reads themselves, thus bypassing the potentially error-prone choices that are made when aligning noisy long reads at splice junctions. Overall, this seems like a really interesting and novel approach to a problem (errors in minimap2 alignments) that I’ve seen several times now in the field. However, I have several major concerns about the quality of the benchmarking and how the paper integrates into the overall field. I elaborate on my concerns below.
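
To illustrate the general idea of matching exon k-mers against a read, here is a minimal Python sketch. It is not NIFFLR's implementation: it uses a plain set of k-mers rather than a partial suffix array, and it assumes that "coverage" means the fraction of an exon's distinct k-mers found in the read, which this report does not specify. The default values k = 12 and 0.35 echo the parameters quoted in the first major concern; the sequences and function names are invented.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exon_kmer_coverage(exon, read, k=12):
    """Fraction of the exon's distinct k-mers that also occur in the read."""
    exon_kmers = kmers(exon, k)
    if not exon_kmers:
        return 0.0
    return len(exon_kmers & kmers(read, k)) / len(exon_kmers)


def exon_matches_read(exon, read, k=12, min_coverage=0.35):
    """Assumed decision rule: accept the exon if enough k-mers are shared."""
    return exon_kmer_coverage(exon, read, k) >= min_coverage


# Made-up sequences: the clean read contains the exon exactly; the noisy
# read carries a single substitution in the middle of the exon.
EXON = "ACGTACGTTAGCCGATAGGCTTAACG"
READ_CLEAN = "GG" + EXON + "CC"
READ_NOISY = "GG" + EXON[:13] + "T" + EXON[14:] + "CC"

if __name__ == "__main__":
    print(exon_kmer_coverage(EXON, READ_CLEAN))
    print(exon_kmer_coverage(EXON, READ_NOISY))
```

The noisy read shows why a threshold below 100% is needed: one substitution removes every k-mer that spans it, so short exons lose most of their k-mers to a single error.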

Major concerns:

1. It’s unclear why certain parameter choices were made for the implementation, such as k = 12 or coverage >= 35% at the level of the exon-to-read alignment, or Amin / Gmin at the level of choosing the optimal path. The authors mention that they performed optimization, but the results of these experiments are not included. Adding results showing that these settings are optimal would increase confidence in the tool and in these decisions.
2. One of my main concerns is that, based on my understanding of the method, the only novel transcripts that can be discovered by NIFFLR are those that use already-annotated exons. This could substantially limit its ability to identify biologically relevant novel isoforms, which frequently involve novel splice sites or exons and are described in previous long-read RNA-seq studies. The authors should consider alternative methods to include novel splicing as part of their novel isoform discovery.
3. Related to point 2, it is hard to assess the performance of the tool relative to other reported findings in the field because its output contains only novel-in-catalog (NIC) novel transcripts according to the very commonly used SQANTI classification (Pardo-Palacios F, et al., 2024 [Ref 1]). It is common in the field to examine the proportions of transcripts from each of these categories as a quality control metric.
4. My main concern in this paper, however, is the quality of the benchmarking. There have been many papers that have performed long-read RNA-seq benchmarking in the past and have employed various metrics to evaluate the results (Dong X, et al., 2023 [Ref 2]; Chen Y, et al., 2025 [Ref 3]; Pardo-Palacios F, et al., 2024 [Ref 4]). The manuscript would benefit from adopting such standard benchmarking strategies, which have become widely accepted. As it stands, it is very tough to contextualize its results in the field as a whole. Some more point-by-point recommendations would be at least the following:
   a. For the simulations, as referenced in point 2, it is unrealistic to expect that novel transcripts will only arise from novel combinations of known exons (NICs). The authors should consider using existing simulated novel transcript ground truth datasets, such as the one from LRGASP.
   b. For the quantification benchmarking, it is more common to report a correlation metric between the ground truth and the estimated quantification from the tool. This would make these results more interpretable and comparable to other benchmarking efforts in the field.
   c. Also related to the quantification benchmarking, there are no significance values reported for any of the pairwise comparisons, just written speculation on the visual appearance of the plots. Statistical analyses would increase confidence in these results.
   d. Finding more isoforms is not necessarily a metric of a “better” tool, but it is referenced as if it is in the text related to Figure 5. In fact, the percentage of reported “novel” transcripts for various GTEx tissues is surprisingly high, and the high number of transcripts could therefore indicate over-calling of novel transcripts and poor specificity. Instead, the authors should additionally overlap the discovered isoforms with those discovered by Glinos et al. in their original paper (or other external datasets) to see how well their method recapitulates what others have already reported about the dataset.
5. Would the authors be able to speculate on any specific use case (a specific cohort, a specific technology, etc.) where the exons-to-reads alignment approach might be especially beneficial compared to traditional minimap2-based approaches? This might add some insight to the discussion.
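
The correlation metric requested for the quantification benchmarking could be computed along the following lines. This is a minimal standard-library Python sketch, assuming per-transcript ground-truth and estimated counts are available as parallel lists; the numbers are made up, and the pseudocount of 1 is an illustrative choice rather than a recommendation from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def log_pearson(truth, estimate, pseudocount=1.0):
    """Pearson correlation of log2(count + pseudocount) values."""
    return pearson([math.log2(t + pseudocount) for t in truth],
                   [math.log2(e + pseudocount) for e in estimate])


# Made-up counts for five transcripts.
TRUTH = [10, 200, 0, 55, 1000]
ESTIMATE = [12, 180, 1, 60, 900]

if __name__ == "__main__":
    print("Spearman:", spearman(TRUTH, ESTIMATE))
    print("Pearson on log2 counts:", log_pearson(TRUTH, ESTIMATE))
```

Spearman correlation is insensitive to the scale of the counts, while the log-transformed Pearson value is closer to what a log-scaled scatterplot shows; reporting both is one reasonable option.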

Minor concerns:

1. As the development of the method is of central importance to the paper, the implementation could be expanded on or explained a bit more. In particular, the concepts of the “implied” starts / ends of the alignments were confusing in Figure 1 and the related text. Similarly, Figure 2 was overly complicated, and perhaps the authors could consider presenting the possible transcript paths separately as alternative transcripts in genome browser format (i.e., IGV or UCSC).
2. At the end of the first paragraph of main text on page 7, the authors state “This result demonstrates that when novel isoform discovery and quantification are the primary goals, NIFFLR is the best tool” when referring to the simulation experiment where they measured isoform detection / assembly only. This means this result has no relevance for quantification.
3. If the authors mean “false positive transcripts” by “spurious transcripts”, they should simply refer to them as the former, as there is no definition for “spurious” transcripts in the text.
4. In Table 2, IsoQuant has the same number of known and total isoforms, implying it found no novel isoforms. I am highly doubtful this is correct; it is probably a typo.
5. Figure 3 is missing y-axis labels. Furthermore, the meaning of the box and whiskers is elaborated on in the results, when it should just be in the legend. Additionally, Figure 3b is just a zoomed-in duplicate of two plots from 3a and is unnecessary.
6. For Figure 4, the authors describe a strange method of depth normalization to compare the long-read RNA-seq to short-read RNA-seq transcript quantification estimates. They should use normal TPM / CPM normalization. This figure also has unnecessary details about the plot in the results section which should be in the legend (same as in Figure 3).
7. The authors make no references to their supplementary material PDF in the main body of the text. They should include references so that readers know where to find the calls made to perform the benchmarking, etc.
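
For reference, the TPM / CPM normalization mentioned in the comment on Figure 4 can be sketched as follows. This minimal Python example uses the standard definitions (CPM scales counts to one million; TPM divides by transcript length first); the transcript IDs, counts, and lengths are invented, and the report does not say which of the two the reviewer would prefer for each data type.

```python
def cpm(counts):
    """Counts per million: each count scaled so that the total is 1e6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total count is zero")
    return {tid: c / total * 1e6 for tid, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates scaled to 1e6.

    lengths maps transcript ID to transcript length in bases (must be > 0).
    """
    rates = {}
    for tid, c in counts.items():
        if lengths[tid] <= 0:
            raise ValueError(f"non-positive length for {tid}")
        rates[tid] = c / lengths[tid]
    total = sum(rates.values())
    if total == 0:
        raise ValueError("total rate is zero")
    return {tid: r / total * 1e6 for tid, r in rates.items()}


# Made-up read counts and transcript lengths.
COUNTS = {"T1": 30, "T2": 70}
LENGTHS = {"T1": 1000, "T2": 2000}

if __name__ == "__main__":
    print(cpm(COUNTS))
    print(tpm(COUNTS, LENGTHS))
```

Both quantities sum to one million within a sample, which is what makes estimates from libraries of different depth comparable.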

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? No
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

References

1. Pardo-Palacios F, Arzalluz-Luque A, Kondratova L, Salguero P, et al.: SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods. 2024; 21 (5): 793-797
2. Dong X, Du M, Gouil Q, Tian L, et al.: Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nature Methods. 2023; 20 (11): 1810-1821
3. Chen Y, Davidson N, Wan Y, Yao F, et al.: A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025; 22 (4): 801-812
4. Pardo-Palacios F, Wang D, Reese F, Diekhans M, et al.: Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nature Methods. 2024; 21 (7): 1349-1363

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Long-read transcriptomics, including development of tools and their benchmarking for long-read transcriptomics analysis. I am less experienced in the field of alignment algorithms and cannot judge this implementation as thoroughly.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 04 Aug 2025

Colin Dewey, University of Wisconsin-Madison, Wisconsin, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.181114.r393921

The authors describe a novel method and associated software, NIFFLR, for identifying and quantifying expressed transcript structures (both known and novel) from long, noisy RNA sequencing data, such as that produced by Oxford Nanopore Technologies (ONT). A key challenge for methods addressing this task is the difficulty in identifying the precise locations of exon boundaries in the presence of frequent sequencing errors, particularly indels. The novel approach taken by NIFFLR is the converse of that of other methods: instead of aligning reads to the genome, NIFFLR aligns annotated exons within the genome to the read. NIFFLR uses a partial suffix array approach to efficiently align exons to the reads and a graph-based algorithm to identify the likely transcript structure for each read, followed by a series of heuristics to filter and quantify the transcripts. With both simulated and real data, NIFFLR's accuracy is compared to that of three other methods: FLAIR2, IsoQuant, and ESPRESSO. These experiments suggest that NIFFLR has high sensitivity and quantification accuracy comparable to the next best method. Runtime measurements show that NIFFLR runs in time comparable to the fastest methods.

Major comments:

1. A key limitation of the method is its reliance on a known set of exons: all predicted transcripts must be combinations of known exons. It appears that even slight variations in the boundaries of exons must be previously annotated for the method to predict transcripts with those variant exons. This general issue is brought up in the last two sentences of the discussion. The authors suggest that when the full exon set is not known, short-read data could be used to delineate the exons. That may be a feasible strategy, but it is not implemented or evaluated in this work, nor do the evaluations consider novel exons. This work would be strengthened by evaluations that include the more realistic scenario of novel exons (including 5' and 3' end variants of known exons). By definition, NIFFLR will not be able to detect or quantify transcripts including these exons, but the impact of these exons on the precision and quantification accuracy of other transcripts should be measured. Alternatively, the authors could implement the exon delineation strategy that they mentioned and examine NIFFLR's performance in combination with this strategy. Admittedly, the evaluation on the real data set potentially includes reads from transcripts with novel exons, but it is difficult to discern from the results presented how any novel exons impacted the method's performance.

2. A number of details of the method are omitted or unclear. A more formal and detailed presentation of the algorithm is needed. For example, (A) what algorithm is used to solve the "exon tiling problem"? (B) what, precisely, is the objective function? (C) is the algorithm guaranteed to find the optimal solution? (D) how is an edge allowed between overlapping exons? (E) what is the intuition behind distributing reads "proportionally to the size of the container transcripts"? Personally, I would like to see a more mathematical presentation and a figure depicting the steps of the entire procedure, particularly those detailed in the paragraph beginning "By design, all intron junctions...".

3. With respect to the software package, I was ultimately able to compile the software after installing the Boost and zlib development libraries. It would be helpful to have these compilation dependencies noted in the README. For ease of use by the community, it would also be helpful to have the package available via conda and/or Docker. Finally, please provide a small test dataset with the software such that users can make sure that their installation is working and can see what the inputs and outputs look like.

Minor comments:

4. From the filtering steps of the algorithm, could there be an output provided that might indicate the presence of novel exons at a given locus? For example, could the user be provided the number of reads that mapped to a locus but that were not ultimately assigned to a transcript?

5. For quantification evaluations, the box and whisker plots of log2 fold changes are helpful, but I would have also liked to have seen scatterplots of true vs. predicted counts (on log-scaled axes) along with correlation values, as has been common for short-read quantification methods. Such scatterplots help to visualize potential subsets of transcripts that have biased predictions and trends relative to the magnitude of expression.
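
The per-locus diagnostic suggested in comment 4 could look like the following minimal Python sketch. It assumes two hypothetical mappings, one from read ID to the locus the read mapped to and one from read ID to its assigned transcript (absent or `None` if the read was filtered out); neither structure is a documented NIFFLR output, and the IDs are invented.

```python
from collections import Counter


def unassigned_reads_per_locus(read_to_locus, read_to_transcript):
    """Count, per locus, the reads that mapped there but got no transcript.

    read_to_locus: dict of read ID -> locus ID (assumed input).
    read_to_transcript: dict of read ID -> transcript ID; a missing key or
    a None value means the read was not assigned to any transcript.
    """
    counts = Counter()
    for read_id, locus in read_to_locus.items():
        if read_to_transcript.get(read_id) is None:
            counts[locus] += 1
    return counts


# Made-up example: three reads at locus A, two at locus B.
READ_TO_LOCUS = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
READ_TO_TRANSCRIPT = {"r1": "A.1", "r2": None, "r4": "B.1", "r5": "B.1"}

if __name__ == "__main__":
    print(unassigned_reads_per_locus(READ_TO_LOCUS, READ_TO_TRANSCRIPT))
```

A locus with many unassigned reads relative to its assigned reads would be a candidate for closer inspection for unannotated exons.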

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Computational biology, bioinformatics, transcriptomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report

14 Views

04 Aug 2025 | for Version 1

Fairlie Reese, University of California, Irvine, California, USA

Not Approved

The authors present NIFFLR, a minimap2-free tool for the assembly and quantification of known and novel transcripts from long-read RNA-seq data. The method uses partial suffix arrays and k-mer matching to align annotated exons to the long reads themselves, thus bypassing the potentially error-prone choices that are made when aligning noisy long reads at splice junctions. Overall, this seems like a really interesting and novel approach to a problem (errors in minimap2 alignments) that I’ve seen several times now in the field. However, I have several major concerns about the quality of the benchmarking and how the paper integrates into the overall field. I elaborate on my concerns below.
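
To make the exon-to-read matching idea concrete, the following is a minimal sketch, not NIFFLR's implementation. It assumes that "coverage" means the fraction of exon bases covered by at least one k-mer that also occurs in the read, and it swaps in a plain hash set for the partial suffix array the paper describes. The function names and the way the k = 12 and 35% values (questioned in the major concerns) are applied are illustrative assumptions.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exon_kmer_coverage(exon, read, k=12):
    """Fraction of exon bases covered by a k-mer that also occurs in the read.

    Assumption: this is one plausible reading of "coverage"; the paper may
    define it differently.
    """
    if not exon:
        return 0.0
    read_kmers = kmers(read, k)
    covered = [False] * len(exon)
    for i in range(len(exon) - k + 1):
        if exon[i:i + k] in read_kmers:
            # Mark every base of the matching k-mer as covered.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(exon)


def passes_coverage_filter(exon, read, k=12, min_coverage=0.35):
    """Hypothetical filter: keep the exon if enough of it matches the read."""
    return exon_kmer_coverage(exon, read, k) >= min_coverage
```

Under these assumptions, a sweep over `k` and `min_coverage` on simulated reads would be one way to produce the optimization results the report asks for.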

Major concerns:

1. It’s unclear why certain parameter choices were made for the implementation, such as k = 12 or coverage >= 35% at the level of the exon-to-read alignment, or Amin / Gmin at the level of choosing the optimal path. The authors mention that they performed optimization, but the results of these experiments are not included. Adding results showing that these choices are optimal would increase confidence in the tool and in these decisions.
2. One of my main concerns is that, based on my understanding of the method, the only novel transcripts that can be discovered by NIFFLR are those that use already-annotated exons. This could substantially limit its ability to identify biologically relevant novel isoforms, which frequently involve novel splice sites or exons and are described in previous long-read RNA-seq studies. The authors should consider alternative methods to include novel splicing as part of their novel isoform discovery.
3. Related to point 2, it is hard to assess the performance of the tool relative to other reported findings in the field because it reports only novel-in-catalog (NIC) novel transcripts under the very commonly used SQANTI classification (Pardo-Palacios F, et al., 2024 [Ref 1]). It is common in the field to examine the proportions of transcripts from each of these categories as a quality control metric.
4. My main concern in this paper, however, is the quality of the benchmarking. Many papers have performed long-read RNA-seq benchmarking in the past and have employed various metrics to evaluate the results (Dong X, et al., 2023 [Ref 2], Chen Y, et al., 2022 [Ref 3], Pardo-Palacios F, et al., 2024 [Ref 4]). The manuscript would benefit from adopting such standard benchmarking strategies, which have become widely accepted. As it stands, it is very tough to contextualize its results in the field as a whole. Some more point-by-point recommendations would be at least the following:
   1. For the simulations, as referenced in point 2, it is unrealistic to expect that novel transcripts will only arise from novel combinations of known exons (NICs). The authors should consider using existing simulated ground truth datasets of novel transcripts, such as the one from LRGASP.
   2. For the quantification benchmarking, it is more common to report a correlation metric between the ground truth and the estimated quantification from the tool. This would make these results more interpretable and comparable to other benchmarking efforts in the field.
   3. Also related to the quantification benchmarking, there are no significance values reported for any of the pairwise comparisons, just written speculation on the visual appearance of the plots. Statistical analyses would increase the confidence in these results.
   4. Finding more isoforms is not necessarily a metric of a “better” tool, but is referenced as if it is in the text related to Figure 5. In fact, the percentage of reported “novel” transcripts for various GTEx tissues is surprisingly high, and therefore the high number of transcripts could be indicative of over-calling novel transcripts and therefore poor specificity. Instead, the authors should additionally overlap the discovered isoforms with those discovered by Glinos et al. in their original paper (or other external datasets) to see how well their method recapitulates what others have already said about the dataset.
5. Would the authors be able to speculate on any specific use case (a specific cohort, a specific technology, etc.) where the exons-to-reads alignment approach might be especially beneficial compared to traditional minimap2-based approaches? This might add some insight to the discussion.
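
The correlation metric and significance values requested for the quantification benchmarking could be computed along the following lines. This is a minimal standard-library sketch. The function names, the pseudocount of 1, and the choice of an exact two-sided sign test on paired per-transcript errors are illustrative assumptions, not analyses taken from the manuscript.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(true_counts, est_counts):
    """Spearman correlation between true and estimated read counts."""
    return pearson(average_ranks(true_counts), average_ranks(est_counts))


def abs_log2_error(true_counts, est_counts, pseudo=1.0):
    """Per-transcript |log2(estimated / true)| with a pseudocount (assumed)."""
    return [abs(math.log2((e + pseudo) / (t + pseudo)))
            for t, e in zip(true_counts, est_counts)]


def sign_test_p(errors_a, errors_b):
    """Exact two-sided sign test on paired errors of tools A and B; ties dropped."""
    wins_a = sum(a < b for a, b in zip(errors_a, errors_b))
    wins_b = sum(a > b for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The sign test is used here only because it is simple to state exactly. A Wilcoxon signed-rank test on the same paired errors would be a common alternative.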

Minor concerns:

As the development of the method is of central importance to the paper, the implementation could be expanded on or explained a bit more. In particular, the concepts of the “implied” starts / ends of the alignments were confusing in Figure 1 and related text. Similarly, Figure 2 was a bit overly complicated, and perhaps the authors could consider presenting the possible transcript paths separately as alternative transcripts in genome browser format (e.g., IGV or UCSC).
At the end of the first paragraph of main text on page 7, the authors state “This result demonstrates that when novel isoform discovery and quantification are the primary goals, NIFFLR is the best tool” when referring to the simulation experiment where they measured isoform detection / assembly only. This means this result has no relevance for quantification.
If the authors mean “false positive transcripts” by “spurious transcripts”, they should simply refer to them as the former, as there is no definition for “spurious” transcripts in the text.
In Table 2, IsoQuant has the same number of known and total isoforms, implying it found no novel isoforms. I am highly doubtful this is correct, and it is probably a typo.
Figure 3 is missing y-axis labels. Furthermore, the meaning of the box and whiskers is explained in the results, when it should just be in the legend. Additionally, Figure 3b is just a zoomed-in duplicate of two plots from 3a and is unnecessary.
For Figure 4, the authors describe a strange method of depth normalization to compare the long-read RNA-seq to short-read RNA-seq transcript quantification estimates. They should use normal TPM / CPM normalization. The results section also contains unnecessary details about this plot that should be in the legend (same as in Figure 3).
The authors make no references to their supplementary material PDF in the main body of the text. They should include references so that readers know where to find the calls made to perform the benchmarking etc.
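
For reference, the TPM / CPM normalization mentioned in the Figure 4 comment can be sketched as follows. This is a minimal illustration with hypothetical transcript identifiers and input dictionaries. It assumes total counts are nonzero and transcript lengths are positive, and it does not reproduce the normalization used in the manuscript.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size."""
    total = sum(counts.values())
    return {tx: c * 1e6 / total for tx, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    # Reads per kilobase of transcript.
    rates = {tx: counts[tx] / (lengths_bp[tx] / 1000) for tx in counts}
    scale = sum(rates.values())
    return {tx: r * 1e6 / scale for tx, r in rates.items()}
```

For long reads, where one read typically corresponds to one transcript molecule, CPM is the usual choice. The length normalization in TPM is mainly relevant to fragment-based short-read counts.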

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

No
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

References

1. Pardo-Palacios F, Arzalluz-Luque A, Kondratova L, Salguero P, et al.: SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods. 2024; 21 (5): 793-797.
2. Dong X, Du M, Gouil Q, Tian L, et al.: Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nature Methods. 2023; 20 (11): 1810-1821.
3. Chen Y, Davidson N, Wan Y, Yao F, et al.: A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025; 22 (4): 801-812.
4. Pardo-Palacios F, Wang D, Reese F, Diekhans M, et al.: Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nature Methods. 2024; 21 (7): 1349-1363.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Long-read transcriptomics, including development of tools and their benchmarking for long-read transcriptomics analysis. I am less experienced in the field of alignment algorithms and cannot judge this implementation as thoroughly.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report

13 Views

04 Aug 2025 | for Version 1

Colin Dewey, University of Wisconsin-Madison, Wisconsin, USA

Approved With Reservations

The authors describe a novel method and associated software, NIFFLR, for identifying and quantifying expressed transcript structures (both known and novel) from long, noisy RNA sequencing data, such as that produced by Oxford Nanopore Technologies (ONT). A key challenge for methods addressing this task is the difficulty in identifying the precise locations of exon boundaries in the presence of frequent sequencing errors, particularly indels. The novel approach taken by NIFFLR is the converse of that of other methods: instead of aligning reads to the genome, NIFFLR aligns annotated exons within the genome to the read. NIFFLR uses a partial suffix array approach to efficiently align exons to the reads and a graph-based algorithm to identify the likely transcript structure for each read, followed by a series of heuristics to filter and quantify the transcripts. With both simulated and real data, NIFFLR's accuracy is compared to that of three other methods: FLAIR2, IsoQuant, and ESPRESSO. These experiments suggest that NIFFLR has high sensitivity and quantification accuracy comparable to the next best method. Runtime measurements show that NIFFLR runs in time comparable to the fastest methods.

Major comments:

1. A key limitation of the method is its reliance on a known set of exons: all predicted transcripts must be combinations of known exons. It appears that even slight variations in the boundaries of exons must be previously annotated for the method to predict transcripts with those variant exons. This general issue is brought up in the last two sentences of the discussion. The authors suggest that when the full exon set is not known, short read data could be used to delineate the exons. That may be a feasible strategy, but it is not implemented or evaluated in this work, nor do the evaluations consider novel exons. This work would be strengthened by evaluations that include the more realistic scenario of novel exons (including 5' and 3' end variants of known exons). By definition, NIFFLR will not be able to detect or quantify transcripts including these exons, but the impact of these exons on the precision and quantification accuracy of other transcripts should be measured. Alternatively, the authors could implement the exon delineation strategy that they mentioned and examine NIFFLR's performance in combination with this strategy. Admittedly, the evaluation on the real data set potentially includes reads from transcripts with novel exons, but it is difficult to discern from the results presented how any novel exons impacted the method's performance.

2. A number of details of the method are omitted or unclear. A more formal and detailed presentation of the algorithm is needed. For example, (A) what algorithm is used to solve the "exon tiling problem"? (B) what, precisely, is the objective function? (C) is the algorithm guaranteed to find the optimal solution? (D) how is an edge allowed between overlapping exons? (E) what is the intuition behind distributing reads "proportionally to the size of the container transcripts"? Personally, I would like to see a more mathematical presentation and a figure depicting the steps of the entire procedure, particularly those detailed in the paragraph beginning "By design, all intron junctions...".

3. With respect to the software package, I was ultimately able to compile the software after installing the Boost and zlib development libraries. It would be helpful to have these compilation dependencies noted in the README. For ease of use by the community, it would also be helpful to have the package available via conda and/or Docker. Finally, please provide a small test dataset with the software such that users can make sure that their installation is working and can see what the inputs and outputs look like.

Minor comments:

4. From the filtering steps of the algorithm, could there be an output provided that might indicate the presence of novel exons at a given locus? For example, could the user be provided the number of reads that mapped to a locus but that were not ultimately assigned to a transcript?

5. For quantification evaluations, the box and whisker plots of log2 fold changes are helpful, but I would have also liked to have seen scatterplots of true vs. predicted counts (on log-scaled axes) along with correlation values, as has been common for short-read quantification methods. Such scatterplots help to visualize potential subsets of transcripts that have biased predictions and trends relative to the magnitude of expression.
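
The per-locus diagnostic suggested in comment 4 could be as simple as the sketch below. The two input mappings (read to locus, read to assigned transcript) are hypothetical data structures chosen for illustration. Nothing here reflects NIFFLR's actual intermediate files or outputs.

```python
from collections import Counter


def unassigned_reads_per_locus(read_to_locus, read_to_transcript):
    """Return {locus: (unassigned_reads, mapped_reads)}.

    read_to_locus: every read that matched exons at some locus.
    read_to_transcript: only the reads that were assigned to a transcript.
    """
    mapped = Counter(read_to_locus.values())
    unassigned = Counter(
        locus for read, locus in read_to_locus.items()
        if read not in read_to_transcript
    )
    return {locus: (unassigned[locus], mapped[locus]) for locus in mapped}
```

A locus where a large share of mapped reads remains unassigned would be a candidate for containing exons missing from the annotation.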

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Computational biology, bioinformatics, transcriptomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

[1] Gao Y, Wang F, Wang R, et al.: ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Science Advances. 2023 Jan 20; 9(3): eabq5072.

[2] Glinos DA, Garborcauskas G, Hoffman P, et al.: Transcriptome variation in human tissues revealed by long-read sequencing. Nature. 2022 Aug 11; 608(7922): 353–359.

[3] Kovaka S, Zimin AV, Pertea GM, et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec; 20: 273–278.

[4] Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15; 34(18): 3094–3100.

[5] Pertea G, Pertea M: GFF utilities: GffRead and GffCompare. F1000Res. 2020; 9.

[6] Prjibelski AD, Mikheenko A, Joglekar A, et al.: Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 2023 Jul; 41(7): 915–918.

[7] Tang AD, Soulette CM, van Baren MJ, et al.: Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 2020 Mar 18; 11(1): 1438.

[8] Varabyou A, Sommer MJ, Erdogdu B, et al.: CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol. 2023 Oct 30; 24(1): 249.

[9] Yang C, Chu J, Warren RL, et al.: NanoSim: nanopore sequence read simulator based on statistical characterization. GigaScience. 2017 Apr; 6(4): 1–6.

[10] Zimin AV, Marçais G, Puiu D, et al.: The MaSuRCA genome assembler. Bioinformatics. 2013 Nov; 29(21): 2669–2677.
