Background

F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.164583.1

Software Tool Article

Articles

Assembly and quantification of transcripts from noisy long reads with NIFFLR

[version 1; peer review: 1 approved with reservations, 2 not approved]

Alina Guo (1), https://orcid.org/0000-0002-6664-0145
Roles: Formal Analysis, Investigation, Methodology, Software, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

Mihaela Pertea (1, 2, 3), https://orcid.org/0000-0003-0762-8637
Roles: Investigation, Methodology, Validation, Writing – Review & Editing

Aleksey V Zimin (a, 1, 2), https://orcid.org/0000-0001-5091-3092
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

1 Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA

2 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, 21205, USA

3 Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, 21205, USA

a aleksey.zimin@gmail.com

No competing interests were disclosed.

20 6 2025

2025

608

13 6 2025

2025

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Long-read RNA sequencing technologies can produce complete or near-complete transcript sequences. Recently introduced methods for direct RNA and cDNA sequencing can provide a high-throughput strategy for the discovery of novel and rare gene isoforms. However, the high error rates of Oxford Nanopore Technologies (ONT) reads limit the ability to pinpoint splice site boundaries exactly when aligning reads to the genome.

Methods

In this paper, we present a novel tool called NIFFLR (Novel IsoForm Finder using Long Reads) that identifies and quantifies both known and novel isoforms using long-read RNA sequencing data. NIFFLR recovers known transcripts and assembles novel transcripts present in the data by aligning exons from a reference annotation to the long reads.

Results

NIFFLR effectively recovers correct transcripts from simulated reads based on known transcript annotations, achieving higher sensitivity and precision than several previously published tools. On real data, NIFFLR shows high accuracy as measured by concordance of isoform counts with the counts computed from Illumina data for the same sample. We applied NIFFLR to a set of 92 GTEx long-read samples and produced transcript counts for both novel and known isoforms. In total, we identified and quantified 121,155 isoforms present in the RefSeq annotation of GRCh38 and 106,667 high-confidence novel isoforms across 32,875 genes present in two or more samples in these data, more than previous studies identified in this data set.

Conclusions

NIFFLR is an effective tool for the assembly and quantification of transcripts present in long, high-error transcriptome reads. NIFFLR is released under an open-source license (GPL 3.0) and is available on GitHub at https://github.com/alguoo314/NIFFLR/releases.

Keywords: transcriptome, quantification, assembly, discovery, annotation

Directorate for Biological Sciences

IOS-2432298

NIH

R35-GM130151

NIH

R01-HG006677

This work was supported by National Science Foundation grant IOS-2432298 to Johns Hopkins University (PI Zimin, Co-PI Salzberg), and by National Institutes of Health grants to Johns Hopkins University R01-HG006677 (PI Salzberg) and R35-GM130151 (PI Salzberg). Zimin is a member of the Salzberg lab at JHU.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction

Direct RNA and cDNA sequencing technologies from Oxford Nanopore Technologies (ONT) produce long transcriptome reads with high yields at relatively low cost. However, the per-base error rates of ONT reads are still much higher than those of Illumina reads. Several computational tools have recently been developed to assemble transcripts and quantify isoforms in samples sequenced using ONT reads, including FLAIR (Tang et al., 2020), ESPRESSO (Gao et al., 2023), and IsoQuant (Prjibelski et al., 2023). All these tools begin by mapping the long reads to the genome using the Minimap2 (Li, 2018) aligner in spliced alignment mode. However, the high error rate of ONT reads makes it challenging to precisely identify splice sites through spliced alignment alone. Therefore, these tools incorporate additional information to locate the splice sites accurately. FLAIR corrects splice sites either by using alignments of short-read RNA-seq data or by using a reference annotation. ESPRESSO accepts novel splice junctions only if at least one read aligns perfectly to the reference genome within 10 nucleotides (nt) of the splice site, a stringent criterion that limits its ability to discover novel junctions. IsoQuant replaces novel splice sites with nearby annotated sites within a user-defined distance and restores short, skipped exons according to the reference annotation. For all these programs, misalignments can lead to incorrect identification of splice junctions, which may subsequently result in inaccurate transcript reconstruction.

Here, we present NIFFLR (Novel IsoForm Finder using Long Reads), a tool designed to construct and quantify both annotated and novel isoforms using a reference annotation and long RNA sequencing reads. Unlike other isoform identification tools, NIFFLR does not rely on a spliced aligner to map reads onto the reference genome. Instead, it extracts exons from the given annotation and aligns them directly to the long reads. NIFFLR then constructs transcripts by identifying an optimal path through the mapped exons for each long read, removes redundant transcripts that are contained within others, filters out transcripts with low read support, compares the predicted transcripts to the reference annotation, and finally quantifies both annotated and novel isoforms. For efficient exon-to-read alignment, NIFFLR uses a custom aligner based on a partial suffix array adapted from the MaSuRCA assembler (Zimin et al., 2013).

Methods

Implementation

We designed the NIFFLR algorithm to build transcripts (i.e., sequences of exons) by computing the optimal tiling of every long read using exons and transcripts provided as input. We require the following inputs: long RNA sequencing reads in FASTQ format, a reference genome sequence file in FASTA format, and a reference annotation file in GTF format.

First, we extract the exon sequences from the reference genome using the annotation and output them into a FASTA file. The name of each exon encodes the chromosome name, start and end position on the chromosome, the name of the gene to which the exon belongs, and its orientation. We reverse complement all exon sequences that are on the reverse strand.
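The exon extraction step can be illustrated with a minimal Python sketch. It is not NIFFLR's own code: the function names, the `chrom:start-end:gene:strand` naming scheme, and the choice to emit each distinct exon once are assumptions made for illustration. The text above only states that the exon name encodes the chromosome, coordinates, gene, and orientation, and that reverse-strand exons are reverse complemented.

```python
# Illustrative sketch only; names and the exon ID format are assumptions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_exons(genome, gtf_lines):
    """Yield (name, sequence) for each distinct exon in GTF-formatted lines.

    genome: dict mapping chromosome name -> sequence string.
    GTF coordinates are 1-based and inclusive.
    """
    seen = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        gene = "unknown"
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene = attr.split(" ", 1)[1].strip('"')
        name = f"{chrom}:{start}-{end}:{gene}:{strand}"
        if name in seen:  # the same exon can be listed under several transcripts
            continue
        seen.add(name)
        seq = genome[chrom][start - 1:end]
        if strand == "-":
            seq = reverse_complement(seq)
        yield name, seq
```

Writing the yielded pairs as `>name` and sequence lines would give a FASTA file of the kind described above.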

We then use a version of a technique first utilized in the MaSuRCA assembler (Zimin et al., 2013) to efficiently compute approximate alignments of exons to the long reads. This alignment technique, which we refer to as psa_aligner, is based on a partial suffix array (PSA). The PSA is designed to efficiently compute approximate alignments, or alignment intervals, between two sets of DNA sequences. The psa_aligner first builds a partial suffix array from a concatenated string S containing the sequences of all exons, separated by the letter ‘N’ (note that no ‘N’ characters are allowed in the reference sequence). We also record the starting position of each exon in S. Unlike a traditional suffix array, the PSA limits the suffix size to a predefined value K. The suffix array allows us to quickly locate all occurrences of a given subsequence of length K (or a K-mer) within S, and thus identify all exons and positions where a particular K-mer occurs. We then examine each K-mer in a given long read and compute all the longest common subsequences (LCS) of K-mers between the read and the exons, using a default value of K = 12. The approximate alignment coordinates are then determined by calculating the best linear fit between the positions of K-mers belonging to the LCS in the read and on the exon. We only retain alignments where matching K-mers cover at least 35% of the bases within the match interval. Each alignment provides alignment start and end positions, along with the exon and read overhangs, as shown in Figure 1. For each exon, we record the number of K-mers in the LCS, the alignment start and end positions, and the implied start and end on the read. The implied start is calculated as alignment_start − a_overhang, and the implied end is calculated as alignment_stop + b_overhang.
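The following Python sketch illustrates the idea of K-mer based approximate alignment for a single exon and a single read. It is a simplified stand-in, not the psa_aligner itself. It uses a plain dictionary index where the text describes a partial suffix array, and it takes the alignment interval from the first and last K-mers of the chain instead of a linear fit. It also assumes that a_overhang and b_overhang are the numbers of exon bases lying before and after the matched region. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def kmer_index(seq, k):
    """Map every K-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def longest_colinear_chain(pairs):
    """Longest chain of (read_pos, exon_pos) pairs increasing in both coordinates."""
    pairs = sorted(pairs, key=lambda p: (p[0], -p[1]))
    tails, tails_idx, prev = [], [], [-1] * len(pairs)
    for i, (_, exon_pos) in enumerate(pairs):
        j = bisect_left(tails, exon_pos)
        if j == len(tails):
            tails.append(exon_pos)
            tails_idx.append(i)
        else:
            tails[j] = exon_pos
            tails_idx[j] = i
        prev[i] = tails_idx[j - 1] if j > 0 else -1
    chain, i = [], (tails_idx[-1] if tails_idx else -1)
    while i != -1:
        chain.append(pairs[i])
        i = prev[i]
    return chain[::-1]


def align_exon(read, exon, k=12, min_cover=0.35):
    """Return approximate alignment coordinates of exon on read, or None."""
    index = kmer_index(exon, k)
    pairs = [(r, e) for r in range(len(read) - k + 1)
             for e in index.get(read[r:r + k], ())]
    chain = longest_colinear_chain(pairs)
    if not chain:
        return None
    start, end = chain[0][0], chain[-1][0] + k
    covered = set()
    for r, _ in chain:
        covered.update(range(r, r + k))
    if len(covered) / (end - start) < min_cover:
        return None  # matching K-mers cover too little of the interval
    a_overhang = chain[0][1]                      # exon bases before the match
    b_overhang = len(exon) - (chain[-1][1] + k)   # exon bases after the match
    return {"kmers": len(chain), "start": start, "end": end,
            "implied_start": start - a_overhang,
            "implied_end": end + b_overhang}
```

When a read contains only the 3' part of an exon, the implied start becomes negative, which is how an exon that extends past the end of the read would show up in this simplified model.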

Figure 1. Definitions of alignment coordinates.

After building the alignments, we assign each long read to a gene locus using a “majority vote” approach. Specifically, for each read, we compute the total number of K-mers in all LCSs for all matching exons from different gene loci and assign the read to the locus L whose exons have the highest total number of matching K-mers. Alignments of any exons that belong to different gene loci are then discarded. Next, we build the transcript matching the read by finding the best tiling of the read using exons that belong to locus L. The best tiling maximizes coverage of the read while minimizing gaps or overlaps in the implied alignment coordinates. The long read defines a 5’ to 3’ forward direction, specifying a topological order. We sort the aligning exons in the order of their “alignment start” coordinates if aligned in the forward direction, or “alignment end” coordinates if aligned in the reverse direction. Because we keep only alignments of exons that belong to a single gene locus L, the exons must all align in either the forward or the reverse direction. For simplicity, below we describe the algorithm assuming all exons are aligned in the forward direction; the reverse case is treated the same, by reversing the long read.
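The majority-vote assignment can be sketched in a few lines of Python. The record layout (dictionaries with `locus` and `kmers` keys) and the function name are assumptions for illustration. The text does not specify how ties between loci are resolved; this sketch simply keeps the first locus encountered.

```python
from collections import defaultdict


def assign_locus(alignments):
    """Pick the locus with the most matching K-mers and keep only its exons.

    alignments: list of dicts with keys 'locus' and 'kmers' (K-mers in the LCS).
    Returns (best_locus, alignments_from_best_locus).
    """
    totals = defaultdict(int)
    for aln in alignments:
        totals[aln["locus"]] += aln["kmers"]
    if not totals:
        return None, []
    best = max(totals, key=totals.get)  # ties: first locus seen wins (assumption)
    return best, [aln for aln in alignments if aln["locus"] == best]
```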

We represent the exon tiling problem as a graph, where nodes represent exons and edges are defined by gaps or overlaps of 20 bases or less between the implied end of an exon and the implied start of the following exon in the topological order. Next, we choose the “starting” nodes that are not connected on the left. A starting node must be connected on the right and have an “alignment start” closest to the 5’ end of the long read or fully cover it. If multiple exons share the same “alignment start” coordinate due to alternative splicing, we select the exon with the smallest 5’ “overhang”. If the 5’ overhang is the same for more than one exon, we use all such exons as alternative start nodes. We solve the exon tiling problem by finding the longest path through the graph, starting from any start node that minimizes the penalty, defined as the average gap/overlap size between connected exons in the path. In case of a tie, we select the path that maximizes the sum of exon matching lengths minus the sum of the overhangs of the first and last exons. Figure 2 illustrates an example of such a path. Once the longest path is identified, we examine the genomic coordinates of the exons, which are encoded in their sequence IDs. We eliminate the path if there is an overlap between the genomic coordinates of the exons in the path, which could indicate that the long read is chimeric or that there is a significant local genome rearrangement that NIFFLR cannot handle.
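A simplified Python sketch of the tiling step follows. It builds the graph from implied coordinates with the 20-base gap/overlap limit and then enumerates paths by brute force, which is adequate for the handful of exons that align to one read but is not how a production implementation would necessarily do it. Several points are assumptions: path length is measured as the total number of aligned read bases, ties are broken only by the smaller average gap/overlap, and the start-node selection rules, the overhang tie-break, and the genomic-overlap check are omitted. The function name and record layout are hypothetical.

```python
def best_tiling(exons, max_gap=20):
    """Return the names of the exons on the best path for one read.

    exons: list of dicts with 'name', 'start', 'end' (alignment coordinates
    on the read) and 'implied_start', 'implied_end'.
    """
    exons = sorted(exons, key=lambda e: e["start"])
    n = len(exons)
    # Edge i -> j when exon j follows exon i and their implied ends nearly abut.
    succ = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            gap = exons[j]["implied_start"] - exons[i]["implied_end"]
            if exons[j]["start"] > exons[i]["start"] and abs(gap) <= max_gap:
                succ[i].append((j, abs(gap)))

    def paths_from(i):
        """Yield (nodes, gaps) for every maximal path starting at node i."""
        if not succ[i]:
            yield [i], []
        for j, gap in succ[i]:
            for nodes, gaps in paths_from(j):
                yield [i] + nodes, [gap] + gaps

    has_pred = {j for i in succ for j, _ in succ[i]}
    best, best_key = None, None
    for s in range(n):
        if s in has_pred:  # start only from nodes not connected on the left
            continue
        for nodes, gaps in paths_from(s):
            matched = sum(exons[k]["end"] - exons[k]["start"] for k in nodes)
            penalty = sum(gaps) / len(gaps) if gaps else 0.0
            key = (matched, -penalty)  # longest first, then smallest penalty
            if best_key is None or key > best_key:
                best, best_key = nodes, key
    return [exons[k]["name"] for k in best] if best else []
```

In the test case used to check this sketch, an alternative first exon that leaves a 42-base gap to the next exon gets no outgoing edge, so the path through the fully matching first exon is selected.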

Figure 2. An illustration of the optimal path of exons through a long transcriptomic read (shown in green).

Shading shows the alignment regions. Arrows indicate links. The best path shown in red is the longest path that minimizes the gap/overlap/overhang penalty. Exon1 is chosen as the start exon because exon1 + exon3 have a longer alignment than exon2. Exon5 is alternatively spliced compared to exon6 and exon7, and its longest match is the same as exon6’s, shorter than exon6 and exon7 combined, and hence not selected for the optimal path. Exon2 is alternatively spliced as well.

We convert the best path of exons for each read into a plausible transcript and then group reads that yield the same transcript. For each transcript, we record the reads contributing to it, along with the minimum of the average gap/overlap penalty (A_min) and the minimum of the maximum gap/overlap penalty (G_min) across all paths of reads that yielded the transcript. In subsequent steps, we use only those transcripts where A_min < 5 and G_min < 15. These thresholds were obtained empirically and yielded the best performance in our experiments with simulated reads.
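The grouping and filtering described above can be sketched as follows. The input layout (one tuple per read holding the read ID, the exon path, and the list of gap/overlap sizes along that path) and the function name are assumptions for illustration; the thresholds are the ones quoted in the text.

```python
from collections import defaultdict


def group_and_filter(read_paths, a_max=5.0, g_max=15.0):
    """Group reads by exon path and keep transcripts with small gap penalties.

    read_paths: iterable of (read_id, exon_names, gaps), where gaps holds the
    absolute gap/overlap sizes between consecutive exons on that read's path.
    """
    groups = defaultdict(
        lambda: {"reads": [], "A_min": float("inf"), "G_min": float("inf")})
    for read_id, exon_names, gaps in read_paths:
        avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
        max_gap = max(gaps, default=0)
        group = groups[tuple(exon_names)]
        group["reads"].append(read_id)
        group["A_min"] = min(group["A_min"], avg_gap)   # best average over reads
        group["G_min"] = min(group["G_min"], max_gap)   # best maximum over reads
    return {path: g for path, g in groups.items()
            if g["A_min"] < a_max and g["G_min"] < g_max}
```

Because both statistics are minima over reads, a transcript is retained as soon as one supporting read tiles cleanly, and all reads that produced the same exon path remain attached to it.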

We then use the GffCompare tool to create a set of maximal transcripts by removing those whose intron chains are contained in longer assembled transcripts. We call this set of transcripts “non-redundant”. Next, we perform the first round of transcript quantification, using all originally assembled transcripts to assign reads to the non-redundant transcripts based on containment. Reads from assembled transcripts that are contained in multiple maximal transcripts are distributed proportionally to the size of the container transcripts. For each maximal transcript, we calculate the following:

1. The number of reads supporting the transcript.

2. The minimum read coverage across all intron junctions.

3. The total number of junctions covered by at least one read.

4. The portion of the transcript covered by reads.

By design, all intron junctions are covered in the maximal set. After quantification, we perform a transcript recovery step where we attempt to recover reference annotation transcripts that are likely present in the sample, but their intron chains are not completely covered by any long read. If a maximal transcript is contained within a reference transcript, we tentatively replace the contained transcript with the containing transcript from the reference. We then perform quantification again and eliminate multi-exon reference transcripts where none of the intron junctions are spanned by long reads, which means that only one exon had reads aligned to it. These reference transcripts are unlikely to be present in the sample. This procedure is designed to eliminate computed isoforms whose intron chains are contained in the reference transcripts, as these are unlikely to represent genuinely novel isoforms and are likely sequenced from known transcripts. Next, we identify novel transcripts (i.e., those not present in the reference) and apply stricter filtering criteria, requiring the minimum average gap in the exon paths to be less than 2 and the minimum of the maximum gap to be less than 5. This yields the final set of transcripts, containing both novel and known transcripts, which we then again quantify to produce the final set of quantified transcripts.
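The notion of intron-chain containment used in the redundancy removal and recovery steps can be illustrated with a short Python sketch. This is a simplified model and not GffCompare's implementation: it treats a chain as contained only when it appears as a contiguous run of identical introns in a longer chain, and it ignores strand and terminal exon boundaries. The function names are hypothetical.

```python
def intron_chain(exons):
    """Convert sorted (start, end) exon coordinates into a list of introns."""
    return [(exons[i][1] + 1, exons[i + 1][0] - 1)
            for i in range(len(exons) - 1)]


def chain_contained(inner, outer):
    """True if intron chain `inner` is a contiguous part of `outer`."""
    n, m = len(inner), len(outer)
    if n == 0 or n > m:
        return False
    return any(outer[i:i + n] == inner for i in range(m - n + 1))


def non_redundant(transcripts):
    """Names of transcripts whose intron chain is not inside a longer chain.

    transcripts: dict mapping name -> sorted list of (start, end) exons.
    """
    chains = {name: intron_chain(exons) for name, exons in transcripts.items()}
    kept = []
    for name, chain in chains.items():
        redundant = any(
            len(other) > len(chain) and chain_contained(chain, other)
            for other_name, other in chains.items() if other_name != name)
        if not redundant:
            kept.append(name)
    return kept
```

The same containment test, applied between an assembled maximal transcript and a reference transcript, corresponds to the condition under which the recovery step tentatively substitutes the reference transcript.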

Operation

NIFFLR is designed to run under a 64-bit Linux operating system. It requires at least 16 GB of RAM and supports multi-core, multi-threaded hardware environments. The NIFFLR code consists of shell and Python scripts and C++ code. We provide installation instructions for NIFFLR on GitHub: https://github.com/alguoo314/NIFFLR. Basic usage of NIFFLR is as follows: /path/nifflr.sh -r genome.fasta -f reads.fastq -g genome.gtf.

Results

In this section, we compare NIFFLR to other similar methods such as FLAIR2, IsoQuant, and ESPRESSO, and discuss the results of applying NIFFLR to ONT data from the Genotype-Tissue Expression (GTEx) project (Glinos et al., 2022). We performed two evaluations to compare NIFFLR to the existing methods. First, we assessed the performance of each program on a set of simulated ONT direct RNA sequencing reads. Next, we tested all programs on a sample from the GTEx project that was sequenced using both Illumina and ONT technologies.

Comparison on simulated long reads

We simulated reads using the NanoSim software (Yang et al., 2017) from the human reference genome GRCh38.p14 and its corresponding RefSeq genome annotation (RS_2024_08). We derived read error profiles from ONT reads of GTEx sample 1192X, which was sequenced with both Illumina RNA-seq and ONT technologies. We used the Illumina reads from the same sample to generate an expression profile for the simulation. Our simulated data set contained approximately 7.8 million reads with an average error rate of 8.7% and an N50 read length of 944 bp. According to the NanoSim output, the simulated set contained 50,748 unique expressed transcripts.

All programs in this comparison allow the use of a reference annotation to identify and correct splice junctions, and we provided such annotation in all our experiments. Note that FLAIR and IsoQuant have options allowing them to run without annotation, but their accuracy is higher if annotation is provided. To make the evaluation more realistic, we split the reference annotation into a “core” set of transcripts, which is the set with the smallest number of transcripts where each exon was present at least once (referred to as the known set), and the rest of the transcripts (referred to as the novel set). By design, the core set contained every reference donor and acceptor splice site at least once. We provided the core set but not the novel set to all programs. This way we ensured that some portion of the expressed transcripts were not present in the input set of the reference transcripts, enabling us to measure the programs’ ability to discover and quantify novel transcripts in addition to the known transcripts. Our simulated set consisted of reads simulated from 50,748 transcripts, of which 33,686 comprised the core set and the remaining 17,062 comprised the novel set. In our experiments, we measured the number of novel and known transcripts correctly recovered by the programs, as well as the number of false positive transcripts, using the GffCompare tool (Pertea & Pertea, 2020). False positives were defined as any transcripts output by the programs that did not have a complete intron chain match to a transcript in the known or novel set. Table 1 shows the comparison of the programs on the simulated data. NIFFLR has the best sensitivity in recovering known, novel, and all isoforms, and the best overall F1 score, while only losing to IsoQuant in precision. NIFFLR recovers the most isoforms from both the known and novel sets while keeping the number of spurious isoforms relatively low. This result indicates that, on this benchmark, NIFFLR is the strongest of the four tools when novel isoform discovery and quantification are the primary goals.

Table 1. Performance of the assembly and quantification pipelines on simulated data.

The best values are in bold. NIFFLR recovers the most novel isoforms and the most isoforms total (32,711) while keeping the number of erroneous isoforms lower than FLAIR2 and ESPRESSO, resulting in the best sensitivity and F1 score for isoform recovery. IsoQuant is the most conservative, with the highest precision and the fewest spurious isoforms, but it recovers far fewer novel isoforms than NIFFLR and FLAIR2.

	# of novel isoforms	Sn for novel isoforms	# of known isoforms	Sn for known isoforms	# of all correct isoforms	Sn for all isoforms	Pr for all isoforms	F1 for all isoforms	# of spurious isoforms
All simulated transcripts	17062	100.0%	33686	100.0%	50748	100.0%	100.0%	100	0
FLAIR2	4988	29.2%	15529	46.1%	20517	40.4%	54.8%	46.5	41777
IsoQuant	1926	11.3%	19629	58.3%	21555	42.5%	98.1%	59.3	964
NIFFLR	5153	30.2%	27558	81.8%	32711	64.5%	73.5%	68.7	7961
ESPRESSO	1490	8.7%	20750	61.6%	22240	43.8%	67.7%	53.2	24198
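As a worked check of Table 1, the sensitivity and F1 columns can be recomputed from the counts with a few lines of Python. This sketch assumes the usual definitions, sensitivity as correct isoforms divided by the 50,748 simulated transcripts and F1 as the harmonic mean of sensitivity and precision. Precision is taken from the table as reported and is not recomputed here.

```python
TOTAL_SIMULATED = 50748  # all simulated transcripts (known + novel)

# tool -> (number of correct isoforms, reported precision in percent)
TABLE1 = {
    "FLAIR2": (20517, 54.8),
    "IsoQuant": (21555, 98.1),
    "NIFFLR": (32711, 73.5),
    "ESPRESSO": (22240, 67.7),
}


def sensitivity(correct, total=TOTAL_SIMULATED):
    """Percentage of simulated transcripts that were correctly recovered."""
    return 100.0 * correct / total


def f1_score(sn, pr):
    """Harmonic mean of sensitivity and precision (both in percent)."""
    return 2.0 * sn * pr / (sn + pr)


for tool, (correct, pr) in TABLE1.items():
    sn = sensitivity(correct)
    print(f"{tool}: Sn = {sn:.1f}%, F1 = {f1_score(sn, pr):.1f}")
```

The recomputed values agree with the Sn and F1 columns of the table to one decimal place.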

We compared the read counts computed by each program for every transcript to the actual counts from the simulation. Figure 3a presents box-and-whisker plots of the ratios (expressed as base-2 logarithms) of the actual and computed counts for each transcript. The box spans the upper and lower quartile of the ratios and the whiskers represent the range for 95% of the values, with individual outliers outside of the 95% interval shown as dots. NIFFLR has a tighter distribution than FLAIR2 and ESPRESSO, though it is slightly outperformed by IsoQuant. ESPRESSO shows the worst overall performance, both in terms of the distribution’s tightness and bias. Figure 3b shows a more detailed comparison of the ratios between the computed counts from NIFFLR and IsoQuant, compared to the actual counts for the subset of 18,686 isoforms quantified by both tools. We observe that in this comparison the accuracy is nearly identical, with NIFFLR counts showing less overall bias. This figure suggests that the reason for the slightly lower accuracy (wider whiskers) of NIFFLR compared to IsoQuant in panel (a) is the inclusion of counts for many more transcripts by NIFFLR, capturing less reliable lower-count transcripts, which IsoQuant discards. In the simulated data comparison, NIFFLR therefore combines the highest sensitivity with quantification accuracy comparable to that of IsoQuant.

Figure 3. (a) Box and whisker plots of the log2 ratios (y-axis) of the actual and computed read counts for each transcript for simulated reads.

The box spans the upper and lower quartile of the log2 ratios, and the whiskers represent 95% of the values, with individual outliers outside of the 95% interval shown as dots. IsoQuant and NIFFLR show the least variation from the true counts in the simulated data. (b) Box and whisker plots of the log2 ratios of the actual and computed read counts for each transcript from the set of 18,686 simulated transcripts quantified by both NIFFLR and IsoQuant. IsoQuant and NIFFLR show the same accuracy on this set of transcripts (the boxes and whiskers have the same height); however, NIFFLR counts have smaller bias (the mean and the median for NIFFLR are closer to zero) and fewer outliers.

Comparison on a real data sample sequenced with both Illumina and ONT technologies

For this experiment, we selected the GTEX-1192X sample, which was sequenced with both Illumina and Oxford Nanopore instruments. The ONT data contained 7.6 million long reads with an N50 of 872 bp and a total sequence of 5.3 Gbp. In this dataset, the exact expression of existing and novel transcripts is unknown. However, we can estimate the number and abundances of the known transcripts from the Illumina RNA-seq data, which provides much deeper coverage of the sample. We used StringTie2 (Kovaka et al., 2019) in reference-guided mode to assemble the Illumina data, and this yielded 51,909 distinct transcript variants. The reference-guided mode of StringTie does not output any novel isoforms. Table 2 shows the number of total isoforms and known isoforms found by the four long-read quantification programs when using the ONT data. NIFFLR identified and quantified 43,093 transcripts that matched the reference, which was more than twice as many as any of the other pipelines. To evaluate the accuracy of the quantification, we compared the read counts computed by the programs to the transcript coverage values computed by StringTie on the Illumina data from the same sample. To adjust for the overall coverage difference, we multiplied the coverage values for the Illumina data by 1.59, corresponding to the ratio of the number of bases in the Illumina reads (8.5B bp) divided by the number of bases in the ONT reads (5.33B bp). Figure 4 presents box-and-whisker plots of the ratios (expressed as base-2 logarithms) of the scaled transcript coverages computed with StringTie from Illumina RNA-seq reads and the read counts computed with long-read pipelines from Oxford Nanopore reads for the same sample. The box spans the upper and lower quartile of the ratios and the whiskers represent the range for 95% of the values, with individual outliers outside of the 95% interval shown as dots. The quantification estimates produced by NIFFLR were the second most consistent with those of StringTie, outperformed slightly by IsoQuant. NIFFLR was the most sensitive, quantifying 26,312 isoforms found in the Illumina RNA-seq data by StringTie.
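The scaling factor and the per-transcript log2 ratio used in this comparison can be written out as a small Python sketch. The base counts are the ones quoted in the text. The function name is hypothetical, and the direction of the ratio (scaled Illumina coverage divided by the long-read count) is an assumption, since the text does not state which quantity is the numerator.

```python
import math

ILLUMINA_BASES = 8.5e9   # bases in the Illumina reads, as quoted in the text
ONT_BASES = 5.33e9       # bases in the ONT reads, as quoted in the text
SCALE = ILLUMINA_BASES / ONT_BASES  # approximately 1.59


def log2_ratio(illumina_coverage, ont_read_count, scale=SCALE):
    """log2 of scaled Illumina coverage over the long-read count.

    The numerator/denominator order is an assumption for illustration.
    """
    return math.log2(illumina_coverage * scale / ont_read_count)


print(f"scale factor = {SCALE:.2f}")
```

A value of zero means the scaled short-read coverage and the long-read count agree exactly, and each unit away from zero corresponds to a two-fold difference.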

Table 2. Performance of long-read transcriptome assembly and quantification methods on GTEx ONT data. NIFFLR recovers the largest number of reference isoforms.

	# of reference isoforms	# of total isoforms
FLAIR2	14,957	75,557
IsoQuant	17,183	17,183
NIFFLR	43,093	58,377
ESPRESSO	21,026	26,222

Figure 4. Comparison of scaled transcript coverages computed with StringTie from Illumina RNA-seq reads and the read counts computed with long-read pipelines from Oxford Nanopore reads for the same sample.

NIFFLR quantified 26,312 reference transcripts that were also quantified with StringTie, far more than the competing pipelines. IsoQuant counts are the most consistent with StringTie counts derived from Illumina data for the same sample, and NIFFLR counts are the second closest.

Isoform discovery with NIFFLR on 92 GTEx samples

We applied NIFFLR to identify and quantify isoforms in the 92 ONT GTEx samples described by Glinos et al. (2022), using the RefSeq annotation of GRCh38.p14 as the reference. Across all samples, we identified 135,343 known isoforms and 316,284 novel isoforms in 35,686 genes. Our high-confidence set comprises isoforms identified in two or more samples, and it includes 106,667 novel isoforms and 121,155 known isoforms across 32,875 genes. The number of isoforms identified by NIFFLR far exceeds the number reported with FLAIR by Glinos et al. (2022), who identified 93,718 transcripts across 21,067 genes, of which 77% were novel. Figure 5 illustrates the distribution of counts of novel isoforms across all samples. Interestingly, NIFFLR identified 13 novel isoforms that were present in all 92 samples. Three of these 13 isoforms are annotated in the CHESS annotation version 3.0.1 (Varabyou et al., 2023) or in the GENCODE annotation release 47, with one isoform present in both annotations. Table 3 shows the breakdown of novel and known transcripts found by NIFFLR in GTEx long-read data by tissue. As expected, the percentage of novel isoforms increases with the number of samples for a given tissue, because more samples give more opportunities to detect rare isoforms.
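The selection of the high-confidence set can be sketched as a count of the samples in which each isoform is observed. The input layout (a mapping from sample name to isoform identifiers) and the function name are assumptions; how identical isoforms are matched across samples is not modeled here.

```python
from collections import Counter


def high_confidence(per_sample_isoforms, min_samples=2):
    """Return isoforms observed in at least `min_samples` samples.

    per_sample_isoforms: dict mapping sample name -> iterable of isoform IDs.
    """
    counts = Counter()
    for isoforms in per_sample_isoforms.values():
        counts.update(set(isoforms))  # count each isoform once per sample
    return {iso for iso, n in counts.items() if n >= min_samples}
```

The same per-isoform sample counts would also give the distribution of isoforms by number of samples.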

Figure 5. The number of novel isoforms discovered by NIFFLR vs. the number of samples in which these isoforms were found.

The total number of isoforms (known and novel) identified by NIFFLR in the 92 GTEx samples was 451,627. Of these, 223,805 were seen in only a single sample, and 13 novel isoforms were identified in all 92 samples.

Table 3. Breakdown of novel and known transcripts found by NIFFLR in GTEx long-read data by tissue.

The share of novel isoforms increases with the increase in the number of samples for a given tissue. We used all isoforms identified by NIFFLR for the counts shown in this table.

Tissue	# Samples	Novel Transcripts	Known Transcripts	Percent Novel Transcripts
Adipose	1	9,273	30,159	23.5
Brain	22	113,294	103,644	52.2
Breast	1	8,391	32,940	20.3
Cultured Fibroblasts	22	156,097	103,354	60.2
Heart	16	71,024	86,431	45.1
K562 (Human Chronic Myelogenous Leukemia cell line)	4	22,056	33,677	39.6
Liver	8	46,781	68,622	40.5
Lung	8	73,414	84,373	46.5
Muscle	9	76,409	75,407	50.3
Pancreas	1	9,313	32,101	22.5

Discussion

In this manuscript, we describe a novel approach for the discovery and quantification of isoforms from long-read RNA sequencing data produced by Oxford Nanopore sequencing technology. The key difference between NIFFLR and other published programs with similar functionality is that NIFFLR aligns exons from the reference annotation directly to the reads, rather than performing spliced alignment of the reads to the genome. This approach works best for well-annotated genomes, such as the human genome, offering superior sensitivity in this case. However, NIFFLR can still be applied to genomes whose annotation is less reliable, after inferring potential exons from Illumina RNA-seq data using transcriptome assemblers such as StringTie.

Timings comparison

NIFFLR is generally fast enough for research use. As shown in Table 4, NIFFLR was slower than FLAIR2 and IsoQuant, but much faster than ESPRESSO on both simulated and real datasets. Most of the runtime for NIFFLR was spent on aligning exons to the long reads.

Table 4. Timings for the quantification software measured on the simulated and real data.

We ran all experiments on a 24-core Intel Xeon Gold server with 1 TB of RAM, using 24 threads. Time is in hours.

	IsoQuant	FLAIR2	NIFFLR	ESPRESSO
Simulated reads	0.7	1.3	1.9	45
GTEx sample	1.2	2.1	3.2	106

NIFFLR is written in shell script, Python, and C++ (the psa_aligner code). To simplify installation, we provide an install script that performs system checks and compiles all necessary executables. We have tested the installation on several popular Linux distributions including RedHat 7, 8, and 9, as well as Ubuntu 18, 20, and 22 LTS.

Software availability

• Software available from: https://github.com/alguoo314/NIFFLR

• Source code available from: https://github.com/alguoo314/NIFFLR

• Archived source code at time of publication: Zenodo doi 10.5281/zenodo.15585584

• License: GNU General Public License v3.0

Ethical considerations

Ethics and consent are not required.

Data availability

• The supplementary materials, transcript assembly and quantification results computed by NIFFLR from GTEx data are available on Zenodo.

• Zenodo: Supplementary information and transcripts assembled by NIFFLR software for 92 GTEx long-read transcriptome sequencing samples. doi 10.5281/zenodo.15585443.

• The project contains the following underlying data: Transcripts assembled by NIFFLR software for 92 GTEx long-read transcriptome sequencing samples along with the number of samples the transcripts were observed in. Supplementary materials: commands we used to run NIFFLR and competing software for comparisons are listed in the Supplementary Information.

• combined92.combined.chr.gtf – GTF format file (9-column tab separated text) containing assembled transcripts on human GRCh38 assembly, chromosomes identified with chromosome names.

• combined92.combined.gtf – GTF format file (9-column tab separated text) containing assembled transcripts on human GRCh38 assembly, chromosomes identified with NCBI RefSeq chromosome IDs.

• combined92.combined.min2sampl.gtf – GTF format file (9-column tab separated text) containing assembled transcripts found in at least two samples, on human GRCh38 assembly, chromosomes identified with NCBI RefSeq chromosome IDs.

• Supplementary materials.pdf – Supplementary materials for the manuscript titled “Assembly and quantification of transcripts from noisy long reads with NIFFLR.”

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgements

We thank Steven L. Salzberg, Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science and Biostatistics at Johns Hopkins University for help with editing the manuscript and obtaining funding for this project.

References

Gao, Wang, et al.: ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Science Advances. 2023 Jan 20;9(3):eabq5072. PMID: 36662851. doi: 10.1126/sciadv.abq5072. PMCID: PMC9858503.

Glinos, Garborcauskas, Hoffman, et al.: Transcriptome variation in human tissues revealed by long-read sequencing. Nature. 2022 Aug 11;608(7922):353–359. PMID: 35922509. doi: 10.1038/s41586-022-05035-y. PMCID: PMC10337767.

Kovaka, Zimin, Pertea, et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec;20:273–278. PMID: 31842956. doi: 10.1186/s13059-019-1910-1. PMCID: PMC6912988.

Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094–3100. PMID: 29750242. doi: 10.1093/bioinformatics/bty191. PMCID: PMC6137996.

Pertea, Pertea: GFF utilities: GffRead and GffCompare. F1000Res. 2020;9.

Prjibelski, Mikheenko, Joglekar, et al.: Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 2023 Jul;41(7):915–918. PMID: 36593406. doi: 10.1038/s41587-022-01565-y. PMCID: PMC10344776.

Tang, Soulette, van Baren, et al.: Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 2020 Mar 18;11(1):1438. PMID: 32188845. doi: 10.1038/s41467-020-15171-6. PMCID: PMC7080807.

Varabyou, Sommer, Erdogdu, et al.: CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol. 2023 Oct 30;24(1):249. PMID: 37904256. doi: 10.1186/s13059-023-03088-4. PMCID: PMC10614308.

Yang, Chu, Warren, et al.: NanoSim: nanopore sequence read simulator based on statistical characterization. GigaScience. 2017 Apr;6(4):1–6. PMID: 28327957. doi: 10.1093/gigascience/gix010. PMCID: PMC5530317.

Zimin, Marçais, Puiu, et al.: The MaSuRCA genome assembler. Bioinformatics. 2013 Nov;29(21):2669–2677. PMID: 23990416. doi: 10.1093/bioinformatics/btt476. PMCID: PMC3799473.

10.5256/f1000research.181114.r395536

Reviewer response for version 1

Yuan Gao, Referee

Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, Beijing, China

Competing interests: No competing interests were disclosed.

4 August 2025

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recommendation: reject

Guo et al. proposed NIFFLR, a tool for assembling and quantifying transcripts using long-read RNA-seq data. However, the current manuscript does not provide sufficient evidence to demonstrate the novelty or efficiency of NIFFLR in analyzing long-read data. Their evaluation and conclusions are not convincing.

1 Most readers would be confused about the novelty or advances of NIFFLR. It’s based on the alignment of constructed exon-exon junction sequences, a strategy used by many tools long ago. Does this strategy work for mini-exons that are shorter than 10 nt or 20 nt? This strategy also heavily depends on annotated exons. Can it identify any novel splice donors or acceptors? For tissues or organisms with incomplete annotation, how will this strategy be affected? All of the above need to be carefully evaluated and described in detail.

2 The aligner (psa_aligner) used by the authors needs comprehensive assessment, e.g. whether psa_aligner is suitable for noisy long-read data. An evaluation and comparison with the commonly used long-read aligner Minimap2 should be included.

3 The NIFFLR programming script is poorly written and difficult to install and use for analysis. For example, when I tried to run NIFFLR, I received an error message from Python (not NIFFLR) and could not finish the analysis. The error, “IndexError: list index out of range”, occurred during the “performing filtering and quantification” step.

4 The authors used simulated data to evaluate NIFFLR and other tools. However, this evaluation may result in biased results. Many reports have noted that existing simulators are not suitable for evaluation. For example, according to a recent paper published by Dr. Hagen Tilgner and colleagues (Mikheenko A, et al., 2022 [Ref 1]), NanoSim randomly selects a starting position in a transcript to simulate truncation based on a uniform distribution. However, in real long-read data, a uniform distribution cannot be observed.

5 A direct comparison of the tools for transcript identification and quantification using real long-read data with ground truth needs to be included. Many previous studies used SIRV E2 for evaluation, which contains 69 synthesized transcript isoforms with different abundances. I actually tried to run NIFFLR to analyze SIRV data, but it failed. NIFFLR produced an "exon extraction failed" error when processing the GTF annotation of SIRV, while I did not encounter any errors when using other tools. This is another example of poor programming of NIFFLR.

6 I am stunned that the authors used transcript abundance from short-read data to evaluate the performance of long-read tools. The bias of short-read data has been so widely reported. For example, a Nature Methods paper published by Chen et al. 2025 [Ref 2] provided important evidence on this. In addition to the bias, the novel transcript isoforms in the data that are similar to annotated isoforms would also confuse short-read quantification.

7 The authors need to provide sufficient evidence before arbitrarily judging existing tools. For example, they described ESPRESSO’s strategy for identifying novel splice junctions as “a stringent criterion that limits its ability to discover novel junctions”. However, they did not provide any evidence regarding the sensitivity of detecting novel splice junctions. Novel splice junctions can be further divided into junctions with novel splice sites and junctions as novel combination of annotated splice sites. Both types need to be compared between ESPRESSO and NIFFLR before the authors can draw such a conclusion.

8 Why was BamBu (Chen Y, et al., 2023 [Ref 3]) not included in the evaluation and comparison? BamBu was published online more than two years ago and has been widely used for long-read data analysis. To demonstrate the advantages of their tool, the authors will need to compare it with BamBu.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Is the rationale for developing the new software tool clearly explained?

Partly

Is the description of the software tool technically sound?

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly

Reviewer Expertise:

Bioinformatics, Computational Biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

References

1. Mikheenko A, et al.: Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns. Genome Research. 2022;32(4):726–737. DOI: 10.1101/gr.276405.121

2. Chen et al.: A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025;22(4):801–812. DOI: 10.1038/s41592-025-02623-4

3. Chen Y, et al.: Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nature Methods. 2023;20(8):1187–1195. DOI: 10.1038/s41592-023-01908-w

Aleksey Zimin

Competing interests: CONFLICT OF INTEREST NOTE: This reviewer, Yuan Gao, Chinese Academy of Sciences, Beijing, China, is in direct conflict with this publication. The reviewer is the lead author of ESPRESSO software, and NIFFLR is a direct competitor to that software. We cite the ESPRESSO publication in this manuscript.

24 October 2025

1. Most readers would be confused about the novelty or advances of NIFFLR. It’s based on the alignment of constructed exon-exon junction sequences, a strategy used by many tools long ago. Does this strategy work for mini-exons that are shorter than 10 nt or 20 nt? This strategy also heavily depends on annotated exons. Can it identify any novel splice donors or acceptors? For tissues or organisms with incomplete annotation, how will this strategy be affected? All of the above need to be carefully evaluated and described in detail.

Response: The novelty of NIFFLR lies in its approach of constructing transcripts from existing exons guided by long reads. NIFFLR shows significant improvements in sensitivity and quantification accuracy compared to existing tools. Since NIFFLR relies on existing exons, it cannot identify novel donor or acceptor sites; however, it can detect novel introns, i.e. novel donor-acceptor pairs. As we state in the manuscript, NIFFLR is designed for genomes with high-quality annotations, such as the human genome. The NIFFLR strategy works well for exons that are longer than the minimum alignment K-mer size (default value is 12). Any exon shorter than this value will not be detected in the alignment and will likely be skipped. Fortunately, the percentage of transcripts containing such mini-exons is small. The human RefSeq annotation (main chromosomes only) contains 379,021 exons, of which only 598 are shorter than 12 bp.
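As a rough illustration of how such a count could be obtained, the sketch below tallies exon records shorter than a chosen K in GTF-formatted text. The function name `count_short_exons` and the toy records are hypothetical and not part of NIFFLR. The sketch assumes GTF's 1-based inclusive coordinates and counts an exon shared by several transcripts only once, which may differ from how the figures above were derived.

```python
import io

def count_short_exons(gtf_lines, k=12):
    """Return (number of distinct exons, number shorter than k bases)."""
    exons = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip malformed lines and non-exon features
        # Key each exon by chromosome, start, end and strand.
        exons.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    # GTF coordinates are 1-based and inclusive, so length = end - start + 1.
    short = sum(1 for _, start, end, _ in exons if end - start + 1 < k)
    return len(exons), short

# Toy annotation (hypothetical): one 151-bp exon shared by two transcripts
# and one 9-bp mini-exon.
toy_gtf = io.StringIO(
    'chr1\ttoy\texon\t100\t250\t.\t+\t.\ttranscript_id "t1";\n'
    'chr1\ttoy\texon\t400\t408\t.\t+\t.\ttranscript_id "t1";\n'
    'chr1\ttoy\texon\t100\t250\t.\t+\t.\ttranscript_id "t2";\n'
)
total, short = count_short_exons(toy_gtf)
print(total, short)  # prints: 2 1
```

Passing an open annotation file instead of the toy `StringIO` object would work the same way, since the function only iterates over lines.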

2. The aligner (psa_aligner) used by the authors needs comprehensive assessment, e.g. whether psa_aligner is suitable for noisy long-read data. An evaluation and comparison with the commonly used long-read aligner Minimap2 should be included.

Response: psa_aligner is not a novel tool. It is used in the MaSuRCA assembler [REF] to map super-reads built from Illumina reads to noisy long reads. Evaluating or comparing alignment methods was not the focus of our study. In our tests, we found that Minimap2 lacked sufficient sensitivity, in particular missing alignments of short exons. We therefore chose to use psa_aligner instead, which produces only pseudo-alignments.

3. The NIFFLR programming script is poorly written and difficult to install and use for analysis. For example, when I tried to run NIFFLR, I received an error message from Python (not NIFFLR) and could not finish the analysis. The error, “IndexError: list index out of range”, occurred during the “performing filtering and quantification” step.

Response: Thank you for raising this concern and for sharing the error message. We reviewed the relevant part of the NIFFLR pipeline but were unable to replicate the issue. Given that the reported error, “IndexError: list index out of range”, occurred during the filtering and quantification step, the problem is likely due to unexpected formatting in the reference GFF/GTF file. The quantification.py script assumes that each line in the reference file follows the standard GFF/GTF structure, containing at least 9 tab-separated fields, with the final (ninth) field being a semicolon-delimited string of attributes (e.g., with the transcript ID as the first entry). If any line deviates from this structure, it can trigger an IndexError.

To improve stability and usability, we have significantly reworked the pipeline and released a new version, v2.0.0, which is described in the revised manuscript. This new version is easier to install and use and provides improved compatibility with input files that may slightly deviate from standard GTF/GFF specifications.
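The failure mode described in this response, indexing the ninth field of a line that has fewer than nine tab-separated fields, can be avoided with an explicit length check. The following is a minimal sketch of such defensive parsing for GTF-style attributes (`key "value";`). The function `parse_gtf_line` is hypothetical and is not the code in quantification.py or in NIFFLR v2.0.0.

```python
def parse_gtf_line(line):
    """Return (first eight fields, attribute dict), or None if the line is unusable."""
    if line.startswith("#") or not line.strip():
        return None  # comment or blank line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # Accessing fields[8] here is what would raise IndexError.
        return None
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return fields[:8], attributes

good = 'chr1\ttoy\texon\t100\t250\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";'
bad = "chr1 toy exon 100 250 . + ."  # space-separated, so only one field

print(parse_gtf_line(good)[1])  # prints: {'transcript_id': 't1', 'gene_id': 'g1'}
print(parse_gtf_line(bad))      # prints: None
```

Returning `None` lets the caller decide whether to skip the line or report it with its line number. GFF3-style `key=value` attributes would need a separate branch, which this sketch omits.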

4. The authors used simulated data to evaluate NIFFLR and other tools. However, this evaluation may result in biased results. Many reports have noted that existing simulators are not suitable for evaluation. For example, according to a recent paper published by Dr. Hagen Tilgner and colleagues (Mikheenko A, et al., 2022 [Ref 1]), NanoSim randomly selects a starting position in a transcript to simulate truncation based on a uniform distribution. However, in real long-read data, a uniform distribution cannot be observed.

Response: We agree that simulated data have limitations and may not perfectly capture all features of real long-read sequencing. However, simulated data are the only way to evaluate software on datasets where the ground truth is fully known. NanoSim is a peer-reviewed, widely used tool for generating simulated long reads, and its use is standard practice for benchmarking transcriptome assembly methods. Therefore, we believe it is appropriate to use NanoSim to produce simulated reads for the comparisons presented in this study.

5. A direct comparison of the tools for transcript identification and quantification using real long-read data with ground truth needs to be included. Many previous studies used SIRV E2 for evaluation, which contains 69 synthesized transcript isoforms with different abundances. I actually tried to run NIFFLR to analyze SIRV data, but it failed. NIFFLR produced an "exon extraction failed" error when processing the GTF annotation of SIRV, while I did not encounter any errors when using other tools. This is another example of poor programming of NIFFLR.

Response: We thank the reviewer for the suggestion. We have added an evaluation of the ability of different programs to identify isoforms in the SIRV E2 dataset to the Results section. The new NIFFLR version 2.0.0 now works correctly with the SIRV GTF file, resolving the previous “exon extraction failed” error.

6. I am stunned that the authors used transcript abundance from short-read data to evaluate the performance of long-read tools. The bias of short-read data has been so widely reported. For example, a Nature Methods paper published by Chen et al. 2025 [Ref 2] provided important evidence on this. In addition to the bias, the novel transcript isoforms in the data that are similar to annotated isoforms would also confuse short-read quantification.

Response: We understand why the reviewer is raising this objection, as biases of short-read data have indeed been widely reported. However, transcript quantification using short-read data has long been a standard in the field, and thousands of studies have successfully used short reads to quantify transcript expression (e.g. Romeo-Cardeillac et al., BMC Genomics, 2024; Kuehl et al., Nature, 2025; D'Sa et al., Sci. Adv., 2025). Widely used tools such as StringTie2 (Kovaka et al., Genome Biology, 2019), Salmon (Patro et al., Nat Methods, 2017) and Kallisto (Bray et al., Nat Biotechnol, 2016) all rely on short reads for quantification. Although short-read data can exhibit biases and novel isoforms similar to annotated transcripts may complicate quantification, short-read quantification still provides a reliable reference for assessing the accuracy of long-read quantification, as has been done in previous studies (Pardo-Palacios et al., 2023; Tang et al., 2020). We therefore consider it appropriate to use short-read transcript quantification as a practical and widely accepted benchmark for evaluating long-read tools, while acknowledging that no method is entirely free of bias.

7. The authors need to provide sufficient evidence before arbitrarily judging existing tools. For example, they described ESPRESSO’s strategy for identifying novel splice junctions as “a stringent criterion that limits its ability to discover novel junctions”. However, they did not provide any evidence regarding the sensitivity of detecting novel splice junctions. Novel splice junctions can be further divided into junctions with novel splice sites and junctions as novel combination of annotated splice sites. Both types need to be compared between ESPRESSO and NIFFLR before the authors can draw such a conclusion.

Response: We have removed the statement that ESPRESSO uses “a stringent criterion that limits its ability to discover novel junctions” from the introduction. For context, that statement was based not on empirical benchmarking but on the algorithmic description provided in the ESPRESSO manuscript. As described there, ESPRESSO classifies novel junctions into two types: junctions with novel splice sites (Novel Not in Catalog, NNC) and novel combinations of annotated splice sites (Novel In Catalog, NIC). According to the ESPRESSO manuscript, a novel transcript isoform is reported only if (i) each splice junction is supported by at least two perfectly aligned reads and (ii) the combination of junctions is not a substring of any other novel isoform. Perfectly aligned reads are defined as having no mismatches or indels within 10 nt of splice sites.

In contrast, NIFFLR does not rely on spliced alignments and does not impose strict local alignment quality requirements. Instead, it uses a custom approximate aligner to map reference exons directly to long reads and assembles transcript models by optimizing exon tilings. This makes NIFFLR more tolerant of sequencing errors and more flexible in capturing novel combinations of annotated exons. Subsequent filtering and quantification steps in NIFFLR help exclude spurious transcripts. We note, however, that NIFFLR does not detect novel splice sites, as its transcript models are built from a predefined set of reference exons.
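Rule (i) of the ESPRESSO criterion, as paraphrased in this response, can be restated as a small check. This is a toy restatement for illustration only and is not ESPRESSO's code. The function names, the representation of each read as a list of genomic positions of its mismatches or indels, and the treatment of a distance of exactly 10 nt as "within" are all assumptions.

```python
def is_perfectly_aligned(error_positions, donor, acceptor, window=10):
    """True if no mismatch or indel lies within `window` nt of either splice site."""
    return all(
        abs(pos - donor) > window and abs(pos - acceptor) > window
        for pos in error_positions
    )

def junction_supported(reads, donor, acceptor, min_reads=2, window=10):
    """True if at least `min_reads` reads are perfectly aligned around the junction."""
    perfect = sum(
        is_perfectly_aligned(errors, donor, acceptor, window) for errors in reads
    )
    return perfect >= min_reads

# Hypothetical junction with donor at 1000 and acceptor at 2000.
# Read 1 has an error 5 nt from the donor; reads 2 and 3 are clean near both sites.
reads = [[995], [900, 2100], []]
print(junction_supported(reads, donor=1000, acceptor=2000))      # prints: True
print(junction_supported(reads[:2], donor=1000, acceptor=2000))  # prints: False
```

Under such a rule, a junction covered only by reads with errors near the splice sites would not be reported, however many reads span it.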

8. Why was BamBu (Chen Y, et al., 2023 [Ref 3]) not included in the evaluation and comparison? BamBu was published online more than two years ago and has been widely used for long-read data analysis. To demonstrate the advantages of their tool, the authors will need to compare it with BamBu.

Response: We added Bambu to all evaluations.

10.5256/f1000research.181114.r393925

Reviewer response for version 1

Fairlie Reese, Referee (https://orcid.org/0000-0002-9240-0102)

University of California, Irvine, California, USA

Competing interests: No competing interests were disclosed.

4 August 2025

Recommendation: reject

The authors present NIFFLR, a minimap2-free tool for the assembly and quantification of known and novel transcripts from long-read RNA-seq data. In the paper, they describe NIFFLR, the developed method, which works using partial suffix arrays and kmer-matching of annotated exons to the long reads themselves; thus bypassing the potential error-prone choices that are made when aligning noisy long reads at splice junctions. Overall, this seems like a really interesting and novel approach to a problem (errors in minimap2 alignments) that I’ve seen several times now in the field. However, I have several major concerns about the quality of the benchmarking and how the paper integrates into the overall field. I elaborate on my concerns below.

Major concerns:

It’s unclear why certain parameter choices were made for the implementation, such as k = 12 or coverage >= 35%, at the level of the exon to read alignment or Amin / Gmin at the level of choosing the most optimal path. The authors mention that they performed optimization but the results of these experiments are not included. Adding these results showing that this performance is optimal would increase the confidence in the tool and these decisions taken.

One of my main concerns is that, based on my understanding of the method, the only novel transcripts that can be discovered by NIFFLR are those that use already-annotated exons. This could substantially limit its ability to identify biologically relevant novel isoforms, which frequently involve novel splice sites or exons and are described in previous long-read RNA-seq studies. The authors should consider alternative methods to include novel splicing as part of their novel isoform discovery.

Related to point 2, it is hard to assess the performance of the tool relative to other reported findings in the field because it only contains novel-in-catalog novel transcripts according to the very commonly-used SQANTI classification (Pardo-Palacios F, et al., 2024 [Ref 1]). It is common in the field to examine the proportions of transcripts from each of these categories as a quality control metric.

My main concern in this paper however is the quality of the benchmarking. There have been many papers that have performed long-read RNA-seq benchmarking in the past and have employed various metrics to evaluate the results: (Dong X, et al., 2023 [Ref 2], Chen Y, et al., 2022 [Ref 3], Pardo-Palacios F, et al., 2024 [Ref 4]),

The manuscript would benefit from adopting such standard benchmarking strategies that have become widely-accepted. As it stands, it is very tough to contextualize its results in the field as a whole. Some more point-by-point recommendations would be at least the following:

For the simulations, as referenced in point 2, it is unrealistic to expect that novel transcripts will only arise from novel combinations of known exons (NICs). The authors should consider using existing simulated novel transcript ground truth datasets that exist, such as the one from LRGASP.

For the quantification benchmarking, it is more common to report a correlation metric between the ground truth and the estimated quantification from the tool. This would make these results more interpretable and comparable to other benchmarking efforts in the field.

Also related to the quantification benchmarking, there are no significance values reported for any of the pairwise comparisons; just written speculation on the visual appearance of the plots. Statistical analyses would increase the confidence of these results.

Finding more isoforms is not necessarily a metric of a “better” tool, but is referenced as if it is in the text related to Figure 5. In fact, the percentage of reported “novel” transcripts for various GTEx tissues is surprisingly high and therefore the high number of transcripts could be indicative of over-calling novel transcripts and therefore poor specificity. Instead, the authors should additionally overlap the discovered isoforms with those discovered by Glinos et al. in their original paper (or other external datasets) to see how well their method recapitulates what others have already said about the dataset.

Would the authors be able to speculate on any specific use case (a specific cohort, a specific technology, etc) where the exons-to-reads alignment approach might be especially beneficial compared to traditional minimap2-based approaches? This might add some insight to the discussion.

Minor concerns:

As the development of the method is of central importance to the paper, the implementation could be expanded on or explained a bit more. In particular, the concepts of the “implied” starts / ends of the alignments were confusing in Figure 1 and related text. Similarly, Figure 2 was a bit overly-complicated, and perhaps the authors could consider presenting the possible transcript paths separately as alternative transcripts in genome browser format (ie IGV or UCSC).

At the end of the first paragraph of main text on page 7, the authors state “This result demonstrates that when novel isoform discovery and quantification are the primary goals, NIFFLR is the best tool” when referring to the simulation experiment where they measured isoform detection / assembly only. This means this result has no relevance for quantification.

If the authors mean “false positive transcripts” by “spurious transcripts”, they should simply refer to them as the latter as there is no definition for “spurious” transcripts in the text.

In table 2, IsoQuant has the same number of known and total isoforms; implying it found no novel isoforms. I am highly doubtful this is correct and is probably a typo.

Figure 3 is missing y axis labels. Furthermore, the meaning of the box and whiskers are elaborated on in the results, which should just be in the legend. Additionally, Figure 3b is just a zoomed in duplicate of two plots from 3a and is unnecessary.

For Figure 4 the authors describe a strange method of depth normalization to compare the long-read RNA-seq to short-read RNA-seq transcript quantification estimates. They should use normal TPM / CPM normalization. This figure also has unnecessary details about the plot in the results section which should be in the legend (same as in Figure 3).

The authors make no references to their supplementary material PDF in the main body of the text. They should include references so that readers know where to find the calls made to perform the benchmarking etc.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Reviewer Expertise:

Long-read transcriptomics, including development of tools and their benchmarking for long-read transcriptomics analysis. I am less experienced in the field of alignment algorithms and cannot judge this implementation as thoroughly.

References

1. Pardo-Palacios F, et al.: SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods. 2024;21(5):793–797. DOI: 10.1038/s41592-024-02229-2

2. Dong X, et al.: Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nature Methods. 2023;20(11):1810–1821. DOI: 10.1038/s41592-023-02026-3

3. Chen Y, et al.: A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods. 2025;22(4):801–812. DOI: 10.1038/s41592-025-02623-4

4. Pardo-Palacios F, et al.: Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nature Methods. 2024;21(7):1349–1363. DOI: 10.1038/s41592-024-02298-3

Aleksey Zimin

Competing interests: No competing interests were disclosed.

24 October 2025

Major concerns:

1. It’s unclear why certain parameter choices were made for the implementation, such as k = 12 or coverage >= 35%, at the level of the exon to read alignment or Amin / Gmin at the level of choosing the most optimal path. The authors mention that they performed optimization but the results of these experiments are not included. Adding these results showing that this performance is optimal would increase the confidence in the tool and these decisions taken.

Response: The default value of K=12 was chosen empirically because it provided the best balance between sensitivity and precision of the psa_aligner on multiple data sets. Values of 11 or lower introduced numerous false positive alignments and slowed down NIFFLR without improving sensitivity, whereas values above 12 resulted in missed alignments of short exons. In general, the optimal value of K depends on the read error rate and the minimum exon length in the annotation. For transcriptome sequencing data with lower error rates, K can be increased to 15–17 to achieve faster alignment. This information has now been added to our manuscript in section….
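The dependence of K on exon length and read error rate can be illustrated with a back-of-the-envelope model. The sketch below assumes independent per-base errors, which is a simplification and not the procedure used to tune K for NIFFLR. The helper names and the 10% and 2% error rates are hypothetical choices for illustration.

```python
def kmers_in_exon(exon_length, k):
    """Number of K-mers that an exon of the given length can contribute."""
    return max(0, exon_length - k + 1)

def p_kmer_error_free(k, error_rate):
    """Probability that a single K-mer is read without error (independent errors)."""
    return (1.0 - error_rate) ** k

for k in (12, 15, 17):
    print(
        f"K={k}: 11-bp exon K-mers={kmers_in_exon(11, k)}, "
        f"20-bp exon K-mers={kmers_in_exon(20, k)}, "
        f"P(error-free) at 10% error={p_kmer_error_free(k, 0.10):.3f}, "
        f"at 2% error={p_kmer_error_free(k, 0.02):.3f}"
    )
```

In this simplified model, an exon shorter than K contributes no K-mers at all, and each increase in K lowers the chance that any one K-mer survives sequencing errors, which is consistent with larger K being workable mainly for lower-error data.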

2. One of my main concerns is that, based on my understanding of the method, the only novel transcripts that can be discovered by NIFFLR are those that use already-annotated exons. This could substantially limit its ability to identify biologically relevant novel isoforms, which frequently involve novel splice sites or exons and are described in previous long-read RNA-seq studies. The authors should consider alternative methods to include novel splicing as part of their novel isoform discovery.

Response: We agree with the reviewer’s observation that using a known set of exons is a key limitation of NIFFLR. NIFFLR is designed to be used with well-annotated genomes, such as the human genome, where nearly all exons are known and most novel transcripts arise from alternative splicing. We have clearly stated this limitation in the manuscript. Despite this constraint, we believe NIFFLR provides tremendous value through its ability to quantify known isoforms and to detect and quantify novel alternatively spliced isoforms with high sensitivity and accuracy, rivalling the best published tools.

3. Related to point 2, it is hard to assess the performance of the tool relative to other reported findings in the field because it only contains novel-in-catalog novel transcripts according to the very commonly-used SQANTI classification (Pardo-Palacios F, et al., 2024 [Ref 1]). It is common in the field to examine the proportions of transcripts from each of these categories as a quality control metric.

Response: We designed our comparison of tools using simulated data so that not all reference transcripts present in the data were provided to the programs, including NIFFLR. This way we can evaluate each program’s ability to recover novel transcripts that are present in the sample, but absent from the supplied reference annotation. To measure precision, we report the number of false positive transcripts as well as the precision values in Table 1.
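For concreteness, false positives, precision and sensitivity in such a withheld-annotation experiment could be tallied as follows. The sketch assumes, hypothetically, that a transcript is identified by its exact intron chain and that two transcripts match only when their chains are identical; the comparison actually used for Table 1 may differ. All names and coordinates are illustrative.

```python
def precision_sensitivity(predicted, truth):
    """Return (true positives, false positives, precision, sensitivity)."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # predicted chains that are in the ground truth
    fp = len(predicted - truth)   # predicted chains absent from the ground truth
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return tp, fp, precision, sensitivity

# Each transcript is a tuple of introns, each intron a (donor, acceptor) pair.
truth = {
    ((100, 200), (300, 400)),
    ((100, 200), (500, 600)),
    ((100, 250),),
    ((700, 800),),
}
predicted = {
    ((100, 200), (300, 400)),
    ((100, 250),),
    ((100, 200), (300, 450)),  # not in the ground truth
}
tp, fp, precision, sensitivity = precision_sensitivity(predicted, truth)
print(tp, fp, round(precision, 3), sensitivity)  # prints: 2 1 0.667 0.5
```

Restricting `truth` to the transcripts withheld from the supplied annotation would give the same measures for novel transcripts only.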

4. My main concern in this paper however is the quality of the benchmarking. There have been many papers that have performed long-read RNA-seq benchmarking in the past and have employed various metrics to evaluate the results: (Dong X, et al., 2023 [Ref 2], Chen Y, et al., 2022 [Ref 3], Pardo-Palacios F, et al., 2024 [Ref 4]),

4a. For the simulations, as referenced in point 2, it is unrealistic to expect that novel transcripts will only arise from novel combinations of known exons (NICs). The authors should consider using existing simulated novel transcript ground truth datasets that exist, such as the one from LRGASP.

Response: We appreciate the reviewer’s suggestion. For this study, we chose to use transcripts produced by NanoSim, a publicly available and peer-reviewed tool specifically designed to generate realistic long-read transcriptome data. Our goal was to ensure a controlled and reproducible comparison under well-defined simulation parameters. We agree that other simulated datasets, such as those from LRGASP, provide valuable complementary benchmarks and will consider incorporating them in future evaluations.

4b. For the quantification benchmarking, it is more common to report a correlation metric between the ground truth and the estimated quantification from the tool. This would make these results more interpretable and comparable to other benchmarking efforts in the field.

Response: Please see our answer to point 3 below.

4c. Also related to the quantification benchmarking, there are no significance values reported for any of the pairwise comparisons; just written speculation on the visual appearance of the plots. Statistical analyses would increase the confidence of these results.

Response: The reviewer makes a good point. To address it, we have added Pearson correlation coefficient (PCC) values to the box-and-whisker plots in the manuscript. These values provide a quantitative measure of concordance between methods and allow readers to assess the significance of the pairwise comparisons more rigorously.
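For readers unfamiliar with the measure, the PCC between ground-truth and estimated abundances can be computed as in the sketch below. The function `pearson` and the toy abundance values are hypothetical and only illustrate the standard formula; they are not the data or code behind the plots.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Toy per-transcript abundances (hypothetical): ground truth vs. tool estimate.
truth = [10.0, 20.0, 30.0, 40.0]
estimate = [12.0, 18.0, 33.0, 41.0]
print(f"PCC = {pearson(truth, estimate):.3f}")
```

Because expression values often span several orders of magnitude, the same function is commonly applied to log-transformed abundances, e.g. `math.log2(v + 1)`.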

4d. Finding more isoforms is not necessarily a metric of a “better” tool, but is referenced as if it is in the text related to Figure 5. In fact, the percentage of reported “novel” transcripts for various GTEx tissues is surprisingly high and therefore the high number of transcripts could be indicative of over-calling novel transcripts and therefore poor specificity. Instead, the authors should additionally overlap the discovered isoforms with those discovered by Glinos et al. in their original paper (or other external datasets) to see how well their method recapitulates what others have already said about the dataset.

Response: We completely agree that the sheer number of novel isoforms is not, by itself, a measure of a “better” tool. However, in light of our other benchmarks on both simulated and real data, we believe that the novel isoforms identified by NIFFLR are likely to be highly reliable, and the high percentage of novel transcripts observed in GTEx tissues could indeed reflect incomplete annotation rather than overcalling. Importantly, many independent studies (such as Glinos et al.) have reported that existing annotations capture only a subset of true transcripts, and numerous bona fide isoforms remain unannotated. In this context, NIFFLR’s detection of additional isoforms is consistent with these observations. We have updated the Results section to make this clearer:

“…The number of isoforms identified by NIFFLR far exceeds the number reported by FLAIR (Glinos et al., 2022), which identified 93,718 transcripts across 21,067 genes, of which 77% were novel. 34,876 transcripts in 11,840 gene loci were in common between the set of transcripts identified by Glinos et al. (2022) and by this study…”

5. Would the authors be able to speculate on any specific use case (a specific cohort, a specific technology, etc) where the exons-to-reads alignment approach might be especially beneficial compared to traditional minimap2-based approaches? This might add some insight to the discussion.

Response: NIFFLR is designed to be used with well-annotated genomes, such as the human genome, where almost all exons are known and most (or all) novel transcripts are due to alternative splicing. We stated this limitation in the manuscript. Using exons from the reference annotation allows us to avoid performing spliced alignment of high-error transcriptomic reads, thereby achieving higher sensitivity and more accurate quantification of reference transcripts.

Minor concerns:

1. As the development of the method is of central importance to the paper, the implementation could be expanded on or explained a bit more. In particular, the concepts of the “implied” starts / ends of the alignments were confusing in Figure 1 and related text. Similarly, Figure 2 was a bit overly-complicated, and perhaps the authors could consider presenting the possible transcript paths separately as alternative transcripts in genome browser format (ie IGV or UCSC).

Response: We have completely revised the Methods section to improve clarity and added a workflow diagram of the algorithm to the Supplementary materials as Figure S1. Here is the updated algorithm description from the Methods:

After building the alignments, we assign each long read to a gene locus using a majority vote approach. Specifically, for each read, we compute the total number of K-mers in all LCSs for matching exons across different gene loci and assign the read to the locus L whose exons collectively have the highest total number of matching K-mers. Alignments of exons that belong to different gene loci are then discarded. Next, we build the transcript matching the read by finding the best tiling of the read using exons that belong to locus L—that is, the sequence of exons that maximizes read coverage while minimizing gaps or overlaps in the implied alignment coordinates. The long read defines a 5’ to 3’ forward direction, which provides a natural topological order. We therefore sort the aligned exons by their alignment start coordinates if they align in the forward direction, or by their alignment end coordinates if they align in the reverse direction. Since we only kept alignments of exons that all belong to a single gene locus L, the exons must all align either in the forward or reverse direction. For simplicity, we describe the algorithm below assuming all exons are aligned in the forward direction; the reverse case is handled the same by reversing the read.
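The majority-vote step described above can be illustrated with a minimal Python sketch. This is not the NIFFLR implementation: the `ExonAlignment` record, its field names, and the `assign_locus` function are hypothetical, and the tie-breaking rule (the first locus encountered wins) is an assumption, since the text does not specify one.

```python
from collections import defaultdict
from typing import NamedTuple

class ExonAlignment(NamedTuple):
    """Hypothetical record for one exon aligned to one read."""
    exon_id: str
    locus: str
    matching_kmers: int  # total K-mers in the LCS for this exon
    read_start: int      # implied start on the read (may be negative)
    read_end: int        # implied end on the read

def assign_locus(alignments):
    """Pick the locus with the most matching K-mers and keep only its exons."""
    totals = defaultdict(int)
    for aln in alignments:
        totals[aln.locus] += aln.matching_kmers
    # Assumption: on a tie, the locus seen first in the input is chosen.
    best = max(totals, key=totals.get)
    # Forward-direction case: order the surviving exons by alignment start.
    kept = sorted((a for a in alignments if a.locus == best),
                  key=lambda a: a.read_start)
    return best, kept
```

In this sketch, alignments to exons of any other locus are simply dropped, mirroring the statement that alignments of exons belonging to different gene loci are discarded.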

We represent the exon tiling problem as a directed graph, where nodes correspond to exons, and node weights are exon lengths. An edge connects the 3’ end of an exon A to the 5’ end of exon B if the absolute value of the distance between their aligned positions on the read is less than 20 bp. The weight of the edge between exons A and B is defined as this distance. Next, we choose the “start” nodes. A valid start node is an exon that is not connected on the 5’ end and whose 5’ end lies upstream of the read start (i.e., has a negative coordinate relative to the read origin), indicating an overhang. If no such exon exists, we select the exon(s) with the smallest positive coordinate among all exons aligned to the read. If multiple exons share the same alignment start due to alternative splicing, we use all such exons as alternative start nodes. We select “end” nodes in a similar manner: an end node is an exon that is not connected on the 3’ end and whose 3’ end extends beyond the end of the read (i.e., has an overhang). If none do, we choose the exon(s) whose 3’ end coordinate is closest to the 3’ end of the read.
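As an illustration of the graph construction just described, here is a minimal Python sketch. It is not taken from the NIFFLR source: the `Exon` record and `build_graph` function are hypothetical names, the edge weight is assumed to be the absolute value of the gap or overlap, edges are assumed to run only from an exon to one that starts later on the read, and the fallback start rule is simplified to "smallest alignment start".

```python
from typing import NamedTuple

MAX_GAP = 20  # bp; edge threshold stated in the text

class Exon(NamedTuple):
    """Hypothetical record: exon with implied coordinates on the read."""
    name: str
    start: int  # 5' end on the read; negative means a 5' overhang
    end: int    # 3' end on the read; beyond read_len means a 3' overhang

def build_graph(exons, read_len):
    """Return (edges, start_nodes, end_nodes) for one read."""
    exons = sorted(exons, key=lambda e: e.start)
    edges = {}
    for i, a in enumerate(exons):
        for b in exons[i + 1:]:
            gap = b.start - a.end  # positive = gap, negative = overlap
            # Assumption: B must start after A, which keeps the graph acyclic.
            if b.start > a.start and abs(gap) < MAX_GAP:
                edges[(a.name, b.name)] = abs(gap)
    has_incoming = {b for (_, b) in edges}
    has_outgoing = {a for (a, _) in edges}
    # Preferred start nodes: no 5' connection and a 5' overhang.
    starts = [e.name for e in exons
              if e.name not in has_incoming and e.start < 0]
    if not starts:
        first = min(e.start for e in exons)
        starts = [e.name for e in exons if e.start == first]
    # Preferred end nodes: no 3' connection and a 3' overhang.
    ends = [e.name for e in exons
            if e.name not in has_outgoing and e.end > read_len]
    if not ends:
        closest = min(abs(read_len - e.end) for e in exons)
        ends = [e.name for e in exons if abs(read_len - e.end) == closest]
    return edges, starts, ends
```

For example, with a 400 bp read and exons at read coordinates (-5, 100), (103, 250), (103, 300), and (248, 410), the sketch links the first exon to both alternative second exons but links only the shorter of those two to the last exon, because the longer one would overlap it by 52 bp.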

We solve the exon tiling problem by finding the path through the graph that starts from any start node and ends at an end node and minimizes a penalty function. The penalty is defined as the sum of edge weights along the path plus an overhang penalty, calculated as 0.1 × (|5’ overhang of the start exon| + |3’ overhang of the end exon|), where |a| denotes the absolute value of a. Ideally, there should be no gaps or overlaps between aligned exons in the transcript, resulting in a perfect path with zero weight. However, because psa_aligner computes approximate alignments, exon start and end alignment coordinates are imprecise estimates. In case of a tie, we select the path with the larger total node weight (the sum of the lengths of the exons on the path). If a read can be spanned by a single exon, either because only one exon maps to the read or because that exon simultaneously has the start closest to the 5’ end and the end closest to the 3’ end of the read, we report that single exon as the path. If multiple exons individually span the read, we select the exon that satisfies the condition of being a valid start exon or end exon and has the smallest total overhang length. Finally, if no valid path is found, we report the exon with the longest alignment to the read. Figure 2 illustrates an example of a valid exon path. Once the best path is identified, we examine the genomic coordinates of the exons, which are encoded in their sequence IDs. We discard the path if any exons in it overlap in genomic coordinates, as this likely indicates that the long read is chimeric or spans a substantial local genome rearrangement that NIFFLR cannot handle.
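The penalty minimization can be sketched in Python as follows. This is an illustrative sketch only, not the algorithm as implemented in NIFFLR: the function name `best_path` and its input layout are hypothetical, the graph is assumed to be acyclic, paths are enumerated exhaustively by depth-first search (the text does not say which search strategy is used), and the overhangs are taken as the distances of the first exon's 5’ end and the last exon's 3’ end from the corresponding ends of the read.

```python
def best_path(exons, edges, starts, ends, read_len):
    """Find the minimum-penalty exon path for one read.

    exons: dict name -> (start, end) in read coordinates
    edges: dict (a, b) -> non-negative edge weight (gap or overlap in bp)
    Returns (path, penalty), or None if no start-to-end path exists.
    """
    successors = {}
    for (a, b), w in edges.items():
        successors.setdefault(a, []).append((b, w))
    end_set = set(ends)
    best = None  # ((score_x10, -total_exon_length), path)

    def walk(node, path, cost):
        nonlocal best
        if node in end_set:
            overhang = abs(exons[path[0]][0]) + abs(exons[node][1] - read_len)
            # Integer score = 10 * penalty, so that ties compare exactly.
            score = 10 * cost + overhang
            length = sum(exons[n][1] - exons[n][0] for n in path)
            key = (score, -length)  # tie-break: larger total exon length
            if best is None or key < best[0]:
                best = (key, list(path))
        for nxt, w in successors.get(node, []):
            walk(nxt, path + [nxt], cost + w)

    for s in starts:
        walk(s, [s], 0)
    if best is None:
        return None
    return best[1], best[0][0] / 10
```

With exons at read coordinates (-5, 100), (103, 250), and (248, 410) on a 400 bp read, the edge weights sum to 3 + 2 = 5 and the overhang term is 0.1 × (5 + 10) = 1.5, giving a penalty of 6.5. The single-exon special cases and the genomic-overlap check described in the text are not covered by this sketch.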

2. At the end of the first paragraph of main text on page 7, the authors state “This result demonstrates that when novel isoform discovery and quantification are the primary goals, NIFFLR is the best tool” when referring to the simulation experiment where they measured isoform detection / assembly only. This means this result has no relevance for quantification.

Response: We thank the reviewer for this observation. We agree that the original placement of the statement implied a conclusion about quantification based solely on isoform detection/assembly, which could be misleading. To address this, we have moved the statement to the end of the last paragraph in the “Comparison on simulated long reads” section, immediately following the discussion of the quantification benchmarking results. This placement more accurately reflects the context of the data and avoids conflating isoform discovery with quantification performance.

3. If the authors mean “false positive transcripts” by “spurious transcripts”, they should simply refer to them as the former as there is no definition for “spurious” transcripts in the text.

Response: We have changed the wording from “spurious transcripts” to “false positive transcripts”.

4. In table 2, IsoQuant has the same number of known and total isoforms; implying it found no novel isoforms. I am highly doubtful this is correct and is probably a typo.

Response: Thank you for pointing this out. This was indeed a typo, and we have corrected it.

5. Figure 3 is missing y axis labels. Furthermore, the meaning of the box and whiskers are elaborated on in the results, which should just be in the legend. Additionally, Figure 3b is just a zoomed in duplicate of two plots from 3a and is unnecessary.

Response: We have added the y-axis label to Figure 3. We also wish to clarify that Figure 3b is not a zoomed-in duplicate of plots from 3a. While Figure 3a shows quantification error across all transcripts detected by each tool, Figure 3b focuses specifically on the shared subset of transcripts quantified by both NIFFLR and IsoQuant. This allows for a fair, direct comparison between the two programs on the transcripts shared by both, highlighting differences in quantification performance that could be obscured when considering all transcripts.

6. For Figure 4 the authors describe a strange method of depth normalization to compare the long-read RNA-seq to short-read RNA-seq transcript quantification estimates. They should use normal TPM / CPM normalization. This figure also has unnecessary details about the plot in the results section which should be in the legend (same as in Figure 3).

Response: TPM and transcript coverage are linearly related, and many long-read transcriptome quantification programs report read counts rather than TPMs. Transcript coverage computed from the Illumina data is directly comparable to the number of long reads covering each transcript. By normalizing the Illumina coverage by the total sequencing depth, we made Illumina read coverage into a metric equivalent to long read counts, enabling a fair comparison between long- and short-read quantifications.
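The linear relationship between TPM and coverage mentioned above can be checked with a short Python sketch. It uses the standard TPM definition and a simple coverage estimate (reads × read length / transcript length); the function names and the toy numbers are illustrative assumptions and are not the normalization script used for Figure 4.

```python
import math

def coverage(read_counts, lengths, read_len):
    """Mean per-base coverage of each transcript from fixed-length short reads."""
    return [n * read_len / L for n, L in zip(read_counts, lengths)]

def tpm(read_counts, lengths):
    """Standard TPM: length-normalized rates rescaled to sum to one million."""
    rates = [n / L for n, L in zip(read_counts, lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]

# Toy example (assumed values): three transcripts, 100 bp reads.
counts = [100, 50, 10]
lengths = [1000, 2000, 500]
cov = coverage(counts, lengths, read_len=100)
t = tpm(counts, lengths)
ratios = [ti / ci for ti, ci in zip(t, cov)]
# Within one sample, TPM / coverage is the same constant for every transcript.
assert all(math.isclose(r, ratios[0]) for r in ratios)
```

Because the ratio is a single per-sample constant, rescaling coverage by sequencing depth changes only that constant and not the relative ordering or spacing of transcripts on a log scale.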

7. The authors make no references to their supplementary material PDF in the main body of the text. They should include references so that readers know where to find the calls made to perform the benchmarking etc.

Response: Thank you for noting that. We have added the missing references to the Supplementary Materials.

10.5256/f1000research.181114.r393921

Reviewer response for version 1

Dewey

Colin

1 Referee https://orcid.org/0000-0003-1498-9254 1University of Wisconsin-Madison, Wisconsin, USA

Competing interests: No competing interests were disclosed.

4 8 2025

2025

recommendation

approve-with-reservations

The authors describe a novel method and associated software, NIFFLR, for identifying and quantifying expressed transcript structures (both known and novel) from long, noisy RNA sequencing data, such as that produced by Oxford Nanopore Technologies (ONT). A key challenge for methods addressing this task is the difficulty in identifying the precise locations of exon boundaries in the presence of frequent sequencing errors, particularly indels. The novel approach taken by NIFFLR is the converse of that of other methods: instead of aligning reads to the genome, NIFFLR aligns annotated exons within the genome to the read. NIFFLR uses a partial suffix array approach to efficiently align exons to the reads and a graph-based algorithm to identify the likely transcript structure for each read, followed by a series of heuristics to filter and quantify the transcripts. With both simulated and real data, NIFFLR's accuracy is compared to that of three other methods: FLAIR2, IsoQuant, and ESPRESSO. These experiments suggest that NIFFLR has high sensitivity and quantification accuracy comparable to the next best method. Runtime measurements show that NIFFLR runs in time comparable to the fastest methods.

Major comments:

1. A key limitation of the method is its reliance on a known set of exons: all predicted transcripts must be combinations of known exons. It appears that even slight variations in the boundaries of exons must be previously annotated for the method to predict transcripts with those variant exons. This general issue is brought up in the last two sentences of the discussion. The authors suggest that when the full exon set is not known, short read data could be used to delineate the exons. That may be a feasible strategy, but it is not implemented or evaluated in this work, nor do the evaluations consider novel exons. This work would be strengthened by evaluations that include the more realistic scenario of novel exons (including 5' and 3' end variants of known exons). By definition, NIFFLR will not be able to detect or quantify transcripts including these exons, but the impact of these exons on the precision and quantification accuracy of other transcripts should be measured. Alternatively, the authors could implement the exon delineation strategy that they mentioned and examine NIFFLR's performance in combination with this strategy. Admittedly, the evaluation on the real data set potentially includes reads from transcripts with novel exons, but it is difficult to discern from the results presented how any novel exons impacted the method's performance.

2. A number of details of the method are omitted or unclear. A more formal and detailed presentation of the algorithm is needed. For example, (A) what algorithm is used to solve the "exon tiling problem"? (B) what, precisely, is the objective function? (C) is the algorithm guaranteed to find the optimal solution? (D) how is an edge allowed between overlapping exons? (E) what is the intuition behind distributing reads "proportionally to the size of the container transcripts"? Personally, I would like to see a more mathematical presentation and a figure depicting the steps of the entire procedure, particularly those detailed in the paragraph beginning "By design, all intron junctions...".

3. With respect to the software package, I was ultimately able to compile the software after installing the Boost and zlib development libraries. It would be helpful to have these compilation dependencies noted in the README. For ease of use by the community, it would also be helpful to have the package available via conda and/or Docker. Finally, please provide a small test dataset with the software such that users can make sure that their installation is working and can see what the inputs and outputs look like.

Minor comments:

4. From the filtering steps of the algorithm, could there be an output provided that might indicate the presence of novel exons at a given locus? For example, could the user be provided the number of reads that mapped to a locus but that were not ultimately assigned to a transcript?

5. For quantification evaluations, the box and whisker plots of log2 fold changes are helpful, but I would have also liked to have seen scatterplots of true vs. predicted counts (on log-scaled axes) along with correlation values, as has been common for short-read quantification methods. Such scatterplots help to visualize potential subsets of transcripts that have biased predictions and trends relative to the magnitude of expression.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes

Reviewer Expertise:

Computational biology, bioinformatics, transcriptomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Zimin

Aleksey

Competing interests: No competing interests were disclosed.

24 10 2025

Major comments:

Response: We agree with the reviewer’s observation that using a known set of exons is a key limitation of NIFFLR. NIFFLR is designed to be used with well-annotated genomes, such as the human genome, where almost all exons are known and most novel transcripts are due to alternative splicing. We have added an analysis of the synthetic SIRV E2 data, which includes alternative exons whose boundaries differ by only a few bases, and we show that NIFFLR and ESPRESSO were able to find all but one of the isoforms that are present in the sample.

Response: We added a flowchart of the method as Supplementary Figure S1 and substantially revised the Methods section to improve clarity. The revised text is below:

Response: We added information about software dependencies to the README file. We also included a script that downloads the reads of the SIRV E2 dataset from SRA using fastq-dump and runs NIFFLR on them. In addition, we have made NIFFLR available on Bioconda, where it can now be installed as ‘nifflr’.

Minor comments:

Response: At this time, we do not provide this directly, but this information is available in an intermediate GTF file that lists all reads contributing to each transcript. The file is named .gtf. If a transcript listed in that file is not present among the final output transcripts, it indicates that the reads supporting that transcript were ultimately discarded.

Response: We added the scatter plots to the Supplementary Materials as Figures S2-S6. We also included the Pearson correlation coefficient values in the plots shown in the main manuscript.