Genome Note

Draft genome assembly of the slender walking catfish, Prophagorus nieuhofii

[version 1; peer review: awaiting peer review]
PUBLISHED 14 Aug 2025

This article is included in the Genomics and Genetics gateway.

Abstract

The slender walking catfish, Prophagorus nieuhofii, plays an important role in small-scale fisheries across Southeast Asia, supporting food security. Although the IUCN currently lists it as Least Concern, growing demand and pressures such as overfishing, habitat loss, and habitat degradation may elevate its conservation risk. To support sustainable fisheries management and aquaculture, we sequenced, assembled, and annotated the whole genome of this species. The specimen was collected during an expedition to document and preserve the genetic resources of aquatic animals in Kalimantan’s freshwater ecosystems. Using 27 Gb of sequence data, we assembled a 1.1 Gb genome comprising 5,790 scaffolds. The assembly has high contiguity and completeness, with a scaffold N50 of 33.7 Mb and a BUSCO completeness score of 98.8%. Repeat annotation showed that 48.17% of the genome consists of repetitive elements, predominantly DNA transposons (18.56%) and retroelements (13.30%). Structural annotation identified 30,099 protein-coding genes encoding 37,734 transcripts; most genes (87.5%) were multi-exonic, and 18.1% had alternative transcript variants. BUSCO analysis of the annotated gene set detected 97.7% of the conserved orthologs as complete, confirming the high completeness of the annotation.

Keywords

Whole genome sequencing, genome assembly, genome annotation, slender walking catfish, Prophagorus nieuhofii.

Introduction

The slender walking catfish, Prophagorus nieuhofii (previously known as Clarias nieuhofii), is widespread in Southeast Asia, including Indonesia (specifically Java, Sumatra, and Kalimantan), the Malay Peninsula, Singapore, Thailand, and the Philippines.1 It is a popular food fish owing to its good taste and nutritional value, and it contributes to food security by supporting artisanal fisheries. Although the IUCN Red List of Threatened Species classifies it as Least Concern in its global assessment, habitat loss, habitat degradation, and fishing pressure have caused declines in many natural populations.2–4 In Thailand, it has been classified as a vulnerable species,5 and a genetic assessment has been carried out to support the management of its natural populations.4

Beyond sustaining its wild populations, several studies have sought to develop this species for aquaculture. Its air-breathing capability, resilience, and adaptability give it significant potential for domestication and farming. Preliminary work has examined growth and survival during the early stages of domestication,6 breeding and reproduction,7 and its exploration as a probiotic source.8 Although further research is needed to optimize its cultivation, its inherent characteristics are conducive to successful aquaculture.

A whole genome sequence of this species will provide a valuable resource for both fisheries management and aquaculture development. For fisheries management, large-scale discovery of single nucleotide polymorphisms (SNPs) capturing genome-wide (neutral) and allele-specific (adaptive) diversity patterns will support genomic stock identification, traceability, and studies of fisheries-induced evolution and responses to climate change.9 For aquaculture, the genome sequence can be combined with other approaches, such as quantitative trait loci (QTL) analysis, genome-wide association studies (GWAS), and expression profiling, to predict genotypic variants associated with phenotypic traits and thereby improve traits in breeding programs.10,11

Methods

Sample collection, DNA extraction, and genome sequencing

Fish samples were collected during a 2024 expedition to characterize the genetic resources of aquatic animals from a natural population in South Kalimantan, Indonesia (3°21′43.0″S, 114°42′08.3″E). Specimens were captured using bubu traps and held for three days in a pond with a water depth of 60 cm at 27–28°C to reduce stress. Before tissue sampling for DNA, fish were anesthetized following a published protocol:12 they were placed in a 35-liter bucket containing 7 cm of water at 28°C, the water was cooled with liquid ice to 21°C, and clove oil was then added at 160 mg/L. Tissue was sampled once the fish showed minimal movement after anesthesia. A 10 mg tissue sample was collected from an individual measuring 32.5 cm in length and weighing 277 g, preserved in DNA Shield solution, and transported to the laboratory for sequencing. High-quality DNA was extracted using the Quick-DNA High Molecular Weight (HMW) MagBead Kit (Zymo Research) with overnight proteinase K digestion. DNA was quantified prior to sequencing using a Qubit fluorometer with the Equalbit 1× dsDNA HS Assay Kit.

Whole genome sequencing was performed on the Oxford Nanopore Technologies (ONT) PromethION platform. Genomic DNA (1,500 ng in 48 µL nuclease-free water) was incubated at 20°C for 30 min, followed by 65°C for 5 min. Libraries were prepared by ligation using the Ligation Sequencing Kit V14 (SQK-LSK114). Basecalling was performed with Dorado v0.9.1 using the dna_r10.4.1_e8.2_400bps_sup@v5.0.0 model, with a minimum Q score of 10 and trimming of adapters and barcodes. Sequencing data quality was assessed using NanoPlot.13
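A minimum Q score of 10 is a threshold on a read's mean quality. A common convention for nanopore reads is to average per-base error probabilities rather than raw Phred values, so a read's mean Q can be well below the arithmetic mean of its Q scores. The Python sketch below illustrates that convention on FASTQ quality strings (Phred+33 encoding assumed); the function names are hypothetical, and it is not claimed to reproduce Dorado's or NanoPlot's exact implementation.

```python
import math

def mean_read_quality(phred_scores):
    """Mean quality of a read, averaged in error-probability space."""
    if not phred_scores:
        raise ValueError("read has no quality scores")
    # A Phred score Q corresponds to an error probability p = 10^(-Q/10)
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    # Convert the mean error probability back to the Phred scale
    return -10 * math.log10(mean_error)

def passes_min_q(quality_string, min_q=10.0, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) and apply a mean-Q threshold."""
    scores = [ord(ch) - offset for ch in quality_string]
    return mean_read_quality(scores) >= min_q

# One Q40 base and one Q10 base: arithmetic mean is Q25, probability-averaged mean is ~Q13
print(round(mean_read_quality([40, 10]), 1))
# '5' encodes Q20 and '&' encodes Q5 under Phred+33
print(passes_min_q("5555"), passes_min_q("&&&&"))
```

Averaging in probability space lets a few low-quality bases dominate the read's score, which is the conservative behavior one usually wants from a read filter.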

Genome assembly and annotation

De novo genome assembly was performed using Flye v2.9.5,14 and the assembly was scaffolded with RagTag v2.1.0,15 guided by the Clarias gariepinus reference genome (GCF_024256425.1). Genome size was estimated from k-mer counts generated with Jellyfish v2.3.1,16 which were then analyzed with GenomeScope 2.0 (v2.0.1). Assembly statistics were calculated using assembly-stats v1.0.1. Assembly completeness was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.8.2 with miniprot.17–19
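GenomeScope 2.0 fits a model to the k-mer frequency histogram produced by Jellyfish. The underlying idea can be illustrated with a much simpler estimate: genome size ≈ total k-mer occurrences divided by the depth of the main (homozygous) peak. The Python sketch below assumes the two-column `depth count` text written by `jellyfish histo` and uses a hypothetical fixed cutoff (`min_depth`) to discard low-depth error k-mers. Unlike GenomeScope, it does not model heterozygosity, repeats, or sequencing errors, so it is only an illustration of the principle.

```python
def parse_histo(text):
    """Parse 'depth count' lines (as written by `jellyfish histo`) into a dict."""
    histogram = {}
    for line in text.strip().splitlines():
        depth, count = line.split()
        histogram[int(depth)] = int(count)
    return histogram

def estimate_genome_size(histogram, min_depth=5):
    """Naive estimate: total k-mer occurrences divided by the main peak depth.

    min_depth is a hypothetical cutoff; depths below it are treated as error k-mers.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_occurrences = sum(d * n for d, n in usable.items())
    return total_occurrences / peak_depth, peak_depth

# Toy histogram (not real data): an error tail at depth 1-2 and a main peak at depth 20
toy = parse_histo("1 1000\n2 200\n10 50\n20 400\n21 300\n40 20")
size, peak = estimate_genome_size(toy)
print(peak, size)
```

In a real diploid genome a second, heterozygous peak appears at about half the homozygous depth; choosing the wrong peak would roughly double the estimate, which is one reason model-based tools such as GenomeScope are preferred.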

Repetitive elements in the genome assembly were identified using RepeatModeler v2.0.6 in conjunction with RepeatMasker v4.1.7 (http://www.repeatmasker.org). Before gene prediction, these repetitive regions were soft-masked to minimize their interference with annotation. Structural annotation (gene prediction) was conducted using the GALBA pipeline,20 which employs miniprot17 and AUGUSTUS21 and integrates protein data from closely related species as extrinsic evidence. Specifically, protein data from Clarias gariepinus (GCF_024256425.1), Ictalurus furcatus (GCF_023375685.1), Ictalurus punctatus (GCF_001660625.3), and Tachysurus fulvidraco (GCF_022655615.1) were used. Functional annotation of the resulting gene models was then performed with the ‘funannotate annotate’ command of the Funannotate pipeline (https://funannotate.readthedocs.io/en/latest/install.html), which incorporates tools such as InterProScan 5,22 eggNOG-mapper,23 and SignalP 5.024 to assign gene names and predict protein functions. Finally, the completeness of the annotation was evaluated using BUSCO v5.8.2.19
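Soft-masking keeps repeat bases in the sequence but writes them in lower case, so that gene predictors that support soft-masking can treat them as repeats without discarding the sequence (hard-masking would replace them with N). A minimal Python sketch, assuming repeat intervals are already available as 0-based, half-open coordinates (for example, converted from RepeatMasker output; that conversion is not shown), is:

```python
def soft_mask(sequence, repeat_intervals):
    """Lower-case bases inside repeat intervals (0-based, half-open); other bases upper-case."""
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clip each interval to the sequence bounds before masking
        for i in range(max(0, start), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)

def masked_fraction(masked_sequence):
    """Fraction of bases that are soft-masked (lower case)."""
    if not masked_sequence:
        return 0.0
    return sum(ch.islower() for ch in masked_sequence) / len(masked_sequence)

# Two overlapping hypothetical repeat hits on a 10-bp toy sequence
masked = soft_mask("ACGTACGTAC", [(2, 5), (4, 7)])
print(masked, masked_fraction(masked))
```

Because overlapping hits mark each base only once, the masked fraction counts occupied bases rather than summing hit lengths, mirroring how an "occupied bp" figure avoids double counting.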

Ethical approval

This research was approved by the Ethics Commission for Animal Husbandry and Use, National Research and Innovation Agency (Approval No. 174/KE.02/SK/07/2024). All animal-related procedures were conducted in accordance with institutional guidelines and complied with the ARRIVE 2.0 reporting standards, the checklists for which are available at https://doi.org/10.6084/m9.figshare.29612615.v1.25

Results

Sequence output and genome assembly

Sequencing produced a total of 27,388,841,658 bases from 4,440,560 reads, with 99.8% of bases meeting the designated quality threshold. The read with the highest mean basecall quality score (46.4) was 133 bases long, whereas the longest read reached 6,129,425 bases with a mean basecall quality score of 14.5 (Table 1). The draft genome assembly, illustrated in the snail graph (Figure 1), comprises 5,790 scaffolds totaling 1.1 Gb, with the longest scaffold spanning 54 Mb.

Table 1. Summary statistics of sequences of slender walking catfish (Prophagorus nieuhofii).

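| Statistic | Value |
|---|---|
| Total bases | 27,388,841,658 |
| Mean read length (bp) | 6,167.9 |
| Mean read quality (Q) | 20.2 |
| Median read length (bp) | 4,772.0 |
| Median read quality (Q) | 23.6 |
| Number of reads | 4,440,560 |
| Read length N50 (bp) | 8,687.0 |
| Read length standard deviation (bp) | 5,796.8 |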

Figure 1. A snail graph showing the main features of the Prophagorus nieuhofii genome assembly.

The scaffold N50 and N90 values, which measure assembly contiguity, are 33.7 Mb and 20.6 Mb, respectively. The base composition is 39.5% GC and 60.5% AT, and gaps (N bases) make up only 0.04% of the assembly, indicating a highly contiguous and well-assembled genome. Against the Actinopterygii ortholog dataset of 3,640 conserved single-copy genes, the assembly showed 98.8% BUSCO completeness, with only 1.15% of BUSCOs missing, suggesting that nearly all expected genes are present. The genome size is similar to that of the related species Clarias gariepinus, which has a genome size of 969.62 Mb and a contig N50 of 33.71 Mb.26 A 21-mer-based characterization of genome composition indicated a heterozygosity rate of 0.78% and a homozygosity rate of 99.12%.
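N50 and N90 are the scaffold lengths at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches 50% and 90% of the total assembly length. The Python sketch below computes Nx values from scaffold lengths and base composition from a sequence. It assumes the convention suggested by the reported figures, in which GC and AT percentages are computed over A/C/G/T bases only (so they sum to 100%) while N content is computed over all bases; it is an illustration, not the assembly-stats implementation.

```python
def nx(lengths, x):
    """Length L such that sequences of length >= L cover at least x% of the total."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("empty list of lengths")

def base_composition(sequence):
    """Return (GC% over A/C/G/T bases, N% over all bases)."""
    s = sequence.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases")
    return 100 * gc / (gc + at), 100 * s.count("N") / len(s)

toy_scaffolds = [10, 8, 5, 3, 2]  # toy lengths, total 28
print(nx(toy_scaffolds, 50), nx(toy_scaffolds, 90))
print(base_composition("GGCCAATTNN"))
```

For the toy lengths, the two longest scaffolds (10 + 8 = 18) are the first to pass half of the total (14), so N50 is 8; reaching 90% (25.2) requires the fourth scaffold, so N90 is 3.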

Genome annotation

Repeats annotation

Repeat annotation revealed that 48.17% of the genome (529,073,132 bp) consists of repetitive elements (Table 2). Retroelements accounted for 13.30% of the genome, spanning more than 146 million bp across 492,154 elements. This category includes SINEs, which comprise 1.93% of the genome, and LINEs, which occupy 5.33%. The LINEs were mainly composed of L2/CR1/Rex elements (4.01%), followed by the R1/LOA/Jockey, L1/CIN4, and RTE/Bov-B subfamilies. LTR elements were the largest retroelement group, comprising 6.04% of the genome, and were represented mainly by Gypsy/DIRS1 (2.55%) and retroviral elements (1.05%), with smaller contributions from BEL/Pao and Ty1/Copia. Notably, some retroelement families, such as CRE/SLACS, were not detected.

Table 2. Classification of repeat elements of the slender walking catfish (Prophagorus nieuhofii) genome assembly.

| Repeat category | Count | Occupied bp | % of genome |
|---|---|---|---|
| Retroelements | 492,154 | 146,089,956 | 13.30% |
| SINEs | 104,157 | 21,222,223 | 1.93% |
| Penelope | 3,786 | 390,741 | 0.04% |
| LINEs | 170,940 | 58,512,142 | 5.33% |
| CRE/SLACS (LINE) | 0 | 0 | 0.00% |
| L2/CR1/Rex (LINE) | 115,148 | 43,988,622 | 4.01% |
| R1/LOA/Jockey (LINE) | 15,957 | 3,985,162 | 0.36% |
| R2/R4/NeSL (LINE) | 343 | 218,349 | 0.02% |
| RTE/Bov-B (LINE) | 11,503 | 2,944,056 | 0.27% |
| L1/CIN4 (LINE) | 8,590 | 3,281,609 | 0.30% |
| LTR elements | 217,057 | 66,355,591 | 6.04% |
| BEL/Pao (LTR) | 4,144 | 3,425,287 | 0.31% |
| Ty1/Copia (LTR) | 93 | 112,540 | 0.01% |
| Gypsy/DIRS1 (LTR) | 51,836 | 27,984,797 | 2.55% |
| Retroviral (LTR) | 43,638 | 11,543,937 | 1.05% |
| DNA transposons | 924,291 | 203,850,283 | 18.56% |
| hobo-Activator (DNA) | 163,993 | 37,319,984 | 3.40% |
| TC1-IS630-Pogo (DNA) | 581,362 | 124,970,541 | 11.38% |
| En-Spm (DNA) | 0 | 0 | 0.00% |
| MULE-MuDR (DNA) | 1,323 | 99,951 | 0.01% |
| PiggyBac (DNA) | 13,967 | 4,552,618 | 0.41% |
| Tourist/Harbinger (DNA) | 25,580 | 7,005,418 | 0.64% |
| Other, e.g., Mirage, P-element, Transib (DNA) | 4,950 | 5,389 | 0.04% |
| Rolling-circles | 30,262 | 6,192,298 | 0.56% |
| Unclassified | 986,211 | 133,225,464 | 12.13% |
| **Total interspersed** | | 483,556,444 | 44.03% |
| Small RNA | 57,989 | 13,482,676 | 1.23% |
| Satellites | 1,781 | 570,181 | 0.05% |
| Simple repeats | 773,672 | 34,790,247 | 3.17% |
| Low complexity | 59,081 | 3,378,970 | 0.31% |
| **Total masked bases** | | 529,073,132 | 48.17% |

DNA transposons were the largest classified repeat category in both number and genomic coverage, with 924,291 elements occupying 18.56% of the genome (203.9 million bp). Within this group, the TC1-IS630-Pogo family predominated, covering 11.38% of the genome. Other contributors included hobo-Activator (3.40%), Tourist/Harbinger (0.64%), PiggyBac (0.41%), and MULE-MuDR (0.01%), while some families, such as En-Spm, were not detected. In addition, 30,262 rolling-circle transposons, occupying 0.56% of the genome, were identified. A substantial portion of the genome (12.13%) consisted of unclassified elements, totaling 986,211 copies; these may represent novel, divergent, or currently uncategorized repeat families.

Other repetitive elements included small RNA-related sequences (1.23%), simple repeats (e.g., microsatellites; 3.17%), low-complexity regions (0.31%), and satellite DNA (0.05%). Overall, interspersed repeats alone accounted for 44.03% of the genome (483.6 million bp), underscoring the genomic complexity and abundance of repetitive sequences, especially DNA transposons and retroelements.
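The hierarchy in Table 2 can be checked directly from its occupied-bp column. In the Python sketch below, the values are copied from Table 2; because the exact assembly length is not stated in the text, it is back-calculated from the total masked bases and their reported percentage, which is an assumption of this sketch.

```python
# Occupied bp copied from Table 2
RETRO_BP = 146_089_956
SINE_BP = 21_222_223
PENELOPE_BP = 390_741
LINE_BP = 58_512_142
LTR_BP = 66_355_591
DNA_BP = 203_850_283
UNCLASSIFIED_BP = 133_225_464
TOTAL_INTERSPERSED_BP = 483_556_444
TOTAL_MASKED_BP = 529_073_132
TOTAL_MASKED_PCT = 48.17

# Assumption: the exact assembly length is not given, so back-calculate it
implied_genome_bp = TOTAL_MASKED_BP / (TOTAL_MASKED_PCT / 100)

def percent_of_genome(bp):
    """Percentage of the (implied) assembly length covered by bp bases."""
    return 100 * bp / implied_genome_bp

print(f"implied assembly length: {implied_genome_bp / 1e9:.3f} Gb")
print(f"retroelements: {percent_of_genome(RETRO_BP):.2f}%")
print(f"DNA transposons: {percent_of_genome(DNA_BP):.2f}%")
# SINEs + LINEs + LTR elements reproduce the retroelement total exactly
print(SINE_BP + LINE_BP + LTR_BP == RETRO_BP)
# The interspersed total is reproduced only when the Penelope row is added separately
print(RETRO_BP + DNA_BP + UNCLASSIFIED_BP + PENELOPE_BP == TOTAL_INTERSPERSED_BP)
```

With these values, the implied assembly length is about 1.098 Gb, consistent with the 1.1 Gb reported, and the recomputed percentages match the table. The arithmetic also suggests that, in this table, Penelope elements are tallied outside the retroelement total.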

Structural and functional annotation

Genome annotation identified 30,099 protein-coding genes encoding 37,734 predicted transcripts, an average of 1.3 transcripts per gene (Table 3). Alternative transcript variants were found for 5,459 genes (18.1%). Most genes (87.5%; 26,327 genes) were multi-exonic, whereas the remaining 12.5% (3,772 genes) consisted of a single exon. The mean gene locus length, measured from the first to the last exon, was 15,993.4 bp. Genes contained an average of 8.9 distinct exons, with a total of 268,395 distinct exons annotated across the genome. The mean exon size was 180.2 bp, and the mean transcript size, including UTRs and coding sequences, was 1,812.4 bp.
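Summary statistics such as those in Table 3 are derived from the gene models themselves. The Python sketch below shows one way to compute several of them from a minimal, hypothetical in-memory representation: each gene holds a list of transcripts, and each transcript is a list of exon coordinates (1-based, inclusive). It is not the code used by GALBA or Funannotate, and it assumes that a "distinct exon" is a unique (start, end) pair within a gene and that a multi-exon gene is one with more than one distinct exon.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    gene_id: str
    transcripts: list  # each transcript: list of (start, end) exons, 1-based inclusive

def summarize(genes):
    """Compute Table 3-style statistics from hypothetical in-memory gene models."""
    n_genes = len(genes)
    n_transcripts = sum(len(g.transcripts) for g in genes)
    # Distinct exons: unique (start, end) pairs across all transcripts of a gene
    distinct_per_gene = [len({exon for tx in g.transcripts for exon in tx}) for g in genes]
    # Locus size: from the first exon start to the last exon end
    locus_sizes = []
    for g in genes:
        exons = [exon for tx in g.transcripts for exon in tx]
        locus_sizes.append(max(end for _, end in exons) - min(start for start, _ in exons) + 1)
    return {
        "genes": n_genes,
        "transcripts": n_transcripts,
        "mean_transcripts_per_gene": n_transcripts / n_genes,
        "genes_with_alt_transcripts": sum(len(g.transcripts) > 1 for g in genes),
        "multi_exon_genes": sum(d > 1 for d in distinct_per_gene),
        "distinct_exons": sum(distinct_per_gene),
        "mean_distinct_exons_per_gene": sum(distinct_per_gene) / n_genes,
        "mean_locus_size": sum(locus_sizes) / n_genes,
    }

genes = [
    Gene("g1", [[(1, 100), (201, 300)], [(1, 100), (401, 450)]]),  # two isoforms
    Gene("g2", [[(1000, 1500)]]),                                   # single exon
]
print(summarize(genes))
```

Applied to the reported totals, the same ratio gives 37,734 / 30,099 ≈ 1.25 transcripts per gene, which rounds to the 1.3 in Table 3.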

Table 3. Genome annotation summary of the slender walking catfish (Prophagorus nieuhofii) genome assembly.

| Annotation feature | Value |
|---|---|
| Max number of transcripts per gene | 8 |
| Mean exon size (bp) | 180.2 |
| Mean gene locus size, first to last exon (bp) | 15,993.4 |
| Mean number of distinct exons per gene | 8.9 |
| Mean number of transcripts per gene | 1.3 |
| Mean transcript size, UTR and CDS (bp) | 1,812.4 |
| Number of distinct exons | 268,395 |
| Number of genes | 30,099 |
| Number of genes with alternative transcript variants | 5,459 (18.1%) |
| Number of multi-exon genes | 26,327 (87.5%) |
| Number of predicted transcripts | 37,734 |
| Number of single-exon genes | 3,772 (12.5%) |

In terms of genome composition, exons constitute 4% of the genome, spanning approximately 48 Mb, with a GC content of 51% (Table 4). Genes collectively occupy 44% of the genome (approximately 481 Mb) and have a GC content of 40%; the median gene length is 6,955 bp. Introns, which lie within genes, account for 40% of the genome, totaling 434 Mb, with a mean length of 1,861 bp, a median length of 481 bp, and a GC content of 39%. In total, 233,110 introns were identified.
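Intron statistics follow from exon coordinates: within a transcript, each gap between consecutive exons is an intron. The Python sketch below assumes 1-based inclusive coordinates and treats each transcript independently; whether the reported counts are per transcript or per distinct intron is not stated, and the example coordinates are hypothetical.

```python
from statistics import mean, median

def introns_from_exons(exons):
    """Return introns as gaps between consecutive exons of one transcript.

    Coordinates are assumed 1-based and inclusive; exons may be given in any order.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # skip abutting or overlapping exons
            introns.append((prev_end + 1, next_start - 1))
    return introns

def length_stats(intervals):
    """Count, mean, median, and total length of 1-based inclusive intervals."""
    lengths = [end - start + 1 for start, end in intervals]
    return {"count": len(lengths), "mean": mean(lengths),
            "median": median(lengths), "total": sum(lengths)}

transcript = [(401, 450), (1, 100), (201, 300)]  # hypothetical exon coordinates
introns = introns_from_exons(transcript)
print(introns)
print(length_stats(introns))
```

As a consistency check on Table 4, count × mean length reproduces the reported totals: 268,395 × 180 bp ≈ 48 Mb, 30,099 × 15,993 bp ≈ 481 Mb, and 233,110 × 1,861 bp ≈ 434 Mb. The large gap between the mean (1,861 bp) and median (481 bp) intron lengths indicates a right-skewed distribution with a minority of very long introns.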

Table 4. Genome composition of the slender walking catfish (Prophagorus nieuhofii) genome assembly.

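| Feature | GC content | % of genome | Average size (bp) | Median size (bp) | Number | Total length (Mb) |
|---|---|---|---|---|---|---|
| Exon | 51% | 4% | 180 | 126 | 268,395 | 48 |
| Gene | 40% | 44% | 15,993 | 6,955 | 30,099 | 481 |
| Intron | 39% | 40% | 1,861 | 481 | 233,110 | 434 |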

To assess the completeness of the annotation, BUSCO analysis was conducted using the actinopterygii_odb10 lineage dataset. Of the 3,640 expected single-copy orthologs, 97.7% were complete, comprising 79.3% single-copy and 18.5% duplicated BUSCOs (Figure 2). Only 0.7% were fragmented and 1.6% were missing, indicating a highly complete and well-annotated gene set. These results support the suitability of the annotation for subsequent biological and comparative analyses.
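BUSCO reports its results as a one-line summary of the form C:…%[S:…%,D:…%],F:…%,M:…%,n:…. The Python sketch below parses such a line and checks the internal consistency of the percentages; the example line is reconstructed from the values reported above rather than copied from a BUSCO output file, and the tolerance is an assumed allowance for rounding to one decimal place.

```python
import re

def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict such as {'C': 97.7, ..., 'n': 3640}."""
    pairs = re.findall(r"([A-Za-z]+):([\d.]+)", line)
    if not pairs:
        raise ValueError("not a BUSCO summary line")
    summary = {}
    for key, value in pairs:
        # 'n' is the number of BUSCO genes searched; the other fields are percentages
        summary[key] = int(value) if key == "n" else float(value)
    return summary

def check_consistency(summary, tolerance=0.15):
    """True if C + F + M is about 100% and S + D is about C, allowing for rounding."""
    total_ok = abs(summary["C"] + summary["F"] + summary["M"] - 100) <= tolerance
    split_ok = abs(summary["S"] + summary["D"] - summary["C"]) <= tolerance
    return total_ok and split_ok

line = "C:97.7%[S:79.3%,D:18.5%],F:0.7%,M:1.6%,n:3640"  # reconstructed from the text
summary = parse_busco_summary(line)
print(summary)
print(check_consistency(summary))
# Approximate number of complete BUSCOs implied by the percentage
print(round(summary["C"] / 100 * summary["n"]))
```

The small mismatch between S + D (97.8%) and C (97.7%) is within rounding error, which is why the check uses a tolerance rather than exact equality.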


Figure 2. BUSCO assessment of genome annotation results for the slender walking catfish (Prophagorus nieuhofii) genome assembly.

How to cite this article
Imron I, Anggraeni F, Hidayat R et al. Draft genome assembly of the slender walking catfish, Prophagorus nieuhofii [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:787 (https://doi.org/10.12688/f1000research.166849.1)