Genome Note

The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea.

[version 2; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 04 Jul 2025

This article is included in the Genomics and Genetics gateway.

Abstract

We present a genome assembly of Tethysbaena scabra (Arthropoda; Crustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae), a species endemic to Mallorca, Spain. The genome sequence spans 1.18 gigabases and is scaffolded into 17 chromosomes, plus a mitochondrial genome 16.5 kilobases in length.

Keywords

Thermosbaenacea, anchialine environment, stygobiont species, Tethysbaena scabra

Revised Amendments from Version 1

In response to the reviewer’s comments, we have made the following revisions and additions to improve the clarity, completeness, and reproducibility of our genome note:
1. Project Context: We added a brief explanation of the Catalan Biogenome Project and its connection to the Earth BioGenome Project (EBP).
2. Sample Pools and Rationale: The methods section now clearly explains why pooled specimens were used for HiFi and Hi-C sequencing, including accession numbers and sequencing facilities.
3. Library Preparation Details: A brief explanation of protocols, reagents, and kits used for both HiFi and Hi-C library preparation has been included.
4. Assembly Parameters: We now specify the use of --n-hap 40 in hifiasm and justify this setting based on diploidy and pooled individuals. We also explain the rationale for applying purge_dups and report the reduction in contig number.
5. Contamination Control: We clarified our approach to minimizing macroscopic contamination and acknowledged limitations in avoiding microscopic contaminants.
6. Contig and Scaffold Counts: The methods section was revised to provide a clearer description of the filtering steps and contig/scaffold numbers at each stage.
7. Sex Chromosome Detection: We expanded on why sex chromosome identification was not feasible, citing the absence of haploid coverage patterns and unknown sex determination mechanisms.
8. Repeat Annotation: A new table was added summarizing the chromosomal distribution and sequence composition of repetitive elements.
9. Read Metrics: The read length N50 of PacBio reads was added to the Results.
10. Gaps: We now report that the assembly contains 299 standardized 100 bp gaps.
11. Keywords and Phylogeny: The species name was added to the keywords. A phylogenetic analysis was not included due to the lack of comparative genomic data, but we acknowledge its value for future work.


Introduction

Tethysbaena scabra (Pretus, 1991) (NCBI:txid203899) is a thermosbaenacean (Crustacea; Multicrustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae). Thermosbaenaceans are a relict group of peracarid crustaceans in which gravid females carry a dorsal brood pouch formed by a posterior extension of the carapace (Figure 1). The species measures 2–3 mm in length, is completely eyeless and depigmented, and inhabits subterranean waters of elevated salinity in caves and wells near the coast. It is endemic to the Mediterranean islands of Mallorca and Menorca (Balearic Archipelago). It feeds as a particle collector and thrives primarily in the pycnoclines that develop within the water column of anchialine caves, where organic debris, bacteria, and fungi accumulate. No information is available on genome size or chromosome number in thermosbaenaceans. The closest taxa with known genome sizes (https://www.genomesize.com, 1C values in pg) belong to the peracarid groups Isopoda (1.70–8.60), Amphipoda (0.52–64.62), and Mysida (10.81–12.00).
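The 1C values above are given in picograms, whereas the assembly is reported in gigabases. The following minimal sketch puts both on the same scale. It assumes the commonly used conversion of roughly 978 Mb per picogram of DNA, which is not stated in this note, and the function and variable names are illustrative only.

```python
# Assumption (not from this genome note): 1 pg of DNA is taken as ~0.978 Gb.
GB_PER_PG = 0.978

# 1C ranges (pg) quoted in the Introduction from genomesize.com.
PERACARID_C_VALUES_PG = {
    "Isopoda": (1.70, 8.60),
    "Amphipoda": (0.52, 64.62),
    "Mysida": (10.81, 12.00),
}

# Assembly span reported in this note (Gb).
ASSEMBLY_SPAN_GB = 1.18


def pg_to_gb(c_value_pg: float) -> float:
    """Convert a 1C value in picograms to gigabases."""
    return c_value_pg * GB_PER_PG


def gb_to_pg(size_gb: float) -> float:
    """Convert a genome size in gigabases to picograms."""
    return size_gb / GB_PER_PG


if __name__ == "__main__":
    for group, (low_pg, high_pg) in PERACARID_C_VALUES_PG.items():
        print(f"{group}: {pg_to_gb(low_pg):.2f} to {pg_to_gb(high_pg):.2f} Gb")
    print(f"T. scabra assembly: about {gb_to_pg(ASSEMBLY_SPAN_GB):.2f} pg")
```

Under this assumption the 1.18 Gb assembly corresponds to roughly 1.2 pg, which sits below the quoted isopod and mysid ranges and within the amphipod range. An assembly span is not a direct C-value measurement, so the comparison is only indicative.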


Figure 1. Photograph of a Tethysbaena scabra (qmTetScab1) specimen.

The genome sequence of T. scabra will help in studying adaptation to subterranean environments, particularly anchialine ones, which are characterized by oligotrophy, darkness and salinity. The genome of T. scabra was sequenced under the umbrella of the Catalan Initiative for the Earth BioGenome Project (CBP). Here we present a chromosome-level genome assembly for T. scabra from Mallorca, Spain, which represents the first reference genome for the order Thermosbaenacea.

Methods

Specimens were collected in late spring 2022 with a modified plankton net from the bottom of a well in an old windmill at Es Pil·larí, Palma, Mallorca, Spain (39.533831, 2.747581), and were sorted under a stereomicroscope (Figure 2). Several batches of 20 specimens each were placed in cryovials, snap-frozen in liquid nitrogen, and subsequently shipped on dry ice to the sequencing facilities. Specimens were collected and identified by Damià Jaume. Extraction of high-molecular-weight DNA, construction of Pacific Biosciences HiFi circular consensus sequencing libraries, and sequencing on a Pacific Biosciences Sequel II (HiFi) instrument were performed by the Delaware Biotechnology Institute, University of Delaware (DE, USA), using a pool of 20 specimens (accession number: SAMEA11313135, qmTetScab1). Hi-C data were generated from another pool of 20 individuals from the same collection site (accession number: SAMEA118091338) using the Omni-C library preparation and sequenced 2 × 150 bp on an Illumina NovaSeq 6000 S4 instrument at the Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona, Spain.


Figure 2. Photograph of Tethysbaena scabra specimens under magnification.

The genome size was estimated using GenomeScope2 (Vurture et al., 2017), and diploidy was confirmed with Smudgeplot (Ranallo-Benavidez et al., 2020). Assembly was conducted using hifiasm (Cheng et al., 2021) with --n-hap 40 (two haplotypes for each of the 20 pooled individuals). The large number of haplotypic duplications, presumably caused by the high number of specimens used for DNA extraction, was removed with purge_dups (Guan et al., 2020), which reduced the assembly from 2208 to 1272 contigs. Genomic DNA was extracted from individuals smaller than 5 mm that were not externally cleaned, so the extract could also contain DNA from microbial and other eukaryotic contaminants. Contig sequences from contaminant species were therefore removed from the assembly using two bioinformatic tools, Foreign Contamination Screen (FCS; Astashyn et al., 2024) and Whokaryote (Pronk and Medema, 2022), leaving 993 contigs. FCS aligns assemblies, preprocessed to mask repetitive and low-complexity regions, to a curated reference database. The pipeline segments scaffolds into 100-kb subsequences and employs hashed k-mers as alignment seeds. Sequences assigned to taxonomic groups distinct from the query organism (NCBI:txid203899) were then excluded. Whokaryote differentiates eukaryotic from prokaryotic contig sequences based on fundamental differences in gene structure between the two taxonomic domains. It uses a random forest classifier in combination with Tiara predictions, which incorporate k-mer frequency distributions as classification features. The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023), yielding 821 scaffolds. It was then screened for contamination with two rounds of BlobTools to ensure complete decontamination, leaving 59 scaffolds.

FCS and Whokaryote removed far fewer sequences than BlobToolKit because the former two rely only on a reference from a closely related taxon (not available for Thermosbaenacea) and on gene structure and domains, whereas BlobToolKit draws on several features (GC content, coverage, BUSCO reference, etc.). The contact map was curated using Pretext (Harry, 2022). The final assembly comprises 322 contigs arranged in 23 scaffolds. These scaffolds show contact patterns in their central regions that suggest connections between scaffolds, which ultimately allows a total of 17 scaffolds to be obtained. Putative sex chromosomes were not identified, likely because the genomic material came from a pool of 20 individuals of unknown sex and the Hi-C data were derived from a separate pool of specimens. In addition, the coverage obtained was not sufficient to infer sex-linked chromosomes. The genome was analysed within the BlobToolKit environment and BUSCO scores were generated (Challis et al., 2020). Table 1 lists the software tool versions used, where appropriate. To assess the assembly, k-mer completeness and consensus quality values (QV) were calculated using Meryl and Merqury (Rhie et al., 2020).
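The sequence counts reported at each filtering step can be tabulated to see where most sequences were discarded. The sketch below is only a bookkeeping aid built from the numbers given in this section. The stage labels, the assignment of 2208 contigs to the stage before purge_dups, and the helper names are illustrative assumptions, not part of the authors' pipeline.

```python
# Haplotype setting described in the Methods: diploid genome, 20 pooled individuals.
PLOIDY = 2
POOLED_INDIVIDUALS = 20
n_hap = PLOIDY * POOLED_INDIVIDUALS  # value passed to hifiasm according to the text

# (stage label, unit counted, number of sequences) as reported in the Methods.
# Labels are illustrative; counts are taken from the text.
STAGES = [
    ("before purge_dups", "contigs", 2208),
    ("after purge_dups", "contigs", 1272),
    ("after FCS and Whokaryote", "contigs", 993),
    ("after YaHS scaffolding", "scaffolds", 821),
    ("after two BlobTools rounds", "scaffolds", 59),
    ("after manual curation", "scaffolds", 23),
]


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in sequence count between two stages."""
    return 100.0 * (before - after) / before


def stage_reductions(stages):
    """Return {stage label: % reduction} for consecutive stages sharing a unit.

    Transitions that change unit (contigs to scaffolds) are skipped because
    the counts are not directly comparable.
    """
    result = {}
    for (_, unit_a, n_a), (label_b, unit_b, n_b) in zip(stages, stages[1:]):
        if unit_a == unit_b:
            result[label_b] = round(percent_reduction(n_a, n_b), 2)
    return result


if __name__ == "__main__":
    print(f"n_hap = {n_hap}")
    for label, pct in stage_reductions(STAGES).items():
        print(f"{label}: {pct}% fewer sequences than the previous stage")
```

The drop from 59 to 23 scaffolds computed this way matches the 61.02% decrease in scaffold count reported for manual curation in the Results.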

Assembly of the mitochondrial genome with MitoHiFi (Uliano-Silva et al., 2023) failed, likely because genome databases lack a mitogenome sequence from a sufficiently close taxon. Sequence contigs were therefore compared, using BLASTn with relaxed parameters, against a database built from mitogenome sequences of several peracarid species. The 30 kb sequence with a positive match was circularized in MitoMaker (Schomaker-Bastos and Prosdocimi, 2018) and annotated in Mitos2 (Donath et al., 2019).

Repeat annotation was performed using RepeatMasker (Smit et al., 2013–2015) and RepeatOBserver (Elphinstone et al., 2025). RepeatMasker identifies low-complexity DNA regions as well as interspersed repeats. In contrast, RepeatOBserver describes tandem repeats and clusters of transposons in a chromosome-level assembly, based on repeat patterns. It also returns a predicted centromere location for each chromosome.

Results

The genome sequence was obtained from a DNA pool of 20 specimens of T. scabra for HiFi data, plus another pool of 20 specimens for Hi-C data, all collected in a well in Es Pil·larí, Palma, Mallorca, Spain. Two Pacific Biosciences sequencing cells yielded a total of 63.5 gigabases of high-fidelity (HiFi) long reads with a read N50 of 13,270 bp, giving a coverage of 53.8×. Primary contig assemblies were then scaffolded using 73.9 Gb of paired-end Illumina reads derived from chromosome conformation (Hi-C) data. Manual curation corrected 39 misassemblies, including missing joins and misjoins, resulting in a 0.28% reduction in the total assembly length, a 61.02% decrease in scaffold count, and an 89.99% increase in scaffold N50. The final genome assembly spans 1.18 Gb across 23 scaffolds, with a scaffold N50 of 74.6 Mb (Figure 3, Table 2). GC-coverage (Figure 4) and cumulative sequence plots (Figure 5) from BlobToolKit showed minimal parameter variation with few outliers, and only a very low fraction of sequences failed to match arthropod sequences deposited in databases. Most of the assembly sequence (99.2%) was mapped to the final chromosomes. The final assembly sequence confirmed by Hi-C data was assigned to 17 chromosome-level scaffolds, which are designated as they appear in the PretextMap (Figure 6; Table 3). The assembly has a BUSCO v5.5.0 (Manni et al., 2021; Simão et al., 2015) completeness of 93.7% (single-copy 93.0%, duplicated 0.7%; Table 2) using the arthropoda_odb10 reference set. The mitochondrial genome contig can be found within the multifasta file of the genome submission.
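The coverage figure quoted above follows from dividing the sequencing yield by the assembly span. A minimal sketch of that arithmetic is given below. The Hi-C depth it prints is a derived value that is not reported in this note, and the function name is illustrative.

```python
# Values reported in the Results (gigabases).
HIFI_YIELD_GB = 63.5
HIC_YIELD_GB = 73.9
ASSEMBLY_SPAN_GB = 1.18


def fold_coverage(yield_gb: float, genome_size_gb: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return yield_gb / genome_size_gb


if __name__ == "__main__":
    hifi = fold_coverage(HIFI_YIELD_GB, ASSEMBLY_SPAN_GB)
    hic = fold_coverage(HIC_YIELD_GB, ASSEMBLY_SPAN_GB)
    print(f"HiFi coverage: {hifi:.1f}x")
    # Derived here for illustration only; the note does not report this figure.
    print(f"Hi-C read depth: {hic:.1f}x")
```

Because the DNA came from a pool of 20 diploid individuals, this depth is an average over all pooled haplotypes rather than the depth of any single haplotype.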


Figure 3. Snailplot of the genome assembly of Tethysbaena scabra, qmTetScab1.

This snailplot generated by BlobToolKit displays several metrics, including the longest scaffold, N50, and BUSCO gene completeness, among others. The main plot is segmented into 50 bins, ordered by size around the circumference, with each bin representing 2% of the 1.18 Gbp assembly. Scaffold length distribution is shown in dark grey, with the plot radius scaled to the length of the longest scaffold in the assembly (104 Mbp). Orange and light-orange arcs indicate the N50 and N90 scaffold lengths (74.6 Mbp and 55.4 Mbp, respectively). A pale grey spiral illustrates the cumulative scaffold count on a log scale, with white scale lines marking successive orders of magnitude. The blue and pale-blue areas along the plot's outer edge depict the GC, AT, and N content distribution across these bins. A summary of the BUSCO results appears in the figure’s top right corner.

Table 2. Genome data for Tethysbaena scabra, qmTetScab1.1.

Assembly metric benchmarks are adapted from the 6.C.Q40 standard of the Earth BioGenome Project (Lawniczak et al., 2022). BUSCO scores are based on the arthropoda_odb10 BUSCO set using v5.5.0. C = complete, [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison.

Project accession data

| Field | Value |
|---|---|
| Assembly name | Tethysbaena scabra |
| Assembly accession | GCA_964277195 |
| Accession of alternate haplotype | - |
| Span (Mb) | 1200 |
| Number of contigs | 322 |
| Contig N50 length (Mb) | 6.1 |
| Number of scaffolds | 23 |
| Scaffold N50 length (Mb) | 74.5 |
| Longest scaffold (Mb) | 104.45 |
| Gaps (bp) | 299 standardized 100 bp gaps |

Assembly metrics

| Metric | Value | Benchmark |
|---|---|---|
| Consensus quality (QV) | 50.41 | ≥40 |
| K-mer completeness | 92.5 | ≥90 |
| BUSCO | C:93.7% [S:93%, D:0.7%], F:3%, M:3.4%, n:1,013 | C ≥90%, D <5% |
| Percentage of assembly mapped to chromosomes | 99.2% | ≥90% |
| Organelles | MT | Complete single alleles |
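The values in Table 2 can be checked against the benchmark column programmatically. The sketch below does so and also derives the gap count from the contig and scaffold numbers, on the assumption that every join between adjacent contigs within a scaffold is represented by one gap. The dictionary keys and function names are illustrative.

```python
# Values taken from Table 2.
ASSEMBLY = {
    "contigs": 322,
    "scaffolds": 23,
    "qv": 50.41,
    "kmer_completeness": 92.5,
    "busco_complete": 93.7,
    "busco_duplicated": 0.7,
    "mapped_to_chromosomes": 99.2,
}
GAP_LENGTH_BP = 100  # standardized gap length reported in Table 2


def gap_count(contigs: int, scaffolds: int) -> int:
    """Number of gaps if each contig join inside a scaffold is one gap.

    A scaffold built from k contigs has k - 1 joins, so the total is
    contigs minus scaffolds.
    """
    return contigs - scaffolds


def meets_benchmarks(metrics: dict) -> dict[str, bool]:
    """Compare reported metrics with the thresholds in the Benchmark column."""
    return {
        "QV >= 40": metrics["qv"] >= 40,
        "k-mer completeness >= 90": metrics["kmer_completeness"] >= 90,
        "BUSCO C >= 90%": metrics["busco_complete"] >= 90,
        "BUSCO D < 5%": metrics["busco_duplicated"] < 5,
        "mapped to chromosomes >= 90%": metrics["mapped_to_chromosomes"] >= 90,
    }


if __name__ == "__main__":
    gaps = gap_count(ASSEMBLY["contigs"], ASSEMBLY["scaffolds"])
    print(f"gaps: {gaps} ({gaps * GAP_LENGTH_BP} bp of gap sequence)")
    for name, ok in meets_benchmarks(ASSEMBLY).items():
        print(f"{name}: {'pass' if ok else 'fail'}")
```

With 322 contigs in 23 scaffolds this gives 299 gaps, in agreement with the figure reported in the table.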

Figure 4. Genome assembly of Tethysbaena scabra, qmTetScab1.1: BlobToolKit GC-coverage plot.

Scaffolds are shown by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis.


Figure 5. Genome assembly of Tethysbaena scabra: BlobToolKit cumulative sequence plot, qmTetScab1.1.

The gray line represents the cumulative length of all scaffolds, while the colored lines indicate the cumulative lengths of scaffolds assigned to each individual phylum.


Figure 6. Genome assembly of Tethysbaena scabra, qmTetScab1: Hi-C contact map of assembly, visualised using PretextMap.

Chromosomes are shown as they appear in PretextMap, not by size order.

Table 3. Chromosomal pseudomolecules in the genome assembly of Tethysbaena scabra.

https://www.ebi.ac.uk/ena/browser/view/GCA_964277195.1?show=chromosomes.

| Accession | Name | Length (Mb) | GC% |
|---|---|---|---|
| OZ195310 | tros_1 | 83.11 | 33.29 |
| OZ195311 | tros_2 | 104.46 | 33.18 |
| OZ195312 | tros_3 | 85.72 | 33.29 |
| OZ195313 | tros_4 | 82.79 | 33.44 |
| OZ195314 | tros_5 | 87.20 | 33.33 |
| OZ195315 | tros_6 | 74.56 | 33.45 |
| OZ195316 | tros_7 | 74.67 | 33.31 |
| OZ195317 | tros_8 | 72.98 | 33.51 |
| OZ195318 | tros_9 | 61.70 | 33.58 |
| OZ195319 | tros_10 | 49.14 | 33.44 |
| OZ195320 | tros_11 | 56.67 | 33.72 |
| OZ195321 | tros_12 | 70.05 | 33.68 |
| OZ195322 | tros_13 | 55.35 | 33.43 |
| OZ195323 | tros_14 | 59.37 | 33.46 |
| OZ195324 | tros_15 | 57.12 | 33.76 |
| OZ195325 | tros_16 | 55.68 | 33.69 |
| OZ195326 | tros_17 | 45.10 | 33.69 |
| OZ195327 | MT | 0.016 | 32.04 |
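The N50 statistic used throughout this note is the length of the sequence at which the cumulative length, summed from the longest sequence downwards, first reaches half of the total. The following sketch applies that definition to the 17 chromosome lengths in Table 3. It omits the unplaced scaffolds and the mitochondrial genome, so it is an approximation of the scaffold N50 rather than a recalculation of it, and the function name is illustrative.

```python
# Chromosome lengths (Mb) from Table 3; MT and unplaced scaffolds are omitted.
CHROMOSOME_LENGTHS_MB = {
    "tros_1": 83.11, "tros_2": 104.46, "tros_3": 85.72, "tros_4": 82.79,
    "tros_5": 87.20, "tros_6": 74.56, "tros_7": 74.67, "tros_8": 72.98,
    "tros_9": 61.70, "tros_10": 49.14, "tros_11": 56.67, "tros_12": 70.05,
    "tros_13": 55.35, "tros_14": 59.37, "tros_15": 57.12, "tros_16": 55.68,
    "tros_17": 45.10,
}


def n50(lengths) -> float:
    """Length at which the running total (longest first) reaches half the sum."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


if __name__ == "__main__":
    lengths = list(CHROMOSOME_LENGTHS_MB.values())
    print(f"total chromosome length: {sum(lengths):.2f} Mb")
    print(f"N50 over chromosomes only: {n50(lengths):.2f} Mb")
```

On these 17 lengths the function returns 74.56 Mb (tros_6), close to the scaffold N50 values reported in the Results and in Table 2.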

The genome annotation was assessed using BUSCO, which gave C:93.1% [S:73.2%, D:19.9%], F:2.2%, M:4.7%. The annotation comprises 27,004 transcripts and 22,834 genes. rnaQUAST reported an average alignment length of 1248.6 bp. Repetitive regions are summarized in Table 4.
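BUSCO summaries obey two simple identities: single-copy plus duplicated equals complete, and complete plus fragmented plus missing equals 100%. The sketch below checks the annotation scores above against both, allowing a small tolerance for rounding, and derives the mean number of transcripts per gene. The class name and the tolerance value are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoSummary:
    """BUSCO percentages as printed in a short summary line."""
    complete: float
    single: float
    duplicated: float
    fragmented: float
    missing: float

    def is_consistent(self, tolerance: float = 0.15) -> bool:
        """Check S + D = C and C + F + M = 100 within a rounding tolerance."""
        parts_ok = abs(self.single + self.duplicated - self.complete) <= tolerance
        total_ok = abs(
            self.complete + self.fragmented + self.missing - 100.0
        ) <= tolerance
        return parts_ok and total_ok


# Annotation BUSCO scores and gene counts reported above.
ANNOTATION = BuscoSummary(
    complete=93.1, single=73.2, duplicated=19.9, fragmented=2.2, missing=4.7
)
TRANSCRIPTS = 27_004
GENES = 22_834
transcripts_per_gene = TRANSCRIPTS / GENES

if __name__ == "__main__":
    print(f"annotation BUSCO consistent: {ANNOTATION.is_consistent()}")
    print(f"transcripts per gene: {transcripts_per_gene:.2f}")
```

The annotation scores pass both checks, and the counts correspond to roughly 1.18 transcripts per gene.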

Table 4. Summary of the repetitive elements found by RepeatMasker in the genome of Tethysbaena scabra, qmTetScab1.1.

| Repeat class | Number of elements | Length occupied | % |
|---|---|---|---|
| SINEs: | 3,285 | 217,586 bp | 0.02% |
| ALUs | 7 | 499 bp | 0.00% |
| MIRs | 381 | 30,645 bp | 0.00% |
| LINEs: | 100,876 | 97,666,309 bp | 8.30% |
| LINE1 | 3,138 | 378,718 bp | 0.03% |
| LINE2 | 47,591 | 44,895,798 bp | 3.81% |
| L3/CR1 | 49,210 | 52,023,230 bp | 4.42% |
| LTR elements: | 1,726 | 541,618 bp | 0.05% |
| ERVL | 80 | 10,534 bp | 0.00% |
| ERVL-MaLRs | 118 | 12,225 bp | 0.00% |
| ERV_classI | 1,224 | 337,524 bp | 0.03% |
| ERV_classII | 46 | 4,692 bp | 0.00% |
| DNA elements: | 39,909 | 19,071,121 bp | 1.62% |
| hAT-Charlie | 20,903 | 9,453,122 bp | 0.80% |
| TcMar-Tigger | 3,285 | 1,466,820 bp | 0.12% |
| Unclassified | 20 | 3,649 bp | 0.00% |
| Total interspersed | | 117,500,283 bp | 9.98% |
| Small RNA | 1,757 | 176,391 bp | 0.01% |
| Satellites: | 94 | 13,096 bp | 0.00% |
| Simple repeats | 552,457 | 26,333,387 bp | 2.24% |
| Low complexity | 71,177 | 3,444,953 bp | 0.29% |
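The "Total interspersed" row of Table 4 should equal the sum of the five top-level interspersed classes. The sketch below verifies this and back-calculates the genome length implied by the reported percentage. That implied length is an estimate limited by the two-decimal rounding of the percentage, and the variable names are illustrative.

```python
# Top-level interspersed repeat classes from Table 4 (bp occupied).
INTERSPERSED_BP = {
    "SINEs": 217_586,
    "LINEs": 97_666_309,
    "LTR elements": 541_618,
    "DNA elements": 19_071_121,
    "Unclassified": 3_649,
}
REPORTED_TOTAL_BP = 117_500_283
REPORTED_TOTAL_PCT = 9.98

# Sum of the class totals; expected to match the "Total interspersed" row.
total_interspersed_bp = sum(INTERSPERSED_BP.values())

# Genome length implied by the reported percentage (approximate, because
# the percentage is rounded to two decimals).
implied_genome_bp = REPORTED_TOTAL_BP / (REPORTED_TOTAL_PCT / 100)


def percent_of_genome(length_bp: int, genome_bp: float) -> float:
    """Share of the genome occupied by a repeat class, in percent."""
    return 100.0 * length_bp / genome_bp


if __name__ == "__main__":
    print(f"sum of classes: {total_interspersed_bp:,} bp")
    print(f"implied genome length: {implied_genome_bp / 1e9:.3f} Gb")
    for name, bp in INTERSPERSED_BP.items():
        print(f"{name}: {percent_of_genome(bp, implied_genome_bp):.2f}%")
```

The class totals add up exactly to the reported 117,500,283 bp, and the implied genome length of about 1.18 Gb agrees with the assembly span.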

Ethics and consent

Ethical approval and consent were not required.

Author contributions

Conceptualization (JP, CJ, DJ, JAJR), Data Curation (KDSA, LTL, JP), Formal Analysis (LTL, KDSA, JP), Funding Acquisition (JAJR, JP), Resources (DJ), Writing – Original Draft Preparation (LTL, KDSA, JP), and Writing – Review & Editing (all).

how to cite this article
Pons J, Schöninger-Almaraz KD, Triginer-Llabrés L et al. The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 2; peer review: 1 approved, 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.12688/f1000research.161461.2)

Open Peer Review

VERSION 2
PUBLISHED 04 Jul 2025
Reviewer Report 04 Sep 2025
Martin Schwentner, Naturhistorisches Museum Vienna (Austria), Vienna, Austria 
Approved with Reservations
The authors present the first genome of the thermosbaenacean, which will be an important resource for future research. The overall manuscript is well written and structured and the methods and results are appropriate and well presented. 
HOW TO CITE THIS REPORT
Schwentner M. Reviewer Report For: The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 2; peer review: 1 approved, 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.5256/f1000research.183424.r406386)
Reviewer Report 22 Aug 2025
Pascal Angst, University of Basel, Basel, Switzerland 
Approved
The revised manuscript effectively addresses the comments I made as a reviewer. I appreciate the improvements made and the attention given to the issues I raised. Most of the revisions are clear and well implemented. However, I would be interested ...
HOW TO CITE THIS REPORT
Angst P. Reviewer Report For: The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 2; peer review: 1 approved, 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.5256/f1000research.183424.r396911)
VERSION 1
PUBLISHED 14 Mar 2025
Reviewer Report 16 Apr 2025
Pascal Angst, University of Basel, Basel, Switzerland 
Approved with Reservations
This genome note presents the genome of Tethysbaena scabra. The authors sampled two pools of specimens for sequencing using PacBio HiFi and Hi-C technologies. They used latest software for assembly of sequencing reads and for assessing the assembly’s quality, completeness, ...
HOW TO CITE THIS REPORT
Angst P. Reviewer Report For: The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 2; peer review: 1 approved, 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.5256/f1000research.177497.r376178)