Genome Note

The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea.

[version 1; peer review: 1 approved with reservations]
PUBLISHED 14 Mar 2025

This article is included in the Genomics and Genetics gateway.

Abstract

We present a genome assembly of Tethysbaena scabra (Arthropoda; Crustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae), a species endemic to Mallorca, Spain. The genome assembly spans 1.18 gigabases and is scaffolded into 17 chromosomes, plus a mitochondrial genome 16.5 kilobases in length.

Keywords

Thermosbaenacea, anchialine environment, stygobiont species

Introduction

Tethysbaena scabra (Pretus, 1991) (NCBI:txid203899) is a thermosbaenacean (Crustacea; Multicrustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae), a member of a relict group of peracarid crustaceans in which gravid females carry a dorsal brood pouch formed by a posterior extension of the carapace (Figure 1). The species measures 2–3 mm in length, is completely eyeless and depigmented, and inhabits subterranean waters of raised salinity in caves and wells near the coast. It is endemic to the Mediterranean islands of Mallorca and Menorca (Balearic Archipelago). It feeds as a particle collector, thriving primarily in the pycnoclines that develop in the water column of anchialine caves, where organic debris, bacteria and fungi accumulate. No information is available on genome size or chromosome number in thermosbaenaceans. The closest taxa with published genome sizes (https://www.genomesize.com; 1C values in pg) are the peracarid groups Isopoda (1.70–8.60), Amphipoda (0.52–64.62) and Mysida (10.81–12.00).
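The database values above are 1C amounts in picograms, whereas assemblies are reported in bases. As a rough cross-check, the minimal sketch below assumes the widely used conversion factor of about 0.978 Gb per pg; under that assumption the 1.18-Gb T. scabra assembly corresponds to roughly 1.2 pg, smaller than any isopod or mysid value listed and near the lower end of the amphipod range. An assembly span is not a direct measurement of the 1C value, so this is only an approximate comparison.

```python
# Assumed conversion factor: 1 pg of DNA corresponds to roughly 0.978 Gb.
GB_PER_PG = 0.978


def pg_to_gb(picograms: float) -> float:
    """Convert a 1C genome size in picograms to gigabases."""
    return picograms * GB_PER_PG


def gb_to_pg(gigabases: float) -> float:
    """Convert a genome size in gigabases to picograms."""
    return gigabases / GB_PER_PG


# 1C ranges (pg) quoted in the text for other peracarid groups.
peracarid_ranges_pg = {
    "Isopoda": (1.70, 8.60),
    "Amphipoda": (0.52, 64.62),
    "Mysida": (10.81, 12.00),
}

if __name__ == "__main__":
    for group, (low, high) in peracarid_ranges_pg.items():
        print(f"{group}: {pg_to_gb(low):.2f}-{pg_to_gb(high):.2f} Gb")
    # Assembly span of T. scabra from this note, expressed in pg.
    print(f"T. scabra assembly: {gb_to_pg(1.18):.2f} pg")
```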


Figure 1. Photograph of a Tethysbaena scabra (qmTetScab1) specimen.

The genome sequence of T. scabra will support the study of adaptation to subterranean environments, particularly anchialine ones, which are characterized by oligotrophy, darkness and salinity. The genome of T. scabra was sequenced under the umbrella of the Catalan Initiative for the Earth BioGenome Project (CBP). Here we present a chromosome-level genome assembly of T. scabra from Mallorca, Spain, which is the first reference genome for the order Thermosbaenacea.

Methods

Specimens were collected in late spring 2022 with a modified plankton net from the bottom of a well in an old windmill at Es Pil·larí, Palma, Mallorca, Spain (39.533831, 2.747581), and sorted under a stereomicroscope (Figure 2). Several batches of 20 specimens each were placed in cryovials, snap-frozen in liquid nitrogen, and subsequently shipped on dry ice to the sequencing facilities. Specimens were collected and identified by Damià Jaume. High-molecular-weight DNA extraction, construction of Pacific Biosciences HiFi circular consensus sequencing libraries, and sequencing on a Pacific Biosciences Sequel II instrument were performed by the Delaware Biotechnology Institute, University of Delaware (DE, USA), using a pool of 20 specimens (qmTetScab1). Hi-C data were generated from another pool of 20 individuals from the same collection site using the Omni-C library preparation, and sequenced as 2 × 150 bp paired-end reads on an Illumina NovaSeq 6000 S4 instrument at the Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona, Spain.


Figure 2. Photograph of Tethysbaena scabra specimens under magnification.

The genome size was estimated with GenomeScope2 (Vurture et al., 2017), and diploidy was confirmed with Smudgeplot (Ranallo-Benavidez et al., 2020). The assembly was generated with hifiasm (Cheng et al., 2021), and haplotypic duplications were removed with purge_dups (Guan et al., 2020), yielding 2,208 and 1,272 contigs, respectively.

Because genomic DNA was extracted from individuals that were not externally cleaned, it could also contain DNA from microbial and other eukaryotic contaminants. Contigs from contaminant species were therefore removed from the assembly using two bioinformatic tools, Foreign Contamination Screen (FCS; Astashyn et al., 2024) and Whokaryote (Pronk and Medema, 2022), leaving 993 contigs. FCS aligns assemblies, preprocessed to mask repetitive and low-complexity regions, against a curated reference database; the pipeline splits scaffolds into 100-kb subsequences and uses hashed k-mers as alignment seeds. Sequences assigned to taxonomic groups other than the query organism (NCBI:txid203899) were then excluded. Whokaryote distinguishes eukaryotic from prokaryotic contigs based on fundamental differences in gene structure between the two domains, using a random forest classifier combined with Tiara predictions, which use k-mer frequency distributions as classification features.

The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). After these steps, 821 contigs were obtained. Two rounds of BlobTools contamination checks were then run to ensure complete decontamination, leaving 59 contigs. The contact map was curated using Pretext (Harry, 2022). Putative sex chromosomes were not identified, likely because the HiFi DNA came from a pool of 20 individuals of unknown sex and the Hi-C data came from a separate pool of specimens. In addition, the coverage obtained was not sufficient to infer sex-linked chromosomes.

The genome was analysed within the BlobToolKit environment, and BUSCO scores were generated (Challis et al., 2020). Table 1 lists the software tool versions used, where appropriate. To assess assembly quality, k-mer completeness and consensus quality (QV) values were calculated using Meryl and Merqury (Rhie et al., 2020).
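GenomeScope2 and Merqury both work from k-mer counts. The minimal sketch below is a simplified illustration, not either tool's implementation. The genome-size function divides the number of solid (non-error) k-mers by the depth of the tallest peak and ignores the heterozygosity model that GenomeScope2 fits; in a pooled, heterozygous sample the tallest peak may be the heterozygous one, which would roughly double this naive estimate. The QV and completeness functions follow the formulas described for Merqury (Rhie et al., 2020). The function names, the error cutoff, k, and every input count are hypothetical.

```python
import math


def kmer_genome_size(histogram: dict[int, int], error_cutoff: int) -> float:
    """Naive haploid genome-size estimate from a k-mer histogram.

    `histogram` maps depth -> number of distinct k-mers seen at that depth.
    K-mers below `error_cutoff` are treated as sequencing errors; the total
    count of the remaining k-mers is divided by the depth of the tallest peak.
    Unlike GenomeScope2, this ignores heterozygosity and repeats.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= error_cutoff}
    if not solid:
        raise ValueError("no k-mers at or above the error cutoff")
    peak_depth = max(solid, key=lambda depth: solid[depth])
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


def merqury_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Consensus QV from assembly k-mers that are absent from the reads."""
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    # Probability that a base is correct, then Phred-scaled error rate.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return -10 * math.log10(1 - p_base_correct)


def kmer_completeness(found_in_asm: int, reliable_read_kmers: int) -> float:
    """Percentage of reliable read k-mers that also occur in the assembly."""
    return 100 * found_in_asm / reliable_read_kmers


if __name__ == "__main__":
    # Hypothetical toy histogram {depth: number of distinct k-mers}.
    toy_hist = {1: 500, 2: 100, 19: 20, 20: 100, 21: 20}
    print(kmer_genome_size(toy_hist, error_cutoff=5))  # 140.0
    # Hypothetical counts: 21 assembly-only k-mers out of 1,000,000, k = 21.
    print(round(merqury_qv(21, 1_000_000, k=21), 2))
    print(kmer_completeness(900, 1000))
```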

Assembly of the mitochondrial genome with MitoHiFi (Uliano-Silva et al., 2023) failed, likely because public databases lack a mitogenome sequence from a sufficiently close relative. Contigs were therefore compared, using a relaxed BLASTn search, against a database built from the mitogenome sequences of several peracarid species. The 30-kb contig with a positive match was circularized in MitoMaker (Schomaker-Bastos and Prosdocimi, 2018) and annotated in MITOS2 (Donath et al., 2019).
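The text does not say why a 30-kb contig yielded a 16.5-kb mitogenome; one common cause (an assumption here) is that assemblers output a circular molecule as a linear contig that runs past its own start. The minimal sketch below illustrates trimming such a self-overlap by finding the contig's smallest exact period. It is not MitoMaker's algorithm, and real contigs would need error-tolerant alignment of the ends; the function name and the `min_overlap` parameter are hypothetical.

```python
def trim_circular_overlap(seq: str, min_overlap: int = 50) -> str:
    """Return one copy of a circular sequence from a linear contig that repeats its start.

    Finds the smallest shift p such that seq[p:] exactly equals seq[:len(seq) - p]
    with at least `min_overlap` overlapping bases; seq[:p] is then one full circle.
    Returns the input unchanged if no such self-overlap exists.
    """
    n = len(seq)
    for p in range(1, n - min_overlap + 1):
        # Cheap seed check on the first min_overlap bases before the full comparison.
        if seq[p : p + min_overlap] == seq[:min_overlap] and seq[p:] == seq[: n - p]:
            return seq[:p]
    return seq


if __name__ == "__main__":
    unit = "ATGCGTACCTTAGGCATCGA"  # hypothetical 20-bp "circle"
    contig = unit * 2 + unit[:7]   # linear contig running about 2.35 times around
    print(trim_circular_overlap(contig, min_overlap=10) == unit)  # True
```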

Results

The genome sequence was obtained from a DNA pool of 20 T. scabra specimens for HiFi data, plus a second pool of 20 specimens for Hi-C data, all collected from a well at Es Pil·larí, Palma, Mallorca, Spain. Two Pacific Biosciences sequencing cells yielded a total of 63.5 gigabases of high-fidelity (HiFi) long reads, giving a coverage of 53.8X. The primary contigs were then scaffolded using 73.9 Gb of paired-end Illumina reads from chromosome conformation capture (Hi-C) data. Manual curation corrected 39 misassemblies, including missed joins and misjoins, resulting in a 0.28% reduction in total assembly length, a 61.02% decrease in scaffold count, and an 89.99% increase in scaffold N50. The final genome assembly spans 1.18 Gb across 23 scaffolds, with a scaffold N50 of 74.6 Mb (Figure 3, Table 2). The BlobToolKit GC-coverage (Figure 4) and cumulative sequence (Figure 5) plots showed little variation and few outliers, and only a very small fraction of sequences failed to match Arthropoda sequences in public databases. Most of the assembly sequence (99.2%) was assigned to the final chromosomes: supported by the Hi-C data, it forms 17 chromosome-level scaffolds, named in the order in which they appear in the PretextMap (Figure 6; Table 3). The assembly has a BUSCO v5.5.0 (Manni et al., 2021; Simão et al., 2015) completeness of 93.7% (single-copy 93.0%, duplicated 0.7%) using the arthropoda_odb10 reference set. The mitochondrial genome contig is included in the multi-FASTA file of the genome submission.
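The coverage and curation figures above can be checked with simple arithmetic. The minimal sketch below assumes that coverage is total HiFi bases divided by the 1.18-Gb assembly span and that the Hi-C yield counts both mates of 2 × 150 bp read pairs. It also assumes, although the text does not say so, that the 59 contigs reported after the two BlobTools rounds in the Methods are the pre-curation scaffold count; a 61.02% decrease ending at 23 scaffolds is consistent with that reading.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: bases sequenced divided by genome (assembly) size."""
    return total_bases / genome_size


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


HIFI_BASES = 63.5e9     # HiFi yield from two sequencing cells
ASSEMBLY_SPAN = 1.18e9  # final assembly span
HIC_BASES = 73.9e9      # Hi-C paired-end yield
READ_LENGTH = 150       # 2 x 150 bp reads

hifi_depth = coverage(HIFI_BASES, ASSEMBLY_SPAN)
hic_read_pairs = HIC_BASES / (2 * READ_LENGTH)
# Assumption: 59 scaffolds before manual curation, 23 after.
scaffold_change = percent_change(59, 23)
# Inverse check: scaffold count implied by a 61.02% decrease ending at 23.
implied_before = 23 / (1 - 0.6102)

if __name__ == "__main__":
    print(f"HiFi depth: {hifi_depth:.1f}X")
    print(f"Hi-C read pairs: {hic_read_pairs / 1e6:.0f} million")
    print(f"Scaffold count change: {scaffold_change:.2f}%")
    print(f"Implied pre-curation scaffolds: {implied_before:.1f}")
```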


Figure 3. Snailplot of the genome assembly of Tethysbaena scabra, qmTetScab1.

This snailplot generated by BlobToolKit displays several metrics, including the longest scaffold, N50, and BUSCO gene completeness, among others. The main plot is segmented into 50 bins, ordered by size around the circumference, with each bin representing 2% of the 1.18 Gbp assembly. Scaffold length distribution is shown in dark grey, with the plot radius scaled to the length of the longest scaffold in the assembly (104 Mbp). Orange and light-orange arcs indicate the N50 and N90 scaffold lengths (74.6 Mbp and 55.4 Mbp, respectively). A pale grey spiral illustrates the cumulative scaffold count on a log scale, with white scale lines marking successive orders of magnitude. The blue and pale-blue areas along the plot's outer edge depict the GC, AT, and N content distribution across these bins. A summary of the BUSCO results appears in the figure’s top right corner.

Table 2. Genome data for Tethysbaena scabra, qmTetScab1.1.

Assembly metric benchmarks are adapted from the Earth BioGenome Project 6.C.Q40 standard (Lawniczak et al., 2022). BUSCO scores are based on the arthropoda_odb10 BUSCO set using v5.5.0. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison.

Project accession data
Assembly name: Tethysbaena scabra
Assembly accession: GCA_964277195
Accession of alternate haplotype: -
Span (Mb): 1200
Number of contigs: 322
Contig N50 length (Mb): 6.1
Number of scaffolds: 23
Scaffold N50 length (Mb): 74.5
Longest scaffold (Mb): 104.45

Assembly metrics | Value | Benchmark
Consensus quality (QV) | 50.41 | ≥40
K-mer completeness (%) | 92.5 | ≥90
BUSCO | C:93.7% [S:93%, D:0.7%], F:3%, M:3.4%, n:1,013 | C ≥90%, D <5%
Percentage of assembly mapped to chromosomes | 99.2% | ≥90%
Organelles | MT | Complete single alleles
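In the BUSCO notation used in Table 2, C is the sum of S and D, and C + F + M should come to about 100% up to rounding. The minimal sketch below parses BUSCO's one-line summary, in the form C:93.7%[S:93.0%,D:0.7%],F:3.0%,M:3.4%,n:1013 (Table 2 values), and checks both identities. It handles only the C/S/D/F/M/n fields; the function names and the 0.2-point tolerance (chosen to absorb one-decimal rounding) are assumptions.

```python
import re

BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> dict[str, float]:
    """Parse a BUSCO one-line summary into a {field: value} dict."""
    match = BUSCO_PATTERN.fullmatch(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def check_busco(scores: dict[str, float], tolerance: float = 0.2) -> bool:
    """True if C matches S + D and C + F + M is within `tolerance` of 100."""
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tolerance
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tolerance
    return complete_ok and total_ok


if __name__ == "__main__":
    scores = parse_busco("C:93.7%[S:93.0%,D:0.7%],F:3.0%,M:3.4%,n:1013")
    print(scores["n"], check_busco(scores))
```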

Figure 4. Genome assembly of Tethysbaena scabra, qmTetScab1.1: BlobToolKit GC-coverage plot.

Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis.


Figure 5. Genome assembly of Tethysbaena scabra: BlobToolKit cumulative sequence plot, qmTetScab1.1.

The gray line represents the cumulative length of all scaffolds, while the colored lines indicate the cumulative lengths of scaffolds assigned to each individual phylum.


Figure 6. Genome assembly of Tethysbaena scabra, qmTetScab1: Hi-C contact map of assembly, visualised using PretextMap.

Chromosomes are shown as they appear in PretextMap, not by size order.

Table 3. Chromosomal pseudomolecules in the genome assembly of Tethysbaena scabra.

https://www.ebi.ac.uk/ena/browser/view/GCA_964277195.1?show=chromosomes.

Accession | Name | Length (Mb) | GC%
OZ195310 | tros_1 | 83.11 | 33.29
OZ195311 | tros_2 | 104.46 | 33.18
OZ195312 | tros_3 | 85.72 | 33.29
OZ195313 | tros_4 | 82.79 | 33.44
OZ195314 | tros_5 | 87.20 | 33.33
OZ195315 | tros_6 | 74.56 | 33.45
OZ195316 | tros_7 | 74.67 | 33.31
OZ195317 | tros_8 | 72.98 | 33.51
OZ195318 | tros_9 | 61.70 | 33.58
OZ195319 | tros_10 | 49.14 | 33.44
OZ195320 | tros_11 | 56.67 | 33.72
OZ195321 | tros_12 | 70.05 | 33.68
OZ195322 | tros_13 | 55.35 | 33.43
OZ195323 | tros_14 | 59.37 | 33.46
OZ195324 | tros_15 | 57.12 | 33.76
OZ195325 | tros_16 | 55.68 | 33.69
OZ195326 | tros_17 | 45.10 | 33.69
OZ195327 | MT | 0.016 | 32.04
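N50 and N90, as drawn in the snailplot, are the lengths at which scaffolds sorted from longest to shortest first cover 50% and 90% of the total span. The minimal sketch below computes them over the 17 chromosome lengths in Table 3 only; unplaced scaffolds and the mitogenome are excluded, so it approximates rather than reproduces the whole-assembly values. On these inputs it gives N50 = 74.56 Mb and N90 = 55.35 Mb, consistent with the 74.6 and 55.4 Mb in the Figure 3 legend. The function name `nx` is hypothetical.

```python
def nx(lengths: list[float], fraction: float) -> float:
    """Return the Nx length: the shortest sequence among the longest sequences
    whose summed length first reaches `fraction` of the total."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)  # summed in the same order as the running total below
    cumulative = 0.0
    for length in ordered:
        cumulative += length
        if cumulative >= fraction * total:
            return length
    raise ValueError("lengths must be a non-empty list of positive values")


# Chromosome lengths (Mb) for tros_1 ... tros_17, copied from Table 3.
chromosome_mb = [
    83.11, 104.46, 85.72, 82.79, 87.20, 74.56, 74.67, 72.98, 61.70,
    49.14, 56.67, 70.05, 55.35, 59.37, 57.12, 55.68, 45.10,
]

if __name__ == "__main__":
    print(f"Total: {sum(chromosome_mb):.2f} Mb")
    print(f"N50: {nx(chromosome_mb, 0.5)} Mb")
    print(f"N90: {nx(chromosome_mb, 0.9)} Mb")
```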

Ethics and consent

Ethical approval and consent were not required.

Author contributions

Conceptualization (JP, CJ, DJ, JAJR), Data Curation (KDSA, LTL, JP), Formal Analysis (LTL, KDSA, JP), Funding Acquisition (JAJR, JP), Resources (DJ), Writing – Original Draft Preparation (LTL, KDSA, JP), and Writing – Review & Editing (all).

How to cite this article
Pons J, Schöninger-Almaraz KD, Triginer-Llabrés L et al. The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 1; peer review: 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.12688/f1000research.161461.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 16 Apr 2025
Pascal Angst, University of Basel, Basel, Switzerland 
Approved with Reservations
This genome note presents the genome of Tethysbaena scabra. The authors sampled two pools of specimens for sequencing using PacBio HiFi and Hi-C technologies. They used latest software for assembly of sequencing reads and for assessing the assembly’s quality, completeness, ...
How to cite this report
Angst P. Reviewer Report For: The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea. [version 1; peer review: 1 approved with reservations]. F1000Research 2025, 14:293 (https://doi.org/10.5256/f1000research.177497.r376178)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
