Introduction

F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.161461.1

Genome Note

Articles

The genome sequence of Tethysbaena scabra (Pretus, 1991), the first known in the peracarid crustacean order Thermosbaenacea.

[version 1; peer review: 1 approved with reservations]

Joan Pons (a, 1): Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Writing – Original Draft Preparation, Writing – Review & Editing. https://orcid.org/0000-0002-4683-8840

Karen D. Schöninger-Almaraz (2): Data Curation, Formal Analysis, Writing – Original Draft Preparation, Writing – Review & Editing. https://orcid.org/0009-0006-8477-2279

Laura Triginer-Llabrés (2): Data Curation, Formal Analysis, Writing – Original Draft Preparation, Writing – Review & Editing. https://orcid.org/0009-0002-4680-1172

Carlos Juan (1, 3): Conceptualization, Writing – Review & Editing. https://orcid.org/0000-0002-6067-2963

Damià Jaume (1): Conceptualization, Resources, Writing – Review & Editing.

José A. Jurado-Rivera (3): Conceptualization, Funding Acquisition, Writing – Review & Editing. https://orcid.org/0000-0003-0999-2803

1 Animal and Microbial Biodiversity, Institut Mediterrani d'Estudis Avancats, Esporles, Illes Balears, 07190, Spain

2 Centre Balear de Biodiversitat, Departament de Biologia, Universitat de les Illes Balears, Palma, Balearic Islands, 07122, Spain

3 Biologia, Universitat de les Illes Balears, Palma, Balearic Islands, 07122, Spain

a jpons@imedea.uib-csic.es

No competing interests were disclosed.

14 3 2025

2025

293

3 3 2025

2025

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We present a genome assembly of Tethysbaena scabra (Arthropoda; Crustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae), a species endemic to Mallorca, Spain. The assembly spans 1.18 gigabases and is scaffolded into 17 chromosomes, plus a mitochondrial genome 16.5 kilobases in length.

Keywords: Thermosbaenacea, anchialine environment, stygobiont species

Govern de les Illes Balears

Conselleria d’Educació i Universitats and by the European Union - Next Generation EU (BIO2022/013A)

Institut d'Estudis Catalans

Catalan Biogenome Project (PRO2021-S02-Jurado)

Funding: This work was partially sponsored and promoted by the Institut d'Estudis Catalans (Catalan Biogenome Project grant PRO2021-S02-Jurado), the Govern de les Illes Balears - Conselleria d’Educació i Universitats, and the European Union - Next Generation EU (BIO2022/013A). The work of KDSA and LTL was partially funded by the Govern de les Illes Balears - Conselleria d’Educació i Universitats and by the European Union - Next Generation EU/PRTR-C17.I1. The views and opinions expressed are solely those of the authors and do not necessarily reflect those of the Conselleria d’Educació i Universitats, the European Union, or the European Commission; none of these organizations can be held responsible for them.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction

Tethysbaena scabra (Pretus, 1991) (NCBI:txid203899) is a thermosbaenacean (Crustacea; Multicrustacea; Malacostraca; Eumalacostraca; Peracarida; Thermosbaenacea; Monodellidae). Thermosbaenaceans are a relict group of peracarid crustaceans in which gravid females carry a dorsal brood pouch formed by a posterior extension of the carapace (Figure 1). The species measures 2–3 mm in length, is completely eyeless and depigmented, and inhabits subterranean waters of elevated salinity in caves and wells near the coast. It is endemic to the Mediterranean islands of Mallorca and Menorca (Balearic Archipelago). It feeds as a particle collector, thriving primarily in the pycnoclines that develop within the water column of anchialine caves, where organic debris, bacteria, and fungi accumulate. No information is available on genome size or chromosome number in thermosbaenaceans. The closest taxa with known genome sizes (https://www.genomesize.com, 1C values in pg) belong to the peracarid groups Isopoda (1.70–8.60), Amphipoda (0.52–64.62), and Mysida (10.81–12.00).
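For a rough comparison with these C-values, an assembly span can be converted to picograms. The sketch below assumes the commonly used conversion factor of 1 pg ≈ 978 Mb, which is not stated in this note, and treats the 1.18 Gb assembly span as a proxy for the haploid genome size. Function and variable names are illustrative.

```python
# Convert between genome size in base pairs and DNA mass in picograms.
# Assumption: 1 pg of DNA corresponds to about 0.978e9 bp (commonly used factor).
BP_PER_PG = 0.978e9


def bp_to_pg(n_bp: float) -> float:
    """Approximate mass in pg of a genome of n_bp base pairs."""
    return n_bp / BP_PER_PG


def pg_to_bp(pg: float) -> float:
    """Approximate number of base pairs in pg picograms of DNA."""
    return pg * BP_PER_PG


assembly_span_bp = 1.18e9  # assembly span reported in this note
print(f"Assembly span ~ {bp_to_pg(assembly_span_bp):.2f} pg")
```

Under this assumption the assembly corresponds to roughly 1.2 pg, which is below the isopod range quoted above and inside the amphipod range.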

Figure 1. Photograph of a <italic toggle="yes">Tethysbaena scabra</italic> (qmTetScab1) specimen.

The genome sequence from T. scabra will help to study adaptation to underground environments, particularly anchialine ones, that are characterized by oligotrophy, darkness and salinity. The genome of T. scabra was sequenced under the umbrella of the Catalan Initiative for the Earth BioGenome Project (CBP). Here we present a chromosome-level genome assembly for T. scabra from Mallorca, Spain, which represents the first reference genome for the order Thermosbaenacea.

Methods

Specimens were collected in late spring 2022 with a modified plankton net from the bottom of a well in an old windmill at Es Pil·larí, Palma, Mallorca, Spain (39.533831, 2.747581), and sorted under a stereomicroscope (Figure 2). Several batches of 20 specimens were each placed in a cryovial, snap-frozen in liquid nitrogen, and subsequently shipped on dry ice to the sequencing facilities. Specimens were collected and identified by Damià Jaume. Extraction of high-molecular-weight DNA, construction of Pacific Biosciences HiFi circular consensus sequencing libraries, and sequencing on a Pacific Biosciences Sequel II (HiFi) instrument were performed by the Delaware Biotechnology Institute, University of Delaware (DE, USA), using a pool of 20 specimens (qmTetScab1). Hi-C data were generated from another pool of 20 individuals from the same collection site using the Omni-C library preparation and sequenced 2 × 150 bp on an Illumina NovaSeq 6000 S4 instrument at the Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona, Spain.

Figure 2. Photograph of <italic toggle="yes">Tethysbaena scabra</italic> specimens under magnification.

The genome size was estimated using GenomeScope2 (Vurture et al., 2017), and diploidy was confirmed with Smudgeplot (Ranallo-Benavidez et al., 2020). Assembly was conducted using hifiasm (Cheng et al., 2021), and haplotypic duplications were removed with purge_dups (Guan et al., 2020), yielding 2208 and 1272 contigs, respectively. Genomic DNA was extracted from individuals that had not been externally cleaned, so it could also contain DNA from microbial and other eukaryotic contaminants. Contigs from contaminant species were therefore removed from the assembly using two bioinformatic tools, Foreign Contamination Screen (FCS; Astashyn et al., 2024) and Whokaryote (Pronk and Medema, 2022), leaving 993 contigs. FCS aligns assemblies, preprocessed to mask repetitive and low-complexity regions, to a curated reference database; the pipeline segments scaffolds into 100-kb subsequences and uses hashed k-mers as alignment seeds. Sequences assigned to taxonomic groups distinct from that of the query organism (NCBI:txid203899) were then excluded. Whokaryote distinguishes eukaryotic from prokaryotic contigs based on fundamental differences in gene structure between the two domains. It uses a random forest classifier in combination with Tiara predictions, which incorporate k-mer frequency distributions as classification features. The assembly was scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). After these steps, 821 contigs remained. The assembly was then checked for contamination with two rounds of Blobtools to ensure complete decontamination, leaving 59 contigs. The contact map was curated using Pretext (Harry, 2022). Putative sex chromosomes have not been identified, likely because the genomic material came from a pool of 20 individuals of unknown sex and the Hi-C data came from a separate pool of specimens. Additionally, the coverage obtained was not sufficient to infer sex-linked chromosomes. The genome was analysed within the BlobToolKit environment and BUSCO scores were generated (Challis et al., 2020). Table 1 lists the software tool versions used, where appropriate. To assess assembly quality, k-mer completeness and consensus quality values (QV) were calculated using Meryl and Merqury (Rhie et al., 2020).
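The k-mer-based metrics mentioned above can be sketched as follows. This is an illustration of the formulas described for Merqury (Rhie et al., 2020), not the tool's own code: consensus QV is derived from the share of assembly k-mers that are absent from the reads, and completeness from the share of reliable read k-mers that are present in the assembly. The function names and the example counts are hypothetical.

```python
import math


def kmer_qv(asm_only_kmers: int, asm_total_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality from k-mer counts.

    asm_only_kmers: assembly k-mers never seen in the reads (assumed erroneous)
    asm_total_kmers: all k-mers in the assembly
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers, so no errors detected
    # Probability that a single base is correct, given that a k-mer is
    # supported only when all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / asm_total_kmers) ** (1 / k)
    return -10 * math.log10(1 - p_base_correct)


def kmer_completeness(shared_kmers: int, read_kmers: int) -> float:
    """Percentage of reliable read k-mers that are found in the assembly."""
    return 100 * shared_kmers / read_kmers


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


# Hypothetical counts, for illustration only.
print(kmer_qv(asm_only_kmers=1_000, asm_total_kmers=1_000_000, k=21))
print(kmer_completeness(shared_kmers=925, read_kmers=1_000))
# QV reported in Table 2 converted to an error rate.
print(qv_to_error_rate(50.41))
```

On this scale, the QV of 50.41 reported in Table 2 corresponds to about 9 × 10^-6 errors per base, or roughly one error per 110 kb.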

Table 1. Software tools: versions and sources.

Software tool	Version	Source
Blastn	2.12.0+	https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html
BlobToolKit	4.3.5	https://github.com/blobtoolkit/blobtoolkit
BUSCO	5.5.0	https://gitlab.com/ezlab/busco/-/archive/5.5.0/busco-5.5.0.zip
FCS	0.5.3	https://github.com/ncbi/fcs
GenomeScope2	2.0	https://github.com/tbenavi1/genomescope2.0
Hifiasm	0.20.0-r639	https://github.com/chhylp123/hifiasm
Merqury	1.3	https://github.com/marbl/merqury
Meryl	1.4.1	https://github.com/marbl/meryl
PretextMap	0.1.9	https://github.com/sanger-tol/PretextMap
Purge_dups	1.2.5	https://github.com/dfguan/purge_dups
Smudgeplot	0.3.0	https://github.dev/KamilSJaron/smudgeplot
Whokaryote	1.1.2	https://github.com/LottePronk/whokaryote
YaHS	1.2	https://github.com/c-zhou/yahs

Assembly of the mitochondrial genome with MitoHiFi (Uliano-Silva et al., 2023) failed, likely because no mitogenome sequence from a sufficiently close taxon is available in public databases. Contig sequences were therefore searched with a relaxed BLASTn against a database built from mitogenome sequences of several peracarid species. The 30-kb sequence with a positive match was circularized in MitoMaker (Schomaker-Bastos and Prosdocimi, 2018) and annotated in Mitos2 (Donath et al., 2019).
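Circularization can be pictured as trimming the sequence that a linear contig repeats once it has run past the start of a circular molecule. The sketch below is a simplified illustration with exact string matching and a toy sequence. It is not the algorithm used by MitoMaker, and real tools tolerate mismatches; the function names and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional


def find_circular_period(seq: str, min_overlap: int = 20) -> Optional[int]:
    """Smallest shift p at which the contig overlaps itself exactly.

    Returns p such that seq[p:] == seq[:len(seq) - p] and the overlap
    is at least min_overlap bases, or None if no such shift exists.
    """
    n = len(seq)
    for p in range(1, n - min_overlap + 1):
        if seq[p:] == seq[: n - p]:
            return p
    return None


def circularize(seq: str, min_overlap: int = 20) -> str:
    """Trim the redundant tail so that one copy of the circle remains."""
    period = find_circular_period(seq, min_overlap)
    return seq if period is None else seq[:period]


# Toy example: a 16-bp "circle" read through for 10 extra bases.
unit = "ACGTTGCAGTCCATGA"
contig = unit + unit[:10]
print(circularize(contig, min_overlap=8))
```

The quadratic scan is adequate for a sketch on a contig of a few tens of kilobases; a production tool would use an alignment-based overlap search instead.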

Results

The genome sequence was obtained from a DNA pool of 20 specimens of T. scabra for HiFi data, plus a second pool of 20 specimens for Hi-C data, all collected in a well in Es Pil·larí, Palma, Mallorca, Spain. Two Pacific Biosciences sequencing cells yielded a total of 63.5 gigabases (Gb) of high-fidelity (HiFi) long reads, corresponding to a coverage of 53.8×. Primary contig assemblies were then scaffolded using 73.9 Gb of paired-end Illumina reads derived from Hi-C chromosome conformation data. Manual curation corrected 39 misassemblies, including missing joins and misjoins, resulting in a 0.28% reduction in total assembly length, a 61.02% decrease in scaffold count, and an 89.99% increase in scaffold N50. The final genome assembly spans 1.18 Gb across 23 scaffolds, with a scaffold N50 of 74.6 Mb (Figure 3, Table 2). GC-coverage (Figure 4) and cumulative sequence (Figure 5) plots from BlobToolKit showed minimal parameter variation with few outliers, and only a very small fraction of sequences failed to match arthropod sequences deposited in databases. Most of the assembly sequence (99.2%) has been mapped to the final chromosomes. The final assembly sequence confirmed by Hi-C data was assigned to 17 chromosome-level scaffolds, which are designated as they appear in the PretextMap (Figure 6; Table 3). The assembly has a BUSCO v5.5.0 (Manni et al., 2021; Simão et al., 2015) completeness of 94.7% (single 93.7%, duplicated 0.7%) using the arthropoda_odb10 reference set. The mitochondrial genome contig can be found within the multi-FASTA file of the genome submission.
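Two of the figures in this paragraph can be reproduced with simple arithmetic. The sketch below divides the HiFi yield by the assembly span to obtain coverage, and computes the relative change in scaffold count. Taking the 59 sequences reported after decontamination in the Methods as the pre-curation count is an assumption, although it is consistent with the reported 61.02% decrease.

```python
# Sequencing depth: total HiFi yield divided by assembly span (both in Gb).
hifi_yield_gb = 63.5
assembly_span_gb = 1.18
coverage = hifi_yield_gb / assembly_span_gb


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Assumption: 59 sequences before curation (Methods), 23 scaffolds after (Results).
scaffold_change = percent_change(before=59, after=23)

print(f"coverage ~ {coverage:.1f}x")
print(f"scaffold count change ~ {scaffold_change:.2f}%")
```

With these inputs the coverage rounds to 53.8× and the scaffold count changes by about -61.02%.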

Figure 3. Snailplot of the genome assembly of <italic toggle="yes">Tethysbaena scabra</italic>, qmTetScab1.

This snailplot generated by BlobToolKit displays several metrics, including the longest scaffold, N50, and BUSCO gene completeness, among others. The main plot is segmented into 50 bins, ordered by size around the circumference, with each bin representing 2% of the 1.18 Gbp assembly. Scaffold length distribution is shown in dark grey, with the plot radius scaled to the length of the longest scaffold in the assembly (104 Mbp). Orange and light-orange arcs indicate the N50 and N90 scaffold lengths (74.6 Mbp and 55.4 Mbp, respectively). A pale grey spiral illustrates the cumulative scaffold count on a log scale, with white scale lines marking successive orders of magnitude. The blue and pale-blue areas along the plot's outer edge depict the GC, AT, and N content distribution across these bins. A summary of the BUSCO results appears in the figure’s top right corner.

Table 2. Genome data for <italic toggle="yes">Tethysbaena scabra</italic>, qmTetScab1.1.

Assembly metric benchmarks are adapted from the 6.C.Q40 standard of the Earth BioGenome Project (Lawniczak et al., 2022). BUSCO scores are based on the arthropoda_odb10 BUSCO set using v5.5.0. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison.

Project accession data
Assembly name	Tethysbaena scabra
Assembly accession	GCA_964277195
Accession of alternate haplotype	-
Span (Mb)	1200
Number of contigs	322
Contig N50 length (Mb)	6.1
Number of scaffolds	23
Scaffold N50 length (Mb)	74.5
Longest scaffold (Mb)	104.45
Assembly metrics		Benchmark
Consensus quality (QV)	50.41	≥40
K-mer completeness (%)	92.5	≥90
BUSCO	C:93.7% [S:93%, D:0.7%], F:3%, M:3.4%, n:1,013	C ≥90%, D <5%
Percentage of assembly mapped to chromosomes	99.2%	≥90%
Organelles	MT	Complete single alleles
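The BUSCO summary string in Table 2 can be parsed and checked against the benchmark column. The sketch below is a minimal illustration: the function names are hypothetical, the tolerance allows for rounding in the reported percentages, and the thresholds are the ones listed in the table (C ≥ 90%, D < 5%).

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO short summary such as 'C:..%[S:..%,D:..%],F:..%,M:..%,n:..'."""
    scores = {}
    for key, value in re.findall(r"([CSDFMn]):([\d.]+)", summary):
        scores[key] = int(value) if key == "n" else float(value)
    return scores


def is_consistent(scores: dict, tol: float = 0.2) -> bool:
    """Check that S + D = C and C + F + M = 100, within rounding tolerance."""
    return (
        abs(scores["S"] + scores["D"] - scores["C"]) <= tol
        and abs(scores["C"] + scores["F"] + scores["M"] - 100) <= tol
    )


def meets_benchmark(scores: dict) -> bool:
    """Benchmark from Table 2: complete >= 90% and duplicated < 5%."""
    return scores["C"] >= 90 and scores["D"] < 5


# Values as given in Table 2 (thousands separator removed from n).
busco = parse_busco("C:93.7%[S:93%,D:0.7%],F:3%,M:3.4%,n:1013")
print(busco, is_consistent(busco), meets_benchmark(busco))
```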

Figure 4. Genome assembly of <italic toggle="yes">Tethysbaena scabra</italic>, qmTetScab1.1: BlobToolKit GC-coverage plot.

Scaffolds are shown by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis.

Figure 5. Genome assembly of <italic toggle="yes">Tethysbaena scabra</italic>, qmTetScab1.1: BlobToolKit cumulative sequence plot.

The gray line represents the cumulative length of all scaffolds, while the colored lines indicate the cumulative lengths of scaffolds assigned to each individual phylum.

Figure 6. Genome assembly of <italic toggle="yes">Tethysbaena scabra</italic>, qmTetScab1: Hi-C contact map of the assembly, visualised using PretextMap.

Chromosomes are shown as they appear in PretextMap, not by size order.

Table 3. Chromosomal pseudomolecules in the genome assembly of <italic toggle="yes">Tethysbaena scabra.</italic>

https://www.ebi.ac.uk/ena/browser/view/GCA_964277195.1?show=chromosomes.

Accession	Name	Length (Mb)	GC%
OZ195310	tros_1	83.11	33.29
OZ195311	tros_2	104.46	33.18
OZ195312	tros_3	85.72	33.29
OZ195313	tros_4	82.79	33.44
OZ195314	tros_5	87.20	33.33
OZ195315	tros_6	74.56	33.45
OZ195316	tros_7	74.67	33.31
OZ195317	tros_8	72.98	33.51
OZ195318	tros_9	61.70	33.58
OZ195319	tros_10	49.14	33.44
OZ195320	tros_11	56.67	33.72
OZ195321	tros_12	70.05	33.68
OZ195322	tros_13	55.35	33.43
OZ195323	tros_14	59.37	33.46
OZ195324	tros_15	57.12	33.76
OZ195325	tros_16	55.68	33.69
OZ195326	tros_17	45.10	33.69
OZ195327	MT	0.016	32.04
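As a cross-check on the scaffold statistics, N50 and N90 can be recomputed from the chromosome lengths in Table 3. The sketch below uses only the 17 chromosomal pseudomolecules. Unplaced scaffolds and the mitochondrial genome are left out, so the values only approximate those for the full assembly; the function name is illustrative.

```python
def nx(lengths, fraction):
    """Length L such that sequences of length >= L cover `fraction` of the total."""
    threshold = fraction * sum(lengths)
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must not be empty")


# Chromosome lengths in Mb, in the order listed in Table 3.
chromosome_mb = [
    83.11, 104.46, 85.72, 82.79, 87.20, 74.56, 74.67, 72.98, 61.70,
    49.14, 56.67, 70.05, 55.35, 59.37, 57.12, 55.68, 45.10,
]

print(f"total = {sum(chromosome_mb):.2f} Mb")
print(f"N50 = {nx(chromosome_mb, 0.5)} Mb, N90 = {nx(chromosome_mb, 0.9)} Mb")
```

From these 17 lengths the N50 is 74.56 Mb and the N90 is 55.35 Mb, in line with the 74.6 Mbp and 55.4 Mbp given in the Figure 3 legend.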

Ethics and consent

Ethical approval and consent were not required.

Author contributions

Conceptualization (JP, CJ, DJ, JAJR), Data Curation (KDSA, LTL, JP), Formal Analysis (LTL, KDSA, JP), Funding Acquisition (JAJR, JP), Resources (DJ), Writing – Original Draft Preparation (LTL, KDSA, JP), and Writing – Review & Editing (all).

Data and software availability

The Tethysbaena scabra genome project is part of the Catalan Initiative for the Earth BioGenome Project (CBP), and all raw data and the assembly were deposited in the European Nucleotide Archive: Tethysbaena scabra, accession number PRJEB61927; https://identifiers.org/ena.embl/PRJEB61927 (IMEDEA, 2024). Raw data and assembly accession identifiers are reported in Table 3.

Acknowledgements

We thank the bioinformaticians Jessica Gómez-Garrido and Tyler Alioto (Centre Nacional d’Anàlisi Genòmica, CNAG) and Emilio Righi (Centre for Genomic Regulation, CRG), all in Barcelona (Spain), for their invaluable assistance.

References

Astashyn, Tvedte, Sweeney, et al.: Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 2024;25(1):60. PMID: 38409096; DOI: 10.5281/zenodo.10651084; PMCID: PMC10898089.

Challis, Richards, Rajan, et al.: BlobToolKit–interactive quality assessment of genome assemblies. G3: Genes, Genomes, Genetics. 2020;10(4):1361–1374. PMID: 32071071; DOI: 10.1534/g3.119.400908; PMCID: PMC7144090.

Cheng, Concepcion, Feng, et al.: Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 2021;18(2):170–175. PMID: 33526886; DOI: 10.1038/s41592-020-01056-5; PMCID: PMC7961889.

Donath, Jühling, Al-Arab, et al.: Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes. Nucleic Acids Res. 2019;47(20):10543–10552. PMID: 31584075; DOI: 10.1093/nar/gkz833; PMCID: PMC6847864.

Guan, McCarthy, Wood, et al.: Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–2898. PMID: 31971576; DOI: 10.1093/bioinformatics/btaa025; PMCID: PMC7203741.

Harry: PretextView (Paired REad TEXTure Viewer): A desktop application for viewing pretext contact maps. 2022.

Manni, Berkeley, Seppey, et al.: BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 2021;1:e323. PMID: 34936221; DOI: 10.1002/cpz1.323.

Lawniczak, Durbin, Flicek, et al.: Standards recommendations for the Earth BioGenome Project. Proc. Natl. Acad. Sci. 2022;119(4):e2115639118. PMID: 35042802; DOI: 10.1073/pnas.2115639118; PMCID: PMC8795494.

Pronk, Medema: Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure. Microb. Genom. 2022;8:000823. PMID: 35503723; DOI: 10.1099/mgen.0.000823; PMCID: PMC9465069.

Ranallo-Benavidez, Jaron, Schatz: GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 2020;11(1):1432. PMID: 32188846; DOI: 10.1038/s41467-020-14998-3; PMCID: PMC7080791.

Rao, Huntley, Durand, et al.: A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–1680. PMID: 25497547; DOI: 10.1016/j.cell.2014.11.021; PMCID: PMC5635824.

Rhie, Walenz, Koren, et al.: Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:227–245. PMID: 32928274; DOI: 10.1186/s13059-020-02134-9; PMCID: PMC7488777.

Schomaker-Bastos, Prosdocimi: mitoMaker: a pipeline for automatic assembly and annotation of animal mitochondria using raw NGS data. Preprints. 2018. DOI: 10.20944/preprints201808.0423.v1.

Simão, Waterhouse, Ioannidis, et al.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. DOI: 10.1093/bioinformatics/btv351.

Uliano-Silva, Ferreira, Krasheninnikova, et al.: MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics. 2023;24(1):288. PMID: 37464285; DOI: 10.1101/2022.12.23.521667; PMCID: PMC10354987.

Vurture, Sedlazeck, Nattestad, et al.: GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33(14):2202–2204. PMID: 28369201; DOI: 10.1093/bioinformatics/btx153; PMCID: PMC5870704.

Zhou, McCarthy, Durbin: YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023;39(1):btac808. PMID: 36525368; DOI: 10.1093/bioinformatics/btac808; PMCID: PMC9848053.

10.5256/f1000research.177497.r376178

Reviewer response for version 1

Angst

Pascal

1 Referee https://orcid.org/0000-0002-8654-2251 1University of Basel, Basel, Switzerland

Competing interests: No competing interests were disclosed.

16 4 2025

2025

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

recommendation

approve-with-reservations

This genome note presents the genome of Tethysbaena scabra. The authors sampled two pools of specimens for sequencing using PacBio HiFi and Hi-C technologies. They used the latest software for assembling the sequencing reads and for assessing the assembly’s quality, completeness, and contamination level. They discussed potential sources of contamination and separated target from non-target (contaminant) contigs using specialized software. Generally, this article is sound, but I identified a few inconsistencies and a lack of detail, which I would like the authors to address. I also expected more details on the sequence (variation) of the genome from a genome note, e.g., a summary of the repeat content and other annotations.

Details on the mandatory reviewer questions:

The protocols and the work seem technically sound but would profit from more details. For example, it is mentioned that pools of specimens were used for sequencing but not why this was done. Clearly, sequencing a single individual would be preferable. Related to this, the methods state that two pools of 20 specimens were sampled. However, the results state that “identical” pools were used. The word identical is confusing in that context, because in the methods the pools are described as separate pools. Also, on NCBI, there is only one BioSample (a batch of 20 individuals) registered and is linked to the Illumina and the PacBio sequencing, which is not what is described in the article. The BioSample should either be a batch of 40 individuals or there should be two BioSamples of 20 individuals each.

Details on DNA extraction and library preparations are missing. To reproduce the work or to apply it to other systems, it is necessary to know what kits, reagents, and protocols were used.

It is not clear how many contigs remained after each step in the methods. What was the number of contigs after Hi-C scaffolding? The methods say 821. They also say that 59 contigs were obtained after applying BlobToolKit. Are the other 762 (821 - 59) contigs from contaminants? How does that align with the 322 contigs mentioned in Table 2 and the 17 scaffolds mentioned in the Results?

The genome is available from NCBI. However, there are no annotations provided. At least a description of the repetitive content would be valuable. Repetitive content seems to have been assessed since assemblies were "preprocessed to mask repetitive and low-complexity regions".

Additional aspects:

The completeness of the assembly was assessed using BUSCOs and k-mers. Given the chromosome-level assembly, it would be valuable to describe the sequence content and arrangement, and the structural variation. For example, what is the telomeric end repeat motif; what characterizes the centromeres (GC content, sequence content, repeat content); what is the distribution of repeat versus genic content?

It would be important to know the average read lengths or read length N50s.

The keywords should include the full species name.

Are there assembly gaps? What is their size?

There is no phylogenetic analysis. I suggest including one or referring to a previous solid (whole genome) phylogeny.

Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?

Partly

Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?

Partly

Are the rationale for sequencing the genome and the species significance clearly described?

Partly

Are the protocols appropriate and is the work technically sound?

Partly

Reviewer Expertise:

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Pons

Joan

Animal&Microbial Biodiversity, Institut Mediterrani d'Estudis Avancats, Esporles, Illes Balears, Spain

Competing interests: No competing interests

6 6 2025

The main rationale presented for sequencing the genome was the Catalan Initiative for the Earth BioGenome Project (CBP). It would thus be nice to have a short explanation of what that is and what its significance is. ** We added additional information as suggested: “The Catalan Biogenome is an EBP-affiliated project network with the objective of sequencing the genome of more than 40000 eukaryotic species living in the Catalan Linguistic Area (such as the Balearic Islands)”.

The protocols and the work seem technically sound but would profit from more details. For example, it is mentioned that pools of specimens were used for sequencing but not why this was done. ** We agree that our wording was confusing, so we rewrote the text to clarify the issue: “Extraction of high molecular weight DNA, construction of Pacific Biosciences HiFi circular consensus DNA sequencing libraries, and sequencing on a Pacific Biosciences Sequel II (HiFi) instrument were performed by the Delaware Biotechnology Institute, University of Delaware (DE, USA) using a pool of 20 specimens (Accession number: SAMEA113414145 qmTetScab1). Hi-C data were generated from another pool of 20 individuals from the same collection site (Accession number: SAMEA118091338) using the Omni-C library preparation and sequenced 2 × 150 bp on the Illumina NovaSeq 6000 S4 instrument at the Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona, Spain.”

Details on DNA extraction and library preparations are missing. To reproduce the work or to apply it to other systems, it is necessary to know what kits, reagents, and protocols were used. ** For the HiFi sequencing, a DNA library was prepared using the PacBio SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences of California, CA, USA) following the official protocol. Approximately 300 ng of high-quality genomic DNA was sheared to ~15–20 kb, repaired, and ligated with SMRTbell adapters to create circular molecules. The library was size-selected to remove fragments smaller than 1,000 bp, purified with AMPure PB beads, and quality-checked using Qubit and TapeStation. Finally, it was sequenced on the PacBio Sequel II platform in CCS mode to generate highly accurate HiFi reads suitable for genome assembly and analysis. These steps were performed at the University of Delaware (USA). ** Hi-C libraries were prepared using the Dovetail Omni-C protocol (Cantata Bio, CA, USA). Briefly, chromatin was extracted from frozen tissue and crosslinked with formaldehyde to preserve the native three-dimensional genome architecture by stabilizing DNA-protein and DNA-DNA interactions within the nucleus. The crosslinked chromatin was then fragmented using DNase I, and spatially proximal DNA ends were ligated to capture physical interactions between genomic regions, providing long-range linkage information useful for validating and scaffolding genome assemblies. Hi-C library preparation and sequencing were performed at the Centre Nacional d’Anàlisi Genòmica (CNAG), Barcelona, Spain.

To replicate the assembly process and subsequent assembly modification the parameters used for the software are missing. Especially, what parameters were used for hifiasm? Given it was not designed for assembly of multiple specimens’ genomes, were the parameters adjusted accordingly? It seems hifiasm had issues in the collapsing step since the authors applied purge_dups, which led to a great reduction in the number of contigs. It is normally not desired to purge haplotigs from hifiasm assemblies and if applied does not result in such a drastic reduction in the number of contigs. I missed discussion of all of this. ** The number of haplotypes was set using the --n-hap parameter in hifiasm. Given that our species is presumably diploid, as estimated by Smudgeplot, and the sequencing pool included 20 individuals, the parameter was set to --n-hap 40. However, due to the presence of multiple individuals, hifiasm may struggle to accurately resolve haplotypes, which may necessitate the use of purge_dups to remove redundant contigs. ** We suspect that most of the duplications are due to the DNA being sourced from a pool of 20 individuals, as a single individual did not provide enough material to construct a HiFi library. We clarify this question in the main text: “The genome size was estimated using GenomeScope2 (Vurture et al., 2017), and diploidy was confirmed with Smudgeplot (Ranallo-Benavidez et al., 2020). Assembly was conducted using hifiasm (Cheng et al., 2021) with n_hap=40 (considering diploidy and 20 individuals). The large number of haplotypic duplications, presumably caused by the high number of specimens used for DNA extraction, were removed with purge_dups (Guan et al., 2020), passing from 2208 to 1272 contigs.” We would like to point out that the final genome size after purging duplicates and removing contamination matched the size initially predicted by GenomeScope.

“individuals that were not externally cleaned so it could also contain DNA from microbial and other eukaryote contaminants”. Why were the individuals not cleaned before assembly despite that this is known to cause assembly issues? ** Specimens were isolated from well water and individually collected to minimize contamination from other macroscopic species. However, this approach did not prevent contamination by microscopic organisms.

"The coverage obtained has not been sufficient to deduce sex-linked chromosomes". Would this be a sensible analysis given the pools of specimens? Why is 53.8x coverage not enough? Is this the haploid sequence coverage? From Figure 4, the coverage seems twice as high. ** Several factors hindered the identification of sex chromosomes in our diploid species. Most prominently, the characteristic haploid coverage pattern typically associated with sex chromosomes was absent. Furthermore, the genome assembly and scaffolding were performed using two separate DNA pools without prior knowledge of the individuals’ sex, complicating the detection of sex-specific sequences. In addition, the lack of biological information on the species and genus—particularly whether sex determination is chromosomal or genetic—further limits the applicability of standard methods for identifying sex chromosomes.

It would be important to know the average read lengths or read length N50s. ** The read length N50 of PacBio raw reads has been added to the results section.

The keywords should include the full species name. ** The species name has been added to the keywords section.

Are there assembly gaps? What is their size? ** The final assembly contains 299 gaps, each 100 bp in length. This is due to the default behavior of tools such as hifiasm, YaHS, and PretextMap, which insert standardized 100 bp gaps when the true gap size cannot be determined.
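The gap count ties together two numbers in Table 2. If every gap is an internal run of N that separates two contigs, the number of contigs equals the number of scaffolds plus the number of gaps: 23 + 299 = 322. The sketch below illustrates this with toy sequences; the function names are hypothetical, and gaps at scaffold ends are assumed not to occur.

```python
import re
from typing import List


def count_gaps(scaffold: str) -> int:
    """Count gaps, taken here to be maximal runs of N (case-insensitive)."""
    return len(re.findall(r"[Nn]+", scaffold))


def count_contigs(scaffolds: List[str]) -> int:
    """Each internal gap splits a scaffold into one additional contig."""
    return sum(count_gaps(s) + 1 for s in scaffolds)


# Toy scaffolds: the first has two 100-bp gaps, the second has none.
toy_scaffolds = [
    "ACGT" + "N" * 100 + "GGCC" + "N" * 100 + "TTAA",
    "ACGTACGT",
]
print(count_contigs(toy_scaffolds))

# Reported values: 23 scaffolds and 299 gaps of 100 bp each.
reported_contigs = 23 + 299
reported_gap_bases = 299 * 100
print(reported_contigs, reported_gap_bases)
```

The same arithmetic gives 29,900 bp of gap sequence in total, a negligible share of the 1.18 Gb assembly.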

There is no phylogenetic analysis. I suggest including one or refer to a previous solid (whole genome) phylogeny. ** We appreciate the suggestion to include a phylogenetic analysis. However, given the current lack of available genomic data from other species in the peracarid crustacean order, we believe that a phylogenetic analysis at this stage would not be sufficiently reliable. We agree that such an analysis would be highly valuable, particularly once more genomic data from related taxa become available, and it is a future goal of our research group.