The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758

We present a genome assembly of Caretta caretta (the Loggerhead sea turtle; Chordata, Testudines, Cheloniidae), generated from genomic data from two unrelated females. The genome sequence is 2.13 gigabases in size. The assembly has a BUSCO completeness score of 96.1% and a scaffold N50 of 130.95 Mb. The majority of the assembly (98.0%) is scaffolded into 28 chromosomal representations, with the remaining 2% of the assembly unplaced.


Introduction
The loggerhead sea turtle, Caretta caretta, is one of only seven extant marine turtle species and is globally distributed throughout the subtropical and temperate regions of the Mediterranean Sea and the Pacific, Indian and Atlantic Oceans (Wallace et al., 2010, Casale and Tucker, 2015). The species is divided into various Regional Management Units (RMUs) and management units (MUs) that vary greatly in population size, geographic range, and population trends (Wallace et al., 2010, Casale and Tucker, 2015, Shamblin et al., 2014). Threats such as fisheries bycatch (Caracappa et al., 2018, Pulcinella et al., 2019), human intrusion and disturbance (Mazaris et al., 2009), oceanic pollution (Savoca et al., 2018), and climate change and severe weather (Alduina et al., 2020) have caused the global population to decline continuously (Casale and Tucker, 2015). Consequently, the highly migratory C. caretta requires the collaborative efforts of numerous international conservation and protection organizations (Species at Risk Act, 2002), and is currently listed as Vulnerable by the International Union for the Conservation of Nature (IUCN) (Casale and Tucker, 2015). The genome of C. caretta was sequenced as part of the Canadian BioGenome Project (CBP) and CanSeq150 initiatives. The C. caretta genome will provide insights into genomic diversity and architecture, and inform conservation genomics applications.

REVISED Amendments from Version 1
Based on suggestions made by the reviewers, we have made several revisions and clarifications to improve the clarity and precision of our findings.
We utilized RepeatMasker to analyze repetitive elements and have now included the findings in the Results section. Additionally, we have specified the parameters used for each software tool in Table 3. We have rephrased the gene annotation section to clarify the results for both the RefSeq and Ensembl annotation pipelines. We clarified that JupiterPlot is used for scaffold-level alignment and synteny plots in the syntenic analysis. Lastly, QC metrics are specified in the abstract.
Any further responses from the reviewers can be found at the end of the article.

Sample collection
Blood samples from an adult female and a juvenile of unknown sex were collected from the Fondazione Cetacea (43.9940 N, 12.6745 E) by Nicola Ridolfi (veterinarian; Fondazione Cetacea). Animal husbandry and welfare were overseen by Fondazione Cetacea. The specimens were transferred to Canada with two CITES permits between institutions (IT002 and CA027).

Sample extraction, library construction and sequencing
High-molecular-weight (HMW) DNA was extracted from nucleated blood using the MagAttract HMW DNA kit (QIAGEN, Germantown, MD, USA). Nanopore genome libraries were constructed according to the manufacturer's instructions and sequenced using the PromethION instrument (Oxford Nanopore Technologies). A PCR-free genome library was sequenced in a multiplexed pool on an Illumina NovaSeq 6000 instrument S4 flowcell with paired-end 150 bp (PE150) reads. A Hi-C library was constructed using the Arima-HiC kit 2.0 (Arima Genomics, San Diego, CA) and the Swift Biosciences Accel-NGS 2S Plus DNA Library Kit (Integrated DNA Technologies, Mississauga, ON, Canada) and subjected to PE150 sequencing on an Illumina NovaSeq 6000 instrument. All lab work was performed at Canada's Michael Smith Genome Sciences Centre at BC Cancer.

Genome assembly
Assembly was carried out using Redbean (Ruan and Li, 2019), followed by four rounds of racon (Vaser et al., 2017) polishing and medaka (medaka, n.d.) polishing. Scaffolding with Hi-C data was carried out using the nf-core/hic workflow (Servant and Peltzer, 2019), Salsa (Ghurye et al., 2019) and LongStitch (Coombe et al., 2021). The Hi-C scaffolded assembly was polished with Illumina short reads using Pilon (Walker et al., 2014). Four rounds of manual assembly curation and re-scaffolding with the nf-core/hic workflow (Servant and Peltzer, 2019) and Salsa (Ghurye et al., 2019) corrected 54 missing joins or misjoins. The changes were visualized with a Hi-C contact map using Juicer (Durand et al., 2016b). JupiterPlot (Chu, 2018) was used to perform scaffold-level alignment against the green sea turtle reference genome and to generate a synteny plot for synteny analysis. The final sequence was analyzed using BlobToolKit (Challis et al., 2020) for quality assessment and RepeatMasker (Tarailo-Graovac & Chen, 2009) for annotation of repetitive regions. The parameters and version numbers of the software tools are listed in Table 3.

Genome sequence report
The genomes of two unrelated loggerhead sea turtles from the same population, sampled at the Fondazione Cetacea hospital, Riccione, Italy, were sequenced. A total of 39-fold coverage in Nanopore PromethION long reads was generated from a single adult female. Approximately 50-fold coverage in Illumina NovaSeq 6000 150 bp paired-end (PE150) reads and 18-fold coverage in Illumina NovaSeq 6000 Hi-C reads were generated from a second individual. Primary assembly contigs from Nanopore data were further polished with Illumina PE150 shotgun sequencing data and scaffolded with Hi-C data. The final assembly has a total length of 2.13 Gb in 2,007 sequence scaffolds with a scaffold N50 of 130.95 Mb (Table 1). The majority (98.0%) of the assembly sequence was assigned to 28 chromosomal-level scaffolds representing the species' known 28 autosomes (Kamezaki, 1989, Machado et al., 2020), numbered by sequence length (Figure 1 to Figure 4; Table 2). Reads from the second turtle aligned to the final assembly gave an estimated heterozygosity of 0.11% (2,449,606 heterozygous hits). Using BUSCO, we estimated 96.1% gene completeness with the sauropsida_odb10 reference set (Manni et al., 2021).

The assembly was compared to a previous chromosome-scale assembly of the closely related green sea turtle, Chelonia mydas (Wang et al., 2013), which has been reported to hybridize with the loggerhead sea turtle (James et al., 2004, Vilaça et al., 2012). The primary haplotype (rCheMyd1.pri.v2) of the green sea turtle was downloaded from NCBI on July 16, 2022. The loggerhead sea turtle assembly showed strong synteny to the green sea turtle assembly, as shown in Figure 5. The proportions of SINEs, LINEs, LTR elements, and DNA transposons within the genomic sequences were determined to be 1.55%, 8.75%, 0.13%, and 1.10%, respectively.

Figure 5. Jupiter plot alignment of Caretta caretta with Chelonia mydas (green sea turtle).
Full genome alignment of the Caretta caretta genome, rCarCar2 (right), and the Chelonia mydas (green sea turtle) genome (primary haplotype v2), rCheMyd1 (left), generated using JupiterPlot (Chu, 2018). The left of the circle shows the 28 green sea turtle chromosomes and the right of the circle shows the 28 loggerhead sea turtle chromosomes. Coloured bands represent synteny between the genomes, and lines crossing the circle indicate genomic rearrangements or break points in the scaffolds.
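
The summary statistics in this section can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only and is not part of the assembly pipeline: the `n50` helper and the toy scaffold lengths are hypothetical, and the heterozygosity check assumes the rate is the number of heterozygous hits divided by the total assembly length, a denominator that the text does not state.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical; not the real rCarCar2 scaffolds).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Values reported in the text.
assembly_length_bp = 2.13e9
heterozygous_hits = 2_449_606

# Assumption: heterozygosity = heterozygous hits / assembly length.
het_percent = 100 * heterozygous_hits / assembly_length_bp

# Reported repeat-class proportions (percent of the genomic sequence).
repeat_percent = {"SINE": 1.55, "LINE": 8.75, "LTR": 0.13, "DNA": 1.10}
total_repeat_percent = sum(repeat_percent.values())

print(f"toy N50: {toy_n50}")
print(f"heterozygosity: {het_percent:.3f}%")
print(f"four repeat classes combined: {total_repeat_percent:.2f}%")
```

Under that assumption the heterozygosity works out to roughly 0.115%, in line with the reported 0.11%, and the four listed repeat classes together account for about 11.5% of the sequence. The true N50 would be obtained by passing the 2,007 scaffold lengths from the deposited assembly to the same function.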

Genome annotation
The loggerhead sea turtle genome assembly was annotated by both the RefSeq annotation pipeline (Li et al., 2020) and the Ensembl gene annotation system (Aken et al., 2016). The RefSeq annotation includes 24,923 genes and pseudogenes, and 54,583 mRNA transcripts (NCBI Caretta caretta Annotation Release). The Ensembl annotation includes 19,633 coding genes, 4,161 non-coding genes and 42,302 mRNA transcripts (Caretta caretta - Ensembl Rapid Release).
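
Because the two annotations are reported with different groupings (genes and pseudogenes for RefSeq; coding and non-coding genes for Ensembl), the totals are not directly comparable. The short Python sketch below simply tabulates the figures quoted above; the variable names are illustrative, and no counts beyond those in the text are assumed.

```python
# Counts as reported in the text for each annotation.
refseq = {"genes_and_pseudogenes": 24_923, "mrna_transcripts": 54_583}
ensembl = {
    "coding_genes": 19_633,
    "non_coding_genes": 4_161,
    "mrna_transcripts": 42_302,
}

# Ensembl total gene count (coding + non-coding).
ensembl_total_genes = ensembl["coding_genes"] + ensembl["non_coding_genes"]

# Mean mRNA transcripts per Ensembl coding gene.
ensembl_transcripts_per_coding_gene = (
    ensembl["mrna_transcripts"] / ensembl["coding_genes"]
)

print(f"Ensembl genes (coding + non-coding): {ensembl_total_genes}")
print(f"RefSeq genes and pseudogenes: {refseq['genes_and_pseudogenes']}")
print(
    "Ensembl mRNA transcripts per coding gene: "
    f"{ensembl_transcripts_per_coding_gene:.2f}"
)
```

The RefSeq figure includes pseudogenes, so the difference between the two totals should not be read as a difference in gene content without consulting the underlying annotation reports.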

Data availability
Underlying data
National Center for Biotechnology Information BioProject: Loggerhead sea turtle (Caretta caretta) genome sequencing and assembly, rCarCar2. Accession number: PRJNA826225.
The genome sequence is released openly for reuse. The C. caretta genome sequencing initiative is part of the Canadian BioGenome Project and CanSeq150 Projects initiatives. All raw sequence data and the assembly have been deposited in INSDC databases. The genome is annotated through the Reference Sequence (RefSeq) database in BioProject accession number PRJNA853764. Raw data and assembly accession identifiers are reported in Table 1.

Open Peer Review © 2023 Challis R.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Richard Challis
Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK

Chang et al. present a chromosomal genome assembly of the Loggerhead sea turtle, Caretta caretta, using a combination of Nanopore long reads, Hi-C and Illumina. The conservation importance of having a genome assembly for this globally distributed but vulnerable species is made very clear.
As the second chromosomal assembly of a marine turtle, it is informative to see a synteny plot comparing this to the green sea turtle, Chelonia mydas. This highlights the strongly conserved synteny, similarity in overall assembly span and relative chromosome sizes between these species while maintaining the concise, focussed approach typical of a Genome Note.
Overall the article was very clearly presented; however, the presentation of summary information about the two sets of gene annotation was slightly inconsistent, and I found myself referring to the RefSeq annotation page to compare the numbers of coding vs non-coding genes with the values presented for the Ensembl annotation.
Are the rationale for sequencing the genome and the species significance clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository? Yes

Dear Dr. Richard Challis,
Thank you for reviewing our genome note and providing valuable comments. We have carefully considered your comments and made the necessary revisions to address your concerns.
In particular, we have taken steps to clarify the genome annotation sections. We have made the results of the RefSeq and Ensembl annotation pipelines more distinct in the paper. Additionally, we have provided hyperlinks to both sets of results, allowing readers to access them directly.
Once again, we sincerely appreciate your time and effort in reviewing our genome note. We believe that the changes we have made effectively address your concerns and improve the clarity of our paper.

Competing Interests:
No competing interests were disclosed.