Genome Note
Revised

The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758

[version 2; peer review: 2 approved]
PUBLISHED 27 Jun 2023

This article is included in the Nanopore Analysis gateway.

This article is included in the Genomics and Genetics gateway.

Abstract

We present a genome assembly of Caretta caretta (the Loggerhead sea turtle; Chordata, Testudines, Cheloniidae), generated from genomic data from two unrelated females. The genome sequence is 2.13 gigabases in size. The assembly has a BUSCO completeness score of 96.1% and a scaffold N50 of 130.95 Mb. The majority of the assembly is scaffolded into 28 chromosomal representations, with the remaining 2% of the assembly not assigned to them.

Keywords

Caretta caretta, Loggerhead sea turtle, genome sequence, chromosomal, reptile

Amendments from Version 1

Based on suggestions made by reviewers, we have made several revisions and clarifications to improve the clarity and precision of our findings.

We used RepeatMasker to analyze repetitive elements and have now included the findings in the Results section. Additionally, we have specified the parameters used for each software tool in Table 3. We have rephrased the gene annotation section to clarify the results of both the RefSeq and Ensembl annotation pipelines. We clarified that JupiterPlot is used for scaffold-level alignment and synteny plots in the synteny analysis. Lastly, QC metrics are now specified in the abstract.

See the authors' detailed response to the review by Cinta Pegueroles
See the authors' detailed response to the review by Richard Challis

Species taxonomy

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Testudinata; Testudines; Cryptodira; Durocryptodira; Americhelydia; Chelonioidea; Cheloniidae; Caretta; Caretta caretta Linnaeus 1758 (NCBI txid 8467).

Introduction

The loggerhead sea turtle, Caretta caretta, is one of only seven extant marine turtle species and is globally distributed throughout the subtropical and temperate regions of the Mediterranean Sea and Pacific, Indian and Atlantic Oceans (Wallace et al., 2010, Casale and Tucker, 2015). The species is divided into various Regional Management Units (RMUs) and management units (MUs) that vary greatly in population size, geographic range, and population trends (Wallace et al., 2010, Casale and Tucker, 2015, Shamblin et al., 2014). Threats such as fisheries bycatch (Caracappa et al., 2018, Pulcinella et al., 2019), human intrusion and disturbance (Mazaris et al., 2009), oceanic pollution (Savoca et al., 2018), and climate change and severe weather (Alduina et al., 2020) have caused the global population to decline continuously (Casale and Tucker, 2015). Consequently, the highly migratory C. caretta requires the collaborative efforts of numerous international conservation and protection organizations (Species at Risk Act, 2002), and is currently listed as Vulnerable by the International Union for the Conservation of Nature (IUCN) (Casale and Tucker, 2015). The genome of C. caretta was sequenced as part of the Canadian BioGenome Project (CBP) and CanSeq150 initiatives. The C. caretta genome will provide insights into genomic diversity and architecture, and inform conservation genomics applications.

Methods

Sample collection

Blood samples from an adult female and a juvenile of unknown sex were collected from the Fondazione Cetacea (43.9940 N, 12.6745 E) by Nicola Ridolfi (veterinarian; Fondazione Cetacea). Animal husbandry and welfare were overseen by Fondazione Cetacea. The specimens were transferred to Canada with two CITES permits between institutions (IT002 and CA027).

Sample extraction, library construction and sequencing

High-molecular-weight (HMW) DNA was extracted from nucleated blood using the MagAttract HMW DNA kit (QIAGEN, Germantown, MD, USA). Nanopore genome libraries were constructed according to the manufacturer's instructions and sequenced on the PromethION instrument (Oxford Nanopore Technologies). A PCR-free genome library was sequenced in a multiplexed pool on an Illumina NovaSeq 6000 S4 flowcell with paired-end 150 bp (PE150) reads. A Hi-C library was constructed using the Arima-HiC kit 2.0 (Arima Genomics, San Diego, CA) and the Swift Biosciences Accel-NGS 2S Plus DNA Library Kit (Integrated DNA Technologies, Mississauga, ON, Canada) and subjected to PE150 sequencing on an Illumina NovaSeq 6000 instrument. All lab work was performed at Canada’s Michael Smith Genome Sciences Centre at BC Cancer.

Genome assembly

Assembly was carried out using Redbean (Ruan and Li, 2019), followed by four rounds of racon (Vaser et al., 2017) polishing and medaka (medaka, n.d.) polishing. Scaffolding with Hi-C data was carried out using the nf-core/hic workflow (Servant and Peltzer, 2019), Salsa (Ghurye et al., 2019) and LongStitch (Coombe et al., 2021). The Hi-C scaffolded assembly was polished with Illumina short reads using Pilon (Walker et al., 2014). Four rounds of manual assembly curation and re-scaffolding with the nf-core/hic workflow (Servant and Peltzer, 2019) and Salsa (Ghurye et al., 2019) corrected 54 missing joins and misjoins. The changes were visualized with a Hi-C contact map using Juicer (Durand et al., 2016b). JupiterPlot (Chu, 2018) was used to perform scaffold-level alignment against the green sea turtle reference genome and to generate a synteny plot for synteny analysis. The final sequence was analyzed using BlobToolKit (Challis et al., 2020) for quality assessment and RepeatMasker (Tarailo‐Graovac & Chen, 2009) for annotation of repetitive regions. The parameters and version numbers of the software tools are listed in Table 3.

Results

Genome sequence report

The genomes of two unrelated loggerhead sea turtles from the same population, sampled at the Fondazione Cetacea hospital, Riccione, Italy, were sequenced. A total of 39-fold coverage in Nanopore PromethION long reads was generated from a single adult female. Approximately 50-fold coverage in Illumina NovaSeq 6000 150 bp paired-end (PE150) reads and 18-fold coverage in Illumina NovaSeq 6000 Hi-C sequencing were generated from a second individual. Primary assembly contigs from Nanopore data were further polished with Illumina PE150 shotgun sequencing data and scaffolded with Hi-C data. The final assembly has a total length of 2.13 Gb in 2007 sequence scaffolds with a scaffold N50 of 130.95 Mb (Table 1). The majority (98.0%) of the assembly sequence was assigned to 28 chromosomal-level scaffolds representing the species’ known 28 autosomes (Kamezaki, 1989, Machado et al., 2020) (numbered by sequence length; Figures 1 to 4; Table 2). Reads from the second turtle aligned to the final assembly gave an estimated heterozygosity of 0.11% (2,449,606 heterozygous hits). Using BUSCO, we estimated gene completeness at 96.1% with the sauropsida_odb10 reference set (Manni et al., 2021). The assembly was compared to a previous chromosome-scale assembly of the closely related green sea turtle, Chelonia mydas (Wang et al., 2013), which has been reported to hybridize with the loggerhead sea turtle (James et al., 2004, Vilaça et al., 2012). The primary haplotype (rCheMyd1.pri.v2) of the green sea turtle was downloaded from NCBI on July 16, 2022. The loggerhead sea turtle assembly showed strong synteny to the green sea turtle assembly, as shown in Figure 5. The proportions of SINEs, LINEs, LTR elements, and DNA transposons within the genomic sequences were determined to be 1.55%, 8.75%, 0.13%, and 1.10%, respectively.
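As a quick arithmetic check on two of the figures quoted above, the following minimal sketch recomputes the heterozygosity percentage and the combined share of the four reported repeat classes. It assumes that heterozygosity is the number of heterozygous hits divided by the total assembly length (2,134,012,717 bp, the value given in the Figure 1 caption); the text does not state which denominator was used, so this is an assumption.

```python
# Sanity-check two summary statistics from the genome sequence report.
# Assumption (not stated in the text): heterozygosity is computed as
# heterozygous hits divided by the total assembly length.

ASSEMBLY_LENGTH_BP = 2_134_012_717  # total span, from the Figure 1 caption
HETEROZYGOUS_HITS = 2_449_606       # from the genome sequence report

heterozygosity_pct = 100 * HETEROZYGOUS_HITS / ASSEMBLY_LENGTH_BP

# Repeat classes reported from RepeatMasker, as percentages of the sequence.
repeat_pct = {
    "SINEs": 1.55,
    "LINEs": 8.75,
    "LTR elements": 0.13,
    "DNA transposons": 1.10,
}
total_repeat_pct = round(sum(repeat_pct.values()), 2)

print(f"heterozygosity: {heterozygosity_pct:.2f}%")
print(f"four repeat classes combined: {total_repeat_pct:.2f}%")
```

Under that assumption the recomputed heterozygosity agrees with the reported 0.11%, and the four repeat classes together account for 11.53% of the sequence.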

Table 1. Genome data for Caretta caretta, rCarCar2.

| Field | Value |
|---|---|
| Project accession data | |
| Assembly identifier | rCarCar2 |
| Species | Caretta caretta |
| Specimen | SJ_126, SJ_184 |
| NCBI Taxonomy ID | 8467 |
| BioProject | PRJNA826225 |
| BioSample ID | SAMN28968396, SAMN27958248 |
| Isolate Information | SJ_184/204:Loco2, SJ_126:Eziel1 |
| Raw data accessions | |
| Oxford Nanopore PromethION | SRX15677840, SRX15677841 |
| Hi-C Illumina | SRX15677843 |
| Illumina short-read | SRX15677842 |
| Genome assembly | |
| Assembly accession | GCA_023653815.1 |
| Assembly name | GSC_CCare_1.0 |
| Span (Mb) | 2,134 |
| Number of contigs | 2,753 |
| Contig N50 length (Mb) | 18.214 |
| Number of scaffolds | 2,008 |
| Scaffold N50 length (Mb) | 130.956 |
| Longest scaffold (Mb) | 345.7 |
| BUSCO* genome score | C:96.1%[S:95.2%,D:0.9%],F:0.4%,M:3.5%,n:7480 |

* BUSCO scores based on the sauropsida_odb10 BUSCO set using v5.0.0. C=complete [S=single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison.
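The BUSCO score string in Table 1 packs several numbers into one field. The sketch below, offered as an illustration rather than part of the authors' workflow, parses that string and checks its internal consistency (complete = single copy + duplicated; complete + fragmented + missing = 100%). The function name `parse_busco_summary` is hypothetical and the regular expression only covers the format shown in the table.

```python
import re


def parse_busco_summary(summary: str) -> dict:
    """Parse a BUSCO one-line summary such as the one in Table 1."""
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.fullmatch(pattern, summary.strip())
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    # Percentages become floats; the orthologue count n becomes an int.
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(scores["n"])
    return scores


scores = parse_busco_summary("C:96.1%[S:95.2%,D:0.9%],F:0.4%,M:3.5%,n:7480")

# The values are rounded to one decimal place, so compare with a tolerance.
complete_is_consistent = abs(scores["C"] - (scores["S"] + scores["D"])) < 0.05
categories_sum_to_100 = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.05

# Approximate number of complete orthologues implied by the percentage.
approx_complete = round(scores["n"] * scores["C"] / 100)

print(scores)
print(complete_is_consistent, categories_sum_to_100, approx_complete)
```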


Figure 1. Genome assembly of Caretta caretta, rCarCar2: metrics.

Snail plot showing N50 metrics, base pair composition and BUSCO gene completeness for C. caretta (rCarCar2) generated from Blobtoolkit v.2.6.4 (Challis et al., 2020). The plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 2,134,012,717 bp assembly. The distribution of chromosome lengths is shown in dark grey with the plot radius scaled to the longest chromosome present in the assembly (345,741,823 bp) shown in red. Orange and pale-orange arcs show the N50 and N90 chromosome lengths (130,956,235 and 23,648,662 bp, respectively). The pale grey spiral shows the cumulative chromosome count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot displays the distribution of GC (blue), AT (pale blue) and N (white) percentages using the same bins as the inner plot. A summary of complete (96.1%), fragmented (0.4%), duplicated (0.9%), and missing (3.5%) BUSCO genes in the sauropsida_odb10 set is shown in the top right.


Figure 2. Genome assembly of Caretta caretta, rCarCar2: GC-content.

GC-coverage plot of C. caretta (rCarCar2) generated from Blobtoolkit v.2.6.4 (Challis et al., 2020). Scaffolds are coloured by phylum with Chordata represented by blue and no-hit represented by pale blue. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis.


Figure 3. Genome assembly of Caretta caretta, rCarCar2: cumulative sequence length.

Cumulative sequence length of C. caretta (rCarCar2) generated from Blobtoolkit v.2.6.4 (Challis et al., 2020). The grey line shows the cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the BUSCO genes tax rule, with Chordata represented by blue and no-hit represented by pale blue.


Figure 4. Genome assembly of Caretta caretta, rCarCar2: Hi-C contact map.

Hi-C contact map of the rCarCar2 assembly visualized using JuiceBox v2.13.07 (Durand et al., 2016a). Chromosomes are shown in order of size from left to right and top to bottom. As an additional confirmation of the quality of the assembly, the microchromosomes are visible as a cluster of spatially associated contigs in the lower right, as reported by Waters et al. (2021).

Table 2. Chromosomal pseudomolecules in the genome assembly of Caretta caretta, rCarCar2.

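| RefSeq sequence | Chromosome | Size (Mb) | GC% |
|---|---|---|---|
| NC_064473.1 | 1 | 345.74 | 42.86 |
| NC_064474.1 | 2 | 265.32 | 42.62 |
| NC_064475.1 | 3 | 208.08 | 42.71 |
| NC_064476.1 | 4 | 135.63 | 42.34 |
| NC_064477.1 | 5 | 130.96 | 42.42 |
| NC_064478.1 | 6 | 128.66 | 43.74 |
| NC_064479.1 | 7 | 123.31 | 43.74 |
| NC_064480.1 | 8 | 108.54 | 43.66 |
| NC_064481.1 | 9 | 101.34 | 43.68 |
| NC_064482.1 | 10 | 85.28 | 44.40 |
| NC_064483.1 | 11 | 76.53 | 43.00 |
| NC_064484.1 | 12 | 43.19 | 43.81 |
| NC_064485.1 | 13 | 38.20 | 47.24 |
| NC_064486.1 | 14 | 35.79 | 45.97 |
| NC_064487.1 | 15 | 33.48 | 45.53 |
| NC_064488.1 | 16 | 25.69 | 46.28 |
| NC_064489.1 | 17 | 24.70 | 45.64 |
| NC_064490.1 | 18 | 23.65 | 46.93 |
| NC_064491.1 | 19 | 20.21 | 48.10 |
| NC_064492.1 | 20 | 19.04 | 47.85 |
| NC_064493.1 | 21 | 18.99 | 46.81 |
| NC_064494.1 | 22 | 17.93 | 52.48 |
| NC_064495.1 | 23 | 16.78 | 47.24 |
| NC_064496.1 | 24 | 16.65 | 49.92 |
| NC_064497.1 | 25 | 16.37 | 50.20 |
| NC_064498.1 | 26 | 13.31 | 54.27 |
| NC_064499.1 | 27 | 12.55 | 57.47 |
| NC_064500.1 | 28 | 5.34 | 57.00 |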
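The chromosome sizes in Table 2 are enough to reproduce the headline contiguity figures. The sketch below recomputes the fraction of the assembly placed in chromosomes and the N50 and N90 lengths, using the total span of 2,134.01 Mb from the Figure 1 caption. It assumes that every unplaced scaffold is shorter than the chromosome at which each threshold is reached, so the unplaced scaffolds contribute only to the total; the helper name `nx` is illustrative and does not come from any of the tools listed in Table 3.

```python
# Chromosome sizes in Mb, in the order listed in Table 2 (chromosomes 1 to 28).
CHROMOSOME_SIZES_MB = [
    345.74, 265.32, 208.08, 135.63, 130.96, 128.66, 123.31,
    108.54, 101.34, 85.28, 76.53, 43.19, 38.20, 35.79,
    33.48, 25.69, 24.70, 23.65, 20.21, 19.04, 18.99,
    17.93, 16.78, 16.65, 16.37, 13.31, 12.55, 5.34,
]
ASSEMBLY_SPAN_MB = 2134.01  # total assembly length (2,134,012,717 bp)


def nx(lengths, x, total=None):
    """Return the Nx length: the length L such that sequences of length >= L
    together cover at least x percent of `total` (default: sum of `lengths`)."""
    total = sum(lengths) if total is None else total
    threshold = total * x / 100
    running = 0.0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths do not reach the requested fraction of total")


# Assumption: unplaced scaffolds are all shorter than the returned Nx value,
# so passing only the chromosomes plus the full span gives the same answer.
chromosome_fraction = sum(CHROMOSOME_SIZES_MB) / ASSEMBLY_SPAN_MB
n50 = nx(CHROMOSOME_SIZES_MB, 50, total=ASSEMBLY_SPAN_MB)
n90 = nx(CHROMOSOME_SIZES_MB, 90, total=ASSEMBLY_SPAN_MB)

print(f"{chromosome_fraction:.1%} in chromosomes, N50 {n50} Mb, N90 {n90} Mb")
```

With these inputs the N50 falls on chromosome 5 and the N90 on chromosome 18, which agrees with the N50 and N90 lengths quoted in the Figure 1 caption.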

Table 3. Software tools used.

| Software | Version | Parameters | Source |
|---|---|---|---|
| Racon | 1.4.13 | Default parameters | Vaser et al., 2017 |
| Medaka | 1.2.0 | Default parameters | https://github.com/nanoporetech/medaka |
| Pilon | 1.23 | Default parameters | Walker et al., 2014 |
| Salsa | 2.3 | -m CLEAN -e GATC,GANTC,CTNAG,TTAA | Ghurye et al., 2019 |
| BlobToolKit | 2.6.4 (BTK pipeline); 3.1.0 (Blobtoolkit) | Default parameters | Challis et al., 2020 |
| nf-core/hic | 1.1.0 | --restriction_site '^GATC,G^ANTC,C^TNAG,T^TAA' --ligation_site 'GATCGATC,GANTGATC,GANTANTC,GATCANTC' --skip_tads | Servant and Peltzer, 2019 |
| Juicer Tools | 2.13.06 | Default parameters | Durand et al., 2016b |
| Juice Box | 2.13.06 | Default parameters | Durand et al., 2016a |
| Redbean | 2.5 | Default parameters | Ruan and Li, 2019 |
| LongStitch | 1.0.1 | tigmint-ntLink-arks G=2e9 z=100 | Coombe et al., 2021 |
| Jupiter Plot | 1.0 | ng=98 | Chu, 2018 |
| Busco | 5.2.2 | -l sauropsida_odb10 | Manni et al., 2021 |
| Quast | 5.0.2 | Default parameters | Gurevich et al., 2013 |
| RepeatMasker | 4.1.5 | -species "Caretta caretta" | Tarailo‐Graovac & Chen, 2009 |
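The restriction and ligation motifs passed to nf-core/hic in Table 3 are related: when a palindromic site with a 5' overhang is cut, filled in and blunt-end ligated, the junction is the filled left end joined to the filled right end. The sketch below illustrates that relationship only; it is not code from nf-core/hic, and the helper names `filled_ends` and `ligation_junctions` are hypothetical. It is run on the first two cut sites in the table, `^GATC` and `G^ANTC`, because the four ligation motifs listed there are the ones built from those two sites.

```python
from itertools import product


def filled_ends(cut_site: str) -> tuple[str, str]:
    """Return the (left, right) blunt ends left by cutting a palindromic
    site at '^' and filling in the resulting 5' overhang."""
    cut = cut_site.index("^")
    motif = cut_site.replace("^", "")
    if cut > len(motif) - cut:
        raise ValueError("this sketch only handles 5' overhangs")
    # Upstream fragment: filled in up to the bottom-strand cut position.
    left = motif[: len(motif) - cut]
    # Downstream fragment: starts at the top-strand cut position.
    right = motif[cut:]
    return left, right


def ligation_junctions(cut_sites):
    """Return every left-end + right-end combination across the given sites."""
    ends = [filled_ends(site) for site in cut_sites]
    lefts = [left for left, _ in ends]
    rights = [right for _, right in ends]
    return {left + right for left, right in product(lefts, rights)}


junctions = ligation_junctions(["^GATC", "G^ANTC"])
print(sorted(junctions))
```

The resulting set contains GATCGATC, GANTGATC, GANTANTC and GATCANTC, the same four motifs given to `--ligation_site` in Table 3.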

Figure 5. Jupiter plot alignment of Caretta caretta with Chelonia mydas (green sea turtle).

Full genome alignment of the Caretta caretta genome, rCarCar2 (right), and the Chelonia mydas (green sea turtle) genome (primary haplotype v2), rCheMyd1 (left), generated using Jupiter Plot (Chu, 2018). The left of the circle shows the 28 green sea turtle chromosomes and the right of the circle shows the 28 loggerhead sea turtle chromosomes. Coloured bands represent synteny between the genomes, and lines crossing the circle indicate genomic rearrangements or break points in the scaffolds.

Genome annotation

The loggerhead sea turtle genome assembly was annotated by both the RefSeq annotation pipeline (Li et al., 2020) and the Ensembl gene annotation system (Aken et al., 2016). The RefSeq annotation includes 24,923 genes and pseudogenes, and 54,583 mRNA transcripts (NCBI Caretta caretta Annotation Release). The Ensembl annotation includes 19,633 coding genes, 4,161 non-coding genes and 42,302 mRNA transcripts (Caretta caretta - Ensembl Rapid Release).

How to cite this article
Chang G, Jones S, Leelakumari S et al. The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758 [version 2; peer review: 2 approved]. F1000Research 2023, 12:336 (https://doi.org/10.12688/f1000research.131283.2)

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2 (revised), published 27 Jun 2023
Reviewer Report 12 Jul 2023
Cinta Pegueroles, Department of Genetics, Microbiology and Statistics, and Institute for Research on Biodiversity (IRBio), Universitat de Barcelona, Barcelona, Catalonia, Spain 
Approved
I thank the authors for their thorough revisions, which carefully addressed the comments raised. Congratulations for generating this high ...
Pegueroles C. Reviewer Report For: The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758 [version 2; peer review: 2 approved]. F1000Research 2023, 12:336 (https://doi.org/10.5256/f1000research.151837.r181940)
Version 1, published 27 Mar 2023
Reviewer Report 27 Apr 2023
Cinta Pegueroles, Department of Genetics, Microbiology and Statistics, and Institute for Research on Biodiversity (IRBio), Universitat de Barcelona, Barcelona, Catalonia, Spain 
Approved with Reservations
This manuscript describes the sequencing and annotation of the Caretta caretta genome, which is already available in public data bases. It is a high quality genome that for sure is positively impacting the sea turtles community.

The ...
Pegueroles C. Reviewer Report For: The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758 [version 2; peer review: 2 approved]. F1000Research 2023, 12:336 (https://doi.org/10.5256/f1000research.144107.r168077)
  • Author Response 27 Jun 2023
    Glenn Chang, Canada's Michael Smith Genome Sciences Centre, Vancouver, V5Z 4S6, Canada
    27 Jun 2023
    Author Response
    Dear Dr. Cinta Pegueroles,
    Thank you for your thorough review of our paper. We have carefully considered your comments and have made the following changes to the revised version of ...
Reviewer Report 06 Apr 2023
Richard Challis, Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK 
Approved
Chang et al. present a chromosomal genome assembly of the Loggerhead sea turtle, Caretta caretta, using a combination of Nanopore long reads, HiC and Illumina. The conservation importance of having a genome assembly for this globally distributed but vulnerable species ...
Challis R. Reviewer Report For: The genome sequence of the Loggerhead sea turtle, Caretta caretta Linnaeus 1758 [version 2; peer review: 2 approved]. F1000Research 2023, 12:336 (https://doi.org/10.5256/f1000research.144107.r168076)
  • Author Response 27 Jun 2023
    Glenn Chang, Canada's Michael Smith Genome Sciences Centre, Vancouver, V5Z 4S6, Canada
    27 Jun 2023
    Author Response
    Dear Dr. Richard Challis,

    Thank you for reviewing our genome note and providing valuable comments. We have carefully considered your comments and made the necessary revisions to address your ...