Genome Note
Revised

A draft genome sequence of the common, or spectacled caiman Caiman crocodilus

[version 2; peer review: 1 approved, 1 not approved]
PUBLISHED 15 Jan 2025

Abstract

The common, or spectacled, caiman Caiman crocodilus is an abundant, widely distributed Neotropical crocodilian exhibiting notable morphological and molecular diversification. As the type species for the Caimaninae subfamily - the sister taxon of the subfamily to which members of the genus Alligator belong - C. crocodilus occupies a key position in our understanding of crocodilian and archosaur genetics and evolution. The species also accounts by far for the largest share of crocodilian hides on the global market, with the C. crocodilus hide trade alone valued at about US$86.5 million per year. Thus, the genome sequence of C. crocodilus can potentially be of considerable use for both basic and applied research. We obtained 239,911,946 paired-end reads comprising approximately 72 Gbp using Illumina™ sequencing of tissue sampled from a single Caiman crocodilus individual. These reads were assembled de novo and progressively aligned against the genomes of increasingly related crocodilians; Liftoff was used to annotate the draft C. crocodilus genome assembly based on an Alligator mississippiensis (a confamilial species) annotation. The draft C. crocodilus genome assembly and sequence reads have been deposited with the National Center for Biotechnology Information with accession numbers JAGPOW000000000.1 for the assembly, and SRR22317059 for the sequence read archive, under BioProject PRJNA716363.

Keywords

Caiman crocodilus, spectacled caiman, genome, assembly, next-generation sequencing, crocodilian, vertebrate genome

Revised Amendments from Version 1

The key comments highlighted by both reviewers are the need to (i) make the underlying data more readily available by including them in more established and standard repositories, and (ii) include additional analyses characterizing the draft assembly and annotation results. Briefly, in response to these comments, we report further summary statistics that allow readers to put our genome assembly in context, including aspects of the annotation requested by the reviewers based on the annotation submitted to GenBank.

We also now include NCBI accession numbers for the sequence read archive (SRA) and draft assembly. The draft annotation submitted to GenBank has been in the processing stage for some time, so we had not yet been issued an accession number at the time of submitting the present revision. Nevertheless, we note that our annotation submission has passed all automated checks on NCBI’s end.

See the authors' detailed response to the review by Marc Tollis
See the authors' detailed response to the review by Steven Salzberg

Introduction

The common, or spectacled, caiman, Caiman crocodilus, is one of the most widely distributed and abundant crocodilian species, ranging continuously from Mexico to Argentina (Busack and Pandya 2001; US Fish and Wildlife Service 2018). A generalist predator, C. crocodilus is remarkably adaptable, occupying a wide range of habitats from urban areas to seasonal savannahs to tropical rainforests (Medem 1981, 1983), and has recently been introduced to Cuba, Puerto Rico and Florida, where it is considered an invasive species (US Fish and Wildlife Service 2018). The broad distribution and diversity of habitats have facilitated considerable intraspecific diversification within C. crocodilus; a recent analysis by Roberto et al. (2020) identified between seven and ten lineages within C. crocodilus across differing biogeographic regions and watersheds throughout Central and South America. Within-species diversity is also morphologically apparent, with skull shape in particular exhibiting systematic patterns of regional differentiation (Medem 1955; Gans 1980; Medem 1981, 1983; Ayarzaguena 1984; Escobedo-Galván et al. 2015). These intraspecific patterns of cranial shape variation within C. crocodilus have been shown to parallel patterns of interspecific cranial diversity found in extant crocodilians (Okamoto et al. 2015).

Additionally, C. crocodilus is a species of commercial importance, chiefly in the leather industry. While the hides of C. crocodilus contain osteoderms that render the manufacturing process more difficult than for other crocodilians, a majority of the approximately 1.5 million crocodilian skins traded globally come from C. crocodilus (Brazaitis et al. 1998; Caldwell 2015). As with other crocodilians, most legal hides come from commercial farming operations, and the market for caiman hides is estimated to be worth over US$85 million (Caldwell 2015). Wild populations of C. crocodilus are also hunted for meat and even fishing bait (Da Silveira and Thorbjarnarson 1999; Brum et al. 2015; Pimenta et al. 2018) and provide ecosystem services including nutrient cycling and biological control (Valencia-Aguilar et al. 2013; Marley et al. 2019). Due to its role as an apex predator, C. crocodilus exhibits considerable bioaccumulation, with genotoxic analyses demonstrating molecular signatures of pollution on the C. crocodilus genome (Oliveira et al. 2021).

Thus, a draft genome sequence for C. crocodilus can not only assist with improved husbandry, ecotoxicology and wildlife management, but also has the potential to provide insight into evolutionary processes driving intraspecific diversification in continental systems more broadly.

Such a genome sequence can further propel both basic and applied research beyond C. crocodilus. At present, five other crocodilian genome sequence assemblies are available - two each in the genera Alligator (the American and Chinese alligator, A. mississippiensis and A. sinensis, respectively) and Crocodylus (the Saltwater and Cuban crocodile, Cr. porosus and Cr. rhombifer, respectively), and one - the gharial, Gavialis gangeticus - in the genus Gavialis. Beyond their utility to economic (e.g., Miles et al. 2009) and conservation (e.g., Vashistha et al. 2020; Yang et al. 2023) activities, crocodilian genome assemblies have facilitated the investigation of such basic research questions as the evolution of temperature-dependent sex determination (e.g., Rice et al. 2017), the rate and nature of archosaur genome evolution (especially as determined in comparison with avian genomes - e.g., St. John et al. 2012; Green et al. 2014; Brittain et al. 2021), and the genetic basis of key evolutionary adaptations in amniotes, including, among others, immune responses (e.g., Wan et al. 2013; López-Pérez et al. 2022; Merchant et al. 2024), morphogenesis (e.g., Kusumi et al. 2013; Wu et al. 2018; Morris and Abzhanov 2021) and globin expression (e.g., Wan et al. 2013; Hoffmann et al. 2018; Natarajan et al. 2023). A C. crocodilus genome sequence could therefore provide a useful complement to these broader comparative genomic studies, which routinely use genomes from the genus Alligator, by including genomic data from a widely distributed, living representative of Alligator’s sister taxon.

Materials and methods

DNA was extracted from a tissue sample belonging to a single Caiman crocodilus museum specimen (UF-FLMNH 171438) using the DNeasy™ kit from Qiagen (Hilden, Germany). DNA was quantitated using Thermo Fisher’s (Waltham, MA, USA) PicoGreen™ kit (for a final PicoGreen concentration of 77.78 ng/μL). Tecan’s (Männedorf, Switzerland) NuGEN Celero™ kit was then used to construct a paired-end library, which was subsequently sequenced on a single Illumina (San Diego, CA, USA) NovaSeq S4 lane. This yielded 239,911,946 paired-end reads of 2 × 150 bp each. Nucleic acid isolation, quantitation, library generation and raw-read sequencing were performed at the University of Minnesota Genomics Center.
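As a rough sanity check on the sequencing yield, the read count and read length above can be converted into total bases and an approximate mean depth. The sketch below is illustrative only: it assumes each of the 239,911,946 entries is a read pair (two 150 bp reads), and it uses the draft assembly length reported in the Results as a stand-in for genome size, which is an assumption rather than an independent estimate.

```python
# Figures taken from the text; treating the assembly length as the genome
# size is an assumption made only for this back-of-the-envelope estimate.
READ_PAIRS = 239_911_946
READ_LENGTH_BP = 150
ASSEMBLY_LENGTH_BP = 2_341_057_913


def total_bases(read_pairs: int, read_length: int) -> int:
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


def mean_depth(bases: int, genome_length: int) -> float:
    """Average number of times each genome position is covered."""
    return bases / genome_length


bases = total_bases(READ_PAIRS, READ_LENGTH_BP)
depth = mean_depth(bases, ASSEMBLY_LENGTH_BP)
print(f"{bases / 1e9:.1f} Gbp sequenced")   # consistent with the ~72 Gbp in the abstract
print(f"approximately {depth:.1f}x mean depth")
```

Under these assumptions the run corresponds to roughly 30-fold raw coverage, before any read trimming or filtering.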

The paired-end reads (Sequence Read Archive accession number SRR22317059) were assembled de novo using the Iterative de Bruijn Graph Assembler (IDBA-UD; Peng et al. 2012). To assess the reliability of our pipeline from sequencing to de novo assembly using IDBA-UD, we repeated the sequencing and assembly using a museum-derived tissue sample from a single Alligator mississippiensis individual (UF-FLMNH 175565). This resulted in 249,325,204 paired-end reads of 2 × 150 bp each. As was the case for the C. crocodilus individual, the reads were then assembled de novo using IDBA-UD, and we used QUAST (Gurevich et al. 2013) to determine that the IDBA-UD assembly of A. mississippiensis captured approximately 94.2% of a recently published A. mississippiensis assembly (GCA_000281125.4; Rice et al. 2017), with an NG50 of 21,172 bp based on de novo assembled contigs alone.

We scaffolded the resulting draft C. crocodilus contigs using a two-step procedure. First, we scaffolded the caiman’s contigs against a Crocodylus porosus assembly (GCF_001723895.1; Ghosh et al. 2020) using RagTag (Alonge et al. 2019). We then re-scaffolded the resulting contigs/scaffolds against the confamilial Alligator mississippiensis assembly (GCA_000281125.4), again using RagTag.

Contaminants, mitochondrial DNA, vectors, adapters, and sequences shorter than 200 bp identified by NCBI were manually removed using SeqKit (Shen et al. 2016) and custom scripts (available at http://github.com/kewok/ncbi_scrubber). The genome assembly has been deposited in GenBank with accession number JAGPOW000000000.1.
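The length filter described above can be illustrated with a short, dependency-free sketch. This is not the authors' script and does not reproduce SeqKit's behaviour; the function names `read_fasta` and `drop_short` and the toy records are hypothetical, and only the 200 bp threshold comes from the text.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA-formatted lines into (header, sequence) records."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records: Iterable[Record], min_length: int = 200) -> Iterator[Record]:
    """Keep only records whose sequence is at least min_length bases long."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


# Toy input: one 240 bp record and one 40 bp record.
example = [">contig_1", "ACGT" * 60, ">contig_2", "ACGT" * 10]
kept = list(drop_short(read_fasta(example)))
print([name for name, _ in kept])  # ['contig_1']
```

For a real assembly the same functions could be fed an open file handle instead of the toy list.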

The resulting scaffolded assembly (doi:10.5281/zenodo.4755063) was then masked using RepeatMasker (Smit et al. 2015) relying on the HMMER database (Finn et al. 2011) and with “alligator” specified as species. Liftoff (Shumate and Salzberg 2020) was then used to generate a draft annotation based on the masked assembly using the annotations associated with A. mississippiensis (GCA_000281125.4; Rice et al. 2017) as a reference.

table2asn_GFF (National Center for Biotechnology Information 2020) was used to generate a Sequin file (National Center for Biotechnology Information (US) 2014), and features flagged as errors were manually removed using custom scripts (available at https://github.com/kewok/ncbi_scrubber); as of December 2024 the draft annotation is available at doi:10.5281/zenodo.4755063.

Results and conclusions

Our assembly yielded a draft genome sequence of length 2,341,057,913 bp with 465,471 scaffolds and 723,636 contigs. Our draft C. crocodilus genome assembly has a scaffold N50 of 70,464,410 bp, or approximately 70.5 Mbp (Telatin et al. 2021). For context, in other crocodilian assemblies, scaffold N50s of approximately 478.2 Kbp, 2.2 Mbp, 96.1 Mbp, 84.4 Mbp and 255.1 Mbp are reported for the Cuban crocodile (GCA_038503035.1; Meredith et al. 2024), the Chinese alligator (GCF_000455745.1; Wan et al. 2013), the gharial (Green et al. 2014), the Saltwater crocodile (GCF_001723895.1; Rice et al. 2017), and the American alligator (GCF_030867095.1), respectively. Among other reptile reference genome assemblies, the scaffold N50 we report is comparable in value to those reported for the reference genome assemblies of the common mock viper (Psammodynastes pulverulentus; GCA_024509165.1), the rock pigeon (Columba livia; GCF_036013475.1) and the Asian water monitor (Varanus salvator; GCA_023646645.1).
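For readers less familiar with the statistic, N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases; NG50 is the same quantity computed against an assumed genome size rather than the assembly total. The sketch below shows that standard definition on made-up lengths. The function name `n50` and the toy values are hypothetical and are not taken from the assembly.

```python
from typing import Optional, Sequence


def n50(lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 of the given sequence lengths.

    If genome_size is supplied, the half-way target is computed from it
    instead of from the assembly total, which gives NG50. Returns None when
    the sequences never reach the target (possible only for NG50).
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


toy_scaffolds = [80, 70, 50, 40, 30, 20]      # total = 290
print(n50(toy_scaffolds))                     # 70  (80 + 70 = 150 >= 145)
print(n50(toy_scaffolds, genome_size=400))    # 50  (80 + 70 + 50 = 200 >= 200)
```

Because a few very long scaffolds can dominate the running sum, N50 can be large even when an assembly contains many short sequences.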

A QUAST analysis of contigs with more than 3,000 bp against the reference A. mississippiensis assembly GCA_000281125.4_ASM28112v4 identified 211 local misassemblies and 22 misassemblies (of which 14 are contig translocations, 6 are scaffold relocations and 2 are scaffold translocations). The total length of the misassembled contigs is 4,572,832 bp.

We further used BUSCO (Simão et al. 2015) to evaluate the gene completeness of our C. crocodilus draft genome, querying against the sauropsida_odb10 database. This assessment yielded 7,224 out of 7,480 complete BUSCOs (for a completeness score of 96.5%), of which 7,176 were single-copy complete BUSCOs and 48 were duplicated BUSCOs.

A total of 297,374 gene features were predicted for the annotation. Using AGAT (Dianat 2020), we identified 18,836 functional transcripts, 20,020 mRNAs and 15,981 coding sequences with average lengths of 37,890 bp, 39,524 bp and 1,063 bp, respectively. In total, 115,941 exons (average length 562 bp) were identified, with an average of 7.3 exons per coding sequence, and 99,960 introns (average length 4,334 bp) were identified in the coding sequences. We further used HMMER (Eddy 2011) to determine the number of Pfam protein families database hits for our annotation, finding 9,983 hits at the E=0.00001 sequence reporting threshold. The number of hits was comparable at the E=0.001 (10,255 hits) and E=0.0000001 (9,745 hits) levels. Finally, we conducted a reciprocal BLAST hit analysis against an annotation for the Saltwater crocodile Crocodylus porosus (GCF_001723895.1_CroPor_comp1), a non-alligatorid crocodilian for which an annotation is presently available. Briefly, this Cr. porosus annotation contains 28,663 coding sequences with an average length of 1,527 bp and 19,538 genes and pseudogenes. Using Proteinortho (Lechner et al. 2011), our reciprocal BLAST hit analysis found 7,894 orthologous groups across the annotations.
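The idea behind a reciprocal best hit analysis can be sketched as follows. This is a deliberately simplified illustration, not Proteinortho's algorithm, which extends the reciprocal-best-hit heuristic in ways not shown here. The hit tuples, the sequence identifiers and the function names are hypothetical, and the scores are made-up bit scores.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Made-up hits between two annotations, searched in both directions.
caiman_vs_croc = [
    ("cai1", "cro1", 250.0),
    ("cai1", "cro2", 90.0),
    ("cai2", "cro2", 120.0),
    ("cai3", "cro1", 80.0),
]
croc_vs_caiman = [
    ("cro1", "cai1", 245.0),
    ("cro2", "cai2", 118.0),
    ("cro2", "cai1", 88.0),
]
pairs = reciprocal_best_hits(caiman_vs_croc, croc_vs_caiman)
print(sorted(pairs))  # [('cai1', 'cro1'), ('cai2', 'cro2')]
```

In the toy data, `cai3` has `cro1` as its best hit but is not `cro1`'s best hit in return, so it is left unpaired.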

Here we have described the first draft assembly and annotation of the C. crocodilus genome. We feel these data can assist natural resource management, ecotoxicology, agriculture, as well as research into broader questions about the interplay between microevolutionary and macroevolutionary processes across broad biogeographic scales. In addition to potentially facilitating both basic and applied research into C. crocodilus biology, our C. crocodilus genome sequence expands the available crocodilian genome sequences to include the subfamily Caimaninae, the extant sister group to Alligator and a major crocodilian lineage hitherto unrepresented among assembled genome sequences. Our assembly can thus provide a useful resource not only for crocodilian genomics, but also for archosaur, reptile and amniote comparative genomics more broadly.

Data availability

The draft C. crocodilus genome assembly and sequence data have been deposited with the National Center for Biotechnology Information with accession numbers JAGPOW000000000.1 for the assembly, and SRR22317059 for the sequence read archive, under BioProject PRJNA716363. At present, the draft annotation is in processing at the National Center for Biotechnology Information and is currently available for review at doi.org/10.5281/zenodo.4755063.

how to cite this article
Okamoto K, Dopkins N and Kinfu E. A draft genome sequence of the common, or spectacled caiman Caiman crocodilus [version 2; peer review: 1 approved, 1 not approved]. F1000Research 2025, 10:1230 (https://doi.org/10.12688/f1000research.73066.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2
PUBLISHED 15 Jan 2025
Reviewer Report 16 Jan 2025
Steven Salzberg, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA;  Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA;  Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA 
Approved
I'm now satisfied and I approve this version. Note that the authors did not fix one ... Continue reading
HOW TO CITE THIS REPORT
Salzberg S. Reviewer Report For: A draft genome sequence of the common, or spectacled caiman Caiman crocodilus [version 2; peer review: 1 approved, 1 not approved]. F1000Research 2025, 10:1230 (https://doi.org/10.5256/f1000research.176214.r359372)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Version 1
PUBLISHED 02 Dec 2021
Reviewer Report 28 Apr 2022
Marc Tollis, School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA 
Not Approved
A spectacled caiman genome is a welcome contribution to the field, however I believe some additional steps need to be taken before this manuscript can be acceptable for indexing. Fortunately, these steps are standard for most genome reports so the ... Continue reading
HOW TO CITE THIS REPORT
Tollis M. Reviewer Report For: A draft genome sequence of the common, or spectacled caiman Caiman crocodilus [version 2; peer review: 1 approved, 1 not approved]. F1000Research 2025, 10:1230 (https://doi.org/10.5256/f1000research.76689.r128753)
  • Author Response 15 Jan 2025
    Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA
    15 Jan 2025
    Author Response
    Comment: A spectacled caiman genome is a welcome contribution to the field, however I believe some additional steps need to be taken before this manuscript can be acceptable for indexing. ... Continue reading
Reviewer Report 24 Dec 2021
Steven Salzberg, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA;  Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA;  Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA 
Not Approved
This paper describes the assembly and annotation of a crocodile genome. It’s a very brief note that provides some useful background about the species, Caiman crocodilus, but it needs some additional work before it would be acceptable for indexing. All ... Continue reading
HOW TO CITE THIS REPORT
Salzberg S. Reviewer Report For: A draft genome sequence of the common, or spectacled caiman Caiman crocodilus [version 2; peer review: 1 approved, 1 not approved]. F1000Research 2025, 10:1230 (https://doi.org/10.5256/f1000research.76689.r101899)
  • Author Response 15 Jan 2025
    Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA
    15 Jan 2025
    Author Response
    Comment: This paper describes the assembly and annotation of a crocodile genome. It’s a very brief note that provides some useful background about the species, Caiman crocodilus, but it needs some ... Continue reading