A draft genome sequence of the common, or spectacled caiman Caiman crocodilus

Kenichi Okamoto; Nichole Dopkins; Elias Kinfu

doi:10.12688/f1000research.73066.2


Genome Note

Revised

[version 2; peer review: 1 approved, 1 not approved]

Kenichi Okamoto¹, Nichole Dopkins¹, Elias Kinfu¹,²

PUBLISHED 15 Jan 2025

Author details

¹ Department of Biology, University of Saint Thomas, Saint Paul, MN, 55105, USA
² Department of Biochemistry, University of Washington, Seattle, WA, 98195, USA

Kenichi Okamoto
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

Nichole Dopkins
Roles: Data Curation, Formal Analysis, Investigation, Resources, Software, Writing – Review & Editing

Elias Kinfu
Roles: Data Curation, Formal Analysis, Investigation, Software, Validation, Writing – Review & Editing

Abstract

The common, or spectacled, caiman Caiman crocodilus is an abundant, widely distributed Neotropical crocodilian that exhibits notable morphological and molecular diversification. As the type species for the subfamily Caimaninae, the sister taxon to the subfamily that includes the genus Alligator, C. crocodilus occupies a key position in our understanding of crocodilian and archosaur genetics and evolution. The species also accounts for by far the largest share of crocodilian hides on the global market, with the C. crocodilus hide trade alone valued at about US$86.5 million per year. A genome sequence of C. crocodilus could therefore be of considerable use for both basic and applied research. We obtained 239,911,946 paired-end reads comprising approximately 72 Gbp using Illumina™ sequencing of tissue sampled from a single C. crocodilus individual. These reads were assembled de novo and then scaffolded against the genomes of progressively more closely related crocodilians; Liftoff was used to annotate the draft C. crocodilus genome assembly based on an annotation of Alligator mississippiensis, a confamilial species. The draft C. crocodilus genome assembly and sequence reads have been deposited with the National Center for Biotechnology Information (NCBI) under BioProject PRJNA716363, with accession numbers JAGPOW000000000.1 for the assembly and SRR22317059 for the Sequence Read Archive (SRA) reads.

Keywords

Caiman crocodilus, spectacled caiman, genome, assembly, next-generation sequencing, crocodilian, vertebrate genome

Corresponding author: Kenichi Okamoto

Competing interests: No competing interests were disclosed.

Grant information: This study was funded by a University of St. Thomas CAS Startup Fund.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Okamoto K et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Okamoto K, Dopkins N and Kinfu E. A draft genome sequence of the common, or spectacled caiman Caiman crocodilus [version 2; peer review: 1 approved, 1 not approved]. F1000Research 2025, 10:1230 (https://doi.org/10.12688/f1000research.73066.2) First published: 02 Dec 2021, 10:1230 (https://doi.org/10.12688/f1000research.73066.1) Latest published: 15 Jan 2025, 10:1230 (https://doi.org/10.12688/f1000research.73066.2)

Amendments from Version 1

Both Reviewers highlighted the need to (i) make the underlying data more readily available by depositing them in established, standard repositories, and (ii) include additional analyses characterizing the draft assembly and annotation. In response, we now report further summary statistics that place our genome assembly in context, including the aspects of the annotation requested by the Reviewers, based on the annotation submitted to GenBank.

We also now include NCBI accession numbers for the Sequence Read Archive (SRA) reads and the draft assembly. The draft annotation submitted to GenBank is still being processed, so no accession number had been issued for it at the time of submitting this revision. Nevertheless, our annotation submission has passed all of NCBI's automated checks.


Introduction

The common, or spectacled, caiman, Caiman crocodilus, is one of the most widely distributed and abundant crocodilian species, ranging continuously from Mexico to Argentina (Busack and Pandya 2001; US Fish and Wildlife Service 2018). A generalist predator, C. crocodilus is remarkably adaptable, occupying habitats ranging from urban areas to seasonal savannahs and tropical rainforests (Medem 1981, 1983), and it has recently been introduced to Cuba, Puerto Rico and Florida, where it is considered an invasive species (US Fish and Wildlife Service 2018). This broad distribution and diversity of habitats have facilitated considerable intraspecific diversification within C. crocodilus; a recent analysis by Roberto et al. (2020) identified between seven and ten lineages within C. crocodilus across differing biogeographic regions and watersheds throughout Central and South America. Within-species diversity is also morphologically apparent, with skull shape in particular exhibiting systematic patterns of regional differentiation (Medem 1955; Gans 1980; Medem 1981, 1983; Ayarzagüena 1984; Escobedo-Galván et al. 2015). These intraspecific patterns of cranial shape variation have been shown to parallel the patterns of interspecific cranial diversity found among extant crocodilians (Okamoto et al. 2015).

Additionally, C. crocodilus is commercially important, chiefly in the leather industry. Although the hides of C. crocodilus contain osteoderms that make manufacturing more difficult than for other crocodilians, a majority of the approximately 1.5 million crocodilian skins traded globally come from C. crocodilus (Brazaitis et al. 1998; Caldwell 2015). As with other crocodilians, most legal hides come from commercial farming operations, and the market for caiman hides is estimated at over US$85 million (Caldwell 2015). Wild populations of C. crocodilus are also hunted for meat and even for use as fishing bait (da Silveira and Thorbjarnarson 1999; Brum et al. 2015; Pimenta et al. 2018), and they provide ecosystem services including nutrient cycling and biological control (Valencia-Aguilar et al. 2013; Marley et al. 2019). As an apex predator, C. crocodilus exhibits considerable bioaccumulation, and genotoxic analyses have detected molecular signatures of pollution in its genome (Oliveira et al. 2021).

Thus, a draft genome sequence for C. crocodilus could not only assist with improved husbandry, ecotoxicology and wildlife management, but also provide insight into the evolutionary processes driving intraspecific diversification in continental systems more broadly.

Such a genome sequence can further propel both basic and applied research beyond C. crocodilus. At present, five other crocodilian genome assemblies are available: two each in the genera Alligator (the American and Chinese alligators, A. mississippiensis and A. sinensis, respectively) and Crocodylus (the saltwater and Cuban crocodiles, Cr. porosus and Cr. rhombifer, respectively), and one in the genus Gavialis (the gharial, Gavialis gangeticus). Beyond their utility for economic (e.g., Miles et al. 2009) and conservation (e.g., Vashistha et al. 2020; Yang et al. 2023) applications, crocodilian genome assemblies have facilitated basic research into questions such as the evolution of temperature-dependent sex determination (e.g., Rice et al. 2017), the rate and nature of archosaur genome evolution (especially in comparison with avian genomes; e.g., St John et al. 2012; Green et al. 2014; Brittain et al. 2021), and the genetic basis of key evolutionary adaptations in amniotes, including immune responses (e.g., Wan et al. 2013; López-Pérez et al. 2022; Merchant et al. 2024), morphogenesis (e.g., Kusumi et al. 2013; Wu et al. 2018; Morris and Abzhanov 2021) and globin expression (e.g., Wan et al. 2013; Hoffmann et al. 2018; Natarajan et al. 2023). A C. crocodilus genome sequence could therefore complement these comparative genomic studies, which routinely use genomes from the genus Alligator, by adding genomic data from a widely distributed, living representative of Alligator's sister taxon.

Materials and methods

DNA was extracted from a tissue sample from a single Caiman crocodilus museum specimen (UF-FLMNH 171438) using the DNeasy™ kit from Qiagen (Hilden, Germany). DNA was quantified using the PicoGreen™ kit from Thermo Fisher (Waltham, MA, USA), giving a final concentration of 77.78 ng/µL. Tecan's (Männedorf, Switzerland) NuGEN Celero™ kit was then used to construct a paired-end library, which was sequenced on a single Illumina (San Diego, CA, USA) NovaSeq S4 lane. This yielded 239,911,946 paired-end reads of 2 × 150 bp each. Nucleic acid isolation, quantification, library preparation and sequencing were performed at the University of Minnesota Genomics Center.
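As a quick consistency check on these numbers, the sketch below multiplies out the sequencing yield and estimates mean depth. It assumes that the 239,911,946 figure counts read pairs (two 150-bp reads each) and uses the 2,341,057,913-bp draft assembly length reported in the Results as a stand-in for genome size; the constant and function names are illustrative, not part of the authors' pipeline.

```python
# Back-of-the-envelope sequencing yield and mean depth (illustrative sketch).
# Assumptions: 239,911,946 counts read *pairs* of 2 x 150 bp, and the draft
# assembly length stands in for the (unknown) true genome size.

READ_PAIRS = 239_911_946
READ_LENGTH_BP = 150
ASSEMBLY_LENGTH_BP = 2_341_057_913


def total_yield_bp(read_pairs: int, read_length_bp: int) -> int:
    """Total sequenced bases: two reads per pair, each read_length_bp long."""
    return read_pairs * 2 * read_length_bp


def mean_depth(total_bp: int, genome_size_bp: int) -> float:
    """Expected mean coverage if reads were spread evenly over the genome."""
    return total_bp / genome_size_bp


if __name__ == "__main__":
    yield_bp = total_yield_bp(READ_PAIRS, READ_LENGTH_BP)
    print(f"total yield: {yield_bp:,} bp (~{yield_bp / 1e9:.1f} Gbp)")
    depth = mean_depth(yield_bp, ASSEMBLY_LENGTH_BP)
    print(f"approximate mean depth: {depth:.1f}x")
```

Under these assumptions the library works out to roughly 72 Gbp and a mean depth of roughly 31× over the draft assembly; real coverage is uneven, so this is only an order-of-magnitude figure.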

The paired-end reads (NCBI Sequence Read Archive accession SRR22317059) were assembled de novo using the Iterative de Bruijn Graph Assembler (IDBA-UD; Peng et al. 2012). To assess the reliability of our pipeline from sequencing to de novo assembly, we repeated the sequencing and IDBA-UD assembly using a museum-derived tissue sample from a single Alligator mississippiensis individual (UF-FLMNH 175565). This yielded 249,325,204 paired-end reads of 2 × 150 bp each. As for the C. crocodilus individual, these reads were assembled de novo using IDBA-UD. QUAST (Gurevich et al. 2013) showed that this IDBA-UD assembly of A. mississippiensis covered approximately 94.2% of a recently published A. mississippiensis assembly (GCA_000281125.4; Rice et al. 2017), with an NG50 of 21,172 bp based on the de novo assembled contigs alone.
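NG50, reported for the A. mississippiensis test assembly, differs from the more familiar N50 only in its denominator: N50 is computed against half of the assembly's own total length, whereas NG50 is computed against half of a reference genome size, which makes it comparable across assemblies of different total length. A minimal Python sketch of both on made-up contig lengths; it is not QUAST's implementation, and the function names are hypothetical.

```python
from typing import Iterable, List


def nx_length(lengths: Iterable[int], target_total: int, fraction: float = 0.5) -> int:
    """Length of the sequence at which the running total of lengths, taken
    longest first, first reaches fraction * target_total. Returns 0 if that
    threshold is never reached."""
    threshold = fraction * target_total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0


def n50(lengths: Iterable[int]) -> int:
    """N50: the denominator is the assembly's own total length."""
    values: List[int] = list(lengths)
    return nx_length(values, sum(values))


def ng50(lengths: Iterable[int], reference_size_bp: int) -> int:
    """NG50: the denominator is a reference genome size instead."""
    return nx_length(lengths, reference_size_bp)


if __name__ == "__main__":
    toy_contigs = [100, 80, 60, 40, 20]  # made-up contig lengths (bp)
    print("N50:", n50(toy_contigs))
    print("NG50 vs 500-bp reference:", ng50(toy_contigs, 500))
```

For these toy lengths N50 is 80 but NG50 against a 500-bp reference is 40: the smaller assembly has to reach further down its length distribution to cover half of the larger reference.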

We scaffolded the resulting draft C. crocodilus contigs in two steps. First, we scaffolded the caiman contigs against a Crocodylus porosus assembly (GCF_001723895.1; Ghosh et al. 2020) using RagTag (Alonge et al. 2019). We then re-scaffolded the resulting contigs and scaffolds against the confamilial Alligator mississippiensis assembly (GCA_000281125.4), again using RagTag.
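The two-step, reference-guided scaffolding can be sketched as a pair of RagTag runs in which the output of the first round becomes the query of the second. The sketch only builds the command lines and does not run them; it assumes RagTag's `ragtag.py scaffold <reference> <query> -o <outdir>` invocation and its default `ragtag.scaffold.fasta` output name, and the FASTA file names and output directories are hypothetical placeholders, not the authors' actual files.

```python
import shlex
from pathlib import Path
from typing import List


def ragtag_scaffold_cmd(reference_fa: str, query_fa: str, outdir: str) -> List[str]:
    """One round of reference-guided scaffolding with RagTag."""
    return ["ragtag.py", "scaffold", reference_fa, query_fa, "-o", outdir]


def two_step_scaffolding(contigs_fa: str, distant_ref_fa: str,
                         close_ref_fa: str) -> List[List[str]]:
    """Round 1 scaffolds against a more distant genome; round 2 re-scaffolds
    the round-1 output against a more closely related genome."""
    round1_dir = "ragtag_round1"
    round2_dir = "ragtag_round2"
    round1 = ragtag_scaffold_cmd(distant_ref_fa, contigs_fa, round1_dir)
    # Assumed default name of RagTag's scaffold FASTA inside the -o directory.
    round1_scaffolds = str(Path(round1_dir) / "ragtag.scaffold.fasta")
    round2 = ragtag_scaffold_cmd(close_ref_fa, round1_scaffolds, round2_dir)
    return [round1, round2]


if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    for cmd in two_step_scaffolding("caiman_contigs.fa", "crocodylus_porosus.fa",
                                    "alligator_mississippiensis.fa"):
        print(shlex.join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once RagTag is installed. Scaffolding against the more distant Cr. porosus genome first and the confamilial A. mississippiensis genome second mirrors the order described in the text.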

Contaminants, mitochondrial DNA, vectors, adapters and sequences shorter than 200 bp flagged by NCBI were manually removed using seqkit (Shen et al. 2016) and custom scripts (available at http://github.com/kewok/ncbi_scrubber). The genome assembly has been deposited in GenBank under accession number JAGPOW000000000.1.
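Dropping flagged and short sequences amounts to a simple filter over FASTA records. The pure-Python sketch below stands in for the seqkit and custom-script step; it assumes the NCBI-flagged sequence IDs have already been collected into a set, and it drops whole sequences only (trimming adapter or vector regions inside a sequence is not shown). The function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (sequence ID, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text into (ID, sequence) pairs; the ID is the first word
    of the header line."""
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def scrub(records: Iterable[Record], flagged_ids: Set[str],
          min_length: int = 200) -> Iterator[Record]:
    """Drop sequences whose IDs were flagged (e.g. contaminants) and any
    sequence shorter than min_length bp."""
    for seq_id, seq in records:
        if seq_id in flagged_ids or len(seq) < min_length:
            continue
        yield seq_id, seq


if __name__ == "__main__":
    toy = [">scaf1 note", "A" * 250, ">scaf2", "C" * 150, ">scaf3", "G" * 300]
    kept = scrub(read_fasta(toy), flagged_ids={"scaf3"})
    print([seq_id for seq_id, _ in kept])
```

In the toy input, scaf2 is dropped for being shorter than 200 bp and scaf3 because its ID is in the flagged set.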

The resulting scaffolded assembly (doi:10.5281/zenodo.4755063) was then masked using RepeatMasker (Smit et al. 2015) with the HMMER database (Finn et al. 2011) and “alligator” specified as the species. Liftoff (Shumate and Salzberg 2021) was then used to generate a draft annotation of the masked assembly, using the A. mississippiensis annotation (GCA_000281125.4; Rice et al. 2017) as the reference.
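The masking and annotation-transfer steps can likewise be sketched as command lines. This assumes RepeatMasker's `-species` and `-engine` options (with `hmmer` as the engine, our reading of "with the HMMER database"), RepeatMasker's convention of writing the masked genome to `<input>.masked`, and Liftoff's `liftoff -g <reference.gff> -o <output.gff> <target.fa> <reference.fa>` usage; the file names are placeholders and the commands are built, not executed.

```python
from typing import List


def repeatmasker_cmd(genome_fa: str, species: str = "alligator",
                     engine: str = "hmmer") -> List[str]:
    """Mask repeats using the library for `species`, searched with `engine`."""
    return ["RepeatMasker", "-species", species, "-engine", engine, genome_fa]


def masked_path(genome_fa: str) -> str:
    """RepeatMasker's conventional name for the masked genome (assumed)."""
    return genome_fa + ".masked"


def liftoff_cmd(target_fa: str, reference_fa: str, reference_gff: str,
                out_gff: str) -> List[str]:
    """Map features from reference_gff (on reference_fa) onto target_fa."""
    return ["liftoff", "-g", reference_gff, "-o", out_gff, target_fa, reference_fa]


if __name__ == "__main__":
    genome = "caiman_scaffolds.fa"  # hypothetical file names throughout
    print(" ".join(repeatmasker_cmd(genome)))
    print(" ".join(liftoff_cmd(masked_path(genome), "alligator.fa",
                               "alligator.gff3", "caiman_liftoff.gff3")))
```

Note that Liftoff takes the target genome before the reference genome as positional arguments, so swapping them would transfer annotations in the wrong direction.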

table2asn_gff (National Center for Biotechnology Information 2020) was used to generate a Sequin file (National Center for Biotechnology Information (US) 2014), and features flagged as errors were manually removed using custom scripts (available at https://github.com/kewok/ncbi_scrubber). As of December 2024, the draft annotation is available at doi:10.5281/zenodo.4755063.

Results and conclusions

Our draft C. crocodilus genome assembly is 2,341,057,913 bp long, with 465,471 scaffolds and 723,636 contigs. It has a scaffold N50 of 70,464,410 bp, or approximately 70.5 Mbp (computed with SeqFu; Telatin et al. 2021). For context, scaffold N50s of approximately 478.2 kbp, 2.2 Mbp, 96.1 Mbp, 84.4 Mbp and 255.1 Mbp are reported for the Cuban crocodile (GCA_038503035.1; Meredith et al. 2024), the Chinese alligator (GCF_000455745.1; Wan et al. 2013), the gharial (Green et al. 2014), the saltwater crocodile (GCF_001723895.1; Ghosh et al. 2020) and the American alligator (GCF_030867095.1), respectively. Among other reptile reference genome assemblies, our scaffold N50 is comparable to those reported for the common mock viper (Psammodynastes pulverulentus; GCA_024509165.1), the rock pigeon (Columba livia; GCF_036013475.1) and the Asian water monitor (Varanus salvator; GCA_023646645.1).
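The scaffold and contig counts differ because each scaffold is a chain of contigs joined by runs of unknown bases (N). A minimal sketch of how a contig count can be derived by splitting scaffolds at N runs; the minimum gap length that counts as a break is an assumption (tools differ), so the sketch exposes it as a hypothetical `min_gap` parameter.

```python
import re
from typing import List, Sequence, Tuple


def split_scaffold(seq: str, min_gap: int = 1) -> List[str]:
    """Break a scaffold into contigs at runs of at least min_gap N/n characters."""
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Leading or trailing gaps produce empty strings, which are discarded.
    return [piece for piece in gap.split(seq) if piece]


def scaffold_and_contig_counts(scaffolds: Sequence[str],
                               min_gap: int = 1) -> Tuple[int, int]:
    """Return (number of scaffolds, number of contigs after splitting at gaps)."""
    n_contigs = sum(len(split_scaffold(s, min_gap)) for s in scaffolds)
    return len(scaffolds), n_contigs


if __name__ == "__main__":
    toy = ["ACGTNNNNACGT", "AAAA", "NNACGTNACG"]  # made-up scaffolds
    print(scaffold_and_contig_counts(toy))             # any N breaks a contig
    print(scaffold_and_contig_counts(toy, min_gap=2))  # single Ns stay inside contigs
```

The choice of `min_gap` changes the contig count (and therefore contig N50) without changing the scaffolds at all, which is one reason contig statistics should be compared only when computed the same way.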

A QUAST analysis of contigs longer than 3,000 bp against the reference A. mississippiensis assembly GCA_000281125.4_ASM28112v4 identified 211 local misassemblies and 22 misassemblies (14 contig translocations, 6 scaffold relocations and 2 scaffold translocations). The total length of the misassembled contigs is 4,572,832 bp.

We further used BUSCO (Simão et al. 2015) to evaluate the gene completeness of our C. crocodilus draft genome against the sauropsida_odb10 database. This assessment found 7,224 of 7,480 BUSCOs complete (a completeness score of 96.6%), of which 7,176 were single-copy and 48 were duplicated.
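The BUSCO percentages follow directly from the reported counts and the 7,480 BUSCO groups in sauropsida_odb10. A small sketch that recomputes them, rounding to one decimal place; `F+M` lumps fragmented and missing BUSCOs together because only their combined number (7,480 minus 7,224) can be recovered from the text. The function name is hypothetical.

```python
from typing import Dict


def busco_summary(single: int, duplicated: int, total: int) -> Dict[str, float]:
    """Recompute BUSCO-style percentages (one decimal place) from raw counts."""
    def pct(n: int) -> float:
        return round(100 * n / total, 1)

    complete = single + duplicated
    return {
        "complete": complete,
        "C": pct(complete),
        "S": pct(single),
        "D": pct(duplicated),
        "F+M": pct(total - complete),  # fragmented and missing combined
    }


if __name__ == "__main__":
    print(busco_summary(single=7_176, duplicated=48, total=7_480))
```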

A total of 297,374 gene features were predicted in the annotation. Using AGAT (Dainat 2020), we identified 18,836 functional transcripts, 20,020 mRNAs and 15,981 coding sequences, with average lengths of 37,890 bp, 39,524 bp and 1,063 bp, respectively. We identified 115,941 exons (average length 562 bp), an average of 7.3 exons per coding sequence, and 99,960 introns (average length 4,334 bp) in the coding sequences. We further used HMMER (Eddy 2011) to count hits against the Pfam protein families database for our annotation, finding 9,983 hits at a sequence reporting E-value threshold of 0.00001. The number of hits was similar at E-value thresholds of 0.001 (10,255 hits) and 0.0000001 (9,745 hits). Finally, we conducted a reciprocal BLAST hit analysis against the annotation of the saltwater crocodile Crocodylus porosus (GCF_001723895.1_CroPor_comp1), a non-alligatorid crocodilian for which an annotation is currently available. This Cr. porosus annotation contains 28,663 coding sequences with an average length of 1,527 bp, and 19,538 genes and pseudogenes. Using proteinortho (Lechner et al. 2011), our reciprocal BLAST hit analysis found 7,894 orthologous groups shared across the two annotations.
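Proteinortho's ortholog detection builds on reciprocal best hits. The sketch below shows only the core reciprocal-best-hit rule (gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring hit) on hypothetical gene IDs and made-up bit scores; it is a simplified illustration, not Proteinortho's algorithm, which also applies similarity thresholds and clusters hits into orthologous groups.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (the first one wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical caiman (cc*) and crocodile (cp*) genes with made-up bit scores.
    caiman_vs_croc = [("cc1", "cp1", 300.0), ("cc1", "cp2", 150.0),
                      ("cc2", "cp2", 200.0), ("cc3", "cp1", 250.0)]
    croc_vs_caiman = [("cp1", "cc1", 310.0), ("cp1", "cc3", 240.0),
                      ("cp2", "cc2", 190.0)]
    print(reciprocal_best_hits(caiman_vs_croc, croc_vs_caiman))
```

In this toy example cc3's best hit is cp1, but cp1's best hit is cc1, so cc3 is left unpaired; only (cc1, cp1) and (cc2, cp2) are reciprocal.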

Here we have described the first draft assembly and annotation of the C. crocodilus genome. We anticipate that these data can assist natural resource management, ecotoxicology and agriculture, as well as research into broader questions about the interplay between microevolutionary and macroevolutionary processes across large biogeographic scales. In addition to facilitating basic and applied research into C. crocodilus biology, our genome sequence expands the available crocodilian genome sequences to include the subfamily Caimaninae, the extant sister group to Alligator and a major crocodilian lineage hitherto unrepresented among assembled genome sequences. Our assembly can thus serve as a resource not only for crocodilian genomics, but also for archosaur, reptile and amniote comparative genomics more broadly.

Data availability

The draft C. crocodilus genome assembly and sequence data have been deposited with the National Center for Biotechnology Information under BioProject PRJNA716363, with accession numbers JAGPOW000000000.1 for the assembly and SRR22317059 for the Sequence Read Archive reads. The draft annotation is currently being processed at the National Center for Biotechnology Information and is available for review at doi.org/10.5281/zenodo.4755063.

Acknowledgments

We are especially indebted to Dr. P. S. Soltis, T. A. Lott and the Genetic Resources Repository at the Florida Museum of Natural History, University of Florida (UF-FLMNH) for generously providing us with tissue samples. We thank the University of Minnesota Genomics Center (Minneapolis, MN, USA) for their guidance, for isolating DNA from museum samples, and for performing library preparation and sequencing. We thank the Minnesota Supercomputing Institute (MSI) at the University of Minnesota and the Department of Chemistry at the University of St. Thomas for providing critical computational resources that contributed to the research results reported in this paper. Finally, we are very grateful to Dr. S. Pirro and Dr. M. Kieras at Iridian Genomes (Bethesda, MD, USA) for valuable insight on scaffolding the draft assemblies, and to two Reviewers for comments that significantly improved the manuscript.

References

Alonge M, Soyk S, Ramakrishnan S, et al.: RaGOO: Fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 2019; 20: 1–17.
Ayarzagüena J: Variaciones en la dieta de Caiman sclerops. La relación entre morfología bucal y dieta. Memoria De La Sociedad De Ciencias Naturales La Salle. 1984; 44: 123–140.
Brazaitis P, Watanabe ME, Amato G: The Caiman Trade. Sci. Am. 1998; 278: 70–76.
Brittain K, Ray DA, Gongora J, et al.: Crocodilian Genome Advances. In Zucoloto RB, Amavet PS, Verdade LM, et al., editors. Conservation Genetics of New World Crocodilians. Cham, Switzerland: Springer International Publishing; 2021; pp. 185–202.
Brum SM, da Silva VM, Rossoni F, et al.: Use of dolphins and caimans as bait for Calophysus macropterus (Lichtenstein, 1819) (Siluriforme: Pimelodidae) in the Amazon. J. Appl. Ichthyol. 2015; 31: 675–680.
Busack SD, Pandya S: Geographic variation in Caiman crocodilus and Caiman yacare (Crocodylia: Alligatoridae): Systematic and legal implications. Herpetologica. 2001; 57: 294–312.
Caldwell J: World Trade in Crocodilian Skins 2013-2015. Technical report. UN Environment Programme World Conservation Monitoring Centre; 2015.
da Silveira R, Thorbjarnarson JB: Conservation implications of commercial hunting of black and spectacled caiman in the Mamirauá Sustainable Development Reserve, Brazil. Biol. Conserv. 1999; 88: 103–109.
Dainat J: AGAT: Another GFF Analysis Toolkit to handle annotations in any GTF/GFF format. Version v0.7.0. 2020. Accessed December 2024.
Eddy SR: Accelerated Profile HMM Searches. PLoS Comput. Biol. 2011; 7: e1002195.
Escobedo-Galván AH, Velasco JA, González-Maya JF, et al.: Morphometric analysis of the Rio Apaporis Caiman (Reptilia, Crocodylia, Alligatoridae). Zootaxa. 2015; 4059: 541–554.
Finn RD, Clements J, Eddy SR: HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res. 2011; 39: W29–W37.
Gans C: Allometric Changes in the Skull and Brain of Caiman crocodilus. J. Herpetol. 1980; 14: 297–301.
Ghosh A, Johnson MG, Osmanski AB, et al.: A High-Quality Reference Genome Assembly of the Saltwater Crocodile, Crocodylus porosus, Reveals Patterns of Selection in Crocodylidae. Genome Biol. Evol. 2020; 12: 3635–3646.
Green RE, Braun EL, Armstrong J, et al.: Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science. 2014; 346: 1254449.
Gurevich A, Saveliev V, Vyahhi N, et al.: QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013; 29: 1072–1075.
Hoffmann FG, Vandewege MW, Storz JF, et al.: Gene Turnover and Diversification of the α- and β-Globin Gene Families in Sauropsid Vertebrates. Genome Biol. Evol. 2018; 10: 344–358.
Kusumi K, May CM, Eckalbar WL: A large-scale view of the evolution of amniote development: Insights from somitogenesis in reptiles. Curr. Opin. Genet. Dev. 2013; 23: 491–497.
Lechner M, Findeiß S, Steiner L, et al.: Proteinortho: Detection of (Co-)orthologs in large-scale analysis. BMC Bioinformatics. 2011; 12: 124.
López-Pérez JE, Crother BI, Murray CM: The Inference of the Evolution of Immune Traits as Constrained by Phylogeny: Insight into the Immune System of the Basal Diapsid. Animals. 2022; 12: 2482.
Marley G, Lawrence AJ, Phillip DA, et al.: Mangrove and mudflat food webs are segregated across four trophic levels, yet connected by highly mobile top predators. Mar. Ecol. Prog. Ser. 2019; 632: 13–25.
Medem F: A new subspecies of Caiman sclerops from Colombia. Fieldiana: Zoology. 1955; 37: 339–343.
Medem F: Los Crocodylia de Sur America Volumen I. Bogotá, Colombia: Ministerio de Educación Nacional; 1981.
Medem F: Los Crocodylia de Sur America Volumen II. Bogotá, Colombia: Ministerio de Educación Nacional; 1983.
Merchant M, Hebert M, Salvador AC, et al.: Constitutive Innate Immunity and Systemic Responses to Infection of the American Alligator (Alligator mississippiensis). Animals. 2024; 14: 965.
Meredith RW, Milián-Garcia Y, Gatesy J, et al.: Draft assembly and annotation of the Cuban crocodile (Crocodylus rhombifer) genome. BMC Genomic Data. 2024; 25: 53.
Miles LG, Isberg SR, Glenn TC, et al.: A genetic linkage map for the saltwater crocodile (Crocodylus porosus). BMC Genomics. 2009; 10: 339.
Morris ZS, Abzhanov A: Heading for higher ground: Developmental origins and evolutionary diversification of the amniote face. Curr. Top. Dev. Biol. 2021; 141: 241–277.
Natarajan C, Signore AV, Bautista NM, et al.: Evolution and molecular basis of a novel allosteric property of crocodilian hemoglobin. Curr. Biol. 2023; 33: 98–108.e4.
National Center for Biotechnology Information (US): Submitting Sequences using Specific NCBI Submission Tools. The GenBank Submissions Handbook. Bethesda, MD: National Center for Biotechnology Information (US); 2014. NBK566995.
National Center for Biotechnology Information: table2asn_gff. 2020. Accessed December 2024.
Okamoto KW, Langerhans RB, Rashid R, et al.: Microevolutionary patterns in the common caiman predict macroevolutionary trends across extant crocodilians. Biol. J. Linn. Soc. 2015; 116: 834–846.
Oliveira VCS, Viana PF, Gross MC, et al.: Looking for genetic effects of polluted anthropized environments on Caiman crocodilus crocodilus (Reptilia, Crocodylia): A comparative genotoxic and chromosomal analysis. Ecotoxicol. Environ. Saf. 2021; 209: 111835.
Peng Y, Leung HC, Yiu SM, et al.: IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012; 28: 1420–1428.
Pimenta NC, Barnett AA, Botero-Arias R, et al.: When predators become prey: Community-based monitoring of caiman and dolphin hunting for the catfish fishery and the broader implications on Amazonian human-natural systems. Biol. Conserv. 2018; 222: 154–163.
Rice ES, Kohno S, St John J, et al.: Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res. 2017; 27: 686–696.
Roberto IJ, Bittencourt PS, Muniz FL, et al.: Unexpected but unsurprising lineage diversity within the most widespread Neotropical crocodilian genus Caiman (Crocodylia, Alligatoridae). Syst. Biodivers. 2020; 18: 377–395.
Shen W, Le S, Li Y, et al.: SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE. 2016; 11: e0163962.
Shumate A, Salzberg SL: Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021; 37: 1639–1643.
Simão FA, Waterhouse RM, Ioannidis P, et al.: BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015; 31: 3210–3212.
Smit A, Hubley R, Green P: RepeatMasker Open-4.0. 2013–2015.
St John J, Braun E, Isberg S, et al.: Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes. Genome Biol. 2012; 13: 415.
Telatin A, Fariselli P, Birolo G: SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering. 2021; 8: 59.
US Fish and Wildlife Service: Common Caiman (Caiman crocodilus) Ecological Risk Screening Summary. Technical report. US Fish and Wildlife Service; 2018.
Valencia-Aguilar A, Cortés-Gómez AM, Ruiz-Agudelo CA: Ecosystem services provided by amphibians and reptiles in Neotropical ecosystems. Int. J. Biodivers. Sci. Ecosyst. Serv. Manag. 2013; 9: 257–272.
Vashistha G, Deepika S, Dhakate PM, et al.: The effectiveness of microsatellite DNA as a genetic tool in crocodilian conservation. Conserv. Genet. Resour. 2020; 12: 733–744.
Wan Q-H, Pan S-K, Hu L, et al.: Genome analysis and signature discovery for diving and sensory properties of the endangered Chinese alligator. Cell Res. 2013; 23: 1091–1105.
Wu P, Yan J, Lai Y-C, et al.: Multiple Regulatory Modules Are Required for Scale-to-Feather Conversion. Mol. Biol. Evol. 2018; 35: 417–430.
Yang S, Lan T, Zhang Y, et al.: Genomic investigation of the Chinese alligator reveals wild-extinct genetic diversity and genomic consequences of their continuous decline. Mol. Ecol. Resour. 2023; 23: 294–311.

Open Peer Review

Version 2

PUBLISHED 15 Jan 2025

Reviewer Report 16 Jan 2025

Steven Salzberg, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA

Approved

https://doi.org/10.5256/f1000research.176214.r359372

I'm now satisfied and I approve this version. Note that the authors did not fix one ...

Version 1

PUBLISHED 02 Dec 2021

Reviewer Report 28 Apr 2022

Marc Tollis, School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA

Not Approved

https://doi.org/10.5256/f1000research.76689.r128753

A spectacled caiman genome is a welcome contribution to the field, however I believe some additional steps need to be taken before this manuscript can be acceptable for indexing. Fortunately, these steps are standard for most genome reports so the authors should have little problem addressing these issues.

The authors give a convincing rationale for sequencing the spectacled caiman genome - it is economically and ecologically important - but I believe they left out a key point. This is the first publicly available caiman genome, and the third member of the Alligatoridae (American and Chinese alligator genomes are available). There are relatively few other crocodilian genomes available (a crocodile and a gharial), so this assembly helps with the lack of crocodilian genomes and fills in an important taxonomical gap for vertebrate comparative genomics. I think the authors can make this point rather easily.

The sequencing and assembly steps are well described, as well as the annotation process, however the results of these methods are not well described and thus it is difficult to place the importance of this resource in further proper context. How does the contiguity in terms of N50 compare to other crocodilian (or reptile) genomes? A BUSCO analysis would help provide an estimate of the gene completeness of the genome, and is part of the standard battery of tests for new genome assemblies, but has not been done.

The authors report almost 300,000 gene features, but it is not elaborated upon what these features are. How many contain coding sequences? How many contain a Pfam domain? How many contain a reciprocal blast hit with another crocodilian genome? What is the average length of exons and introns, average number of exons per gene? Answers to these questions and a comparison to annotations from other genomes would be essential for understanding the usefulness of this annotation.

Finally, the authors need to provide the accession numbers or BioProject number that indicates the submissions to NCBI. If the assembly has been submitted as stated in the manuscript, this should be relatively straightforward as the BioProject already exists or the accession number has been issued. The same goes for the raw reads, they should be deposited in the NCBI Short Read Archive and accession numbers should be reported. Zenodo is a good place to keep the assembly that was used for the analyses, but NCBI submission is standard for the field.

Are the rationale for sequencing the genome and the species significance clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Comparative genomics, phylogenetics, vertebrates

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 15 Jan 2025

Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA

Comment: A spectacled caiman genome is a welcome contribution to the field; however, I believe some additional steps need to be taken before this manuscript can be acceptable for indexing. Fortunately, these steps are standard for most genome reports, so the authors should have little problem addressing these issues.

The authors give a convincing rationale for sequencing the spectacled caiman genome - it is economically and ecologically important - but I believe they left out a key point. This is the first publicly available caiman genome, and the third member of the Alligatoridae (American and Chinese alligator genomes are available). There are relatively few other crocodilian genomes available (a crocodile and a gharial), so this assembly helps with the lack of crocodilian genomes and fills in an important taxonomical gap for vertebrate comparative genomics. I think the authors can make this point rather easily.

Response: We thank the reviewer for sharing this evaluation with us. Their point that our assembly can also contribute to vertebrate comparative genomics more broadly had not occurred to us and is well taken. We now raise this point explicitly throughout our revised submission.

Comment: The sequencing and assembly steps are well described, as is the annotation process; however, the results of these methods are not well described, which makes it difficult to place this resource in its proper context. How does the contiguity, in terms of N50, compare to other crocodilian (or reptile) genomes? A BUSCO analysis would help provide an estimate of the gene completeness of the genome and is part of the standard battery of tests for new genome assemblies, but it has not been done.

Response: Although we had initially hoped that assessing our assembly strategy on genomic DNA extracted from an American alligator specimen would reassure readers, we agree with the reviewer that a more thorough description of our caiman genome assembly itself was warranted, even though we have no other caiman assembly to compare it to. We therefore appreciate the reviewer identifying which metrics would be of most value to readers.

We thus revised the manuscript to include further characterizations, including the BUSCO analysis as the reviewer recommended. Briefly, 7480 BUSCO groups were searched, of which 96% were identified as single-copy complete BUSCOs.
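BUSCO reports its results in a one-line notation (C = complete, split into S = single-copy and D = duplicated; F = fragmented; M = missing; n = number of groups searched). The sketch below parses that line and converts percentages back to approximate group counts, assuming the classic notation; n = 7480 and S = 96% follow the response above, while the remaining percentages are hypothetical.

```python
import re

# Classic BUSCO one-line summary, e.g. "C:96.5%[S:96.0%,D:0.5%],F:1.5%,M:2.0%,n:7480"
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (C, S, D, F, M) and n from a BUSCO summary line."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a BUSCO one-line summary")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


def busco_counts(summary):
    """Convert each percentage into an approximate number of BUSCO groups."""
    n = summary["n"]
    return {key: round(summary[key] / 100 * n) for key in "CSDFM"}


# Hypothetical line: only n and the single-copy percentage come from the text.
summary = parse_busco_summary("C:96.5%[S:96.0%,D:0.5%],F:1.5%,M:2.0%,n:7480")
print(busco_counts(summary))  # S: 96% of 7480 is roughly 7181 groups
```

Because the percentages are rounded, counts recovered this way can differ by a group or two from those in BUSCO's full summary table.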

Furthermore, in response to the reviewer’s comment on contiguity, we now report the results of a QUAST analysis of our common caiman assembly, as suggested by Reviewer #1, and, per Reviewer #2’s recommendation, the revised manuscript places the resulting N50 in the context of other crocodilian and selected reptile genomes.

Comment: The authors report almost 300,000 gene features, but it is not elaborated upon what these features are. How many contain coding sequences? How many contain a Pfam domain? How many contain a reciprocal blast hit with another crocodilian genome? What is the average length of exons and introns, average number of exons per gene? Answers to these questions and a comparison to annotations from other genomes would be essential for understanding the usefulness of this annotation.

Response: We thank the reviewer for taking the time to identify the statistics that would better describe our annotation to readers. As we note in our response to Reviewer #1, who made a very similar observation, we have revised our manuscript to characterize our annotation more thoroughly along the lines requested by the Reviewers.

Briefly, the annotation includes 15981 coding sequences (excluding isoforms) and 9983 Pfam domain hits (at E < 10^-5); exons and introns average 562 bp and 4334 bp in length, respectively, and coding sequences contain 7.3 exons on average. We have revised our manuscript to report all of these values.
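The response does not say which HMMER program or E-value column produced the Pfam figure, and "hits" can mean either hit lines or distinct proteins with at least one hit. A minimal sketch, assuming `hmmscan --tblout` output (Pfam model as target, protein as query) filtered on the full-sequence E-value, that reports both counts; the three table rows are hypothetical.

```python
def pfam_hits(tblout_lines, max_evalue=1e-5):
    """Count hits and distinct proteins passing an E-value cut-off.

    Assumes the hmmscan --tblout layout: target name (Pfam model), target
    accession, query name (protein), query accession, full-sequence E-value,
    ... with a free-text description as the 19th field.
    """
    hit_count = 0
    proteins = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # comment and header lines start with '#'
        fields = line.split(maxsplit=18)
        query_name, full_evalue = fields[2], float(fields[4])
        if full_evalue < max_evalue:
            hit_count += 1
            proteins.add(query_name)
    return hit_count, proteins


# Hypothetical tblout rows, for illustration only.
rows = [
    "# target name  accession  query name ...",
    "Pkinase PF00069.28 prot1 - 2.3e-40 140.1 0.0 3e-40 139.7 0.0 1.1 1 0 0 1 1 1 1 Protein kinase domain",
    "7tm_1 PF00001.24 prot1 - 1e-08 35.2 0.1 2e-08 34.0 0.1 1.0 1 0 0 1 1 1 1 7 transmembrane receptor",
    "Ank PF00023.33 prot2 - 0.01 12.0 0.2 0.02 11.0 0.2 1.0 1 0 0 1 1 1 1 Ankyrin repeat",
]
hits, proteins = pfam_hits(rows)
print(hits, sorted(proteins))
```

Reporting both numbers (and the E-value column used) would make the annotation easier to compare with other crocodilian annotations.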

Finally, we strongly agree with the Reviewer’s suggestion that placing the annotation in the context of those of other crocodilians could be informative to readers. Thus, in addition to basic summary statistics, our revision also reports the results of comparing our annotation to other crocodilian genomes. Because our annotation was lifted over from an existing American alligator annotation, we evaluated reciprocal BLAST hits against the Crocodylus porosus (saltwater crocodile) annotation instead, and report in the revision that we found 7894 such hits.
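The response does not specify how a reciprocal hit was defined. One common definition is the reciprocal best hit: protein a in species A and protein b in species B count as a pair when b is a's highest-scoring hit and a is b's highest-scoring hit. A minimal sketch, assuming BLAST tabular output (`-outfmt 6`, bit score in the 12th column) for both search directions; the protein IDs and scores are hypothetical.

```python
def best_hits(blast_tab_lines):
    """Map each query to its highest-bitscore subject from BLAST -outfmt 6 rows."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # Ties keep the first subject seen.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def row(query, subject, bits):
    """Build a hypothetical 12-column -outfmt 6 line; only columns 1, 2, 12 matter here."""
    return "\t".join([query, subject, "90.0", "100", "10", "0",
                      "1", "100", "1", "100", "1e-50", str(bits)])


# Hypothetical caiman (cc*) versus saltwater crocodile (cp*) searches.
caiman_vs_croc = [row("cc1", "cp1", 200), row("cc1", "cp2", 150), row("cc2", "cp2", 180)]
croc_vs_caiman = [row("cp1", "cc1", 210), row("cp2", "cc1", 190), row("cp2", "cc2", 100)]
print(reciprocal_best_hits(caiman_vs_croc, croc_vs_caiman))  # only cc1/cp1 is reciprocal
```

Here cc2's best hit is cp2, but cp2's best hit is cc1, so that pair is not counted; stating the criterion used would make the 7894 figure easier to interpret.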

Comment: Finally, the authors need to provide the accession numbers or BioProject number that indicates the submissions to NCBI. If the assembly has been submitted as stated in the manuscript, this should be relatively straightforward as the BioProject already exists or the accession number has been issued. The same goes for the raw reads, they should be deposited in the NCBI Short Read Archive and accession numbers should be reported. Zenodo is a good place to keep the assembly that was used for the analyses, but NCBI submission is standard for the field.

Response: Reviewer #1 made a very similar point. As we note in our response to their comment, we have now been issued accession numbers for both the assembly and the Sequence Read Archive (SRA) data. We appreciate both reviewers highlighting the need to include these numbers, and the manuscript has now been updated to include the BioProject identifier (PRJNA716363) and the SRA accession (SRR22317059).
Competing Interests: No competing interests were disclosed.

Reviewer Report 24 Dec 2021

Not Approved

https://doi.org/10.5256/f1000research.76689.r101899

This paper describes the assembly and annotation of a crocodile genome. It’s a very brief note that provides some useful background about the species, Caiman crocodilus, but it needs some additional work before it would be acceptable for indexing. All of the following requests should be quite easy to satisfy, but they are all essential.

First, the data are available on a Zenodo website, but that is unacceptable. All genomic data must be deposited in GenBank, EMBL, or DDBJ, and made available for any scientific publication. This practice has been near-universal for the past 25 years. The authors seem to know this, because they state that the assembly has been submitted to NCBI (the home of GenBank). However, the paper fails to provide GenBank accession numbers or a BioProject identifier. The authors need to get these identifiers and put them in the manuscript before the paper can be accepted. NCBI routinely provides such identifiers prior to publication.

Second, the raw reads are also at their Zenodo site, which again is not adequate. They need to deposit them either in SRA or ENA, and provide the accession numbers for those as well.

Third, because the paper does not provide anything other than a brief description of the methods used for assembly and annotation, it needs at least a cursory effort at quality evaluation. For the assembly, the authors could run Merqury to get an overall base-level quality value. That’s a really minimal step. A more thorough step would be to run QUAST to estimate the number of mis-assemblies. This is especially important with an assembly based only upon short Illumina reads.
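Merqury's base-level quality value comes from comparing k-mers in the assembly against k-mers in the raw reads. The sketch below reproduces only the final step, as described in the Merqury paper: given the number of assembly k-mers absent from the reads, estimate a per-base error rate and convert it to a Phred-scaled QV. The k-mer counting itself (done with meryl in Merqury) is not shown, and the counts in the example are hypothetical.

```python
import math


def merqury_qv(asm_only_kmers, total_asm_kmers, k):
    """Consensus QV from k-mer counts, following the Merqury formulation.

    P(base correct) = (1 - K_asm_only / K_total) ** (1 / k)
    error rate      = 1 - P(base correct)
    QV              = -10 * log10(error rate)
    """
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("need 0 <= asm_only_kmers <= total_asm_kmers and total > 0")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    p_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_correct
    return -10 * math.log10(error_rate)


# Hypothetical counts: 2,100 of 1,000,000 assembly 21-mers are absent from the reads.
print(round(merqury_qv(2_100, 1_000_000, 21), 1))  # roughly QV 40, about 1 error per 10 kb
```

A QV of 40 corresponds to an estimated error rate of about 10^-4 per base; fewer read-unsupported k-mers give a higher QV.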

The other missing part is at least a short discussion of the annotation results. They mapped over the annotation from an alligator genome, which is fine, but they don’t report any numbers other than the total number of “features,” which is nearly 300,000. How many protein-coding genes did they annotate? How many non-coding RNAs? How many total protein-coding transcripts, and total transcripts of all types? These numbers are very easy to extract and the authors should at least comment on them.
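As the reviewer notes, most of these counts come directly from the feature-type column of the annotation. A minimal sketch, assuming GFF3 with NCBI-style gene, transcript (`mRNA`, `lnc_RNA`, ...), and `CDS` features linked by `ID`/`Parent` attributes, which Liftoff is assumed here to carry over from the source annotation; a gene is counted as protein-coding if at least one of its mRNAs has a CDS. The toy annotation is hypothetical.

```python
from collections import Counter


def summarize_gff(gff_lines):
    """Count GFF3 features by type and count genes with >= 1 CDS-bearing mRNA."""
    type_counts = Counter()
    mrna_to_gene = {}
    coding_mrnas = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype = cols[2]
        type_counts[ftype] += 1
        attrs = dict(
            item.split("=", 1) for item in cols[8].strip().split(";") if "=" in item
        )
        if ftype == "mRNA" and "ID" in attrs:
            mrna_to_gene[attrs["ID"]] = attrs.get("Parent")
        elif ftype == "CDS":
            coding_mrnas.update(attrs.get("Parent", "").split(","))
    coding_genes = {
        mrna_to_gene[m] for m in coding_mrnas if mrna_to_gene.get(m) is not None
    }
    return type_counts, len(coding_genes)


# Hypothetical annotation: one protein-coding gene and one lncRNA gene.
toy_gff = [
    "chr1\t.\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\t.\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\t.\texon\t1\t900\t.\t+\t.\tParent=t1",
    "chr1\t.\tCDS\t50\t850\t.\t+\t0\tParent=t1",
    "chr1\t.\tgene\t2000\t2500\t.\t-\t.\tID=g2",
    "chr1\t.\tlnc_RNA\t2000\t2500\t.\t-\t.\tID=t2;Parent=g2",
    "chr1\t.\texon\t2000\t2500\t.\t-\t.\tParent=t2",
]
counts, n_coding_genes = summarize_gff(toy_gff)
print(dict(counts), n_coding_genes)
```

Non-coding RNA transcripts show up under their own type names (here `lnc_RNA`), so they can be totalled from the same counter.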

Many of the references are garbled or nonsensical:

For example, this is supposedly a reference about the term N50:

Telatin A: Proch::N50.2018.

I don’t know what this means.

Another example is this reference to the RepeatMasker software:

Smit A, Hubley R, Grenn PRepeatMasker Open-4.0.2015.

I can't figure out what that is, but the RepeatMasker program is well known and they should reference it properly.
This reference refers to a piece of unpublished software from NCBI, apparently, and note that it misspells “Biotechnology” in the NCBI name:

National Center for Biotecnology Information: table 2asn_gff.2020.

The reference to Shumate and Salzberg (2020) says "in press" but that paper was published in June 2021: https://academic.oup.com/bioinformatics/article-abstract/37/12/1639/6035128.

Are the rationale for sequencing the genome and the species significance clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?

No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: genomics, bioinformatics

Author Response 15 Jan 2025

Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA

Comment: This paper describes the assembly and annotation of a crocodile genome. It’s a very brief note that provides some useful background about the species, Caiman crocodilus, but it needs some additional work before it would be acceptable for indexing. All of the following requests should be quite easy to satisfy, but they are all essential.

First, the data are available on a Zenodo website, but that is unacceptable. All genomic data must be deposited in GenBank, EMBL, or DDBJ, and made available for any scientific publication. This practice has been near-universal for the past 25 years. The authors seem to know this, because they state that the assembly has been submitted to NCBI (the home of GenBank). However, the paper fails to provide GenBank accession numbers or a BioProject identifier. The authors need to get these identifiers and put them in the manuscript before the paper can be accepted. NCBI routinely provides such identifiers prior to publication.

Second, the raw reads are also at their Zenodo site, which again is not adequate. They need to deposit them either in SRA or ENA, and provide the accession numbers for those as well.

Response: We thank the reviewer for highlighting these points. We had planned to update the NCBI identifiers upon acceptance, but now realize the Zenodo references were insufficient substitutes for the NCBI identifiers at the review stage as well.

The manuscript has now been updated to include the BioProject Identifier (PRJNA716363) and the Sequence Read Archives (SRR22317059).

Comment: Third, because the paper does not provide anything other than a brief description of the methods used for assembly and annotation, it needs at least a cursory effort at quality evaluation. For the assembly, the authors could run Merqury to get an overall base-level quality value. That’s a really minimal step. A more thorough step would be to run QUAST to estimate the number of mis-assemblies. This is especially important with an assembly based only upon short Illumina reads.

Response: We had initially hoped to reassure readers about the reliability of our approach by testing it on a de novo Alligator mississippiensis assembly and comparing the resulting genome to the published reference genome for this species.

However, we agree with the reviewer that descriptions of comparable assessments on the final Caiman crocodilus assembly itself are needed. We now characterize the misassemblies identified by QUAST and, as we describe in our response to Reviewer #2 as well, our revised manuscript also describes the results from running a BUSCO analysis on the genome assembly.

Comment: The other missing part is at least a short discussion of the annotation results. They mapped over the annotation from an alligator genome, which is fine, but they don’t report any numbers other than the total number of “features,” which is nearly 300,000. How many protein-coding genes did they annotate? How many non-coding RNAs? How many total protein-coding transcripts, and total transcripts of all types? These numbers are very easy to extract and the authors should at least comment on them.

Response: Reviewer #2 also thought the annotation needed to be better characterized, and we appreciate both reviewers’ constructive suggestions on the relevant summary statistics to include.

Our revised manuscript now reports the values requested by the Reviewers, and we comment on them in the context of two other existing crocodilian annotations. Briefly, in response to Reviewer #1's questions: after removing isoforms, our annotation contains 18836 functional transcripts, 20020 mRNAs, and 15981 coding sequences. These coding sequences include 115941 exons and 99960 introns.
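These counts can be cross-checked with simple arithmetic: with one transcript per gene, a coding sequence with n exons has n - 1 introns, so the intron total should equal the exon total minus the number of coding sequences. The sketch below uses only the numbers stated in the response above.

```python
coding_sequences = 15981  # coding sequences after removing isoforms
exons = 115941            # exons in those coding sequences
introns = 99960           # introns in those coding sequences

# Each coding sequence with n exons contributes n - 1 introns.
assert exons - coding_sequences == introns

mean_exons = exons / coding_sequences
print(f"{mean_exons:.2f} exons per coding sequence")
```

The result, about 7.25 exons per coding sequence, is consistent with the average of 7.3 exons per coding sequence reported in the response to Reviewer #2.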

Comment: Many of the references are garbled or nonsensical:
For example, this is supposedly a reference about the term N50:

Telatin A: Proch::N50.2018.

I don’t know what this means.

Another example is this reference to the RepeatMasker software:

Smit A, Hubley R, Grenn PRepeatMasker Open-4.0.2015.

I can't figure out what that is, but the RepeatMasker program is well known and they should reference it properly.

This reference refers to a piece of unpublished software from NCBI, apparently, and note that it misspells “Biotechnology” in the NCBI name:

National Center for Biotecnology Information: table 2asn_gff.2020.

The reference to Shumate and Salzberg (2020) says "in press" but that paper was published in June 2021: https://academic.oup.com/bioinformatics/article-abstract/37/12/1639/6035128.

Response: We thank the reviewer for noticing these and have corrected all these references in our revised manuscript.
Competing Interests: None.


Open Peer Review

Reviewer Status

Version	Reviewer 1	Reviewer 2
Version 2 (revision), 15 Jan 2025	read	
Version 1, 02 Dec 2021	read	read

Reviewer 1: Steven Salzberg, Johns Hopkins University, Baltimore, USA
Reviewer 2: Marc Tollis, Northern Arizona University, Flagstaff, USA

Reviewer Report

16 Jan 2025 | for Version 2

Approved

I'm now satisfied and I approve this version. Note that the authors did not fix one of the garbled references which is still in their reference list: Smit A, Hubley R, Grenn PRepeatMasker Open-4.0.2015.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

genomics, bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

28 Apr 2022 | for Version 1

Marc Tollis, School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA

Not Approved

Are the rationale for sequencing the genome and the species significance clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Comparative genomics, phylogenetics, vertebrates

Author Response

15 Jan 2025

Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA

Comment: A spectacled caiman genome is a welcome contribution to the field, however I believe some additional steps need to be taken before this manuscript can be acceptable for indexing. Fortunately, these steps are standard for most genome reports so the authors should have little problem addressing these issues.

The authors give a convincing rationale for sequencing the spectacled caiman genome - it is economically and ecologically important - but I believe they left out a key point. This is the first publicly available caiman genome, and the third member of the Alligatoridae (American and Chinese alligator genomes are available). There are relatively few other crocodilian genomes available (a crocodile and a gharial), so this assembly helps with the lack of crocodilian genomes and fills in an important taxonomical gap for vertebrate comparative genomics. I think the authors can make this point rather easily.

Response: We thank the reviewer for sharing this evaluation with us. Their point that our assembly can also contribute to vertebrate comparative genomics more broadly had not occurred to us and is well taken. We now raise this point explicitly throughout our revised submission.

Comment: The sequencing and assembly steps are well described, as well as the annotation process, however the results of these methods are not well described and thus it is difficult to place the importance of this resource in further proper context. How does the contiguity in terms of N50 compare to other crocodilian (or reptile) genomes? A BUSCO analysis would help provide an estimate of the gene completeness of the genome, and is part of the standard battery of tests for new genome assemblies, but has not been done.

Response: Although our initial hope was that assessing our assembly strategy using genomic DNA extracted from an American alligator specimen could help reassure readers, we agree with the reviewer that more thoroughly describing the characteristics of our caiman genome sequence, even if we don’t have another assembly to compare to, was warranted. We therefore appreciate the reviewer identifying which metrics on our results would be of value to readers.

We thus revised the manuscript to include further characterizations, including the BUSCO analysis as the reviewer recommended. Briefly, 7480 BUSCO groups were searched, of which 96% were identified as single-copy complete BUSCOs.
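The response gives only a rounded percentage. Suppose the 96% was rounded half-up to a whole percent; this rounding rule is an assumption. The sketch below then finds the range of single-copy complete counts, out of the 7480 groups searched, that are consistent with the reported figure. The rounding uses exact integer arithmetic, so floating-point error cannot affect the boundaries.

```python
def counts_consistent_with(pct: int, n: int) -> tuple[int, int]:
    """Return the (lowest, highest) count c out of n whose percentage,
    rounded half-up to a whole number, equals pct."""
    # (200*c + n) // (2*n) == floor(100*c/n + 0.5), computed exactly.
    matches = [c for c in range(n + 1) if (200 * c + n) // (2 * n) == pct]
    if not matches:
        raise ValueError("no count rounds to that percentage")
    return min(matches), max(matches)


if __name__ == "__main__":
    print(counts_consistent_with(96, 7480))  # (7144, 7218)
```

For n = 7480, the function returns (7144, 7218), which corresponds to roughly 7180 single-copy complete groups.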

Furthermore, in response to the reviewer’s comment on contiguity, we now report the results of the QUAST analysis of our common caiman genome, as Reviewer #1 suggested. Following Reviewer #2’s recommendation, the revised manuscript places the resulting N50 in the context of other crocodilian and selected reptile genomes.
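The reviewer's contiguity question is framed in terms of N50. N50 is the length L such that contigs or scaffolds of length L or longer together contain at least half of the total assembly length. A minimal Python sketch follows; the lengths are toy values, not figures from this assembly. QUAST reports N50 directly. It can also report NG50, which uses an estimated genome size in place of the assembly total.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequences given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum reaches the total")


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # total 54; 10 + 9 + 8 = 27 = half
    print(n50(toy))  # 8
```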

Comment: The authors report almost 300,000 gene features, but it is not elaborated upon what these features are. How many contain coding sequences? How many contain a Pfam domain? How many contain a reciprocal blast hit with another crocodilian genome? What is the average length of exons and introns, average number of exons per gene? Answers to these questions and a comparison to annotations from other genomes would be essential for understanding the usefulness of this annotation.

Response: We thank the reviewer for taking the time to identify the statistics that would better describe our annotation to readers. As we note in our response to Reviewer #1, who made a very similar observation, we have revised our manuscript to characterize our annotation more thoroughly, along the lines requested by the Reviewers.

Briefly, in response to the comment, the annotation includes 15981 coding sequences (excluding isoforms) and 9983 Pfam domain hits (at E < 10^-5). Average exon and intron lengths are 562 bp and 4334 bp, respectively, with an average of 7.3 exons per coding sequence. We have revised our manuscript to report all these values.
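Average exon and intron lengths can be derived from exon coordinates alone. With 1-based inclusive (GFF3-style) coordinates, exon length is end - start + 1. Each intron is the gap between consecutive exons of the same coding sequence. The sketch below uses hypothetical coordinates. Annotation toolkits such as AGAT, cited in the reference list, report these statistics directly. The response does not state whether the reported means pool all exons, as this sketch does, or average per gene first.

```python
from statistics import mean

# Hypothetical exon coordinates (1-based, inclusive, as in GFF3) for two
# coding sequences; in practice these would come from the annotation file.
CDS_EXONS = {
    "cds_A": [(100, 250), (400, 520), (900, 1010)],
    "cds_B": [(5600, 5699), (5000, 5120)],  # input order need not be sorted
}


def exon_and_intron_lengths(exons):
    """Return (exon_lengths, intron_lengths) for one coding sequence.

    Exons are sorted by start (GFF3 coordinates ascend on either strand);
    each intron is the gap between one exon's end and the next exon's start.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


def summarize(cds_exons):
    """Pool exon and intron lengths over all coding sequences."""
    all_exons, all_introns = [], []
    for exons in cds_exons.values():
        exon_lengths, intron_lengths = exon_and_intron_lengths(exons)
        all_exons.extend(exon_lengths)
        all_introns.extend(intron_lengths)
    return {
        "mean_exon_bp": mean(all_exons),
        "mean_intron_bp": mean(all_introns),
        "exons_per_cds": len(all_exons) / len(cds_exons),
    }


if __name__ == "__main__":
    print(summarize(CDS_EXONS))
```

For these toy coordinates, the mean exon length is 120.8 bp, the mean intron length is about 335.7 bp, and there are 2.5 exons per coding sequence.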

Finally, we strongly agree with the Reviewer’s suggestion that placing the annotation in the context of those of other crocodilians would be informative to readers. In addition to the basic summary statistics, our revision therefore compares our annotation with those of other crocodilian genomes. Because our annotation was based on an existing American alligator annotation, we evaluated reciprocal BLAST hits against the Crocodylus porosus (saltwater crocodile) annotation rather than the alligator one. We report in the revision that we found 7894 such hits.
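A reciprocal best hit is a pair of genes, one from each genome, that are each other's top-scoring match. The sketch below is a generic illustration, not the authors' pipeline, and all gene identifiers in it are hypothetical. It first builds best-hit tables from BLAST tabular output (-outfmt 6, whose 12th column is the bit score). It then keeps only the pairs that agree in both directions.

```python
def best_hits(blast_tab_lines: list[str]) -> dict[str, str]:
    """Map each query (column 1) to the subject (column 2) with the highest
    bit score (column 12) in BLAST -outfmt 6 output. Ties keep the first
    subject seen."""
    best: dict[str, tuple[str, float]] = {}
    for line in blast_tab_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_to_b: dict[str, str],
                         b_to_a: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (a, b) pairs where b is a's best hit in genome B and
    a is, in turn, b's best hit in genome A."""
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical best-hit tables (query gene -> top-scoring subject gene).
CAIMAN_TO_CROC = {"cc_1": "cp_1", "cc_2": "cp_2", "cc_3": "cp_2"}
CROC_TO_CAIMAN = {"cp_1": "cc_1", "cp_2": "cc_3"}

if __name__ == "__main__":
    print(reciprocal_best_hits(CAIMAN_TO_CROC, CROC_TO_CAIMAN))
    # [('cc_1', 'cp_1'), ('cc_3', 'cp_2')]
```

In this example, cc_2 is dropped: its best hit, cp_2, matches cc_3 more strongly in the reverse search.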

Comment: Finally, the authors need to provide the accession numbers or BioProject number that indicates the submissions to NCBI. If the assembly has been submitted as stated in the manuscript, this should be relatively straightforward as the BioProject already exists or the accession number has been issued. The same goes for the raw reads, they should be deposited in the NCBI Short Read Archive and accession numbers should be reported. Zenodo is a good place to keep the assembly that was used for the analyses, but NCBI submission is standard for the field.

Response: Reviewer #1 also made a very similar point. As we discuss in our response to their comment, we have now been issued accession numbers for the assembly and the sequence read archive (SRA) data. We appreciate both reviewers highlighting the need to include these numbers. The manuscript has now been updated to include the BioProject identifier (PRJNA716363), the assembly accession (JAGPOW000000000.1) and the Sequence Read Archive accession (SRR22317059).

Competing Interests

No competing interests were disclosed.

Reviewer Report

24 Dec 2021 | for Version 1

Not Approved

For example, this is supposedly a reference about the term N50:

Telatin A: Proch::N50.2018.

I don’t know what this means.

Another example is this reference to the RepeatMasker software:

Smit A, Hubley R, Grenn PRepeatMasker Open-4.0.2015.

I can't figure out what that is, but the RepeatMasker program is well known and they should reference it properly.

This reference refers to a piece of unpublished software from NCBI, apparently, and note that it misspells “Biotechnology” in the NCBI name:

National Center for Biotecnology Information: table 2asn_gff.2020.

The reference to Shumate and Salzberg (2020) says "in press" but that paper was published in June 2021: https://academic.oup.com/bioinformatics/article-abstract/37/12/1639/6035128.

Are the rationale for sequencing the genome and the species significance clearly described?

Yes
Are the protocols appropriate and is the work technically sound?

Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?

Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

genomics, bioinformatics

Author Response

15 Jan 2025

Kenichi Okamoto, Department of Biology, University of Saint Thomas, Saint Paul, 55105, USA

Comment: This paper describes the assembly and annotation of a crocodile genome. It’s a very brief note that provides some useful background about the species, Caiman crocodilus, but it needs some additional work before it would be acceptable for indexing. All of the following requests should be quite easy to satisfy, but they are all essential.

First, the data are available on a Zenodo website, but that is unacceptable. All genomic data must be deposited in GenBank, EMBL, or DDBJ, and made available for any scientific publication. This practice has been near-universal for the past 25 years. The authors seem to know this, because they state that the assembly has been submitted to NCBI (the home of GenBank). However, the paper fails to provide GenBank accession numbers or a BioProject identifier. The authors need to get these identifiers and put them in the manuscript before the paper can be accepted. NCBI routinely provides such identifiers prior to publication.

Second, the raw reads are also at their Zenodo site, which again is not adequate. They need to deposit them either in SRA or ENA, and provide the accession numbers for those as well.

Response: We thank the reviewer for highlighting these points. We had planned to update the NCBI identifiers upon acceptance, but now realize the Zenodo references were insufficient substitutes for the NCBI identifiers at the review stage as well.

The manuscript has now been updated to include the BioProject identifier (PRJNA716363) and the Sequence Read Archive accession (SRR22317059).

Comment: Third, because the paper does not provide anything other than a brief description of the methods used for assembly and annotation, it needs at least a cursory effort at quality evaluation. For the assembly, the authors could run Merqury to get an overall base-level quality value. That’s a really minimal step. A more thorough step would be to run QUAST to estimate the number of mis-assemblies. This is especially important with an assembly based only upon short Illumina reads.

Response: We had initially hoped to reassure readers about the reliability of our approach by testing it on a de novo Alligator mississippiensis assembly and comparing the resulting genome to the published reference genome for this species.

However, we agree with the reviewer that descriptions of comparable assessments on the final Caiman crocodilus assembly itself are needed. We now characterize the misassemblies identified by QUAST and, as we describe in our response to Reviewer #2 as well, our revised manuscript also describes the results from running a BUSCO analysis on the genome assembly.
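Merqury, which the reviewer suggested, reports a Phred-scaled consensus quality, QV = -10 log10(E), where E is the per-base error rate. A QV of 40 therefore corresponds to one error per 10,000 bases. As we understand the Merqury method, E is estimated from k-mers. If K_asm of the K_total k-mers in the assembly never occur in the reads, the probability that a base is correct is taken to be (1 - K_asm/K_total)^(1/k). The sketch below encodes that formula; treat it as an assumption to check against the Merqury documentation. The response above describes QUAST and BUSCO results, and no QV is quoted here. The counts in the usage line are hypothetical.

```python
import math


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    if not 0 < error_rate < 1:
        raise ValueError("error rate must be in (0, 1)")
    return -10 * math.log10(error_rate)


def kmer_qv(k_asm_only: int, k_total: int, k: int) -> float:
    """QV from k-mer counts, following our reading of the Merqury method.

    k_asm_only: k-mers present in the assembly but absent from the reads
    k_total:    all k-mers in the assembly
    k:          k-mer size
    An assembly with no read-unsupported k-mers has no finite QV, so
    k_asm_only == 0 raises ValueError via phred_qv.
    """
    p_base_correct = (1 - k_asm_only / k_total) ** (1 / k)
    return phred_qv(1 - p_base_correct)


if __name__ == "__main__":
    # Hypothetical counts: 21 read-unsupported 21-mers per million.
    print(round(kmer_qv(21, 1_000_000, 21), 1))  # 60.0
```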

Comment: The other missing part is at least a short discussion of the annotation results. They mapped over the annotation from an alligator genome, which is fine, but they don’t report any numbers other than the total number of “features,” which is nearly 300,000. How many protein-coding genes did they annotate? How many non-coding RNAs? How many total protein-coding transcripts, and total transcripts of all types? These numbers are very easy to extract and the authors should at least comment on them.

Response: Reviewer #2 also thought the annotation needed to be better characterized, and we appreciate both reviewers’ constructive suggestions on the relevant summary statistics to include.

Our revised manuscript now reports the values requested by the Reviewers, and we comment on them in the context of two other existing crocodilian annotations. Briefly, in response to Reviewer #1's questions, after removing isoforms our annotation contains 18836 functional transcripts, 20020 mRNAs and 15981 coding sequences. The annotation includes 115941 exons and 99960 introns in the coding sequences.

Comment: Many of the references are garbled or nonsensical:
For example, this is supposedly a reference about the term N50:

Telatin A: Proch::N50.2018.

I don’t know what this means.

Another example is this reference to the RepeatMasker software:

Smit A, Hubley R, Grenn PRepeatMasker Open-4.0.2015.

I can't figure out what that is, but the RepeatMasker program is well known and they should reference it properly.

This reference refers to a piece of unpublished software from NCBI, apparently, and note that it misspells “Biotechnology” in the NCBI name:

National Center for Biotecnology Information: table 2asn_gff.2020.

The reference to Shumate and Salzberg (2020) says "in press" but that paper was published in June 2021: https://academic.oup.com/bioinformatics/article-abstract/37/12/1639/6035128.

Response: We thank the reviewer for noticing these and have corrected all these references in our revised manuscript.

Competing Interests

None.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, and sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Alonge M, Soyk S, Ramakrishnan S, et al.: RaGOO: Fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 2019; 20: 1–17. Publisher Full Text

[2] Ayarzagüena J: Variaciones en la dieta de Caiman sclerops. La relación entre morfología bucal y dieta. Memoria De La Sociedad De Ciencias Naturales La Salle. 1984; 44: 123–140.

[3] Brazaitis P, Watanabe ME, Amato G: The Caiman Trade. Sci. Am. 1998; 278: 70–76. Publisher Full Text

[4] Brittain K, Ray DA, Gongora J, et al.: Crocodilian Genome Advances. In Zucoloto RB, Amavet PS, Verdade LM, et al. editors. Conservation Genetics of New World Crocodilians Springer International Publishing: Cham, Switzerland; 2021; pp. 185–202. Publisher Full Text

[5] Brum SM, da Silva VM , Rossoni F, et al.: Use of dolphins and caimans as bait for Calophysus macropterus (Lichtenstein, 1819) (Siluriforme: Pimelodidae) in the Amazon. J. Appl. Ichthyol. 2015; 31: 675–680. Publisher Full Text

[6] Busack SD, Pandya S: Geographic variation in Caiman crocodilus and Caiman yacare (Crocodylia: Alligatoridae): Systematic and legal implications. Herpetologica. 2001; 57: 294–312.

[7] Caldwell J: World Trade in Crocodilian Skins 2013-2015. Technical report. UN Environment Programme World Conservation Monitoring Centre; 2015.

[8] da Silveira R, Thorbjarnarson JB: Conservation implications of commercial hunting of black and spectacled caiman in the Mamirauá Sustainable Development Reserve, Brazil. Biol. Conserv. 1999; 88: 103–109. Publisher Full Text

[9] Dainat J: Agat: Another gff analysis toolkit to handle annotations in any gtf/gff format. version v0.7.0. 2020. Accessed December 2024. Publisher Full Text

[10] Eddy SR: Accelerated Profile HMM Searches. PLoS Comput. Biol. 2011; 7: e1002195. PubMed Abstract | Publisher Full Text | Free Full Text

[11] Escobedo-Galván AH, Velasco JA, González-Maya JF, et al.: Morphometric analysis of the Rio Apaporis Caiman (Reptilia, Crocodylia, Alligatoridae). Zootaxa. 2015; 4059: 541–554. PubMed Abstract | Publisher Full Text

[12] Finn RD, Clements J, Eddy SR: HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res. 2011; 39: W29–W37. PubMed Abstract | Publisher Full Text | Free Full Text

[13] Gans C: Allometric Changes in the Skull and Brain of Caiman crocodilus. J. Herpetol. 1980; 14: 297–301. Publisher Full Text

[14] Ghosh A, Johnson MG, Osmanski AB, et al.: A High-Quality Reference Genome Assembly of the Saltwater Crocodile, Crocodylus porosus, Reveals Patterns of Selection in Crocodylidae. Genome Biol. Evol. 2020; 12: 3635–3646. PubMed Abstract | Publisher Full Text | Free Full Text

[15] Green RE, Braun EL, Armstrong J, et al.: Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science. 2014; 346: 1254449.

[16] Gurevich A, Saveliev V, Vyahhi N, et al.: QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013; 29: 1072–1075.

[17] Hoffmann FG, Vandewege MW, Storz JF, et al.: Gene Turnover and Diversification of the α- and β-Globin Gene Families in Sauropsid Vertebrates. Genome Biol. Evol. 2018; 10: 344–358.

[18] Kusumi K, May CM, Eckalbar WL: A large-scale view of the evolution of amniote development: Insights from somitogenesis in reptiles. Curr. Opin. Genet. Dev. 2013; 23: 491–497.

[19] López-Pérez JE, Crother BI, Murray CM: The Inference of the Evolution of Immune Traits as Constrained by Phylogeny: Insight into the Immune System of the Basal Diapsid. Animals. 2022; 12: 2482.

[20] Lechner M, Findeiß S, Steiner L, et al.: Proteinortho: Detection of (Co-)orthologs in large-scale analysis. BMC Bioinformatics. 2011; 12: 124.

[21] Marley G, Lawrence AJ, Phillip DA, et al.: Mangrove and mudflat food webs are segregated across four trophic levels, yet connected by highly mobile top predators. Mar. Ecol. Prog. Ser. 2019; 632: 13–25.

[22] Medem F: A new subspecies of Caiman sclerops from Colombia. Fieldiana: Zoology. 1955; 37: 339–343.

[23] Medem F: Los Crocodylia de Sur America Volumen II. Bogotá, Colombia: Ministerio de Educación Nacional; 1983.

[24] Medem F: Los Crocodylia de Sur America Volumen I. Bogotá, Colombia: Ministerio de Educación Nacional; 1981.

[25] Merchant M, Hebert M, Salvador AC, et al.: Constitutive Innate Immunity and Systemic Responses to Infection of the American Alligator (Alligator mississippiensis). Animals. 2024; 14: 965.

[26] Meredith RW, Milián-Garcia Y, Gatesy J, et al.: Draft assembly and annotation of the Cuban crocodile (Crocodylus rhombifer) genome. BMC Genomic Data. 2024; 25: 53.

[27] Miles LG, Isberg SR, Glenn TC, et al.: A genetic linkage map for the saltwater crocodile (Crocodylus porosus). BMC Genomics. 2009; 10: 339.

[28] Morris ZS, Abzhanov A: Heading for higher ground: Developmental origins and evolutionary diversification of the amniote face. Curr. Top. Dev. Biol. 2021; 141: 241–277.

[29] Natarajan C, Signore AV, Bautista NM, et al.: Evolution and molecular basis of a novel allosteric property of crocodilian hemoglobin. Curr. Biol. 2023; 33: 98–108.e4.

[30] National Center for Biotechnology Information (US): Submitting Sequences using Specific NCBI Submission Tools. In: The GenBank Submissions Handbook. Bethesda, MD: National Center for Biotechnology Information (US); 2014. Bookshelf ID: NBK566995.

[31] National Center for Biotechnology Information: table2asn_gff. 2020. Accessed December 2024.

[32] Okamoto KW, Langerhans RB, Rashid R, et al.: Microevolutionary patterns in the common caiman predict macroevolutionary trends across extant crocodilians. Biol. J. Linn. Soc. 2015; 116: 834–846.

[33] Oliveira VCS, Viana PF, Gross MC, et al.: Looking for genetic effects of polluted anthropized environments on Caiman crocodilus crocodilus (Reptilia, Crocodylia): A comparative genotoxic and chromosomal analysis. Ecotoxicol. Environ. Saf. 2021; 209: 111835.

[34] Peng Y, Leung HC, Yiu SM, et al.: IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012; 28: 1420–1428.

[35] Pimenta NC, Barnett AA, Botero-Arias R, et al.: When predators become prey: Community-based monitoring of caiman and dolphin hunting for the catfish fishery and the broader implications on Amazonian human-natural systems. Biol. Conserv. 2018; 222: 154–163.

[36] Rice ES, Kohno S, St John J, et al.: Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res. 2017; 27: 686–696.

[37] Roberto IJ, Bittencourt PS, Muniz FL, et al.: Unexpected but unsurprising lineage diversity within the most widespread Neotropical crocodilian genus Caiman (Crocodylia, Alligatoridae). Syst. Biodivers. 2020; 18: 377–395.

[38] Shen W, Le S, Li Y, et al.: SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE. 2016; 11: e0163962.

[39] Shumate A, Salzberg SL: Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021; 37: 1639–1643.

[40] Simão FA, Waterhouse RM, Ioannidis P, et al.: BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015; 31: 3210–3212.

[41] Smit A, Hubley R, Green P: RepeatMasker Open-4.0. 2015.

[42] Smit A, Hubley R, Green P: RepeatMasker Open-4.0. 2013–2015.

[43] St John J, Braun E, Isberg S, et al.: Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes. Genome Biol. 2012; 13: 415.

[44] Telatin A, Fariselli P, Birolo G: SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering. 2021; 8: 59.

[45] US Fish and Wildlife Service: Common Caiman (Caiman crocodilus) Ecological Risk Screening Summary. Technical report. US Fish and Wildlife Service; 2018.

[46] Valencia-Aguilar A, Cortés-Gómez AM, Ruiz-Agudelo CA: Ecosystem services provided by amphibians and reptiles in Neotropical ecosystems. Int. J. Biodivers. Sci. Ecosyst. Serv. Manag. 2013; 9: 257–272.

[47] Vashistha G, Deepika S, Dhakate PM, et al.: The effectiveness of microsatellite DNA as a genetic tool in crocodilian conservation. Conserv. Genet. Resour. 2020; 12: 733–744.

[48] Wan Q-H, Pan S-K, Hu L, et al.: Genome analysis and signature discovery for diving and sensory properties of the endangered Chinese alligator. Cell Res. 2013; 23: 1091–1105.

[49] Wu P, Yan J, Lai Y-C, et al.: Multiple Regulatory Modules Are Required for Scale-to-Feather Conversion. Mol. Biol. Evol. 2018; 35: 417–430.

[50] Yang S, Lan T, Zhang Y, et al.: Genomic investigation of the Chinese alligator reveals wild-extinct genetic diversity and genomic consequences of their continuous decline. Mol. Ecol. Resour. 2023; 23: 294–311.
