Keywords
Caiman crocodilus, spectacled caiman, genome, assembly, next-generation sequencing, crocodilian, vertebrate genome
The common, or spectacled, caiman, Caiman crocodilus, is one of the most widely distributed and abundant crocodilian species, ranging continuously from Mexico to Argentina (Busack and Pandya 2001; US Fish and Wildlife Service 2018). A generalist predator, C. crocodilus is remarkably adaptable, occupying a wide range of habitats from urban areas to seasonal savannahs to tropical rainforests (Medem 1981, 1983), and has recently been introduced to Cuba, Puerto Rico and Florida, where it is considered an invasive species (US Fish and Wildlife Service 2018). The broad distribution and diversity of habitats have facilitated considerable intraspecific diversification within C. crocodilus; a recent analysis by Roberto et al. (2020) identified between seven and ten lineages within C. crocodilus across differing biogeographic regions and watersheds throughout Central and South America. Within-species diversity is also morphologically apparent, with skull shape in particular exhibiting systematic patterns of regional differentiation (Medem 1955; Gans 1980; Medem 1981, 1983; Ayarzaguena 1984; Escobedo-Galván et al. 2015). These intraspecific patterns of cranial shape variation within C. crocodilus have been shown to parallel patterns of interspecific cranial diversity found in extant crocodilians (Okamoto et al. 2015).
Additionally, C. crocodilus is a species of commercial importance, chiefly in the leather industry. While the hides of C. crocodilus contain osteoderms that render the manufacturing process more difficult than for other crocodilians, a majority of the approximately 1.5 million crocodilian skins traded globally come from C. crocodilus (Brazaitis et al. 1998; Caldwell 2015). As with other crocodilians, most legal hides come from commercial farming operations, and the market for caiman hides is estimated to be over US $85 million (Caldwell 2015). Wild populations of C. crocodilus are also hunted for meat and even fishing bait (Da Silveira and Thorbjarnarson 1999; Brum et al. 2015; Pimenta et al. 2018) and provide ecosystem services including nutrient cycling and biological control (Valencia-Aguilar et al. 2013; Marley et al. 2019). Due to its role as an apex predator, C. crocodilus exhibits considerable bioaccumulation, with genotoxic analyses demonstrating molecular signatures of pollution on the C. crocodilus genome (Oliveira et al. 2021).
Thus, a draft genome sequence for C. crocodilus can not only assist with improved husbandry, ecotoxicology and wildlife management, but also has the potential to provide insight into evolutionary processes driving intraspecific diversification in continental systems more broadly.
DNA was extracted from a tissue sample belonging to a single Caiman crocodilus museum specimen (UF-FLMNH 171438) using the DNeasy™ kit from Qiagen (Hilden, Germany). DNA was quantitated using Thermo Fisher's (Waltham, MA, USA) Picogreen™ kit (for a final Picogreen concentration of 77.78 ng/L). Tecan's (Männedorf, Switzerland) NuGEN Celero™ kit was then used to construct a paired-end library, which was subsequently sequenced on a single Illumina (San Diego, CA, USA) NovaSeq S4 lane. This yielded 239,911,946 paired-end reads of 2 × 150 bp each. Nucleic acid isolation, quantitation, library generation and raw-read sequencing were performed at the University of Minnesota Genomics Center.
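The read yield above can be turned into a rough estimate of sequencing depth. The sketch below is illustrative only: it assumes that the reported count refers to read pairs (each contributing 2 × 150 bp) and uses the draft assembly length reported in the results as a stand-in for genome size. The function and variable names are hypothetical and not part of the original pipeline.

```python
# Rough depth-of-coverage estimate from the reported read yield.
# Assumptions (not stated in the text): the read count refers to read pairs,
# and the draft assembly length approximates the true genome size.

READ_PAIRS = 239_911_946             # paired-end reads reported for C. crocodilus
READ_LENGTH_BP = 150                 # length of each read in a pair
ASSEMBLY_LENGTH_BP = 2_341_057_913   # draft assembly length from the results


def sequencing_depth(pairs, read_length, genome_size, reads_per_pair=2):
    """Return the mean depth: total sequenced bases divided by genome size."""
    return pairs * reads_per_pair * read_length / genome_size


total_bases = READ_PAIRS * 2 * READ_LENGTH_BP
depth = sequencing_depth(READ_PAIRS, READ_LENGTH_BP, ASSEMBLY_LENGTH_BP)
print(f"{total_bases:,} bp sequenced, roughly {depth:.1f}x depth")
```

If the count instead refers to individual reads, the estimate would be halved.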
The paired-end reads (as of October 2021, accessible at doi.org/10.5281/zenodo.5598241) were assembled de novo using the Iterative de Bruijn Graph Assembler (IDBA-UD; Peng et al. 2012). To assess the reliability of our pipeline from sequencing to de novo assembly using IDBA-UD, we repeated the sequencing and assembly using a museum-derived tissue sample from a single Alligator mississippiensis individual (UF-FLMNH 175565). This resulted in 249,325,204 paired-end reads of 2 × 150 bp each. As was the case for the C. crocodilus individual, the reads were then assembled de novo using IDBA-UD. Using QUAST (Gurevich et al. 2013), we determined that the IDBA-UD assembly of A. mississippiensis captured approximately 94.2% of a recently published A. mississippiensis assembly (GCA_000281125.4; Rice et al. 2017), with an NG50 of 21,172 based on de novo assembled contigs alone.
We scaffolded the resulting draft C. crocodilus contigs using a two-step procedure. First, we scaffolded the caiman's contigs against a Crocodylus porosus assembly (GCF_001723895.1; Ghosh et al. 2020) using RagTag (Alonge et al. 2019). We then re-scaffolded the resulting contigs/scaffolds against the confamilial Alligator mississippiensis assembly (GCA_000281125.4), again using RagTag. The draft assembly was then submitted to the National Center for Biotechnology Information (NCBI).
Contaminants, mitochondrial DNA, vectors, adapters, and sequences shorter than 200 bp identified by NCBI were manually removed using seqkit (Shen et al. 2016) and custom scripts (available at http://github.com/kewok/ncbi_scrubber). As of July 2021, this draft assembly can be accessed at doi.org/10.5281/zenodo.4755063.
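As an illustration of the length filter described above, the following minimal sketch drops FASTA records shorter than 200 bp. It is not the authors' script (which relied on seqkit and the custom code in the linked repository); the function names `read_fasta` and `drop_short` and the in-memory demo data are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_len=200):
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Tiny in-memory example: one 250 bp record and one 50 bp record.
demo = io.StringIO(">keep\n" + "A" * 250 + "\n>drop\n" + "C" * 50 + "\n")
kept = drop_short(read_fasta(demo))
print([header for header, _ in kept])
```

For a real assembly the handle would be an opened FASTA file rather than a `StringIO` object, and the surviving records would be written back out in FASTA format.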
The resulting scaffolded assembly (doi.org/10.5281/zenodo.4755063) was then masked using RepeatMasker (Smit et al. 2015), relying on the HMMER database (Finn et al. 2011) and with "alligator" specified as the species. Finally, Liftoff (Shumate and Salzberg 2020) was used to generate a draft annotation based on the masked assembly, using the annotations associated with A. mississippiensis (GCA_000281125.4; Rice et al. 2017) as a reference.
table2asn_GFF (National Center for Biotechnology Information 2020) was used to generate a Sequin file (National Center for Biotechnology Information (US) 2014), and features flagged as errors were manually removed using custom scripts (available at https://github.com/kewok/ncbi_scrubber). As of July 2021, the draft annotation is available at doi.org/10.5281/zenodo.4755063.
The pipeline yielded a draft assembly of 2,341,057,913 bp comprising 465,471 scaffolds and contigs, with an N50 of 70,464,410 bp (calculated with Proch::N50; Telatin 2018). A total of 297,374 gene features were predicted.
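For readers unfamiliar with the contiguity statistics quoted here, the sketch below shows one common way to compute N50 and NG50 from a list of sequence lengths. It is an illustrative reimplementation, not the Proch::N50 or QUAST code; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths, total=None):
    """Return the length L such that sequences of length >= L together
    cover at least half of `total`.

    With total=None, half of the assembly's own length is used (N50).
    Passing a reference genome size as `total` gives NG50 instead.
    Returns None when the sequences cover less than half of `total`.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    target = (sum(lengths) if total is None else total) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


toy_lengths = [2, 3, 4, 5, 6]               # toy contig lengths, total 20
toy_n50 = n50(toy_lengths)                  # half of 20 is reached at 6 + 5
toy_ng50 = n50(toy_lengths, total=30)       # half of 30 is reached at 6 + 5 + 4
print(toy_n50, toy_ng50)
```

Because NG50 is normalized by a reference genome size rather than by the assembly's own length, it allows assemblies of different total lengths to be compared on the same footing.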
Here we have described the first draft assembly and annotation of the C. crocodilus genome. We feel these data can assist natural resource management, ecotoxicology, agriculture, as well as research into broader questions about the interplay between microevolutionary and macroevolutionary processes across broad biogeographic scales.
The draft assembly has been submitted to the National Center for Biotechnology Information, and both assembly and annotation are currently available for review at doi.org/10.5281/zenodo.4755063.
We are especially indebted to Dr. P. S. Soltis, T. A. Lott and the Genetic Resources Repository at the University of Florida - Florida Natural History Museum (UF-FLNHM) for generously providing us with tissue samples. We would like to thank the University of Minnesota Genomics Center (Minneapolis, MN, USA) for their guidance and for isolating DNA from museum samples, and for performing library preparation and raw sequencing. We wish to thank the Minnesota Supercomputing Institute (MSI) at the University of Minnesota and the Department of Chemistry at the University of St. Thomas for providing critical computational resources that contributed to the research results reported within this paper. Finally, we are very grateful to Dr. S. Pirro and Dr. M. Kieras at Iridian Genomes (Bethesda, MD, USA) for valuable insight on scaffolding the draft assemblies.
Are the rationale for sequencing the genome and the species significance clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Comparative genomics, phylogenetics, vertebrates
Are the rationale for sequencing the genome and the species significance clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: genomics, bioinformatics
Version | Date | Invited Reviewer 1 | Invited Reviewer 2 |
---|---|---|---|
Version 2 (revision) | 15 Jan 25 | read | - |
Version 1 | 02 Dec 21 | read | read |