Draft genome of tule elk Cervus canadensis nannodes

This paper presents the first draft genome of the tule elk (Cervus canadensis nannodes), a subspecies native to California that underwent an extreme genetic bottleneck in the late 1800s. The genome was generated from Illumina HiSeq 3000 whole genome sequencing of four individuals, yielding an assembly of 2.395 billion base pairs (Gbp) in 602,862 contigs longer than 500 bp, with N50 = 6,885 bp. This genome provides a resource to facilitate future genomic research on elk and other cervids.

At the initiation of this project, no genome assembly existed for any member of the deer family (Cervidae). We therefore sought to generate the first such assembly for the tule elk (Cervus canadensis nannodes). We note that after we completed our project and submitted the initial draft of this manuscript, a full assembly of red deer (Cervus elaphus hippelaphus) became available online 1. This paper presents the first de novo genomic draft of the tule elk. This California-endemic elk subspecies underwent a major genetic bottleneck when its numbers were reduced to as few as three individuals in the 1870s 2,3. Although their numbers have increased to >5,000 today 4, the historical bottleneck nevertheless left its mark on the elk's genome, rendering it more homozygous than other elk subspecies.
Our motivation for generating a genomic resource for the tule elk was to create a reference for identifying single nucleotide polymorphisms (SNPs), both to develop assays for monitoring elk population abundance and for related population genetic applications. Because of the relatively low coverage generated in this work (40X overall, with an average of 10X coverage from each individual), we used the MEGAHIT metagenome assembler, which has been found to perform well on low-quality or low-coverage DNA sequencing in bacteria 5.

Bioinformatics processing
Sequencing quality on demultiplexed reads was evaluated using FastQC v0.11.3 (RRID:SCR_014583) 6. The Illumina TruSeq3-PE sequencing adapters were removed using Trimmomatic v0.30 (RRID:SCR_011848) 7 with the ILLUMINACLIP parameter set to TruSeq3-PE.fa:2:40:15. The TruSeq3-PE.fa sequence was downloaded from https://anonscm.debian.org/cgit/debian-med/trimmomatic.git/plain/adapters/TruSeq3-PE.fa. LEADING and TRAILING parameters were set to 2, resulting in the removal of bases with a quality score of 2 or less on the phred33 quality scale. The SLIDINGWINDOW parameter of 4:2 was used to clip reads once the quality score fell below 2 within the window. The MINLEN parameter, set to 25, dropped any reads that fell below that length after quality trimming.

The demultiplexed, quality-filtered reads were interleaved using the interleave-reads.py script in khmer v2.0 (RRID:SCR_001156) 8. The assembly was performed using MEGAHIT v1.0.5 9 on the interleaved, quality-filtered reads. Genome statistical analysis was done using QUAST v3.0 (RRID:SCR_001228) 10. All code used is publicly available at https://github.com/dib-lab/2017-tule-elk/.
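For readers unfamiliar with the N50 statistic reported for this assembly, the following is a minimal Python sketch of how such a value can be computed from contig lengths. It is an illustration only, not the QUAST implementation and not part of the project's code. The function names contig_lengths and n50 are hypothetical, and the 500 bp default cutoff is an assumption chosen to mirror the contig length threshold quoted for the assembly.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int], min_length: int = 500) -> int:
    """Return the N50 of contigs at least min_length bp long (0 if none).

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    The min_length default of 500 is an assumption for illustration.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    return 0


# Toy example with three contigs of 800, 700 and 300 bp.
toy_fasta = [">a", "A" * 800, ">b", "C" * 600, "G" * 100, ">c", "T" * 300]
toy_lengths = contig_lengths(toy_fasta)
toy_n50 = n50(toy_lengths)
```

On a real assembly, the same functions could be applied to an open FASTA file handle, for example contig_lengths(open(path)) with path pointing to the assembler's contig output. Values from a full assembly report such as QUAST should be preferred for publication.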

Amendments from Version 1
In this version, the name Cervus elaphus nannodes was changed to Cervus canadensis nannodes everywhere it appeared in the publication because most people now refer to the elk as Cervus canadensis to differentiate it from Eurasian red deer. Our original publication stated that we were presenting the first Cervidae genome, but this statement has been edited to reflect the recent addition (since our initial submission) of a red deer genome (Cervus elaphus hippelaphus) available on NCBI. Reference 1 has also been updated to point to this genome. The reported code in the "Bioinformatics processing" section contained an erroneous "SLIDING" parameter for Trimmomatic, and this has been deleted to match the correct code on GitHub. Additional information about the quality of the sequencing run was added to the Results. Table 1 was reformatted for easier viewing.

This genome can serve as the basis for further genomic work on tule elk and other cervids, such as the development of a SNP assay to track elk population movement across increasingly developed northern Californian terrain. Furthermore, it is one of the first whole genome assemblies available from the family Cervidae, providing a useful interim reference genome for bioinformatic analyses on other deer and elk species.

Data availability
Raw

Competing interests
No competing interests were disclosed.

Grant information
Support for this project was provided by a grant to BNS from the California Department of Fish and Wildlife, FY1516 Big Game Management Program (Grant ID P1580009).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The authors describe the generation of a draft assembly for tule elk in the style of a brief genome announcement. For SNP detection and primer design this assembly is fine. It could, e.g., be used in combination with Genotyping by Sequencing on additional individuals.

Materials and methods are sound and provided in full.
However, a quick search of NCBI's taxonomy resource reveals that since June 2017 there is a genome assembly for red deer available (https://www.ncbi.nlm.nih.gov/genome/10790). The authors therefore cannot claim to present the first whole genome assembly from the family Cervidae. Please change that statement.

Suggested further improvements:
Results
I would have liked to see a figure for the total amount of sequence after filtering as a simple way of showing how good or bad the sequence run was. I'd also recommend adding another assembly metric to look at the gene content, either using something like BUSCO or by mapping the refseq sequences of a related, well annotated species (e.g. cattle) against the draft genome.

Sample collection and library prep
I see that each individual has two tissue samples. The authors entered a sample ID into the 'tissue' field of NCBI's BioSample database. I'd recommend removing this and adding the animal ID in the 'isolate' field.
Please expand the entries in the 'isolation source' field. It says e.g. "Am. Cyn" which probably means American Canyon.

Bioinformatics processing
Checking the code I believe the statement "LEADING, TRAILING, and SLIDING parameters were set to 2" should read "LEADING and TRAILING parameters were set to 2".

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Thank you for your review of this paper. Version 2 has been edited to reflect the presence of the red deer genome and a citation to that genome has been made. Table 1 has been reformatted for readability. The changes you've requested to the NCBI BioSample entry have been made. The Trimmomatic code in the Bioinformatics processing section has been edited to remove the erroneous "SLIDING" parameter. We've added text to the first sentence of the Results section that describes the quality of sequence data in terms of standard quality scores. We opted not to provide details on the gene content relative to a related genome as we felt this could be done more comprehensively in the future once the red deer genome has been published and peer-reviewed.

At 602,862 contigs, the genome is very preliminary and will require quite a bit of additional work in order for it to be applicable to a wide range of applications. The report basically falls into the category of a genome announcement.

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.