A SNP resource for studying North American moose

Background: Moose (Alces alces) colonized the North American continent from Asia less than 15,000 years ago, and spread across the boreal forest regions of Canada and the northern United States (US). Contemporary populations have low genetic diversity, due to a low number of individuals in the original migration (founder effect) and/or subsequent population bottlenecks in North America. Genetic tests based on informative single nucleotide polymorphism (SNP) markers are helpful in forensic and wildlife conservation activities, but have been difficult to develop for moose, due to the lack of a reference genome assembly and whole genome sequence (WGS) data.
Methods: WGS data were generated for four individual moose from the US states of Alaska, Idaho, Wyoming, and Vermont with minimum and average genome coverage depths of 14- and 19-fold, respectively. Cattle and sheep reference genomes were used for aligning sequence reads and identifying moose SNPs.
Results: Approximately 11% and 9% of moose WGS reads aligned to cattle and sheep genomes, respectively. The reads clustered at genomic segments where sequence identity between these species was greater than 95%. In these segments, average mapped read depth was approximately 19-fold. Sets of 46,005 and 36,934 high-confidence SNPs were identified from cattle and sheep comparisons, respectively, with 773 and 552 of those having a minor allele frequency of 0.5 and conserved flanking sequences in all three species. Among the four moose, heterozygosity and allele sharing of SNP genotypes were consistent with decreasing levels of moose genetic diversity from west to east. A minimum set of 317 SNPs, informative across all four moose, was selected as a resource for future SNP assay design.
Conclusions: All SNPs and associated information are available, without restriction, to support development of SNP-based tests for animal identification, parentage determination, and estimating relatedness in North American moose.


Introduction
Alces alces is the largest member of the Cervidae family, and ranges throughout the circumpolar boreal forests of Eurasia and North America 1,2 . The species diverged from the ancestors of domestic cattle and sheep approximately 27 million years ago 3 . Moose are important ecologically, as a large ungulate with strong ecosystem impacts; economically, due to their value for tourism and hunting; and culturally, as a prominent symbol in many regions 4 . Consequently, there is active management of moose populations by wildlife agencies throughout their range in North America. However, management is hampered by a lack of genetic tools for monitoring moose, assessing the genetic health of populations, and even detecting illegal harvesting. Moose populations appear to be declining in some regions, including parts of the Upper Midwest of the United States 5,6 , and effective management is often dependent on data that are logistically challenging and/or costly to collect.
Identifying individual animals and measuring relatedness among and within populations are important for effective wildlife management and conservation efforts [7][8][9] . Identifying individuals can be as simple as observing their unique color patterns, for example in wild dogs (Lycaon pictus) 10 . However, this is not practical in species such as moose that have few features with obvious variation between individuals. Moreover, coat color patterns provide little information about genetic relatedness. Association of younger and older animals has been used to infer relationships, for example in swift foxes (Vulpes velox) where pups at a den are presumed to be offspring of the attending parents based on the monogamous behaviors they exhibit. However, detailed parentage studies have revealed multiple paternity within swift fox litters 11 and other fox species 12 . Generally, genetic testing provides a more accurate assignment of parentage and supports unique identification of individuals in the vast majority of instances, as well as an estimation of intra- and inter-population genetic variability.
Genetic testing using DNA markers has been applied to human, livestock, and wildlife studies for many years [13][14][15][16] . This form of testing first gained popularity with the development of microsatellite short tandem repeat (STR), and mitochondrial genome markers, concurrent with the development of DNA amplification and sequencing technologies. Approximately 5 to 11 microsatellite markers from cattle, sheep, and caribou (Rangifer tarandus) have been adapted for moose studies [17][18][19][20][21][22] . These studies form the basis of our current understanding of North American moose population structure and genetic diversity. DNA technology developments in the past decade have led to the replacement of microsatellite and mitochondrial genome markers with SNP markers because SNPs are more abundant, have greater stability over generations, are more accurately genotyped, and are amenable to automating the genotyping processes 23 . Moreover, panels of SNPs broaden the use of genotyping for management and conservation efforts, because they can provide not only identification of individuals and parentage, but also estimation of inbreeding and relatedness, and detection of admixture between populations of wildlife. For example, an SNP-based approach has been used for conservation efforts in endangered species such as the Iberian lynx (Lynx pardinus) 24 and Tasmanian devil (Sarcophilus harrisii) 25 , as well as more common but wide-ranging species like the brown bear (Ursus arctos) 26 . The application of SNP-based approaches, however, requires first the identification of polymorphisms segregating in the populations being studied, and developing assays that support accurate genotyping. Accordingly, a SNP panel spanning the genome of North American moose would be useful for addressing fundamental questions about population genetics in this species.
Low genetic diversity among North American moose populations has been previously reported 17,18 , making development of SNP panels challenging. The prevailing theory attributes the low diversity to a relatively recent (ca. 11,000-14,000 years ago) colonization from Asia and a subsequent founder effect induced by extended range expansion from an original small group of animals 27 . However, no definitive evidence has been presented that refutes an alternative hypothesis, that North American moose experienced a severe population bottleneck at some time in the past 20 , as occurred for North American bison (Bison bison) populations 28 . Whether the cause was a founder effect, a bottleneck, or both, the small effective population size simultaneously increases the need for developing genetic tools for management and the challenge of creating SNP marker panels.
Discovery of SNPs in a species has generally been preceded by development of its reference genome assembly. Using this assembly, whole genome sequence (WGS) reads from individual animals can be aligned, and differences between segregating alleles identified. However, creation of a reference assembly still represents a significant barrier for most research communities interested in wildlife species. Fortunately, an alternative approach that uses the reference genomes of related species has been developed 29 , and shown to effectively identify high-confidence SNPs likely to be segregating within the target species. Here we report the whole genome sequencing of four moose genomes, each obtained from distant geographic regions of North America, and the use of the cattle and sheep reference genomes to align the sequence data and identify SNPs likely to be segregating among moose populations. A set of criteria was developed to select the potentially most useful set of moose SNPs, and to identify 317 autosomal variants meeting these criteria. The associated sequence information was made freely available, and represents a resource for developing genotyping assays to support moose genetic research.

Ethical statement
This article contains no studies performed with animal subjects, and thus, no additional institutional ethical permits were required. Samples for DNA extraction were donated by private individuals not associated with this research. These were hunters that had legally harvested moose during the firearm hunting season in their state. No additional approvals were needed, since all hunters obtained valid hunting licenses for the harvesting of moose.

Animal samples
Samples of muscle tissue were obtained from four animals likely comprising three putative subspecies of A. alces based on their location in North America: A. a. gigas, A. a. shirasi, and A. a. americana 30 . These animals were harvested at four distinct geographic locations (Figure 1) and entered as BioSamples in NCBI BioProject Accession PRJNA325061 (Table 1). As is typical, hunters removed the internal organs in the field, the carcasses were chilled, and the meat was subsequently processed for frozen storage. Each of the four owners donated approximately 50 g of frozen tissue from their harvested animal, and that tissue was archived at USMARC for use in this project.

WGS production, alignment, and SNP genotyping
DNA was extracted from muscle with a typical phenol:chloroform method and stored at 4°C in 10 mM TrisCl, 1 mM EDTA (pH 8.0) as previously described 31 . Approximately 5 μg of moose genomic DNA was fragmented by focused-ultrasonication to generate fragments less than 800 bp long (Covaris, Inc. Woburn, Massachusetts USA). These fragments were used to make an indexed, 500 bp paired-end library according to the manufacturer's instructions (TruSeq DNA PCR-Free LT Library Preparation Kits A and B, Illumina, Inc., San Diego, California USA). After construction, indexed libraries were pooled with other indexed samples in groups of four to eight, and sequenced with a massively parallel sequencing machine and high-output kits (NextSeq500, two by 150 paired-end reads, Illumina Inc.). After sequencing, the raw reads were filtered to remove adaptor sequences, contaminating dimer sequences, and low-quality reads. Pooled libraries with compatible indexes were repeatedly sequenced until a minimum of 40 Gb of sequence with greater than Q20 quality was collected for each animal. Previous results showed that this level of coverage provided genotype scoring rates and accuracies that exceeded 99% 29 .
The DNA sequence alignment process was similar to that previously reported 29 . Briefly, FASTQ files corresponding to a minimum of 40 Gb of Q20 sequence were aggregated for each animal. The reference assemblies for both UMD3.1 32 and Oar_v3.1 were downloaded from the NCBI genomes download site and indexed for use with the Burrows-Wheeler Aligner (BWA) version 0.7.12 33 . The FASTQ files corresponding to R1 and R2 runs for the paired-end libraries of each respective animal were aligned individually using the BWA aln algorithm and bovine reference assembly UMD3.1. The R1 and R2 datasets were then merged and collated using BWA sampe. The process was repeated for the mapping of the reads to the ovine Oar_v3.1 reference assembly. The resulting sequence alignment map (SAM) files were converted to binary alignment map (BAM) files, and subsequently sorted using Samtools (version 0.1.18) 34 . PCR duplicates were marked in the BAM files using the Genome Analysis Toolkit (GATK, version 1.5-32-g2761da9) 35 . Regions in the mapped dataset that would benefit from realignment due to small insertions and deletions were identified using the GATK module RealignerTargetCreator, and realigned using the module IndelRealigner. The BAM file produced at each of these steps was indexed using Samtools. The resulting indexed BAM files were made available via the Intrepid Bioinformatics genome browser http://www.intrepidbio.com/, with groups of animals linked at the USMARC WGS browser (mapped to cattle, mapped to sheep). The raw reads were deposited at NCBI BioProject Accession PRJNA325061. Some SNP variants were identified manually by inspecting the target sequence with Integrative Genomics Viewer (IGV) software version 2.1.28 36,37 , as described previously 38 . In these cases, read depth, allele count, allele position in the read, and quality score were taken into account when the manual genotype determination was made.
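The mapping steps described above can be sketched as a short command plan. This is an illustrative outline only, not the authors' actual scripts: the file names and the helper function are hypothetical, the subcommands follow the legacy BWA aln/sampe and Samtools 0.1.x usage named in the text, and the duplicate-marking and GATK realignment steps are deliberately left out because their exact invocations are not given.

```python
from typing import List


def alignment_commands(ref: str, r1: str, r2: str, prefix: str) -> List[List[str]]:
    """Build the command lines (as argv lists) for one animal and one reference.

    All file names are hypothetical placeholders. Nothing is executed here;
    the caller decides how (and whether) to run the commands.
    """
    sai1, sai2 = f"{prefix}.R1.sai", f"{prefix}.R2.sai"
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    return [
        ["bwa", "index", ref],                                  # needed once per reference
        ["bwa", "aln", "-f", sai1, ref, r1],                    # align R1 reads
        ["bwa", "aln", "-f", sai2, ref, r2],                    # align R2 reads
        ["bwa", "sampe", "-f", sam, ref, sai1, sai2, r1, r2],   # pair R1/R2 into SAM
        ["samtools", "view", "-bS", "-o", bam, sam],            # SAM -> BAM
        ["samtools", "sort", bam, f"{prefix}.sorted"],          # 0.1.x syntax: output prefix
        ["samtools", "index", f"{prefix}.sorted.bam"],          # index the sorted BAM
    ]


if __name__ == "__main__":
    plan = alignment_commands(
        "UMD3.1.fa", "moose1_R1.fastq", "moose1_R2.fastq", "moose1.UMD3.1"
    )
    for argv in plan:
        print(" ".join(argv))
```

The same plan would be generated a second time with the Oar_v3.1 reference to reproduce the sheep alignment.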

Variant detection and filtering
The above mapping efforts produced BAM files for the alignments to both UMD3.1, and Oar_v3.1. The BAM files for all four animals were analyzed simultaneously for variation against both the UMD3.1 and the Oar_v3.1 genomes. The GATK UnifiedGenotyper was used with the genotype mode (-gt_mode) flag set to DISCOVERY, and the likelihood model (-glm) flag was set to BOTH in order to identify both single nucleotide variants, and small insertions and deletions. The maximum number of alternate alleles (--max_alternate_alleles) flag was set to allow only three. Other than those mentioned, default parameters were used. Samtools was used to generate a pileup file containing the measured allele and depth of coverage at each position for all four animals. Variant sites in the four moose were filtered for having a minimal read depth of ten, and a minimum genotype quality score of 30. The SNPs were filtered for having a minor allele frequency (MAF) of 0.5, with both homozygous genotypes present among four animals. Fifty bases of flanking DNA sequence on either side of the targeted moose SNP were analyzed for nucleotide alleles that were homozygous in all four moose yet different from the cattle or sheep reference sequences. These nucleotide sites were flagged as potential moose "species-specific" alleles and the 101 bp of context sequence was edited to create a moose consensus reference sequence. The 101 bp of moose consensus sequence derived from the alignment of one reference genome was then tested for alignment to the other reference genome. Moose SNPs with MAFs of 0.5, and having been derived independently from alignment to both reference genomes, were manually assigned genome-wide bins based on their chromosome and proximity as inferred by alignment with the cattle genome. The goal of assigning markers to bins was to minimize linkage while allowing automated SNP assay design software the opportunity to select the best candidate marker for each distinct genomic region. 
All of these conservative filters were intended to maximize marker informativity in North American moose populations, and minimize potential technical difficulties with SNP assay designs that rely on oligonucleotide hybridization for genotype detection.
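The core of the "highly informative" filter can be illustrated with a small function. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the depth and genotype-quality thresholds apply to every one of the four animals, it treats each genotype as a pair of allele letters, and the type alias and function name are hypothetical.

```python
from typing import Sequence, Tuple

# One call per animal: (allele 1, allele 2, read depth, genotype quality).
Call = Tuple[str, str, int, int]

MIN_DEPTH = 10  # minimal read depth stated in the text
MIN_GQ = 30     # minimum genotype quality score stated in the text


def is_highly_informative(calls: Sequence[Call]) -> bool:
    """True if a site has MAF 0.5 with both homozygous genotypes present.

    Assumes exactly four animals and that every animal must pass the
    depth and quality thresholds (an assumption, see the lead-in).
    """
    if len(calls) != 4:
        return False
    if any(depth < MIN_DEPTH or gq < MIN_GQ for _, _, depth, gq in calls):
        return False
    alleles = [a for a1, a2, _, _ in calls for a in (a1, a2)]
    distinct = set(alleles)
    if len(distinct) != 2:  # must be biallelic among the four animals
        return False
    first, second = sorted(distinct)
    if alleles.count(first) != alleles.count(second):  # 4 of 8 alleles each
        return False
    homozygous = {a1 for a1, a2, _, _ in calls if a1 == a2}
    return homozygous == distinct  # both homozygotes observed
```

With four animals, only two genotype patterns pass: two opposing homozygotes plus two heterozygotes, or two animals of each homozygous genotype.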

Results
An average of 63.5 Gb total genome sequence was collected for four moose. Based on similar estimates in cattle and sheep, this would correspond to an average read depth of 19-fold coverage if aligned to a moose reference genome of similar quality (Table 1). However, when cattle and sheep reference genomes were used, an average of 11.0 and 8.7% of the moose reads were aligned, respectively. For comparison, the same alignment method was performed with sets of bovine and ovine genomic sequences and resulted in 88.4% and 83.8% reads aligned to their respective genome assemblies (Table 1). For cross-species comparison, 22.2% of the ovine set of genomic sequence reads were aligned to the bovine assembly. Although the moose read depth was low when averaged across the entire genome of cattle or sheep, at conserved genome regions it was consistent with the expected average read depth of 19-fold. Thus, the moose read depth in conserved genomic regions appeared to be sufficient for identifying polymorphic sites and accurately assigning variant alleles.
Alignment of moose reads to the cattle and sheep genomes identified approximately 48.3 million and 39.7 million sites that differed from the reference assemblies, respectively. These included SNPs, insertions and deletions, and sites where moose-associated nucleotide differences occurred. The latter sites were defined as having homozygous genotypes in the four moose, with alleles differing from those in cattle or sheep (Figure 2). After stringent filtering for read depth and alignment quality, there were 1,095,371 and 813,006 moose variants identified with the respective cattle and sheep genome assemblies (Table 2). The most informative moose SNPs (i.e., "highly informative") were defined as those with a 0.5 MAF and both homozygous genotypes present among any of the four moose, and are candidate SNPs that may have arisen to a high MAF prior to the species' arrival in North America (Figure 3). There were 1,341 and 1,014 of these moose SNPs identified with the cattle and sheep alignments, respectively (Table 2).
[Table 2 footnote fragments: "(Table S1 and Table S2). These regions also contained no SNPs among the four moose."; "SNP that were independently identified by alignment in each reference genome and manually grouped into 216 chromosomal bins for assay design (Table S3)."]
Candidate moose SNPs were further excluded when the flanking sequences in one reference genome were not uniquely identified in the other. This left 773 and 552 highly informative moose SNPs identified in conserved regions of the cattle and sheep genomes, respectively (Supplementary Table S1 and Supplementary Table S2). These 1,325 entries comprise 1,008 distinct SNPs, of which 317 were common to both sets. The latter represent the most informative moose SNPs, with the highest flanking sequence conservation, due to their independent alignment to both reference genomes (Table 2).
[Figure caption fragment: "Table S1 and Table S2 for marker details). The inset shows the chromosomal map positions with the cattle UMD3.1 reference assembly."]
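The relationship between these counts is simple inclusion-exclusion, and a few lines of arithmetic make it easy to check. The function name below is hypothetical; the three input counts are the ones reported in the text.

```python
def distinct_count(n_cattle: int, n_sheep: int, n_both: int) -> int:
    """Number of distinct SNPs in the union of two overlapping sets."""
    return n_cattle + n_sheep - n_both


n_cattle, n_sheep, n_both = 773, 552, 317

entries = n_cattle + n_sheep                           # SNPs counted once per alignment
distinct = distinct_count(n_cattle, n_sheep, n_both)   # union of the two sets
cattle_only = n_cattle - n_both                        # found only via the cattle alignment
sheep_only = n_sheep - n_both                          # found only via the sheep alignment

print(entries, distinct, cattle_only, sheep_only)
```

The union of the two sets gives the 1,008 distinct SNPs, while the 317 SNPs are their intersection.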
The alignment coordinates of the 1,325 highly informative SNPs were analyzed for genome-wide distribution patterns that may indicate ascertainment biases caused by the variant selection. Overall, the distribution of SNP sites in the sets with 773, 552, and the 317 intersecting markers appeared to be widespread in the cattle and sheep genomes and generally appropriate for genome-wide estimates (Figure 4). However, some SNP clustering was observed, as the set of 317 had a mean and median spacing of 5.3 and 2.1 Mb, respectively (Supplementary Figure S1A). To facilitate SNP genotype assay design, the clustered SNPs were manually grouped into 216 bins with a mean size of approximately 8.1 Mb (median 5.9 Mb, Supplementary Figure S1B). Thus, SNP assay designs could be directed to each bin, with the option to use any SNP from that bin for multiplex assay design (Table S3).
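A mean well above the median, as reported for the set of 317, is what clustering produces: many short gaps and a few long ones. The sketch below shows one way such spacing statistics could be computed from alignment coordinates. It assumes spacing is measured between adjacent SNPs on the same chromosome, which the text does not state explicitly; the function name and the toy coordinates are hypothetical and are not taken from the supplementary tables.

```python
from statistics import mean, median
from typing import Dict, List


def adjacent_spacings(positions_by_chrom: Dict[str, List[int]]) -> List[int]:
    """Distances (bp) between neighbouring SNPs, computed within each chromosome."""
    gaps: List[int] = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return gaps


# Toy coordinates for illustration only.
toy = {"1": [1_000_000, 3_000_000, 9_000_000], "2": [5_000_000, 6_000_000]}
gaps = adjacent_spacings(toy)
print(mean(gaps) / 1e6, median(gaps) / 1e6)  # spacing in Mb
```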
Genotype analysis for the 773 and 552 moose SNPs derived from the cattle and sheep alignments, respectively, showed that each moose had approximately the same proportion of opposing homozygous genotypes (Figure 5A, Supplementary Table S4 and Supplementary Table S5). However, there were significant differences in the ratio of heterozygous genotypes to homozygous genotypes (Figure 5B). The Alaskan moose had the highest average heterozygosity ratio (1.26), followed by the moose from Wyoming and Idaho (1.10 and 1.06, respectively), and the Vermont moose (0.68). Note that the numerical value of the ratios calculated from these SNPs is likely an underestimate of the within-animal genome-wide heterozygosity, because there may be ascertainment bias resulting from targeting of SNP discovery to genomic regions conserved between three species. The SNP allele sharing between each of the four moose was analyzed with the sets of 773 and 552 markers to obtain a genome-wide measurement of their relatedness. This was possible because the method for selecting each of these SNPs was not dependent on which two of the four moose were heterozygous. The pair of moose from Alaska and Idaho had the highest proportion of shared alleles (0.430 and 0.397), while the Alaska and Vermont pair had the lowest (0.255 and 0.279, Table 3). Together, the genotype results with these sets of 773 and 552 SNPs indicate that there was a west-to-east pattern of decreasing genetic diversity in the four moose used in this study.
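The heterozygosity ratio used here is the number of heterozygous sites divided by the number of homozygous sites for one animal. A minimal sketch of that calculation follows. The function names and the toy genotypes are hypothetical, and the conversion to a heterozygous fraction is added only as a reading aid; it is not a statistic reported by the authors.

```python
from typing import Sequence, Tuple

Genotype = Tuple[str, str]  # the two alleles called at one SNP


def heterozygosity_ratio(genotypes: Sequence[Genotype]) -> float:
    """Heterozygous sites divided by homozygous sites for a single animal."""
    het = sum(1 for a, b in genotypes if a != b)
    hom = len(genotypes) - het
    if hom == 0:
        raise ValueError("ratio is undefined when no site is homozygous")
    return het / hom


def heterozygous_fraction(ratio: float) -> float:
    """Convert a het:hom ratio into the fraction of sites that are heterozygous."""
    return ratio / (1.0 + ratio)


toy = [("A", "G"), ("A", "A"), ("G", "G"), ("C", "T")]
print(heterozygosity_ratio(toy))
print(round(heterozygous_fraction(1.26), 3), round(heterozygous_fraction(0.68), 3))
```

Read this way, the reported ratios of 1.26 and 0.68 correspond to roughly 56% and 40% of the selected sites being heterozygous in the Alaska and Vermont animals, respectively.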
The combined set of 1,008 highly informative moose SNPs was also evaluated for relative proximity to genes in the annotated reference assemblies of cattle and sheep. In the sets of 773 and 552 moose SNPs, 256 and 181 were present within genes, respectively (Table S6 and Table S7). Some genes contained more than one polymorphism, and thus, there were 221 and 178 total cattle and sheep genes, respectively, with highly informative SNPs. Of these genes with moose SNPs, 84 were identified in both cattle and sheep alignments. In addition, there were a number of informative SNPs in noteworthy genes that did not pass the read depth and quality score filters. For example, the prion gene (PRNP) affects susceptibility to spongiform encephalopathies such as chronic wasting disease in cervids. By manually viewing the PRNP coding sequence with IGV software, a coding SNP with 0.5 MAF and both homozygotes present was identified (M217I, Table 4). Thus, the publicly searchable and viewable moose WGS presented here represents a novel genomics resource that may facilitate candidate gene-based research in this species.
[Figure 5 caption, partially recovered: "Table S1 and Table S2 for marker details). (A) Genotype counts for each of the four animals with the 773 moose SNPs identified in the alignment to cattle (Table S1) and the 552 moose SNPs identified in the alignment to sheep (Table S2). (B) The heterozygosity ratios calculated for each of the four animals from the 773 SNP set (grey circles); and the 552 SNP set (tan circles). The ratio consisted of the number of heterozygous sites divided by the combined number of homozygous sites."]

Discussion
We sequenced four moose from regions that span the United States, to approximately 19-fold genome coverage, and aligned them to the cattle and sheep reference genomes. Approximately 10% of moose sequences were aligned and used to identify more than 40 k moose SNPs in this cross-species approach. The relatively low alignment rate may be a reflection of the 27 million year average molecular divergence time between moose and non-cervid members of the Pecora infraorder 3 . In spite of the alignment rate, 1,008 highly informative moose SNPs were identified for future use in developing DNA-based genetic tests to support forensic and wildlife conservation activities. These 1,008 moose SNPs were derived from the union of two overlapping sets aligned to cattle (773 SNPs) and sheep (552 SNPs) reference genome assemblies. The 1,008 moose SNPs were refined to a minimal subset of 317 moose SNPs, those common to both sets, found in the most highly conserved genome regions. All of these markers are publicly available and ready for validation on a variety of SNP genotyping technology platforms. An important first step in evaluating these SNPs will be characterizing their MAFs in wild populations of North American moose. The online whole genome moose sequences, together with reference genotypes (Supplementary Table S4 and Supplementary Table S5) and DNA from these four moose, provide the opportunity for immediate design, testing, and validation of these candidate parentage SNPs.
Genotype information from the 1,008 moose SNPs was useful for measuring genome-wide differences in DNA sequence diversity among the four individuals. Measurements of heterozygosity and allele sharing showed that the Alaskan moose was the most diverse and the Vermont moose the least, with the moose from Idaho and Wyoming being intermediate. This is consistent with a species that crossed the Bering Land Bridge into Alaska and radiated outward from west to east across North America. SNPs have been previously used to estimate genome diversity in other species with low genetic diversity like the European bison (B. bonasus) 41 and the Tasmanian devil (S. harrisii) 25 . A caveat with our results is that the overall heterozygosity of each moose may be underestimated due to ascertainment bias for highly informative SNPs in highly conserved genomic regions. In other words, variation in conserved moose genome regions may occur at a lower rate than that in non-conserved regions. In spite of this potential ascertainment bias, the results suggest that combinations of these markers may be useful in detecting population structure.
An important unanswered question is: how informative will these SNPs be in moose populations? Population-wide data to address this question will require development and application of genotyping assays, and assembly of pertinent samples for testing, which was beyond the scope and resources of the present report. The data presented here, which identify polymorphisms with alternate homozygous genotypes in a limited sample of only four individuals, suggest that the SNPs selected represent variation that existed prior to arrival of moose in North America.

Conclusions
These moose SNPs and associated sequence information are available for use without restriction, and provide a basis for developing commercial SNP-based "parentage" DNA tests for validation in North American moose populations.

Data availability
FASTQ files for the four moose combined are available in NCBI SRA, with contiguous accession numbers SRX3218250-SRX3218281.
The data are part of NCBI BioProject Accession PRJNA325061.

Competing interests
No competing interests were disclosed. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The USDA is an equal opportunity provider and employer.

Supplementary materials
Table S1. List of 773 moose parentage SNPs aligned to bovine reference assembly UMD3.1.
Table S2. List of 552 moose parentage SNPs aligned to ovine reference assembly Oar_v3.1.
Table S3. List of 317 candidate moose parentage SNPs grouped into 216 bins for use in multiplex assay design.
Table S4. Genotypes for 773 moose SNPs aligned to the bovine reference assembly UMD3.1.
Table S5. Genotypes for 552 moose SNPs aligned to the ovine reference assembly Oar_v3.1.
Table S6. List of 256 moose parentage SNPs occurring within genes annotated in bovine reference assembly UMD3.1.
Table S7. List of 181 moose parentage SNPs occurring within genes annotated in ovine reference assembly Oar_v3.1.
The inclusion of 4 individuals in this study is good, as is the broad geographic range of the samples across North America. The methods are appropriate, although as the first reviewer points out, RADSeq could have been used in SNP discovery and has some advantages over the methods used, but I believe that this report on moose is secondary to the authors' overall goals and we are fortunate that Kalbfleisch et al. decided to pursue this and share their findings. The comparison to reference genomes, although from bovids not very closely related to moose, yielded some advantages, including identifying SNPs in important functional genes, such as the PRNP locus. Nonetheless, a white-tailed deer genome was made public in summer 2017 and although these authors may not be willing to start over by comparison of their data to a very similar genome (same subfamily and same number of chromosomes) the prospect exists and should be pursued. I can see the point of Reviewer 2 concerning Fig. 4 and assumed synteny but I believe I got the intended message from the figure as is. But I think mentioning somewhere that North American moose have 34 pairs of autosomes when discussing the success of mapping SNPs to the reference genomes would be appropriate. I find the last sentence interesting in that the authors believe the SNP variants they found existed prior to moose entering North America. Considering that event likely happened within the last 15,000 years that is a very reasonable statement, and may cause some to think that SNPs may contain little information about geographic variation in moose and all that goes with it. Given the morphological and behavioral differences between, say, Alaskan moose and those in the eastern continent, however, it is obvious that moose have evolved rapidly in that time, despite limited genetic diversity due to Pleistocene bottlenecks and founder effects.
How that translates to current SNP diversity and what the latter may be able to tell us about moose are exciting questions to be asked, for which this manuscript sets the stage.
Minor comments: 2nd paragraph of Introduction: should be "among individuals" not "between individuals." Third paragraph of Introduction: should be "a SNP-based approach" not "an SNP-based approach." Last paragraph of Introduction, last sentence: should it be "has been made freely available" rather than "was made freely available?" Last paragraph of WGS Production …, last words of 2nd-to-last sentence: Should be "previously." not "in previously." Discussion, last sentence: should this be "SNPs" instead of "SNP?"

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Referee Expertise: Moose evolution, population and spatial genetics of moose and other large mammals
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

This well-written manuscript describes the discovery and characterization of single nucleotide polymorphisms (SNPs) in moose. These SNPs will be valuable for future research and conservation efforts, as they can be used for animal identification, assigning parentage, and estimating intra- and inter-population genetic variability. The SNP discovery and genotyping are done using well-established and clearly described approaches, and care has been taken to avoid false positive SNPs (requiring that both homozygous genotypes be observed for example). The raw sequencing data is available through the SRA database, while the aligned reads (BAM) files are available via the USDA and Intrepid Bioinformatics sites. The final filtered variants are provided with flanking sequence in the supplementary materials. Larger, less-filtered collections of variants are provided in the form of VCF files, again in the supplementary materials. The clear, detailed manuscript and the raw data and progressively filtered results will make it easy for others to reproduce or make use of the results of this work.
Minor comments
1. In the last paragraph of the introduction the authors note the challenge of creating a whole genome assembly for use in SNP discovery, and that the use of an existing reference genome from a related species can be an effective alternative. In the discussion section the authors mention that the cross-species mapping approach, however, has the drawback of targeting conserved regions of the genome. I am curious as to why the authors chose not to employ a technique like RADseq for SNP discovery, as it does not depend on the availability of a reference genome and would allow them to better assess minor allele frequency, through the inclusion of more individuals. Also, RADseq would be equally effective at targeting conserved and non-conserved regions. The reasons for not using RADseq (or its potential value in future studies) could be addressed in the introduction.
2. The figure legends for Figure 2 and Figure 3 could be expanded slightly to explain to readers not familiar with IGV which elements represent reads, coverage, reference sequence. Also, in Figure 3 it isn't clear to me why the sequence of one read is shown (moose 4, second read from top, aligned to the sheep reference).
3. In the Methods section "fastq" is written as "FASTQ" and "fastq".

genomic sequence data from 4 moose across their range in the United States, aligned the reads to both cow and sheep reference genomes, and then applied a series of filters to select a subset of loci in highly conserved regions. The resulting set of loci showed a gradient of diversity decreasing from west to east, consistent with hypothesized colonization. The authors state that these loci will serve as a resource for future management and conservation applications. I think that this study has laudable goals and adds a valuable resource for moose conservation. However, there are some issues with the analyses and presentation that need to be clarified.

Overall comments
The alignment and filtering procedures seem overly restrictive. Authors note that only ~10% of their reads aligned to the cow and sheep reference genomes (not surprising given the levels of divergence among the species) meaning that 90% of the data was essentially thrown away. Why not start with a de novo assembly of the moose sequence? Along these lines there are several recent papers that detail a hybrid procedure for genome construction that begins with de novo assembly and then applies cross-species alignment for scaffolding, e.g. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6 If part of the goals is to eventually use these loci for "population genetic applications" (e.g. looking for population structure and assigning individuals to those populations) it strikes me that selecting loci that are in such heavily conserved regions may result in biased estimates. Can the authors comment on this? Doesn't this procedure result in a lot of "moose specific" variants being missed? I can understand that it is not the objective of this paper to create a draft genome sequence for the moose (though the authors note in several places that this would be useful, and the data generated here seem appropriate to attempt this), but then more justification as to why this was not attempted needs to be given. Did the authors check and make sure that the "highly conserved genomic regions" are not in repetitive elements? A blast search of these regions should do the trick.

Specific comments
In the sentence in the introduction starting with "DNA technology developments…" saying that SNPs have "replaced" microsatellites and mitochondrial DNA is a gross overstatement. There are many studies still using these markers and there are many applications where these markers may be preferable. Would be better to say something along the lines that SNPs have gained in use.
As currently written, the sentence in the introduction starting with "Moreover, panels of SNPs

Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.