Whole genome sequence and genome-wide distributed single nucleotide polymorphisms (SNPs) of the Black Bengal goat

The Black Bengal goat (BBG) is a dwarf-sized heritage goat (Capra hircus) breed from Bangladesh and is well known for its high fertility and excellent meat and skin quality. Here we present the first whole genome sequence and genome-wide distributed single nucleotide polymorphisms (SNPs) of the BBG. A total of 833,469,900 raw reads consisting of 125,020,485,000 bases were obtained by sequencing one male BBG sample. The reads were aligned to the San Clemente and the Yunnan black goat genomes, which resulted in 98.65% (properly paired, 94.81%) and 98.50% (properly paired, 97.10%) of the reads aligning, respectively. The estimated sequencing coverages were 48.22X and 44.28X relative to the published San Clemente and Yunnan black goat genomes, respectively. A total of 9,497,875 high quality SNPs (Q ≥ 20) along with 1,023,359 indels, and 8,746,849 high quality SNPs along with 842,706 indels, were identified in BBG against the San Clemente and Yunnan black goat genomes, respectively. The dataset is publicly available from NCBI BioSample (SAMN10391846) and the Sequence Read Archive (SRR8182317, SRR8549413 and SRR8549904), under BioProject ID PRJNA504436. These data might be useful genomic resources for conducting genome-wide association studies, identifying quantitative trait loci (QTLs) and performing functional genomic analysis of the Black Bengal goat.


Introduction
The Black Bengal goat (BBG) is a small-sized breed of goat (Capra hircus) distributed throughout Bangladesh and the West Bengal, Bihar, and Orissa regions of northeastern India. The predominant coat color of this breed is black, but it is also found in brown, white and gray (Jalil et al., 2018). It is a heritage goat breed of Bangladesh, well known for its high fertility and excellent meat and skin quality. This animal is a source of high quality meat, milk, and leather, and contributes substantially to the economy of Bangladesh (Amin et al., 2000; Faruque et al., 2017). The BBG is reported to have originated from the wild goat, also known as the bezoar or Pasang (Capra aegagrus) (Herre & Röhrs, 1990), with introgressed genes from the markhor (Capra falconeri). Inheritance of genetic material from goats of the southern region of China has also been hypothesized, given the historical, cultural and geographical connection between South China and Bengal across the south-eastern offshoot of the Tibetan plateau (Nozawa, 1991). Despite its economic importance, no large-scale genomic resource is available to date for this goat breed. Here we used the Illumina HiSeq sequencing platform to sequence the whole genome of the BBG, generated short reads and identified high quality, genome-wide distributed single nucleotide polymorphisms (SNPs). These data might provide useful insight for conducting genome-wide association studies, identifying quantitative trait loci (QTLs) and performing functional genomic analysis of the Black Bengal goat.

Experimental animal
The experimental goat was reared at the Bangladesh Livestock Research Institute (BLRI) goat farm under a semi-intensive management system, in a well ventilated, open-sided house with a slatted floor, attached to pasture. All efforts were made to minimize harm to the animal. A small piece of ear tissue from an adult (30 months old) pedigreed goat (BioSample SAMN10391846) was collected by ear punching using a sterilized tissue puncher following local anesthesia (lidocaine hydrochloride, 2%) on the right ear, and was immediately frozen in liquid nitrogen. Prior to ear punching, the goat was handled calmly and with great care by a trained animal operator to prevent distress and injury to the animal and the handler. The tissue punching site was finally treated with antiseptic cream (cetrimide 0.5% and chlorhexidine digluconate 0.1%). All animal procedures conformed to the guidelines of the AWEC (Animal Welfare and Ethical Committee) of Bangladesh Agricultural University.

Sample processing
The tissue was finely ground with a Micro Pestle (Sigma-Aldrich cat # SIAL501ZZ0), and high molecular weight DNA was extracted from the fresh frozen tissue using the phenol:chloroform:isoamyl alcohol method (Sambrook & Russell, 2001). DNA purity was evaluated with a NanoDrop 1000 Spectrophotometer (Life Technologies, CA, USA) and by 0.8% agarose gel electrophoresis. DNA was quantified using a Qubit 2.0 Fluorometer and the Qubit dsDNA HS Assay Kit (Life Technologies, CA, USA, cat # Q32851). DNA was fragmented by acoustic disruption using a Covaris S220 ultrasonicator and then underwent end repair, dA-tailing, adapter ligation and purification (NEBNext Ultra II DNA Library Prep Kit, cat # E7645S) following the manufacturer's instructions. The purified DNA was then size-selected before PCR amplification for library construction. Preliminary quantification and dilution of the library were performed using the Qubit 2.0 Fluorometer, and an Agilent 2100 Bioanalyzer was then used to determine the insert size and nucleic acid concentration of the resulting library. The effective concentration of each sample in the library mixture was determined by qPCR (ABI 7500, Applied Biosystems, CA, USA) using the KAPA Library Quantification Kit (cat # KK4824) following the manufacturer's standard protocol, with the primer pair Primer 1: 5'-AAT GAT ACG GCG ACC ACC GA-3' and Primer 2: 5'-CAA GCA GAA GAC GGC ATA CGA-3'. The PCR conditions were as follows: an initial denaturation at 95°C for 5 min, followed by 35 cycles (denaturation at 95°C for 30 sec, annealing/extension/data acquisition at 60°C for 45 sec) and melt curve analysis at 65-95°C. This was done before sequencing to ensure the accuracy of the sample concentration.

Sequencing
Sequencing was performed on the Illumina system (HiSeq X) according to the manufacturer's instructions. The samples were sequenced in a 2 × 150 paired-end (PE) configuration (GENEWIZ, Suzhou, China) using the Illumina TruSeq SBS Kit v4 (cat # FC-401-4003) in high output mode. Base calling was performed with the sequencer's built-in software Real-Time Analysis (RTA) (v1.5.15.1), which converts the four fluorescent signals obtained from the CCD (charge-coupled device) to binary base call (BCL) data in real time. BCL data were then converted to FASTQ files using bcl2fastq (v2.17, Illumina). Data were demultiplexed during this conversion, based on index information.
Primary analysis was performed using the sequencer's built-in software HCS (v3.4.0) to determine whether each read passed the chastity filter, based on the signal quality of the first 25 cycles. If a read had no more than 2 of the 25 cycles with chastity values below 0.6, the read was called PF (Pass Filter). PF clusters converted by bcl2fastq were called PF data and stored in FASTQ format. The raw data were filtered to remove adapter sequences, PE reads with Q scores < 20, and PE reads with an N content > 10%. After primary cleaning of reads, mitochondrial genome sequences were removed. The remaining high quality, contamination-free reads were then aligned to the San Clemente (GCA_001704415.1) and the Yunnan black goat (GCA_000317765.2) genomes separately using Bowtie2 (v2.3.4.3) (Langmead & Salzberg, 2012). Samtools (v1.9) (Li et al., 2009) was used to convert the resulting SAM sequence alignment files to BAM format, followed by sorting, indexing and quality filtering. BCFtools (v1.9) (Narasimhan et al., 2016) was used to call and filter the variants.
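The pass-filter rule and the alignment and variant-calling steps described above can be sketched as follows. This is a minimal illustration, not the authors' actual commands: the Python function names, file names, thread count and the specific command-line options are assumptions, since the text names only the tools and their versions. The `QUAL>=20` expression is assumed to correspond to the Q ≥ 20 SNP threshold reported in the results. The commands are only assembled as argument lists and are not executed.

```python
from typing import List, Sequence


def passes_chastity_filter(chastity: Sequence[float],
                           threshold: float = 0.6,
                           max_failures: int = 2,
                           cycles: int = 25) -> bool:
    """PF rule as stated in the text: a read passes if no more than
    2 of the first 25 cycles have a chastity value below 0.6."""
    failures = sum(1 for value in chastity[:cycles] if value < threshold)
    return failures <= max_failures


def build_pipeline(index_prefix: str, reference_fasta: str,
                   reads_1: str, reads_2: str, sample: str,
                   threads: int = 8) -> List[List[str]]:
    """Assemble (but do not run) one possible Bowtie2 -> Samtools ->
    BCFtools command sequence. All options are illustrative assumptions."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    pileup = f"{sample}.pileup.bcf"
    raw_vcf = f"{sample}.raw.vcf.gz"
    filtered_vcf = f"{sample}.q20.vcf.gz"
    return [
        # Paired-end alignment against a prebuilt Bowtie2 index.
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # SAM -> BAM conversion, then coordinate sorting and indexing.
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
        # Genotype likelihoods, then multiallelic calling of variant sites.
        ["bcftools", "mpileup", "-f", reference_fasta, "-Ou",
         "-o", pileup, sorted_bam],
        ["bcftools", "call", "-mv", "-Oz", "-o", raw_vcf, pileup],
        # Keep variants with a site quality of at least 20 (assumed filter).
        ["bcftools", "view", "-i", "QUAL>=20", "-Oz",
         "-o", filtered_vcf, raw_vcf],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in build_pipeline("goat_index", "goat_reference.fa",
                                  "bbg_R1.fastq.gz", "bbg_R2.fastq.gz", "bbg"):
        print(" ".join(command))
```

Each argument list could be passed to `subprocess.run(command, check=True)` on a system where the three tools are installed. Running the same sequence once per reference genome would mirror the separate alignments to the San Clemente and Yunnan black goat assemblies.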

Validation
A total of 833,469,900 raw reads consisting of 125,020,485,000 bases were obtained by sequencing one male BBG sample (BioSample SAMN10391846). After QC, a total of 812,209,030 reads containing 118,911,538,136 bases were kept, which was 97.45% of the total raw reads. After quality filtration and removal of the mitochondrial genome, the reads were aligned to the San Clemente and the Yunnan black goat genomes, which resulted in 98.65% (properly paired, 94.81%) and 98.50% (properly paired, 97.10%) of the reads aligning, respectively. Additionally, a total of 9,497,875 high quality SNPs (Q ≥ 20) along with 1,023,359 indels were identified in BBG versus the San Clemente genome (see underlying data (Mollah et al., 2019a)). Similarly, 8,746,849 high quality (Q ≥ 20) SNPs along with 842,706 indels were identified in BBG versus the Yunnan black goat genome (see underlying data (Mollah et al., 2019b)). The transition to transversion ratio was 2.27 and 2.29 in BBG against the San Clemente and the Yunnan black goat genomes, respectively.
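The read and base counts above can be cross-checked with simple arithmetic, and the transition to transversion ratio can be illustrated on a small example. The sketch below uses only the figures reported in this section; the function names and the three-SNP example are hypothetical and are not taken from the underlying data.

```python
from typing import Iterable, Tuple

# Counts reported in the text.
RAW_READS = 833_469_900
RAW_BASES = 125_020_485_000
CLEAN_READS = 812_209_030
CLEAN_BASES = 118_911_538_136
READ_LENGTH = 150  # from the 2 x 150 paired-end configuration

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions for (ref, alt) SNP pairs."""
    pairs = [(ref.upper(), alt.upper()) for ref, alt in snps]
    transitions = sum(1 for pair in pairs if pair in TRANSITIONS)
    transversions = len(pairs) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    # Every raw read is 150 bases long, so reads x 150 gives the raw bases.
    print(RAW_READS * READ_LENGTH == RAW_BASES)
    # Share of raw reads retained after QC.
    print(percent(CLEAN_READS, RAW_READS))
    # Share of raw bases retained after QC.
    print(percent(CLEAN_BASES, RAW_BASES))
    # Hypothetical example: two transitions and one transversion.
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

The retained share of bases is slightly lower than the retained share of reads, which would be consistent with adapter removal shortening some of the reads that were kept.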
