Keywords
Genome annotation, reference genome, bettong, marsupial
This article is included in the Genomics and Genetics gateway.
The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as ‘Near Threatened’ by the IUCN. We sequenced short-read data on the 10x Genomics system to generate a reference genome 3.46 Gb in size, with a contig N50 of 87.36 kb and a scaffold N50 of 2.93 Mb. Additionally, we used GeMoMa to provide an accompanying annotation for the reference genome. The generation of a reference genome for the eastern bettong provides a vital resource for the conservation of the species.
We have addressed the concerns of all reviewers and updated the manuscript to reflect these changes. This includes providing additional details of the sequencing and assembly methodology, namely specific information on the software used and the amount of data generated. We have also provided reasoning for the sequencing methods used, but acknowledge the increase in genome quality that could be achieved by using long-read technologies and sequencing multiple individuals.
See the authors' detailed response to the review by Jonathan J Hughes
See the authors' detailed response to the review by Naoki Osada
See the authors' detailed response to the review by Németh Attila
The eastern or Tasmanian bettong (Bettongia gaimardi) is a small nocturnal Australian marsupial in the potoroid family and is considered an important ecosystem engineer due to its habit of digging and feeding on fungi (Munro et al. 2019, Ross et al. 2019a, 2019b). Eastern bettongs were once widespread across south-eastern Australia but are reported to have gone extinct on mainland Australia around the 1920s due to predation by introduced carnivores and land clearing (Short 1998). Eastern bettongs are now confined to the eastern half of Tasmania, where they are listed as ‘Near Threatened’ on the IUCN Red List (Burbidge et al. 2016).
Australia has experienced the highest extinction rate of mammals on any continent over the past 200 years, accounting for 28% of the world’s mammal extinctions since the year 1600 (McKenzie et al. 2007). Nationwide, a number of reintroduction programs are being implemented for the conservation of locally-extinct mammals, typically in the ‘critical weight range’ of 35 – 5500g (Burbidge and McKenzie 1989). In the case of the eastern bettong, the species was reintroduced from the state of Tasmania to two fenced reserves in the Australian Capital Territory (ACT) between 2011 and 2012 (Batson et al. 2016).
The generation of a reference genome will provide a valuable resource in the management of the two reintroduced populations of eastern bettongs and contribute to the global effort to sequence all eukaryotic life on Earth (Lewin et al. 2022). To generate a reference genome, we sequenced DNA with 10x Genomics short reads and used GeMoMa to produce a genome annotation.
We used muscle tissue from a deceased male pouch young eastern bettong (B. gaimardi) collected from Mulligans Flat Woodland Sanctuary during population monitoring in 2014; the tissue was frozen immediately after collection. The sample was collected under Australian National University Animal Experimentation Ethics Committee ethics protocol A2011/017. DNA was extracted using the QIAGEN Genomic Tips kit (Qiagen Catalogue # 10223), yielding 170 ng/μl of DNA. Sequencing used the KAPA HyperPrep PCR-free library kit (Roche Catalogue # KK8503) and two lanes of HiSeq X Ten 150 bp PE sequencing (Illumina) at the Ramaciotti Centre for Genomics (UNSW, Sydney). Additional sequencing using the 10x Genomics Chromium library prep with >50 kb size selection was performed on two lanes of HiSeq X Ten (150 bp PE) (Illumina). The sample was accessioned to the Australian Museum (accession number AM M.56404).
Raw 10x data were assembled using Supernova v2.0.0 (RRID:SCR_016756) (Weisenfeld et al. 2017) with default settings and pseudohaplotype output. Haplotypes were tidied and filtered using Diploidocus v0.3.0 (RRID:SCR_021231) (Stuart et al. 2022), which filtered redundant and 100% unresolved scaffolds, and assigned the longest version of each scaffold to haplotype 1 [bettong.v1.0]. Scaffolds flagged as diploid (identical in both haplotypes) were then added back to haplotype 2, and each haplotype was run through Telociraptor v0.9.0 (https://github.com/slimsuite/telociraptor) to length-sort and rename scaffolds after modifying the ends to reveal telomeres, where appropriate [bettong.v1.1]. Scaffolds identified by Tiara v1.0.3 (Karlicki et al. 2022) as being from archaea, bacteria, prokarya or organelle were filtered, along with scaffolds flagged for exclusion or review by FCS-GX [bettong.v1.2] (Astashyn et al. 2023).
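To make the "100% unresolved" filter and the length-sorting step concrete, the following is a minimal Python sketch. It is not the Diploidocus or Telociraptor implementation; the function names and the toy scaffolds are hypothetical, and it assumes that an unresolved scaffold is one made up entirely of N characters.

```python
from typing import Dict, List, Tuple


def drop_unresolved(scaffolds: Dict[str, str]) -> Dict[str, str]:
    """Keep scaffolds that contain at least one resolved (non-N) base.

    Assumption: "100% unresolved" means the sequence consists only of N/n.
    Empty sequences are dropped as well.
    """
    kept = {}
    for name, seq in scaffolds.items():
        if seq and set(seq.upper()) != {"N"}:
            kept[name] = seq
    return kept


def sort_by_length(scaffolds: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs ordered from longest to shortest."""
    return sorted(scaffolds.items(), key=lambda item: len(item[1]), reverse=True)


# Toy input: scaf2 is entirely unresolved and should be removed.
example = {"scaf1": "ACGTNNNNACGT", "scaf2": "NNNNNNNN", "scaf3": "ACGTAC"}
resolved = drop_unresolved(example)
ordered_names = [name for name, _ in sort_by_length(resolved)]
print(ordered_names)
```

The real tools additionally handle redundancy between haplotypes, telomere detection and scaffold renaming, none of which is modelled here.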
Completeness was estimated using Benchmarking Universal Single-Copy Orthologues (BUSCO, RRID:SCR_015008) v5.4.3 (Simao et al. 2015) using the mammalia_odb10 (n:9226) and vertebrata_odb10 datasets (n:3354).
Repetitive elements of the genome were identified, classified and masked using a Pawsey Supercomputing Centre Nimbus cloud machine (256GB RAM, 64 vCPU, 3 TB storage) by building a database using RepeatModeler v2.0.1 (RRID:SCR_015027) (Flynn et al. 2020) with default settings; repeats were then masked using RepeatMasker v4.0.9 (RRID:SCR_012954) (Smit et al. 2013-2015).
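As a rough illustration of how a repeat percentage can be derived from a masked assembly, the sketch below counts lowercase bases in a set of sequences. This assumes soft-masked output (repeats written in lowercase), which is an assumption on our part rather than a setting stated above; RepeatMasker also reports its own summary table, and this sketch is not a substitute for it. The function name is hypothetical.

```python
from typing import Iterable


def soft_masked_fraction(sequences: Iterable[str]) -> float:
    """Fraction of all bases that are lowercase (soft-masked)."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no bases supplied")
    return masked / total


# Toy input: 8 of the 18 bases are lowercase.
toy = ["ACGTacgt", "NNNNacgtAC"]
print(f"{100 * soft_masked_fraction(toy):.2f}% masked")
```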
A homology-based annotation was created independently for each haplotype using GeMoMa v1.9 (RRID:SCR_012954) (Keilwagen et al. 2019) with default settings, using the annotation from nine Ensembl mammalian genomes (cow [Bos taurus], human [Homo sapiens], opossum [Monodelphis domestica], mouse [Mus musculus], Tammar wallaby [Macropus eugenii], platypus [Ornithorhynchus anatinus], koala [Phascolarctos cinereus], Tasmanian devil [Sarcophilus harrisii], wombat [Vombatus ursinus]) (Table 1). SAAGA v0.7.9 (https://github.com/slimsuite/saaga.git) was used to map annotated proteins onto a combined dataset of SwissProt (Edwards and Palopoli 2015, UniProt 2023) and Quest for Orthologues reference proteomes (Nevers et al. 2022) to add descriptions and extract the longest isoform per gene for completeness estimation using BUSCO v5.4.3 (RRID:SCR_015008) in protein mode against the vertebrata_odb10 (n:3354) and mammalia_odb10 (n:9226) lineages (Simao et al. 2015).
Common name | Scientific name | Assembly ID | Reference |
---|---|---|---|
Cow | Bos taurus | ARS-UCD1.2 | (Rosen et al. 2020) |
Human | Homo sapiens | GRCh38.p13 | |
Opossum | Monodelphis domestica | ASM229v1 | (Mikkelsen et al. 2007) |
Mouse | Mus musculus | GRCm39 | |
Tammar wallaby | Notamacropus eugenii | Meug_1.0 | (Renfree et al. 2011) |
Platypus | Ornithorhynchus anatinus | mOrnAna1.p.v.a | (Zhou et al. 2021) |
Koala | Phascolarctos cinereus | phaCin_unsw_v4.1 | (Johnson et al. 2018) |
Tasmanian devil | Sarcophilus harrisii | mSarHar1.11 | (Stammnitz et al. 2023) |
Wombat | Vombatus ursinus | bare-nosed_wombat_genome_assembly | |
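The extraction of the longest isoform per gene, mentioned in the annotation methods above, can be sketched as follows. This is an illustrative Python example only, not the SAAGA implementation; the record layout, function name and identifiers are hypothetical, and ties are resolved here by keeping the first isoform encountered.

```python
from typing import Dict, Iterable, Tuple

# Each record is (gene_id, transcript_id, protein_sequence).
ProteinRecord = Tuple[str, str, str]


def longest_isoforms(records: Iterable[ProteinRecord]) -> Dict[str, Tuple[str, str]]:
    """Map each gene to the (transcript_id, sequence) of its longest protein."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, seq in records:
        current = best.get(gene_id)
        # Strictly longer wins, so the first of two equal-length isoforms is kept.
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (transcript_id, seq)
    return best


toy_records = [
    ("geneA", "geneA.t1", "MKT"),
    ("geneA", "geneA.t2", "MKTLLV"),
    ("geneB", "geneB.t1", "MAAP"),
]
selected = longest_isoforms(toy_records)
print({gene: tid for gene, (tid, _) in selected.items()})
```

Reducing each gene to a single protein in this way avoids counting alternative isoforms of the same gene as duplicated BUSCOs.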
The ‘genestats’ script (https://github.com/darencard/GenomeAnnotation) was used to obtain the average number of exons and introns and the average exon and intron length.
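For readers unfamiliar with how such summary statistics are derived, the following Python sketch computes them from GFF3-style exon records. It is not the ‘genestats’ script itself; it assumes tab-separated GFF3 lines whose exon features carry a Parent attribute, and it groups exons by that parent identifier. The function name and toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def exon_intron_stats(gff_lines: Iterable[str]) -> Dict[str, float]:
    """Average exon/intron counts and lengths per parent feature."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is not None:
            exons[parent].append((int(cols[3]), int(cols[4])))

    exon_lengths: List[int] = []
    intron_lengths: List[int] = []
    for coords in exons.values():
        coords.sort()
        # GFF3 coordinates are 1-based and inclusive.
        exon_lengths.extend(end - start + 1 for start, end in coords)
        intron_lengths.extend(
            nxt[0] - prev[1] - 1 for prev, nxt in zip(coords, coords[1:])
        )

    def mean(values: List[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    n = len(exons)
    return {
        "mean_exons": len(exon_lengths) / n if n else 0.0,
        "mean_introns": len(intron_lengths) / n if n else 0.0,
        "mean_exon_length": mean(exon_lengths),
        "mean_intron_length": mean(intron_lengths),
    }


toy_gff = [
    "##gff-version 3",
    "scaf1\tGeMoMa\texon\t1\t100\t.\t+\t.\tParent=t1",
    "scaf1\tGeMoMa\texon\t201\t300\t.\t+\t.\tParent=t1",
    "scaf1\tGeMoMa\texon\t401\t450\t.\t+\t.\tParent=t1",
    "scaf1\tGeMoMa\texon\t1000\t1199\t.\t-\t.\tParent=t2",
]
stats = exon_intron_stats(toy_gff)
print(stats)
```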
Sequencing generated 185M reads of short-read data and 149M reads of 10x Genomics data. Genome assembly with Supernova estimated a 3.79 Gb genome size (46.88X raw coverage) and assembled a 3.57 Gb genome in 38,249 scaffolds (scaffold N50=2.77 Mb) (Silver 2024). Following Diploidocus cleanup, there were 27,408 primary scaffolds (3.46 Gb) with 1,681 alternative scaffolds (3.01 Gb). Telociraptor made five inversions and trimmed one contig. Contamination removal filtered 786 scaffolds (2.00 Mb) from each haplotype. This gave a final genome size of 3.46 Gb with 26,623 scaffolds (Table 2). The genome size is comparable to that of other marsupial genomes, including that of the closely related woylie (Bettongia penicillata ogilbyi) (Haouchar et al. 2016, Peel et al. 2021). BUSCO completeness of the final genome was 92.2% for mammalia_odb10 and 96.9% for vertebrata_odb10 (96.8% for haplotype two) (Table 2). Whilst the BUSCO scores suggest a highly complete genome, the use of long-read data such as PacBio or Oxford Nanopore would assist in increasing the contiguity of the assembly. Additionally, 53.08% of the genome was identified as repeats, which is similar to other marsupials, including the closely related woylie (53.05%) (Peel et al. 2021) (Table 3).
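The contig and scaffold N50 values reported here follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that calculation is given below; the function name and the toy lengths are illustrative and are not taken from the bettong assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assembly of 290 bp: 80 + 70 = 150 bp, which is at least half (145 bp).
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```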
Genome annotation with GeMoMa predicted 36,068 and 36,015 protein-coding genes for haplotype one and two, respectively. This is a large overestimate, as other marsupials have around 20,000 protein-coding genes (Johnson et al. 2018, Brandies et al. 2020). In the future, generating transcriptomes for a variety of eastern bettong tissues would likely improve the accuracy of the annotation. The annotation is highly complete, with 94.1% of mammalian protein BUSCOs complete (Table 4). The average protein length was 384.3 and 384.7 amino acids for haplotype one and two, respectively, with an average of 6.52 exons per gene (Table 4). On average, predicted proteins were 87.8% of the length of their best SwissProt/QFO hit, suggesting some fragmentation of the annotation, which might be inflating the number of annotated genes.
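The protein-length comparison above can be illustrated with a short Python sketch that averages, over all predicted proteins, the ratio of each protein's length to the length of its best reference hit. This is an assumption about how such a figure can be computed, not a description of SAAGA's internals; the function name and the toy values are hypothetical.

```python
from typing import Iterable, Tuple


def mean_length_ratio(pairs: Iterable[Tuple[int, int]]) -> float:
    """Mean of predicted_length / best_hit_length over all pairs.

    Each pair is (predicted protein length, best-hit protein length),
    both in amino acids. Pairs with a zero-length hit are rejected.
    """
    ratios = []
    for predicted_len, hit_len in pairs:
        if hit_len <= 0:
            raise ValueError("best-hit length must be positive")
        ratios.append(predicted_len / hit_len)
    if not ratios:
        raise ValueError("no protein pairs supplied")
    return sum(ratios) / len(ratios)


# Toy input: one near-complete, one truncated and one over-long prediction.
toy_pairs = [(90, 100), (50, 100), (120, 100)]
print(f"{100 * mean_length_ratio(toy_pairs):.1f}% of best-hit length")
```

A mean ratio well below 100% is consistent with gene models that are truncated or split into several shorter predictions, and a split gene would count more than once in the gene total.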
The raw data are publicly available through the Bioplatforms Australia Oz Mammals Genomes: https://data.bioplatforms.com/organization/bpa-omg . The assembled and annotated genome herein is hosted on the Australasian Genomes site (https://awgg-lab.github.io/australasiangenomes/) in addition to NCBI.
Raw genome sequences are available on:
NCBI’s Sequence Read Archive (SRA): Raw DNA data for generation of the genome. SRX26311185 and SRX26311186 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1095660) (Silver et al. 2024).
The data produced as part of this study are stored on NCBI under BioProject PRJNA1095660 (Silver et al. 2024). Databases of molecular data on the NCBI website include nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein.
Figshare: ARRIVE checklist for A reference genome for the eastern bettong (Bettongia gaimardi), DOI: https://doi.org/10.6084/m9.figshare.27144360.v1 (Silver 2024).
The project contains the following reporting guidelines:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
The authors would like to acknowledge computational resource support from: Galaxy Australia, a service provided by the Australian Biocommons and its partners; the University of Sydney’s High Performance Computing facility Artemis provided by the Sydney Informatics Hub; the University of New South Wales Katana High Performance Computing (doi:10.26190/669X-A286). Support for DNA sequencing was provided through the Oz Mammals Genomics (OMG) Initiative consortium (https://ozmammalsgenomics.com/consortium/), which was funded by Bioplatforms Australia through the Australian Government National Collaborative Research Infrastructure Strategy.
The eastern bettong reintroduction was conducted as part of and with support of the “Mulligans Flat – Goorooyarroo Woodland Experiment” (https://www.coexistenceconservationlab.org/mulligans-flat-goorooyarroo-woodland-experiment). Thanks to the ACT Government and Woodlands and Wetlands Trust and their staff for their support for the eastern bettong reintroduction project at Mulligans Flat Woodland Sanctuary. Thanks to Brittany Brocket for the initial DNA extraction.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: molecular phylogenetics, phylogeography, (molecular) taxonomy, conservation genetics
Are the rationale for sequencing the genome and the species significance clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Partly
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: molecular phylogenetics, phylogeography, (molecular) taxonomy, conservation genetics
Are the rationale for sequencing the genome and the species significance clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?
Partly
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genomics, phylogenetics, mammalogy
Are the rationale for sequencing the genome and the species significance clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others?
Partly
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Population genetics, molecular evolution, genomics
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | 1 | 2 | 3 |
---|---|---|---|
Version 2 (revision), 27 Jan 25 | read | | |
Version 1, 20 Dec 24 | read | read | read |