The genome sequence of the critically endangered Kroombit tinkerfrog (Taudactylus pleione)

The Kroombit tinkerfrog (Taudactylus pleione) is a stream-dwelling amphibian of the family Myobatrachidae. It is listed as Critically Endangered and is at high risk of extinction due to chytridiomycosis. Here, we provide the first genome assembly for the evolutionarily distinct genus Taudactylus. We used PacBio HiFi reads to assemble a high-quality long-read genome and identified the mitochondrial genome. We also generated a global transcriptome from a tadpole to improve gene annotation. The genome was 5.52 Gb in length and consisted of 4,196 contigs, with a contig N50 of 8.853 Mb and an L50 of 153. This study provides the first genomic resources for the Kroombit tinkerfrog to support future phylogenetic, environmental DNA, conservation breeding, and disease susceptibility studies.


Introduction
The Kroombit tinkerfrog (Taudactylus pleione) is a stream-dwelling anuran of the family Myobatrachidae. It is endemic to Queensland, Australia, where its distribution is restricted to fragmented patches above 400 m altitude on an isolated plateau in the Kroombit Tops temperate rainforest (Skerratt et al., 2016). The Kroombit tinkerfrog is listed as Critically Endangered by the International Union for Conservation of Nature (IUCN), with fewer than 200 individuals estimated to remain within an area of occupancy of just 19 km² (IUCN SSC Amphibian Specialist Group, 2022), and it is the highest-ranked frog species requiring management action in Australia (Gillespie et al., 2020). Threatening processes include infection by the chytrid fungus Batrachochytrium dendrobatidis, habitat degradation due to agriculture, feral animals and plants, and fire (Hines, 2014). The Kroombit tinkerfrog was identified as one of seven Australian amphibians at high risk of extinction due to chytridiomycosis (Skerratt et al., 2016), and as the fifth most likely frog to go extinct in an analysis of 26 Critically Endangered and Endangered Australian frogs (Geyle et al., 2022). A captive breeding program was established at Currumbin Wildlife Sanctuary in 2018 with the aim of releasing captive-bred tinkerfrogs back into the wild.
The genus Taudactylus is estimated to have diverged from other myobatrachids 65 million years ago, contributing to the Kroombit tinkerfrog's high Evolutionary Distinctiveness and Global Endangerment (EDGE) score of 6.52, which ranks it as the seventh-highest EDGE amphibian (Zoological Society of London, 2020). However, no reference genomes have yet been published for the genus Taudactylus. The Kroombit tinkerfrog is primarily nocturnal and secretive, making it difficult to find (Clarke, 2006). Characterising the mitochondrial genome may therefore assist efforts to develop environmental DNA (eDNA) approaches for monitoring the species in the wild using freshwater samples, as has been demonstrated for other endangered frog species (Eiler et al., 2018; Villacorta-Rath et al., 2021). In this study, we therefore sequenced DNA and RNA to assemble the genome, mitogenome, and transcriptomes, providing the first genomic resources for the Kroombit tinkerfrog.

Sample collection and DNA/RNA extraction
Due to the critically endangered status of the Kroombit tinkerfrog, we did not lethally sample an adult. Instead, three tadpoles of unknown sex from the captive breeding program at Currumbin Wildlife Sanctuary were medically euthanised owing to a failure to thrive. Euthanasia was by immersion in 10 mL of 250 mg/L tricaine (MS222) buffered to pH 7 with sodium bicarbonate, until a visibly detectable heartbeat ceased or, in very small tadpoles, until reflexes were absent after prolonged immersion (University of Sydney Animal Research Authority 2021/1899). Tadpoles were then either flash frozen at -80°C or preserved in RNAlater before storage at -80°C. The tadpoles were skinned to avoid pigmentation issues that could affect sequencing. High molecular weight (HMW) DNA was extracted from the flash-frozen tadpole tissue using the Nanobind Tissue Big DNA Kit v1.0 11/19 (Circulomics). DNA concentration was measured on a Qubit fluorometer with the Qubit dsDNA BR assay kit (Thermo Fisher Scientific). RNA was extracted from the other two tadpoles, which had been preserved in RNAlater, using the RNeasy Plus Mini Kit (Qiagen) with RNase-free DNase (Qiagen) digestion. Extractions were performed using tissue from the head, midsection, and tail of the tadpoles. Only the tissues from one tadpole yielded RNA of acceptable quality, as determined by NanoDrop (Thermo Fisher Scientific), so only these were sequenced.

Library construction and sequencing
We first performed short-read sequencing to estimate genome size, which was previously unknown. HMW DNA underwent PCR-Free DNA Preparation and Illumina NovaSeq 150 bp paired-end sequencing at the Australian Genome Research Facility, Melbourne, Australia. GenomeScope v1.0 (Vurture et al., 2017) estimated the haploid genome size at 3.1 Gb. On the basis of this estimate, HMW DNA was sent for PacBio HiFi library preparation with Pippin Prep size selection and sequencing on three single-molecule real-time (SMRT) cells of the PacBio Sequel II (Australian Genome Research Facility, Brisbane, Australia). Additional HMW DNA from the same tadpole was later sequenced on a fourth SMRT cell after the initial assembly had low coverage because the genome was larger than expected (see Results).
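GenomeScope fits a mixture model to a k-mer depth histogram. The principle behind k-mer-based size estimates can be illustrated with a much simpler calculation (not GenomeScope's method): divide the total number of k-mers by the depth of the main homozygous peak. In this minimal sketch, the function name, the `min_depth` error cutoff, and the optional `max_depth` cap are illustrative assumptions; the cap shows how discarding very high-depth (highly repeated) k-mers lowers the estimate.

```python
from typing import Dict, Optional


def estimate_genome_size(
    histogram: Dict[int, int],
    min_depth: int = 5,
    max_depth: Optional[int] = None,
) -> float:
    """Rough haploid genome size from a k-mer depth histogram.

    `histogram` maps depth (how many times a k-mer was seen) to the number
    of distinct k-mers observed at that depth, as produced by a k-mer counter.
    Depths below `min_depth` are treated as sequencing errors; depths above
    `max_depth` (if given) are discarded, as a truncated histogram would do.
    """
    kept = {
        depth: n
        for depth, n in histogram.items()
        if depth >= min_depth and (max_depth is None or depth <= max_depth)
    }
    if not kept:
        raise ValueError("no k-mers left after applying the depth cutoffs")
    # Assume the most populated remaining depth is the homozygous peak.
    peak_depth = max(kept, key=lambda depth: kept[depth])
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical numbers, not the tinkerfrog data).
toy = {1: 1000, 2: 100, 10: 50, 20: 300, 21: 200, 40: 10}
print(estimate_genome_size(toy))                # 555.0
print(estimate_genome_size(toy, max_depth=30))  # 535.0
```

Picking the single most populated depth as the peak is a simplification; in heterozygous genomes the heterozygous peak can be taller, which is one reason model-fitting tools are preferred.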

Total RNA from the head, midsection, and tail of one tadpole was sequenced as 100 bp paired-end reads on an Illumina NovaSeq 6000 following Illumina Stranded mRNA library preparation at the Ramaciotti Centre for Genomics (University of New South Wales, Sydney, Australia).

Amendments from Version 1
In this revision, we add context for the discrepancy between the short-read estimate of genome size (3.1 Gb) and the assembled genome size (5.519 Gb). We hypothesise that the short-read data underestimated genome size because of the known limitations of short-read data in resolving repetitive regions of the genome. The Kroombit tinkerfrog genome was highly repetitive, with 63.35% of the total sequence annotated as repeat elements. We have also updated Figure 2 to be more easily readable. No other changes have been made.

Transcriptome assembly
Transcriptome assembly was conducted on Artemis, the University of Sydney's high-performance computer. The raw transcriptome reads were quality-assessed with FastQC v0.11.8 (Andrews, 2010) both before and after quality trimming. The completeness of the global transcriptome was assessed using BUSCO v5.2.2 in 'transcriptome' mode with the vertebrata_odb10 lineage.
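BUSCO summarises completeness as a one-line string giving complete (split into single-copy and duplicated), fragmented, and missing percentages, plus the number of BUSCOs searched. A small parser makes these values easy to tabulate and sanity-check across the transcriptome and annotation runs. This is a minimal sketch: the regular expression and helper names are assumptions for illustration, not part of BUSCO itself, and the example line is written by hand from the annotation scores reported in the Results.

```python
import re
from typing import Dict

# BUSCO's one-line summary, e.g. "C:84.1%[S:81.7%,D:2.4%],F:9.1%,M:6.8%,n:<count>".
# The trailing ",n:<count>" part is treated as optional here.
BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%(?:,n:(?P<n>\d+))?"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Extract C/S/D/F/M percentages (and n, if present) from a summary line."""
    match = BUSCO_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    return {key: float(value) for key, value in match.groupdict().items() if value is not None}


def is_consistent(scores: Dict[str, float], tol: float = 0.15) -> bool:
    """Check that complete = single-copy + duplicated and C + F + M is ~100%.

    `tol` absorbs rounding of percentages reported to one decimal place.
    """
    complete_ok = abs(scores["S"] + scores["D"] - scores["C"]) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


scores = parse_busco_summary("C:84.1%[S:81.7%,D:2.4%],F:9.1%,M:6.8%")
print(scores["C"], is_consistent(scores))  # 84.1 True
```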

Genome annotation
Genome annotation was performed using FGENESH++ v7.2.2 (Softberry; Solovyev et al., 2006) on a Pawsey Supercomputing Centre Nimbus cloud machine (256 GB RAM, 64 vCPU, 3 TB storage), using the longest open reading frames predicted from the global transcriptome, non-mammalian settings, and the optimised parameters supplied with the Xenopus (generic) gene-finding matrix. BUSCO v5.2.2 in 'protein' mode, with the vertebrata_odb10 lineage, was used to assess the completeness of the annotation. The 'genestats' script (GitHub) was used to obtain the average number of exons and introns per gene and the average exon and intron lengths.
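Per-gene exon and intron statistics of the kind reported in the Results (means with standard errors) can be derived from exon coordinates alone. The sketch below is a hypothetical re-implementation, not the 'genestats' script: it assumes exons are already grouped by gene as 1-based inclusive (start, end) pairs, as in GFF/GTF, and treats the gap between consecutive exons of a gene as an intron.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF/GTF


def mean_se(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if not values:
        return math.nan, math.nan
    if len(values) == 1:
        return float(values[0]), math.nan
    return float(mean(values)), stdev(values) / math.sqrt(len(values))


def gene_structure_stats(genes: Dict[str, List[Exon]]) -> Dict[str, Tuple[float, float]]:
    """(mean, SE) for exons and introns per gene and for exon and intron lengths."""
    exons_per_gene: List[float] = []
    introns_per_gene: List[float] = []
    exon_lengths: List[float] = []
    intron_lengths: List[float] = []
    for exons in genes.values():
        ordered = sorted(exons)
        exons_per_gene.append(len(ordered))
        introns_per_gene.append(len(ordered) - 1)
        exon_lengths.extend(end - start + 1 for start, end in ordered)
        # Intron length = bases strictly between one exon's end and the next exon's start.
        intron_lengths.extend(
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        )
    return {
        "exons_per_gene": mean_se(exons_per_gene),
        "introns_per_gene": mean_se(introns_per_gene),
        "exon_length": mean_se(exon_lengths),
        "intron_length": mean_se(intron_lengths),
    }


# Two toy genes (hypothetical coordinates).
toy_genes = {
    "gene1": [(1, 100), (201, 300), (401, 450)],
    "gene2": [(1000, 1099), (1300, 1399)],
}
stats = gene_structure_stats(toy_genes)
print(stats["exons_per_gene"], stats["intron_length"])
```

Pooling all exons (rather than averaging per gene first) for the length statistics is a design choice; the original script may weight genes differently.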

Genome size and assembly
Initial genome size prediction from the short-read sequencing data gave a haploid length of 3.1 Gb (Figure 1). The initial assembly of PacBio HiFi data from three SMRT cells yielded a genome of 5.59 Gb, comprising 9,966 contigs with a contig N50 of 2.401 Mb. We hypothesise that the short-read data underestimated genome size because of the highly repetitive nature of large amphibian genomes (Kosch et al., 2023) and the known limitations of short reads, which are too short to span long repeats and may collapse repeats in the assembly (Wang et al., 2021). Because genome size had been underestimated, coverage of the initial assembly was low (14×), so we re-assembled the genome with the addition of a fourth SMRT cell. This yielded a genome of 5.519 Gb comprising 4,196 contigs, with an improved contig N50 of 8.853 Mb and a coverage of 21× (Table 1).

The mitochondrial genome was 22,974 bp long and consisted of 38 genes, including 13 protein-coding genes, 2 rRNAs, and 23 tRNAs, with a GC content of 41.89% (Figure 2).

A total of 14,448 predicted genes were used as evidence for genome annotation. Repetitive elements comprised 63.35% of the total genomic sequence, with unclassified repeats accounting for 37.53% (Table 2). A total of 70,371 genes were predicted from the annotation. This is likely an overestimate of the true number of protein-coding genes, which is expected to be between 20,000 and 30,000 (Sun et al., 2020), possibly due to a lack of homology-based evidence for amphibians. There was an average of 4.9 exons (SE = 0.03) and 3.9 introns (SE = 0.03) per putative gene, with an average exon length of 340 bp (SE = 16) and an average intron length of 7,187 bp (SE = 220). The annotation had 84.1% complete BUSCOs (single-copy: 81.7%; duplicated: 2.4%), 9.1% fragmented BUSCOs, and 6.8% missing BUSCOs.
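Contig N50, L50, and mean coverage are simple summaries of contig lengths and read yield. The sketch below shows the usual N50/L50 definition and the coverage ratio; the function names are illustrative. The coverage arithmetic is a back-of-envelope assumption: it treats the reported 14× as total HiFi bases divided by assembly length, which the text does not state explicitly, and uses it only to show why the three-cell yield looked adequate against the 3.1 Gb estimate.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a set of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig at which
    the running total first reaches half the assembly size, and L50 is how
    many contigs it took to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise RuntimeError("unreachable: running total always reaches half")


def mean_coverage(total_read_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total read bases divided by genome size."""
    return total_read_bases / genome_size


print(n50_l50([10, 8, 6, 4, 2]))  # (8, 2)

# Assumption: 14x = HiFi bases / 5.59 Gb assembly. The same bases against the
# 3.1 Gb short-read estimate would have appeared to give roughly 25x.
implied_bases = 14 * 5.59e9
print(round(mean_coverage(implied_bases, 3.1e9), 1))  # 25.2
```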
In summary, we have generated a high-quality, annotated draft reference genome from long-read data, together with a mitogenome and a global transcriptome, for the critically endangered Kroombit tinkerfrog, providing the first genome for the genus Taudactylus.

Ethical considerations
Tadpoles were sampled under the University of Sydney's Animal Research Authority (Ethics) 2021/1899. Samples were held at the laboratory under NSW Scientific Licence SL101204.

The manuscript "The genome sequence of the critically endangered Kroombit tinkerfrog (Taudactylus pleione)" presents the first genome assembly and annotation, and the mitogenome, of an endangered frog from Australia. The methods used are appropriate and detailed in the manuscript.
The manuscript is an important contribution to Kroombit tinkerfrog conservation. Future characterizations of transposable elements and the satellitome will improve the genome annotation, since repetitive DNA accounts for a large fraction of amphibian genomes.
Regarding the mitogenome, I suggest that the authors calculate the AT- and GC-skews for the genes to improve the mitogenome characterization. Additionally, in Figure 2, some tRNA gene names are incomplete: the information on the corresponding amino acid is missing.
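The AT- and GC-skews suggested above are conventionally computed from base counts as (A - T)/(A + T) and (G - C)/(G + C). A minimal sketch, assuming each gene's sequence is supplied on whichever strand is to be reported; the function name is illustrative.

```python
from collections import Counter
from typing import Tuple


def at_gc_skew(seq: str) -> Tuple[float, float]:
    """AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C).

    Computed on the strand given; ambiguous bases (N, etc.) are ignored.
    Returns 0.0 for a skew whose denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


print(at_gc_skew("AAATGGCC"))  # (0.5, 0.0)
```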
Are the rationale for sequencing the genome and the species significance clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Evolutionary biology; Cytogenomics; Fish genomics

The article "The genome sequence of the critically endangered Kroombit tinkerfrog (Taudactylus pleione)" presents the first genome, mitogenome and transcriptome assemblies for the Kroombit tinkerfrog, a Critically Endangered species from Australia. The authors clearly explain how valuable this dataset is for the species' conservation and potential future evolutionary biology studies.
The methods used are appropriate and sufficiently detailed in the text. It would be great to have a more polished annotation, but the authors do mention that their number of predicted genes is likely an overestimation.
Overall, I believe that this article is an important contribution to amphibian genomics and I'm looking forward to seeing how the authors use this dataset in their future research.
Are the rationale for sequencing the genome and the species significance clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository? Yes
Competing Interests: No competing interests were disclosed.

Natalie Forsdick
School of Biological Sciences, University of Canterbury, Christchurch, Canterbury, New Zealand

The manuscript 'The genome sequence of the critically endangered Kroombit tinkerfrog (Taudactylus pleione)' presents a brief description of the genome assembly and annotation, and mitogenome assembly, for a frog of high conservation value. The methods used are clearly described and appropriate, and produced good results in terms of genome contiguity and annotation completeness. I look forward to seeing these resources used to support conservation in the future, including through non-invasive monitoring using eDNA methods.
My only query is whether any attempt at flow cytometry had been considered to assess genome size. Is the discrepancy between the GenomeScope estimate based on short-read data and the final assembly size purely the result of a high proportion of repetitive elements? I recommend that Figure 2 be replaced with a higher-resolution version, as it is currently quite grainy, making it difficult to read the smaller text.
Are the rationale for sequencing the genome and the species significance clearly described? Yes

Trimmomatic v0.39 (Bolger et al., 2014) was used to quality-trim reads, specifying TruSeq3-PE adapters, SLIDINGWINDOW:4:5, LEADING:5, TRAILING:5 and MINLEN:25. The repeat-masked genome was indexed and reads were aligned with HISAT2 v2.1.0 (Kim et al., 2019). The resulting SAM files were converted to coordinate-sorted BAM format with SAMtools v1.9 view and sort. StringTie v2.1.6 (Pertea et al., 2015) generated a GTF for each transcriptome. The aligned RNA-seq reads were then merged into transcripts and filtered to remove transcripts found in only one tissue with FPKM < 0.1, using TAMA-merge v2020/12/17 (Kuo et al., 2020) and CPC2 v2019-11-19 (Kang et al., 2017). TransDecoder v2.0.1 (Haas, 2022) was used to predict open reading frames in the resulting global transcriptome.
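The tissue/FPKM filter described above can be read in more than one way. A minimal sketch of one reading, assuming a transcript is discarded only when it was detected in a single tissue and its FPKM there is below 0.1; the input layout and function name are hypothetical, and the sketch does not reproduce TAMA-merge or CPC2 (CPC2, for instance, assesses coding potential rather than expression).

```python
from typing import Dict, Set


def filter_transcripts(
    fpkm_by_tissue: Dict[str, Dict[str, float]],
    min_fpkm: float = 0.1,
) -> Set[str]:
    """Keep transcripts unless they were detected in a single tissue at low FPKM.

    `fpkm_by_tissue` maps transcript ID -> {tissue: FPKM}. A transcript counts
    as detected in a tissue when its FPKM there is above zero.
    """
    kept: Set[str] = set()
    for transcript_id, by_tissue in fpkm_by_tissue.items():
        detected = [value for value in by_tissue.values() if value > 0]
        if not detected:
            continue  # never expressed: nothing to keep
        if len(detected) == 1 and detected[0] < min_fpkm:
            continue  # single-tissue and weakly expressed: treat as noise
        kept.add(transcript_id)
    return kept


toy = {
    "tx1": {"head": 5.0, "tail": 0.05},  # two tissues: kept
    "tx2": {"head": 0.05},               # one tissue, FPKM < 0.1: dropped
    "tx3": {"midsection": 2.0},          # one tissue, well expressed: kept
    "tx4": {"head": 0.0, "tail": 0.0},   # not detected anywhere: dropped
}
print(sorted(filter_transcripts(toy)))  # ['tx1', 'tx3']
```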

Figure 1. GenomeScope profile based on the Illumina short-read sequencing data. The total length of the genome sequence was estimated at 3.119 Gb.