An updated version of the Madagascar periwinkle genome

The Madagascar periwinkle, Catharanthus roseus, belongs to the Apocynaceae family. This medicinal plant, endemic to Madagascar, produces many important drugs including the monoterpene indole alkaloids (MIA) vincristine and vinblastine used to treat cancer worldwide. Here, we provide a new version of the C. roseus genome sequence obtained through the combination of Oxford Nanopore Technologies long-reads and Illumina short-reads. This more contiguous assembly consists of 173 scaffolds with a total length of 581.128 Mb and an N50 of 12.241 Mb. Using publicly available RNAseq data, 21,061 protein coding genes were predicted and functionally annotated. A total of 42.87% of the genome was annotated as transposable elements, most of them being long-terminal repeats. Together with the increasing access to MIA-producing plant genomes, this updated version should ease evolutionary studies leading to a better understanding of MIA biosynthetic pathway evolution.


Introduction
The Madagascar periwinkle, Catharanthus roseus (L.) G. Don, is an Apocynaceae plant native to Madagascar. C. roseus produces several specialized metabolites, including monoterpene indole alkaloids (MIA; O'Connor and Maresh, 2006). Plants produce these molecules to face biotic and abiotic pressures, which accounts for their wide range of bioactive properties (Dugé de Bernonville et al., 2015). Above all, MIAs produced by C. roseus are part of the human pharmacopoeia against cancer, notably vinblastine and vincristine, along with other MIA derivatives such as vinorelbine (O'Connor and Maresh, 2006). Due to its high economic importance, C. roseus has been extensively studied over the last three decades, becoming the model species for MIA biosynthetic pathway studies (see Pan et al., 2016 and Kulagina et al., 2022 for extensive reviews). The C. roseus genome was first sequenced in 2015 (Kellner et al., 2015). More recently, a more contiguous version (v2) was generated to ease inter-species genomic comparison (Franke et al., 2019). To date, C. roseus genome sequencing and assembly have not benefited from the development of third-generation sequencing technologies, which lead to more contiguous genome assemblies (Jiao and Schneeberger, 2017). Using these technologies, we present here an even more contiguous genome assembly. This updated version (v2.1) should ease inter-species studies aimed at better understanding the diversification of MIAs and the evolution of their biosynthetic pathways.

Results
Genome assembly
The C. roseus genome was assembled from ONT long-reads using Flye (v.2.5), resulting in a 651.9 Mb assembly distributed across 788 contigs. This assembly was collapsed using purge_haplotigs into 173 scaffolds reducing length to 585,

Gene annotation
RNA-seq-based gene model prediction using publicly available data resulted in a total of 21,061 genes. Although fewer genes were annotated, a higher BUSCO score was obtained (Figure 1). The combination of BLASTP and BLASTX against the UniProt database and hmmscan against the PFAM database led to the functional annotation of 76.5% of the predicted genes (16,118 of the 21,061 genes; Supplementary Table S1 in Underlying data (Cuello et al., 2022)). All functionally validated MIA biosynthetic genes from C. roseus could be found in this new version (v.2.1) of the genome, with identity and coverage percentages ranging from 95 to 100% and 94 to 100%, respectively, with the exception of G10H and DAT (Supplementary Tables S2-S3 in Underlying data (Cuello et al., 2022)).
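The contiguity figures reported for an assembly (number of sequences, total length, N50) can all be derived from the list of sequence lengths. The following is a minimal Python sketch of the N50 calculation on hypothetical toy lengths. It is not the authors' pipeline, and the function name `n50` and the toy values are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, NOT the C. roseus scaffolds.
toy_lengths = [10, 8, 6, 4, 2]
print(len(toy_lengths), sum(toy_lengths), n50(toy_lengths))  # 5 30 8
```

In this toy case the two longest sequences (10 + 8 = 18) already cover more than half of the total length of 30, so the N50 is 8. Collapsing redundant haplotigs shortens the total length and removes many small sequences, which is one reason the N50 typically rises after such a step.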

Transposable element annotation
Finally, we analyzed the TE composition of this updated C. roseus genome. While 38.78% of the genome consisted of TEs in C. roseus v.2, a higher proportion (42.87%) was annotated as TEs in this new version (v.2.1), with a similar distribution across the different TE families (Figure 2). Notably, the TE proportion of v.2.1 is closer to that of the recently sequenced, closely related species Vinca minor (Stander et al., 2022).
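A genome-wide TE proportion such as the one above is commonly obtained by dividing the number of bases covered by TE annotations by the assembly length, counting overlapping annotations only once. The text does not state how the reported percentages were computed, so the sketch below is only one plausible approach under that assumption. The function name `covered_fraction`, the half-open coordinates, and the single-sequence simplification are all illustrative choices.

```python
def covered_fraction(intervals, sequence_length):
    """Fraction of a sequence covered by annotation intervals.

    intervals: iterable of (start, end) pairs in half-open coordinates
    on a single sequence; overlapping intervals are merged so that no
    base is counted twice.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / sequence_length


# Hypothetical annotations on a 100 bp sequence, NOT real C. roseus data.
toy_annotations = [(0, 10), (5, 20), (30, 40)]
print(covered_fraction(toy_annotations, 100))  # 0.3
```

For a multi-scaffold assembly, the same merge would be applied per scaffold and the covered bases summed before dividing by the total assembly length.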

Amit Rai
Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
In this article, the authors report an updated genome assembly for the Madagascar periwinkle, a valuable model plant species for studying MIA biosynthesis. Compared to the previously published genome assemblies for the Madagascar periwinkle, this study used long-read sequencing technology and achieved an improvement in contig N50.
Without a doubt, this is a better genome assembly, but the authors should have considered Hi-C scaffolding to achieve a chromosome-scale assembly, as that would have allowed them to discover novel features contributing to MIA biosynthesis and evolution.
Nevertheless, the resource presented here is valuable and will inspire researchers to combine the datasets generated in this study with new sequencing data to derive a chromosome-scale genome assembly for C. roseus. For these reasons, I support its indexing.
Are the rationale for sequencing the genome and the species significance clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of the sequencing and extraction, software used, and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a usable and accessible format, and the assembly and annotation available in an appropriate subject-specific repository? Yes
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.