DNA barcoding for the discrimination of Uncaria gambir and its closely related species using internal transcribed spacer genes [version 1; peer review: awaiting peer review]

Background: Uncaria gambir is an Uncaria species that grows exclusively in Indonesia. The phytochemical constituents of this species have been widely explored, and its extracts are used as traditional medicine. However, the relationship between Uncaria gambir and other Uncaria species is still unknown. DNA barcoding was used in this study to reveal this relationship. Methods: Genomic DNA was isolated from the four main cultivated variants of Uncaria gambir in Indonesia. ITS primers were used to amplify the target region. Genetic distance analysis was carried out on Uncaria gambir and 12 other Uncaria species. A phylogenetic tree was constructed using the maximum likelihood method to determine the relationships among Uncaria species. Results: The ITS primers successfully amplified the ITS region in Uncaria gambir. Genetic distance and phylogenetic tree analyses showed that Uncaria gambir is closely related to Uncaria scandens, Uncaria yunnanensis, and Uncaria macrophylla, which is also indicated by interspecific distance analysis. Conclusions: Although a DNA barcoding gap is absent in the genetic distance analysis, phylogenetic tree analysis of the ITS region can differentiate Uncaria gambir from other Uncaria species.


Introduction
Gambier, extracted from the twigs and leaves of Uncaria gambir, is traditionally used in Indonesia to treat various conditions, such as diarrhea, gastrointestinal diseases, burns, acne, and cancer (Musdja et al., 2018). Uncaria gambir is commonly distributed in the tropical regions of Southeast Asia, mostly in Indonesia and Malaysia (Taniguchi et al., 2007). The Sumatera Barat region is the biggest contributor of gambier, with Udang, Cubadak, Riau, and Mancik being the main cultivated variants in Indonesia.
The identification of Uncaria gambir is still based on morphological characteristics and phytochemical constituents. Such identification is strongly affected by environmental conditions and by the condition of the sample, which can bias the interpretation (Zhu et al., 2011). To address this problem, DNA barcoding can be used to discriminate species with short, standardized DNA fragments (Li et al., 2021). The Consortium for the Barcode of Life (CBOL) has approved chloroplast gene sequences, such as psbA-trnH, trnL-trnF, ycf1, and rpoC1, as potential DNA barcodes and suggests a combination of the matK and rbcL genes as an alternative barcode for Embryophyta (Hollingsworth et al., 2009). In addition, the internal transcribed spacer (ITS) genes of the ribosomal region have been proposed as supplemental barcodes to matK and rbcL (Hollingsworth et al., 2009).
The ITS gene is located between the 18S and 28S genes in the nrDNA repeat unit and covers the ITS1 and ITS2 regions, which are connected by the 5.8S rRNA gene (Bellemain et al., 2010). Previous studies have used restriction fragment length polymorphism (RFLP) and random amplified polymorphic DNA (RAPD) to investigate catechin molecular markers in Uncaria gambir (Istino et al., 2013). Molecular identification using the matK and rbcL genes gave a low amplification success rate and could not accurately discriminate among Uncaria gambir variants (Wardi et al., 2020). In a recent study, the ITS-2 gene yielded only one species-specific site among Uncaria gambir variants, and no genetic distance analysis was performed (Wardi et al., 2021b).
In this study, we investigated the use of the ITS gene to discriminate Uncaria gambir from 12 other Uncaria species. Genetic distance analysis was carried out to provide data on the relationship between Uncaria gambir and these 12 species. This study is the first to provide an ITS gene sequence from Uncaria gambir.

DNA extraction, PCR amplification and sequencing
Six fresh leaves of four Uncaria gambir variants were collected from Siguntur, West Sumatera Province, Indonesia (1°05'36.8"S 100°28'27.0"E; -1.093562, 100.474153). Morphological identification was conducted in the Herbarium of Andalas University. Harvested fresh leaves were frozen immediately in liquid nitrogen and stored at -80°C for 24 hours prior to DNA isolation. DNA isolation was performed using the CTAB method (Allen et al., 2006). In total, 300 mg of frozen leaf sample was ground and put into a 2 mL Eppendorf tube. Then 1 mL of 2x CTAB buffer with mercaptoethanol was added and the mixture was vortexed until homogenized. The solution was incubated at 65°C for 30 minutes and inverted every 10 minutes. Next, 500 µL of phenol:chloroform:isoamyl alcohol mixture (25:24:1) was added, and the solution was vortexed for 1 minute and then centrifuged at 12,000 rpm for 10 minutes. The supernatant was transferred to a new 2 mL Eppendorf tube. Then 500 µL of chloroform:isoamyl alcohol mixture (24:1) was added, and the solution was vortexed for 10 minutes and then centrifuged at 12,000 rpm for 10 minutes. The supernatant was transferred to a new 1.5 mL Eppendorf tube. Sodium acetate was added at 1/10 of the supernatant volume, then 1 mL of cold 99% ethanol was added and the tube was swirled for 1 minute. The solution was centrifuged at 12,000 rpm for 5 minutes, the supernatant was removed, and 500 µL of 70% ethanol was added. The solution was centrifuged again at 12,000 rpm for 5 minutes, the supernatant was discarded, and the DNA was dried in an oven. Finally, 100 µL of 1x TE buffer was added to the dried DNA, which was then stored at -20°C.
The isolation and amplification results were visualized by electrophoresis on a 1.5% agarose gel. The expected amplification product was sequenced bidirectionally by 1st BASE DNA Sequencing Service (Apical Scientific Sdn Bhd) using the specific primers.

Sequence and distance analysis
A total of 132 Uncaria ITS sequences were downloaded from the National Center for Biotechnology Information (NCBI) and aligned with the Uncaria gambir sequences using MEGA X software version 10.2.6 (Tamura et al., 2013). The intraspecific and interspecific distances were computed with the Kimura 2-Parameter (K2P) model using MEGA X software (Kimura, 1980; Meier et al., 2006).
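As an illustration of the K2P distance (Kimura, 1980), the following minimal Python sketch computes d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] for one pair of aligned sequences, where P and Q are the proportions of transitional and transversional differences. It is not the MEGA X implementation used in this study: the function name `k2p_distance` is hypothetical, and the handling of gaps and ambiguous bases (pairwise deletion) is an assumption, since the gap treatment is not stated here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: pairwise deletion, so sites with gaps or
        # ambiguous bases in either sequence are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Calling `k2p_distance` on every pair of aligned sequences and grouping the results by whether the two sequences belong to the same species gives the intraspecific and interspecific distance sets.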

Phylogenetic analysis
The sequences were analyzed by the maximum likelihood method for phylogenetic tree construction. The maximum likelihood analysis, including 1000 nonparametric bootstrap replicates, was performed using MEGA X software under the Kimura 2-parameter (K2P)+G model. Sequences of Nauclea orientalis and Nauclea diderrichi, which are related to the Uncaria genus, were downloaded from NCBI and used as outgroup species.
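The resampling step of the nonparametric bootstrap can be sketched as follows. Each replicate draws alignment columns with replacement; a tree would then be inferred from every replicate, and the fraction of replicate trees containing a clade gives its bootstrap support. Only the resampling is shown here, and the tree inference itself (done in MEGA X in this study) is omitted. The function name `bootstrap_alignment`, the toy alignment, and the random seed are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement; the same indices are
    # applied to every sequence so that columns stay intact.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGAAC",
    "taxon_C": "ACTTACGAAC",
}
rng = random.Random(42)
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```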

Sequences and distance analysis
The primers ITS-u1 and ITS-u4 showed 100% amplification efficiency for Uncaria gambir. Sequences were deposited in the NCBI GenBank database under the accession numbers MZ927014, MZ927015, MZ927016, and MZ926993 (Table 1). The amplified region ranged from 644 to 646 bp in length.
The dataset used in this study includes the four Uncaria gambir sequences and 132 Uncaria sequences of the ITS region downloaded from the NCBI GenBank database (Extended data). This dataset contained 655 conserved sites, 68 variable sites, and 54 parsimony-informative sites.
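The site categories reported above can be illustrated with a minimal Python sketch. It uses the standard definitions: a site is variable if at least two different bases occur in the column, and parsimony-informative if at least two bases each occur in at least two sequences, so parsimony-informative sites are a subset of variable sites. The function name `classify_sites` and the toy alignment are hypothetical, and ignoring gaps and ambiguous bases is an assumption rather than a documented setting of this study.

```python
from collections import Counter


def classify_sites(sequences: list) -> dict:
    """Count conserved, variable and parsimony-informative alignment sites."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    counts = {"conserved": 0, "variable": 0, "parsimony_informative": 0}
    for column in zip(*sequences):
        # Assumption: gaps and ambiguous bases are ignored.
        bases = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if not bases:
            continue  # column holds no unambiguous base
        if len(bases) == 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two bases, each seen in two or more sequences.
        if sum(1 for n in bases.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
    return counts


# Hypothetical toy alignment, not data from this study.
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGCC"]
site_counts = classify_sites(toy)
```

In the toy alignment, columns 2 and 4 are parsimony-informative, column 5 is variable but carries only a singleton difference, and columns 1 and 3 are conserved.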
The mean of intraspecific distance (Table 2) and interspecific distance (

Phylogenetic analysis
The phylogenetic tree was constructed using 13 Uncaria species. The Uncaria gambir variants clustered together, apart from the other species (Figure 1). The four variants of Uncaria gambir had a close relationship with Uncaria macrophylla MF033303.
The phylogenetic tree showed Uncaria gambir has a close relationship with Uncaria yunnanensis but not Uncaria scandens. In contrast, the interspecific distance value showed Uncaria gambir had a close relationship with Uncaria yunnanensis and Uncaria scandens.
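The notion of a DNA barcoding gap that underlies the distance comparison can be sketched in Python. In one common formulation, a gap exists when the smallest interspecific distance is larger than the largest intraspecific distance. The function name `barcoding_gap`, the sample labels, and all distance values below are hypothetical and are not results of this study.

```python
from itertools import combinations


def barcoding_gap(species_of: dict, distance: dict) -> tuple:
    """Return (max intraspecific, min interspecific, gap present?)."""
    intra, inter = [], []
    for a, b in combinations(sorted(species_of), 2):
        d = distance[frozenset((a, b))]
        if species_of[a] == species_of[b]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific pairs")
    return max(intra), min(inter), min(inter) > max(intra)


# Hypothetical samples and pairwise distances, for illustration only.
species_of = {"g1": "sp_A", "g2": "sp_A", "y1": "sp_B", "y2": "sp_B"}
distance = {
    frozenset(("g1", "g2")): 0.002,
    frozenset(("y1", "y2")): 0.010,
    frozenset(("g1", "y1")): 0.006,
    frozenset(("g1", "y2")): 0.008,
    frozenset(("g2", "y1")): 0.007,
    frozenset(("g2", "y2")): 0.009,
}
max_intra, min_inter, has_gap = barcoding_gap(species_of, distance)
```

In this toy case the largest intraspecific distance exceeds the smallest interspecific distance, so no barcoding gap is present and a distance threshold alone could not separate the two species.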

Discussion
Uncaria is a member of the Rubiaceae family and consists of 34 species (Ridsdale, 1978) that are well known as medicinal plants (Heitzman et al., 2005). The method to identify Uncaria      (Liu et al., 2014; Rach et al., 2008). The phylogenetic tree was created with the maximum likelihood method. The advantage of this method over maximum parsimony and distance methods is that it includes evolutionary models to account for the variation in the sequences (Mount, 2008).

Conclusion
In conclusion, phylogenetic tree analysis of ITS sequences can differentiate Uncaria gambir from other Uncaria species. In contrast, genetic distance analysis gave different results: because no DNA barcoding gap was present, it is not reliable for differentiating between species.

Data availability
This project contains the following extended data (Wardi et al., 2021a):
• Table_E1_GeneBank_Accession_Numbers.docx (This file contains the accession numbers of 132 Uncaria sequences and two accession numbers from Nauclea.)
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).