Genomic profiles of Indonesian colorectal cancer patients

Background: Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide, and genetic mutations play a vital role in CRC development. A previous study suggested that genetic alterations among Indonesian patients with CRC might differ from those known in developed countries. This study aimed to describe the genomic profiles of Indonesian patients with CRC. Methods: A total of 13 patients were recruited for this study from May to July 2019. Tissue samples were collected, and genomic DNA was extracted from the samples. Next-generation sequencing (NGS) with the AmpliSeq for Illumina Cancer HotSpot Panel v2 was used for DNA sequencing, and the Genome Analysis Toolkit (GATK) was used for local realignment around the discovered variants. Results: A total of 391 single nucleotide variants (SNVs) with a depth >10 were observed across 45 genes. The genes with the most variants were ERBB4, EGFR, SMAD4, and STK11, and the genes with the most non-synonymous variants were STK11, CDKN2A, FGFR3, TP53, and SMAD4. Variants present in at least 90% of all samples comprised 286 SNVs across 43 genes; among these, the genes with the most non-synonymous SNVs were EGFR, SMO, FGFR3, TP53, STK11, and CDKN2A. Genes related to the chromosomal instability (CIN) pathway, such as TP53, SMAD4, KRAS, and APC, were also found to be mutated. Conclusions: Our findings showed that all patients with CRC in this study had mutations in genes of the chromosomal instability pathway. Analyzing genetic mutations in Indonesian patients with CRC might be crucial for advancing targeted therapy and improving clinical outcomes.


Introduction
Colorectal cancer (CRC) is one of the leading causes of cancer-related mortality worldwide. CRC is the fourth most commonly diagnosed cancer and the third deadliest cancer in the world for both sexes. 1 CRC was responsible for 881,000 deaths, or 9.2% of all cancer-related deaths worldwide. In Indonesia, the World Health Organization (WHO) ranked CRC fourth among malignancies in terms of mortality burden. The incidence and mortality rates of CRC have increased rapidly over the past decades owing to environmental and demographic changes, such as sedentary lifestyles and increased lifespan. Several studies have shown that the 5-year survival rate of patients with CRC has remained at approximately 60% over the last decade. 2,3
Some differences in CRC characteristics have been reported between Western and other populations. In Western populations, patients under the age of 50 years account for around 2-8% of CRC cases. 4,5 Sehbai et al. reported that the incidence of CRC under the age of 50 years in Asian Indian and Pakistani populations was higher than that in white populations in the United States of America (USA). 6 Epidemiological data in Indonesia also showed that more than 30% of patients with CRC were under 40 years old. Studies in developed countries have found that young-onset CRC is often associated with a family history of the disease. However, in a previous study of young Indonesian patients with CRC, none had a positive family history. 7 Early-onset CRC in developed countries shows several characteristics, such as localization in the ascending colon, low pathological stage, rare metastasis, and a better prognosis. By contrast, most young Indonesian patients with CRC showed distal localization (rectum), a high clinical stage, and poor prognosis. 8
Genomic instability, which allows the accumulation of numerous genetic mutations, is essential for CRC development. There are three pathways of genomic instability in CRC.
The first pathway is the chromosomal instability (CIN) pathway, which involves mutations in several genes, including APC, KRAS, SMAD4, and TP53. The second is the microsatellite instability (MSI) pathway, which is caused by defects in the DNA mismatch repair (MMR) mechanism and is represented by mutations in MSH2, MLH1, MSH3, PMS1, and PMS2. The third is the inflammatory pathway, which involves the expression and activation of nuclear factor kappa-B (NF-κB) and COX-2. 9 In developed countries, the CIN pathway is typically found in sporadic CRC, whereas the MSI pathway is found among younger patients with CRC. A study in Indonesia indicated that young Indonesian patients with CRC have defects in the DNA MMR system, which may promote MSI. However, when tested using BAT26, a surrogate marker of MSI, the frequency of MSI was very low and consistent with sporadic cancer features. Further testing of the same specimens for SMAD4 protein expression confirmed that neither young nor older Indonesian patients with CRC exhibited the MSI pathway. Another study in Indonesia supported the hypothesis that inflammation may play a role in Indonesian patients with CRC. These results indicate that the molecular characteristics of CRC in Indonesia may differ from, and be driven by pathways other than, those previously described in developed countries. 8,10 These highly heterogeneous results across populations require further investigation. Characterizing molecular subtypes, particularly among Indonesian patients with CRC, would improve treatment selection and outcomes, for example through molecularly targeted agents (precision medicine). In this study, we investigated the genomic profiles of patients with CRC in Indonesia using next-generation sequencing (NGS). NGS enables the identification of diverse genetic mutations that might be exploited in the new era of targeted therapy for patients with CRC.
REVISED Amendments from Version 1
This revised version includes the early- and late-onset proportions of CRC in the Results section, and the Discussion now includes additional comparisons between the most commonly mutated genes in our findings and those in populations outside Indonesia (namely Japan, China, the USA, and people of European and African descent).
Any further responses from the reviewers can be found at the end of the article.

Methods
Patients and clinical specimens
A total of 13 patients with CRC undergoing surgical resection of primary tumors at Cipto Mangunkusumo National General Hospital, Indonesia, were consecutively recruited from May to July 2019. Clinical data, including sex, age, cancer location, metastasis, and staging, were recorded from a structured questionnaire and histopathological results. Tissues were collected and divided into portions for determination of the clinical stage by histopathologists and for specimen collection. The tissues were then stored in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum and 1% antibiotics (penicillin and streptomycin) in liquid nitrogen until DNA extraction.
DNA extraction and quality control
DNA was extracted from the tissue using the Quick-DNA™ Miniprep Plus Kit (Zymo Research) following the manufacturer's protocol. Nucleic acid quantity and purity were assessed using a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and a NanoDrop spectrophotometer (Invitrogen), respectively. Only samples with a concentration of 1.04-5.5 ng/μL and a purity ratio of 1.7-1.9 passed quality control and were sequenced.
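The acceptance rule above amounts to two range checks per sample. The sketch below is a minimal illustration, not the authors' procedure. It assumes inclusive bounds and treats purity as a single absorbance ratio. For NanoDrop readings that ratio is typically A260/A280, but the text does not name it. The sample IDs and readings are hypothetical.

```python
from dataclasses import dataclass

# Acceptance ranges stated in the text (bounds assumed inclusive).
MIN_CONC_NG_PER_UL, MAX_CONC_NG_PER_UL = 1.04, 5.5
MIN_PURITY, MAX_PURITY = 1.7, 1.9


@dataclass
class DnaSample:
    sample_id: str
    concentration_ng_per_ul: float  # fluorometer (Qubit) reading
    purity_ratio: float  # spectrophotometer ratio, assumed A260/A280


def passes_qc(sample: DnaSample) -> bool:
    """True if both concentration and purity fall inside the accepted ranges."""
    return (
        MIN_CONC_NG_PER_UL <= sample.concentration_ng_per_ul <= MAX_CONC_NG_PER_UL
        and MIN_PURITY <= sample.purity_ratio <= MAX_PURITY
    )


def select_for_sequencing(samples: list[DnaSample]) -> list[str]:
    """IDs of samples that pass QC, in input order."""
    return [s.sample_id for s in samples if passes_qc(s)]


# Hypothetical readings for illustration only; not data from this study.
samples = [
    DnaSample("S01", 3.2, 1.82),
    DnaSample("S02", 0.8, 1.85),  # too dilute
    DnaSample("S03", 4.9, 2.05),  # purity ratio outside range
]
print(select_for_sequencing(samples))
```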
Amplicon sequencing and variant discovery
Sequencing libraries were generated using the AmpliSeq for Illumina Cancer HotSpot Panel v2 following the manufacturer's protocol, and 2 × 150 bp paired-end sequencing was performed on the Illumina MiSeq system. Variant discovery followed the GATK best practices. Briefly, the sequencing reads of each sample were aligned to the human reference genome GRCh38 (hg38) with Burrows-Wheeler Aligner version 1.61 (BWA, RRID:SCR_010910). The PCR deduplication step was skipped because AmpliSeq is a short-amplicon protocol. In this protocol, reads from the same amplicon share identical start and end positions, so duplicate marking would discard genuine reads. GATK v4 (GATK, RRID:SCR_001876) was used for local realignment around the variants. Variant calling for both single-nucleotide variants (SNVs) and indels was performed using the GATK HaplotypeCaller tool, and the candidate variants were annotated using snpEff with the GRCh38.86 database. The following variants were defined as non-synonymous SNVs: stop-gain SNVs, stop-loss SNVs, and frameshift indels. Non-coding SNVs were defined as variants in the non-protein-coding regions of genes, such as introns or untranslated regions.
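The workflow above can be sketched as a list of shell commands for one sample. This is a hypothetical reconstruction, not the authors' scripts. The file names, thread count, and read-group fields are assumptions. The sketch uses the standard `bwa mem`, `samtools`, `gatk HaplotypeCaller`, and `snpEff ann` command lines, and it omits a duplicate-marking step, consistent with the Methods. GATK 4 has no separate indel-realignment tool. Local realignment happens inside HaplotypeCaller, which reassembles reads in active regions before genotyping.

```python
import shlex


def pipeline_commands(sample: str, ref: str, r1: str, r2: str, threads: int = 4) -> list[str]:
    """Shell commands for one sample: align, sort, index, call variants, annotate.

    No duplicate-marking step is included, mirroring the decision to skip
    PCR deduplication for short-amplicon data.
    """
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    # GATK needs a read group with a sample name (SM); bwa converts the literal "\t" to tabs.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    align = shlex.join(["bwa", "mem", "-t", str(threads), "-R", read_group, ref, r1, r2])
    sort = shlex.join(["samtools", "sort", "-o", bam, "-"])  # "-" reads SAM from stdin
    index = shlex.join(["samtools", "index", bam])
    # HaplotypeCaller expects ref.fai and a sequence dictionary next to the FASTA.
    call = shlex.join(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", vcf])
    # Annotation against the GRCh38.86 snpEff database named in the text.
    annotate = shlex.join(["snpEff", "ann", "GRCh38.86", vcf]) + " > " + shlex.quote(f"{sample}.ann.vcf")
    return [f"{align} | {sort}", index, call, annotate]


# Hypothetical file names for illustration only.
for cmd in pipeline_commands("S01", "GRCh38.fa", "S01_R1.fastq.gz", "S01_R2.fastq.gz"):
    print(cmd)
```

Each string can be run with a shell once the tools are installed and the reference has been indexed for BWA, samtools, and GATK.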

Results
Thirteen samples from patients with CRC were sequenced using the AmpliSeq for Illumina Cancer HotSpot Panel v2. Table 1 shows the clinicopathological characteristics of the patients in this study. Among these patients, nine were male (69.2%) and four were female (30.8%), and the largest age group was 55-59 years (23.1%). Early onset CRC defined
Based on staging, two, six, two, and three patients had stage I, II, III, and IV disease, respectively. All three patients with stage IV disease had liver metastasis; no metastasis was found in the other 10 patients (those with stage I, II, or III disease). Figure 1 shows all SNVs with a depth > 10 in each sample. A total of 391 variants across 45 genes were observed. The genes with the most variants were ERBB4 (31 SNVs), EGFR (29 SNVs), SMAD4 (29 SNVs), and STK11 (26 SNVs). The genes with the most non-synonymous variants were STK11, CDKN2A, FGFR3, TP53, and SMAD4, with 21, 19, 15, 14, and 12 SNVs, respectively. Figure 2 shows a heatmap of the non-synonymous variants observed in each gene for each sample.
14 Dietary and lifestyle factors, such as long-term consumption of alcohol and processed meat, lack of exercise, and obesity, appear to be possible causes of this increasing incidence. 15 Furthermore, urbanization and pollution are also associated with the overall increase in cancer incidence. 16 According to US registry data from 2009 to 2013, lower gastrointestinal tract cancers occurred most frequently in the proximal colon (proximal to and including the splenic flexure, 41%), followed by the rectum (28%), the distal colon (descending and sigmoid, 22%), and other sites (8%). 17 The most common tumor locations among our patients appear to differ from this distribution. Tumors in the proximal and distal colon have several morphological and genetic differences.
Flat sessile serrated adenomas and cancers are more common in the proximal colon, whereas polypoid adenomas and cancers are more common in the distal colon. Distal colon tumors are also more often chromosomally unstable, 18,19 which is consistent with our finding that all subjects carried mutations in genes of the CIN pathway.
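The variant tallies reported in the Results can be sketched as follows. These include the per-gene totals for variants with depth > 10, the non-synonymous counts, and the abstract's criterion of variants present in at least 90% of samples. This is a minimal illustration, not the authors' code, and it rests on several assumptions. Each call is assumed to be already reduced to a record holding the sample, position, alleles, read depth, and raw snpEff ANN string. The first ANN entry is taken as the most severe one. Variants are counted as distinct positions per gene, since the paper does not say whether it counted distinct variants or per-sample calls. Non-synonymous follows the Methods' definition (stop-gain, stop-loss, frameshift), written as the Sequence Ontology terms snpEff emits. All records are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Methods' definition of non-synonymous, as snpEff effect terms.
NON_SYNONYMOUS_EFFECTS = {"stop_gained", "stop_lost", "frameshift_variant"}
MIN_DEPTH = 10  # calls are kept only if depth > MIN_DEPTH


@dataclass(frozen=True)
class Call:
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    ann: str  # raw value of the snpEff ANN INFO field


def first_annotation(ann: str) -> tuple[set[str], str]:
    """Effect terms and gene name from the first ANN entry."""
    fields = ann.split(",")[0].split("|")
    return set(fields[1].split("&")), fields[3]


def variant_key(c: Call) -> tuple[str, int, str, str]:
    return (c.chrom, c.pos, c.ref, c.alt)


def summarize(calls: list[Call]) -> tuple[Counter, Counter]:
    """Distinct variants per gene, and distinct non-synonymous variants per gene."""
    all_by_gene: dict[str, set] = defaultdict(set)
    nonsyn_by_gene: dict[str, set] = defaultdict(set)
    for c in calls:
        if c.depth <= MIN_DEPTH:
            continue  # drop low-depth calls
        effects, gene = first_annotation(c.ann)
        all_by_gene[gene].add(variant_key(c))
        if effects & NON_SYNONYMOUS_EFFECTS:
            nonsyn_by_gene[gene].add(variant_key(c))
    return (Counter({g: len(v) for g, v in all_by_gene.items()}),
            Counter({g: len(v) for g, v in nonsyn_by_gene.items()}))


def recurrent_variants(calls: list[Call], n_samples: int, min_fraction: float = 0.9) -> set:
    """Variants seen with depth > MIN_DEPTH in at least min_fraction of samples."""
    carriers: dict[tuple, set] = defaultdict(set)
    for c in calls:
        if c.depth > MIN_DEPTH:
            carriers[variant_key(c)].add(c.sample)
    return {v for v, s in carriers.items() if len(s) / n_samples >= min_fraction}


# Hypothetical calls for illustration only.
calls = [
    Call("S1", "chr18", 1000, "C", "T", 50, "T|stop_gained|HIGH|SMAD4"),
    Call("S2", "chr18", 1000, "C", "T", 8, "T|stop_gained|HIGH|SMAD4"),  # depth too low
    Call("S1", "chr17", 2000, "G", "A", 30, "A|missense_variant|MODERATE|TP53"),
    Call("S2", "chr17", 2000, "G", "A", 25, "A|missense_variant|MODERATE|TP53"),
]
totals, nonsyn = summarize(calls)
print(totals.most_common(), nonsyn.most_common())
print(recurrent_variants(calls, n_samples=2))
```

With 13 samples, a 0.9 threshold means a variant must appear in at least 12 of them. Under the Methods' definition, missense changes are not counted as non-synonymous.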
The next-generation sequencing (NGS) approach has been shown to be effective and accurate in guiding targeted therapy for cancer, including colorectal cancer. 20 This study focused on genetic mutations in Indonesian patients with CRC. We used the Illumina Cancer HotSpot Panel, which covers 50 cancer-associated genes. Our stage distribution differed slightly from data provided by the American Cancer Society (ACS), in which the CRC stage distribution among the Asian/Pacific Islander population was 37% local, 37% regional, 20% distant, and 7% unstaged. 21 We found more local- and distant-stage CRC and less regional-stage CRC. Halpern et al., who investigated factors related to colon cancer stage at diagnosis, found that advanced-stage disease at diagnosis was common among uninsured patients, black patients, women, and patients from regions of low socioeconomic status. Screening disparities may also lead to more advanced-stage colon cancer at diagnosis. 22 In this study, KIT, KDR, TP53, ERBB4, APC, RET, and FLT3KI, which are associated with CRC development, were among the genes with substantial mutations in all 13 patients. These mutations were presumed to be somatic given the absence of a family history of CRC among our patients. KIT is a classic proto-oncogene encoding a receptor tyrosine kinase that signals through the PI3K, RAS, and JAK/STAT pathways. 23 These pathways are involved in tumor cell proliferation. KIT signaling is activated by the binding of its ligand, stem cell factor (SCF), which triggers a phosphorylation cascade that regulates cell growth. 24 KDR, which encodes vascular endothelial growth factor receptor 2 (VEGFR-2), plays a role in stimulating blood vessel permeability and dilatation. Several studies have shown that KDR is a therapeutic biomarker that can be targeted by tyrosine kinase inhibitors. KDR also plays an important role in VEGF signaling, stimulating the proliferation, chemotaxis, survival, and differentiation of endothelial cells. 25
We also observed mutations in TP53. This tumor-suppressor gene is associated with the progression of sporadic CRC. The p53 protein has many functions, such as DNA repair and cell cycle arrest, and it can trigger apoptosis when DNA damage is too severe. TP53 mutation is associated with poor prognosis owing to the activation of oncogenic and inflammatory pathways, which can accelerate CRC progression to later stages. 26 APC is a key component of the β-catenin destruction complex. APC mutation leads to the accumulation of β-catenin in the cytoplasm and can promote the proliferation, migration, invasion, and metastasis of cancer cells. APC mutations are found in 90% of patients with CRC. 27 ERBB4 is a receptor tyrosine kinase of the EGFR subfamily that promotes colonocyte survival. Activation of ERBB4 can promote cellular responses, including proliferation, differentiation, apoptosis, survival, and migration of tumor cells. ERBB4 alteration is an early step in tumorigenesis, although the mechanism remains incompletely understood. 28 RET is a proto-oncogene that encodes a transmembrane receptor with a tyrosine kinase domain. The main function of RET is to induce apoptosis in cells through the regulation of several signaling pathways. 29 RET mutation results in kinase activation, which induces downstream signaling pathways such as PI3K, leading to tumor growth and cell survival. 30 FLT3KI encodes a class III receptor tyrosine kinase that regulates hematopoiesis. Somatic mutations in this gene are commonly observed in patients with acute myeloid leukemia. Mutation of this gene leads to activation of the FLT3 receptor tyrosine kinase and to cell proliferation in vitro. 31 Unfortunately, the only microsatellite instability pathway gene covered by the amplicon panel used in this study was MLH1, and genes related to inflammatory pathways were absent from the panel.
Nevertheless, our findings showed that all the patients with CRC presented here carried mutations in genes of the CIN pathway. We found mutations in KIT, KDR, TP53, ERBB4, APC, RET, and FLT3KI. This finding differs from our previous results, which showed COX-2 expression in 49% of patients with CRC, NF-κB expression in 73.5%, and KRAS gene expression in only 16.3%. 9 TP53 and APC mutations are similarly frequent in other countries, such as Japan and the USA, although the mutation types differ significantly between the two countries. 32 In China, KRAS, APC, and TP53 are also commonly mutated in CRC, along with CHEK2, MDC1, GNAQ, and SMAD4. 33 These genes are also frequently mutated among patients of African and European descent. However, KRAS and APC mutations are more frequent, and BRAF mutations less frequent, in patients of African descent than in those of European descent. 34 A study in Australia found that POLE and POLD1 variants are associated with an increased risk of hereditary CRC among carriers. 35 The small sample size was a limitation of this study and could have affected the interpretation of the results. However, this research was a pilot study that provides the first overview of gene mutations in Indonesian patients with CRC, for whom such data were previously unavailable. The analysis of genetic mutations among patients with CRC might be important for future targeted therapy in CRC.
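The stage comparison with the ACS figures earlier in this Discussion can be checked with a short calculation from the stage counts reported in the Results. This sketch assumes an approximate mapping from AJCC stage to the summary stages used by ACS: stage I-II as local, III as regional, and IV as distant. That mapping is an assumption, not the authors' stated method. The sketch also leaves out the 7% unstaged category.

```python
# Stage counts reported in the Results (n = 13).
stage_counts = {"I": 2, "II": 6, "III": 2, "IV": 3}

# Assumed, approximate mapping from AJCC stage to ACS summary stage.
SUMMARY_STAGE = {"I": "local", "II": "local", "III": "regional", "IV": "distant"}

# ACS distribution for the Asian/Pacific Islander population quoted in the text (%),
# excluding the 7% unstaged category.
ACS_PERCENT = {"local": 37, "regional": 37, "distant": 20}


def summary_stage_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Group AJCC stage counts into summary stages and express them as percentages."""
    total = sum(counts.values())
    grouped: dict[str, int] = {}
    for stage, n in counts.items():
        key = SUMMARY_STAGE[stage]
        grouped[key] = grouped.get(key, 0) + n
    return {key: round(100 * n / total, 1) for key, n in grouped.items()}


ours = summary_stage_percentages(stage_counts)
for key in ("local", "regional", "distant"):
    print(f"{key:<8} this study {ours[key]:5.1f}%   ACS {ACS_PERCENT[key]}%")
```

With these counts the sketch gives 61.5% local, 15.4% regional, and 23.1% distant, which is the direction of difference described in the text.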

Ahmad Rusdan Handoyo Utomo
Graduate School of Biomedical Science, Universitas Yarsi, Central Jakarta, Jakarta, Indonesia
I suggest updating Table 2 to include a column specifying the number of pathogenic mutations. For example, BRAF V600E is a pathogenic SNV found in this cohort, as is KRAS G12D. These pathogenic SNVs should be tabulated in a separate column because their presence is known to be useful for guiding clinical decisions.
undergoing surgical procedures. They found many genetic variants in many genes using the Illumina 50-gene NGS panel. This is an interesting article providing preliminary results on genetic variants found using NGS. There are several issues that the authors need to address before the manuscript can be accepted for indexing.
The authors should cite previous papers describing KRAS mutations (Levi et al., 2018 1) and the MIN pathway (Susanti et al., 2021 2) in Indonesia. The authors may discuss these papers and compare their results with those of the current manuscript.
The Methods should describe the tumor content (percentage of tumor cells) of the specimens. This can be assessed by examining the corresponding formalin-fixed paraffin-embedded (FFPE) block. This quality-control check ensures that specimens tested by NGS contain a high percentage of tumor cells.
The study design should state the purpose of the study more clearly. The authors describe that Indonesian patients with colorectal cancer may have genetic and clinicopathological profiles (age, tumor location) different from those of Western patients. Therefore, they should also report the prevalence of genetic mutations in different categories, such as age and tumor location. The paper should also analyse the percentages of transition and transversion mutations in order to gain insight into, and speculate on, the roles of local diet and pollutants. For instance, KRAS transversion mutations have been associated with smoking (Dogan et al., 2012 3). There is no need for statistical analysis; a descriptive percentage table should be sufficient.
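The transition/transversion breakdown suggested above can be sketched as follows. A transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T). Every other single-base change, for example G>T, is a transversion. The function names and the input format, a list of (ref, alt) pairs, are hypothetical choices for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt or not ({ref, alt} <= (PURINES | PYRIMIDINES)):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are purines, or both are pyrimidines.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_percentages(snvs: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of transitions and transversions among (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {"transition": 0.0, "transversion": 0.0}
    return {cls: 100 * counts[cls] / total for cls in ("transition", "transversion")}


# Hypothetical SNVs for illustration only.
snvs = [("G", "A"), ("C", "T"), ("G", "T"), ("A", "C")]
print(ts_tv_percentages(snvs))
```

Grouping the input by age band or tumor location before calling `ts_tv_percentages` would yield the descriptive percentage table the reviewer requests.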