Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population

Background: CHEK2 (Checkpoint Kinase 2) encodes CHK2, a serine/threonine kinase involved in maintaining the G1/S and G2/M checkpoints and repair of double-strand DNA breaks via homologous recombination. Functions of CHK2 include the prevention of damaged cells from going through the cell cycle or proliferating and the maintenance of chromosomal stability. CHEK2 mutations have been reported in a variety of cancers including glioblastoma, ovarian, prostate, colorectal, gastric, thyroid, and lung cancer in studies performed mainly in White populations. The most studied mutation in CHEK2 is c.1100delC, which was associated with increased risk of breast cancer. The objective of this study was to compile mutations in CHEK2 identified in cancer genomics studies in different populations and especially in Latin American individuals.
Methods: A review of cancer genomics data repositories and an in-depth literature review of Latin American studies were performed.
Results: Mutations with predicted high impact in CHEK2 were reported in studies from Australia, Japan, United States, among other countries. The TCGA cancer types with most mutations in CHEK2 were breast, colorectal, and non-small cell lung cancer. The most common mutation found was E321* in three patients with uterine cancer. In Latin American individuals nine mutations were found in melanoma, lymphoma, and head and neck cohorts from TCGA and ICGC. Latin American studies have been restricted to breast and colorectal cancer and only two mutations out of four that have been interrogated in this population were identified, namely c.1100delC and c.349A>G.
Conclusions: This study presents a compilation of mutations in CHEK2 with high impact in different cancer types in White, Hispanic and other populations. We also show the necessity of screening CHEK2 mutations in Latin American populations in cancer types other than breast and colorectal.

In mammalian cells, ATM activates CHK2 in response to ionizing radiation through phosphorylation. This leads to a variety of cellular responses, such as cell cycle checkpoint activation [2], where CHK2 is involved in maintaining the G1/S and G2/M checkpoints by phosphorylation of CDC25A, CDC25C and p53 [3] and in the repair of double-strand DNA breaks via homologous recombination (HR) through phosphorylation of BRCA1 [4] and BRCA2 [5]. CHK2 is also involved in the induction of p53-dependent apoptosis through phosphorylation of p53 on Ser20 [6], and, in a p53-independent manner, via phosphorylation of PML and E2F1 [3]. These responses prevent damaged cells from going through the cell cycle or proliferating. CHK2 also plays an important role during mitosis by maintaining chromosomal stability [7].
CHEK2 c.1100delC, a truncating mutation in exon 10 that abolishes kinase activity of the protein, was the first mutation reported for this gene and was found in a woman with breast cancer and family history of Li-Fraumeni syndrome-2 [8]. The role of this mutation in breast cancer was confirmed by Meijers-Heijboer et al. [9] and in several other studies [10-22]. Based on these studies, CHEK2 has been proposed as a moderate penetrance breast cancer susceptibility gene [9], and mutations in this gene are associated with almost a 3-fold increase in the risk of breast cancer in women and a 10-fold increase in the risk of breast cancer in men [23].
Given the role of CHEK2 in maintaining genomic stability and the fact that the CHEK2 protein is expressed in a wide range of tissues, it was not surprising that alterations in this protein were found in other cancers, including glioblastoma, ovarian, prostate, colorectal, gastric, thyroid, and lung cancer [18,24-28]. The studies in CHEK2 included individuals mainly from the United States and Europe, while Latin American individuals were underrepresented. In order to infer the role of the CHEK2 gene in cancer etiology in the Latin American population, we compiled the mutations in the CHEK2 gene that had been reported in this population in genomics data repositories and the literature.
ICGC, the cBioPortal and ExAC use prediction tools to assess the functional impact of non-synonymous (SO term: missense_variant) somatic mutations on protein coding genes. ICGC uses FatHMM (http://fathmm.biocompute.org.uk/) [34], Mutation Assessor (RRID:SCR_005762) [35] and SIFT (RRID:SCR_012813) [36] to compute functional impact scores and assign impact categories (High, Medium, Low and Unknown). The cBioPortal uses Mutation Assessor and reports the same impact categories. We used those functional impact categories to filter the mutations and extract possibly pathogenic mutations by selecting only high and medium impact mutations and nonsense alterations. The percentage of mutations in CHEK2 per cancer study and the percentage of cases altered per cancer type were also calculated. The filter used for the ExAC information was based on the annotation of possibly damaging and deleterious mutations made by two in silico tools: PolyPhen-2 (RRID:SCR_013200) [37] and SIFT [36]. The assessment of stop gained, splice site disrupting and frameshift variants was made through the Loss of Function Transcript Effect Estimator (LOFTEE), a plugin of the Ensembl Variant Effect Predictor (VEP) (RRID:SCR_007931) [38]. The Latino annotation was examined in the databases that reported ethnicity data; this search was done before filtering the datasets, in order to report all genetic alterations found in Latin American populations.
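The impact-based selection described above can be illustrated with a minimal Python sketch. It keeps a variant when its impact category is High or Medium, or when it is a nonsense alteration. The record layout (`id`, `consequence`, `impact`), the function names and the example records are hypothetical and do not reproduce the export format of ICGC or the cBioPortal.

```python
# Hypothetical records: field names and values are illustrative assumptions,
# not the actual export schema of ICGC or cBioPortal.
KEEP_IMPACT = {"high", "medium"}
KEEP_CONSEQUENCE = {"nonsense"}  # nonsense alterations are kept regardless of impact


def is_candidate(variant):
    """Return True if the variant passes the impact/consequence filter."""
    impact = variant.get("impact", "unknown").lower()
    consequence = variant.get("consequence", "").lower()
    return impact in KEEP_IMPACT or consequence in KEEP_CONSEQUENCE


def filter_variants(variants):
    """Keep only high/medium impact variants and nonsense alterations."""
    return [v for v in variants if is_candidate(v)]


example = [
    {"id": "var1", "consequence": "missense", "impact": "High"},
    {"id": "var2", "consequence": "missense", "impact": "Low"},
    {"id": "var3", "consequence": "nonsense", "impact": "Unknown"},
    {"id": "var4", "consequence": "missense", "impact": "Medium"},
]
kept_ids = [v["id"] for v in filter_variants(example)]
print(kept_ids)
```

In this sketch the low-impact missense record is the only one discarded; a real export would need its column names mapped to these fields first.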

Literature review of Latin American studies
In order to include all the studies identifying CHEK2 gene mutations in Latin America, an in-depth literature search was conducted using the terms "CHEK2", "CHEK2 Latin America", and "CHEK2 cancer" in electronic academic literature search engines. PubMed (RRID:SCR_004846) was the main database used, followed by Google Scholar (RRID:SCR_008878). References of the retrieved articles were also screened for relevant studies. This search strategy was performed iteratively up to and including 10 October 2016.

Results
The complete lists of mutations in CHEK2 reported in the cBioPortal and ICGC, before applying filters, are available in Dataset 1 and Dataset 2, respectively.

CHEK2 mutations in the genomics data repositories
cBioPortal
The available data sets consisted of 147 studies that included only cancer samples. Mutations in CHEK2 were reported in 39 out of the 147 studies. Before applying filters, cholangiocarcinoma (8.6%), uterine carcinosarcoma (7.0%), and colorectal adenocarcinoma (6.9%) were the types of cancer that showed the highest percentages of cases with CHEK2 mutations (Figure 1); meanwhile, breast, colorectal and non-small cell lung cancer (NSCLC) had more mutations in CHEK2 than other cancer types (Figure 2). Using the Mutation Assessor annotation from cBioPortal, we filtered out mutations labeled as having neutral or low impact. Table 1 reports the mutations with high and medium impact as well as nonsense and frameshift mutations.
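The two quantities plotted in Figure 1 and Figure 2 are different ratios, which the following Python sketch makes explicit: the percentage of cases altered within a study, and the share of all CHEK2 mutations contributed by each cancer type. The function names and all counts are hypothetical placeholders, not values taken from the cBioPortal.

```python
def percent_cases_altered(cases_with_mutation, total_cases):
    """Percentage of cases in one study that carry a CHEK2 mutation."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * cases_with_mutation / total_cases


def percent_of_all_mutations(mutations_by_type):
    """Share of all CHEK2 mutations contributed by each cancer type."""
    total = sum(mutations_by_type.values())
    if total == 0:
        raise ValueError("no mutations to distribute")
    return {name: 100.0 * n / total for name, n in mutations_by_type.items()}


# Hypothetical counts, for illustration only.
cases = {"study_A": (2, 40), "study_B": (3, 120), "study_C": (0, 50)}
altered = {name: percent_cases_altered(m, n) for name, (m, n) in cases.items()}

mutation_counts = {"type_X": 10, "type_Y": 6, "type_Z": 4}
shares = percent_of_all_mutations(mutation_counts)

print(altered)
print(shares)
```

A small study can rank first by percentage of cases altered while contributing few mutations overall, so the two rankings need not agree.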

ICGC
A total of 279 mutations, including upstream and downstream mutations, were reported in 185 donors. Of these, seven mutations are predicted to have high impact (Table 3). For the Latin American population in ICGC, the Brazilian melanoma study (SKCA-BR) reported four mutations inside the gene, one of them with high impact (Table 2 and Table 3).

ExAC browser
A total of 742 mutations for the CHEK2 gene were reported in this database and 132 of them were present in the Latino population before filters (Dataset 3). After applying the filter of possibly damaging and deleterious alterations, 23 mutations in the Latino population were left. In this group the mutation p.Leu279Pro was the most frequent (0.003112). CHEK2 c.1100delC (p.Thr410MetfsTer15*), the most interrogated mutation in CHEK2, was found in two samples (Table 2). All of these variants were found in the cBioPortal or ICGC data.
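A minimal Python sketch of this kind of filter and of the allele frequency calculation follows. It is not the procedure used in the study: the field names, the prediction labels, and the choice to keep a variant when either predictor flags it (or when it is a high-confidence loss-of-function call) are assumptions made for illustration, and the counts are hypothetical.

```python
# Assumed prediction labels; a real ExAC/VEP export may spell these differently.
DAMAGING_POLYPHEN = {"possibly_damaging", "probably_damaging"}


def allele_frequency(allele_count, allele_number):
    """Allele frequency = alternate allele count / total alleles genotyped."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def passes_filter(record):
    """Keep variants flagged by PolyPhen-2 or SIFT, or high-confidence LoF calls.

    Whether both predictors must agree is an assumption; here either suffices.
    """
    damaging = record.get("polyphen", "").lower() in DAMAGING_POLYPHEN
    deleterious = record.get("sift", "").lower() == "deleterious"
    high_confidence_lof = record.get("lof", "") == "HC"
    return damaging or deleterious or high_confidence_lof


# Hypothetical records, for illustration only.
records = [
    {"id": "v1", "polyphen": "benign", "sift": "tolerated", "ac": 1, "an": 10000},
    {"id": "v2", "polyphen": "probably_damaging", "sift": "deleterious", "ac": 5, "an": 10000},
    {"id": "v3", "lof": "HC", "ac": 2, "an": 8000},
]
kept = {r["id"]: allele_frequency(r["ac"], r["an"]) for r in records if passes_filter(r)}
print(kept)
```

Requiring agreement between PolyPhen-2 and SIFT instead would be a stricter variant of the same filter, obtained by replacing the first `or` with `and`.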
CHEK2 mutations in Latinos reported in the literature
In total, we found nine studies in which mutations in CHEK2 were evaluated in Latino populations. Two of these studies were international and included Latin American cancer patients [10,22] and the other six studies were country-based. The country in which most studies have been performed was Brazil, with four studies [40-43]. In Argentina [44], Chile [45], and Mexico [46] one study per country was identified. In eight out of the nine studies, the presence of variants in CHEK2 was interrogated in breast cancer patients. Only one study used samples of patients with hereditary breast and colorectal cancer. The mutation most frequently evaluated in these investigations was c.1100delC (in six studies), while two other studies [42,44] interrogated the other two most frequent mutations in the CHEK2 gene (c.470T>C and c.444+1G>A) in addition to c.1100delC. Additionally, Chaudury et al. performed a complete sequencing of the gene and found a different mutation, c.478A>G (p.Arg160Gly) [46]. Table 4 shows the Latin American studies that reported the presence of mutations in CHEK2 and their frequency. Only studies in which at least one mutation in CHEK2 was found were included.

Discussion
A search in cancer genomics data repositories and the literature was performed to identify mutations in CHEK2 in different cancer types, with specific emphasis on mutations found in Latin American populations. The database with the largest number of mutations reported in CHEK2 for Latino populations was ExAC with 132 mutations, followed by ICGC with four mutations, and TCGA with three mutations. After filtering, 30 mutations with high and medium impact according to the databases' functional impact categories were kept: seventeen missense mutations, eight 'stop gain' mutations, one frameshift mutation, two mutations in the 5'UTR, and two mutations in splice donor sites of CHEK2. These mutations included the most analyzed mutation of CHEK2, c.1100delC (p.Thr367Metfs) (Table 2).
Worldwide, according to our findings in the ICGC and TCGA databases, CHEK2 mutations were reported in 23 cancer types, while in the Latin American population CHEK2 mutations were only found in head and neck cancer, lymphoma and melanoma.
In this context, it is important to highlight that Latino populations have been underrepresented in other worldwide studies. As shown in Dataset 4, the cohorts of TCGA are biased toward the inclusion of White individuals, and individuals from other ethnicities are underrepresented. The same was observed in ICGC, in which only one Latin American cohort, from Brazil, was available for our analysis. Regarding the data found in our literature review, CHEK2 has only been studied in the Latin American population in breast and colorectal cancer.
In the ExAC repository, the mutations c.1100delC and c.478A>G were found two times and one time, respectively, in the Latino population (Dataset 3). In TCGA, c.1100delC was found in a patient with breast cancer, but information about the patient's ethnicity was not available (Table 1). Up to now, only nine studies evaluating mutations in CHEK2 have been performed in Latin America and only six of them found mutations in the gene; five studies found the c.1100delC mutation and one found c.478A>G (p.Arg160Gly) [10,22,40,43,46]. Two mutations, c.1100delC and c.478A>G, were classified in the ClinVar archive (https://www.ncbi.nlm.nih.gov/clinvar/) as pathogenic and likely pathogenic, respectively. These mutations are the only ones in common with the mutations found in genomics data repositories.
Although c.1100delC is the CHEK2 mutation most evaluated in the Latin American population, it should be noted that its frequency, as seen in literature reports and data repositories, is rather low. Because the highest frequency of this mutation is found in populations from Northern and Western Europe, c.1100delC is proposed as an allele with a population gradient, which originated in these populations and whose frequency decreases toward the southern regions of Europe (Basque Country, Spain, and Italy) [47]. Taking into account the European genetic component of Latin American populations, it is expected that if the frequency of c.1100delC is low in the Spanish population, the frequency in our admixed populations would be even lower.
Because cancer types other than breast and colorectal cancer, such as uterine, lung, bladder and head and neck cancer, presented mutations in CHEK2 in several populations, it is relevant to focus the search for mutations on these types of cancer in Latin American populations. Additionally, the interrogation of CHEK2 mutations in the Latin American population has focused mainly on the c.1100delC mutation, but the data obtained from the ExAC database showed that in Latin American samples there are 23 germline mutations (Table 2) that could generate cancer susceptibility. It would therefore be important to examine the frequencies of these mutations in the Latin American population and their association with the development of cancer.
This study has limitations; for example, information about race and ethnicity was not available for at least 28 studies in the cBioPortal, and consequently some Latinos may be hidden in those studies. Thus, the small number of Latinos included in the genomics data repositories could be a reason why we have found a small number of mutations in CHEK2 in this population. It is important to highlight that the use of different transcripts for reporting mutations makes the correlation between mutations found in different studies laborious.
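The transcript issue can be made concrete with the one example the text provides: c.1100delC appears as p.Thr367Metfs in the TCGA data and as p.Thr410MetfsTer15 in ExAC, which annotate against NP_009125 and NP_665861, respectively (Table 2 footnote). The Python sketch below matches reports through a hand-built alias table. The pairing of each label with its reference protein is inferred from that footnote, and the table, the canonical key and the function names are hypothetical; this is not a substitute for a proper HGVS normalization tool.

```python
# Alias table built by hand from the labels quoted in the text; the pairing of
# each label with its reference protein is an assumption based on the Table 2
# footnote. The canonical key is an arbitrary, illustrative choice.
ALIASES = {
    ("NP_009125", "p.Thr367Metfs"): "CHEK2:c.1100delC",
    ("NP_665861", "p.Thr410MetfsTer15"): "CHEK2:c.1100delC",
}


def canonical_key(reference, label):
    """Map a (reference protein, protein-level label) pair to one shared key."""
    return ALIASES.get((reference, label))


def shared_variants(report_a, report_b):
    """Return the canonical keys present in both reports."""
    keys_a = {canonical_key(ref, label) for ref, label in report_a} - {None}
    keys_b = {canonical_key(ref, label) for ref, label in report_b} - {None}
    return keys_a & keys_b


tcga_report = [("NP_009125", "p.Thr367Metfs")]
exac_report = [("NP_665861", "p.Thr410MetfsTer15")]
overlap = shared_variants(tcga_report, exac_report)
print(overlap)
```

Comparing the raw labels directly would miss this overlap, which is why a genomic coordinate or a single agreed reference transcript is the safer join key.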
This study presents a compilation of mutations in CHEK2 with high impact in different cancer types in White, Hispanic and other populations. We also showed the necessity of performing studies in Latin American populations in cancer types other than breast and colorectal, and of screening other mutations in addition to the most commonly analyzed ones, such as c.1100delC.

Open Peer Review
The authors worked out the compilation of germline mutations in the CHEK2 gene in patients diagnosed with different cancer types and in different populations, focusing on the Latin American population. CHEK2 mutations have been linked with Li-Fraumeni syndrome; germline mutations are also thought to confer a predisposition to sarcomas, breast cancer and brain tumors. The most frequent CHEK2 mutation, c.1100delC, is a low-penetrance mutation and has a low impact on the risk of breast or other cancers. The rest of the mutations or SNPs are much less connected with known cancer risk. Therefore, it is difficult to evaluate the increase in risk of the different cancers for carriers of germline mutations in the CHEK2 gene. It is particularly difficult to do so if there are only one or two carriers of these mutations in the population under study.

I totally agree with both reviewers, especially on two issues:
The text is written in such a way that the reader may think the authors analyzed somatic mutations in the CHEK2 gene in different cancer types, whereas they searched for germline mutations. In Table 4 they indicate that blood was the tissue used to analyze mutations, so the text should be rewritten so that there is no doubt that germline mutations were under study.
Tables 1 and 2 should also be changed. The description of the ethnic minorities is strange. In Table 2 the data from the ExAC database do not contain information about the disease connected with the mutation, so it does not make sense to include these data if the title of the manuscript is "Mutations of the CHEK2 gene in the patients with cancer…". These data should be excluded from the analysis because they do not bring any important information about CHEK2 mutations in different cancer sites.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? No
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes

The manuscript by Guauque-Olarte and colleagues is an overview of the CHEK2 variants reported in the Latin American population, searched from the literature or the cBioPortal, ICGC and ExAC databases. Overall the concept of the manuscript is interesting; however, the data is poorly presented and the scientific writing is not up to the mark. The manuscript title also needs modification, like "An overview of CHEK2 variants associated with cancer in Latin American population".

I have the following reservations about the manuscript:
For missense variants or variants in 5'UTRs it is suggested to write "DNA sequence variants" instead of "mutations" throughout the manuscript, so that these can be differentiated from clear pathogenic mutations, i.e. frameshift, nonsense or splice site mutations.
As the study objective was to compile the CHEK2 mutations reported in Latin Americans, Table 1 describes the CHEK2 variants identified in other populations, or even the ethnicity is unknown for the majority of the variants presented in this table. The table is also not presented properly. It is suggested to omit this table or present it as a "Dataset", and just mention in the text that 78 deleterious or potentially deleterious mutations were reported in TCGA studies.
The authors did not state the origin (somatic or germline) of the CHEK2 variants presented in all the tables. It would be of interest if a column were added to all tables for this information.
Results section: Data presented in Figure 1 and Figure 2 are not concordant as mentioned in the text. Please resolve this issue.
Results section: "…..after the filtering process, 38 of which were classified as with high impact". It is not clear which those 38 nucleotide variants are in Table 1. Please add a column for this information.
Results section: Paragraph "Two patients with three mutations …….this patient carry a frameshift and a nonsense mutation" is confusing. Is the patient with DLBC a compound heterozygote for a CHEK2 frameshift and nonsense mutation, simultaneously?
Table 2: Column Genomic DNA change: The nucleotide change can't be seen in this column, there is just the nucleotide position. Please modify this column.
Results section: GWAS catalog. Authors should be cautious about whether the SNPs rs132390-C and rs2239815-T are present in the CHEK2 gene or not.
Table 2: Two variants in 5'UTR are not clear; the population is also not mentioned.
Table 2: "Effect" column; please correct that stop gain mutations are also called nonsense mutations.
Table 2: The data in the table is not presented properly. c.1590+2T>G and c.573+2T>G are the nucleotide changes and these are presented in column AA change. The authors should follow HGVS nomenclature, both for the nucleotide change and the AA change. There should be a column for pathogenicity of missense mutations (high or medium impact) in this table.
Table 3

My reservations about the manuscript are as follows:
Currently the authors do not make it clear throughout the manuscript whether they are describing somatic mutations or germline mutations/variants. Please add a column to Table 2 to show clearly which are somatic and which are germline.
In the abstract it says: "Latin American studies have been restricted to breast and colorectal cancer and only two mutations out of four that have been interrogated in this population were identified, namely c.1100delC and c.349A>G". Table 4, which lists the mutations reported in the literature in Latin American studies, does not show the c.349A>G mutation but a c.478A>G mutation, and I can see no further mention of c.349A>G in the rest of the manuscript. Please resolve this.

Results:
The text description of the difference between Figure 1 and 2 at the start of the results section is unclear. As far as I can see, Figure 1 shows data per cancer type in TCGA and Figure 2 shows data per study in TCGA. I don't see the need to have both figures; Figure 1 is sufficient and the text should read "breast, colorectal and non small cell lung cancer had more CHEK2 mutations than other cancer types". At the start of the results section the authors describe mutations "before filtering". Please be clearer and state before filtering steps to include only likely functional mutations.
On page 5 the sentence beginning "The type of cancer with the most mutations …." should read "After filtering for likely functional variants the cancers with the highest numbers of mutations in CHEK2 were breast followed by uterine, non small cell (?) lung and colorectal".
Table 1 describes mutations in non Hispanic-Latino samples. A98Mfs*13 and Q100*, which are found in a white Hispanic or Latino sample, should be removed (none of the other mutations in Latin American populations are in Table 1). The ethnicity column in Table 1 also needs to be formatted properly: remove duplicated words and "_" between words.
GWAS catalogue section in results: need to insert P-values for the associations that you report.
Table 4: Insert OR and P-values for associations.
The authors make reference to a CHEK2 1100delC mutation picked up in the TCGA datasets and refer to Table 1. I can't find 1100delC in Table 1. I can only find it in Table 2 in ExAC. Please clarify.
The authors state that other patients with cancer types such as uterine, lung, bladder and head and neck cancer should be screened for CHEK2 mutations. Here they are trying to show that because a gene is somatically mutated in a particular cancer type there might also be a germline mutation that increases predisposition. Some of the mutations listed in Tables 1 and 2 (mutations post filtering for likely functional impact) are missense or UTR, and so it would be important to show that these somatic mutations are functional. Could the authors please annotate the TCGA/ICGC mutations with information on which domain they map to.
Haber DA, Freedman ML: Genetic and functional analysis of CHEK2 (CHK2) variants in multiethnic cohorts.

Figure 1. Percentage of cases with mutations in CHEK2 per cancer study. The X axis shows the type of cancer in which at least one case has a mutation in CHEK2; the Y axis indicates the percentage of cases per study that have mutations in CHEK2 (source: cBioPortal).

Figure 2. Percentage of mutations in CHEK2 per cancer study. The X axis shows the type of cancer in which at least one mutation in CHEK2 was identified; the Y axis indicates the percentage of mutations in CHEK2 per cancer type (source: cBioPortal). n unique mutations = 159. Synonymous mutations are not included in the cBioPortal database.
48. Guauque-Olarte S, Rivera-Herrera AL, Cifuentes-C L: Dataset 1 in: Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population. F1000Research. 2016.
49. Guauque-Olarte S, Rivera-Herrera AL, Cifuentes-C L: Dataset 2 in: Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population. F1000Research. 2016.
50. Guauque-Olarte S, Rivera-Herrera AL, Cifuentes-C L: Dataset 3 in: Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population. F1000Research. 2016.
51. Guauque-Olarte S, Rivera-Herrera AL, Cifuentes-C L: Dataset 4 in: Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population. F1000Research. 2016.
52. Guauque-Olarte S, Rivera-Herrera AL, Cifuentes-C L: Dataset 5 in: Mutations of the CHEK2 gene in patients with cancer and their presence in the Latin American population. F1000Research. 2016.

Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly

Competing Interests: No competing interests were disclosed.
Referee Expertise: cancer genetics, molecular biology of carcinogenesis, epidemiology of cancer, pharmacogenetics
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Referee Report 03 April 2017
doi:10.5256/f1000research.10703.r21485
Muhammad Usman Rashid, Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH & RC), Department of Basic Sciences Research, Lahore, Pakistan

Table 3: Column Consequences: I think there is no need to mention the amino acid change referring to all transcripts. Just follow the GenBank reference sequence for CHEK2 transcript variant 1 for reporting nucleotide or AA change and follow the HGVS nomenclature.
Discussion, paragraph 1: "….eight stop gain mutations, one frameshift mutation…" Please correct, there are four stop gain mutations (also called nonsense mutations) and five frameshift mutations.

Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Trust Centre for Human Genetics, NIHR Comprehensive Biomedical Research Centre, Oxford, UK
Sandra Guauque-Olarte et al provide an overview of both somatic and germline mutations in CHEK2 that have been identified in Latin-American populations. The authors interrogate cBioPortal and ICGC databases to identify somatic mutations and ExAC and a review of existing literature to identify germline mutations.

Dataset 5: The %s should not have a "-" in front of them; it adds confusion as to what these values are.
Discussion:
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Table 2. Mutations found in three cancer genomics data repositories for Latin American populations.
*The nomenclature used for the mutation annotation is as follows: ICGC (ENST00000328354), ExAC (NP_665861) and TCGA (NP_009125).

Dataset 3. Mutations in CHEK2 identified in Latino American samples before applying filters (source: ExAC) http://dx.doi.org/10.5256/f1000research.9932.d142131

GWAS catalog
Mutations rs132390-C and rs17879961-A, mapped to or near CHEK2, were associated in European populations with breast and lung cancer, respectively. Mutations rs4822983-T and rs2239815-T were associated with esophageal squamous cell carcinoma in individuals with Han Chinese ancestry. In addition, in a Han Chinese cohort of esophageal and gastric cancer the mutation rs738722-T was also associated with those cancers (Dataset 4).

Dataset 4. Variants reported in CHEK2 that have been associated with cancer according to data in the GWAS catalog. All of these variants were found in the cBioPortal or ICGC data. doi:10.5256/f1000research.9932.d142132 [51]