TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas.

Reverse-phase protein arrays (RPPAs) are a high-throughput approach to protein quantification that uses antibody-based, micro- to nano-scale dot blots. Within the Cancer Genome Atlas (TCGA), RPPAs were used to quantify over 200 proteins in 8,167 tumor and metastatic samples. Protein-level data has particular advantages for assessing putative prognostic or therapeutic targets in tumors. However, many of the available pipelines do not allow clinical and RPPA information to be partitioned, which limits the conclusions that can be drawn. We developed a cloud-based application, TRGAted, to enable researchers to better examine patient survival based on single or multiple proteins across 31 cancer types in the TCGA. TRGAted contains up-to-date overall survival, disease-specific survival, disease-free interval, and progression-free interval information. Furthermore, survival information for primary tumor samples can be stratified by gender, age, tumor stage, histological type, and subtype, allowing for a highly adaptive and intuitive user experience. The code and processed data are open source and available on GitHub, and the application includes a built-in tutorial to assist users.

Improving prognostic prediction and identifying potential therapeutic targets are of particular interest to clinicians. Genome-wide quantification of messenger RNA has proven valuable for discovering gene expression profiles that can serve as biomarkers of clinical outcomes in cancer 1. However, RNA quantification in tumors or patient cohorts is only a proxy for protein abundance, because many cellular processes beyond transcription ultimately regulate protein levels. The availability of protein-level quantification for the TCGA cohort therefore allows for more clinically relevant outcome predictions than mRNA levels. Current TCGA-based applications provide entry-level correlation, differential, and survival analyses of the RPPA data. However, survival analysis in these applications relies on median- or mean-based cutoffs and does not allow the use of clinical variables 2-4.
With these limitations in mind, we developed a new open-source web application, TRGAted (Figure 1). Built on the R Shiny framework, TRGAted is an intuitive data analysis tool for parsing survival information based on over 200 proteins in 31 cancer types. TRGAted consists of processed RPPA information, survival information, and code, allowing users to run instances locally or modify the code with ease.

Figure 1. Each file communicates within the R Shiny framework. On the user side (left, blue), users select the cancer type, protein of interest, and clinical variables in the CSS-enabled user interface. This information is received by the server file, which runs the subsequent analysis in R. On the server side (right, orange), the data for the selected cancer type are retrieved from the database, and the required R packages and functions are loaded and executed. After execution, the server file returns both tabular and graphical output (purple) to the user interface.

Amendments from Version 1
In response to the very generous reviews, this version of the manuscript reflects our efforts to improve its readability and flow. In doing so, we have tried to address the reviewers' shared major concern regarding grammatical and typographical errors. Additionally, as Dr. Zenklusen suggested, we updated the description of the clinical and survival data sources. Disease-specific survival (DSS) data was available for 7,240 patients, disease-free interval (DFI) data was available for 3,887 patients, and progression-free interval (PFI) data was available for 7,315 patients (Table 1).

Operation: Minimum system requirements for running TRGAted locally are modest and include an Intel-compatible CPU and 1 gigabyte of RAM. Running TRGAted from the Shiny server requires a modern browser and an internet connection.
Kaplan-Meier survival curves can be generated by selecting the cancer type, survival type, and protein(s) of interest (Figure 2). Kaplan-Meier curves are generated using the survival (v2.41-3) and survminer (v0.4-1) R packages. Multi-protein survival analysis uses the mean value of the selected protein probes, similar to gene-expression-based survival analysis platforms 6. Hazard ratios for two-group comparisons, using either the median or the optimal cutoff, are estimated with the Cox proportional hazards regression model in the survival R package, and the reported hazard ratio compares the high versus the low protein group. The optimal cutoff feature uses the surv_cutpoint function of the survminer package, which selects the cutpoint that yields the minimal log-rank p-value. This function relies on the maximally selected rank statistics R package (maxstat, v0.7-25), which finds the cutpoint with the maximal standardized two-sample linear rank statistic 7. To favor clinically or biologically meaningful biomarkers, the minimal proportion of samples in either group was set to 15%, so the most unequal permitted split is 15% versus 85% of samples. Clinical variables, which depend on the selected cancer type, can be used to filter patients into user-defined groups.
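The multi-protein scoring and Kaplan-Meier step described above can be illustrated with a minimal Python sketch. It is not TRGAted's code, which is written in R and uses the survival and survminer packages. The sketch assumes each protein is supplied as a list of per-sample levels in a shared sample order, averages the proteins into one score per sample, splits samples at the median of that score, and computes the Kaplan-Meier product-limit estimate for each group. The function names, the input format, and the choice to place samples equal to the median in the low group are illustrative assumptions.

```python
from statistics import mean, median


def multi_protein_score(levels):
    """Average several protein probes into one score per sample.

    `levels` maps a protein name to its per-sample values; all lists are
    assumed to share the same sample order (hypothetical input format).
    """
    columns = list(levels.values())
    return [mean(col[i] for col in columns) for i in range(len(columns[0]))]


def median_split(scores):
    """Label samples 'high' (above the median) or 'low' (at or below it)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (event time, survival probability).

    `events` holds 1 for an observed event and 0 for a censored sample.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Count events and all departures (events plus censorings) at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Toy data for illustration only (not TCGA values).
levels = {"protein_A": [1.0, 3.0, 2.0, 4.0], "protein_B": [2.0, 1.0, 0.0, 2.0]}
times, events = [5, 2, 8, 1], [1, 1, 0, 1]
groups = median_split(multi_protein_score(levels))
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    print(label, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```

Censored samples lower the number at risk without producing a step in the curve, which is why the estimator tracks events and departures separately.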
Clinical information available across all cancer types includes subtype, tumor stage, histological type, gender, age, and response to primary therapy.
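The optimal-cutoff search described earlier in this section (surv_cutpoint with a 15% minimal group size) can be sketched in Python as follows. This is a simplified stand-in, not the survminer or maxstat implementation. It scans every candidate cutpoint, skips splits that leave fewer than 15% of samples in either group, and keeps the split with the largest absolute standardized log-rank statistic, which corresponds to the smallest log-rank p-value. The classical log-rank Z statistic is used here in place of maxstat's standardized rank statistic, and the function names are hypothetical. Because many cutpoints are tried, the minimal p-value returned is optimistic; maxstat can provide p-values corrected for the search, which this sketch does not attempt.

```python
import math


def logrank_z(times, events, in_group):
    """Standardized two-sample log-rank statistic for in_group True vs False.

    Positive values mean the True group has more events than expected.
    """
    num = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, in_group):
            if ti >= t:  # still at risk just before time t
                n += 1
                n1 += gi
                if ti == t and ei:
                    d += 1
                    d1 += gi
        num += d1 - d * n1 / n  # observed minus expected events in the group
        if n > 1:  # hypergeometric variance contribution
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / math.sqrt(var) if var > 0 else 0.0


def optimal_cutpoint(values, times, events, min_prop=0.15):
    """Return (cutpoint, |Z|, unadjusted two-sided p) maximizing |Z|.

    Samples with value > cutpoint form the 'high' group.
    """
    n = len(values)
    best = None
    for cut in sorted(set(values))[:-1]:
        high = [v > cut for v in values]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_prop * n:
            continue  # a group falls below the minimal proportion
        z = abs(logrank_z(times, events, high))
        if best is None or z > best[1]:
            best = (cut, z)
    if best is None:
        raise ValueError("no cutpoint satisfies the minimal proportion")
    cut, z = best
    return cut, z, math.erfc(z / math.sqrt(2))


# Toy data: higher protein values have shorter survival times.
values = list(range(20))
times = [20 - v for v in values]
events = [1] * 20
print(optimal_cutpoint(values, times, events))
```

With 20 samples and a 15% minimum, every accepted split keeps at least 3 samples in each group.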
TRGAted also supports Cox proportional hazards modeling across all proteins in a given cancer type, or for a single protein across all cancer types. Hazard ratios and p-values are based on the Cox regression model. Two filters are applied to the volcano plots: proteins with -log10(p-value) less than 0.1 are removed to improve visualization, and proteins with hazard ratios greater than 20 are removed to reduce artifacts of the analysis pipeline. The hazard ratios in the volcano plot can be displayed on a linear or natural-log scale; the log scale assists in visualizing good prognostic indicators, whose hazard ratios fall between 0 and 1. A visualization of the group proportions underlying the volcano plot comparisons is also available.
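Each hazard ratio in these analyses comes from a Cox proportional hazards model. As an illustration only, the Python sketch below fits a Cox model with a single covariate (for example, a 0/1 indicator of the high protein group) by Newton-Raphson on the partial likelihood and reports the hazard ratio exp(beta) with a Wald p-value. It is not the survival package's coxph: it uses the Breslow approximation for tied event times, whereas coxph defaults to the Efron approximation (the two agree when there are no ties), and reporting the Wald p-value rather than the likelihood-ratio or score-test p-value is an assumption. The function names are hypothetical.

```python
import math


def _score_info(beta, times, events, x):
    """Score and observed information of the Breslow partial likelihood."""
    score = info = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        a0 = a1 = a2 = s = 0.0
        d = 0
        for ti, ei, xi in zip(times, events, x):
            if ti >= t:  # member of the risk set at time t
                w = math.exp(beta * xi)
                a0 += w
                a1 += w * xi
                a2 += w * xi * xi
                if ti == t and ei:
                    d += 1
                    s += xi
        mean_x = a1 / a0  # risk-weighted covariate mean
        score += s - d * mean_x
        info += d * (a2 / a0 - mean_x ** 2)
    return score, info


def cox_single(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (hazard ratio, Wald p)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, times, events, x)
        if info <= 0:
            raise ValueError("covariate has no variation within risk sets")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, times, events, x)
    z = beta * math.sqrt(info)  # beta divided by its standard error
    return math.exp(beta), math.erfc(abs(z) / math.sqrt(2))


# Toy data: x = 1 marks a hypothetical high-protein group.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 0]
hr, p = cox_single(times, events, x)
print(f"HR = {hr:.2f}, p = {p:.3f}")
```

A hazard ratio above 1 marks the high group as the poor-prognosis group. If one group has all of its events before any event in the other group, the partial likelihood has no finite maximum and this sketch simply stops after max_iter steps.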

Use case
To demonstrate the functionality of TRGAted, we present a basic survival analysis of an aggressive, highly metastatic subtype of breast cancer known as basal-like breast cancer. In this subtype, we identified RAD50, a protein involved in homologous recombination of DNA, as a novel poor prognostic marker.
Survival curves: Survival curves can be generated by selecting the cancer type, survival type, and protein or proteins of interest (Figure 2A). We also used the subtype information to focus on basal-like breast cancer. Other survival types and clinical variables can be selected (Figure 2B). Based on the level of the protein of interest, samples can be divided into quartiles or tertiles, at the median, or at the optimal (minimal p-value) cutoff (Figure 2C). Here, the DNA repair protein RAD50 is a poor prognostic marker for overall (Figure 2A) and disease-specific survival (Figure 2B) in basal-like breast cancer.
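The fixed grouping options listed above can be sketched as rank-based splits. The Python function below is a hypothetical illustration, not TRGAted's R code: it assigns each sample to one of k groups of (nearly) equal size according to the rank of its protein level, so k = 4 gives quartiles, k = 3 tertiles, and k = 2 a median split, while the optimal cutoff is a data-driven alternative to these fixed splits. Breaking ties by original sample order is an assumption.

```python
def quantile_groups(values, k):
    """Assign each sample to a group 1..k by the rank of its value.

    Group 1 holds the lowest values; tied values keep their original order.
    """
    if k < 1 or not values:
        raise ValueError("need k >= 1 and at least one value")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(values) + 1
    return groups


levels = [0.4, -1.2, 2.3, 0.0, 1.1, -0.5, 0.9, 1.8]  # toy protein levels
print(quantile_groups(levels, 4))  # quartiles: [2, 1, 4, 2, 3, 1, 3, 4]
print(quantile_groups(levels, 2))  # median split: [1, 1, 2, 1, 2, 1, 2, 2]
```

When the sample count is not divisible by k, the integer division places the extra samples in the lower groups.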
Across cancer: TRGAted can be used for biomarker discovery by examining the hazard ratios of all available proteins in a cancer type or subtype, such as basal-like breast cancer (Figure 3A). The volcano plot displays good prognostic markers on the left in blue and poor prognostic markers on the right in red. When the optimal cutoff feature is selected, a bar chart can also be generated to examine the proportion of samples in the high and low protein groups (Figure 3B). Protein labeling in both the volcano plot and the bar chart is adaptive, and only significant proteins (p-value ≤ 0.05) are labeled. Here, RAD50 is one of the most significant predictors of poor overall survival in basal-like breast cancer (Figure 3A and B).
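The volcano-plot rules described in this section and in the Operation section can be expressed as a small filtering and labeling step. The Python sketch below assumes per-protein Cox results are already available as (protein, hazard ratio, p-value) records; the VolcanoPoint type and the function name are hypothetical. It drops proteins with -log10(p) below 0.1 or a hazard ratio above 20, places hazard ratios on a natural-log or linear x-axis, colors hazard ratios below 1 blue (good prognosis) and the rest red (poor prognosis), and labels only proteins with p ≤ 0.05. Treating a hazard ratio of exactly 1 as red is an arbitrary choice made here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolcanoPoint:
    protein: str
    x: float               # hazard ratio on a linear or natural-log scale
    neg_log10_p: float     # y-axis value
    color: str             # "blue" = good prognosis, "red" = poor prognosis
    label: Optional[str]   # protein name only when p <= 0.05


def volcano_points(results, log_scale=True):
    """Filter and annotate (protein, hr, p) records for a volcano plot."""
    points = []
    for protein, hr, p in results:
        y = -math.log10(p)
        if y < 0.1 or hr > 20:
            continue  # visualization and artifact filters
        points.append(VolcanoPoint(
            protein=protein,
            x=math.log(hr) if log_scale else hr,
            neg_log10_p=y,
            color="blue" if hr < 1 else "red",
            label=protein if p <= 0.05 else None,
        ))
    return points


# Toy Cox results for illustration only.
results = [("A", 2.5, 0.01), ("B", 0.4, 0.03), ("C", 1.2, 0.9),
           ("D", 25.0, 0.001), ("E", 1.5, 0.2)]
for pt in volcano_points(results):
    print(pt)
```

On the log scale, a protein with hazard ratio 0.5 and one with 2.0 sit at equal distances from zero, which is why that option helps when inspecting good prognostic markers.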
Across protein: TRGAted can also be used to examine the survival outcomes of a protein of interest across multiple cancers. Here, RAD50 predicts poor survival in only five cancer types: prostate, adrenocortical, breast, low-grade glioma, and head and neck cancers (Figure 4A). A summary of the hazard ratios can also be visualized by selecting the barplot function (Figure 4B).

Conclusions
TRGAted is an open-source survival analysis application designed for quick and intuitive exploration of TCGA protein-level data. It improves on current TCGA pipelines by relying on protein-level data and by providing a greater diversity of clinical and survival options. In addition to log-rank and Cox regression modeling, TRGAted allows users to download graphical displays and processed data for up to 7,714 samples across 31 cancer types. Because TRGAted is built on the R Shiny framework, a literate code architecture, its annotated code can be easily modified from our GitHub repository. Under the GNU General Public License v3.0, we encourage interested groups

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.

Thank you for your very kind review and suggestions. In the most recent submission, we have addressed your concerns by editing the manuscript and adding details on the source of the clinical information.
Competing Interests: No competing interests.