CRI iAtlas: an interactive portal for immuno-oncology research

The Cancer Research Institute (CRI) iAtlas is an interactive web platform for data exploration and discovery in the context of tumors and their interactions with the immune microenvironment. iAtlas allows researchers to study immune response characterizations and patterns for individual tumor types, tumor subtypes, and immune subtypes. iAtlas supports computation and visualization of correlations and statistics among features related to the tumor microenvironment, cell composition, immune expression signatures, tumor mutation burden, cancer driver mutations, adaptive cell clonality, patient survival, expression of key immunomodulators, and tumor infiltrating lymphocyte (TIL) spatial maps. iAtlas was launched to accompany the release of the TCGA PanCancer Atlas and has since been expanded to include new capabilities such as (1) user-defined loading of sample cohorts, (2) a tool for classifying expression data into immune subtypes, and (3) integration of TIL mapping from digital pathology images. We expect that the CRI iAtlas will accelerate discovery and improve patient outcomes by providing researchers access to standardized immunogenomics data to better understand the tumor immune microenvironment and its impact on patient responses to immunotherapy.


Introduction
Immuno-oncology (IO) is one of the most promising areas of cancer research, with IO-based treatments demonstrating high efficacy within certain cancer types and subsets of patients 1-4 . To broaden the utility of these therapies to more patients, fundamental research is required to improve our understanding of tumor-immune interactions, allowing the next generation of therapeutics and treatment strategies to emerge 4 . Advances in the IO field are impeded by the inaccessibility of IO study data and results and by a lack of data standardization, which limits the ability to compare results across studies. This has led to the underutilization of existing data, unnecessary study duplication, and failure to achieve rapid consensus in the field 5 . With the vast increase in the number and scope of IO projects expected in the coming years, combined with widespread adoption of genomics and other high-dimensional technologies, these problems will be compounded going forward.
We developed the Cancer Research Institute (CRI) iAtlas portal (https://www.cri-iatlas.org) to integrate IO research data, with the goal of providing an interactive, exploratory hub for the IO research community. In doing so, we hope to improve the accessibility and utility of critical resources generated from IO studies. iAtlas is a set of analytic modules, hosted on the web, for studying interactions between tumors and the immune microenvironment. These modules allow researchers to explore associations among a variety of immune characterizations as well as with genomic and clinical phenotypes.
The initial release of iAtlas (April 5, 2018) provided a rich resource to complement analysis results from The Cancer Genome Atlas (TCGA) Research Network on the TCGA data set comprising over 10,000 tumor samples and 33 tumor types 6 ("The Immune Landscape of Cancer"; here referred to as "Immune Landscape"). This study identified six immune subtypes that span cancer tissue types and molecular subtypes, and found that these subtypes differ by somatic aberrations, microenvironment, and survival. Per-sample characterizations included total lymphocytic infiltrate (from DNA methylation as well as H&E imaging data), estimated cell type fractions, immune gene signature expression, MHC/HLA type and expression, antigen presentation machinery, T cell and B cell receptor repertoire inference, viral/microbial characterization, associations with pathway disruption and activity, and other analysis results. The Immune Landscape 6 manuscript reported on the most novel and potentially therapeutically salient statistical associations between these immune subtypes and the results of the immune characterization. We have continued to develop and evolve the CRI iAtlas application; here, we report the technical design and implementation of iAtlas up to and including the recently released version 1.2 7 . This version includes new features requested by users including: (1) user-defined loading of sample cohorts, (2) a tool for classifying expression data into immune subtypes, and (3) integration of TIL mapping from digital pathology images.

Methods
Implementation
iAtlas is a web-based application to enable data exploration for clinicians, biologists, and informaticists. The inputs and architecture of the application are described below.

Data
The iAtlas app uses structured data and outputs from the Immune Landscape 6 study and the TCGA PanCancer Atlas initiative 8 , which harmonized TCGA data, ensuring uniform quality control and sample inclusion, batch effect detection, normalization across platforms, combined mutation calling from multiple centers, and robustly compiled clinical and outcome data. A key source of iAtlas data is the table summarizing tumor-sample and immune characterizations for 11,080 TCGA participants, Table S1 of the Immune Landscape 6 manuscript, here termed the "PanImmune Feature Matrix". Auxiliary data were sourced from files available on this manuscript's data page at the NCI Genomic Data Commons, from the TCGA PanCancer Atlas Data Mirror, and from the TCGA PanCancer Atlas working space in Synapse (see Data availability). iAtlas data were formatted as data frames (tables) and stored as "feather" files (https://github.com/wesm/feather) on the application server for fast loading (Table 1).

R/Shiny architecture
iAtlas is powered by Shiny 9 and makes extensive use of Shiny Modules 10 to organize code into composable units (Figure 1). Each iAtlas Analysis module is designed as a Shiny module, allowing simple integration of new analytical functionality. iAtlas uses the tidyverse 11 family of R packages (e.g., dplyr 12 , tidyr 13 , purrr 14 , stringr 15 , tibble 16 ) as well as the wrapr 17 package to assist with tidy evaluation. These functions power the transformations of internal tabular data that are then used to create the interactive plots (with the plotly 18 graphing library) and data tables (via the DT 19 wrapper to the DataTables library) seen throughout the iAtlas modules. We also make heavy use of the crosstalk 20 package to enable event-driven updates to the application state. The core iAtlas application is hosted on https://shinyapps.io.

Analysis modules
The main feature of the iAtlas interface is the iAtlas Explorer (Figure 2, found under the EXPLORE tab), which provides several Analysis modules to explore and visualize results. Each module supports a type of exploration, with interactive views and controls to enhance and extend the results and analytics as initially described in the Immune Landscape 6 study. The layout of pages and sections within the iAtlas Explorer is driven by the shinydashboard 21 package.
Within each module in iAtlas, results are displayed relative to Sample Groups, corresponding to defined study cohorts. Sev
Immunomodulators: Explore the expression of genes coding for immunomodulating proteins 6 , which include therapeutically important immune checkpoint proteins. Immunomodulators are organized by grouping into three categories: Gene Family (such as "TNF", "MHC Class II", "Immunoglobulin", or "CXC chemokine"), Super Category (such as "Ligand", "Receptor", or "Antigen presentation"), and Immune Checkpoint (classified as "Inhibitory" or "Stimulatory"). Violin and box plots are again used to present distributions, and a table provides additional metadata about immunomodulator genes.
Driver Associations: Test and visualize associations between mutations and IO-related response variables. In the Immune Landscape 6 study, we reported somatic driver alterations that are correlated with increases or decreases in overall immune cell content, or with the fraction of individual immune cell types. These and other variables can be selected to calculate the significance of relationships in each sample group and view results in a volcano plot.

TIL Maps:
We used the results of a recently reported method to assess which spatial regions of hematoxylin and eosin (H&E) whole slide images show evidence of tumor-infiltrating lymphocytes (TILs) 24 . The method, which uses deep learning, was applied to thousands of H&E slides of the TCGA, allowing slides to be characterized in terms of TIL density and patterns.
Integration with Landscape of IO Drug Target Development: CRI has compiled and published comprehensive overviews describing ongoing immunotherapy drug trials, including targets, agents, and tumor sites and has made summaries available in an online resource, the Immune-Oncology Landscape (IO Landscape) (www.cancerresearch.org/IO-landscape) 25-28 . The iAtlas and the IO Landscape resource have been interlinked, enabling researchers to more readily understand the relationship between targeted proteins in IO therapy and the behavior of those targets in tumor tissue.
In IO Target Gene Expression Distributions, the distribution of gene expression values for the selected IO target, by sample group, is displayed in violin plots. Clicking on the violin plot of a particular sample group displays a histogram of the values for that group.
The IO Target Annotations section provides a searchable table with IO targets and associated annotations. The rightmost column provides a link to a view of the IO Landscape page in which the selected target is highlighted in summary bar charts showing the number of agents and cancer types being studied for that target.
In the opposite direction, clicking on targets in the barcharts in the IO Landscape on CRI web pages brings up the target gene expression in iAtlas.

Tools
iAtlas Tools are accessible via the TOOLS tab on the iAtlas Portal. Modules in this space of the portal enable users to "bring their own data" for processing through immunogenomic algorithms that drive some of the results presented in the Analysis modules described above.

Figure 2. iAtlas Explorer.
A range of Analysis modules (blue boxes above) are available that span from clinical to molecular and imaging data types. Within each module, interactive controls allow researchers to expand views, exposing underlying data and results. Settings are available (green box above) to select the sample groupings (TCGA Study, Disease Subtype, or Immune Subtype) which then propagate through modules.

Immune Subtype Prediction:
This tool performs classification of RNA-seq data into one of six immune subtypes as described in the Immune Landscape 6 study. Using a new ensemble model based on XGBoost 29 , researchers can upload their own data for classification 30 . Each member of the ensemble was trained on a random subset of previously reported immune subtypes 6 and features (described below) based on gene expression data from the TCGA PanCancer Atlas Initiative 8 . All code and methods have been confirmed as reproducible. An R package is available on GitHub (https://github.com/CRI-iAtlas/ImmuneSubtypeClassifier) 30 .
The submitted expression data, subsetted to the 485 genes that comprised the 5 signatures that produced the immune subtypes, are used to generate robust features of three types: quartiles, binary gene-pairs, and signature-pairs. For example, given a single sample, genes are binned into quartiles and given a bin label (quartile features). Then, similar to the "Top Scoring Pairs" classifier 31 , genes are paired and given binary values depending on whether $g_i > g_j$ for two gene expression values, $g_i$ and $g_j$. Lastly, signature-pair features are calculated using the five immune subtype signatures as $s_{mn} = \frac{1}{k} \sum_{i,j} \mathbb{1}(g_{im} > g_{jn})$, where $g_{im}$ is gene $i$ from signature $m$, $g_{jn}$ is gene $j$ from signature $n$, $\mathbb{1}(\cdot)$ equals 1 when the inequality holds and 0 otherwise, and $k$ is the number of gene pairs considered, resulting in a value between 0 and 1. The features are computed independently for each sample and do not require normalization across samples. These features are given to a trained XGBoost classifier, which returns the probability of the sample belonging to each of the six subtypes. Lastly, a "best call" is made with a final trained XGBoost classifier using the six probabilities as input. To validate the robustness of the classifier, TCGA data were processed using four different software pipelines and normalizations, showing that classification performance was independent of the gene expression quantification method 30 . Along with a downloadable table of results, visualizations are also provided. This tool is a convenient way for researchers to apply the methods of the Immune Landscape 6 study to their own data without difficult statistical coding.
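The three feature types described above can be illustrated with a minimal Python sketch. It is not the code of the ImmuneSubtypeClassifier R package: the function names, the toy gene labels, the two toy signatures, and the exact quartile binning rule are all assumptions made for illustration, and the XGBoost stages are omitted.

```python
from itertools import combinations


def quartile_features(expr):
    """Label each gene with its within-sample quartile bin (1 to 4).

    Assumed binning rule: genes are ranked by expression and split into
    four equally sized rank bins.
    """
    ranked = sorted(expr, key=expr.get)  # gene names, lowest expression first
    n = len(ranked)
    return {gene: min(4, 1 + (4 * rank) // n) for rank, gene in enumerate(ranked)}


def gene_pair_features(expr, genes):
    """Binary feature per gene pair: 1 if g_i > g_j, else 0."""
    return {(gi, gj): int(expr[gi] > expr[gj]) for gi, gj in combinations(genes, 2)}


def signature_pair_feature(expr, sig_m, sig_n):
    """s_mn: fraction of cross-signature gene pairs with g_im > g_jn."""
    pairs = [(gi, gj) for gi in sig_m for gj in sig_n]
    k = len(pairs)  # number of gene pairs considered
    return sum(expr[gi] > expr[gj] for gi, gj in pairs) / k


# Hypothetical single sample and toy signatures (not the real 485-gene set).
sample = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 8.0}
signature_m = ["A", "B"]
signature_n = ["C", "D"]

quartiles = quartile_features(sample)
pairs = gene_pair_features(sample, sorted(sample))
s_mn = signature_pair_feature(sample, signature_m, signature_n)

print(quartiles)  # {'B': 1, 'C': 2, 'A': 3, 'D': 4}
print(s_mn)       # 0.25
```

Every feature here depends only on the ordering of values within one sample, which is why no normalization across samples is needed.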

Operation
To use iAtlas, access the web app via https://www.cri-iatlas.org. The software can also be run locally on all platforms (Windows, Mac, Linux). To run the Shiny app locally, a working R installation with necessary libraries is required and an installation of RStudio is recommended.

Use cases
Reproducing published results and gaining information on underlying data
One of the initial motivations behind iAtlas was to provide an interactive platform that can reproduce figures published in the Immune Landscape 6 manuscript and extend them by generating variations of those figures for other choices of tumor samples and immune readouts of interest. As an example, in order to reproduce Figure 4A from the Immune Landscape 6 publication, which shows the correlation of DNA damage measures with the fraction of leukocytes in the tumor, we began by selecting the EXPLORE tab. We then opened the Immune Feature Trends module and selected the "Immune Subtype" option under Select Sample Groups in the Explorer Settings panel in the left menu. In the ensuing module page, at the Correlations section (Figure 3), we selected "DNA Alterations" under Select or Search for Variable Class, "Leukocyte Fraction" under Select or Search for Response Variable, and the "Spearman" method under Select or Search for Correlation Method (each a separate dropdown menu). This produced a heatmap identical in content to Figure 4A in the Immune Landscape 6 publication. However, the heatmap provides additional information on underlying data via interactivity: clicking on a heatmap cell displays the underlying data in a scatter plot. Hovering a cursor over a point in the scatter plot reveals sample-level information. Table 2 lists the particular manuscript figures (from the Immune Landscape 6 publication) that can be reproduced or adapted to specific research questions.
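The quantity shown in each heatmap cell of the Correlations section, a Spearman correlation between one variable and the response variable within one sample group, can be sketched as follows. This is an illustrative Python sketch rather than the R code used by iAtlas; the column names (`group`, `aneuploidy_score`, `leukocyte_fraction`) and the toy values are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlations_by_group(rows, variable, response):
    """One Spearman coefficient per sample group (one heatmap cell each)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["group"]].append((row[variable], row[response]))
    return {group: spearman(*zip(*pairs)) for group, pairs in grouped.items()}


# Toy rows with hypothetical column names and values.
rows = [
    {"group": "C1", "aneuploidy_score": 1, "leukocyte_fraction": 0.1},
    {"group": "C1", "aneuploidy_score": 2, "leukocyte_fraction": 0.2},
    {"group": "C1", "aneuploidy_score": 3, "leukocyte_fraction": 0.4},
    {"group": "C1", "aneuploidy_score": 4, "leukocyte_fraction": 0.3},
    {"group": "C2", "aneuploidy_score": 10, "leukocyte_fraction": 0.9},
    {"group": "C2", "aneuploidy_score": 20, "leukocyte_fraction": 0.5},
    {"group": "C2", "aneuploidy_score": 30, "leukocyte_fraction": 0.1},
]

result = correlations_by_group(rows, "aneuploidy_score", "leukocyte_fraction")
```

With these toy values the coefficient is 0.8 for group C1 and -1.0 for group C2. Repeating the calculation for every variable in the selected class would fill one row of the heatmap per variable.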

Exploring new IO results
With the iAtlas portal, scientists can explore and answer new questions based on specific research interests. For example, we asked: "What is the expression level of PD-L1, a therapeutically important protein, in subtypes of breast cancer?" To answer this question, from the landing page, we first selected the "TCGA Subtype" sample group, followed by the "Breast Invasive Carcinoma (BRCA)" study subset. Next, we selected the Immunomodulators module (Figure 4). A quick scan of the dropdown menu did not show any names that matched our gene of interest, so we scrolled further down the page to view the table of 'Immunomodulator Annotations'. By typing the first few letters of a gene name (e.g., "PD...") into the 'Search' field, the table was filtered to a set of matching genes, and we could see that "PD-L1" is the Friendly Name for the gene "CD274" (the approved gene symbol on genenames.org). After returning to the Select or Search for Variable dropdown menu above and selecting "CD274 (PD-L1)", we were able to see a display of violin plots showing the distributions of gene expression across BRCA molecular subtypes. We could then visually compare distributions between subtypes, noticing for example the elevated expression level in the Her2 subtype compared to Basal breast cancer. These comparisons can guide further characterization not only of how gene expression can differ between TCGA subtypes of breast cancer, but also of how these subtype-specific differences might correlate with clinical outcomes, as investigated in other studies 32-34 . Using this module and others, the researcher can answer new questions that could lead to developments in oncology research.

Classification of immune subtypes on new data
In order to classify any tumor-derived gene expression samples into immune subtypes 6,30 , users can select the TOOLS tab (top right), which leads to an interface containing notes, several links, and the controls. To classify new data, we submitted data as a text file, in this case tab-separated, with the first column containing gene IDs and later columns containing samples. A provided example file can be found in the description text. The first row of the data was a header containing sample IDs. Gene IDs can be HGNC gene symbols (preferred), Entrez IDs, or Ensembl identifiers. The locally available data were selected using the Browse button, and the file delimiter and gene ID type were selected using dropdown menus. Hitting the GO button produced classifications, signature scores, and cluster probabilities, which were reported in a table that could be downloaded as a csv, xlsx, or pdf file. In addition, a bar plot with the frequency of predicted subtypes for the submitted data was displayed.
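The input layout described above (a header row of sample IDs, then one row per gene with the gene ID in the first column) can be checked before upload with a short Python sketch. The function name and the toy file content are hypothetical, and the sketch assumes that the header row includes a label for the gene ID column; it is not part of iAtlas.

```python
import csv
import io


def read_expression_table(handle, delimiter="\t"):
    """Parse a gene-by-sample table into {sample_id: {gene_id: value}}."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_ids = header[1:]  # assumption: first header cell labels the gene ID column
    table = {sample: {} for sample in sample_ids}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"row for {gene_id} has {len(values)} values, expected {len(sample_ids)}"
            )
        for sample, value in zip(sample_ids, values):
            table[sample][gene_id] = float(value)
    return table


# Toy tab-separated content with HGNC gene symbols in the first column.
example = "gene\tsample_1\tsample_2\nCD274\t3.5\t0.2\nCD8A\t10.0\t4.1\n"
expression = read_expression_table(io.StringIO(example))
print(expression["sample_1"]["CD274"])  # 3.5
```

Organizing the parsed values by sample mirrors the classifier's design, in which features are computed independently for each sample.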
All data required to run the application and describe the Use Cases are available in GitHub and archived with Zenodo 7 .

Conclusions
CRI iAtlas is a platform that facilitates analysis and exploration of the tumor immune microenvironment by making IO-related data and tools accessible to the research community. iAtlas builds upon the comprehensive TCGA analysis of tumor-immune interactions on 10,000 tumors and illustrates how commonalities and differences of the immune response across 33 tumor types can provide clues for advancing therapeutics. iAtlas provides researchers with the tools to dive deeper into immunogenomic and clinical data and to develop and refine hypotheses regarding tumor-immune interactions that will empower researchers to gain insight and design the next generation of immuno-oncology treatment strategies.

Data availability
Folder 'Data' contains all data required to run the application and describe Use Cases. This is also available on GitHub.
License: Apache License 2.0.

Software availability
Source code is available from GitHub: https://github.com/CRI-iAtlas/shiny-iatlas.

concerns, and suggestions that should not preclude acceptance of the manuscript but rather serve as potential items to include in future releases of iAtlas:
1. In the Cell Type Fractions of the Tumor Microenvironment module, it would be helpful to include an option for the user to select which CIBERSORT immune types to include as a custom "Cell Fraction Type". For example, a user could select only memory resting CD4 T cells, memory activated CD4 T cells, naïve CD4 T cells, Tregs, and Tfh if they wanted to easily visualize the relative fractions of particular Th subsets within the CD4 compartment across groups. In some instances, it may be more biologically meaningful to compare these shifts within broad subsets without having to account for changes in unrelated populations (e.g. M2 macrophages).
2. In the Tumor Microenvironment module, when comparing leukocyte fraction to the total stromal fraction, there are typically a small number of samples where the leukocyte fraction exceeds the total stromal fraction, denoted as "estimation artifacts". What is the source of these artifacts? It may help to include a description of this in the legend.
3. As a general comment, in a variety of instances it might be informative if there were a way to include statistics between groups, though I recognize that doing so could be computationally intensive and invalid in many instances. Nonetheless, given the large sample sizes I am often left wondering if modest differences between groups are statistically meaningful and where inferences can be drawn (e.g. an increase in Treg fraction from 0.024 in the "Inflammatory" subtype to 0.029 in the "TGF-b Dominant" subtype).
4. The Data Description tab might be aided by including a sentence or phrase within the table stating what each variable is without requiring following all of the methods links. For example, I am assuming the BCR/TCR Shannon incorporates both the richness and evenness to depict the diversity of the repertoire, but it might be helpful to state the meaning of the terms in the table.
5. For the Clinical Outcomes, it might be useful to include "Immune Subtypes" as a variable when analyzing TCGA Study so that users can evaluate the survival probabilities depending on immune subtype for a given cancer.
6. There appears to be a minor issue wherein no results can be displayed for the Concordance Index of the Clinical Outcomes analyses for most, if not all, variables when using TCGA Study for the Sample Groups.
7. The TIL Maps module is quite impressive and the ability to examine the histology from each of the patients along with the calculated metrics is an outstanding feature. Nonetheless, when one clicks one of the violin plots, the annotation list seems only to display an individual patient rather than the entire list of patients for that group.
8. The Immune association with driver gene analysis is a powerful tool whose insights have far-reaching implications and represent an active area of research in tumor immunology. While the interactive volcano plot is highly informative and useful for identifying genes of interest, it would also be useful to be able to run the analysis in the opposite direction. If possible, it would be informative to be able to select a driver gene and see where it lies in the various volcano plots and which of the variables are correlated with the mutation.

© 2020 Rouilly V. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Vincent Rouilly DATACTIX, Bordeaux, France
In this article, the authors introduce iAtlas, a web-based application that allows users to browse a rich diversity of immune profiles, generated from the public TCGA dataset and published in the landmark Immune Landscape study. Beyond simply making it possible to replicate the published figures, the application provides great flexibility to explore the entire PanImmune feature matrix through sophisticated and interactive tools.
The modular architecture and the features of the software are clearly explained. Detailed instructions are provided, and all the necessary information is given to run the application. Its source code repository on GitHub is well structured. The documentation is very comprehensive, as it gives ample information on the underlying methods, as well as on how to extend the software application.
It is a very valuable resource for the Immuno-oncology community.

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes