Software Tool Article

Interactive Clustered Heat Map Builder: An easy web-based tool for creating sophisticated clustered heat maps

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 14 Oct 2019

This article is included in the Bioinformatics gateway.

Abstract

Clustered heat maps are the most frequently used graphics for visualization and interpretation of genome-scale molecular profiling data in biology. Construction of a heat map generally requires the assistance of a biostatistician or bioinformatics analyst capable of working in R or a similar programming language to transform the study data, perform hierarchical clustering, and generate the heat map. Our web-based Interactive Heat Map Builder can be used by investigators with no bioinformatics experience to generate high-caliber, publication-quality maps. Preparation of the data and construction of a heat map are rarely a simple linear process. Our tool allows a user to move back and forth iteratively through the various stages of map generation to try different options and approaches. Finally, the heat map the builder creates is available in several forms, including an interactive Next-Generation Clustered Heat Map that can be explored dynamically to investigate the results more fully.

Keywords

Bioinformatics, Genomics, Heat Map, Web Tool, Website, Hierarchical Clustering

Introduction

Many thousands of publications on genomics studies include clustered heat maps (CHMs) because the hierarchical clustering and intuitive visualization provide insight into the relationships among sample sub-groups and key biological processes [1–8]. Construction of a CHM requires data transformation, application of clustering methods, association of covariate (classification) data, and production of the heat map visualization. Generally, those tasks require the assistance of an analyst with biostatistics or bioinformatics skills who can work in R or a similar language to manipulate the study data and generate the map. This is usually not a simple linear process because data transformation and clustering methods are often revisited to find the ideal match for the study, and modifications are often made to heat map visualizations to select the best colors, adjust covariates, insert gaps, etc. Our Interactive CHM Builder is a web-based tool for generation of high-quality heat maps that can be used by investigators with no bioinformatics experience and only modest exposure to biostatistical methods. The tool guides users through the steps of creating a heat map and supports iterative refinement of the map by working backward and forward through the steps to refine data transformation, annotation, clustering, and formatting options. (Caveat: Iterative exploration of different options may introduce a multiple-comparisons issue that would have to be taken into account if the map were used for formal statistical inference, rather than discovery.)

One obvious limitation of traditional heat maps is that they contain a huge amount of information but are static in nature and do not readily support a deeper exploration of the biology behind the image. The Interactive CHM Builder produces traditional heat map images as PDF files but can also produce interactive next-generation CHMs (NG-CHMs). NG-CHMs support interactive exploration of patterns in the data through zooming, panning, searching, and advanced link-outs to dozens of external resources. An NG-CHM file can be downloaded and viewed locally with the NG-CHM viewer and, importantly, can be embedded in a study results webpage or publication.

The Interactive CHM Builder [9], available at https://build.ngchm.net/NGCHM-web-builder/, is easy to try out using sample data provided at the site. Other methods of producing NG-CHMs, including an R library and a set of tools for the Galaxy platform [10,11], are described at https://www.ngchm.net/.

Methods

Implementation

The Interactive Builder [9] is a web-based application that accepts an uploaded data matrix and then walks the user through several steps to transform the data, perform hierarchical clustering, and format the resulting CHM. The application is implemented as HTML, CSS, and JavaScript on the browser side and Java servlets on the web server. Data manipulation and heat map generation are implemented in Java classes used by the servlets. The clustering is performed by a servlet using the Renjin engine (https://www.renjin.org) to perform R clustering functions in Java. Browser sessions are tracked by the server to create a working area for each user and prevent users from seeing each other’s data or maps. In addition to the working version of the data matrix on which transformations are performed, an original version of the matrix is preserved. Returning to a previous matrix state is accomplished by restoring the original version and then re-applying transformations until the requested state is restored. The site retains constructed heat maps and the related uploaded data only for the duration of the HTTP session.
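
The restore-by-replay approach described above can be sketched in a few lines. This is only an illustration in Python, not the Builder's Java code; the class name MatrixSession and its methods are hypothetical, and the matrix is a plain list of lists.

```python
import copy
import math


class MatrixSession:
    """Keeps an untouched original matrix plus an ordered transform history."""

    def __init__(self, matrix):
        self.original = copy.deepcopy(matrix)  # never modified
        self.working = copy.deepcopy(matrix)   # current state shown to the user
        self.history = []                      # list of (label, function) pairs

    def apply(self, label, func):
        """Apply a transform to the working matrix and record it."""
        self.working = func(self.working)
        self.history.append((label, func))

    def restore(self, n_steps):
        """Return to the state after the first n_steps transforms by
        resetting to the original matrix and replaying the history."""
        kept = self.history[:n_steps]
        self.working = copy.deepcopy(self.original)
        self.history = []
        for label, func in kept:
            self.apply(label, func)


def log10_transform(matrix):
    return [[math.log10(v) for v in row] for row in matrix]


def mean_center_rows(matrix):
    return [[v - sum(row) / len(row) for v in row] for row in matrix]


session = MatrixSession([[1.0, 100.0], [10.0, 1000.0]])
session.apply("log10", log10_transform)
session.apply("mean-center rows", mean_center_rows)
session.restore(1)  # back to the state right after the log transform
print(session.working)
```

Replaying from the preserved original avoids having to store a copy of the matrix for every step, at the cost of recomputing the retained transforms.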

A Java NG-CHM heat map generator .jar file is used to construct the heat map repeatedly as options are selected in each step of the builder. The heatmapProperties.json file, which contains all options selected by the user, conveys the selected options to the generator. The current NG-CHM file set is stored in a directory under the session ID. The NG-CHM file is a zipped version of the NG-CHM directory. The downloaded .ngchm file can be saved locally and viewed interactively using a local instance of the NG-CHM viewer that can also be downloaded from the builder site. An overview is given in Figure 1.

Figure 1. High-level overview of the interaction of heat map builder components.

Heat maps are built on a webserver. A browser session ID is used to create a separate, temporary working area for each user. Heat map construction sessions are cleaned up when the session is ended, but PDF and NG-CHM heat map files can be downloaded.

The full source code for the Interactive Builder is available on GitHub (https://github.com/MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER).

Operation

There is no need to install software to use the Interactive Builder [9]; it is available for public use on our server at https://build.ngchm.net/NGCHM-web-builder/. If, however, a local private installation of the Interactive Builder is preferred, there are two simple installation methods.

Organizations familiar with Docker can run the Builder as a Docker container. To do this, clone the git repository. The base folder of this repository has a Docker build file. Run the docker build command in this directory with the -t option to name the resulting Docker image. Then use the docker run command to start a container using the image. The heat maps created by the software are transient and last only for the duration of a user HTTP session, so there is no need to mount an external directory to the container for persistent storage. The port for connecting to the web server in the container does need to be specified in the docker run command. Connect the desired external port to the Tomcat instance in the container, for example -p 80:8080. Users should then be able to connect to the Interactive Builder using their browser and the URL of the Docker container, for example, http://<docker machine IP or URL>/NGCHM-web-builder.

The other option for deploying the software is to install it on an existing web server such as Tomcat. To do this, first clone the git repository and then use the Ant script, ant_buildfile.xml, in the NG-CHM_GUI_BUILDER folder to create a .war file. Then simply copy the .war file to the webapps directory of the web server. The application should then be available at http://<server URL>/NGCHM-web-builder.

Use case

The starting point for a CHM is a matrix of data. In this use-case example, we focus on gene expression data from The Cancer Genome Atlas (TCGA) bladder cancer project [12,13]. The rows and columns of the matrix require identifiers, in this case sample IDs and gene symbols, and the cells of the matrix must be numeric values. The builder will accept a tab-delimited text file (*.txt), a comma-separated text file (*.csv), or an Excel spreadsheet (*.xlsx).

Select matrix

The Open Matrix File button on the first page of the builder (Figure 2) is used to upload the data matrix. A name and optional description to be associated with the heat map are entered. When the data have been loaded, the Select Matrix page will show the first few rows and columns of the matrix. It is important that the builder correctly identify the row labels, column labels, and matrix data; the backgrounds of labels and matrix data should be blue and green, respectively. If the input file has extra rows or columns, you may need to correct the identification of labels and matrix data by selecting the appropriate radio button and then clicking on the correct location in the matrix displayed.

Figure 2. Heat map creation starts with importing a text matrix file (e.g., *.txt, *.csv or Excel *.xlsx file) and identifying the row labels, column labels and numerical data values.

Note that several screens in the builder include advanced features that are hidden by default to simplify the process for first-time users. The use-case example here does not require advanced features, but be aware that additional capabilities can be accessed using the Advanced Features checkbox.

Transform/filter the data

Creating a good heat map depends on proper data preparation. The second step in the build process is the Data Transform page (Figure 3), which provides three primary categories of matrix transformations: functions that identify and replace missing/invalid values, filters to remove rows or columns, and transforms to perform mathematical operations on data values. There are additional choices in advanced mode for transposing the matrix and calculating correlations.

Figure 3. The data transform page makes it easy to perform operations on the matrix like log transformation or filtering to reduce and normalize data.

The right-hand panel of the Transform page provides summary statistics about the data matrix, including the number of rows and columns, a histogram of the data distribution, and an indication of the number of invalid cells in the matrix. The top of the page also provides suggestions about transformations that can be performed and flags any problems with the data. The use-case matrix is too large for the Interactive Builder to use in creating a heat map, so a message in red indicates that. Currently, the website limits the heat map to no more than 4,000 total rows and columns and no more than 3,500 elements on either axis.
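
A size check of the kind described above might look like the following sketch. It assumes one reading of the stated limits (rows plus columns at most 4,000, and each axis at most 3,500); the function name and messages are hypothetical and are not taken from the Builder's source.

```python
MAX_TOTAL = 4000     # assumed: rows + columns combined
MAX_PER_AXIS = 3500  # assumed: rows or columns individually


def size_problems(n_rows, n_cols):
    """Return a list of human-readable problems; empty if the matrix fits."""
    problems = []
    if n_rows + n_cols > MAX_TOTAL:
        problems.append(
            f"rows + columns = {n_rows + n_cols} exceeds {MAX_TOTAL}")
    if n_rows > MAX_PER_AXIS:
        problems.append(f"{n_rows} rows exceeds {MAX_PER_AXIS}")
    if n_cols > MAX_PER_AXIS:
        problems.append(f"{n_cols} columns exceeds {MAX_PER_AXIS}")
    return problems


# Hypothetical sizes: a large expression matrix before and after filtering.
print(size_problems(20000, 400))  # too large: filter rows first
print(size_problems(500, 400))    # fits
```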

For this use case, we apply the following transforms:

  • Apply a threshold to remove (set to NA) values that are less than 0.00001.

  • Log Transform Base 10.

  • Mean Center the Rows.

  • Filter to remove rows with >50% missing values.

  • Filter to keep only the 500 rows with the highest standard deviation. That is done to find the rows that differ the most across samples. Those rows will drive clustering and patterns in the heat map.
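
For readers who want to reproduce these steps outside the Builder, the five transforms can be sketched in plain Python, applied in the order listed. This is an illustration under stated assumptions, not the Builder's implementation: missing values are represented as None, row means and standard deviations are computed over non-missing values, and the toy matrix and gene names are made up (the use case keeps 500 rows; the toy example keeps 2).

```python
import math
import statistics

NA = None  # marker for a missing value


def apply_threshold(matrix, cutoff):
    """Set values below the cutoff to NA."""
    return [[NA if v is NA or v < cutoff else v for v in row] for row in matrix]


def log10_transform(matrix):
    """Take log base 10 of every non-missing value."""
    return [[NA if v is NA else math.log10(v) for v in row] for row in matrix]


def mean_center_rows(matrix):
    """Subtract each row's mean (over non-missing values) from that row."""
    centered = []
    for row in matrix:
        present = [v for v in row if v is not NA]
        mean = statistics.mean(present) if present else 0.0
        centered.append([NA if v is NA else v - mean for v in row])
    return centered


def drop_sparse_rows(labels, matrix, max_missing=0.5):
    """Remove rows in which more than max_missing of the values are NA."""
    kept = [(lab, row) for lab, row in zip(labels, matrix)
            if sum(v is NA for v in row) / len(row) <= max_missing]
    return [lab for lab, _ in kept], [row for _, row in kept]


def keep_most_variable(labels, matrix, n):
    """Keep the n rows with the highest standard deviation."""
    def row_sd(row):
        present = [v for v in row if v is not NA]
        return statistics.stdev(present) if len(present) > 1 else 0.0
    ranked = sorted(zip(labels, matrix),
                    key=lambda pair: row_sd(pair[1]), reverse=True)[:n]
    return [lab for lab, _ in ranked], [row for _, row in ranked]


# Toy input: 4 hypothetical genes x 4 samples.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
data = [
    [1.0, 10.0, 100.0, 1000.0],  # highly variable
    [10.0, 10.0, 10.0, 10.0],    # constant
    [0.0, 0.0, 0.0, 5.0],        # mostly below the threshold
    [1.0, 1.0, 100.0, 100.0],    # moderately variable
]

m = apply_threshold(data, 0.00001)
m = log10_transform(m)
m = mean_center_rows(m)
genes, m = drop_sparse_rows(genes, m)
genes, m = keep_most_variable(genes, m, 2)
print(genes)
```

Applying the threshold before the log transform matters: zeros become missing values instead of causing a math error in the logarithm.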

After applying the transformations, the matrix contains no errors and should be suitable for heat map generation (Figure 4). Note that the left-hand panel shows the history of transformations performed on the matrix, and one can ‘undo’ back to any previous state of the matrix (including the original version) by clicking the desired previous state and then clicking Reset. More generally, the entire process of creating a heat map is iterative; the Next and Previous buttons can be used to return to previous steps to try different options. If, after generating the heat map, it appears that there should be more or fewer rows or different transforms, one can return to the pertinent screen and use the history and Reset option to adjust the data matrix. Finally, as an added feature, the Transform screen enables the user to download the filtered, transformed matrix for use in other analyses.

Figure 4. The transformed dataset has a better distribution and size for heat map generation than did the original.

The history of transformations in the left-hand panel can be used to undo changes and revert to previous matrix states.

Clustering

The next step is clustering (Figure 5). The row order and column order drop-down menus can be used to select the clustering algorithm and distance measure to be applied to the rows and/or columns. Ward’s algorithm with Euclidean distance metric is one common choice, but the menus include many other possibilities, appropriate for different purposes and data characteristics. For the sample case, the Ward/Euclidean options provide strong separation in the dendrogram and interesting groups of samples. The menus also allow the rows and columns to be left in original order or randomized. Additional options will be provided in the future.

Figure 5. The clustering step supports many different clustering methods and distance measures.

The Apply button performs clustering and displays the resulting dendrograms.

Please be aware that clustering of larger matrices may take a few minutes to complete. (The time it takes to cluster data increases approximately as the square of the number of rows or number of columns, whichever is larger.)
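
One way to see where the roughly quadratic growth comes from: hierarchical clustering typically starts from the distance between every pair of rows (or columns), and the number of pairs grows as n(n − 1)/2. The sketch below is illustrative only; it is plain Python rather than the R functions the Builder runs through Renjin, and the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(rows):
    """Distance for every unordered pair of rows: n * (n - 1) / 2 values."""
    return {(i, j): euclidean(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


def n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


rows = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = pairwise_distances(rows)
print(d)

# Doubling the number of rows roughly quadruples the number of distances.
for n in (500, 1000, 2000):
    print(n, n_pairs(n))
```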

Covariate bars

The next page allows covariate (classification) bars to be added to the heat map (Figure 6). Covariate bars add descriptive information about the rows or columns of the heat map. A covariate bar file has the same labels as the rows or columns in the matrix and an annotation value. In this use-case we will use TCGA clinical data to add age, smoking status, gender, and tumor stage to the heat map. The covariate file contains sample ids and clinical values – one value per line. When a covariate file is added, one must identify it as a row or column covariate and specify whether it contains discrete (categorical) data or continuous values. In this case smoker status, gender, and stage are discrete column covariates, and age is a continuous column covariate.
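
As a rough sketch of what matching a covariate file to the matrix involves, the code below reads label/value pairs and lines them up with the column labels. The two-column, tab-delimited layout without a header, the sample IDs, and the function names are assumptions for illustration; consult the Builder's own help for the exact file format it expects.

```python
import csv
import io


def read_covariate(handle):
    """Read 'label<TAB>value' lines into a dict keyed by label."""
    return {row[0]: row[1]
            for row in csv.reader(handle, delimiter="\t") if len(row) >= 2}


def align_to_labels(covariate, labels, kind):
    """Return one value per matrix label, in matrix order (None if absent).

    kind is 'discrete' (values kept as text) or 'continuous' (parsed as float).
    """
    values = []
    for label in labels:
        raw = covariate.get(label)
        if raw is None or raw == "":
            values.append(None)
        elif kind == "continuous":
            values.append(float(raw))
        else:
            values.append(raw)
    return values


# Hypothetical column labels and covariate files (in-memory for the example).
column_labels = ["SAMPLE-01", "SAMPLE-02", "SAMPLE-03"]
age_file = io.StringIO("SAMPLE-02\t71\nSAMPLE-01\t64\n")
stage_file = io.StringIO("SAMPLE-01\tStage II\nSAMPLE-03\tStage IV\n")

age = align_to_labels(read_covariate(age_file), column_labels, "continuous")
stage = align_to_labels(read_covariate(stage_file), column_labels, "discrete")
print(age)
print(stage)
```

The discrete/continuous distinction matters because a discrete bar assigns one color per category, whereas a continuous bar maps numbers onto a color gradient.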

Figure 6. The covariate screen allows for the addition of supplemental data that describes the rows or columns of the data.

This screen is also used to change the color of values and ordering of the covariate bars.

After covariate bars have been added, the colors associated with the covariate values can be changed. If the color scheme might be useful for other maps, the palette can be saved to the server using the See Palettes button. Covariates can be reordered on the same screen.

An advanced feature, accessed on the cluster page, is the ability to generate a covariate bar based on the clustering dendrogram. If, for example, there are four distinct clusters in the data and one wants to emphasize them in discussion of the heat map, a covariate that identifies the four top clusters based on the four top branches of the dendrogram can be generated.

Another notable advanced feature is the ability to include classification data in the original matrix uploaded in the first step, rather than providing individual covariate files on the covariate page. Choosing advanced features on the first page enables the user to identify covariates as well as labels and data in the uploaded matrix.

Format heat map

The format screen (Figure 7) supports the final step in generating a heat map: adjusting its appearance.

Figure 7. The format step is used to make changes to the appearance of the heat map, for example, changing the color scheme or altering the breakpoints associated with the colors.

Many appearance options are available:

  • Adjustment of colors and break points in the body of the heat map.

  • Formatting of labels.

  • Formatting of the dendrograms.

  • Specification of the data type of the labels for link-outs.

For this use case, several changes were made: (i) a slight adjustment to the break points to emphasize high and low values in the matrix, (ii) identification of row labels as gene symbols, and (iii) identification of column labels as TCGA sample identifiers. Associating the labels with known data types activates available type-specific link-outs to external data resources.
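
To illustrate why moving the break points changes which values stand out, the sketch below maps a value to a color by linear interpolation between break points. The interpolation scheme, the blue/white/red palette, and the break point values are assumptions chosen for the example; the Builder's actual color handling may differ.

```python
def interpolate_color(value, breakpoints):
    """Map a value to an (r, g, b) tuple.

    breakpoints is a list of (threshold, (r, g, b)) sorted by threshold.
    Values outside the range take the color of the nearest end.
    """
    if value <= breakpoints[0][0]:
        return breakpoints[0][1]
    if value >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (lo, lo_rgb), (hi, hi_rgb) in zip(breakpoints, breakpoints[1:]):
        if lo <= value <= hi:
            t = (value - lo) / (hi - lo)
            return tuple(round(a + t * (b - a)) for a, b in zip(lo_rgb, hi_rgb))


BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)

wide = [(-2.0, BLUE), (0.0, WHITE), (2.0, RED)]    # hypothetical starting points
narrow = [(-1.0, BLUE), (0.0, WHITE), (1.0, RED)]  # hypothetical adjusted points

# The same value is pale red with wide break points and fully red with narrow ones.
print(interpolate_color(1.0, wide))
print(interpolate_color(1.0, narrow))
```

Pulling the outer break points toward the center saturates moderately high and low values, which makes them easier to see at the cost of no longer distinguishing them from the extremes.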

Interesting advanced features on the same page include the addition of ‘top items’ that will be displayed in the global (i.e., full) heat map view. For example, to show the positions of a few key genes, they can be entered on the page and will show on the global heat map display. Another powerful advanced feature is the ability to add gaps to emphasize sub-groups in the heat map.

Heat map – view and download

The heat map is now complete, but the Prev button can still be used to go back to previous build steps to try different options. On this final page of the Interactive Builder (Figure 8), the map can be explored dynamically and downloaded. The Get Heat Map PDF button downloads a PDF of the summary and/or detail views as they appear on the screen, including a version of the detailed view zoomed as desired. The legends and other metadata are shown on a separate page of the PDF. The final screen can also be used to explore the dynamic heat map by zooming, panning, searching, dendrogram selection, and link-outs. Clicking the Expand Map button devotes the whole browser window to the map.

Figure 8. The heat map review and download screen shows the completed heat map, allows for dynamic exploration of the map, and provides download options for a PDF, an NG-CHM, and/or the construction history.

Heat maps constructed on the Interactive Builder website are not saved. However, NG-CHMs can be downloaded to save and explore dynamically on your own computer. Select the Get NG-CHM file to obtain a map and then select the Get Heat Map Viewer to get a stand-alone NG-CHM viewer to run on your computer. See our NG-CHM site (https://www.ngchm.net/) for more details on the capabilities of dynamic heat maps, additional builders to generate NG-CHMs (Galaxy and R) [2], and instructions on how to embed dynamic heat maps in your websites. Also see our YouTube channel for tutorials on NG-CHM features.

NG-CHM

The interactive NG-CHM produced by the Builder for the use case can be viewed here. Try the pan, zoom, search, and link-out features.

Reproducibility

Reproducibility of results is becoming increasingly important for publication in high-impact journals [14]. Therefore, it is important to be able to report the exact steps performed to transform data and create a heat map. That is particularly challenging with an iterative tool that facilitates exploration of alternative options. The Get Creation Log button on the file page of the Interactive Builder is meant to address that need. The history provided by the log shows each option, including the data transformations that were performed to produce the current map. With the original data file and the history, it is possible to recreate a heat map exactly.

Conclusions

The Interactive CHM Builder [9] is an easy-to-use yet powerful tool for creating custom clustered heat maps for any type of study that has a matrix of data. It requires no programming skills and has an intuitive step-by-step process to prepare the data and build high-quality CHMs. Sample data are built in, so it takes just seconds to try out the process and become familiar with the basic steps for heat map generation. It is also easy to back up to previous steps or data states to try alternative approaches and refine formatting. Finally, heat maps may be downloaded either as PDF files or as NG-CHM files that support in-depth exploration of the maps.

Data availability

Open Science Framework: NG-CHM Interactive Builder Use-Case Data. https://doi.org/10.17605/OSF.IO/ABJD7 [13].

This project contains the sample TCGA bladder cancer matrix used in the use-case.

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Software availability

The Interactive CHM Builder is freely available for use as a web resource at: https://build.ngchm.net/NGCHM-web-builder/.

Source code available from: https://github.com/MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER.

Archived source code at time of publication: https://doi.org/10.5281/zenodo.3460673 [9].

License: GNU General Public License version 2.

how to cite this article
Ryan MC, Stucky M, Wakefield C et al. Interactive Clustered Heat Map Builder: An easy web-based tool for creating sophisticated clustered heat maps [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2019, 8(ISCB Comm J):1750 (https://doi.org/10.12688/f1000research.20590.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 21 Feb 2020
Melissa Cline, Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA 
Approved with Reservations
Ryan et al. present a manuscript for the interactive clustered heat map builder tool that has been widely used in cancer research consortia. The tool is of excellent quality overall, is very useful, and is intuitive in its design and ...
HOW TO CITE THIS REPORT
Cline M. Reviewer Report For: Interactive Clustered Heat Map Builder: An easy web-based tool for creating sophisticated clustered heat maps [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2019, 8(ISCB Comm J):1750 (https://doi.org/10.5256/f1000research.22637.r59179)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 19 Mar 2020
    Michael Ryan, In Silico Solutions, Fairfax, 22031, USA
    19 Mar 2020
    Author Response
    Thank you for your feedback and suggestions.  Below we have described how each was addressed.

    Major feedback:
    The authors are not doing justice to the tool, which offers much more than user-friendly heatmap
    ...
Reviewer Report 29 Oct 2019
Natasha Caplen, Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA 
Soumya Sundara Rajan, Genetics Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA 
Approved
Ryan and co-workers have developed the software tool Interactive Clustered Heat Map (CHM) builder to enable investigators with minimal expertise in bioinformatics and biostatistics to generate publication-quality heatmaps. The use of heatmaps to visualize related datasets is a common feature ...
HOW TO CITE THIS REPORT
Caplen N and Sundara Rajan S. Reviewer Report For: Interactive Clustered Heat Map Builder: An easy web-based tool for creating sophisticated clustered heat maps [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2019, 8(ISCB Comm J):1750 (https://doi.org/10.5256/f1000research.22637.r55133)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 19 Mar 2020
    Michael Ryan, In Silico Solutions, Fairfax, 22031, USA
    19 Mar 2020
    Author Response
    Thank you for your detailed comments and suggestions on the article and the tool.  Each suggestion/comment is addressed below:

    Article
     
    In the Introduction, the authors discuss the user’s ability to use their
    ...