Software Tool Article

DataViz: visualization of high-dimensional data in virtual reality

[version 1; peer review: 1 not approved]
PUBLISHED 23 Oct 2018

Abstract

Virtual reality (VR) simulations promote interactivity and immersion, and provide an opportunity that may help researchers gain insights from complex datasets. To explore the utility and potential of VR in graphically rendering large datasets, we have developed an application for immersive, 3-dimensional (3D) scatter plots. Developed using the Unity development environment, DataViz enables the visualization of high-dimensional data with the HTC Vive, a relatively inexpensive and modern virtual reality headset available to the general public. DataViz has the following features: (1) principal component analysis (PCA) of the dataset; (2) graphical rendering of said dataset’s 3D projection onto its first three principal components; and (3) intuitive controls and instructions for using the application. As a use case, we applied DataViz to visualize a single-cell RNA-Seq dataset. DataViz can help gain insights from complex datasets by enabling interaction with high-dimensional data.

Keywords

Virtual Reality, Principal Component Analysis, Visualization, High-dimensional, Unity

Introduction

Historically, we have heavily relied on 2-dimensional (2D) graphical displays to communicate large amounts of data. These graphs have also been useful in finding patterns within datasets and building intuition for more accurate and meaningful analysis. However, for large and complex datasets containing numerous dimensions, traditional 2D charts and graphs are inadequate in demonstrating the multi-faceted nature of relevant information.

The 3-dimensional (3D) visualization of datasets is valuable because it offers a starting solution to this problem: the additional dimension allows more information to be presented, which decreases the potential for misinterpretation while increasing the possibility of recognizing patterns and building intuition.

This paper investigates the potential of virtual reality (VR) as a platform for graphically rendering datasets in 3D by creating a visualization application. VR is already being used in a variety of fields including flight simulations [1], mental health therapy [2], and even visualizations of molecules and their interactions [3]. In the specific field of data visualization, several applications exist, including a surround-screen, projection-based visualizer named CAVE [4], one developed using OpenGL that visualizes economic data [5], and iViz [6], an efficient and intuitive VR visualizer that is also the most similar to the application developed in this research. DataViz attempts to make further progress by creating a modern, intuitive, and readily available application.

We continue to explore the potential of VR in the graphical rendering of large datasets; to do so, we have developed a Unity3D VR application for HTC Vive (HTC, New Taipei City, Taiwan) that runs principal component analysis (PCA) on datasets before graphing the subsequent projection into three dimensions. The software was designed to run efficiently with an intuitive interface.

Methods

Implementation

In the design of this application, special consideration was given to the following elements: the method of data analysis, the format of the input data, the limitations in computing power of the selected platform, and the mitigation of motion sickness.

Data analysis

The primary method of data analysis is PCA. The rationale behind this decision is that, because humans live in three dimensions, the most intuitive manner of visualization is one that plots in that space. PCA reduces high-dimensional data to plottable 3D coordinates by projecting each data point onto the three directions of greatest variance, which makes the resulting graph more intuitive and helps users discover patterns and develop scientific intuition.
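The projection described above can be illustrated with a minimal sketch. It is not the DataViz implementation, which uses the Accord.NET framework in C#; it is a standard-library Python illustration that swaps in power iteration with deflation for a library eigen-solver. All function names here are hypothetical.

```python
import math
import random


def center_columns(rows):
    """Subtract each column's mean so the data are centered on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(matrix_row, vector))
            for matrix_row in matrix]


def dominant_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    vector = [rng.uniform(-1.0, 1.0) for _ in matrix]
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm < 1e-12:  # no variance left in any direction
            break
        vector = [x / norm for x in product]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(v * p for v, p in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def pca_project(rows, n_components=3):
    """Return (coordinates, explained-variance ratios) for the top components."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    if total_variance == 0:
        raise ValueError("data have no variance to project")
    components, ratios = [], []
    for _ in range(n_components):
        eigenvalue, vector = dominant_eigenpair(cov)
        components.append(vector)
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j]
                for j in range(d)] for i in range(d)]
    coords = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return coords, ratios
```

Calling `pca_project(rows)` on a list of numeric rows returns one 3D coordinate per row together with the proportion of variance captured by each component. The sign of each axis is arbitrary, as with any PCA.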

Input data

DataViz only accepts data in table format (CSV or TXT). Occasionally, the user may want to analyze the transpose of the provided data. Although a table can easily be transposed with existing functions in NumPy or R, we decided to build the transpose functionality into the application.

In addition to transposition, DataViz allows the user to omit specific columns from the file, for example because a column holds an unwanted dimension of the data or names rather than measurements. This functionality allows researchers to analyze only the columns they are interested in.

The user may also have a column that labels the points. Users can designate a specific column that differentiates the data with various tags, and these groups will show up in a graph legend during runtime.
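The three input options above (transposition, column omission, and a label column) can be sketched as follows. This is an illustration in standard-library Python rather than the application's C# code; the function names, the zero-based column indices, and the sample layout are assumptions made for the example.

```python
import csv
import io


def read_table(text, delimiter=","):
    """Parse delimited text into a list of rows, skipping empty lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


def transpose(table):
    """Swap rows and columns of a rectangular table."""
    return [list(column) for column in zip(*table)]


def split_columns(table, omit=(), label_column=None):
    """Drop omitted columns, pull out the labels, convert the rest to floats."""
    labels = None
    skip = set(omit)
    if label_column is not None:
        labels = [row[label_column] for row in table]
        skip.add(label_column)  # labels are not numeric data
    values = [[float(cell) for j, cell in enumerate(row) if j not in skip]
              for row in table]
    return values, labels


# Hypothetical input: an identifier column, a label column, two measurements.
sample = "cell1,zygote,1.0,2.0\ncell2,2cell,3.0,4.0\n"
table = read_table(sample)
values, labels = split_columns(table, omit=[0], label_column=1)
```

Here `values` holds only the numeric measurements and `labels` holds the group tag of each row, which is the kind of information a graph legend needs.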

Limitations in computing power

The engine used in developing this application is Unity®. Unity is one of the most popular platforms for VR development but is not specifically designed for statistical analysis. Therefore, PCA on large datasets may result in slow run times, especially on machines without an appropriate graphics card or sufficient computing power. To overcome this limitation, the application can also accept coordinate data derived from PCA or other dimensionality reduction methods such as t-SNE [7]. In this manner, users can circumvent the slower computations associated with Unity.
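As a rough sketch of this workflow, coordinates computed in an external tool could be written to a CSV file before being loaded into the application. The column layout used here (a label followed by x, y, z) is an assumption made for illustration, not a documented DataViz format; the instructions on the GitHub page should be consulted for the layout the application expects. The function name is hypothetical.

```python
import csv
import io


def write_coordinates(points, labels, out):
    """Write one 'label,x,y,z' row per point to a file-like object.

    The column order is an assumed layout, not a documented DataViz format.
    """
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one label")
    writer = csv.writer(out, lineterminator="\n")
    for label, point in zip(labels, points):
        if len(point) != 3:
            raise ValueError("each point must have three coordinates")
        writer.writerow([label] + [f"{value:.6f}" for value in point])


# Example with two precomputed 3D points.
buffer = io.StringIO()
write_coordinates([(0.5, -1.25, 2.0), (1.0, 0.0, -3.5)],
                  ["zygote", "blastocyst"], buffer)
```

In practice `out` would be a file opened with `open(path, "w", newline="")` rather than an in-memory buffer.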

VR considerations

When implementing the VR aspect of the application, we concentrated on two main considerations: immersion and motion sickness. For the former, the primary goal was to allow users to focus on the graphical rendering of their data without being distracted by the details of how to use the tool. In pursuit of this, we designed an intuitive interface and series of menus, with clear instructions on the associated GitHub page in ‘Software Availability’.

Another concern when designing for VR was motion sickness. Motion sickness is a consequence of conflicting input between visual and inner ear senses and is a major problem in current VR simulations [8]. It has been found that motion sickness is a consequence of the action of motion and not displacement itself, and as a result, we designed our movement to be in short bursts of teleportation.

The application is built using the Unity® engine with scripting done in C#. The PCA and transpose implementations are from the Accord.Net 3.8 framework (http://accord-framework.net). The mouse embryonic development data used in the case study is from Ref 9.

Operation

DataViz was designed to be an intuitive application for graphically rendering large datasets. Upon opening the software, a user should follow the onscreen prompts and fill out the appropriate parameters to input their dataset as well as use the extra functionalities described above. DataViz automatically runs PCA on the input dataset according to user configurations. If needed, more detailed instructions can be found on the associated GitHub page.

VR is a resource intensive activity. The following are guidelines for ensuring the quality and performance of DataViz.

System Requirements (https://www.vive.com/us/ready/):

  • Processor: Intel i5-4590 / AMD FX 8350 equivalent or greater

  • Graphics card: NVIDIA GeForce GTX 1060 or AMD Radeon RX 480, equivalent or better

  • Memory: 4 GB RAM or more

  • Video output: HDMI 1.4 or DisplayPort 1.2 or newer

  • USB: 1x USB 2.0 or newer

  • Operating system: Windows 7 SP1 or newer

Use case

Mouse embryo development

The primary goal in the development of this application was to determine the viability of using VR to graphically render and analyze complex datasets. After development, we tested DataViz by analyzing a high-dimensional dataset on mouse embryonic development [9]. Using single-cell RNA sequencing (scRNA-Seq), Deng et al. generated hundreds of expression profiles of individual embryonic cells from the zygote to the blastocyst stages.

By graphically rendering the 3D PCA projection of the data and analyzing the result, we were able to verify an expected trend of embryo development: initial cell division (zygote stage to 16-cell stage) results in large-scale physical changes inside the embryo. This is in contrast to later cell division, where the various stages of embryo development are more similar to one another. We can also see the developmental trajectory in the transcriptomic landscape (Figure 1).


Figure 1. The application when plotting provided mouse embryonic development coordinate data.

The graph displays the similarities among the blastocyst stages in comparison to changes in earlier stages of development. We can identify categories and general trends of the data using this method.

This method of analysis has some limitations, the foremost being its inability to account for all the variance present in the data. Reducing high-dimensional data to three dimensions simplifies the resulting plot and may help build intuition about the data or formulate hypotheses that can be tested through further research, but it inevitably discards some of the variance present in higher dimensions. In this test case, Table 1 shows the proportion of variance retained per principal component. One way of overcoming this would be to use non-linear dimensionality reduction methods like multidimensional scaling (MDS) or t-SNE.

Table 1. The application is unable to account for the full variance in the data.

For example, in the test case of mouse embryo development, the resulting three-dimensional graph captured only 33% of the variance in the original dataset.

Variable                 PC 1    PC 2    PC 3
Proportion of variance   0.210   0.088   0.034
Cumulative proportion    0.210   0.300   0.332

PC, principal component.
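The cumulative row of Table 1 is the running sum of the per-component proportions, and the arithmetic can be checked with a short Python sketch that uses the rounded figures from the table. Summing the rounded values gives 0.298 after two components, slightly below the 0.300 reported; the article does not say how that entry was computed, so the table is reproduced as published. The total after three components agrees at 0.332, or about 33%.

```python
from itertools import accumulate

# Proportion of variance for PC 1 to PC 3, as printed in Table 1.
proportions = [0.210, 0.088, 0.034]

# Running total of explained variance, rounded to three decimals.
cumulative = [round(total, 3) for total in accumulate(proportions)]

# Share of the variance that the 3D plot cannot show.
unexplained = round(1.0 - cumulative[-1], 3)

print(cumulative, unexplained)
```

Roughly two thirds of the variance therefore lies outside the three plotted components, which is the limitation the table caption refers to.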

Despite the shortcomings involved in the provided analysis and plotting approach, DataViz is still useful for categorizing the data into disjoint groups.

Discussion

Two of the primary motivations for using VR to visualize data were the introduction of a third dimension as well as increased interactivity with data. As shown by the Use Case, although the current functionality is limited to PCA, the application is useful in demonstrating the potential that VR has to offer in the analysis and communication of large, complex datasets.

To understand this potential further, future research should focus on user studies that measure whether data comprehension and analysis differ significantly between a traditional 3D plot on a computer screen and a VR simulation. Additionally, in order to account for more variance in the original dataset, future research should consider other dimensionality reduction methods.

Conclusion

We have developed an application for visualizing high-dimensional data in VR. It reduces high-dimensional data using PCA before generating an immersive 3D scatter plot. It also contains a variety of functionalities including the ability to transpose the given input and to accept raw coordinate data. A major limitation of DataViz is its inability to account for the full variance in the dataset. Also, the amount of benefit that visualization receives from being in VR as opposed to on a 2D monitor is unknown.

Data availability

The data of mouse embryo development can be found in Deng et al., 2014 [9].

Software availability

Source code and additional instructions available at: https://github.com/thunder2011/DataViz

Archived source code at time of publication: https://doi.org/10.5281/zenodo.1455787 [10].

License: GNU Lesser General Public License v2.1

Feng E and Ge X. DataViz: visualization of high-dimensional data in virtual reality [version 1; peer review: 1 not approved]. F1000Research 2018, 7:1687 (https://doi.org/10.12688/f1000research.16453.1)

Open Peer Review

Reviewer Report 18 Dec 2018
David R. Glowacki, Mechanical Engineering, Stanford University, Stanford, CA, USA 
Michael O'Connor, University of Bristol, Bristol, UK 
Not Approved
In my opinion the work outlined in this paper represents an interesting software prototype, but at this stage my impression is that this work is still very much in the ‘prototype’ phase and not yet ready for full publication. Developing ...
Glowacki DR and O'Connor M. Reviewer Report For: DataViz: visualization of high-dimensional data in virtual reality [version 1; peer review: 1 not approved]. F1000Research 2018, 7:1687 (https://doi.org/10.5256/f1000research.17984.r40657)