Keywords
Virtual Reality, Principal Component Analysis, Visualization, High-dimensional, Unity
Historically, we have heavily relied on 2-dimensional (2D) graphical displays to communicate large amounts of data. These graphs have also been useful in finding patterns within datasets and building intuition for more accurate and meaningful analysis. However, for large and complex datasets containing numerous dimensions, traditional 2D charts and graphs are inadequate in demonstrating the multi-faceted nature of relevant information.
The 3-dimensional (3D) visualization of datasets is valuable because it offers a starting solution to the problem above; the additional dimension allows more information to be presented, which decreases the potential for misinterpretation while increasing the possibility of pattern-matching and building intuition.
This paper investigates the potential of virtual reality (VR) as a platform for graphically rendering datasets in 3D by creating a visualization application, DataViz. VR is already used in a variety of fields, including flight simulation (Ref 1), mental health therapy (Ref 2), and the visualization of molecules and their interactions (Ref 3). In the specific field of data visualization, several applications exist: CAVE, a surround-screen, projection-based visualizer (Ref 4); an OpenGL-based tool that visualizes economic data (Ref 5); and iViz (Ref 6), an efficient and intuitive VR visualizer that is also the most similar to the application developed in this research. DataViz attempts to make further progress by providing a modern, intuitive, and readily available application.
We continue to explore the potential of VR in the graphical rendering of large datasets; to do so, we have developed a Unity3D VR application for HTC Vive (HTC, New Taipei City, Taiwan) that runs principal component analysis (PCA) on datasets before graphing the subsequent projection into three dimensions. The software was designed to run efficiently with an intuitive interface.
In the design of this application, special consideration was given to the following elements: the method of data analysis, the format of the input data, the limitations in computing power of the selected platform, and the mitigation of motion sickness.
The primary method of data analysis is PCA. The rationale behind this decision is that, because humans live in three dimensions, the most intuitive manner of visualization is one that plots in that space. PCA is well suited to taking high-dimensional data and reducing it to plottable 3D coordinates, which makes the resulting graph more intuitive and helps users discover patterns and develop scientific intuition.
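As an illustration of the reduction step, the following is a minimal sketch of PCA that projects a table onto its first three principal components. It is written in standard-library Python rather than the C# and Accord.NET code the application uses, and it finds the components by power iteration with deflation, which is an assumption made here for brevity and not DataViz's method. The function name `pca_project` and the sample points are hypothetical.

```python
import math
import random


def pca_project(rows, k=3, iterations=200, seed=0):
    """Project rows onto the top-k principal components.

    Returns (coords, ratios): one k-dimensional point per input row, and the
    proportion of total variance captured by each component.
    Illustrative only: pure-Python loops are far too slow for large tables.
    """
    n, d = len(rows), len(rows[0])
    # Center every column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))

    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(k):
        # Power iteration: repeated multiplication converges to the
        # eigenvector with the largest remaining eigenvalue.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        ratios.append(eigenvalue / total_var if total_var > 0 else 0.0)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    coords = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return coords, ratios


# Hypothetical 4-dimensional sample reduced to 3D plotting coordinates.
points = [[2, 0, 0, 0], [-2, 0, 0, 0], [0, 1, 0, 0],
          [0, -1, 0, 0], [0, 0, 0.5, 0], [0, 0, -0.5, 0]]
coords, ratios = pca_project(points)
```

Each entry of `coords` is an (x, y, z) position that could be drawn as one point of a scatter plot, and `ratios` reports how much of the total variance each axis carries.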
DataViz only accepts data in table format (CSV or TXT). Occasionally, the user may want to analyze the transpose of the provided data. Although the transpose of a table can easily be computed with NumPy or R, we decided to build the transpose functionality into the application.
In addition to transposition, DataViz allows the user to omit specific columns from the file. A column may need to be omitted for a variety of reasons, for example because it holds an unwanted dimension of the data or contains names rather than data. This functionality allows researchers to analyze only the columns they are interested in.
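For reference, transposing a table outside the application takes only a few lines. The sketch below uses the Python standard library on a small hypothetical CSV snippet; the function name `transpose_table` and the sample values are illustrative and are not taken from DataViz.

```python
import csv
import io


def transpose_table(rows):
    """Swap the rows and columns of a rectangular table (list of lists)."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("table is not rectangular")
    return [list(column) for column in zip(*rows)]


# Hypothetical input: genes as rows, cells as columns.
sample = "gene,cell_a,cell_b\nG1,1.0,2.0\nG2,3.0,4.0\n"
table = list(csv.reader(io.StringIO(sample)))
transposed = transpose_table(table)  # cells as rows, genes as columns
```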
The user may also have a column that labels the points. Users can designate a specific column that differentiates the data with various tags, and these groups will show up in a graph legend during runtime.
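The column handling described above can be sketched as follows, assuming the table has already been read into a header and a list of rows. The function `split_table`, its parameters, and the sample column names are hypothetical; DataViz exposes these choices through its menus, and its internal representation is not described in the text.

```python
from collections import defaultdict


def split_table(header, rows, omit=(), label=None):
    """Drop omitted columns and group the remaining values by label.

    Returns (feature_names, groups), where groups maps each tag in the
    label column to the numeric rows that carry it.
    """
    omitted = set(omit)
    keep = [i for i, name in enumerate(header)
            if name not in omitted and name != label]
    label_idx = header.index(label) if label is not None else None
    groups = defaultdict(list)
    for row in rows:
        tag = row[label_idx] if label_idx is not None else "unlabeled"
        groups[tag].append([float(row[i]) for i in keep])
    return [header[i] for i in keep], dict(groups)


# Hypothetical table: an identifier column to omit and a stage column to label by.
header = ["cell_id", "stage", "gene1", "gene2"]
rows = [["c1", "zygote", "0.5", "1.5"],
        ["c2", "zygote", "0.7", "1.1"],
        ["c3", "blast", "2.0", "0.1"]]
features, groups = split_table(header, rows, omit=["cell_id"], label="stage")
legend = sorted(groups)  # one legend entry per distinct tag
```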
The engine used in developing this application is Unity®. Unity is one of the most popular platforms for VR development but is not specifically designed for statistical analysis. Therefore, PCA on large datasets may result in slow run times, especially on machines without an appropriate graphics card or sufficient computing power. To overcome this limitation, the application can also accept coordinate data derived from PCA or other dimensionality reduction methods such as t-SNE (Ref 7). In this manner, users can circumvent the slower computations associated with Unity.
When implementing the VR aspect of the application, we concentrated on two main considerations: immersion and motion sickness. For the former, the primary goal was to allow users to focus on the graphical rendering of their data without being distracted by the details of how to use the tool. In pursuit of this, we designed an intuitive interface and series of menus, with clear instructions on the associated GitHub page listed under ‘Software Availability’.
Another concern when designing for VR was motion sickness. Motion sickness is a consequence of conflicting input between the visual and inner ear senses and is a major problem in current VR simulations (Ref 8). It has been found that motion sickness is a consequence of the action of motion and not displacement itself, and as a result, we designed our movement to be in short bursts of teleportation.
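One way to picture the precomputed-coordinate path is a check that the input already consists of three numeric columns, in which case no reduction is needed before plotting. The helper below is a hypothetical sketch of that check; how DataViz actually distinguishes raw data from coordinate data is configured by the user and is not specified in the text.

```python
def as_coordinates(rows):
    """Return rows as (x, y, z) float tuples, or None if they are not 3D points."""
    coords = []
    for row in rows:
        if len(row) != 3:
            return None
        try:
            coords.append(tuple(float(value) for value in row))
        except ValueError:  # a non-numeric field, e.g. a header or a name
            return None
    return coords


# Hypothetical output of an external PCA or t-SNE run, read as text fields.
precomputed = [["0.12", "-1.40", "2.05"], ["1.30", "0.22", "-0.71"]]
points = as_coordinates(precomputed)
```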
The application is built using the Unity® engine with scripting done in C#. The PCA and transpose implementations are from the Accord.NET 3.8 framework (http://accord-framework.net). The mouse embryonic development data used in the case study is from Ref 9.
DataViz was designed to be an intuitive application for graphically rendering large datasets. Upon opening the software, a user should follow the onscreen prompts and fill out the appropriate parameters to input their dataset as well as use the extra functionalities described above. DataViz automatically runs PCA on the input dataset according to user configurations. If needed, more detailed instructions can be found on the associated GitHub page.
VR is a resource-intensive activity. The following are guidelines for ensuring the quality and performance of DataViz.
System Requirements (https://www.vive.com/us/ready/):
The primary goal in the development of this application was to determine the viability of using VR to graphically render and analyze complex datasets. After development, we tested DataViz by analyzing a high-dimensional dataset on mouse embryonic development (Ref 9). Using single-cell RNA sequencing (scRNA-Seq), Deng et al. generated hundreds of expression profiles of individual embryonic cells from the zygote to blastocyst stages.
By graphically rendering the 3D PCA projection of the data and subsequent analysis, we were able to verify an expected trend of embryo development; initial cell division (zygote stage to 16-cell stage) results in large-scale physical changes inside the embryo. This is in contrast to later cell division where the various stages of embryo development are more similar to one another. We can also see the developmental trajectory in the transcriptomic landscape (Figure 1).
The graph displays the similarities among the blastocyst stages in comparison to changes in earlier stages of development. We can identify categories and general trends of the data using this method.
This method of analysis has some limitations, the foremost being an inability to account for all of the variance in the data. Reducing high-dimensional data to three dimensions simplifies the resulting plot and may help build intuition about the data or formulate hypotheses to test in further research, but it inevitably discards some of the variance present in the higher dimensions. In this test case, Table 1 shows the proportion of variance retained by each principal component. One way of overcoming this would be to use non-linear dimensionality reduction methods like multidimensional scaling (MDS) or t-SNE.
For example, in the test case of mouse embryo development, the resulting three-dimensional graph captures only about 33% of the variance in the original dataset.
Variable | PC 1 | PC 2 | PC 3 |
---|---|---|---|
Proportion of variance | 0.210 | 0.088 | 0.034 |
Cumulative proportion | 0.210 | 0.300 | 0.332 |
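The cumulative row can be checked directly from the per-component proportions in Table 1. The short sketch below does this arithmetic in Python; the variable names are illustrative. Summing the rounded values gives 0.298 for the first two components, slightly below the 0.300 listed in the table, a difference that presumably comes from rounding of the underlying values.

```python
from itertools import accumulate

# Proportion of variance for PC 1 to PC 3, as listed in Table 1.
proportions = [0.210, 0.088, 0.034]

# Running total of variance captured as components are added.
cumulative = [round(total, 3) for total in accumulate(proportions)]

# Share of the variance that the 3D plot does not show.
unexplained = round(1 - cumulative[-1], 3)
```

With these inputs the three plotted components capture 0.332 of the variance, leaving 0.668 outside the plot.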
Despite the shortcomings involved in the provided analysis and plotting approach, DataViz is still useful for categorizing the data into disjoint groups.
Two of the primary motivations for using VR to visualize data were the introduction of a third dimension as well as increased interactivity with data. As shown by the Use Case, although the current functionality is limited to PCA, the application is useful in demonstrating the potential that VR has to offer in the analysis and communication of large, complex datasets.
To understand this potential further, future research should focus on human trials in determining the statistical difference between the traditional 3D plot on a computer screen and a VR simulation regarding data comprehension and analysis. Additionally, in order to account for more variance in the original dataset, future research should consider other dimensionality reduction methods.
We have developed an application for visualizing high-dimensional data in VR. It reduces high-dimensional data using PCA before generating an immersive 3D scatter plot. It also contains a variety of functionalities including the ability to transpose the given input and to accept raw coordinate data. A major limitation of DataViz is its inability to account for the full variance in the dataset. Also, the amount of benefit that visualization receives from being in VR as opposed to on a 2D monitor is unknown.
The mouse embryo development data can be found in Deng et al., 2014 (Ref 9).
Source code and additional instructions available at: https://github.com/thunder2011/DataViz
Archived source code at time of publication: https://doi.org/10.5281/zenodo.1455787 (Ref 10).
This material is based upon work supported by the National Science Foundation/EPSCoR Grant Number IIA – 1355423 and by the State of South Dakota. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the National Science Foundation.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Is the rationale for developing the new software tool clearly explained?
Yes
Is the description of the software tool technically sound?
Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Version 1: 23 Oct 18