Keywords
biom-format, ecology, meta-genomics, biojs, parser, meta-barcoding
This article is included in the BioJS collection.
In recent years, there has been an enormous increase in biological data available from high-throughput studies. For many of these studies, the basic layout of the data after bioinformatic processing is similar to that of traditional assessments, yet complications arise due to the increased size of the data tables. This is the case for transcriptomic and marker-gene community data, where the central matrix consists of counts for each observation (e.g. gene or taxon) in each sample, plus a second and third matrix for metadata of both taxa and samples, respectively.
To avoid handling three matrices and to standardize the data deposition for downstream analyses, the Biological Observation Matrix (BIOM) Format was developed1. One main purpose of the BIOM format is to enhance interoperability between different software suites. Many current leading tools in community ecology and metagenomics support the BIOM format, e.g. QIIME2, MG-RAST3, PICRUSt4, phyloseq5, VAMPS6 and Phinch7. Additionally, libraries exist in Python1, R8 and Perl9 to propagate the standardized use of the format.
Interactive visualization of biological data in a web browser is becoming more and more popular10,11. For the development of web applications that support BIOM data, a corresponding library is currently lacking and would be very useful, since several challenges arise when trying to handle BIOM data. While BIOM format version 1 builds on the JSON format and thus is natively supported by JavaScript, the more recent BIOM format version 2 uses HDF5 and therefore cannot be handled natively. In addition, the internal data storage can be either dense or sparse, so applications have to handle both cases. Furthermore, application developers need to be very careful when modifying BIOM data, as changes that do not abide by the specification will break interoperability with other tools. Here we present biojs-io-biom, a JavaScript module that provides a unified interface to read, modify, and write BIOM data. It can be readily used as a library by applications that need to handle BIOM data for import or export directly in the browser. To demonstrate the utility of this module, it has been used to implement a simple user interface for the biom-conversion-server12. Additionally, the popular BIOM visualization tool Phinch7 has been forked and extended with new features, in particular support for BIOM version 2 by integrating biojs-io-biom13. This fork is available as Blackbird via https://github.com/molbiodiv/Blackbird.
The biojs-io-biom library can be used to create new objects (called Biom objects for brevity) by either loading file content directly via the static parse function or by initialization with a JSON object:
var biom = new Biom({
    id: 'My Biom',
    matrix_type: 'dense',
    shape: [2,2],
    rows: [
        {id: 'row1', metadata: {}},
        {id: 'row2', metadata: {}}
    ],
    columns: [
        {id: 'col1', metadata: {}},
        {id: 'col2', metadata: {}}
    ],
    data: [
        [0,1],
        [2,3]
    ]
});
The data is checked for integrity and compliance with the BIOM specification. Missing fields are created with default content. All operations that set attributes of the Biom object using dot notation are also checked and throw an error if they are not allowed.
var biom = new Biom({});
biom.id = [];
// Will throw a TypeError as id has to be a string or null
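As a rough illustration of how checks on dot-notation assignment can work in JavaScript, the following minimal sketch uses accessor properties. The class `CheckedTable`, its error messages and its default values are hypothetical and chosen only for this example; this is not the biojs-io-biom source code, which may implement its checks differently.

```javascript
// Hypothetical illustration only: not the biojs-io-biom implementation.
class CheckedTable {
  constructor(fields = {}) {
    // Missing fields fall back to defaults (values assumed for this sketch).
    // Assigning here goes through the setters below, so input is validated too.
    this.id = fields.id !== undefined ? fields.id : null;
    this.matrix_type =
      fields.matrix_type !== undefined ? fields.matrix_type : 'sparse';
  }

  get id() {
    return this._id;
  }

  set id(value) {
    // Same rule as in the example above: id has to be a string or null.
    if (value !== null && typeof value !== 'string') {
      throw new TypeError('id must be a string or null');
    }
    this._id = value;
  }

  get matrix_type() {
    return this._matrixType;
  }

  set matrix_type(value) {
    // Only the two storage layouts named in the text are accepted.
    if (value !== 'dense' && value !== 'sparse') {
      throw new Error('matrix_type must be "dense" or "sparse"');
    }
    this._matrixType = value;
  }
}
```

Because validation lives in the setters, the same rules apply to values passed to the constructor and to later assignments such as `table.id = []`.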
Besides checking and maintaining integrity, the biojs-io-biom library implements convenience functions. These include getters and setters for metadata as well as data access functions that are agnostic to the internal representation (dense or sparse). One of the main features of this library, however, is the capability of handling BIOM data in both versions 1 and 2 by interfacing with the biom-conversion-server12. Handling BIOM version 2 directly in JavaScript is not possible due to its HDF5 binary format. The only reference implementation of the format is in C, and trying to transpile the library to JavaScript using emscripten14 failed due to its strong reliance on file operations (see discussions: 15,16). Using the conversion server allows developers to use BIOM of both versions transparently. Biom objects also expose the function write, which exports the object as version 1 or version 2.
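To make the idea of representation-agnostic access concrete, here is a minimal sketch operating on plain BIOM version 1 objects. In BIOM version 1, dense data is a list of rows, while sparse data is a list of [row, column, value] triples with all unlisted cells being zero. The function name `getValueAt` is hypothetical and not part of the biojs-io-biom API.

```javascript
// Hypothetical helper, not part of the biojs-io-biom API.
// `table` is a plain BIOM version 1 object with matrix_type, shape and data.
function getValueAt(table, rowIndex, colIndex) {
  const [nRows, nCols] = table.shape;
  if (rowIndex < 0 || rowIndex >= nRows || colIndex < 0 || colIndex >= nCols) {
    throw new RangeError('index outside of table shape');
  }
  if (table.matrix_type === 'dense') {
    // Dense: data is a list of rows, each a list of values.
    return table.data[rowIndex][colIndex];
  }
  // Sparse: data is a list of [row, column, value] triples;
  // cells without a triple are implicitly zero.
  for (const [r, c, value] of table.data) {
    if (r === rowIndex && c === colIndex) {
      return value;
    }
  }
  return 0;
}

// The same 2x2 matrix in both representations.
const denseTable = {
  matrix_type: 'dense',
  shape: [2, 2],
  data: [[0, 1], [2, 3]]
};
const sparseTable = {
  matrix_type: 'sparse',
  shape: [2, 2],
  data: [[0, 1, 1], [1, 0, 2], [1, 1, 3]]
};
```

Calling `getValueAt(denseTable, 1, 0)` and `getValueAt(sparseTable, 1, 0)` yields the same value, so calling code does not need to know which layout is in use. The linear scan over triples keeps the sketch short; an index keyed by row would be a natural refinement for large tables.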
To demonstrate the utility of this module, it has been used to implement a user interface for the biom-conversion-server12. In addition to the existing API, the server now also allows files to be uploaded using a file dialog. The uploaded file is checked using this module and converted to version 1 on the fly if necessary. It can then be downloaded in either version 1 or 2. As most of the functionality is provided by the biojs-io-biom module, the whole interface is implemented with only a few additional lines of code.
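One way an application could decide whether uploaded content needs conversion is to look for the HDF5 format signature, the eight bytes `\x89HDF\r\n\x1a\n`, at the start of the file. The sketch below is an assumption about how such a check might look, not a description of how biojs-io-biom or the biom-conversion-server detect the version. The helper name `looksLikeHdf5` is hypothetical, and the sketch only inspects byte offset 0.

```javascript
// The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = [0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: guesses whether raw file content is HDF5
// (as used by BIOM version 2) rather than JSON text (BIOM version 1).
// Assumption: the signature is located at byte offset 0.
function looksLikeHdf5(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer);
  if (bytes.length < HDF5_SIGNATURE.length) {
    return false;
  }
  return HDF5_SIGNATURE.every((expected, i) => bytes[i] === expected);
}
```

In a browser, the `ArrayBuffer` could come from reading the selected file, for example via `FileReader.readAsArrayBuffer`. Content that does not match the signature would then be treated as a JSON candidate and parsed as text.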
As a second example, the Phinch framework7 has been forked to Blackbird13 and enhanced to support BIOM version 2. Phinch visualizes the content of BIOM files using a variety of interactive plots. However, due to the difficulties of handling HDF5 data, Phinch supports only BIOM version 1. This is unfortunate, as most tools nowadays return BIOM version 2 (e.g. QIIME from version 1.9,12 and Qiita17). It is possible to convert from version 2 to version 1 without loss of information, but that requires an extra step using the command line. By including the biojs-io-biom module and the biom-conversion-server in Blackbird, it was possible to add support for BIOM version 2 along with some other improvements13.
As the biojs-io-biom module resolves the import and export challenges, one of the next steps is the development of a further BioJS module to present BIOM data as a set of data tables. To do this for large datasets, sophisticated access functions that capitalize on the sparse data representation have to be implemented.
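As an indication of what capitalizing on the sparse representation can mean, the following sketch computes per-column totals (for example, total counts per sample) directly from the [row, column, value] triples of a sparse BIOM version 1 table, without first expanding it into a dense matrix. The function name `columnSums` is hypothetical and the example is not taken from any existing module.

```javascript
// Hypothetical helper: per-column totals computed directly from sparse data.
// `sparseTable` is a plain BIOM version 1 object with matrix_type 'sparse'.
function columnSums(sparseTable) {
  const nCols = sparseTable.shape[1];
  const sums = new Array(nCols).fill(0);
  // Each entry is a [row, column, value] triple; the row index is not needed.
  for (const [, col, value] of sparseTable.data) {
    sums[col] += value;
  }
  return sums;
}
```

The work is proportional to the number of non-zero entries rather than to rows times columns, which is the main reason sparse-aware functions matter for large, mostly empty count tables.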
The module biojs-io-biom was developed to enhance the import and export of BIOM data in JavaScript. Its utility and versatility have been demonstrated in two example applications. It is implemented using the latest web technologies, well tested and well documented. It provides a unified interface and abstracts from details like version or internal data representation. Therefore, it will facilitate the development of web applications that rely on the BIOM format.
Latest source code https://github.com/molbiodiv/biojs-io-biom
Archived source code as at the time of publication https://zenodo.org/record/61698
License MIT
Latest source code https://github.com/molbiodiv/biom-conversion-server
Archived source code as at the time of publication https://zenodo.org/record/61704
License MIT
Latest source code https://github.com/molbiodiv/Blackbird
Archived source code as at the time of publication https://zenodo.org/record/61721
License BSD 2-Clause
Methodology: MJA and SH. Investigation: MJA and NT. Software: MJA. Supervision: AK and FF. Writing - original draft: MJA. Writing - review and editing: All authors.
MJA was supported by a grant of the German Excellence Initiative to the Graduate School of Life Sciences, University of Würzburg (Grant Number GSC 106/3). This publication was supported by the Open Access Publication Fund of the University of Würzburg.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We are grateful to Franziska Saul for fruitful discussions on user interface design. We further thank members of the biom-format, Phinch and hdf5.node projects for quick, kind and helpful responses to our requests.
Article versions:
Version 2 (revision): 09 Jan 17
Version 1: 20 Sep 16