Software Tool Article

BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem

[version 1; peer review: 2 approved, 1 approved with reservations]
PUBLISHED 29 May 2019

This article is included in the Bioconductor gateway.

This article is included in the RPackage gateway.

This article is included in the Bioinformatics gateway.

Abstract

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package provides access to extensive metadata in computable form covering the Bioconductor package ecosystem, enabling downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches.
Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included.
Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).

Keywords

bioinformatics, r, bioconductor, software, reproducible research

Introduction

Bioconductor is an open source software project (comprising 1741 individual analysis packages) and community for the analysis and comprehension of large-scale biological data. Newly submitted software packages undergo a technical review to ensure that best practices and Bioconductor coding conventions are followed. The project maintains an automated build system that ensures that packages in the Bioconductor project are compiled and built successfully and pass basic checks. Package downloads are tracked and aggregated by package and month, longitudinally. Finally, package details such as title, description, version, author, and dependencies on other R packages are compiled based on package metadata.

The current size and growth of the Bioconductor project suggest that there is merit in exposing computable forms of the metadata describing the Bioconductor package ecosystem. To that end, we developed a small suite of tools, BiocPkgTools, to provide easy access to project details such as download statistics, bulk package metadata, and package build status. Developers, project leaders, open source software researchers, and Bioconductor end users can build on the availability of these data for applications such as custom reporting, dependency graph analytics, package filtering, and text mining.

Features and usage

The core functionality of BiocPkgTools is to expose Bioconductor project and package metadata as tidy data [1] objects (Figure 1). The data presented by the package are accessed directly from online resources available from Bioconductor. As such, the package relies on web connectivity and collects the most recent data. Installation instructions are detailed on the package website.

Figure 1. Schematic overview of the BiocPkgTools package.

BiocPkgTools can access and transform web-accessible resources including package metadata, download statistics, dependencies between packages, and updated Bioconductor build report status to "tidy data" reports that can be manipulated using standard R tools. Interactive package exploration is also available.

Package functionality can be roughly divided into data access, data presentation, and graph/network functionality. See Table 1 for an overview.

After installing BiocPkgTools, the biocDownloadStats function can generate a tidy data structure summarizing monthly download statistics (both total and unique IP addresses) for all Bioconductor packages.

library(BiocPkgTools)
# Fetch monthly download statistics for all Bioconductor packages (requires web access)
dlstats = biocDownloadStats()
# Show the first three rows
head(dlstats, 3)

## # A tibble: 3 x 6
##   <fct>   <int> <fct>              <int>           <int> <chr>
## 1 ABarray  2018 Jan                  117             150 Software
## 2 ABarray  2018 Feb                   97             125 Software
## 3 ABarray  2018 Mar                  102             121 Software
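
As a rough illustration of what the tidy layout makes possible, the sketch below totals downloads per package and year using base R only. It runs on a small hand-built data frame rather than a live call, so it needs no web access. The three rows reuse the values printed above; the column names (Package, Year, Month, Nb_of_distinct_IPs, Nb_of_downloads) are assumptions, since the header row is not shown in the output, as is the reading of the smaller count as distinct IPs.

```r
# Toy stand-in for the tibble returned by biocDownloadStats().
# Column names are assumed for illustration; values copy the three rows shown above.
dlstats_toy <- data.frame(
  Package = c("ABarray", "ABarray", "ABarray"),
  Year = c(2018L, 2018L, 2018L),
  Month = c("Jan", "Feb", "Mar"),
  Nb_of_distinct_IPs = c(117L, 97L, 102L),
  Nb_of_downloads = c(150L, 125L, 121L),
  stringsAsFactors = FALSE
)

# Sum downloads per package and year.
yearly <- aggregate(Nb_of_downloads ~ Package + Year, data = dlstats_toy, FUN = sum)
print(yearly)
```

The same aggregation should carry over to the full table, provided the real column names are substituted.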

Table 1. Main package functions and descriptions.

Name                          Functionality
biocPkgList                   Package details including description, author and maintainer, dependencies, URLs, bug report mechanism
biocDownloadStats             Monthly download statistics for all packages
biocBuildReport               Bioconductor build report for all packages and systems
biocExplore                   Interactive, browsable “bubble plot” of Bioconductor packages and details
problemPage                   Interactive, customized build report for an individual package author
buildPkgDependencyDataFrame   Package dependencies as data frame
buildPkgDependencyIgraph      Package dependencies as a graph [2]
inducedSubgraphByPkgs         Create a minimal subgraph of Bioconductor dependencies based on specific packages
subgraphByDegree              Create a subgraph of all packages within a given degree of a single package

The biocBuildReport function gathers information from the Bioconductor build report site and can be used, for example, to summarize the “build status” for all Bioconductor packages.

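# Fetch the build report for Bioconductor release 3.9 (requires web access)
buildrep = biocBuildReport(version = "3.9")
# Cross-tabulate build stage against result
table(buildrep$stage, buildrep$result)

##
##           ERROR   OK skipped TIMEOUT WARNINGS
##  buildbin     2 3352      70       0        0
##  buildsrc    93 5057       0       5        0
##  checksrc    57 4181      98       8      811
##  install     39 5116       0       0        0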

These data are useful to developers to track the health of their software either programmatically or via a searchable, sortable table from the problemPage function.
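
One way to track package health programmatically is to filter the build report for failing stages. The sketch below does this on a toy data frame, so it runs without web access. The `stage` and `result` columns appear in the example above; the `pkg` column and the package names are hypothetical, and treating ERROR and TIMEOUT as failures is an illustrative choice.

```r
# Toy stand-in for the data frame returned by biocBuildReport().
# `stage` and `result` appear in the example above; `pkg` and the
# package names are hypothetical.
buildrep_toy <- data.frame(
  pkg = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC", "pkgC"),
  stage = c("install", "checksrc", "install", "checksrc", "install", "buildsrc"),
  result = c("OK", "WARNINGS", "OK", "OK", "OK", "ERROR"),
  stringsAsFactors = FALSE
)

# Keep only rows reporting a failure, then list the affected packages.
failing <- buildrep_toy[buildrep_toy$result %in% c("ERROR", "TIMEOUT"), ]
failing_pkgs <- unique(failing$pkg)
print(failing_pkgs)
```

A maintainer could run a filter like this on a schedule and act only when the list of failing packages is non-empty.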

As an alternative to basic web browser search and the Bioconductor online software list, the biocExplore function offers an interactive, graphical approach to package browsing (see Figure 2). The biocExplore widget allows browsing packages under different biocViews, Bioconductor’s software category tags. The widget visualizes the relative number of downloads of each package under a given biocViews term, allowing users to quickly determine which packages are most commonly used for different analysis tasks.

Figure 2. The biocExplore function opens an interactive web application that allows users to select focused groups of Bioconductor packages to view as a bubble plot.

Bubbles are sized based on download statistics. Hovering over a bubble will give download number while clicking on a bubble will pop up a package details page, including a link to the package landing page.
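
The ranking that the bubble plot conveys visually can also be approximated in code. The sketch below is not how biocExplore is implemented; it only illustrates the idea of selecting packages by biocViews term and ordering them by downloads. All package names, terms, and counts are hypothetical, as are the column names of the toy table.

```r
# Toy package table; all names, views, and counts are hypothetical.
pkgs_toy <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  downloads = c(1200L, 300L, 4500L, 800L),
  stringsAsFactors = FALSE
)
# One character vector of biocViews terms per package (a list column).
pkgs_toy$biocViews <- list(
  c("RNASeq", "Visualization"),
  c("RNASeq"),
  c("Microarray"),
  c("RNASeq", "Clustering")
)

# Return packages tagged with `view`, most downloaded first.
rank_by_view <- function(pkgs, view) {
  tagged <- vapply(pkgs$biocViews, function(v) view %in% v, logical(1))
  hits <- pkgs[tagged, c("Package", "downloads")]
  hits[order(hits$downloads, decreasing = TRUE), ]
}

top_rnaseq <- rank_by_view(pkgs_toy, "RNASeq")
print(top_rnaseq)
```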

The Bioconductor package ecosystem is, by design, highly interconnected via package dependencies. Several functions in the BiocPkgTools package provide examples of package dependency graph creation and visualization. Figure 3 displays packages within one degree of dependency relationship of the GEOquery package.

Figure 3. The subgraphByDegree function builds a data visualization of dependencies between all packages within one degree of the GEOquery package using the visNetwork package [3].

Links are colored based on type (Suggests [light blue], Depends [green], and Imports [red]) and arrows point to the “dependent” package.
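
The notion of "within one degree" can be illustrated on a plain edge list. The sketch below uses base R on a toy data frame and is a simplification, not the package's implementation: it keeps only edges that touch the focal package, whereas a full subgraph would also retain edges among its neighbours. The column names (Package, dependency, edgetype) and all package names are assumptions made for illustration.

```r
# Toy dependency edge list; package names and edges are hypothetical.
# Each row reads: `Package` declares `dependency` via `edgetype`.
dep_toy <- data.frame(
  Package = c("pkgA", "pkgA", "pkgB", "pkgC", "pkgD"),
  dependency = c("pkgB", "pkgC", "pkgE", "pkgE", "pkgA"),
  edgetype = c("Imports", "Depends", "Imports", "Suggests", "Suggests"),
  stringsAsFactors = FALSE
)

# Edges touching `pkg` directly, in either direction (degree 1 only).
edges_within_one_degree <- function(deps, pkg) {
  deps[deps$Package == pkg | deps$dependency == pkg, ]
}

sub_edges <- edges_within_one_degree(dep_toy, "pkgA")
neighbours <- setdiff(unique(c(sub_edges$Package, sub_edges$dependency)), "pkgA")
print(sub_edges)
print(sort(neighbours))
```

Here pkgE is excluded because it is two steps away from pkgA; it is reachable only through pkgB or pkgC.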

Implementation

BiocPkgTools is implemented as a standard R package and hosted in the Bioconductor repository. All functions are documented and include examples. An included tutorial (vignette) demonstrates features and capabilities.

Discussion

The BiocPkgTools package comprises a set of functions for accessing software metadata from the growing collection of Bioconductor packages. For software developers, these metadata can be useful for tracking package build status and the health of package dependencies. Easy access to descriptive package metadata for all Bioconductor software resources can empower researchers or users interested in text mining, custom package search, or analysis of the existing software ecosystem. BiocPkgTools can also provide easy access to metrics of Bioconductor software usage, which are increasingly being incorporated into funding and promotion decisions.

Data availability

All data accessed and used by the BiocPkgTools package are publicly available and are updated regularly at the Bioconductor project.

Software availability

how to cite this article
Su S, Carey VJ, Shepherd L et al. BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2019, 8:752 (https://doi.org/10.12688/f1000research.19410.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 21 Jun 2019
Henrik Bengtsson, Division of Bioinformatics, Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA 
Approved with Reservations
This article presents the BiocPkgTools package, which provides an R API to the various package metadata that is available mostly in human-readable formats on the https://bioconductor.org/ website. By providing an R API for accessing package metadata, the authors argue that ...
Bengtsson H. Reviewer Report For: BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2019, 8:752 (https://doi.org/10.5256/f1000research.21277.r49217)
Reviewer Report 19 Jun 2019
Mike Smith, Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany 
Approved
In this paper Su et al present BiocPkgTools, an R package that provides programmatic access to metadata about software in the Bioconductor project. The package is available from Bioconductor and the source code can be easily viewed on Github. The ...
Smith M. Reviewer Report For: BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2019, 8:752 (https://doi.org/10.5256/f1000research.21277.r49235)
Reviewer Report 10 Jun 2019
Simina M. Boca, Innovation Center for Biomedical Informatics, Department of Oncology, Georgetown University Medical Center, Washington, DC, USA 
Approved
This article presents the new Bioconductor package, BiocPkgTools, which allows users to obtain various statistics about Bioconductor packages, including the number of downloads by month, the build status, and the package dependencies.

Overall, this seems to be ...
Boca SM. Reviewer Report For: BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem [version 1; peer review: 2 approved, 1 approved with reservations]. F1000Research 2019, 8:752 (https://doi.org/10.5256/f1000research.21277.r49219)