Software Tool Article

CBN2Path: an R/Bioconductor package for the analysis of cancer progression pathways using Conjunctive Bayesian Networks

[version 1; peer review: awaiting peer review]
PUBLISHED 29 Aug 2025

This article is included in the RPackage gateway.

This article is included in the Bioconductor gateway.

Abstract

Conjunctive Bayesian Networks (CBN) are probabilistic graphical models used to describe mutation accumulation processes such as tumorigenesis. Several CBN models exist, which have enabled the analysis and modeling of cancer progression pathways using cross-sectional genomic data. However, these models are implemented in different languages with heterogeneous input and output formats. Moreover, recent developments towards robust inference of cancer progression pathways (i.e., the R-CBN and B-CBN models) highlight the need to depart from maximum-likelihood-based frameworks (i.e., the CT-CBN and H-CBN models), which requires substantial implementation changes. We therefore introduce the CBN2Path R/Bioconductor package, which not only provides a unifying interface that accommodates all CBN models but also offers the functionality needed for robust inference, analysis, and visualization of cancer progression pathways.

Keywords

R/Bioconductor Package, Conjunctive Bayesian Networks, CT-CBN, H-CBN, B-CBN, R-CBN, Cancer Progression Pathways, Fitness Landscapes

Introduction

Tumorigenesis is a stepwise process driven by a sequence of molecular changes that are described as pathways of cancer progression. Conjunctive Bayesian Networks (CBN) are probabilistic graphical models designed for the analysis and modeling of these pathways [1]. CBN models have evolved into several varieties, such as CT-CBN [2], H-CBN [3], B-CBN [4], and R-CBN [5], each addressing different aspects of this task. However, the software for these methods is not well integrated, because the methods are implemented in different languages with heterogeneous input and output formats. This calls for a unifying platform that integrates these models and standardizes their input and output formats. Evam-tools [6] is an R package that takes the initial steps towards this end. However, it does not include the B-CBN model or the recently developed R-CBN algorithm, which focuses on robust inference of cancer progression pathways [5]. Importantly, the B-CBN and R-CBN algorithms for pathway quantification require exhaustive consideration and weighting of all potential dependency structures (posets) within mutational quartets. This requires reimplementing the CBN models and adjusting the downstream pathway analysis and modeling functions. Therefore, we introduce the CBN2Path R package, which not only includes the original implementation of the CBN models (e.g., CT-CBN and H-CBN) in a unifying interface but also accommodates the modifications needed to support robust CBN algorithms (e.g., B-CBN and R-CBN). CBN2Path also includes a collection of functions to quantify predictability [7], analyze robustness [5], and visualize mutational pathways from pre-processed cross-sectional genomic data.
The R-CBN method has great potential for wide application in future predictive models because of its ability to offer an optimal balance between robustness and predictability [5]. We therefore anticipate that CBN2Path will be widely used in the field, particularly as a platform that facilitates future applications of the R-CBN model.

Methods

Implementation

CBN2Path is implemented as a standard R package and hosted in the Bioconductor repository. The development version of CBN2Path is available on GitHub. All functions are documented and include examples. The main functions of CBN2Path are listed in Table 1, and their features and capabilities are described in detail in the tutorial (vignette) accompanying the package. CBN2Path can be installed and used mainly on Unix platforms. It can be installed from Bioconductor as follows:

if (!require("BiocManager", quietly = TRUE))
   install.packages("BiocManager")
BiocManager::install("CBN2Path")

Table 1. The main functions available in CBN2Path.

The main functions are listed and categorized into five parts: i) input preparation, ii) CBN models, iii) pathway quantification, iv) fitness landscape analysis, and v) the downstream analysis. For each function, the name (the first column), description (the second column) and the returned value (the third column) are provided.

| Function name | Description | Value |
|---|---|---|
| Part I: Input Preparation | | |
| readPoset | Reads .poset files | The poset matrix |
| readPattern | Reads .pat files | The genotype matrix |
| readLambda | Reads .lambda files | The list of λ parameters |
| genotypeMatrixMutator | Adds false-positive and false-negative errors to a binary genotype matrix | A mutated binary genotype matrix |
| Part II: CBN Models | | |
| ctcbnSingle | Runs the CT-CBN model for a given genotype matrix and a single poset (the original implementation) | The λ parameters, likelihood and the MLE poset |
| hcbnSingle | Runs the H-CBN model for a given genotype matrix and a single poset (the original implementation) | The λ parameters, likelihood and the MLE poset |
| ctcbn | Runs the CT-CBN model for a given genotype matrix and a list of posets (the new implementation) | A list of λ parameters and likelihood values |
| hcbn | Runs the H-CBN model for a given genotype matrix and a list of posets (the new implementation) | A list of λ parameters and likelihood values |
| bcbn | Runs the B-CBN model for a given genotype matrix | A list of MCMC-sampled DAGs |
| visualizeCBNModel | Generates a graphical representation of a given poset | The DAG corresponding to the given poset |
| Part III: Pathway Quantification | | |
| pathProbCBN | Quantifies the pathway probabilities for a given poset and λ vector | The pathway probability distribution (P(Π)) |
| pathProbQuartetCTCBN | Quantifies the CT-CBN based probabilities for all pathways of length 4 | The pathway probability distribution (P(Π)) |
| pathProbQuartetHCBN | Quantifies the H-CBN based probabilities for all pathways of length 4 | The pathway probability distribution (P(Π)) |
| pathProbQuartetRCBN | Quantifies the R-CBN based probabilities for all pathways of length 4 | The pathway probability distribution (P(Π)) |
| pathProbQuartetBCBN | Quantifies the B-CBN based probabilities for all pathways of length 4 | The pathway probability distribution (P(Π)) |
| visualizeProbabilities | Generates a graphical representation of all potential pathways of a given length and their probabilities | A plot of the pathways and their probabilities |
| Part IV: Fitness Landscape Analysis | | |
| pathProbSSWM | Quantifies the pathway probabilities for a given fitness landscape using the SSWM assumption | The pathway probability distribution (P(Π)) |
| visualizeFitnessLandscape | Visualizes a given fitness landscape | A color-coded plot of a given fitness landscape |
| Part V: Downstream Analyses | | |
| predictability | Quantifies the predictability for a given pathway probability distribution | The predictability score (Φ) |
| jensenShannonDivergence | Quantifies the Jensen-Shannon Divergence between two probability distributions | The Jensen-Shannon Divergence (JSD) value |
| pathwayCompatibilityQuartet | Quantifies the compatibility vector for all pathways of length 4 | The pathway compatibility vector (c(Π)) |

During the installation of CBN2Path, other packages that it depends on are automatically installed, including coda, cowplot, doMC, foreach, ggplot2, ggraph, grDevices, graphics, igraph, magrittr, patchwork, rlang, R6, stats, and tidygraph.

Operation

CBN2Path provides a unifying interface to the different CBN models, which are used to quantify, visualize, and analyze mutational pathways. CBN2Path has three main functionalities.

  • (a) The original implementation of the CT-CBN and H-CBN models, for which the associated pathway quantification workflow is shown in Figure 1. In this setting, the genotype matrix and a given poset are used as inputs for the CBN models (Step 1), which output the estimated λ values and the MLE poset structure. These outputs are then passed to the pathway quantification and visualization functions (Step 2), which output the pathway probability distributions. Finally, these distributions are used by the downstream analysis functions (Step 3), which quantify properties of the pathway probability distributions such as predictability, robustness, and compatibility.

  • (b) The second workflow is specifically designed to analyze mutational quartets (Figure 2). In this setting, only a genotype matrix is required as input; no poset needs to be specified. Instead, all 219 potential posets are considered: the CT-CBN model is executed 219 times, yielding 219 λ vectors, which are used to estimate 219 pathway probability distributions that are then aggregated into the final probability distribution. In the B-CBN approach the posets are weighted using MCMC, whereas in the R-CBN approach they are weighted by their reciprocal rank [5]. Unlike the first workflow, the second has no sequential input-output arrangement: there is a single input (the genotype matrix) and a single output (the final probability distribution), and the intermediate functions are called internally without a direct interface with the user.

  • (c) CBN2Path also provides the necessary functions for visualizing fitness landscapes and quantifying their associated pathway probabilities using evolutionary models that operate under the Strong-Selection Weak-Mutation (SSWM) assumption (Figure 3).


Figure 1. Workflow I.

CBN models (CT-CBN or H-CBN) take the genotype matrix and a given poset as input, and then output the estimated λ vector and the MLE poset (step 1), which will be used by the pathway inference functions to produce the inferred pathway probability distribution (step 2) that will be subsequently used by other downstream functions to measure different properties of the pathway probabilities (step 3).


Figure 2. Workflow II.

The R-CBN and B-CBN approaches require an alternative workflow for quantifying the pathway probability distributions, which takes only a genotype matrix as input. The λ parameters are estimated using CT-CBN under each of the 219 potential posets. The resulting 219 probability distributions are then aggregated to generate the final pathway probability distribution, which is used by the downstream functions for further analyses. The B-CBN method uses an MCMC approach to weight the posets in the aggregation step [4]. In contrast, the R-CBN method takes the likelihoods output by the CT-CBN model under each of the 219 posets and weights the posets by their reciprocal rank in terms of likelihood. R-CBN additionally weights the pathways and updates their probabilities using their corresponding edge (marginal) probabilities [5]. In Workflow II there is no sequential input-output arrangement; rather, there is a single input (the genotype matrix) and a single output (the final probability distribution). In other words, the intermediate functions are called internally, without a direct interface with the user.
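The reciprocal-rank aggregation step can be illustrated with a minimal base-R sketch. It assumes that the weights are the normalized reciprocal ranks of the per-poset likelihoods and that the final distribution is the weighted average of the per-poset distributions; the second, pathway-level weighting layer of R-CBN is not reproduced. The log-likelihoods and pathway distributions are made-up illustrative numbers for five hypothetical posets and three hypothetical pathways, not package output.

```r
# Illustrative log-likelihoods for 5 hypothetical posets (made-up values)
log_lik <- c(-120.4, -118.9, -125.0, -119.3, -130.2)

# Rank 1 = highest likelihood
ranks <- rank(-log_lik, ties.method = "min")

# Assumed weighting: normalized reciprocal ranks
weights <- (1 / ranks) / sum(1 / ranks)

# One pathway distribution per poset (rows), 3 hypothetical pathways (columns)
path_probs <- rbind(
  c(0.6, 0.3, 0.1),
  c(0.5, 0.5, 0.0),
  c(0.2, 0.2, 0.6),
  c(0.7, 0.2, 0.1),
  c(0.1, 0.1, 0.8)
)

# Weighted average of the per-poset distributions
aggregated <- as.numeric(weights %*% path_probs)
```

Because every row of path_probs sums to one and the weights sum to one, the aggregated vector is again a probability distribution.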


Figure 3. Analysis and visualization of fitness landscapes.

CBN2Path enables the visualization of fitness landscapes and the quantification of pathway probability distributions based on evolutionary models under the Strong-Selection Weak-Mutation (SSWM) assumption [7].

Use cases

Workflow I: CT-CBN and H-CBN based quantification of pathway probabilities

Preparing the input data

As shown in Figure 1, the original implementation of the CT-CBN and H-CBN models requires two input files: i) a “.pat” file, which contains binary genotype data, and ii) a “.poset” file, which encodes a given poset. The CBN2Path model functions do not read these files directly; instead, they accept the two matrices obtained by reading them. To store input posets and genotype matrices, CBN2Path implements its own data structure, Spock. The readPattern and readPoset functions read the “.pat” and “.poset” files, respectively, and their outputs are passed to the Spock constructor.

library(CBN2Path)
example_path <- getExamples()[1]
input_poset <- readPoset(example_path)
input_pattern <- readPattern(example_path)
input_1 <- Spock$new(
     poset = input_poset$sets,
     numMutations = input_poset$mutations,
     genotypeMatrix = input_pattern
)

Alternatively, input matrices can be created directly without reading from a file. For example:

# The poset
dag <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)

# The genotype matrix
set.seed(100)

gen_1<-c(rep(0,150),sample(c(0,1),25,replace=TRUE),rep(0,25))
gen_2<-c(rep(0,175),sample(c(0,1),25,replace=TRUE))
gen_3<-c(rep(0,50),sample(c(0,1),100,replace=TRUE),rep(1,50))
gen_4<-c(sample(c(0,1),100,replace=TRUE),rep(0,50),rep(1, 50))
g_mat<-matrix(c(gen_1, gen_2, gen_3, gen_4), 200, 4)
g_mat<-cbind(1, g_mat)

# Preparing input of the ct-cbn/h-cbn methods
input_2 <- Spock$new(
     poset = dag,
     numMutations = 4, genotypeMatrix = g_mat
)

Note that the first column of the genotype matrix must always be one, whereas each of the other columns corresponds to a given mutational event. Therefore, the number of columns in the genotype matrix must be equal to the number of mutations considered plus one.
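The layout rule above can be checked before constructing a Spock object. The following base-R sketch uses a hypothetical helper, checkGenotypeMatrix, which is not part of CBN2Path; it only tests the three conditions stated in the text (binary entries, a leading column of ones, and one column per mutation plus one).

```r
# Hypothetical helper (not part of CBN2Path): checks the layout described above
checkGenotypeMatrix <- function(genotype_matrix, num_mutations) {
  stopifnot(is.matrix(genotype_matrix))
  is_binary <- all(genotype_matrix %in% c(0, 1))
  first_col_ones <- all(genotype_matrix[, 1] == 1)
  right_width <- ncol(genotype_matrix) == num_mutations + 1
  is_binary && first_col_ones && right_width
}

# Toy matrix: 3 samples, 4 mutations, plus the leading column of ones
toy <- cbind(1, matrix(c(0, 1, 1,
                         0, 0, 1,
                         1, 1, 1,
                         0, 1, 0), nrow = 3))

checkGenotypeMatrix(toy, 4)        # TRUE: layout is valid
checkGenotypeMatrix(toy[, -1], 4)  # FALSE: leading column of ones is missing
```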

In the second example, the genotypes are generated such that the mutation orders never violate the temporal ordering restrictions imposed by the corresponding poset: mutations 1 and 2 occur only when mutations 3 and 4 have already occurred. To allow violations of these restrictions, one can use the genotypeMatrixMutator function to add false positives and false negatives at given rates. In the following example, the g_mat matrix is converted to the g_mat_mut matrix by introducing false positives and false negatives at rates of 0.3 and 0.2, respectively:

temp <- g_mat[,2:5]
temp_mut <- genotypeMatrixMutator(temp, 0.3, 0.2)
g_mat_mut <- cbind(1, temp_mut)
# Preparing input of the ct-cbn/h-cbn methods
input_3 <- Spock$new(
     poset = dag,
     numMutations = 4,
     genotypeMatrix = g_mat_mut
)

Note that the first column of the genotype matrix must always remain one. Therefore, we did not pass the first column to the genotypeMatrixMutator function.
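To make the effect of the two error rates concrete, here is a minimal base-R sketch of one way such noise could be injected. The helper addGenotypeNoise is hypothetical and is only a stand-in: it assumes that each 0 flips to 1 independently with the false-positive rate and each 1 flips to 0 independently with the false-negative rate. The internals of genotypeMatrixMutator may differ.

```r
# Hypothetical stand-in, not the package implementation
addGenotypeNoise <- function(genotypes, fp_rate, fn_rate) {
  # Per-entry flip probability: fp_rate for zeros, fn_rate for ones
  flip_prob <- ifelse(genotypes == 0, fp_rate, fn_rate)
  flips <- matrix(runif(length(genotypes)) < flip_prob, nrow = nrow(genotypes))
  noisy <- genotypes
  noisy[flips] <- 1 - genotypes[flips]
  noisy
}

set.seed(1)
# Column 1 is all zeros and column 2 is all ones, so the rates can be read off
clean <- matrix(rep(c(0, 1), each = 5000), ncol = 2)
noisy <- addGenotypeNoise(clean, fp_rate = 0.3, fn_rate = 0.2)

mean(noisy[, 1] == 1)  # empirical false-positive rate, close to 0.3
mean(noisy[, 2] == 0)  # empirical false-negative rate, close to 0.2
```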

Running the CBN models

Having prepared the inputs, the CBN models can be run using ctcbnSingle and hcbnSingle.

# CT-CBN
results_c1 <- ctcbnSingle(input_1)
results_c2 <- ctcbnSingle(input_2)
results_c3 <- ctcbnSingle(input_3)
# H-CBN
results_h1 <- hcbnSingle(input_1)
results_h2 <- hcbnSingle(input_2)
results_h3 <- hcbnSingle(input_3)

Below, we can see how to obtain the estimated λ values and the corresponding likelihood for the first example.

# The estimated lambda values
ml_lambda_c1 <- results_c1[[1]]$lambda

# The likelihood
loglikelihood_c1 <- results_c1[[1]]$summary[4]

Furthermore, the maximum-likelihood poset can be identified and visualized using the visualizeCBNModel function.

# The MLE poset
ml_poset_c1 <-results_c1[[1]]$poset$sets
# visualizing the MLE
visualizeCBNModel(ml_poset_c1)

It is important to mention that we have an alternative implementation of the CBN models, namely the ctcbn and hcbn functions, which accept a list of posets as input and accordingly produce a list of λ vectors, a list of likelihood values, and a list of MLE posets. This strategy is utilized in the second workflow, which is specifically suited for analyzing mutational quartets and pathways of length four.

# The collection of all 219 potential posets
posets <- readRDS(system.file("extdata", "Posets.rds", package = "CBN2Path"))
# Input preparation
input_4 <- Spock$new(
     poset = posets,
     numMutations = 4,
     genotypeMatrix = g_mat
)
# Running the ctcbn function
results_c4 <- ctcbn(input_4)
# Running the hcbn function
results_h4 <- hcbn(input_4)

Note that the collection of all 219 potential posets for analyzing mutational quartets is shipped with the package; it is loaded in the first line of the code above.
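The figure of 219 matches the number of partial orders on four labeled elements, which is presumably where it comes from. The base-R sketch below counts them by brute force: it enumerates every directed relation on n elements and keeps those that are antisymmetric and transitive. The helper countPosets is hypothetical and independent of how the package stores Posets.rds.

```r
# Hypothetical helper: counts strict partial orders on n labeled elements (n >= 2)
countPosets <- function(n) {
  pairs <- which(!diag(n), arr.ind = TRUE)  # all ordered pairs (i, j) with i != j
  n_pairs <- nrow(pairs)
  count <- 0
  for (mask in 0:(2^n_pairs - 1)) {
    # Decode the mask into a 0/1 relation matrix
    bits <- (mask %/% 2^(0:(n_pairs - 1))) %% 2 == 1
    rel <- matrix(0, n, n)
    rel[pairs[bits, , drop = FALSE]] <- 1
    # No pair may be ordered in both directions
    antisymmetric <- !any(rel * t(rel) > 0)
    # Every two-step connection i -> j -> k must also be a direct relation i -> k
    transitive <- all(rel[(rel %*% rel) > 0] == 1)
    if (antisymmetric && transitive) count <- count + 1
  }
  count
}

countPosets(4)  # 219
```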

Inferring pathway probability distributions

The output of the CBN models, namely the estimated λ values and the MLE poset, can be used as the input of the pathProbCBN function, which quantifies the pathway probability distribution (P(Π)). In the example below, the results of the second example obtained in the previous section (results_c2 and results_h2) are used.

# The first input: The MLE poset (the output of the CBN models)
dag_c2 <- results_c2[[1]]$poset$sets
dag_h2 <- results_h2[[1]]$poset$sets
#The second input: The estimated Lambda values (the output of the CBN models)
lambda_c2 <- as.numeric(results_c2[[1]]$lambda)
lambda_h2 <- as.numeric(results_h2[[1]]$lambda)
# Quantifying the pathway probability distributions
prob_c2 <- pathProbCBN(dag_c2, lambda_c2, 4)
prob_h2 <- pathProbCBN(dag_h2, lambda_h2, 4)

In this example, prob_c2 and prob_h2 are vectors of length 24, in which each element is the probability of one of the 24 (4!) possible pathways.
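For intuition about how a poset and a λ vector determine P(Π), the following base-R sketch computes pathway probabilities under a common CBN assumption: at each step, the next mutation is drawn from the mutations whose parents have all occurred, with probability proportional to its λ. The helpers allPermutations and pathwayProbability are hypothetical, the λ values are illustrative rather than estimated, and neither the internals nor the pathway ordering of pathProbCBN are assumed to match.

```r
# Enumerate all orderings of a vector (recursive, base R only)
allPermutations <- function(items) {
  if (length(items) <= 1) return(list(items))
  out <- list()
  for (i in seq_along(items)) {
    for (rest in allPermutations(items[-i])) {
      out[[length(out) + 1]] <- c(items[i], rest)
    }
  }
  out
}

# Probability of one pathway; edges has one row per relation (parent, child)
pathwayProbability <- function(pathway, edges, lambda) {
  occurred <- integer(0)
  prob <- 1
  for (step in pathway) {
    remaining <- setdiff(seq_along(lambda), occurred)
    # A mutation is admissible once all of its parents have occurred
    admissible <- Filter(function(m) {
      parents <- edges[edges[, 2] == m, 1]
      all(parents %in% occurred)
    }, remaining)
    if (!(step %in% admissible)) return(0)  # pathway violates the poset
    prob <- prob * lambda[step] / sum(lambda[admissible])
    occurred <- c(occurred, step)
  }
  prob
}

edges <- matrix(c(3, 3, 4, 4, 1, 2, 1, 2), 4, 2)  # same poset as in the example above
lambda <- c(1, 2, 3, 1)                           # illustrative rates, not estimates
pathways <- allPermutations(1:4)
probs <- vapply(pathways, pathwayProbability, numeric(1),
                edges = edges, lambda = lambda)
```

With this poset only the four pathways that start with mutations 3 and 4 receive non-zero probability, and the 24 probabilities sum to one.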

Downstream analyses

The pathway probability distributions serve as input to the downstream functions, which visualize them (visualizeProbabilities), quantify their predictability (predictability), or measure the divergence between two distributions (jensenShannonDivergence):

# Visualization of pathways and their probabilities
visualizeProbabilities(prob_c2)
visualizeProbabilities(prob_h2)
# Quantification of predictability score for a given probability distribution
pred_c2 <- predictability(prob_c2, 4)
pred_h2 <- predictability(prob_h2, 4)
# Quantification of the Jensen-Shannon Divergence (JSD) between a pair of distributions
jsd <- jensenShannonDivergence(prob_c2, prob_h2)
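As a rough guide to what these two scores measure, the base-R sketch below implements both under stated assumptions: predictability is taken to be one minus the Shannon entropy normalized by its maximum (the log of the number of pathways), and the Jensen-Shannon Divergence uses base-2 logarithms so that it lies between 0 and 1. The helpers predictabilitySketch and jsdSketch are hypothetical; the package functions may use different conventions.

```r
# Shannon entropy, treating 0 * log(0) as 0
shannonEntropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Assumed definition: 1 - H(P) / H_max, with H_max = log(number of pathways)
predictabilitySketch <- function(p) {
  1 - shannonEntropy(p) / log(length(p))
}

# Jensen-Shannon Divergence with base-2 logarithms
jsdSketch <- function(p, q) {
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

uniform <- rep(1 / 24, 24)        # all 24 pathways equally likely
point_mass <- c(1, rep(0, 23))    # a single pathway is certain

predictabilitySketch(uniform)     # approximately 0: evolution is unpredictable
predictabilitySketch(point_mass)  # 1: evolution is fully predictable
jsdSketch(uniform, uniform)       # 0: identical distributions
```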

Workflow II: B-CBN and R-CBN based inference of pathway probability distributions

The second workflow for quantifying pathway probabilities was designed specifically for implementing the R-CBN and B-CBN algorithms to enable the robust inference of pathway probability distributions. It is specifically suited for analyzing mutational quartets and pathways of length 4. The ctcbn function that takes the exhaustive list of 219 posets as input is an integral part of this workflow, which makes it easy to work with, as the user only needs to input a genotype matrix.

Running the R-CBN model

R-CBN weights the 219 posets based on their reciprocal rank in terms of likelihood and then applies a second, pathway-based weighting layer. However, the user only needs to work directly with the pathProbQuartetRCBN function, because intermediate functions such as posetWeightingRCBN, pathwayWeightingRCBN, edgeMarginalized, pathEdgeMapper and pathNormalization are called internally.

g_mat2 <- g_mat[,2:5]
prob_r2 <- pathProbQuartetRCBN(g_mat2)

Note that this function only needs a genotype matrix as input and does not require the first column of the matrix to be one; therefore, we removed the first column from the g_mat matrix that we produced previously.

Running the B-CBN model

Although the B-CBN algorithm is fundamentally different from the R-CBN algorithm, particularly in how the posets are weighted, which strongly affects the internal implementation of the intermediate functions, the user interface is exactly the same. The pathProbQuartetBCBN function is called as follows:

prob_b2 <- pathProbQuartetBCBN(g_mat2)

CBN2Path also provides a similar implementation for CT-CBN and H-CBN.

prob_c2 <- pathProbQuartetCTCBN(g_mat2)
prob_h2 <- pathProbQuartetHCBN(g_mat2)

Having determined the pathway probabilities, downstream analyses can be performed as in the first workflow. Furthermore, pathway compatibilities (c(Π)) can be measured directly from a genotype matrix, and their correlation with pathway probabilities (P(Π)) can be calculated as follows:

pathway_c2 <- pathwayCompatibilityQuartet(g_mat2)
rho_c2 <- cor(pathway_c2, prob_c2, method = "spearman")

Analyzing fitness landscapes

CBN2Path also provides functions for the analysis and visualization of fitness landscapes. For example, after assigning a fitness vector f to the set of binary genotypes of length four, which can be enumerated by the generateMatrixGenotypes function, we can calculate the corresponding pathway probability distribution under the SSWM-based evolutionary model using the pathProbSSWM function as follows:

f <- c(0, 0.1, 0.2, 0.1, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0, 0.6, 0.4, 0.3, 0.2, 1)
g <- generateMatrixGenotypes(4)
Prob_w<-pathProbSSWM(f,4)

Furthermore, the fitness landscape can be visually inspected using the visualizeFitnessLandscape function, as follows:

visualizeFitnessLandscape(f)
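The idea behind SSWM-based pathway probabilities can be sketched in base R under a commonly used assumption: from the current genotype, the next mutation is chosen among the single-mutation neighbors with probability proportional to its fitness gain, and fitness-decreasing steps have probability zero. The helper sswmPathwayProbability is hypothetical, and the genotype encoding used here (fitness indexed by a binary code, where mutation k contributes 2^(k-1)) is an assumption of this sketch that need not match the ordering produced by generateMatrixGenotypes or the internals of pathProbSSWM.

```r
# Hypothetical helper: probability of one mutational pathway under SSWM.
# fitness[code + 1] is the fitness of the genotype with binary code `code`.
sswmPathwayProbability <- function(pathway, fitness) {
  n <- length(pathway)
  code <- 0   # wild type: no mutations present
  prob <- 1
  for (step in pathway) {
    # Mutations that are still absent from the current genotype
    absent <- which((code %/% 2^(0:(n - 1))) %% 2 == 0)
    # Fitness gain of each single-mutation neighbor, truncated at zero
    gains <- pmax(fitness[code + 2^(absent - 1) + 1] - fitness[code + 1], 0)
    if (sum(gains) == 0) return(0)  # local peak: no beneficial step is left
    prob <- prob * gains[absent == step] / sum(gains)
    code <- code + 2^(step - 1)
  }
  prob
}

# Two-locus toy landscape: genotypes 00, 10, 01, 11 (codes 0 to 3)
toy_fitness <- c(0, 0.2, 0.6, 1)
p_12 <- sswmPathwayProbability(c(1, 2), toy_fitness)  # mutation 1 first
p_21 <- sswmPathwayProbability(c(2, 1), toy_fitness)  # mutation 2 first
```

In this toy landscape the pathway that starts with the larger fitness gain (mutation 2) is three times as likely as the alternative, and the two probabilities sum to one because there is no local peak.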

Conclusions

In summary, CBN2Path provides a unifying platform for the efficient implementation of the CT-CBN, H-CBN, B-CBN, and R-CBN methods, which facilitates robust quantification, visualization, and analysis of cancer progression pathways from pre-processed binary mutational data.

Software availability

How to cite this article
Choi-Kim W and Hosseini SR. CBN2Path: an R/Bioconductor package for the analysis of cancer progression pathways using Conjunctive Bayesian Networks [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:834 (https://doi.org/10.12688/f1000research.168810.1)