Software Tool Article
Revised

Umpire 2.0: Simulating realistic, mixed-type, clinical data for machine learning

[version 2; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 05 Mar 2021

Abstract

The Umpire 2.0 R-package offers a streamlined, user-friendly workflow to simulate complex, heterogeneous, mixed-type data with known subgroup identities, dichotomous outcomes, and time-to-event data, while providing ample opportunities for fine-tuning and flexibility. Here, we describe how we have expanded the core Umpire 1.0 R-package, developed to simulate gene expression data, to generate clinically realistic, mixed-type data for use in evaluating unsupervised and supervised machine learning (ML) methods. As the availability of large-scale clinical data for ML has increased, clinical data have posed unique challenges, including widely variable size, individual biological heterogeneity, data collection and measurement noise, and mixed data types. Developing and validating ML methods for clinical data requires data sets with known ground truth, generated from simulation. Umpire 2.0 addresses challenges to simulating realistic clinical data by providing the user a series of modules to generate survival parameters and subgroups, apply meaningful additive noise, and discretize to single or mixed data types. Umpire 2.0 provides broad functionality across sample sizes, feature spaces, and data types, allowing the user to simulate correlated, heterogeneous, binary, continuous, categorical, or mixed-type data from the scale of a small clinical trial to data on thousands of patients drawn from electronic health records. The user may generate elaborate simulations by varying parameters in order to compare algorithms or interrogate operating characteristics of an algorithm in both supervised and unsupervised ML.

Keywords

machine learning, unsupervised machine learning, supervised machine learning, clustering, clinical informatics, mixed data, mixed-type data, clinical data

Amendments from Version 1

In response to the comments from Reviewer 1, we:
1. Added a sentence to the abstract to clarify the meaning of "mixed type data".
2. Expanded the captions on Figures 2, 3, and 4 in order to make them easier to understand.
3. Added several paragraphs to the "Implementation" section to highlight the use of S4 classes in our R package.

See the authors' detailed response to the review by Julia Wrobel

Introduction

As large clinical databases expand and data mining of the electronic medical record (EMR) improves, the scale and potential of data available for clinical knowledge discovery are increasing dramatically. The expanding size and complexity of data demand new analytic approaches and pave the way for applications of machine learning (ML) in novel clinical contexts1,2. However, clinical data are characterized by heterogeneity, including measurement and data collection noise, individual biological variation, variable data set size, and mixed data types, which raises new challenges for ML analyses1. Clinical data sets vary widely in scale, from early-stage clinical trials with fewer than 100 patients, to prospective cohorts following 10,000 patients, to large-scale mining of electronic health records. They consist of data collected in the clinical setting, including demographic information, laboratory values, results of physical exams, disease and symptom histories, dates of visits or hospital length-of-stay, pharmacologic medications and dosing, and procedures performed, possibly with associated ICD-9 or ICD-10 codes. The most salient, identifying feature of clinical data is that it is of mixed type, containing continuous, categorical, and binary data. The result of this heterogeneity is an ML milieu characterized by methodological experimentation, without consensus on the best methods to apply to challenging clinical data3.

Developing and evaluating best-practice methodologies for ML on clinical data demands a known validation standard for comparison. Previously, we described an approach using “biological validation”: testing an ML methodology in a disease with well-understood relationships between patient features and outcomes, allowing the known biological truths uncovered (or absent) in a solution to validate the method3. However, biological validation fails to capture interaction effects or allow the validation of emergent discoveries. A far superior solution is to validate novel methods on data with known “ground truth.” Artificial clinical data, simulated with known assignments, can serve to rigorously test and validate ML algorithms.

Simulating realistic clinical data poses challenges. The wide range in feature spaces and sample sizes demands simulation solutions that vary by orders of magnitude. Rather than simulating data of a single type, simulated clinical data must be of mixed type and must reflect the variable mixtures of types found in clinical scenarios, where one type may predominate over others4–6. In addition, in order to conclusively test algorithms for use in clinical contexts, simulations of clinical data must replicate the noisiness of these data that results from variation of human and technological features in measurement and the biological variation between individuals.

A real need exists for noisy, realistic, clinically meaningful simulated data to advance ML in clinical contexts. Few tools are currently available, and those that exist impose problematic restrictions. For example, the KAMILA (k-means for mixed large data) R package can be used to generate complex mixed-type clusters with a high degree of user specificity, but it can only generate two clusters7. Because analysts face many important problems beyond distinguishing two groups in data, the literature needs more comprehensive, mixed-type simulation tools.

Here, we present Umpire 2.0, a tool that facilitates generation of complex, noisy, simulated clinical and mixed-type data sets. Umpire 2.0 provides broad functionality across sample sizes, feature spaces, and data types to allow the user to simulate correlated, heterogeneous, binary, continuous, categorical, or mixed-type data from the scale of a small clinical trial to data on thousands of patients drawn from the EMR. These realistic clinical simulations are vital for testing and developing superior ML techniques for new clinical data challenges.

Methods

The original Umpire R package (1.0) could be used to simulate complex, correlated, continuous gene expression data with known subgroup identities and both dichotomous and survival outcomes, as previously described8. Two core ideas underlie Umpire. First, biological data are correlated in blocks of variable size, simulating the functioning of genes, tissues, or symptoms in biological networks and pathways. Second, motivated by the multi-hit theory of cancer, subgroups (or clusters) of patients are defined by a number of informative, latent variables called “hits”. Each patient receives a combination of multiple “hits,” simulating population heterogeneity. These latent hits are used to link simulated alterations in patient data to outcome data in the form of dichotomous outcomes and time-to-event data.

Umpire 2.0 expands the Umpire simulation engine for clinical and mixed-type data through a flexible pipeline. Users can vary the characteristics and the number of subgroups, features, hits, and correlated blocks. Using Umpire, they can control the level of patient-to-patient heterogeneity in various configurations of mixed-type data. Users can also generate multiple data sets (for example, training and test) of unlimited sizes from the same underlying distributional models.

Data heterogeneity

Umpire 2.0 enables users to incorporate individual and population heterogeneity in multiple ways. First, as above, latent hits are used to simulate features in correlated blocks, using multivariate normal distributions, with variation between individual members of a subgroup. Second, users can simulate clusters of equal or unequal size. Third, users can apply additive noise modeling measurement error and individual biological variation to simulations.

Because we know that clusters of equal size are unrealistic (outside of pre-defined case-control studies), we enable users to simulate clusters of equal or unequal sizes. In the equal case, we set the population proportions equal and sample data using a multinomial distribution. In the unequal case, we first sample a vector r ∼ Dirichlet(α_1, . . . , α_k), setting the expected proportions of the k clusters from the Dirichlet distribution. For small numbers of clusters (k ≤ 8), we set all α = 10. For more clusters (k > 8), we set one quarter each of the α parameters to 1, 2, 4, and 8, respectively, accepting only a vector of cluster sizes r in which every cluster contains at least 1% of patients.
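A minimal sketch of this sampling scheme in base R (a Dirichlet draw can be obtained by normalizing independent Gamma draws; the quarter-split of the α values assumes k is a multiple of 4, and the function names here are hypothetical, not part of the package):

> rDirichlet <- function(alpha) {
+   g <- rgamma(length(alpha), shape = alpha, rate = 1)
+   g / sum(g)                          # normalized Gamma draws ~ Dirichlet
+ }
> sampleClusterSizes <- function(nPatients, k) {
+   alpha <- if (k <= 8) rep(10, k) else rep(c(1, 2, 4, 8), each = k/4)
+   repeat {                            # resample until no cluster is too small
+     p <- rDirichlet(alpha)
+     if (all(p >= 0.01)) break         # every cluster keeps at least 1% of patients
+   }
+   drop(rmultinom(1, size = nPatients, prob = p))
+ }
> sampleClusterSizes(500, k = 4)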

The initial data simulated by Umpire represents the true, unadulterated biological signal. To these data, Umpire can add noise, mimicking biological variation and experimental random error. Marlin and colleagues9 argue that all clinical data “must be treated as fundamentally uncertain” due to human error in measurement and manual recording, variability in sampling frequencies, and variation within automatic monitoring equipment. Clinical experience teaches us that variability in clinical data arises from many sources, including human error, measurement error, and individual biological variation. However, because clinical measurements are integral to the provision of patient care, demanding high accuracy and reliability, we also assume that many clinical variables have low measurement error, such as tightly calibrated laboratory tests. For a given feature f measured on patient i, we model the clinically observed value Y from additive measurement noise E applied to the true biological signal S as

Y_{f,i} = S_{f,i} + E_{f,i}.

We model the additive noise as E ∼ N(0, τ), a normal distribution with mean 0 and standard deviation τ, where τ in turn follows a gamma distribution, τ ∼ Γ(c, b), with shape c and scale b chosen such that bc = 0.05. Thus, we create a distribution in which most features have very low noise while some are subject to very high noise.
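A minimal sketch of this two-stage noise model, assuming the shape and scale used by the ClinicalNoiseModel calls later in this article (shape = 1.02, scale = 0.05, so that bc ≈ 0.05); the matrix layout (rows = features) mirrors the package's omics-style output:

> nFeat <- 100; nPat <- 500
> tau <- rgamma(nFeat, shape = 1.02, scale = 0.05)  # one noise SD per feature
> S <- matrix(rnorm(nFeat * nPat), nrow = nFeat)    # stand-in for the true signal
> E <- matrix(rnorm(nFeat * nPat, mean = 0, sd = tau), nrow = nFeat) # sd recycles by row
> Y <- S + E                                        # observed = signal + noise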

Mixed-type data

Umpire 2.0 generates binary and categorical data by discretizing raw, continuous features at meaningful cutoffs. To convert a continuous feature into a binary vector, we select a cutoff and assign values on one side of this demarcation to “zero” and the others to “one.” We begin by calculating a “bimodality index” (BI) for the continuous vector10. To compute the bimodality index, we model the data as a mixture of two normal distributions, and take:

BI = [π(1 − π)]^{1/2} δ

Here π is the fraction of members in one population and δ = (µ_1 − µ_2)/σ is the standardized distance between the two means. The recommended cutoff of 1.1 to define bimodality was determined by simulation10. If the continuous data are bimodal, we split them midway between the means. For continuous features without a bimodal distribution, we partition them into binary features by randomly selecting an arbitrary cutoff between 5% and 35%. Although arbitrariness feels uncomfortable in an informatics sphere, we believe that this approach reflects a fundamental arbitrariness in many clinical definitions. For example, an adult female with a hemoglobin of 11.9 is said to be anemic, even though her clinical presentation and symptoms probably do not differ from those of a woman with a hemoglobin of 12.1. The choice of an arbitrary cutoff reflects these clinical decision-making processes: along a spectrum of phenotype, a value is chosen based on experience to define the edge of the syndrome. By choosing an arbitrary cutoff, we replicate this process. To reduce bias that could result if all low values were assigned “0” and all larger values were assigned “1,” we randomly choose whether values above or below the cutoff are assigned 0. We mark binary features in which 10% or fewer values fall into one category as asymmetric and mark the remainder as symmetric binary features.
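A minimal sketch of this binarization rule, assuming the two mixture components' means, common standard deviation, and mixing fraction are already known (in practice they would be estimated from a fitted two-component normal mixture), and interpreting the arbitrary cutoff as a random quantile of the observed values:

> binarize <- function(x, p, mu1, mu2, sigma) {
+   delta <- abs(mu1 - mu2) / sigma             # standardized distance between means
+   BI <- sqrt(p * (1 - p)) * delta             # bimodality index
+   if (BI >= 1.1) {
+     cut <- (mu1 + mu2) / 2                    # bimodal: split midway between means
+   } else {
+     cut <- quantile(x, runif(1, 0.05, 0.35))  # unimodal: arbitrary clinical-style cutoff
+   }
+   b <- as.integer(x > cut)
+   if (runif(1) < 0.5) b <- 1L - b             # randomize which side becomes "1"
+   b
+ }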

To simulate a categorical feature, we rank a continuous feature from low to high and bin its components into categories, which we label numerically (i.e., 1, 2, 3, 4, 5). Distributing an equal number of observations into each bin does not reflect the realities we see in clinical data, and dividing a continuous feature by values (e.g., dividing a feature of 500 observations ranging between 1 and 100 into units of 1–10, 11–20, etc.) could lead to overly disparate distributions of observations into categories, especially at the tails. Here, for c categories, we model a vector R of bin sizes using the Dirichlet distribution,

R ∼ Dirichlet(α_1, . . . , α_c),   α_1 = · · · = α_c = 20,

such that we create categories of unequal membership without overly sparse tails. To generate an ordinal categorical feature, we bin a continuous feature and number its bins sequentially by value of observations (e.g., 1, 2, 3, 4, 5). To generate a nominal categorical feature, we number these bins in random order (e.g., 4, 2, 5, 1, 3).
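A minimal sketch of this categorization step, again using a normalized-Gamma draw for the Dirichlet proportions (the function name and the quantile-based binning are illustrative assumptions, not the package's internals):

> categorize <- function(x, nCat = 5, nominal = FALSE) {
+   p <- rgamma(nCat, shape = 20, rate = 1)
+   p <- p / sum(p)                           # Dirichlet(20, ..., 20) proportions
+   cuts <- quantile(x, cumsum(p)[-nCat])     # interior bin boundaries
+   bins <- findInterval(x, cuts) + 1         # ordinal labels 1..nCat, in rank order
+   if (nominal) bins <- sample(nCat)[bins]   # nominal: relabel bins in random order
+   bins
+ }
> x <- rnorm(500)
> table(categorize(x, 5))          # ordinal categories of unequal size
> table(categorize(x, 5, TRUE))    # same bins, shuffled labels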

The user may choose to simulate continuous, binary, nominal, or ordinal data, or any mixture thereof.

Operation

Umpire 2.0 has been implemented as a package for R 3.6.3 and R 4.0. It is freely available on R-Forge and CRAN. Any system (Linux, Windows, MacOS) capable of running R 3.6.3 or R 4.0 is sufficient to run Umpire.

Implementation

Umpire 2.0 provides a 4-part workflow to generate simulations and save parameters for downstream reuse (Figure 1). The original Umpire 1.0 functionality and the Umpire 2.0 extension are arranged as a series of interchangeable modules (e.g., Engines, NoiseModels) within a parallel workflow. For a more thorough, guided introduction to the Umpire functions, please see the package vignettes. For clinical simulations, the user begins by generating a ClinicalEngine, consisting of a correlated block structure to generate population heterogeneity, a model of subgroup membership, and a survival model, which is used to generate a raw (continuous, noise-free) data set. Next, clinically representative noise is applied. The user then discretizes these data to mixed type. Finally, the Engine parameters, the ClinicalNoiseModel, and the mixed data definitions are stored in a MixedTypeEngine to easily generate downstream simulations from the same parameter set.


Figure 1. Workflow to simulate mixed-type, clinically realistic data with the Umpire R package.

The user begins by generating a ClinicalEngine to define correlated block structure, latent hits, subgroup prevalences, and a survival model. This is used to generate a raw, continuous data set. The user generates a clinically meaningful ClinicalNoiseModel, and applies it to the raw data. Next, the data are discretized to mixed type. Finally, the parameters of the ClinicalEngine, the ClinicalNoiseModel, and the discretized cutpoints are stored in a MixedTypeEngine to generate future simulations with the same parameters.

The Umpire package makes extensive use of the S4 object-oriented capabilities in R. The core class is an “Engine”. In statistical terms, an Engine is an abstract representation of a random vector generator, implemented as a list of “components”. The package includes three classes that can be used as components: “IndependentNormal”, “IndependentLogNormal”, and “MVN” (for multivariate normal distributions). Both the Engine and each of its components must support three newly defined methods. First, the “rand” method generates a random vector. Next, the “alterMean” and “alterSD” methods must change the corresponding statistical properties in order to represent systematic differences, for example, between different subtypes of samples. Users can extend the capabilities of the Umpire package by designing their own classes that can respond to these three methods.
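As a hedged illustration of this extension point, the sketch below defines a hypothetical exponential component. It assumes the Umpire generics have signatures along the lines of rand(object, n, ...), alterMean(object, TRANSFORM, ...), and alterSD(object, TRANSFORM, ...); consult the package documentation for the authoritative interfaces before building on this:

> library(Umpire)
> setClass("IndependentExponential", representation(rate = "numeric"))
> setMethod("rand", "IndependentExponential", function(object, n, ...) {
+   # rows = features, columns = samples, matching the built-in components
+   matrix(rexp(n * length(object@rate), rate = object@rate), ncol = n)
+ })
> setMethod("alterMean", "IndependentExponential", function(object, TRANSFORM, ...) {
+   mu <- 1 / object@rate                    # mean of an exponential distribution
+   new("IndependentExponential", rate = 1 / TRANSFORM(mu))
+ })
> setMethod("alterSD", "IndependentExponential", function(object, TRANSFORM, ...) {
+   # for an exponential, the SD equals the mean, so the same update applies
+   alterMean(object, TRANSFORM)
+ })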

At a higher level, the “CancerEngine” contains two engines, one representing expression in normal samples, and the other in cancer samples. It unites these engines with a nested pair of classes. First, the “SurvivalModel” implements an underlying exponential distribution for survival curves. Second, the “CancerModel” combines the SurvivalModel with a vector of parameters that represent latent factors modifying the hazard ratio. It also includes a vector describing how the same factors will modify a binary outcome through a logistic model. In version 1.0 of Umpire, the data simulated by a CancerModel represented “perfect” data, with variability attributable solely to biological differences between samples. An additional class, the “NoiseModel” was used to represent additional sources of variation arising from measurement errors.

Version 2.0 of Umpire extended these S4 classes in two ways. First, we added a “ClinicalEngine” function. Note that there is no corresponding class with this name. The new function actually creates a “CancerEngine”, with a new set of default parameters selected to provide a better representation of clinical data instead of gene expression data. The main goal here was ease-of-use for people who wanted to produce useful simulations without diving as deeply into the underlying structure. The second extension did involve the creation of a new class, the “MixedTypeEngine”. The MixedTypeEngine is derived from a CancerEngine using the core object-oriented principle of inheritance. It has all the properties and behaviors of a CancerEngine, but adds additional features. Specifically, it includes its own NoiseModel, with parameters chosen from a gamma distribution to represent the more highly variable noise structures one would expect in clinical data. The MixedTypeEngine also includes a set of “cut points” used to convert continuous data into categorical or dichotomous data types. So, unlike the CancerEngine, which can only generate “raw” continuous data, the MixedTypeEngine can also generate “noisy” continuous data along with data that has been “binned” after discretization of some components. These new features are illustrated in the use cases below.

Use cases

In this section, we present several examples explaining how Umpire can be used to simulate data relevant to important clinical questions.

Use case 1: Subtypes

Unsupervised machine learning algorithms, designed to discover the subtypes inherent in a given data set, form one of the major branches in the field. In the clinical literature, these algorithms are being applied to data with variable feature sizes, including some studies with fewer than 10 features4,11. The number of subtypes (or clusters) identified in the literature also spans a fairly wide range4,5,12,13. At present, however, there is no consensus on which unsupervised ML algorithms are most effective, nor is it clear if different algorithms work better for different numbers of patients, clusters, features, or mixtures of data types.

Clinical engine

Since one idea at the core of Umpire is that cohorts of patients tend to be heterogeneous, it is perfectly positioned to perform simulations to evaluate unsupervised ML algorithms in the clinical context. As an illustration, we start by constructing a ClinicalEngine with four subtypes of patients.

> library(Umpire)
> set.seed(36475)
> numFeat <- 100
> ce0 <- ClinicalEngine(nFeatures = numFeat, # clinical variables.
+                          nClusters = 4,       # subtypes,
+                          isWeighted = FALSE)  # about the same size.

Internally, the ClinicalEngine simulates latent variables that affect both the expression of the clinical covariates and the outcomes in each of the four patient clusters. You can visualize which latent variables affect which clusters by extracting the “hit pattern” nested inside the ClinicalEngine (Figure 2).

> library(Polychrome)
> dk <- alphabet.colors(26)[c("red", "navy", "forest", "amethyst",
+                                 "turquoise", "sea", "wine")]

> res <- 300
> png(filename = "heatpattern.png", width=6*res, height= 6*res, res=res,
+      bg="white")
> heatmap(ce0@cm@hitPattern, scale="none", ColSideColors = dk[1:4],
+          col = c("gray", "black"))
> dev.off()


Figure 2. Association between latent variables (rows) and clusters (columns).

Black pixels mark the presence of latent variables, or hits, within a cluster. The top dendrogram shows the true relationships between clusters, which are driven by the presence of shared latent variables. The left dendrogram shows the relationships between hits, which are based on their co-occurrence within clusters.

Note that this heatmap shows the true underlying structure relating the clusters to the latent variables, and not any simulated data sets. By design, however, the ClinicalEngine can only simulate “perfect” continuous data reflecting the true signal. In order to simulate realistic mixed-type data, we must first add noise to these data, and then discretize some of the features to create binary or nominal features.

Mixed data types

Data collected in the clinic include many different kinds of values. Demographic data include a small number of nominal values (e.g., ethnicity, marital status) with four to five categories. Physical exam findings include binary indicators (such as the presence or absence of an enlarged liver). Most of the values assessed on patients are continuous, such as heart rate, blood pressure, and laboratory values. We assume that most of these values have low error; for example, clinical laboratory values are tightly calibrated. However, some measurements, such as physical exam findings, may be assessed by chart review from the medical record. Others, such as blood pressure, may be measured by hand and typed into the record at the time of the visit. Thus, a few values may be very prone to measurement and human error. Here we apply a ClinicalNoiseModel to our features that reflects our beliefs about the noisiness of our data, with the standard deviation following a gamma distribution defined by shape and scale parameters. Then, we construct a MixedTypeEngine.

> cnm <- ClinicalNoiseModel(numFeat, shape = 1.02, scale = 0.05)
> mte <- MixedTypeEngine(ce0,               # a clinical engine
+                           cnm,               # a noise model
+                           cutpoints = list(N = 200,
+                                               pCont = 0.6,
+                                               pBin = 0.2,
+                                               pCat = 0.2,
+                                               pNominal = 1,
+                                               range = c(4,5)))
> rm(cnm)
> summary(mte)

A ’MixedTypeEngine’ (MTE) based on:
A ’CancerEngine’ using the cancer model:
--------------
Clinical Simulation Model (Raw), a CancerModel object constructed via:
   CancerModel(name = "Clinical Simulation Model (Raw)", nPossible = NP,
    nPattern = nClusters, HIT = hitfn, SURV = SURV, OUT = OUT,
    survivalModel = survivalModel, prevalence = Prevalence(isWeighted,
    nClusters))
    
Pattern prevalences:
[1] 0.2206505 0.2998339 0.2343084 0.2452072

Survival effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.43931 -0.12208  0.08784  0.08932  0.32549  0.51441

Outcome effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.46565 -0.16604 -0.03749 -0.03927  0.13257  0.40766
--------------

Base expression given by:
An Engine with 33 components.

Altered expression given by:
An Engine with 33 components.

---------------
The MTE uses the following noise model:
A ’NoiseModel’ with:
     additive offset = 0
     additive scale distributed as:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001204 0.015000 0.041432 0.051332 0.069693 0.249882
     multiplicative scale = 0
---------------
The MTE simulates clinical data of these types:

asymmetric binary        continuous           nominal  symmetric binary
                1                60                23                15

Note that the cm slot of the clinical engine is retained as a slot in the mixed-type engine, so the heatmap shown above can be recreated with the command

> # Not run
> heatmap(mte@cm@hitPattern, scale="none", ColSideColors = dk[1:4],
+          col = c("gray", "black"))

At this point, we still haven’t simulated any actual data. For that purpose, we use the rand method.

> mtData <- rand(mte, 500, keepall = TRUE)

We now take a look inside the simulated data.

> names(mtData)

[1] "raw"      "clinical" "noisy"     "binned"

> sapply(mtData, dim)

     raw clinical noisy binned
[1,]  99      500    99    500
[2,] 500        4   500     99

> summary(mtData$clinical)

CancerSubType  Outcome         LFU          Event
Min.   :1.00   Bad :234   Min.   : 0.00   Mode :logical
1st Qu.:2.00   Good:266   1st Qu.: 7.00   FALSE:136
Median :3.00              Median :16.00   TRUE :364
Mean   :2.56              Mean   :19.79
3rd Qu.:4.00              3rd Qu.:29.00
Max.   :4.00              Max.   :71.00

There are four components:

  1. “clinical” contains the subtype, a binary outcome, and a time-to-event outcome represented by the last follow-up time (LFU) and a logical indicator of whether the event occurred.

  2. “raw” contains the continuous data simulated by the clinical engine.

  3. “noisy” contains the same data, with noise added.

  4. “binned” contains the mixed-type data, after discretization of some features.

Note that using keepall = FALSE will not preserve the raw or noisy components. Also, the raw and noisy components are arranged in the “omics” style, where rows are features and columns are patients. By contrast, the binned component is transposed into the usual clinical style, where rows are patients and columns are features.
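A quick (hypothetical) check of this behavior, continuing the session above; with keepall = FALSE, only the clinical and binned components are returned, and the binned component is already patients-by-features:

> lean <- rand(mte, 100, keepall = FALSE)
> names(lean)                 # expect only "clinical" and "binned"
> dim(t(mtData$noisy))        # transpose omics-style data into clinical style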

Visualization

As an illustration, we visualize clusters for the noisy, continuous data compared to the discretized, mixed-type data. We use the daisy function from the cluster R package to compute distances between mixed-type data, and we use the Rtsne package for visualization (Figure 3).

> myClustColor <- dk[mtData$clinical$CancerSubType]
> library(cluster)
> dtypes <- getDaisyTypes(mte)
> dai <- daisy(mtData$binned, type = dtypes)
> dai <- as.dist(dai)
> library(Rtsne)
> tsN <- Rtsne(t(mtData$noisy)) # transpose from omics-style to clinical-style
> tsM <- Rtsne(dai, is_distance=TRUE)

> png(filename = "daisy.png", width=10*res, height=5*res, res=res, bg="white")
> opar <- par(mfrow=c(1,2))
> plot(tsN$Y, pch=19, col = myClustColor, main="Noisy, Continuous Data")
> plot(tsM$Y, pch=19, col = myClustColor, main="Mixed-Type Data")
> par(opar)
> dev.off()


Figure 3. t-SNE plots.

Colors are the same as in Figure 2, and indicate the true cluster membership created by the simulation.

Large scale simulation

Based on the literature results referenced above, we constructed simulation parameters to represent common problems in clinical data (Table 1). These included a range of feature sizes (9–243), patient sizes (200–3200), and numbers of clusters (2–16). In essence, this simulation was equivalent to evaluating a set of nested for-loops:

> # Not Run
> for (F in featureSize) {
+   cnm <- ClinicalNoiseModel(F, shape = 1.02, scale = 0.05)
+   for (K in clusterSize) {
+     for (P in patientSize) {
+       mte <- MixedTypeEngine(list(nFeatures = F,                # num features
+                                      nClusters = K,                # num clusters
+                                      isWeighted = TRUE),
+                                cnm,
+                                cutpoints = list(N = 200,
+                                                   pCont = 0.6,
+                                                   pBin = 0.2,
+                                                   pCat = 0.2,
+                                                   pNominal = 0.5))
+       simdata <- rand(mte, P, keepall = TRUE)                   # num patients
+       save(simdata, file = SOMEFILENAME) # for evaluation later
+     }
+   }
+ }

Table 1. Parameters for Umpire simulations to assess clustering algorithms for clinical, mixed-type data.

Parameters were chosen to reflect data set sizes from a Phase II clinical trial to a large EHR data set or cohort.

Patients: 200, 800, 3200
Features: 9, 27, 81, 243
Clusters: 2, 6, 16
Data types and mixtures:
   Single data type: continuous, binary, nominal, ordinal, mixed categorical¹
   Mixtures: balanced, continuous unbalanced, binary unbalanced, categorical unbalanced

¹ A mixture of nominal and ordinal data

One could also vary other parameters, with the most likely candidates being the “cutpoints” parameters that control the fraction of continuous, binary, or categorical features in the data.
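For example, a hypothetical sweep over the fraction of continuous features might look like the sketch below (not run; the remaining mass is split evenly between binary and categorical features, and the other parameters reuse values from the loop above):

> # Not Run
> for (pc in c(0.2, 0.5, 0.8)) {
+   mte <- MixedTypeEngine(list(nFeatures = 81, nClusters = 6, isWeighted = TRUE),
+                          ClinicalNoiseModel(81, shape = 1.02, scale = 0.05),
+                          cutpoints = list(N = 200,
+                                           pCont = pc,
+                                           pBin = (1 - pc)/2,
+                                           pCat = (1 - pc)/2,
+                                           pNominal = 0.5))
+   simdata <- rand(mte, 800, keepall = TRUE)
+ }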

The primary benefit of these simulations for assessing clustering algorithms is that Umpire generates data with known, gold-standard cluster assignments. Using the simulation parameters in Table 1, we examined hierarchical clustering (HC) with Euclidean distance, a method commonly found in the literature5,14–16. We compared HC to partitioning around medoids (PAM) and self-organizing maps (SOM)17,18, and we compared Euclidean distance to the mixed-type distance measure, DAISY (Table 2). We assessed the accuracy and quality of each clustering solution against a known ground truth using the Adjusted Rand Index (ARI)19. Summarized results are shown in Figure 4.

Table 2. Comparison of 6 clustering methods using Umpire simulations.

Three algorithms (hierarchical clustering with Ward’s criterion, Partitioning Around Medoids, and Self-Organizing Maps) were implemented with a single-distance metric (Euclidean distance) and a mixed-type metric, and tested on single- and mixed-type simulations generated with Umpire.

Algorithm                            Single-Distance Metric    Mixed-Type Metric
Hierarchical clustering              Euclidean                 DAISY
Partitioning Around Medoids (PAM)    Euclidean                 DAISY
Self-organizing maps (SOM)           Euclidean                 Supersom

Figure 4. Adjusted Rand Index for 6 clustering methods applied to two distinct data mixtures.

DAISY outperformed all other methods on both balanced (top) and unbalanced continuous (bottom) data mixtures. Algorithm performance varied between data mixtures, with improved performance of hierarchical clustering (HC) and self-organizing maps (SOM) with Euclidean distance on unbalanced continuous data.

Use case 2: Simulating survival in phase II clinical trials

Time to response or adverse event is a core clinical question in clinical trials of pharmaceutical and device interventions. Here, we use Umpire to simulate time-to-event data for clinical trials to inform study design or methods development.

Survival model

We begin by customizing a SurvivalModel, which in this case will simulate a trial with 5 years of patient accrual and 1 year of follow up. The user may customize length and units of follow up, as well as the base hazard rate. (Internally, Umpire uses this hazard rate in an exponential survival function.)

> library(Umpire)
> set.seed(83552) # for reproducibility
> sm <- SurvivalModel(baseHazard = 1/5,
+                       accrual = 5,
+                       followUp = 1,
+                       units = 12,
+                       unitName = "months")

Here, we illustrate the impact of altering the base hazard on the simulated mortality rate. We simulate three different survival models, using the default values for accrual and follow up (Figure 5).

> library(survival)
> set.seed(12345)
> sm3 <- SurvivalModel(baseHazard = 1/3)
> dat3 <- rand(sm3, 200)
> sm5 <- SurvivalModel(baseHazard = 1/5)
> dat5 <- rand(sm5, 200)
> sm8 <- SurvivalModel(baseHazard = 1/8)
> dat8 <- rand(sm8, 200)

> png(filename = "hazards.png", width=10*res, height=4*res, res= res, bg="white")
> opar <- par(pty="s", mfrow=c(1,3))
> plot(survfit(Surv(LFU, Event) ~ 1, data=dat8),
+       main="Base Hazard = 1/8")
> plot(survfit(Surv(LFU, Event) ~ 1, data=dat5),
+       main="Base Hazard = 1/5")
> plot(survfit(Surv(LFU, Event) ~ 1, data=dat3),
+       main="Base Hazard = 1/3")
> par(opar)
> dev.off()


Figure 5. Effect of base hazard on survival.

Clinical engine

The survival model is an argument to the constructor for a ClinicalEngine; if omitted, a default survival model is used. Here we explicitly use our survival model to construct a clinical engine with four balanced subtypes.

> set.seed(64321)
> ce <- ClinicalEngine(nFeatures = 40,     # clinical variables.
+                        nClusters = 4,      # subtypes,
+                        isWeighted = FALSE, # designed to be the same size.
+                        survivalModel = sm) # outcomes.

Because the latent variables affect the survival outcomes (by changing the hazard ratio) in the four patient clusters, you can generate different clinical engines from the same underlying parameters and obtain cohorts with different survival patterns (Figure 6). Note that we create different clinical engines to select different latent variables and create different populations. Using the rand function multiple times with the same engine would generate different samples from the same population, which would be useful for creating separate training and test data sets.

> set.seed(11111)
> ce1 <- ClinicalEngine(nFeatures = 40, nClusters = 4, isWeighted = FALSE,
+                         survivalModel = sm)
> cdat1 <- rand(ce1, 100)
>
> ce2 <- ClinicalEngine(nFeatures = 40, nClusters = 4, isWeighted = FALSE,
+                         survivalModel = sm)
> cdat2 <- rand(ce2, 100)
>
> ce3 <- ClinicalEngine(nFeatures = 40, nClusters = 4, isWeighted = FALSE,
+                         survivalModel = sm)
> cdat3 <- rand(ce3, 100)

> png(filename = "varce.png", height=4*res, width=10*res, res=res, bg="white")
> opar <- par(pty="s", mfrow=c(1,3))
> plot(survfit(Surv(LFU, Event) ~ CancerSubType, data=cdat1$clinical),
+       main="Simulation 1", col = dk, lwd = 2)
> plot(survfit(Surv(LFU, Event) ~ CancerSubType, data=cdat2$clinical),
+       main="Simulation 2", col = dk, lwd=2)
> plot(survfit(Surv(LFU, Event) ~ CancerSubType, data=cdat3$clinical),
+       main="Simulation 3", col = dk, lwd = 2)
> par(opar)
> dev.off()


Figure 6. Cohort survival in different simulations.
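By contrast, reusing a single engine draws independent samples from the same population; a minimal sketch of the training/test pattern mentioned above:

> train <- rand(ce1, 200)   # training cohort
> test  <- rand(ce1, 100)   # independent test cohort from the same population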

Clinical trials

It is important to realize that the subtypes generated as part of a clinical engine or a mixed-type engine are unlikely to represent the arms of an actual clinical trial. They are, after all, based on patterns of latent variables that, by definition, would be unobserved by the team running the clinical trial. One might want to view the simulations as a single-arm trial, where different unknown subgroups of patients respond to the therapy differently, and the goal is to use the covariates to identify a subset of patients who respond. In that case, the ability of Umpire to generate another data set from the same mixed-type engine could be used to provide independent validation of the method.

A sensible approach might be to simulate a two-arm clinical trial where one arm receives a placebo (or the current standard-of-care), while the second arm receives a new (or additional) therapy. Again, one possible goal is to identify the subset of patients in the experimental arm with better response. We can achieve this in Umpire by adding a control group.

> cec <- addControl(ce)
> summary(cec)

A ’CancerEngine’ using the cancer model:
--------------
Clinical Simulation Model (Raw) plus control, a CancerModel object constructed via:
   CancerModel(name = "Clinical Simulation Model (Raw)", nPossible = NP,
    nPattern = nClusters, HIT = hitfn, SURV = SURV, OUT = OUT,
    survivalModel = survivalModel, prevalence = Prevalence(isWeighted,
    nClusters))

Pattern prevalences:
[1] 0.5000000 0.1197993 0.1096268 0.1340680 0.1365059

Survival effects:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.13793 -0.03038  0.03760  0.07660  0.18176  0.50587

Outcome effects:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.284222 -0.228845 -0.040243  0.001858  0.129494  0.616201
--------------

Base expression given by:
An Engine with 40 components.

Altered expression given by:
An Engine with 40 components.

The control arm is now subtype 1, and the experimental arm is given by the collection of all other (heterogeneous) subtypes.

We note here that one of the default parameters to the CancerModel constructor used inside a clinical engine defines the distribution of the beta parameters in a Cox proportional hazards model. By default, these are chosen from a normal distribution with mean 0 and standard deviation 0.3. As a consequence, each latent variable is equally likely to make the hazard ratio worse or better. For purposes of illustration, we are going to cheat and adjust the beta parameters to bias them toward an improved outcome in the experimental group:

> beta <- cec@cm@survivalBeta
> if (sum(beta) > 0) cec@cm@survivalBeta <- -beta

Of course, the better way to accomplish this goal would have been to set that parameter when we constructed the ClinicalEngine originally, to something like SURV = function(n) rnorm(n, 0.2, 0.3).
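Concretely, that construction might look like the sketch below (not run; this assumes the ClinicalEngine function forwards a SURV argument to the underlying CancerModel, as the constructor summary above suggests):

> # Not Run
> ceBiased <- ClinicalEngine(nFeatures = 40, nClusters = 4, isWeighted = FALSE,
+                            survivalModel = sm,
+                            SURV = function(n) rnorm(n, 0.2, 0.3))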

Here is an example of a simulated trial.

> cnm <- ClinicalNoiseModel(40, shape = 1.02, scale = 0.05)
> mte <- MixedTypeEngine(cec,
+                           cnm,
+                           cutpoints = list(N = 200,
+                                              pCont = 0.6,
+                                              pBin = 0.2,
+                                              pCat = 0.2,
+                                              pNominal = 1,
+                                              range = c(4,5)))
> trialData <- rand(mte, 200)
> # make a factor for trial arm
> isExp <- 1 + 1*(trialData$clinical$CancerSubType > 1)
> trialData$clinical$Arm <- factor(c("Control", "Experimental")[isExp])
> summary(trialData$clinical)

CancerSubType  Outcome         LFU          Event                   Arm
Min.   :1.00   Bad :101   Min.   : 0.00   Mode :logical   Control     :104
1st Qu.:1.00   Good: 99   1st Qu.:14.00   FALSE:101       Experimental: 96
Median :1.00              Median :23.00   TRUE :99
Mean   :2.25              Mean   :27.53
3rd Qu.:4.00              3rd Qu.:40.25
Max.   :5.00              Max.   :70.00

Now we compute the Cox proportional hazards model that we would see from the two-arm trial.

> fit <- coxph(Surv(LFU, Event) ~ Arm, data = trialData$clinical)
> summary(fit)

Call:
coxph(formula = Surv(LFU, Event) ~ Arm, data = trialData$clinical)

  n= 200, number of events= 99
                   coef exp(coef) se(coef)      z Pr(>|z|)
ArmExperimental -0.5974    0.5502   0.2075 -2.879  0.00399 **
---
Signif. codes:  0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

                exp(coef) exp(-coef) lower .95 upper .95
ArmExperimental    0.5502      1.817    0.3664    0.8264

Concordance= 0.57  (se = 0.027 )
Likelihood ratio test= 8.56  on 1 df,   p=0.003
Wald test            = 8.29  on 1 df,   p=0.004
Score (logrank) test = 8.53  on 1 df,   p=0.003

Thanks in part to the bias we built into the simulation, the experimental group does significantly better than the control group. We can also fit the model that would be obtained if the trial designers were omniscient and could see the subtypes defined by the latent variables.

> latentfit <- coxph(Surv(LFU, Event) ~ factor(CancerSubType),
+                      data = trialData$clinical)
> latentfit

Call:
coxph(formula = Surv(LFU, Event) ~ factor(CancerSubType), data = trialData$clinical)

                           coef exp(coef) se(coef)      z      p
factor(CancerSubType)2  0.08847   1.09250  0.30651  0.289 0.7729
factor(CancerSubType)3 -0.62841   0.53344  0.37710 -1.666 0.0956
factor(CancerSubType)4 -0.91522   0.40043  0.35797 -2.557 0.0106
factor(CancerSubType)5 -0.90929   0.40281  0.37737 -2.410 0.0160

Likelihood ratio test=15.31  on 4 df, p=0.004102
n= 200, number of events= 99

Here we see that one of the four subtypes is equivalent to the control group, while the other three subtypes appear to do better. Finally, we plot the resulting Kaplan-Meier curves for both models (Figure 7).

> png(filename = "fitplots.png", width=10*res, height=5*res, res=res,bg="white")
> opar <- par(mfrow =c(1,2))
> plot(survfit(Surv(LFU, Event) ~ Arm, data = trialData$clinical),
+      col=dk[c(1, 6)], lwd=2)
> legend("bottomleft", levels(trialData$clinical$Arm), lwd=2, col = dk[c(1, 6)])
>
> plot(survfit(Surv(LFU, Event) ~ factor(CancerSubType), data = trialData$clinical),
+      col =dk, lwd=2)
> legend("bottomleft", legend = 1:5, lwd=2, col = dk)
> par(opar)
> dev.off()


Figure 7. Kaplan-Meier plots for a simulated two-arm trial (left) and for the hidden latent subtypes (right).

As in the first use case, you can run a set of nested loops to vary the parameters of interest. As noted previously, one possible application would be to test algorithms for finding clinical variables that define patient subgroups with better (or worse) responses than the control group.

Use case 3: Epidemiological cohort studies, mixed data sources, and binary outcomes

Large epidemiological cohorts are a foundational data type in public health research. Here, we simulate an extensive patient cohort and assess for a binary outcome.

Epidemiological cohorts may aggregate data from multiple data collection instruments, possibly including chart review, laboratory data, and surveys. Here, we generate mixed-type data consisting of continuous laboratory data gathered at the time of study entry and an extensive survey, which contains both nominal and ordinal (Likert scale) categorical responses. We construct a ClinicalEngine with a large feature space and 6 latent clusters of unequal size, taking the default noise and survival models. We generate data for 4,000 patients.

> set.seed(1000)
> ce <- ClinicalEngine(nFeatures = 300,
+                        nClusters = 6,
+                        isWeighted = TRUE,
+                        OUT = function(n) rnorm(n, -0.1, 0.3)) # bias favorable
> cnm <- ClinicalNoiseModel(300)
> mixed <- MixedTypeEngine(ce, cnm,
+                             list(N=200, pCont = .3, pCat = .7, pBin=0,
+                                  pNominal = .5, range = c(3,7)))
> table(getDataTypes(mixed))

continuous    nominal    ordinal
        77         97        123

> simdata <- rand(mixed, 4000)
> sapply(simdata, dim)

     binned clinical
[1,]   4000     4000
[2,]    297        4

> summary(simdata$clinical)

CancerSubType   Outcome          LFU          Event
Min.   :1.000   Bad :1233   Min.   : 0.00   Mode :logical
1st Qu.:2.000   Good:2767   1st Qu.:14.00   FALSE:2095
Median :4.000               Median :25.00   TRUE :1905
Mean   :3.762               Mean   :28.31
3rd Qu.:5.000               3rd Qu.:42.00
Max.   :6.000               Max.   :71.00

Now we are going to fit univariate models to determine which features are associated with the binary outcome. For continuous variables, we perform a t-test comparing the values in the two outcome groups. For discrete variables (binary or categorical), we perform a chi-squared test.

> DT <- getDataTypes(mixed) 
> results <- data.frame(Test=NA, Statistic = NA, PValue = NA)[-1,]
> for (J in 1:ncol(simdata$binned)) {
+   if (DT[J] == "continuous") {
+     test <- t.test(simdata$binned[simdata$clinical$Outcome == "Bad", J],
+                      simdata$binned[simdata$clinical$Outcome == "Good", J]) 
+     results <- rbind(results,
+                        data.frame(Test = "T",
+                                    Statistic = test$statistic,
+                                    PValue = test$p.value)) 
+   } else { # discrete = binary or categorical
+     tab <- table(feature = simdata$binned[, J], 
+                   outcome = simdata$clinical$Outcome)
+     test <- chisq.test(tab) 
+     results <- rbind(results, 
+                        data.frame(Test = "ChiSq", 
+                                    Statistic = test$statistic, 
+                                    PValue = test$p.value))
+     }
+  }
> summary(results)

    Test             Statistic          PValue
Length:297         Min.   :-9.300   Min.   :0.00000
Class :character   1st Qu.: 1.266   1st Qu.:0.00371
Mode  :character   Median : 4.960   Median :0.23660
                   Mean   : 9.104   Mean   :0.32206
                   3rd Qu.: 9.924   3rd Qu.:0.58628
                   Max.   :87.610   Max.   :0.99882

To account for multiple testing, we fit a beta-uniform-mixture (BUM) model to estimate the false discovery rate (FDR)20. We show the results by overlaying the fitted model on a histogram of p-values (Figure 8).

> suppressMessages( library(ClassComparison) )
> bum <- Bum(results$PValue)

> png(filename = "fig8-bum.png", width=5*res, height=4*res, res=res, bg="white")
> hist(bum)
> dev.off()


Figure 8. Histogram of p-values with overlaid BUM model.

There is clear evidence of an enrichment of small p-values indicating features that are associated with the clinical outcome in univariate models. We can determine the number of significant features and the nominal p-value cutoff associated with any given FDR.

> countSignificant(bum, alpha = 0.01, by = "FDR")

[1] 76

> cutoff <- cutoffSignificant(bum, alpha = 0.01, by = "FDR")
> cutoff

[1] 0.004186874

We can also count the number of significant “discoveries” associated with each block of correlated genes, but this requires some spelunking into the depths of the mixed-type engine.

> A <- get("altered", mixed@localenv)
> N <- nComponents(A)
> am <- sapply(A@components, function(x) length(x@mu))
> block <- rep(1:N, times = am)
> table(results$PValue < cutoff, block)

       block
         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  FALSE 10 11 10 11  8  6  6  6  3  8 11  7  4  9 11  6  2  0  8  7 11 11 11 11
  TRUE   1  0  1  0  3  5  5  5  8  3  0  4  7  2  0  5  9 11  3  4  0  0  0  0
       block
        25 26 27
  FALSE 11 11 11
  TRUE   0  0  0

We can compare this table with a heatmap of the hit pattern (Figure 9). Only 20 of the 27 blocks were included as possible hits, and blocks 2, 4, 11, and 15 were unused. The table shows that none of the identified features were included in any of those blocks, suggesting that we made no false discoveries.

> png(filename = "heatpattern2.png", width=6*res, height=6*res, res=res, bg="white")
> heatmap(mixed@cm@hitPattern, scale="none", ColSideColors = dk[1:6],
+          col = c("gray", "black"))
> dev.off()


Figure 9. Association between latent variables and clusters.

Black pixels mark the presence of latent variables within a cluster. Columns are clusters (or subtypes); rows are latent variables (or hits).

Discussion

The Umpire R-package provides a series of tools to simulate complex, correlated, heterogeneous data for methods development and testing on omics and clinical data. The Umpire 2.0 package version described here provides an easy, user-friendly pipeline to generate clinically realistic, mixed-type data to interrogate analytic problems in clinical data. Alongside data sets with meaningful noise and complex feature interrelationships, Umpire simulates subgroup or cluster identities with known ground truth and single- and multi-group dichotomous and survival outcomes. Thus, Umpire facilitates the creation of simulations to explore a variety of methodological problems.

Umpire offers the user a streamlined workflow with ample opportunities for fine-tuning and flexibility. Although this paper describes applications for clinical data, we have previously described Umpire’s tools for simulating omics data8. Furthermore, the modules of the package (e.g., Engines, NoiseModels, and makeDataTypes) may be used interchangeably. Thus, the user may choose to generate omics-scale data of non-continuous type. The user may generate elaborate simulations by varying and increasing parameters (including, but not limited to, subgroup size or number, feature space, sample size, noise, and survival model) to target an inquiry.

In our use cases, we demonstrated the flexibility of Umpire for generating simulations to help evaluate a variety of applications of machine learning to clinical data. These include applications of unsupervised ML to discover subtypes (in Use case 1) and applications of supervised machine learning to find predictive or prognostic factors (in Use cases 2 and 3). The ability of Umpire to evaluate analysis methods is not confined to these use cases. Our use cases illustrating supervised ML did not exploit the fact that, using the parameters saved in a mixed-type engine, Umpire can simulate multiple data sets from the same underlying population, thus providing unlimited test and validation sets. In addition to testing algorithms head-to-head, Umpire can also be used to generate complex simulations to interrogate the “operating characteristics” of an algorithm. For instance, one of the still-unsolved problems in clustering is determining the true number of clusters. A researcher who has developed a new method that claims to solve this problem could simulate mixed-type data with a variety of different cluster numbers, prevalences, feature numbers, and patient sizes to determine which factors influence the accuracy of the method.

We expect Umpire to have wide applicability as a tool for comparing and understanding the behavior of any ML method that has the potential to be applied to clinical data.

Data availability

All data underlying the results are available as part of the article and no additional source data are required.

Software availability

Umpire is freely available at the Comprehensive R Archive Network: https://cran.r-project.org/web/packages/Umpire/index.html.

Source code is available from R-Forge: https://r-forge.r-project.org/R/?group_id=1831.

Archived source code at time of publication: https://cran.r-project.org/web/packages/Umpire/index.html and https://doi.org/10.5281/zenodo.4023106 [21].

License: Apache License, version 2.0.

