ogttMetrics: Data structures and algorithms for oral glucose

We describe an open source software package, , to compute ogttMetrics diverse measures of glucose metabolism derived from oral glucose tolerance tests (OGTTs). Tools are provided to organize, visualize and compare OGTT data from large cohorts. Numerical difficulties in estimation of parameters of the Bergman minimal model are described, and in one large clinical trial, the simpler closed form index of Matsuda is observed to lead to similar rankings of individuals with respect to insulin sensitivity, and similar inferences concerning effects of modifications to carbohydrate content and glycemic index of experimental diets.


Introduction
Disorders of carbohydrate metabolism contribute substantially to overall disease burden throughout the world. According to the International Diabetes Foundation (International Diabetes Federation: IDF Diabetes Atlas, 2015), over 400 million individuals are diabetic, and numbers afflicted con-tinue to rise.
Various tests are used to diagnose diabetes or assess risk of diabetes. In the oral glucose tolerance test (OGTT), a specified quantity of glucose is ingested orally. Plasma concentrations of glucose and insulin are measured at specific times after ingestion. Panels (a) and (b) of Figure 1 illustrate trajectories of glucose and insulin concentrations in a single patient performing the 120 minute protocol.
Methods for administering, analyzing, and clinically interpreting OGTT results are subjects of active research. Concerns with the use of individualized compartmental models for OGTT analysis are discussed in the work of Theodorakis et al. (2017). These authors propose population level nonlinear modeling for estimation of insulin sensitivity, and demonstrate that empiri-cal Bayes procedures have desirable properties for computation and interpretation.
In this report, we describe an open-source software, ogttMetrics, for the management and analysis of OGTT series collected in large cohorts, and illustrate the application of models and metrics to a cross-over clinical trial (Sacks et al. (2014)) of effects of varying glycemic index and carbohydrate content of controlled diets. A widely cited proprietary software tool for fitting compartmental models to OGTT data is SAAM-II (Barrett et al. (1998)). We developed ogttMetrics to allow open investigation into properties of OGTT series, for which the SAAM-II models yield untenable estimates or do not converge. In addition, we saw an opportunity to develop a formal structure for collections of large numbers of OGTT series. We adopted the Bioconductor MultiAssayExperiment structure for this purpose, and introduced methods for interactive visualization and quality assessment of OGTT series, exploiting structures and functions of this package to simplify the coding.

Informal derivation of insulin sensitivity via the minimal model
Following Bergman et al. (1979), let G(t) and I(t) denote timedependent plasma concentrations of glucose and insulin respectively. Various time-dependent factors affect the trajectories of these concentration functions, and we will assume that derivatives of these function with respect to time, and partial derivatives of these functions with respect to relevant time-dependent variables, can be defined. Let Ġ denote the rate of change of glucose concentration in plasma over time. Glucose effectiveness is defined as E = -∂ Ġ/∂G. This is described as "the quantitative enhancement of glucose disappearance due to an increase in the plasma glucose concentration" (Bergman et al., 1979, p. E673). At steady state, insulin sensitivity is S I = ∂ E/∂ I. A four compartment model (model VI of Bergman et al. (1979)) leads to differential equations where X(t) is an abstract time-dependent function representing insulin action, G(t) is the time-dependent function representing glucose concentration, I(t) represents time-dependent insulin concentration, and B 0 represents "glucose balance" (difference between rates of hepatic release to circulation and uptake in peripheral tissue) extrapolated to zero glucose concentration. By the definition of glucose effectiveness, the first differential equation implies E(t) = X(t)p 1 , and, at steady state, X SS = -I SS p 3 /p 2 . This final expression is substituted into the expression for E just obtained, and after formal partial differentiation by I, we obtain S I = -p 3 /p 2 .
A formal specification In the following, • G(t) is the plasma glucose concentration (mg/dl), • I(t) is the plasma insulin concentration (μU/ml), • G b and I b are the baseline values of glucose and insulin, • X(t) represents insulin action on glucose production and disposal (min -1 ), • S I is insulin sensitivity (min -1 /μU · ml -1 ), • p 2 is a rate constant for dynamics of insulin action (min -1 ), • Ra(α, t) denotes a time-dependent function representing appearance of glucose in plasma, with parameters α (mg · min -1 /kg), • V is volume of distribution (dl/kg), and • S G is glucose effectiveness per unit volume (min -1 ).
We consider the specific formalism for the dalla Man et al. minimal model given by Burattini et al. (2006):

Estimation
The procedure of Dalla Man et al. (2002) involves two phases. In the first phase, the system of ordinary differential equations (ODE) above is solved on the basis of provisional settings of unknown parameters. The solution yields pointwise predictions of glucose concentrations Ĝ t with t ranging over the sampling time course of the OGTT. In the second phase, parameters of the ODE system are updated using non-linear least squares. The phases are iterated until the sum of squared discrepancies ∑ t (G t − Ĝ t ) 2 converges to a minimum. Inputs to the algorithm are measured time series of glucose and insulin concentrations, and individual body weight; other quantities, such as glucose effectiveness (S G ) fraction of ingested dose absorbed (FA), and volume of distribution (V) are taken as fixed constants, with values derived from results of other experiments.

Programming considerations
The kernel of fitOneMinMod in the ogttMetrics package is Here the interface to lsoda in the deSolve package is employed (Hindmarsh, 1983;Petzold, 1983). The formal variables Y[1] and Y[2] represent G(t) and X(t) respectively; ra() and Insulin() are specially defined functions that return, for any given time in the course of the OGTT, the rate of glucose appearance, and insulin concentration, respectively. The lsoda solver is invoked in the function mmsolfn, whose inputs a1, a2, a3 are free parameters of a piecewise linear model for Ra(t), the rate of appearance of glucose; input SI is the target quantity of interest, the measure of insulin sensitivity. The values of free parameters are obtained by minimizing the sum of squared differences between observed glucose g and values predicted by the ODE system for current values of the unknown parameters: mmsolfn = function(a1, a2, a3, SI) lsoda(c(Gb, 0), t, model, c(a1 = a1, a2 = a2, a3 = a3, SI = SI)) fit = nls(g ~ mmsolfn ( in which BW is participant body weight, D is the dose of glucose ingested, and FA is the fraction of ingested glucose that is actually absorbed; DC is constant that determines the rate of exponential decay of glucose concentration in plasma past minute 120.
For concreteness, Figure 1 displays all components of a minimal model fitted to a single 120 minute OGTT.

Data management and reporting
In practice, OGTT series can be collected according to different protocols and may include additional biomarkers such as c-peptide concentrations. For flexible data management and analysis, we adopted the data structure of the MultiAssayExperiment package of Bioconductor. We extended this structure in a class called ogttCohort, which includes metadata about timing of concentration measures. Each biomarker series for each individual is stored as a column of an R matrix, with rows and columns coordinated across assays. Arbitrary additional samplelevel information can be linked to assay data. High level functions getMinmodSIs and addMatsuda120 fit the minimal model or compute the Matsuda index for each series, and append results to the data container. Because the minimal model may be time-consuming to fit, support is provided for parallel computation of multiple models. Use of a compact formal representation of all the OGTT data collected on a cohort simplifies creation of generic reports and visualizations. Figure 2 is based on the QCplots function, that can be applied to any ogttCohort instance. The top two panels display aspects of marginal (time-specific) distributions using boxplots. The bottom two panels are views of joint distributions of features and samples using the biplot methodology of Gabriel (1971). Calibrated outlier detection, proceeding under the assumption that the OGTT series are multivariate normal with a common mean vector and unspecified covariance matrix, can be conducted for glucose and insulin series separately, using mvOutliers. The procedure of Caroni & Prescott (1992) is used.  of the minimal model. The display is made with transformed axes (log10 for SI, square root for Matsuda's index). Negative estimates of SI are Winsorized to the smallest positive estimate observed in the data. Positive correlation between the indices is apparent, and the general trend appears to be obeyed for the majority of estimates of SI for which the dalla Man et al., algorithm does not converge.

Application to a cross-over trial
We created the ogttMetrics package to analyze data from the OMNICarb study (Sacks et al. (2014)). This study involved over 150 overweight individuals (BMI > 25kg=m 2 ) whose systolic blood pressure was in the interval 120-159 mmHg, or diastolic blood pressure in the interval 70-90 mmHg. Individuals with diagnoses of diabetes, cardiovascular disease, or chronic kidney disease were excluded. Four experimental diets were designed to provide contrasting values of overall carbohydrate content and glycemic index of foods consumed. Carbohydrate and glycemic index each had two levels denoted C and c (G and g) respectively, leading to the set (CG, Cg, cG, cg) of experimental diets. Each patient received a randomly ordered sequence of diets from this set, consuming each assigned diet for five weeks, with a pause of two weeks between diets. At the end of each feeding period a 120-minute OGTT protocol was administered. As noted previously, attempts to fit the Bergman minimal model with SAAM-II frequently failed to produce acceptable values, and so the study report of effects on insulin sensitivity used Matsuda's index. We have used the ogttMetrics package to structure the data and compute both Matsuda's index and the minimal model SI. The right panel shows results based on SI that are qualitatively similar to those found with Matsuda's index, with the exception of the estimated effect of lowering glycemic index in the context of high overall carbohydrate content. With Matsuda's index, the 95% confidence interval excludes zero, but this is not observed when SI is used. Further work on optimizing estimation of insulin sensitivity from the 120 minute OGTT protocol is warranted; the empirical Bayes approach of Theodorakis et al. (2017) is of particular interest as individual-level estimation in that procedure borrows strength from information assembled for the cohort as a whole.

Installation and operation
Installation of ogttMetrics can be accomplished using R 3.4 via devtools::install_github("vjcitn/ogttMetrics", dependencies=c("Depends", "Imports", "Suggests")). The key infrastructure components required for ogttMetrics are CRAN package deSolve for minimal model estimation, and Bioconductor package MultiAssayExperiment for data management. The SIexplorer utility employs the shiny package. These key components have extensive dependencies among other CRAN packages, but these dependencies are automatically resolved by the install_github() command given above. All example data analyzed or visualized in this paper are accessible using the data() function. For example, to reproduce  Users may import data managed in spreadsheets (CSV format) for use with this software. An executable example is available with example(csvImport).

Conclusions
The reference assay for glucose metabolism is the hyperinsulinemiceuglycemic clamp (Soonthornpun et al. (2003)). Because it is less expensive and much less invasive, the OGTT is an attractive assay for assessing insulin sensitivity, particularly in large studies. We have presented, and made freely available (at http://github.com/vjcitn/ogttMetrics), a collection of data structures and functions in the R programming language that help manage and interpret OGTT series collected in cohort studies and clinical trials.
We and others have found that the minimal model frequently fails to generate reasonable values for SI in OGTT series encountered in practice. In part this is manifested in nonconvergence of the basic nonlinear model for the glucose trajectory. However, we have not observed a striking disparity between rankings of participants using the estimate of SI based on an unsatisfactory minimal model fit, and rankings obtained when the closed form Matsuda index is computed on the same OGTT data (Figure 3, Spearman correlation between Matsuda and estimated SI = 0.5782, p < .0001). The estimated SI may be good enough for practical use, but further investigation of features of OGTT data associated with non-convergence of the minimal model, and biologically motivated elaborations of the model that yield successful fits more generally, should be undertaken.
The tools for multivariate analysis and interactive model visualization in the SIexplorer component of ogttMetrics will be useful for gaining additional insight into subtyping of patients according to features of glucose and insulin trajectories.

Software and data availability
Software and all data analyzed in this paper are available from: http://github.com/vjcitn/ogttMetrics Archived source code as at time of publication: DOI, 10.5281/ zenodo.570174 (Carey, 2017) License: GPL-3

Author contributions
Benjamin Stubbs and Keith Frankston developed software and visualizations, analyzed the data, and participated in manuscript development. Marcel Ramos developed the MultiAssayExperiment package of Bioconductor. Frank M. Sacks and Nancy Laranjo conceived and executed the OMNICarb study created the database from which ogttMetrics data are derived, and participated in manuscript development. Vincent Carey acquired funding for software development, developed software and visualizations, and wrote the manuscript.

Competing interests
No competing interests were disclosed. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

4.
Running the examples provided in the paper produces some errors: > QCplots(obasamp) Error in experiments(oc) : object 'obasamp' not found After fixing the command the first run gives: > QCplots(obaSamp) ... Error in UseMethod("depth") : no applicable method for 'depth' applied to an object of class "NULL" > Oddly enough this works when used later. 5. Additionally, when reading the example from the PDF, the command plot_OGTT_fit contains 'fi' ligature which breaks copy-paste of the command from the PDF. 6. Running "R CMD check" produces notes and a warning, which probably would not be acceptable at the major repositories: * checking for missing documentation entries ... WARNING Undocumented data sets: 'omnicCG_samp' 'omniccG_samp' 'omniccg_samp' All user-level objects in a package should have documentation entries. See chapter 'Writing R documentation files' in the 'Writing R Extensions' manual. 7. These fairly trivial technical issues aside, I am unsure what is the intended audience of the package and how useful it would be for that audience. The authors present a smooth workflow for analysing pre-packaged data from existing large studies, but instructions for importing new data are limited to one sparsely documented example and it is not immediately obvious how to e.g. compute the minimal models for this example. The vignette contains some code snippets that are likely relevant, but more comments and explanation would be needed. I tried a little but could not get this working easily. In general the vignette would need to be clearer to be useful to new users. 8. Related to the above note, the csvImport format should be documented better. The vignette could contain an example with different time points. A hard-coded default of time points seems difficult for something where there probably is no generally applicable default.
9. The minimal model code contains a number of magic constants with some assumed default values. It would be very good to document with proper references where these come from. It is especially unclear where the constant 420 in the integral in Programming considerations comes from and if that can be safely used for data with a different sampling period.
10. The implications of the piece-wise linear model using a different model before and after 120 min for data with different (either shorter or longer) sampling period and times should be discussed. Can the model be safely applied in these cases? Are there other hidden assumptions that could impact the end users?
11. The unit for BMI in "Application to a cross-over trial" is reported incorrectly (25kg=m^2, units incorrectly in italics).