Peccary, a metaprogramming open-source R platform to improve pharmacometrics efficiency [version 1; peer review: 1 approved with reservations]

Background: A pharmacometrics (PMx) workflow usually requires several software tools to cover all the steps from data analysis to model evaluation and simulations. However, these tools do not always communicate well together, compromising the efficiency of the whole process. Highly inspired by the markdown/pandoc system, we developed Peccary, an R package and its dedicated Shiny Application, with the objective of accelerating the use of previously released R packages through various translation and metaprogramming processes. Methods: Peccary was developed with an agile method, progressively aggregating snippets of R code produced during real-life pharmacometrics work. Its first subpackage, PeccAnalysis, can be used to produce and evaluate R code for population description (using the table1 package), plot exploration (ggplot), and non-compartmental analysis (pknca). The second subpackage, PeccaReverse, allows writing a structural model in a minimalist (simplified deSolve) syntax before metaprogramming model simulations (using either deSolve or RxODE) and design evaluation (through PopED), along with performing various model syntax translations (e.g.,


Introduction
Pharmacometrics (PMx) is the discipline describing and quantifying the interactions between xenobiotics and a patient's biological system (human or non-human) using mathematical models, taking into account both beneficial and adverse effects 1. These models are used in various circumstances, such as supporting drug research and development in the pharmaceutical industry 2, especially through the implementation of Model-Informed Drug Discovery and Development (MID3) 3, or in hospitals for therapeutic drug monitoring 4. In addition to the creation of the analysis plan and the pharmacometrics report, the typical operational workflow of pharmacometrics analyses includes: 1) pre-modeling work with data assembly, followed by graphical and statistical exploration; 2) iterative core modeling work with equation writing, parameter estimation, and model evaluation; and 3) final model exploitation, mostly through simulations, computation of secondary parameters, and/or parameter interpretation for the patients, in order to inform decision-making for a new study or individual patient dosing 5,6.
Pharmacometricians routinely use a variety of tools ranging from generic programming software (e.g., R) to specific PMx software (e.g., NONMEM 7, Monolix 8, Phoenix WinNonLin 9, in addition to multiple dedicated R packages). These PMx programs' performance can be assessed in terms of both efficacy and efficiency. Efficacy is the ability of software to produce a desired or intended output, while efficiency represents the resources (time, effort) required to generate that output 10. The efficacy of most PMx software is satisfactory, thanks to the variety of available programs and the flexibility they allow. Efficiency, on the other hand, can be improved, through a simplification of model coding or through better interoperability: switching from one software to another with the same model is standard practice when completing a PMx project, yet most programs use different syntaxes for coding the same elements (e.g., structural equations or model results). This interoperability issue has already been reported, both for the overall workflow 11 and for specific tasks such as optimal design 12.
Several projects have been created in various fields to improve or create interoperability between software, with at least two different paradigms showing successful results. In the first, a new, robust, and general markup language is created, aiming to become the standard of use. This was done for systems biology with the SBML project 13. The Drug Disease Model Resources (DDMoRe) project (https://www.ddmore.foundation) has implemented this methodology in pharmacometrics [14][15][16][17].
The second methodology consists of creating wrappers and translation tools from existing languages. The most successful and widely used tool of that kind is probably pandoc, dealing with document writing 18. Notably, it allows one to automatically produce a LaTeX-based PDF, HTML, or Word document (high efficacy but low efficiency) by simply writing markdown (a low-efficacy but high-efficiency syntax). The addition of a light translating tool between the two made the system both highly effective and efficient. Looking at the markdown / (LaTeX) / PDF workflow, at least four main advantages of this system can be described: 1. Knowledge of LaTeX is not needed to produce a LaTeX-based PDF document (reduced required knowledge).
2. It is easier and faster to first write a markdown file and translate it than to code directly in LaTeX (efficiency improvement at the single-software level).
3. There is no additional work needed to produce an HTML or Word document if those formats are ultimately required (multi-software / interoperability efficiency improvement).
4. If needed, it is possible to extract the intermediary LaTeX file to verify and eventually modify the output (as a result, even an imperfect or incomplete pandoc task often remains useful).
We developed Peccary, a new set of tools (an R console package with its Shiny App 19) that aims to provide the four previously mentioned pandoc benefits in the pharmacometrics field. The goal is thus to gather and create links between pre-existing PMx software through various translating systems. R was judged the most appropriate platform for such a project because 1) it is the most used generic programming software in PMx 11, 2) almost every R package is under a free license, meaning its source code is available and users can freely modify or redistribute it 20,21, and 3) R packages already cover most PMx tasks, such as non-compartmental analysis (e.g., pknca 22), model diagnostics (e.g., Xpose 23, npde 24 or vpc 25), optimal design (e.g., PFIM 26 and PopED 27), simulation (e.g., simulX 28), parameter estimation (e.g., nlmixr 29 or saemix 30), and links to proprietary software (e.g., popKinR to manage NONMEM runs 31 or lixoft-connector to control Monolix 32). Having all these packages reunited in the same platform, such as Peccary, would therefore greatly increase PMx efficiency.

Development process
Software can be developed through various methods. In 1999, Eric Raymond published an influential essay in which he compared software built as a cathedral to software built as a bazaar 33. Briefly, software built as a cathedral is very well thought out and structured before any code integration, whereas bazaar systems imply much less preparation and instead develop organically through immediate code creation and aggregation.
Peccary was developed following the bazaar methodology principles. Concretely, Peccary's development followed the main author's daily work as a pharmacometrician. Whenever a specific need to produce an output was encountered, an immediate solution was coded in R, most of the time (but not always) trying to build a generic function with a metaprogramming system. Peccary is the aggregation of these snippets of R code. This agile methodology allowed us to develop a consistent R package piece by piece, ensuring that each implemented feature is truly needed and tested in real life. Since Peccary's goal is to improve the efficiency of PMx, all attention was given to optimizing the time and number of steps required to produce each output.

Software availability
The source code availability of Peccary and its subpackages is described in the software availability statement.
Peccary (and all its subpackages) can be downloaded and installed directly in R using the code given in the software availability statement. Peccary and its Shiny App have been tested on Windows, macOS, and Linux (Arch distribution).

Overview of the workflow
The current structure of Peccary, depicted in Figure 1, comprises three subpackages. Briefly, PeccAnalysis deals with pre-modeling work, PeccaReverse with model building and simulation, and PeccaResult with model diagnostics. Peccary is the main package that gathers these three subpackages under one platform, usable both in the R console and inside a Shiny App. PeccAnalysis allows performing pre-modeling work such as population description, exploratory analysis, and non-compartmental analysis (NCA). PeccaReverse allows creating or importing a model with a minimal syntax, before using it for simulations, design evaluation, or translation into other syntaxes. Finally, PeccaResult standardizes and compares runs from multiple software with various analysis functions. Peccary can be used throughout a whole PMx process (red arrows in Figure 1) or simply as a toolbox for any specific task (e.g., model translation from NONMEM to Monolix, green arrows).
Inside the Shiny App, users can create a Peccary project, which saves in one place the metadata of the datasets (their path, separator, decimal mark, and "NA" strings) and of the various analyses (e.g., exploratory plots or model simulations). This system allows the analyst to easily reload any desired dataset when reopening the Shiny App, switch from one dataset to another with a single action, or reload and later modify any pre-saved data analysis. Of note, Peccary requires the use of tidy data 34, which can subsequently be modified in the Shiny App.
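As an illustration of this project structure, such metadata could be stored as a plain R list and persisted between sessions. The sketch below is an assumed pattern with hypothetical field names (path, sep, dec, na_strings), not Peccary's actual schema:

```r
# Hypothetical sketch of a Peccary-style project object; field names are
# illustrative, not the package's actual internal representation.
project <- list(
  datasets = list(
    pk_data = list(path = "data/pk.csv", sep = ";", dec = ",",
                   na_strings = c(".", "NA"))
  ),
  analyses = list()  # e.g., saved exploratory plots or simulation settings
)

save_project <- function(project, file) saveRDS(project, file)
load_project <- function(file) readRDS(file)

# Round-trip through a temporary file
f <- tempfile(fileext = ".rds")
save_project(project, f)
restored <- load_project(f)
```

Persisting a single serialized list keeps the dataset metadata and the saved analyses together, which is what allows a project to be reloaded in one action.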

Metaprogramming system
Most Peccary functions have been metaprogrammed 35, meaning a Peccary-independent R code/expression is produced before generating the outputs. This system was conceived to make Peccary useful both during the exploratory work phases (trying many things) and during the cleaning/reporting phase (selecting a few key analyses to report). Indeed, most of the work produced in a project is never published, because it consists of intermediate results in a trial-and-error workflow. During this exploratory phase, the Shiny app can directly produce the desired outputs, similarly to a direct markdown-to-PDF shortcut, with fast but unverified results that the user can only informally trust. However, whenever a report integration needs to be made, the corresponding R code can be copy-pasted into an R/Rmd file before being further controlled (quality control) and, if needed, modified. Of note, this dual system can dynamically co-exist: Peccary is especially efficient in a dual-screen set-up with two parallel R sessions, one running the Peccary Shiny app and the other a free R session for direct code transfer and manipulation.
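The underlying idea can be illustrated with a minimal base-R sketch (an assumed pattern, not Peccary's internals): a function first builds a Peccary-independent expression, which can then either be deparsed into copy-pasteable code for a report or evaluated directly, as the Shiny app does.

```r
# Minimal metaprogramming sketch: build an unevaluated expression first,
# then print it (for a report) or evaluate it (for immediate output).
# make_summary_expr() is a hypothetical helper, not a Peccary function.
make_summary_expr <- function(data_name, col) {
  bquote(summary(.(as.name(data_name))[[.(col)]]))
}

expr <- make_summary_expr("mtcars", "mpg")
deparse(expr)        # the Peccary-independent code a user could paste elsewhere
result <- eval(expr) # or evaluate it directly, as the Shiny app would
```

The key property is that the deparsed code no longer depends on the generating function: it can be quality-controlled and modified on its own.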

PeccAnalysis
PeccAnalysis gathers all functions made to analyze the data before any modeling work. This mostly includes three different activities, namely 1) population description, 2) plot exploration, and 3) non-compartmental analysis (NCA).
The theophylline dataset pre-existing in R is used for demonstration purposes, with the addition of the following columns: the dose level (DL) as the rounded real dose, the presence or absence of a side effect (true or false), and the gender (men or women), leading to Table 1:
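This augmentation can be sketched as follows (assumed code: the paper only names the new columns, so the added covariate values below are made up for illustration):

```r
# Assumed reconstruction of the demonstration dataset; SIDE_EFFECT and SEX
# values are randomly generated placeholders, not the paper's actual data.
demo_df <- datasets::Theoph
demo_df$DL <- round(demo_df$Dose)  # dose level: rounded real dose

set.seed(1)  # arbitrary seed for the made-up covariates
n_id <- length(unique(demo_df$Subject))
per_id <- data.frame(
  Subject     = unique(demo_df$Subject),
  SIDE_EFFECT = sample(c(TRUE, FALSE), n_id, replace = TRUE),
  SEX         = sample(c("men", "women"), n_id, replace = TRUE)
)
demo_df <- merge(demo_df, per_id, by = "Subject")
```

Assigning the extra covariates per subject (rather than per row) keeps them constant within each individual, as real covariates would be.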

Population dataset description
The population description generally requires two steps. The first step is to extract the covariate information from the full dataset. Because PK/PD datasets generally contain multiple rows per ID (one for each observation and one for each drug administration), the covariate information must be extracted as a reduced dataset. This has been simplified with a function (pecc_search_cov()) that automatically creates a covariate table from the source dataset and the name of the ID column. Overall, this whole process of population description has been embedded into a single PeccAnalysis function (pecc_table1_original()). In the Shiny App, where knowing the function's name is not needed, the metaprogrammed R code is provided in addition to the corresponding table. In the console, the user can choose the output format: the R code, the final table, or both (through the outputExpr argument). An example of automatic covariate extraction and statistical description, obtained by providing only the source data frame, the name of the column representing the individuals, and an optional covariate name used to create the columns (other than "Overall"), is provided below. Of note, the "..." argument (not used in this example) allows the user to manually choose the covariates to display (instead of printing them all) and to transform numeric covariates into factors, even though the main interest of this function lies in its ability to detect covariates automatically.
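The idea behind the covariate extraction can be illustrated with a simplified re-implementation (function name and exact logic are assumptions): a covariate is a column that is constant within each individual, so the dataset can be reduced to one row per ID.

```r
# Illustrative re-implementation of the idea behind pecc_search_cov():
# keep only the columns that are constant within each individual, and
# reduce the dataset to one row per ID. (Sketch only; the real function
# does more, e.g. handling the "Overall" column and factor conversion.)
search_cov_sketch <- function(df, id_col) {
  is_constant <- vapply(
    names(df),
    function(col) all(tapply(df[[col]], df[[id_col]],
                             function(x) length(unique(x)) == 1)),
    logical(1)
  )
  df[!duplicated(df[[id_col]]), names(df)[is_constant], drop = FALSE]
}

# On the theophylline dataset, Wt and Dose are subject-level covariates,
# while Time and conc vary within subjects and are dropped.
cov_tab <- search_cov_sketch(datasets::Theoph, "Subject")
```

This "constant within ID" criterion is what makes automatic detection possible without the user having to list the covariates.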

Non-compartmental analysis (NCA)
Peccary simplifies the use of the pknca package 22 for performing NCA in several ways. As a first step, the function automatically creates the dose dataset, either using an EVID column or assuming dose administration at time 0 (depending on the inputs). Second, it simplifies the computation of the AUC from time 0 to X (X being a numeric argument) by using the pknca "intervals" system. Integrating "intervals" also makes it easier to toggle the computation of additional NCA parameters by modifying the metaprogrammed code (for instance, setting vss.last to TRUE in the following example). Third, it automatically transforms the final pknca results into an organized data frame, directly including all possible covariates (extracted with pecc_search_cov()) for further analysis. Results are shown in Table 4. The inputs, the produced plot, and the metaprogrammed code (both Peccary-dependent and Peccary-independent) are displayed on the same page. In addition, each dataset can store the metadata of each analysis, such as a plot that can be recreated when reopening the Shiny app before eventually being modified again. On the left panel, the user can efficiently switch from one dataset to another (among those stored in a project or inputted since the beginning of the session).
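The AUC "from time 0 to X" requested through pknca's intervals system ultimately relies on a trapezoidal rule; a standalone base-R sketch of the linear-trapezoidal computation (not pknca's own implementation) is:

```r
# Linear-trapezoidal AUC from time 0 to x (illustrative sketch only;
# pknca additionally supports log-linear methods, extrapolation, etc.).
auc_0_to_x <- function(time, conc, x) {
  keep <- time <= x
  t <- time[keep]
  y <- conc[keep]
  sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)
}

auc_0_to_x(time = c(0, 1, 2, 4), conc = c(0, 10, 8, 4), x = 4)  # 5 + 9 + 12 = 26
```

Exposing X as a plain numeric argument is what Peccary's wrapper adds on top of pknca's more general interval specification.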
The Shiny app gathers all the metaprogrammed R code in a single block (NCA generation, table1 generation, and covariate analyses).

PeccaReverse
PeccaReverse is the second Peccary subpackage; its role is to simplify model creation, dynamic exploration, simulation, and translation into other software syntaxes. A simplified deSolve syntax is used as the core language of PeccaReverse. For example, to create a two-compartment PK model with oral drug absorption, one can simply write the structural model as follows:

model_demo <- "dGut <- -ka * Gut
dCentral <- ka * Gut + Periph * k21 - Central * k12 - ke * Central
dPeriph <- Central * k12 - Periph * k21
conc_output <- Central / vd"

Peccary also contains a PK/PD library to avoid writing frequently used models from scratch. The deSolve_pecc() function then automatically extracts all the model compartments, parameters, and initial states (if provided) from the equations, and also reconstitutes both the deSolve and RxODE expressions. Briefly, deSolve_pecc() works by analyzing the text through regular expression manipulations (grep, grepl, gsub, ...). Everything not recognized as a compartment (here Gut, Central, and Periph, recognized by the initial "d" plus some other conditions distinguishing them from variables that merely start with "d"), a computed variable (every word before "<-", here conc), or a reserved word (such as log, exp, if, else, ...) is considered a parameter (here ka, k12, k21, ke, and vd). Adding "_output" or "_plot" (automatically removed from the produced deSolve/RxODE code) at the end of a variable name means, respectively, that the variable is an output of the model or that it needs to be reported after simulation for plotting.
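The parsing principle described above can be sketched with a much-simplified re-implementation (deSolve_pecc() itself handles many more cases; here compartments are naively recognized as left-hand sides starting with "d" followed by an uppercase letter):

```r
# Simplified sketch of the regex-based model parsing described in the text.
# parse_model_sketch() is illustrative, not deSolve_pecc() itself.
parse_model_sketch <- function(model_txt) {
  lines <- trimws(strsplit(model_txt, "\n")[[1]])
  lhs <- trimws(sub("<-.*", "", lines))                # words before "<-"
  is_cmt <- grepl("^d[A-Z]", lhs)                      # naive compartment rule
  cmts <- sub("^d", "", lhs[is_cmt])                   # dGut -> Gut, ...
  vars <- sub("_(output|plot)$", "", lhs[!is_cmt])     # conc_output -> conc
  words <- unlist(regmatches(model_txt,
                             gregexpr("[A-Za-z_][A-Za-z0-9_]*", model_txt)))
  words <- sub("_(output|plot)$", "", words)
  reserved <- c("log", "exp", "if", "else")
  params <- setdiff(unique(words),
                    c(paste0("d", cmts), cmts, vars, reserved))
  list(compartments = cmts, variables = vars, parameters = params)
}

model_demo <- "dGut <- -ka * Gut
dCentral <- ka * Gut + Periph * k21 - Central * k12 - ke * Central
dPeriph <- Central * k12 - Periph * k21
conc_output <- Central / vd"

res <- parse_model_sketch(model_demo)
```

Everything that is neither a compartment, a computed variable, nor a reserved word falls through to the parameter list, mirroring the rule stated above.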
Within the Shiny app, once the structural model has been analyzed (through an action button), various input tables are automatically created and pre-filled, as depicted in Figure 6. Filling the missing parts of these tables (described further below) allows users to perform most PeccaReverse tasks, namely 1) simulations, 2) design evaluation, and 3) various syntax translations. The following sections describe the main console functions internally invoked while using the Shiny App. It is important to note that this system is much less efficient in console mode, because it requires writing the input tables from scratch and knowing how to use the internal functions (compared to a more intuitive and dynamic use in the Shiny app).

Simulations
Performing simulations and plots is divided into three steps: 1) programming the parameter sets, 2) programming the simulations, and 3) programming the plots. For each step, several methods exist, leading to multiple possible combinations. The metaprogrammed R code is incremented after each of the three steps to produce the final R code.
The first step consists of programming the sets of parameter values to simulate. It requires at least one table containing the population values, the distributions (normal, log-normal), and the information on whether each parameter is estimated or fixed. Importantly, several values can be attributed to a parameter to automatically perform a local sensitivity analysis. The user can choose between simulating those exact values or values sampled from a distribution; a variance/covariance matrix (coded as standard deviations or variances) must be inputted in the latter case. Finally, a function generates the code for sampling the parameter sets; the following example asks for 100 parameter sets for each of three possible k21 values:

# Parameter value table
param_df <- tribble(~Param, ~Value, ~Distrib, ~E,
                    "ka",  0.3,            "logN", "Esti",
                    "ke",  0.1,            "logN", "Esti",
                    "k12", 0.3,            "logN", "Esti",
                    "k21", c(0, 0.1, 0.3), "logN", "Esti",
                    "vd",  5,              "Norm", "Esti")

The second step consists of programming the simulations using the previously coded sets of parameter values. Two additional tables are needed. First, a table containing the initial states of the compartments is required. Initial states can be inputted directly as numeric values or as formulas involving parameters (e.g., "kin/kout"); initial states mentioned directly in the structural equations are automatically included in the pre-filled table. Second, a table must contain all the information regarding one or several administration protocols (times of administration, amount, target compartment, ...). Each protocol can contain one or several administrations. Of note, the time column can include any R code (text) that produces an atomic vector after evaluation (e.g., "0:10", "seq(0,240,12)", ...), allowing several administrations to be created in a single line.
Tlag, bioavailability, and infusion information are also set up in this table (in the Shiny app, new parameters corresponding to Tlag or bioavailability are automatically added to the parameter values table).
The simulations can then be performed using the deSolve package or the faster and more PMx-oriented RxODE package. Every possible combination of parameter values and protocols is simulated.
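For illustration, the demonstration model can be simulated in plain base R with a fixed-step Euler scheme (Peccary instead metaprograms calls to deSolve or RxODE; the parameter values below are assumptions for the sketch):

```r
# Fixed-step Euler sketch of the demonstration model (illustrative only;
# Peccary generates deSolve/RxODE code instead, with proper ODE solvers).
deriv <- function(state, pars) {
  with(as.list(c(state, pars)), c(
    Gut     = -ka * Gut,
    Central = ka * Gut + Periph * k21 - Central * k12 - ke * Central,
    Periph  = Central * k12 - Periph * k21
  ))
}

sim_model <- function(pars, dose = 100, t_end = 24, dt = 0.01) {
  times <- seq(0, t_end, by = dt)
  state <- c(Gut = dose, Central = 0, Periph = 0)
  conc <- numeric(length(times))  # conc_output = Central / vd
  for (i in seq_along(times)[-1]) {
    state <- state + dt * deriv(state, pars)
    conc[i] <- state[["Central"]] / pars[["vd"]]
  }
  data.frame(time = times, conc = conc)
}

pars <- c(ka = 0.3, ke = 0.1, k12 = 0.3, k21 = 0.1, vd = 5)  # assumed values
sim <- sim_model(pars)
```

The concentration rises during absorption and then declines, the profile one would expect from a single oral dose in this model.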
Regardless of the simulation software used (deSolve, RxODE), the simulated values can be plotted with two additional pieces of information: 1) a table specifying which outputs the user wants to display and whether observation points should be added (with various filtering systems); 2) a table containing, for each plot (several can be produced simultaneously), additional information such as the scales to use (logarithmic or continuous) or the wrapping system. It is possible to wrap the plot by output, protocol, set of parameter values, a mix of them, or none. In the following example, we wrapped the plot both by the set of parameter values and by protocol, every other characteristic (here the outputs) being distinguished by different colors (Figure 7).

# Y logarithmic scale and grid "Parameters vs Events" ("P|E")
Finally, Peccary can import and translate NONMEM, Monolix, or ADAPT files into the simplified deSolve syntax. If the user provides a path to an estimated Monolix or NONMEM run, Peccary can extract the observation values, the estimated parameter values, and the study design for each individual (along with the estimated typical parameters). Only the simplest study designs can be extracted for now; more complex inputs (e.g., NONMEM's "SS" or "II" columns) are not covered yet and would require additional programming work. If the extraction is successful, all Peccary tables are automatically filled, with the possibility to switch between the population values and any individual's set of values at any time. The main author used this in particular to first recreate the same individual profiles as estimated by the parameter estimation software, before trying various modifications to explore what could improve model performance (new structural equations and/or parameter values).

Translation into other syntaxes
The model can also be translated into various other syntaxes such as Monolix, NONMEM, ADAPT, and nlmixr. All the corresponding translation functions are based on regular expression manipulation (mostly using the gsub function) and take into account the information included in the previous tables (e.g., parameter distributions, initial parameter values, ...). A final table can also specify the outputs (with the associated YTYPE and error model). For example, the following code translates our demonstration model into a Monolix model file. A function for creating the corresponding mlxtran file (to be used directly through R with the Lixoft connector 32) exists and works for the simplest models (e.g., one or several YTYPEs with normal or log-normal variabilities, but without covariates); however, it was in practice judged not efficient enough compared to simply copy/pasting the structural model code directly into the Monolix suite, and was thus not developed further.
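A toy version of such a gsub-based translation layer, turning one line of the simplified deSolve syntax into Monolix's ddt_ notation, is shown below (illustrative only; the real translators include many more rules, e.g. to avoid mangling variables that merely start with "d"):

```r
# Toy gsub-based translation of one line of the simplified deSolve syntax
# into Monolix's ddt_ notation. to_monolix_line() is illustrative, not a
# Peccary function.
to_monolix_line <- function(line) {
  line <- gsub("^\\s*d([A-Z]\\w*)\\s*<-", "ddt_\\1 =", line)  # dGut <- ... -> ddt_Gut = ...
  gsub("<-", "=", line)                                       # remaining plain assignments
}

to_monolix_line("dGut <- -ka * Gut")     # "ddt_Gut = -ka * Gut"
to_monolix_line("conc <- Central / vd")  # "conc = Central / vd"
```

Chaining a handful of such substitutions, driven by the information in the input tables, is the essence of the translation functions described above.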
For NONMEM, minor manual modifications would be required after the translation, for instance if the user wants to change the ADVAN subroutine or the estimation method. Parameters can be coded with or without MU referencing, and a block handling data below the level of quantification can be automatically added.

PeccaResult
PeccaResult standardizes and compares estimation results from multiple software, a task also addressed by packages such as Xpose 23. It was designed to be used only during the modeling process (the trial-and-error phase), and not to be integrated into a final report: its integration into a traceable workflow is currently not available.
The main idea of PeccaResult is thus to use various functions to standardize the output of each software and store the data in a unique structure, namely a "run" S4 object 38. This object essentially contains a list of tables: one for the estimates of the population parameters, one for the individual parameters obtained through post hoc analysis, one for the individual and population predictions, etc. Given the path to a run (a NONMEM or ADAPT result file, a Monolix output folder, ...), a function automatically recognizes the software and performs this run standardization. Each software requires specific steps to extract the available information; more details are provided in the corresponding source code on GitHub. Each run can be created from scratch or through a second S4 object, called a "folder", which contains several runs. Concretely, the user calls a specific closure function 39 that takes the path of any folder. The first time a run is requested, its creation (which takes a few seconds) is directly followed by its storage inside the "folder" object; the next time this run is requested, it is not re-created.
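The caching behavior described above can be sketched with a standard R memoizing closure (all names below, including the constructor and the run representation, are hypothetical):

```r
# Memoizing closure sketch: the first request builds the run object and
# stores it; later requests reuse the cached copy. make_folder() and the
# run representation are hypothetical, not Peccary's actual API.
make_folder <- function(path) {
  cache <- list()
  function(run_id) {
    key <- as.character(run_id)
    if (is.null(cache[[key]])) {
      # stands in for the expensive run standardization (file parsing, S4 build)
      cache[[key]] <<- list(id = run_id, source = file.path(path, run_id))
    }
    cache[[key]]
  }
}

demo <- make_folder("path/to/runs")
run1     <- demo(1)  # built and cached on the first request
run1_bis <- demo(1)  # served from the cache afterwards
```

Because `cache` lives in the closure's enclosing environment, the stored runs persist across calls without polluting the global workspace.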
Various functions have been created to analyze either a single run (if a "run" object is passed as an argument) or to compare runs (if a "folder" is inputted), for instance:

results(demo(1))       # return the results of run number 1
results(demo(c(1,3)))  # compare the results of runs 1 and 3

The results() function provides the main results of a model, namely the criteria scores (objective function, AIC, BIC) and the parameter estimates with their precision and, if provided, shrinkage. If several run IDs are entered, a full join is automatically created to compare the run values (Figure 9B). An individual prediction (IPRED) function plots IPRED and PRED vs. time for each individual and each YTYPE; if several run IDs are inputted, the predictions are plotted in the same graph with different colors (Figure 9C). The goodness-of-fit plots can be stratified by covariate, and users can specifically compare any subplot between the desired runs (Figure 9D). The same system applies to random-effect analysis (variance-covariance plots) and covariate analysis (Figure 9E). The vpc 25 and coveffectsplot 40 packages have also been implemented in the Shiny app to simplify the production of visual predictive checks and forest plots, respectively. The strength of this system is the universality of these functions: if one creates a new function working with runs, it will directly work with all software (NONMEM, Monolix, ...); reciprocally, if one adds a new software standardization (e.g., for Phoenix WinNonLin), all Peccary functions become directly available for it. The currently available functions cover the most common analyses performed after a run estimation. However, more complex projects often require very specific outputs; users can then manually create functions using the standardized runs and import them into a project. Of final note, some parameter estimation software, for instance NONMEM, allows syntax flexibility, creating "personal ways of coding".
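The run comparison performed by results() amounts to a full join on parameter names, which in base R can be sketched as follows (the estimates are made up for illustration):

```r
# Base-R full join comparing parameter estimates of two runs.
# Column names and values are illustrative, not Peccary's output format.
run1_res <- data.frame(Param = c("ka", "ke", "vd"),
                       Est_run1 = c(0.31, 0.10, 5.2))
run3_res <- data.frame(Param = c("ka", "ke", "vd", "k12"),
                       Est_run3 = c(0.28, 0.11, 5.0, 0.33))

comparison <- merge(run1_res, run3_res, by = "Param", all = TRUE)  # full (outer) join
# k12 is NA for run 1, making newly added parameters easy to spot
```

A full (rather than inner) join ensures that parameters present in only one run still appear in the comparison, flagged by NA on the other side.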
Those syntax variations can make it difficult for PeccaResult to extract all the needed information. Although previous efforts have been made to handle several "personal ways" of coding in NONMEM, more work is required to make these extractions as flexible and adaptive as possible. This challenge, together with the difficulty of metaprogramming this system, explains why PeccaResult should be used with caution and not integrated into a report. It remains, nevertheless, very useful for quick, "informal" run results, visualizations, and run comparisons.

Generality
The use of multiple software tools is essential for pharmacometricians, as no single program can perform all desired analyses; however, this multiplicity creates significant efficiency issues. In this context, Peccary is an R project created with the sole objective of improving the efficiency of pharmacometricians. So far, the most complex model on which Peccary has been used in real conditions (using PeccAnalysis, PeccaReverse, and PeccaResult functions) was a Monolix PK/PD model containing complex if-else structures, multiple covariates, 51 parameters, 18 ODEs, and five observation types fitted simultaneously.
Pandoc legacy
In many respects, Peccary tried to reproduce the markdown/pandoc efficiency in the PMx field. First, Peccary can accelerate the use of various R packages (pknca, RxODE, PopED, table1, ...) by simplifying their syntax, hence reducing 1) the required knowledge and 2) the time needed before launching these analyses. Second, Peccary can create links between the different packages: for instance, the same input tables in PeccaReverse are used for simulations, design evaluation, and syntax translations, improving efficiency at the multi-software (interoperability) level. The Shiny application is fast enough to be used during a live meeting, dynamically producing plots or analyses. Similarly, the Shiny app has educational potential, especially with its ability to easily reload any dataset or model simulation. In a second phase, and thanks to its metaprogramming systems, the Peccary-independent R code of most analyses can be extracted and copy/pasted into a report to be checked and, if needed, modified. It is almost always faster to create code through Peccary than to code it from scratch, even when secondary modifications are needed. While knowing the pre-existing packages integrated into Peccary is not mandatory to produce many outputs, learning them is nevertheless useful to reach the full potential of the system.

Toward a collaborative project
The Peccary project started in mid-2018 as a personal side project supporting the modeling work of the lead author, with no release ambition. After four years of adding features, the current version of Peccary significantly improves the efficiency of PMx work on a daily basis. Consequently, Peccary may be of interest to other pharmacometricians and is now aiming to be released. However, scaling such a project from a single user to multiple people is challenging. The chances of success would be optimized if the project became fully collaborative, in line with bazaar development. Thanks to its R-only structure and high modularity, every user can become a co-developer by checking, improving, or adding new features to Peccary. Since Peccary is frequently updated, each modification or addition of a feature is recorded in the "author section" of the Shiny app (authors listed in alphabetical order). Finally, any part of Peccary can be integrated into any other project without the original authors' authorization.

Fixing bugs
Due to the bazaar development approach of Peccary, there is large heterogeneity in the quality of Peccary functions. Some have been tested, used, improved, and fixed several times until completely stable; others (the most recent or least used ones) are more likely to contain errors or be incomplete. Nevertheless, Peccary was designed to be useful even if imperfect and unvalidated. Despite the integration of several unit tests (e.g., for the peccary_nca() function, more information on GitHub), it remains the user's responsibility to verify the results, as the purpose of Peccary is only to speed up the creation of R code and provide quick access to various results. Users can verify these outputs in two ways: 1) via the metaprogramming system and 2) by comparing the Peccary result with that of any other software producing the same output. For the metaprogrammed functions, computational errors can occur either during the internal Peccary metaprogramming task or during the code evaluation. In the latter case, it is still possible to extract the metaprogrammed code and fix it manually (most of the time requiring little effort and time). Above all, multiplying the number of co-developers would accelerate the detection and correction of these bugs, in accordance with Linus's law: "given enough eyeballs, all bugs are shallow" 33.

Adding new functionalities
There are also many features that need to be added. For example, the current version of Peccary does not support time-to-event data, mixture models, Bayesian analysis, or translation tools for Phoenix WinNonLin. It should be noted that other ODE-based model types (e.g., QSP or PBPK models) can be imported or created inside PeccaReverse. Being able to program and launch parameter estimation runs through an R package (nlmixr, saemix) or external software (NONMEM, Monolix, ...) from within Peccary would also be of great interest, requiring appropriate parallelization and communication with computational servers. Finally, any R package can potentially be metaprogrammed in Peccary to further merge more open-source tools within the same platform.

Conclusion
In summary, Peccary is an R package that increases the daily efficiency of a pharmacometrician. Although Peccary was initially developed as an individual project, it has now grown into a platform that aims to benefit the workflow of all pharmacometricians by integrating the different steps of the PMx process into one tool. However, in line with the bazaar model, Peccary will only reach its full potential through a fully collaborative process.

Peccary source code:
Software is available using the following R code (automatically downloading all the subpackages): devtools::

Author contributions
TD initiated the project, designed the software architecture, implemented Peccary, and wrote the original manuscript. SF supervised the project. SF and XD reviewed and edited the manuscript.