Quantitative AI yields algorithmic RNA biomarkers from a rheumatoid arthritis clinical trial, accurately predicting individual patient responses to anti-TNF treatment, providing a novel approach to companion diagnostic discovery.

Kevin Horgan; Michael F McDermott; Douglas Harrington; Vahan Simonyan; Patrick Lilley

doi:10.12688/f1000research.164015.1

Research Article

[version 1; peer review: 2 approved with reservations]

Kevin Horgan¹, Michael F McDermott^2,3, Douglas Harrington⁴, Vahan Simonyan⁵, Patrick Lilley^1,6

PUBLISHED 07 Jul 2025

Author details

¹ Liquid Biosciences Inc., 26895 Aliso Creek Road B800, California, 92656, USA
² Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, England, LS9 7TF, UK
³ Department of Rheumatic Diseases, Clinical Sciences Institute, University of Galway, Galway, H91 V4AY, Ireland
⁴ Montana Department of Health and Human Services, Helena, MT, 59601, USA
⁵ Department of Biochemistry and Molecular Medicine, The George Washington University School of Medicine and Health Sciences, Washington, District of Columbia, 20052, USA
⁶ Ignite Biomedical Inc., Framingham, MA, 01701, USA

Kevin Horgan
Roles: Conceptualization, Investigation, Methodology, Writing – Original Draft Preparation, Writing – Review & Editing

Michael F McDermott
Roles: Conceptualization, Investigation, Writing – Original Draft Preparation, Writing – Review & Editing

Douglas Harrington
Roles: Conceptualization, Investigation, Writing – Original Draft Preparation, Writing – Review & Editing

Vahan Simonyan
Roles: Conceptualization, Methodology, Visualization, Writing – Review & Editing

Patrick Lilley
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the AI in Medicine and Healthcare collection.

Abstract

Background

The derivation of novel biomarkers from biomedical data that accurately predict individual patients’ responses would be an advance. We hypothesized that quantitative AI based on evolutionary computation, and designed to analyze complex data, could identify algorithmic biomarkers predictive of individual therapeutic responses from baseline data in a clinical trial.

Methods

We analyzed a previously published randomized, placebo-controlled clinical trial in which patients with active rheumatoid arthritis (RA) who were naive to anti-tumor necrosis factor (TNF) therapy were randomized to receive infliximab or placebo. Baseline peripheral blood gene expression data plus the treatment variable (infliximab or placebo), yielding 52,379 variables, were available for 59 patients. The outcome variable for analysis was a decrease in Disease Activity Score-28 (DAS28) score of 1.2. At 14 weeks, 20 of the 30 patients receiving infliximab had responded, and 10 of the 29 patients receiving placebo had responded.

Findings

The AI derived a discovery algorithm, with 4 gene expression variables plus treatment assignment, that predicted responders versus non-responders for all 59 patients, with 100% accuracy. We present the discovery algorithm to enable transparent verification. Excluding the 4 gene expression variables, we then derived similarly accurate predictive algorithms with 4 other gene expression variables. We tested the hypothesis that the software could derive algorithms as predictors of treatment response applying just these 8 discovery gene expression variables to 6 previously published independent datasets. In each validation analysis, the accuracy of the algorithmic predictors surpassed those benchmarks previously reported, using a variety of analytic approaches.

Conclusions

AI based on evolutionary computation summarized a clinical trial, with transparent biomarker algorithms derived from baseline data, correctly predicting the outcome for all patients. The biomarker variables, validated in 6 independent cohorts, are now in development as a clinical test. This approach may expedite the discovery of companion diagnostics.

Keywords

Biomarkers, algorithmic, AI, evolutionary computation, precision medicine, clinical trial, anti-TNF, rheumatoid arthritis

Corresponding author: Kevin Horgan

Competing interests: Kevin Horgan and Patrick Lilley have equity in Liquid Biosciences. All other authors report no financial relationships with any organizations that might have an interest in the submitted work in the previous three years. The reported gene expression variables are included in issued U.S. Patent 9,845,505.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2025 Horgan K et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Horgan K, McDermott MF, Harrington D et al. Quantitative AI yields algorithmic RNA biomarkers from a rheumatoid arthritis clinical trial, accurately predicting individual patient responses to anti-TNF treatment, providing a novel approach to companion diagnostic discovery. [version 1; peer review: 2 approved with reservations]. F1000Research 2025, 14:663 (https://doi.org/10.12688/f1000research.164015.1) First published: 07 Jul 2025, 14:663 (https://doi.org/10.12688/f1000research.164015.1) Latest published: 07 Jul 2025, 14:663 (https://doi.org/10.12688/f1000research.164015.1)

Introduction

The completion of the Human Genome Project in 2003 was accompanied by optimism that a profusion of novel diagnostics and therapeutics would rapidly ensue. Technical innovations and digitization have since produced vast amounts of data without clearly impacting the trajectory of clinical advances.¹ The lack of predictive biomarkers with sufficient accuracy to inform therapeutic decisions for individual patients is an unresolved problem, thwarting the realization of the vision of precision medicine.^2,3 Why has the explosive growth in biomedical data not enabled greater progress, especially with respect to the discovery and development of predictive biomarkers? The typical explanation is the daunting complexity of biology.

Living organisms are complex systems, containing orders of magnitude more information than non-living entities.⁴ Biological functions, including disease and drug responsiveness, result from complex networks of cellular and molecular interactions that can potentially be described mathematically, reflecting mechanisms and constraints under which biological systems operate.^5–9 These interactions are typically nonlinear and multi-dimensional, which complicates their analysis and definition.^10–13 This explains why single-variable biomarkers typically fail to provide clinically useful predictive insight.¹¹

Identifying causal factors from data is an inverse problem. What would an ideal analytic solution to the inverse problem look like when applied to biological complexity to yield actionable insight, including predictive biomarkers? In a complex system, information is present in the relationships between components, as well as in the individual components themselves. An effective analytic solution would reveal the specific molecular networks that underlie biological functions, diseases and therapeutic effects. It would not assess the components in isolation. Instead, it would identify the relevant components in terms of their mathematical relationships with other components, accurately reflecting their frequently nonlinear nature. In essence, the solution would distil essential information into transparent explanatory and predictive quantitative algorithms, which could then be translated into clinical tests. Such an analytic solution could be applied to all forms of biomedical data, including clinical trial data. Randomized clinical trial outcomes are conventionally analyzed using a pre-specified hypothesis and statistical tests based on the average overall response, in order to assess whether treatment is effective at a population level. This approach neglects much of the information in a clinical trial, leaving clinicians and patients without guidance as to which individuals might benefit from the therapy. The ability to summarize clinical trial data as transparent, quantitative, easily validated algorithms predictive of individual patient outcomes would have significant potential for informing both clinical practice and research.

Machine learning is not adequate for this purpose, in part because of its black box solutions and in part because prevalent methods lack a broad palette of mathematical functions to characterize the many interactions present in biological systems.^14,15 We therefore devised a novel analytic solution, built on an evolutionary computation foundation fused with mathematics, the science of complex emergent systems, information theory and its subset, algorithmic compression theory. The software incorporates a comprehensive set of mathematical functions to produce transparent, interpretable predictive algorithms from complex biomedical data, and is specifically designed to model the nonlinear and high-dimensional relationships that define complex emergent systems. The algorithms produced are verifiably accurate mathematical solutions that can predict the outcome of interest.

To test the software, we applied it to a previously published placebo-controlled clinical trial of infliximab in RA, conducted by a pharmaceutical company, for which baseline peripheral blood transcriptomic data were available.^16,17 Infliximab is a monoclonal anti-TNF antibody effective in treating a range of immune-mediated diseases, including RA. The hypothesis was that the software would produce transparent, interpretable discovery algorithms predictive of individual treatment responses that could be independently validated. We selected this trial because it was randomized and placebo controlled, with tens of thousands of RNA data points available for each patient, providing a rigorous test. The software produced a discovery algorithm, composed of biomarker measurements, a clinical variable and mathematical functions, that predicted both infliximab and placebo treatment outcomes with 100% accuracy. The biomarkers were four RNA gene expression variables. The discovery algorithm is shown in different formats to enable independent validation. When those four gene expression variables were excluded from subsequent analyses using exactly the same analytic approach, additional algorithms were derived with four different gene expression variables and the same 100% predictive accuracy.

Having identified the eight discovery gene expression variables, we then hypothesized that applying the software, using only those eight variables, to previously published data from six additional RA studies containing baseline gene expression data would also derive accurate predictors of clinical outcomes following anti-TNF treatment. In each case, using just these eight gene expression variables, the software provided an algorithm more predictive of treatment response than reported in the original publications. This independently validated both the gene expression variables as response predictors and the superior predictivity of our methodology relative to the previously reported approaches, which included different variants of machine learning. The gene expression variables we discovered are now in development as a clinical test, which will need to be validated prospectively in future studies for clinical use.

Methods

Discovery cohort

The discovery cohort was from a published randomized, double-blind, placebo-controlled trial of infliximab in RA patients naïve to biologics therapy following an inadequate response to methotrexate.^16,17 Participants were recruited between April 6, 2011 and March 29, 2012 from three European clinical centers: one in Romania and two in Moldova.¹⁷ The study was conducted in accordance with principles of Good Clinical Practice and was approved by the National Ethics Committee in Romania and the National Ethics Committee, Clinical Research of Drugs and Methods of Treatment in Moldova, with all subjects providing informed written consent.¹⁷

Active disease was defined as at least 6 tender and 6 swollen joints, with a rheumatoid arthritis magnetic resonance imaging score of ≥1 in the radio-carpal or intercarpal joints as objective confirmation of disease activity. The participants were all on stable doses of methotrexate, steroids, and/or non-steroidal anti-inflammatory drugs. At weeks 0, 2, 6, and 14, the participants received either infliximab 3 mg/kg or placebo. The participants had a mean age of 50 years, were predominantly female (92%) and rheumatoid factor positive (91.5%), and had a mean baseline Disease Activity Score-28 (DAS28) score of 6.2. The primary endpoint was a magnetic resonance imaging assessment of disease activity. This endpoint was not used in our analysis, because the imaging data were not available for individual participants. The study had 80% power to yield a significant difference in DAS28 score.¹⁷ The trial evaluated response using the European League against Rheumatism (EULAR) DAS28 score, with response defined as a decrease of 1.2; we used response at 14 weeks (yes or no) as the binary dependent variable for the analysis. Baseline peripheral blood gene expression data were available for 59 patients, plus one additional variable reflecting treatment (infliximab or placebo), resulting in a total of 52,379 potentially independent variables.¹⁶ The participant flow is shown diagrammatically in the original publication, where the study protocol is referenced.¹⁷

Validation cohorts

To independently validate the gene expression variables in the discovery algorithms, we applied these variables to 6 previously published studies with available baseline gene expression data from patients with active RA, treated with anti-TNF therapies.^18–23 These studies were done in different geographies, used different methodologies to process the samples and gene expression data, with variable endpoints, and in some cases, used different anti-TNF therapies.^18–23 Summary details on the discovery and validation studies are provided in Table 1, with details available in the original publications.^18–23 Written informed consent for all participants and local ethics approval was obtained for each of the studies that provided validation data as detailed in the original publications.^18–23

Table 1. Discovery and validation studies of blood-based gene expression data of anti-TNF responsiveness in RA.

First author	MacIsaac et al. (2014)^16,17	Lequerre et al. (2005)¹⁸	Julia et al. (2009)¹⁹	Bienkowska et al. (2009)²⁰	Toonen et al. (2011)²¹	Nakamura et al. (2016)²²	Tanino et al. (2010)²³
GEO accession	GSE58795	GSE3592	GSE12051	GSE15258	GSE33377	GSE78068	GSE20690
Phase	Discovery	Validation	Validation	Validation	Validation	Validation	Validation
Location	Moldova & Romania	France	Spain	USA	Netherlands	Japan	Japan
RA classification criteria	RA ACR (1987)	RA ACR (1987)	RA ACR (1987)	RA (1987)	RA ACR (1987)	RA ACR (1987) or EULAR/ACR (2010)	Not reported
Treatment	infliximab or placebo	infliximab	infliximab	infliximab, adalimumab, and etanercept	infliximab and adalimumab	infliximab	infliximab
RA treatment population	MTX resistant. No prior TNF therapy	MTX resistant. DAS28>=5.1	MTX resistant. DAS28>3.2. No prior TNF therapy	Active RA: No TNF in past 6 months	DMARD resistant. DAS28>3.2. No prior TNF therapy	MTX resistant	MTX resistant
Platform	Rosetta/Merck custom Affymetrix 2.0 microarray	Affymetrix Human Genome U133A Array	Illumina H-6 mRNA Sentrix Human-6 Expression BeadChip	Affymetrix Human Genome U133 Plus 2.0 Array	Affymetrix GeneChip Exon 1.0 ST	Agilent-014850 Whole Human Genome Microarray 4x44K G4112F	Agilent-014850 Whole Human Genome Microarray 4x44K G4112F
Efficacy	14-week EULAR criteria	14-week EULAR criteria	14-week EULAR criteria	14-week EULAR criteria	14-week EULAR criteria	6-month CDAI	14-week serum CRP

Microarray data

The gene expression profiles analyzed were downloaded from the GEO database. Because all the data used were previously published, de-identified and publicly available, neither ethics committee approval nor informed consent was required. Datasets were downloaded and transposed, so that gene expression values and clinical variables moved from rows to columns and subject records moved from columns to rows. Data files were saved in CSV format and imported into the software for analysis, without any pre-processing of the biomarker measures.
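The transposition step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual tooling; the function name `transpose_csv`, the sample identifiers and the expression values are all hypothetical, and the sketch assumes every row has the same number of fields.

```python
import csv
import io


def transpose_csv(text: str) -> str:
    """Swap rows and columns of a rectangular CSV table held in a string."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) turns the i-th field of every row into the i-th output row.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Hypothetical miniature matrix: genes in rows, subjects in columns.
example = "probe,GSM1,GSM2\nKCTD4,7.1,6.8\nUL84,3.2,4.0\n"

# After transposition each subject is a row and each gene is a column.
transposed = transpose_csv(example)
print(transposed)
```

No values are rescaled or filtered, in keeping with the statement that the biomarker measures were imported without pre-processing.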

Analysis

The software is a quantitative analytic platform based on evolutionary computation, designed as a scalable, unbiased methodology to produce transparent algorithms based on mathematical relationships from complex data, without any prior assumptions other than the patient selection criteria and designs of the original studies yielding data for analysis. The software fuses evolutionary principles, signal processing functions, and information theory, and requires no domain expertise or prior knowledge of the nature of a problem in terms of explanatory variables, dimensionality or underlying mathematical relationships. A distinctive feature is that the software uses all available data to derive the algorithms, without any filtering process to exclude variables based on commonly used thresholds or feature selection methods. This enables the identification of variables typically discarded by the feature selection methods used in biomarker discovery. Such methods tend to discard potential explanatory variables with relatively low expression or nonlinear relationships to the outcome, even though these variables may be functionally important because of the nonlinear and binary threshold interactions pervasive in complex biologic systems. The software identifies key variables in the context of their mathematical relationships with each other and their association with outcomes of interest.

The software automatically excludes overfitted algorithms and automatically divides the data into three distinct, random subsets that are processed sequentially: a training set, a selection set, and a test set. Analysis of the training set provides an ensemble of candidate algorithms, which are then evaluated on the selection set to select a final algorithm, which is in turn validated on the test set. An overfitted algorithm would not be validated on the test set, ensuring that overfitted algorithms are not selected. The training, selection, and test subsets are scrupulously segregated to avoid any information leakage between the discrete components of the process. In the discovery analysis of the MacIsaac et al. data, there were 18 patients in the training set, 21 in the selection set and 20 in the test set (Table 2).
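The three-way partition described above can be illustrated with a short sketch. The software's actual partitioning procedure is not described in this article, so the plain seeded shuffle below is an assumption; the function name `three_way_split` and the patient identifiers are hypothetical. Only the subset sizes (18, 21 and 20 of 59 patients) come from the text.

```python
import random


def three_way_split(ids, n_train, n_select, seed=0):
    """Randomly partition ids into disjoint training, selection and test lists.

    The first n_train shuffled ids form the training set, the next n_select
    form the selection set, and everything left over forms the test set.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    select = shuffled[n_train:n_train + n_select]
    test = shuffled[n_train + n_select:]
    return train, select, test


# Hypothetical identifiers for the 59 discovery-cohort patients.
patients = [f"P{i:02d}" for i in range(1, 60)]
train, select, test = three_way_split(patients, n_train=18, n_select=21)
print(len(train), len(select), len(test))
```

Because each patient appears in exactly one subset, a candidate algorithm is never scored on data that was used to build it. This sketch does not balance treatment arms or outcomes across subsets; whether the software does so is not stated.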

Table 2. Discovery algorithm metrics.

	Analysis of data from MacIsaac et al.¹⁶
Training set: 9 placebo, 9 infliximab
Errors/total	0/18
Accuracy	100%
Misclassification rate	0%
Validation set: 11 placebo, 10 infliximab
Errors/total	0/21
Accuracy	100%
Misclassification rate	0%
Test set: 9 placebo, 11 infliximab
Errors/total	0/20
Accuracy	100%
Misclassification rate	0%
Overall Accuracy	100%
Overall Sensitivity	100%
Overall Specificity	100%
Number of gene expression variables	4
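As a consistency check, the entries in Table 2 can be recomputed from the counts reported in this article. The sketch below assumes the standard definitions of sensitivity, specificity and accuracy; the function name `confusion_metrics` is illustrative. Responder counts come from the abstract (20 of 30 on infliximab, 10 of 29 on placebo) and the error counts from Table 2.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),   # responders correctly predicted
        "specificity": tn / (tn + fp),   # non-responders correctly predicted
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts reported in the article.
responders = 20 + 10                     # infliximab + placebo responders
non_responders = (30 - 20) + (29 - 10)   # remaining patients in each arm
subset_sizes = {"training": 18, "selection": 21, "test": 20}
subset_errors = {"training": 0, "selection": 0, "test": 0}

total_patients = sum(subset_sizes.values())
total_errors = sum(subset_errors.values())

# With zero errors, every responder is a true positive and every
# non-responder is a true negative.
metrics = confusion_metrics(tp=responders, fn=0, tn=non_responders, fp=0)
print(total_patients, total_errors, metrics)
```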

The 8 variables in the discovery algorithms were then applied to the analyses of six additional published data sets in patients with RA, using both baseline gene expression data and response outcomes to anti-TNF therapy, for independent validation. The intent was to prospectively validate the 8 variables and also to benchmark the predictivity of the algorithmic biomarkers that incorporated them, relative to prior predictors using traditional analytic approaches, including machine learning.

Results

Discovery analysis: Derivation of MacIsaac et al. algorithm

The software initially yielded an algorithmic biomarker set with five variables and twelve sequential mathematical instructions, as shown in Table 3. Four were gene expression variables expressed quantitatively (SPTY2D1, C1orf105, KCTD4 and UL84), and the fifth was treatment assignment (infliximab or placebo). The algorithm is also presented in the form of an equation in Figure 1a, and in schematic form in Figure 1b. The three different depictions of the algorithm allow transparent validation of the discovery algorithm using the MacIsaac et al. GSE58795 dataset.

Table 3. Discovery algorithm as series of operations.

Instruction	Input registers	Explanation
r[03] = SINCPI(r[03])	r[03] = 19	Calculate the sine of 19, divide that by 19 and push the result (0.00788827419278696) into memory register 03
r[06] = SUB(r[13], r[02])	r[13] = KCTD4, r[02] = 7	Subtract 7 from the measurement of RNA KCTD4, then push the result into register 06
r[04] = ADD(r[15], r[01])	r[15] = Treatment_num, r[01] = 2	Add Treatment_num to 2 and push the result into register 04. Treatment_num 0 is placebo, Treatment_num 1 is infliximab
r[05] = ADD(r[10], r[06])	r[10] = SPTY2D1	Add the value in register 06 to the measurement of RNA SPTY2D1 and push the result into register 05
r[01] = DIV(r[02], r[04])	r[02] = 7	Divide 7 by the value in register 04 and push the result into register 01
r[00] = SINCPI(r[09])	r[09] = UL84	Calculate the sine of the measurement of RNA UL84, divide that by the measurement of RNA UL84, and push the result into register 00
IF(r[00] < r[03])		If the value in register 00 is less than the value in register 03, then execute the following instruction, otherwise skip it
IF(r[11] < r[01])	r[11] = C1orf105	If the measurement of RNA C1orf105 is less than the value in register 01, then execute the following instruction, otherwise skip it
r[00] = ADD(r[04], r[05])		If both of the above IF statements are TRUE, then add the value in register 04 to the value in register 05 and push the result into register 00. If one or both of the above IF statements are false, skip this instruction
r[05] = DIV(r[11], r[00])	r[11] = C1orf105	Divide the measurement of RNA C1orf105 by the value in register 00, and push the result into register 05
r[04] = CONSTDIV(7, 5)		Divide 7 by 5 and push the result into register 04
r[00] = ADD(r[04], r[05])		Add the value in register 04 to the value in register 05, and push the result into register 00
Final interpretation		If the contents of memory register 00 are negative, then patient is predicted to be a non-responder, otherwise a responder
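The register operations in Table 3 translate almost line for line into code. The following is a sketch of that translation, not the authors' software. It follows the Explanation column in treating SINCPI(x) as sin(x)/x, which reproduces the stated constant for x = 19. Returning 1.0 at x = 0 is an added assumption, and the function names and the input values in the usage lines are hypothetical, not taken from GSE58795.

```python
import math


def sinc(x: float) -> float:
    """sin(x)/x, as Table 3 describes SINCPI; the value at 0 is an assumption."""
    return 1.0 if x == 0 else math.sin(x) / x


def discovery_score(spty2d1, c1orf105, kctd4, ul84, treatment_num):
    """Evaluate the twelve Table 3 instructions in order.

    treatment_num is 0 for placebo and 1 for infliximab.
    Raises ZeroDivisionError if register 00 is exactly zero at the division.
    """
    r3 = sinc(19.0)                  # r[03] = SINCPI(19)
    r6 = kctd4 - 7.0                 # r[06] = KCTD4 - 7
    r4 = treatment_num + 2.0         # r[04] = Treatment_num + 2
    r5 = spty2d1 + r6                # r[05] = SPTY2D1 + r[06]
    r1 = 7.0 / r4                    # r[01] = 7 / r[04]
    r0 = sinc(ul84)                  # r[00] = SINCPI(UL84)
    if r0 < r3 and c1orf105 < r1:    # both IF conditions must hold
        r0 = r4 + r5                 # r[00] = r[04] + r[05]
    r5 = c1orf105 / r0               # r[05] = C1orf105 / r[00]
    r4 = 7.0 / 5.0                   # r[04] = CONSTDIV(7, 5)
    return r4 + r5                   # r[00] = r[04] + r[05]


def predict_responder(spty2d1, c1orf105, kctd4, ul84, treatment_num):
    """Negative score means non-responder; zero or more means responder."""
    return discovery_score(spty2d1, c1orf105, kctd4, ul84, treatment_num) >= 0


# Hypothetical inputs chosen so each branch of the IF pair is exercised.
print(discovery_score(1.0, 2.0, 8.0, math.pi, 1))   # both conditions hold
print(predict_responder(1.0, 5.0, 8.0, 4.5, 0))     # second condition fails
```

Applying `predict_responder` to each patient's baseline values and treatment assignment in GSE58795 is the check that Table 3 is intended to make possible.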

Figure 1. Discovery algorithm as an equation (a) and in schematic form (b).

The discovery algorithm containing 4 variables is transparently depicted as an equation and also in schematic form to complement its depiction as a sequence of operations in Table 3, to facilitate both understanding and independent validation using the data from GSE58795 from MacIsaac et al. accessible at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58795.¹⁶
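Read in sequence, the twelve instructions of Table 3 collapse to the compact form below. This is an algebraic restatement of that table and may be arranged differently from Figure 1a. The symbols S (the final value of register 00), D (the denominator held in register 00 before the division) and T (Treatment_num: 1 for infliximab, 0 for placebo) are introduced here for illustration.

$$
S = \frac{7}{5} + \frac{\mathrm{C1orf105}}{D},
\qquad
D =
\begin{cases}
T + \mathrm{SPTY2D1} + \mathrm{KCTD4} - 5, & \text{if } \dfrac{\sin(\mathrm{UL84})}{\mathrm{UL84}} < \dfrac{\sin(19)}{19} \ \text{and}\ \mathrm{C1orf105} < \dfrac{7}{T + 2}, \\[6pt]
\dfrac{\sin(\mathrm{UL84})}{\mathrm{UL84}}, & \text{otherwise.}
\end{cases}
$$

The first case follows from adding register 04, which holds T + 2, to register 05, which holds SPTY2D1 + KCTD4 - 7. A patient is predicted to be a responder when S ≥ 0 and a non-responder when S < 0.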

For an individual patient, a calculated value of less than zero indicated treatment non-response, and a value of zero or more indicated treatment response. The algorithmic variables were not those with the highest levels of expression. Agnostic evolutionary selection of treatment (infliximab or placebo) as a variable in the predictive algorithm is evidence that the treatment has a mathematically significant impact on the response outcome for some patients. Although expected a priori, the treatment assignment variable was selected agnostically by the evolutionary process, and not pre-specified. Users of the software cannot influence or require any variable to be incorporated into an algorithm.

The performance metrics for the components of the discovery analyses are shown in Table 2. The overall accuracy was 100%, with 100% sensitivity and specificity. Repeat analyses consistently yielded exactly the same variables, and the mathematical instructions encoded in all algorithms were, without exception, mathematically equivalent in terms of binary outcome: response or non-response. Omission of any of the mathematical instructions from the algorithm degraded predictivity. Therefore, the algorithm was optimized and devoid of superfluous calculations. Accuracy and reliability were consistent across the training, validation and test sets analyzed. When the initial four gene expression variables were excluded from subsequent analyses, four additional gene expression variables were identified as components of algorithms, also with 100% accuracy: PTPRC, TPM3, ARHGDIB and SIAH1 (Table 4). The software selects the minimum number of variables for maximum accuracy. The fact that a second group of four variables provided algorithms with 100% accuracy implies that these variables are highly correlated with the original four, containing almost as much information.

Table 4. Gene expression variables present in discovery algorithms.

Expression Variable		Function
SPTY2D1	Suppressor of Ty 5 homolog	Tumor suppressor gene in thyroid cancer²⁴
C1orf105	Chromosome 1 open reading frame 105	Vascular remodeling, coronary artery disease and atrial fibrillation^25,26
KCTD4	Potassium channel tetramerization domain containing 4	Expressed in sepsis and esophageal cancer^27,28
UL84	Gene for human CMV protein UL84	Prior CMV infection associated with treatment response in RA^29,30
PTPRC	Protein Tyrosine Phosphatase Receptor Type C or CD45	Genetic variants associated with anti-TNF response in RA^31,32
TPM3	Tropomyosin 3	Tropomyosin isoform associated with neoplasia, myopathy and as an autoantigen in RA^33–35
ARHGDIB	Rho GDP Dissociation Inhibitor Beta	Negative regulator of Rho guanosine triphosphatases (Rho GTPases), reduced in osteoarthritis synovial fluid and elevated in early RA synovial fluid^36,37
SIAH1	Siah E3 Ubiquitin Protein Ligase 1	Downregulated in both RA and tuberculosis, may influence amplitude of inflammatory gene response³⁸

Validation of discovery gene expression variables by analysis of six independent datasets

We then tested the hypothesis that the eight variables from the discovery algorithms could be used to derive algorithms predicting individual anti-TNF treatment response outcomes from six additional published RA clinical trial data sets, providing independent validation of the variables.^18–23 These data sets were chosen because they were derived from clinical trials similar to the discovery trial, though none were placebo controlled. While the eight variables used were the same for all six dataset analyses, distinct algorithmic operations were necessary for each validation, because of variation in the quantitation instruments, measurement scales, data preparation and normalization methods used by the original researchers across the individual datasets. Summary information on the clinical studies is shown in Table 1, with further details available in the original publications.^18–23 In each case, the algorithms we produced had superior performance to the analyses presented in the original publications, as shown in Table 5, even though only a subset of the eight discovery variables was available in some datasets. The original analyses used a wide range of analytic methods, including the most common machine learning approaches: neural networks, support vector machines and random forests.

Table 5. Discovery and validation studies of blood-based gene expression data of anti-TNF responsiveness in RA.

First author	MacIsaac¹⁶	Lequerre¹⁸	Julia¹⁹	Bienkowska²⁰	Toonen²¹	Nakamura²²	Tanino²³
Number of patients	59	30	44	46 – leaving out intermediate responses or 75 including intermediate responses	42	140 (infliximab treated)	68
Classifier Method	OLS regression	UHC	SVM, DDA, RF, & k-NN	RF	K-means cluster analysis	Logistic regression	LOO
Cross-validation analyses	None	LOO	LOO	LOO, NN, LDA and SVM	None	None	None
Original reported results	None reported	Sens 90% Spec 70%	Sens 94% Spec 86%	Sens 88% Spec 84%	Sens 61% Spec 71%	Sens 79% Spec 47%	Sens 68% Spec 86%
Current analysis	Sens 100% Spec 100%	Sens 100% Spec 100%	Sens 100% Spec 86%	Sens 100% Spec 86%	Sens 100% Spec 96%	Sens 95% Spec 91%	Sens 96% Spec 84%

The weighted-average sensitivity and specificity of the individual models from the analyses we conducted across all 7 datasets, with a total of 400 subjects treated with anti-TNF therapy, were 98.5% and 90.9% respectively, surpassing the accuracy of all the prior published individual analyses, as shown in Figure 2. Accuracy for the marketed TNF response predictor from two reports, Mellors et al. and Jones et al., is also presented for comparison.^39,40 The marketed predictor consists of 23 variables and was derived using machine learning, with a combination of neural networks and random forests. The actual model has not been published.
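The stated total of 400 subjects can be reconciled with Table 5 under one reading: counting only the 30 infliximab-treated patients from the discovery trial and the 46 Bienkowska patients that remain after leaving out intermediate responses. That reading is an inference from the tables, not something the text states. The sketch also includes a generic weighted-average helper, demonstrated on toy numbers only, because the per-dataset responder and non-responder counts needed to reproduce 98.5% and 90.9% are not listed in this article.

```python
# Anti-TNF-treated patients per dataset, as read from Table 5 and the
# abstract (30 of the 59 discovery patients received infliximab).
anti_tnf_treated = {
    "MacIsaac": 30,
    "Lequerre": 30,
    "Julia": 44,
    "Bienkowska": 46,   # intermediate responses left out
    "Toonen": 42,
    "Nakamura": 140,
    "Tanino": 68,
}
total_treated = sum(anti_tnf_treated.values())
print(total_treated)


def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy illustration only: these numbers are not from any dataset.
toy = weighted_average([100.0, 50.0], [3, 1])
print(toy)
```

For a pooled sensitivity the appropriate weights would be the number of responders in each dataset, and for a pooled specificity the number of non-responders.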

Figure 2. Performance metrics of discovery algorithmic biomarker predictor compared to other predictors.

The weighted-average performance of the individual analyses we conducted using 7 datasets, covering all 400 subjects receiving anti-TNF therapy, was 98.5% sensitivity and 90.9% specificity, as shown. This surpassed the performance metrics for all the individual previously reported analyses, which are also shown.^18–23 No metrics were provided for the MacIsaac et al. study used for discovery.^16,17 Accuracy for a marketed TNF predictor from two reports, Jones et al. and Mellors et al., is also presented for comparison.^39,40

Discussion

To address the stagnation in biomarker discovery and development, we applied a novel analytic approach, based on evolutionary computation and information theory and incorporating mathematical functions, to a placebo-controlled randomized clinical trial with 52,379 baseline variables (gene expression plus treatment assignment) for each of 59 patients. This provided discovery biomarker algorithms perfectly predictive of individual responses to both active therapy and placebo. The algorithms contained subsets of 4 variables from a total of 8 variables. The eight discovery variables were then validated, using the same analytic approach, as components of algorithmic predictors applied to data from six independent clinical trials. These trials used different TNF inhibitors and were conducted on 3 continents in multiple ethnicities, with differing response criteria, gene expression processing platforms and measurement scales, which provides extremely high confidence in the validity of the variables.

The MacIsaac et al. clinical trial that provided the discovery data was conducted by a major pharmaceutical company, but it did not provide a predictor of therapeutic response, nor did it identify any of the 8 variables we discovered as predictive.¹⁶ In each of the six validation analyses, our derived algorithms, all containing 4 or fewer variables, had sensitivity and specificity superior to the benchmarks in the original publications, which used a variety of mainstream analytic approaches, including several types of machine learning.

None of the publications for the six validation datasets identified the eight predictive algorithmic biomarkers we discovered^18–23 or reported an algorithm that might be the basis for a predictive test. To enable conclusive independent confirmation of our novel approach, we show an example of a perfectly predictive discovery algorithm, depicted three different ways: as a series of computer instructions, as an algebraic equation and also in schematic form.

The performance metrics for the algorithms we report surpass those of the marketed predictor of anti-TNF response, which was derived using machine learning, consists of 23 variables, and only identifies non-responders, with a specificity of 77.3-86.8% and sensitivity of 50.0-60.2%.^39,40

Precision medicine requires highly predictive biomarkers to inform treatment decisions for individual patients. With the availability of more therapeutic options, selecting the most appropriate for an individual patient is an increasingly difficult challenge. Clinicians currently rely on trial and error approaches.

The necessary biomarkers, to align patients with the optimal therapy, have not been forthcoming despite huge increases in the production of biomedical data. The asymmetry between the tens of thousands of publications describing novel biomarkers, and the tiny fraction of those that ultimately become clinical diagnostic tests, as noted in 2014, has persisted, representing a profound failure of biomarker discovery.^2,3 Why?

The torrent of data enabled by technical advances and digitization led to a 2008 conclusion that the traditional scientific method needed to be refined, as standard hypothesis testing was not compatible with the availability of “big data”.⁴¹ Computers analyzing large datasets can yield important previously undetectable correlations as seen in astronomy, cosmology and meteorology.^42,43 However, in biomedical research, this approach has yielded an underwhelming dividend, as illustrated by the poverty of novel clinically useful biomarkers.

We hypothesized that prior analytic approaches have not been able to reveal useful biomarkers, because of their lack of suitability for analyzing complex biomedical data, neglecting biology and disease as complex systems.³

Our software solution differs from the various types of machine learning, by fusing evolutionary computation with algorithmic information theory, incorporating a wide range of mathematical functions to directly address the non-linearity and high dimensionality of complex biomedical data yielding transparent quantitative predictive algorithms. Transparency is necessary for both informing medical decisions and providing scientific insight,^14,15,44 for patients, physicians and regulators, and also to provide a basis for the development of clinical tests.

The software automatically eliminates overfitted algorithms from consideration. Confirmation that the selected algorithm is not overfitted is done in the third stage of the automated process. The application of concepts from evolutionary biology - inheritance, random variation, and natural selection - explains why evolutionary computation can handle large high dimensional data so efficiently.⁴⁵ Information theory is a mathematical framework for understanding information at a fundamental level, which includes the concepts of randomness and algorithmic compression.^46,47 That nature, in all its manifestations, is algorithmic, and that scientific comprehension is a process of finding predictive algorithms that compress information into its essence, is the basis for our approach.^47–49 Our algorithmic focus is complemented by the emerging understanding that biology reflects molecular networks that are algorithmic.⁵⁰ Biological functions, including therapeutic response and disease are recognized to be mediated by molecular networks, variously defined as modules, motifs or cores.^5,51–54 It has been proposed that understanding of molecular networks will be necessary for a deeper understanding of biological information flow and that this will require an algorithmic framework.^5,50 The incorporation of many mathematical functions into the software allows the underlying molecular interactions, which are frequently non-linear, to be modeled in the selected algorithms, thereby maximizing their predictivity, without any a priori assumptions as to their nature. The incorporation of mathematical functions into the algorithms aligns with emerging perspectives around the need to incorporate mathematics into biology.⁵⁵
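The three evolutionary concepts named above (inheritance, random variation and natural selection) can be sketched as a short loop. This is a generic toy illustration, not the authors' software, whose internals are not specified here. The fitness function, the set of "informative" indices and all parameter values are hypothetical. In a real analysis, fitness would be a measure of predictive accuracy on patient data.

```python
import random


def evolve(fitness, n_vars, genome_len=4, pop_size=20, generations=30, seed=0):
    """Toy evolutionary search over small subsets of variable indices."""
    rng = random.Random(seed)
    # Initial population: random candidate subsets of variable indices.
    population = [rng.sample(range(n_vars), genome_len) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Natural selection: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        history.append(fitness(parents[0]))
        # Inheritance with random variation: each child copies a parent and
        # has one position replaced by a randomly chosen variable index.
        children = []
        for parent in parents:
            child = list(parent)
            child[rng.randrange(genome_len)] = rng.randrange(n_vars)
            children.append(child)
        population = parents + children
    return max(population, key=fitness), history


# Hypothetical stand-in for predictive accuracy: count how many selected
# indices fall in an arbitrary "informative" set.
INFORMATIVE = {3, 17, 29, 41}


def toy_fitness(genome):
    return len(INFORMATIVE & set(genome))


best, history = evolve(toy_fitness, n_vars=50)
print(sorted(set(best)), toy_fitness(best))
```

Because the parents are carried forward unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.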

We posit that our approach is practical and effective with modest sample sizes because of its focus on defining the most salient signals, as represented by the mathematical relationships between variables, which is predominantly where the information resides in complex systems. The evolutionary foundation of our approach allows large numbers of variables to be analyzed without the need to use arbitrary thresholds and feature selection to eliminate variables from consideration. The number of potential permutations of algorithmic memory registers containing only the final eight variables, combined with available mathematical functions in the analysis we report, is in excess of 10⁶¹. When considering all 52,379 possible biomarkers in the discovery dataset for each patient, the treatment arm variable, the software’s complete palette of available mathematical functions, and a limit of only 16 instructions per algorithm, the total number of potential discovery algorithms is 4.749 × 10¹²⁵³.
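The scale of such a search space can be illustrated with a simple counting model. The sketch below assumes that every instruction picks one function, one destination register and two operands. That instruction format, the register count and the function count are assumptions made for illustration, so the result is not expected to reproduce the figures quoted above. It only shows how quickly the count grows with the number of instructions.

```python
import math


def log10_program_count(n_inputs, n_registers, n_functions, n_instructions):
    """log10 of the number of instruction sequences under a simple model.

    Assumed model (not taken from the article): every instruction picks one
    function, one destination register, and two operands drawn from the
    registers or the input variables.
    """
    operand_choices = n_registers + n_inputs
    per_instruction = n_functions * n_registers * operand_choices ** 2
    return n_instructions * math.log10(per_instruction)


# 52,379 expression variables plus the treatment-arm variable, as in the text.
# The register and function counts are hypothetical.
exponent = log10_program_count(
    n_inputs=52_380, n_registers=8, n_functions=20, n_instructions=16
)
print(f"about 10^{exponent:.0f} candidate programs under these assumptions")
```

Working with the base-10 logarithm keeps the arithmetic in ordinary floating point even when the count itself is far too large to enumerate.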

The discovery algorithm that we present incorporates the sine function, showing that molecular interactions underlying the response to therapy in RA can be represented using mathematical functions that also depict relationships in other biological contexts. This implies that the algorithm depicts a fundamental phenomenon.

We identify lower expression variables that are likely to have disproportionate biological effects because of their non-linear interactions that other analytic approaches cannot readily detect and have frequently been excluded from consideration. These variables, which are not well studied, are likely to provide insights into the molecular mechanisms underlying responsiveness to anti-TNF therapy in RA. Individually, the eight variables were not highly correlated with response, and would not be useful response predictors, either individually or collectively, in the absence of the mathematical components of the algorithm.
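The point that variables can be uninformative individually yet predictive in combination can be shown with a deliberately simple toy case. In the sketch below, the data are hypothetical and noise-free, and the response is defined by construction as an interaction between two centered expression values. It does not use any values from the study.

```python
from statistics import mean

# Hypothetical toy data: response depends only on whether two centered
# expression values share the same sign (a pure interaction).
samples = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)] * 5
labels = [x1 * x2 > 0 for x1, x2 in samples]


def mean_difference(values, labels):
    """Difference of means between responders and non-responders."""
    responders = [v for v, y in zip(values, labels) if y]
    non_responders = [v for v, y in zip(values, labels) if not y]
    return mean(responders) - mean(non_responders)


# Each variable on its own shows no difference between the two groups.
diff_x1 = mean_difference([s[0] for s in samples], labels)
diff_x2 = mean_difference([s[1] for s in samples], labels)

# A rule using the product of the two variables separates the groups.
# This is true by construction here; it illustrates the idea only.
predictions = [x1 * x2 > 0 for x1, x2 in samples]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(diff_x1, diff_x2, accuracy)
```

A univariate screen based on differences of means would discard both variables in this toy case, even though together they determine the outcome.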

As well as revealing predictive algorithmic biomarkers, our approach may also enable the identification and understanding of the molecular networks mediating disease and therapeutic effects. These algorithms may help with the identification and prioritization of novel therapeutic targets, particularly those that might be amenable to emerging approaches that modulate specific RNA transcripts. The accuracy we report defines three distinct patient subsets: the first are anti-TNF responders, the second are placebo responders and the third are non-responders to either anti-TNF therapy or to placebo. The potential to predict patients in these three categories will be a focus of future studies to inform the most appropriate individualized therapeutic interventions.

There are no prior reports directly linking SPTY2D1, KCTD4, and C1orf105 to RA, to immune-mediated diseases or to responses to therapy.^24–28 TPM3 has been reported as an autoantigen in RA, and to be associated with both myopathy and renal cancer.^33–35 ARHGDIB expression is upregulated in RA, whereas SIAH1 is downregulated.^36–38 TPM3, ARHGDIB and SIAH1 have not been associated with therapeutic outcomes. Prior cytomegalovirus exposure has been associated with poor responses to therapy in early RA,^29,30 which may explain the UL84 gene variable association, as UL84 encodes a cytomegalovirus protein. Mutations in PTPRC, also known as CD45, have been associated with RA patient response to anti-TNF therapy.^31,32

The discovery variables we report and have validated are in development as components of a clinical PCR test using clinical-grade instruments and protocols. PCR tests are versatile and reliable for deployment in clinical laboratories. The discovery algorithm presented here, which was derived on research instruments, is not appropriate for clinical use, as its sole purpose is to identify the most informative variables. Research-stage instruments have different quantitation levels, dynamic range and reliability compared to clinical-grade assay processes and technology. Therefore, the clinical test we have in development will require derivation of novel algorithms, which will then need to be prospectively validated for clinical use.

We envision that the clinical test will provide the basis for an ensemble of algorithms, each predictive of different specific clinical endpoints, in addition to DAS28, at different time points, thereby providing a comprehensive efficacy profile for individual patients, most informative to clinicians, to maximize clinical relevance.

We are applying the same gene expression variables to other diseases responsive to anti-TNF therapy. Our hypothesis is that the same variables will be the basis of algorithms predictive of individual treatment response to anti-TNF therapies in other diseases where anti-TNF therapy is of proven efficacy, implying that the variables we have identified are of fundamental biological importance in mediating the efficacy of anti-TNF therapy in general.

Our analytic approach may provide a novel paradigm for future synchronous development of novel therapies and companion diagnostics. Proof of concept clinical studies could routinely incorporate baseline biomarker profiling to yield transparent algorithms predictive of both efficacy and safety signals. The predictive algorithmic biomarkers could then be expeditiously incorporated into and validated in subsequent registration studies, potentially yielding highly predictive companion diagnostics to inform both regulatory approval and reimbursement decisions. Such algorithmic biomarkers could also be incorporated into the product label to inform prescribing decisions for individual patients.

In the future, we envision all therapies could be accompanied by algorithmic biomarkers, predictive of efficacy and safety endpoints of interest, to objectively and quantitatively inform administration decisions for individual patients. The algorithmic biomarkers could be promptly updated as new data emerge to optimize clinical utility. In addition, clinical trial outcomes could be routinely summarized in terms of the algorithms.

Optimal translation of biomedical data into actionable information requires analytic methodology designed to yield mathematically informed algorithmic insight from complex, high-dimensional, non-linear data. The novel analytic approach that we present addresses that challenge. This is the only reported method that provides transparent, simple algorithmic biomarkers that accurately reflect biology as a complex system, quantitatively predict individual therapeutic responses, and can readily be translated into clinical tests. This may have implications for the discovery and development of companion diagnostics and the analysis of clinical trials.

Currently, the gulf between actual health care and that which could be provided has arguably never been wider.⁵⁶ We contend that bridging that gulf will require solving the “biomarker problem” with biomarkers that reliably identify those patients who are most likely to benefit from a particular agent.⁵⁷ As we have shown here, this will be done by a focus on medicine as an information science, acknowledging that “to be useful, data must be analyzed, interpreted, and acted on. Thus, it is algorithms, not data sets, that will prove transformative.”⁵⁸

Ethical considerations

The study that provided the discovery data was conducted in accordance with principles of Good Clinical Practice and was approved by the National Ethics Committee in Romania and the National Ethics Committee, Clinical Research of Drugs and Methods of Treatment in Moldova, with all subjects providing informed written consent.^16,17 Written informed consent for all participants and local ethics approval was obtained for each of the studies that provided validation data as detailed in the original publications.^18–23

Data availability

Gene Expression Omnibus (GEO) database: All the gene expression profiles analyzed were downloaded from, and are freely available at, the GEO database. No software is required to view the datasets or to replicate the discovery algorithm we report.

For discovery analysis, GSE58795 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58795¹⁶

For validation analyses,

GSE3592 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3592¹⁸

GSE12051 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12051¹⁹

GSE15258 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15258²⁰

GSE33377 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33377²¹

GSE78068 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78068²²

GSE20690 - https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20690²³

Reporting guidelines

The report was written to comply with the STARD 2015 guideline – BMJ 2015:351:h5527 PMID: 26511519

Figshare: STARD-2015-Checklist for Quantitative AI based on evolutionary computation yields algorithmic RNA biomarkers from a randomized rheumatoid arthritis clinical trial, accurately predicting individual patient responses to anti-TNF treatment.docx, https://doi.org/10.6084/m9.figshare.28707170.v1

This project contains the following underlying data:

STARD-2015-Checklist.docx

Data is available under CC BY 4.0 license.

The clinical trial used for discovery analysis was registered - ClinicalTrials.gov registration: NCT01313520

References

1. Joyner MJ, Paneth N: Promises, promises, and precision medicine. J. Clin. Invest. 2019; 129(3): 946–948.
2. Barker AD, Compton CC, Poste G: The National Biomarker Development Alliance accelerating the translation of biomarkers to the clinic. Biomark. Med. 2014; 8(6): 873–876.
3. Barker AD, Alba MM, Mallick P, et al.: An Inflection Point in Cancer Protein Biomarkers: What was and What’s Next. Mol. Cell. Proteomics. 2023; 22(7): 100569.
4. Hopfield JJ: Physics, Computation, and Why Biology Looks so Different. J. Theor. Biol. 1994; 171: 53–60.
5. Nurse P: Life, logic and information. Nature. 2008; 454(7203): 424–426.
6. Nurse P: Complexity and Biology. Cell. 2014; 157(1): 272–273.
7. Berger SI, Iyengar R: Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip. Rev. Syst. Biol. Med. 2011; 3(2): 129–135.
8. Savic S, Caseley EA, McDermott MF: Moving towards a systems-based classification of innate immune-mediated diseases. Nat. Rev. Rheumatol. 2020; 16: 222–237.
9. Coffey DS: Self-organization, complexity and chaos: The new biology for medicine. Nat. Med. 1998; 4: 882–885.
10. Goldberger AL: Giles F. Filley lecture. Complex systems. Proc. Am. Thorac. Soc. 2006; 3(6): 467–471.
11. Rea TJ, Brown CM, Sing CF: Complex adaptive system models and the genetic analysis of plasma HDL-cholesterol concentration. Perspect. Biol. Med. 2006; 49(4): 490–503.
12. Mazzocchi F: Complexity in biology. Exceeding the limits of reductionism and determinism using complexity theory. EMBO Rep. 2008; 9(1): 10–14.
13. Herrington D, Wang Y: Clinical heterogeneity in the age of big data, advanced analytics and complexity theory. Trans. Am. Clin. Climatol. Assoc. 2023; 133: 56–68.
14. Wilkinson J, Arnold KF, Murray EJ, et al.: Time to reality check the promises of machine learning-powered precision medicine. Lancet Digit Health. 2020; 2: e677–e680.
15. Price WN: Big data and black-box medical algorithms. Sci. Transl. Med. 2018; 10: 471.
16. MacIsaac K, Baumgartner R, Kang J, et al.: Pre-treatment whole blood gene expression is associated with 14-week response assessed by dynamic contrast enhanced magnetic resonance imaging in infliximab-treated rheumatoid arthritis patients. PLoS ONE. 2014; 9(12): e111937.
17. Beals C, Baumgartner R, Peterfy C, et al.: Magnetic resonance imaging of the hand and wrist in a randomized, double-blind, multicenter, placebo-controlled trial of infliximab for rheumatoid arthritis: Comparison of dynamic contrast enhanced assessments with semi-quantitative scoring. PLoS ONE. 2017; 12(12): e0187397.
18. Lequerre T, Gauthier-Jauneau A, Bansard C, et al.: Gene profiling in white blood cells predicts infliximab responsiveness in rheumatoid arthritis. Arth. Res. Ther. 2006; 8: R105.
19. Julia A, Erra A, Palacio C, et al.: An eight-gene blood expression profile. PLoS ONE. 2009; 4: e7556.
20. Bienkowska J, Dagin G, Batliwalla F, et al.: Convergent random forest predictor: Methodology for predicting drug response from genome-scale data applied to anti-TNF response. Genomics. 2009; 94: 423–432.
21. Toonen EJ, Gilissen C, Franke B, et al.: Validation study of existing gene expression signatures for anti-TNF treatment in patients with rheumatoid arthritis. PLoS ONE. 2012; 7(3): e33199.
22. Nakamura S, Suzuki K, Iijima H, et al.: Identification of baseline gene expression signatures predicting therapeutic responses to three biologic agents in rheumatoid arthritis: a retrospective observational study. Arth. Res. Ther. 2016; 18: 159.
23. Tanino M, Matoba R, Nakamura S, et al.: Prediction of efficacy of anti-TNF biologic agent, infliximab, for rheumatoid arthritis patients using a comprehensive transcriptome analysis of white blood cells. Biochem. Biophys. Res. Commun. 2009; 387(2): 261–265.
24. Ramírez-Moya J, Wert-Lamas L, Acuña-Ruíz A, et al.: Identification of an interactome network between lncRNAs and miRNAs in thyroid cancer reveals SPTY2D1-AS1 as a new tumor suppressor. Sci. Rep. 2022; 12(1): 7706.
25. Li K, Kong R, Ma L, et al.: Identification of potential M2 macrophage-associated diagnostic biomarkers in coronary artery disease. Biosci. Rep. 2022; 42(12): BSR20221394.
26. Zhang J, Huang X, Wang X, et al.: Identification of potential crucial genes in atrial fibrillation: a bioinformatic analysis. BMC Med. Genet. 2020; 13(1): 104.
27. Kim S, Noh JH, Lee MJ, et al.: Effects of mitochondrial transplantation on transcriptomics in a polymicrobial sepsis model. Int. J. Mol. Sci. 2023; 24(20): 15326.
28. Zheng C, Yu X, Xu T, et al.: KCTD4 interacts with CLIC1 to disrupt calcium homeostasis and promote metastasis in esophageal cancer. Acta Pharm. Sin. B. 2023; 13(10): 4217–4233.
29. Colletti KS, Smallenburg KE, Xu Y, et al.: Human cytomegalovirus UL84 interacts with an RNA stem-loop sequence found within the RNA/DNA hybrid region of oriLyt. J. Virol. 2007; 81: 7077–7085.
30. Davis JM, Knutson KL, Strausbauch MA, et al.: Immune response profiling in early rheumatoid arthritis: discovery of a novel interaction of treatment response with viral immunity. Arthritis Res. Ther. 2013; 15(6): R199.
31. Cui J, Saevarsdottir S, Thomson B, et al.: Rheumatoid arthritis risk allele PTPRC is also associated with response to anti-tumor necrosis factor alpha therapy. Arthritis Rheum. 2010; 62(7): 1849–1861.
32. Plant D, Prajapati R, Hyrich KL, et al.: Replication of association of the PTPRC gene with response to anti-tumor necrosis factor therapy in a large UK cohort. Arthritis Rheum. 2012; 64(3): 665–670.
33. Poulsen TBG, Damgaard D, Jørgensen MM, et al.: Identification of Novel Native Autoantigens in Rheumatoid Arthritis. Biomedicine. 2020; 8(6): 141.
34. Lambert MR, Gussoni E: Tropomyosin 3 (TPM3) function in skeletal muscle and in myopathy. Skelet. Muscle. 2023; 13(1): 18.
35. Galea LA, Hildebrand MS, Witkowski T, et al.: ALK-rearranged renal cell carcinoma with TPM3::ALK gene fusion and review of the literature. Virchows Arch. 2023; 482(3): 625–633.
36. Ritter SY, Subbaiah R, Bebek G, et al.: Proteomic analysis of synovial fluid from the osteoarthritic knee: comparison with transcriptome analyses of joint tissues. Arthritis Rheum. 2013; 65(4): 981–992.
37. Lequerré T, Bansard C, Vittecoq O, et al.: Early and long-standing rheumatoid arthritis: distinct molecular signatures identified by gene-expression profiling in synovia. Arthritis Res. Ther. 2009; 11: R99.
38. Badr MT, Häcker G: Gene expression profiling meta-analysis reveals novel gene signatures and pathways shared between tuberculosis and rheumatoid arthritis. PLoS ONE. 2019; 14(3): e0213470.
39. Mellors T, Withers JB, Ameli A, et al.: Clinical validation of a blood-based predictive test for stratification of response to tumor necrosis factor inhibitor therapies in rheumatoid arthritis patients. Network and Systems Medicine. 2020; 3(1): 91–104.
40. Jones A, Rapisardo S, Zhang L, et al.: Analytical and clinical validation of an RNA sequencing-based assay for quantitative, accurate evaluation of a molecular signature response classifier in rheumatoid arthritis. Expert. Rev. Mol. Diagn. 2021; 21(11): 1235–1243.
41. Anderson C: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired 16.07. 2008. Retrieved Jan 13th 2025.
42. Pontzen A: The Universe in a Box: Simulations and the Quest to Code the Cosmos. Riverhead Books; 2023. 9780593330487.
43. Doyne Farmer J: Making Sense of Chaos: A Better Economics for a Better World. Yale University Press; 2024. 9780300273779.
44. Naughton J: Machine-learning systems are problematic. That’s why tech bosses call them ‘AI’. The Guardian. 2022. Retrieved June 22nd 2025.
45. Sipper M, Olson RS, Moore JH: Evolutionary computation: the next major transition of artificial intelligence? BioData Min. 2017; 10: 26.
46. Gleick J: The Information: A History, a Theory, a Flood. Pantheon Books (US); 2011. 978-0-375-423-72-7.
47. Robertson DS: Phase Change: The Computer Revolution in Science and Mathematics. Oxford University Press; 2003. 0195157486.
48. Zenil H, Schmidt A, Tenner J: Causality, information and biological computation: an algorithmic software approach to life, disease and the immune system. Chapter in From Matter to Life: Information and Causality. Walker SI, Davies PCW, Ellis G, editors. Cambridge University Press; 2017; pages 244–280. 9781107150539.
50. Arthur WB: Algorithms and the Shift in Modern Science. Beijer Discussion Paper Series No. 269. 2020.
51. Lim WA, Lee CM, Tang C: Design principles of regulatory networks: searching for the molecular algorithms of the cell. Mol. Cell. 2013; 49(2): 202–212.
52. Alon U: Network motifs: theory and experimental approaches. Nat. Rev. Genet. 2007; 8(6): 450–461.
53. Gallo E, De Renzis S, Sharpe J, et al.: Versatile system cores as a conceptual basis for generality in cell and developmental biology. Cell Syst. 2024; 15(9): 790–807.
54. Hartwell LH, Hopfield JJ, Leibler S, et al.: From molecular to modular cell biology. Nature. 1999; 402(6761 Suppl): C47–C52.
55. Rajapakse I: Conversation with Dr Steve Smale and Dr. Lee Hartwell. Not. Am. Math. Soc. 2021; 68(9): 1578–1582.
56. Angus DC, Huang AJ, Lewis RJ, et al.: The Integration of Clinical Trials With the Practice of Medicine: Repairing a House Divided. JAMA. 2024; 332(2): 153–162.
57. Sawyers C: The cancer biomarker problem. Nature. 2008; 452: 548–552.
58. Obermeyer Z, Emanuel EJ: Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N. Engl. J. Med. 2016; 375(13): 1216–1219.

Competing interests

Kevin Horgan and Patrick Lilley have equity in Liquid Biosciences. All other authors report no financial relationships with any organizations that might have an interest in the submitted work in the previous three years. The reported gene expression variables are included in issued U.S. Patent 9,845,505.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Published: 07 Jul 2025, 14:663

https://doi.org/10.12688/f1000research.164015.1

© 2025 Horgan K et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Horgan K, McDermott MF, Harrington D et al. Quantitative AI yields algorithmic RNA biomarkers from a rheumatoid arthritis clinical trial, accurately predicting individual patient responses to anti-TNF treatment, providing a novel approach to companion diagnostic discovery. [version 1; peer review: 2 approved with reservations]. F1000Research 2025, 14:663 (https://doi.org/10.12688/f1000research.164015.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 10 Oct 2025

Jean Roudier, INSERM, Luminy, France

Approved with Reservations

https://doi.org/10.5256/f1000research.180451.r403742

Some kind of mathematical analysis (a genius in a black box) is performed using expression data for 50,000 biomarkers from a study of clinical outcome in 59 patients with RA under infliximab or placebo.

From there, 8 markers are identified which are supposedly predictive of clinical response to infliximab.
The expression of these 8 markers is studied in another 6 studies of 50,000-marker expression and clinical outcome in RA patients under infliximab or placebo.

The genius-in-a-black-box mathematical analysis supposedly allows responders and non-responders to be predicted with 100% accuracy.

This article is interesting but presented as

"I got a great mathematical toolbox that allows me to predict clinical response to infliximab in patients with RA from expression data for 50,000 biomarkers".

I suggest shortening the introduction and discussion and trying to explain simply, for a layperson, what kind of mathematical/statistical treatment of the expression data has been done.

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: immunogenetics of Rheumatoid arthritis

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 22 Aug 2025

Daniel H. Solomon, Harvard Medical School, Boston, USA

Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.180451.r398676

Comments:

The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.
Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?
I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.
The authors need to comment on the heterogeneity of the cohorts.
Is a 14 week follow-up adequate?

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Author Response 11 Sep 2025

Kevin Horgan, Liquid Biosciences Inc., 26895 Aliso Creek Road B800, 92656, USA

Response to review from Daniel H. Solomon, Harvard Medical School, Boston, USA and
Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

1. The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.

Authors' answer: We thank you for your perceptive and helpful comments. The mRNA (transcription) biomarker values were measurements of each patient's state at a single baseline timepoint, so there is no increase or decrease to report in a longitudinal sense. The reviewers may instead have meant the baseline differences between future responders and non-responders, which are usually reported as a difference of means with a p value. If so, the comment and those metrics rest on three assumptions: that each biomarker is consistently higher or lower in responders than in non-responders, which treats each biomarker as an “independent variable” in a predictive model; that the relationships are linear; and that the values are normally distributed. None of these three assumptions holds in the datasets. A central point of the paper is that biomarkers interact within the dynamic biology of the patient, the inflammatory disease state, and exogenous factors such as lifestyle and treatments. This is borne out in our algorithmic models: the interactions among the biomarkers carry far more predictive information than the level of any biomarker in isolation. A minimum of four biomarkers is needed for an accurate, robust predictive model, and their measurements demonstrably interact mathematically in our model(s). A difference-of-means metric can be blind to the relationships between biomarkers because it cannot capture interactions, nonlinearity, or manifold distributions. By contrast, our model captures nonlinear interactions between the biomarkers and does not depend on any consistent baseline difference in biomarker levels between responders and non-responders.
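
The point about interactions can be illustrated with a small synthetic example. The sketch below is not the model, the biomarkers, or the data from the paper: the eight patients, the names `biomarker_a` and `biomarker_b`, and the `interaction_rule` function are all hypothetical. It shows a case in which each biomarker has exactly the same mean in responders and non-responders, so a difference-of-means comparison finds nothing, while a simple rule applied to the two biomarkers jointly separates the groups.

```python
from statistics import mean

# Hypothetical synthetic patients (NOT data from the paper):
# each tuple is (biomarker_a, biomarker_b, responder), responder is 1 or 0.
patients = [
    (1.0, 1.0, 0), (3.0, 3.0, 0), (1.0, 1.0, 0), (3.0, 3.0, 0),
    (1.0, 3.0, 1), (3.0, 1.0, 1), (1.0, 3.0, 1), (3.0, 1.0, 1),
]


def mean_difference(values, labels):
    """Mean in responders minus mean in non-responders."""
    responders = [v for v, y in zip(values, labels) if y == 1]
    non_responders = [v for v, y in zip(values, labels) if y == 0]
    return mean(responders) - mean(non_responders)


def interaction_rule(a, b, center=2.0):
    """Hypothetical rule: predict response when the two biomarkers
    lie on opposite sides of `center` (an XOR-like interaction)."""
    return 1 if (a - center) * (b - center) < 0 else 0


a_values = [p[0] for p in patients]
b_values = [p[1] for p in patients]
labels = [p[2] for p in patients]

# Each biomarker on its own shows no difference between the groups.
diff_a = mean_difference(a_values, labels)
diff_b = mean_difference(b_values, labels)

# The joint rule classifies every synthetic patient correctly.
predictions = [interaction_rule(a, b) for a, b, _ in patients]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print(diff_a, diff_b, accuracy)  # prints: 0.0 0.0 1.0
```

The example is deliberately extreme; real expression data would show partial rather than perfect cancellation, and a real model would be fit and validated on separate subsets.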

2. Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?

None of the eight biomarkers was eliminated. We discovered two distinct sets of four biomarkers, each containing the minimum number of biomarkers required for the best combination of accuracy and robustness (that is, reproducibility across patient heterogeneity). We first discovered one minimal set of four, whose algorithmic model is described in three different but equivalent representations in the paper. We then set aside those four and repeated the analytic discovery process on the remaining thousands of biomarkers in the dataset. That repeat analysis yielded another set of four biomarkers with similar accuracy and robustness. We then validated all eight biomarkers, again in minimally required sets, across several validation datasets created from independent patient cohorts.

3. I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.

Placebo patients were considered because the discovery dataset was derived from a clinical trial with a treatment arm and a placebo arm, and our task was to predict response using the treatment/placebo variable plus biomarkers reflecting the baseline (pre-treatment) state of the patients. A key question in drug development and clinical use is whether treatment response surpasses placebo response. The ability to accurately predict individual responses to both treatment and placebo would be extremely useful in both settings. The sensitivity and specificity report is contained in the paper as Table 2. This includes the patient counts, in a slightly different format than a contingency table but with the same information content.
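
For readers who want the 2x2 layout the reviewers asked about, the sketch below shows how sensitivity and specificity follow from the four cells of a contingency table. The counts are hypothetical placeholders, not the values in Table 2, and the function name `sensitivity_specificity` is an assumption made for illustration. It also assumes that responders are treated as the positive class.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the cells of a 2x2 table.

    Assumed layout, with responders as the positive class:
                          predicted responder   predicted non-responder
    actual responder              tp                      fn
    actual non-responder          fp                      tn
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each actual class needs at least one patient")
    sensitivity = tp / (tp + fn)  # fraction of actual responders identified
    specificity = tn / (tn + fp)  # fraction of actual non-responders identified
    return sensitivity, specificity


# Hypothetical counts for illustration only (NOT taken from Table 2).
sens, spec = sensitivity_specificity(tp=40, fn=10, fp=5, tn=45)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.80, specificity=0.90
```

Given per-class patient counts and the reported sensitivity and specificity, the four cells can be recovered by inverting these two ratios, which is why a table in that format carries the same information as a contingency table.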

4. The authors need to comment on the heterogeneity of the cohorts.

All of the datasets were obtained from a publicly available repository. The various datasets were produced by different researchers, and we have cited the papers associated with each of the datasets, so that our readers can review subject selection criteria and subject composition information described in the original papers. Given the number of datasets we used, reproducing this information in our paper would be redundant and greatly increase the length of our paper without contributing to the key point we did include in the paper, which is that we used a diverse set of patient cohorts to discover and validate our predictive biomarkers. We state “These studies were done in different geographies, used different methodologies to process the samples and gene expression data, had different endpoints, and in some cases, used different anti-TNF therapies.”

5. Is a 14 week follow-up adequate?

Follow-up time frames can vary according to clinical trial and research practices, as can response metrics (e.g. DAS28, ACR50, etc.). We used datasets from seven studies described in peer-reviewed papers. These papers indicated that each dataset was derived from a competently constructed study using broadly accepted outcome measures and treatment follow-up time frames, even with some differences across studies. Each of these studies was performed to derive clinically useful predictors of treatment response, and six of the seven studies had a 14-week primary endpoint. Despite some variety across datasets, we found that the biomarkers predicted the binary responder/non-responder outcome variable with high accuracy and consistency across datasets.

6. Is the work clearly and accurately presented and does it cite the current literature? Yes

7. Is the study design appropriate and is the work technically sound? Yes

8. Are sufficient details of methods and analysis provided to allow replication by others? Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

9. If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

A central point of this paper is that biological systems are complex, emergent, and nonlinear. The availability of omic data with overwhelming information content presents a challenge to conventional statistical analytic approaches. The need to model potential interactions between thousands of variables rather than individual variables is particularly daunting. While our methods and models reflect an approach that addresses this complexity, with an algorithm that can be verified, we have also presented the accuracy results in terms of the sensitivity and specificity metrics familiar to statisticians, and have also shown these metrics according to training, selection, and out-of-sample subsets of the dataset(s).

10. Are all the source data underlying the results available to ensure full reproducibility? Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

11. Are the conclusions drawn adequately supported by the results? Partly

It would be helpful to understand which conclusions are not perceived as adequately supported by the results, so that we can either elucidate or plan for additional studies.
Competing Interests: Author review. Kevin Horgan and Patrick Lilley have equity interests in Liquid Biosciences, the developer and owner of the software used in the study.
COMMENTS ON THIS REPORT

Author Response 11 Sep 2025

Kevin Horgan, Liquid Biosciences Inc., 26895 Aliso Creek Road B800, 92656, USA

11 Sep 2025

Author Response

Response to review from Daniel H. Solomon, Harvard Medical School, Boston, USA and
Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

1. The authors pointed out several genes of interest. However, ... Continue reading Response to review from Daniel H. Solomon, Harvard Medical School, Boston, USA and
Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

1. The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.

Authors answer: We thank you for your perceptive and helpful comments. The mRNA/transcription biomarkers’ data values were measurements of the state of a patient at a single baseline timepoint. Therefore, there is no increase or decrease to be reported, in a longitudinal sense. However, it may be that the reviewer intended to refer to the baseline differences between future responders and non-responders – usually reported as a difference of means with a p value. If this is the case, the comment and the aforementioned metrics have an underlying assumption, which is that the biomarkers would be consistently higher or lower in responders than non-responders. This assumption is based on each biomarker being an “independent variable” in a predictive model. Further, the difference of means and the associated p value metric, assume linearity and normal distributions. It turns out that none of these three assumptions are true in the datasets. A central point of the paper is that biomarkers are interacting in the dynamic biology of the patient, the inflammatory disease state, and exogenous factors like lifestyle and treatments. This is borne out in our algorithmic models: the interactions of the biomarkers carry far more predictive information than the levels of each biomarker in isolation. A minimum of four biomarkers are needed for an accurate, robust predictive model, and their measurements are demonstrably interacting mathematically in our model(s). A difference of means metric can be blind to the relationships between the biomarkers because it cannot capture interactions, nonlinearity, or manifold distributions. By contrast, our model does capture nonlinear interactions between the biomarkers, and does not depend on any consistent baseline differences in biomarker levels between responders and non-responders.

2. Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?

None of the eight biomarkers were eliminated. We discovered two distinct sets of four biomarkers, where each set contains the minimum number of biomarkers required for the best combination of accuracy and robustness: reproducibility across patient heterogeneity. We discovered the first minimal set of four, whose algorithmic model is described in three different but equivalent representations in the paper. We then set aside the first four, and repeated the analytic discovery process on the remaining thousands of biomarkers in the dataset. That repeat analysis yielded another set of four biomarkers with similar accuracy and robustness. We then validated all eight biomarkers, again in minimally required sets, across several validation datasets created from independent patient cohorts.

3. I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.

Placebo patients were considered because the discovery dataset was derived from a clinical trial with a treatment arm and a placebo arm, and our task was to predict response using the treatment/placebo variable plus biomarkers reflecting the baseline (pre-treatment) state of the patients. A key question in drug development and clinical use is whether treatment response surpasses placebo response. The ability to accurately predict individual responses to both treatment and placebo would be extremely useful in both settings. The sensitivity and specificity report is contained in the paper as Table 2. This includes the patient counts, in a slightly different format than a contingency table but with the same information content.

4. The authors need to comment on the heterogeneity of the cohorts.

All of the datasets were obtained from a publicly available repository. The various datasets were produced by different researchers, and we have cited the papers associated with each of the datasets, so that our readers can review subject selection criteria and subject composition information described in the original papers. Given the number of datasets we used, reproducing this information in our paper would be redundant and greatly increase the length of our paper without contributing to the key point we did include in the paper, which is that we used a diverse set of patient cohorts to discover and validate our predictive biomarkers. We state “These studies were done in different geographies, used different methodologies to process the samples and gene expression data, had different endpoints, and in some cases, used different anti-TNF therapies.”

5. Is a 14 week follow-up adequate?

Follow-up time frames can vary according to clinical trial and research practices, as can response metrics (e.g. DAS28, ACR50, etc.). We used datasets from seven studies described in peer-reviewed papers. These papers indicated that each of the datasets was derived from a competently constructed study using broadly accepted outcomes measures and treatment follow-up time frames, even with some differences across studies. Each of these studies were performed to derive clinically useful predictors of treatment response and six of the seven studies had a 14 week primary end point. Despite some variety across datasets, we found that the biomarkers predicted the binary responder/non-responder outcome variable with high accuracy and consistency across datasets.

6. Is the work clearly and accurately presented and does it cite the current literature?Yes

7. Is the study design appropriate and is the work technically sound? Yes

8. Are sufficient details of methods and analysis provided to allow replication by others?Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

9. If applicable, is the statistical analysis and its interpretation appropriate?I cannot comment. A qualified statistician is required.

A central point of this paper is that biological systems are complex, emergent, and nonlinear. The availability of omic data with overwhelming information content presents a challenge to conventional statistical analytic approaches. The need to model potential interactions between thousands of variables rather than individual variables is particularly daunting. While our methods and models reflect an approach that addresses this complexity, with an algorithm that can be verified, we have also presented the accuracy results in terms of the sensitivity and specificity metrics familiar to statisticians, and have also shown these metrics according to training, selection, and out-of-sample subsets of the dataset(s).

10. Are all the source data underlying the results available to ensure full reproducibility?Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

11. Are the conclusions drawn adequately supported by the results?Partly

It would be helpful to understand which conclusions are not perceived as adequately supported by the results, so that we can either elucidate or plan for additional studies.
Response to review from Daniel H. Solomon, Harvard Medical School, Boston, USA and
Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

1. The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.

Authors answer: We thank you for your perceptive and helpful comments. The mRNA/transcription biomarkers’ data values were measurements of the state of a patient at a single baseline timepoint. Therefore, there is no increase or decrease to be reported, in a longitudinal sense. However, it may be that the reviewer intended to refer to the baseline differences between future responders and non-responders – usually reported as a difference of means with a p value. If this is the case, the comment and the aforementioned metrics have an underlying assumption, which is that the biomarkers would be consistently higher or lower in responders than non-responders. This assumption is based on each biomarker being an “independent variable” in a predictive model. Further, the difference of means and the associated p value metric, assume linearity and normal distributions. It turns out that none of these three assumptions are true in the datasets. A central point of the paper is that biomarkers are interacting in the dynamic biology of the patient, the inflammatory disease state, and exogenous factors like lifestyle and treatments. This is borne out in our algorithmic models: the interactions of the biomarkers carry far more predictive information than the levels of each biomarker in isolation. A minimum of four biomarkers are needed for an accurate, robust predictive model, and their measurements are demonstrably interacting mathematically in our model(s). A difference of means metric can be blind to the relationships between the biomarkers because it cannot capture interactions, nonlinearity, or manifold distributions. By contrast, our model does capture nonlinear interactions between the biomarkers, and does not depend on any consistent baseline differences in biomarker levels between responders and non-responders.

2. Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?

None of the eight biomarkers were eliminated. We discovered two distinct sets of four biomarkers, where each set contains the minimum number of biomarkers required for the best combination of accuracy and robustness: reproducibility across patient heterogeneity. We discovered the first minimal set of four, whose algorithmic model is described in three different but equivalent representations in the paper. We then set aside the first four, and repeated the analytic discovery process on the remaining thousands of biomarkers in the dataset. That repeat analysis yielded another set of four biomarkers with similar accuracy and robustness. We then validated all eight biomarkers, again in minimally required sets, across several validation datasets created from independent patient cohorts.

3. I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.

Placebo patients were considered because the discovery dataset was derived from a clinical trial with a treatment arm and a placebo arm, and our task was to predict response using the treatment/placebo variable plus biomarkers reflecting the baseline (pre-treatment) state of the patients. A key question in drug development and clinical use is whether treatment response surpasses placebo response. The ability to accurately predict individual responses to both treatment and placebo would be extremely useful in both settings. The sensitivity and specificity report is contained in the paper as Table 2. This includes the patient counts, in a slightly different format than a contingency table but with the same information content.

4. The authors need to comment on the heterogeneity of the cohorts.

All of the datasets were obtained from a publicly available repository. The various datasets were produced by different researchers, and we have cited the papers associated with each of the datasets, so that our readers can review subject selection criteria and subject composition information described in the original papers. Given the number of datasets we used, reproducing this information in our paper would be redundant and greatly increase the length of our paper without contributing to the key point we did include in the paper, which is that we used a diverse set of patient cohorts to discover and validate our predictive biomarkers. We state “These studies were done in different geographies, used different methodologies to process the samples and gene expression data, had different endpoints, and in some cases, used different anti-TNF therapies.”

5. Is a 14 week follow-up adequate?

Follow-up time frames can vary according to clinical trial and research practices, as can response metrics (e.g. DAS28, ACR50, etc.). We used datasets from seven studies described in peer-reviewed papers. These papers indicated that each of the datasets was derived from a competently constructed study using broadly accepted outcomes measures and treatment follow-up time frames, even with some differences across studies. Each of these studies were performed to derive clinically useful predictors of treatment response and six of the seven studies had a 14 week primary end point. Despite some variety across datasets, we found that the biomarkers predicted the binary responder/non-responder outcome variable with high accuracy and consistency across datasets.

6. Is the work clearly and accurately presented and does it cite the current literature?Yes

7. Is the study design appropriate and is the work technically sound? Yes

8. Are sufficient details of methods and analysis provided to allow replication by others?Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

9. If applicable, is the statistical analysis and its interpretation appropriate?I cannot comment. A qualified statistician is required.

A central point of this paper is that biological systems are complex, emergent, and nonlinear. The availability of omic data with overwhelming information content presents a challenge to conventional statistical analytic approaches. The need to model potential interactions between thousands of variables rather than individual variables is particularly daunting. While our methods and models reflect an approach that addresses this complexity, with an algorithm that can be verified, we have also presented the accuracy results in terms of the sensitivity and specificity metrics familiar to statisticians, and have also shown these metrics according to training, selection, and out-of-sample subsets of the dataset(s).

10. Are all the source data underlying the results available to ensure full reproducibility?Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

11. Are the conclusions drawn adequately supported by the results?Partly

It would be helpful to understand which conclusions are not perceived as adequately supported by the results, so that we can either elucidate or plan for additional studies.
Competing Interests: Author review. Kevin Horgan and Patrick Lilley have equity interests in Liquid Biosciences, the developer and owner of the software used in the study.
Open Peer Review

Daniel H. Solomon, Harvard Medical School, Boston, USA

Kathryne Marks, Brigham and Women's Hospital, Boston, USA
Jean Roudier, INSERM, Luminy, France

Reviewer Report

10 Oct 2025 | for Version 1

Jean Roudier, INSERM, Luminy, France

Approved With Reservations

Is the work clearly and accurately presented and does it cite the current literature? No

Is the study design appropriate and is the work technically sound? Partly

Are sufficient details of methods and analysis provided to allow replication by others? No

If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Immunogenetics of rheumatoid arthritis

Reviewer Report

22 Aug 2025 | for Version 1

Daniel H. Solomon, Harvard Medical School, Boston, USA

Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

Approved With Reservations

Comments:

The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.
Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?
I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.
The authors need to comment on the heterogeneity of the cohorts.
Is a 14 week follow-up adequate?

Is the work clearly and accurately presented and does it cite the current literature? Yes

Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Partly

If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Partly

Competing Interests

No competing interests were disclosed.

Author Response

11 Sep 2025

Kevin Horgan, Liquid Biosciences Inc., 26895 Aliso Creek Road B800, 92656, USA

Response to review from Daniel H. Solomon, Harvard Medical School, Boston, USA and Kathryne Marks, Brigham and Women's Hospital, Boston, Massachusetts, USA

1. The authors pointed out several genes of interest. However, it was unclear if there was an increase in transcription or a decrease. This seems like an important detail to leave out.

Authors' answer: We thank you for your perceptive and helpful comments. The mRNA (transcription) biomarker values were measurements of the state of each patient at a single baseline timepoint. There is therefore no increase or decrease to report in a longitudinal sense. The reviewer may instead have intended to refer to the baseline differences between future responders and non-responders, which are usually reported as a difference of means with a p value. If so, the comment and those metrics rest on an underlying assumption: that each biomarker is consistently higher or lower in responders than in non-responders. That assumption treats each biomarker as an “independent variable” in a predictive model. Further, the difference of means and its associated p value assume linearity and normal distributions. None of these three assumptions holds in the datasets.

A central point of the paper is that biomarkers interact within the dynamic biology of the patient, the inflammatory disease state, and exogenous factors such as lifestyle and treatments. This is borne out in our algorithmic models: the interactions of the biomarkers carry far more predictive information than the level of each biomarker in isolation. A minimum of four biomarkers is needed for an accurate, robust predictive model, and their measurements demonstrably interact mathematically in our model(s). A difference of means metric can be blind to the relationships between the biomarkers because it cannot capture interactions, nonlinearity, or manifold distributions. By contrast, our model does capture nonlinear interactions between the biomarkers, and does not depend on any consistent baseline differences in biomarker levels between responders and non-responders.
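The point about interactions can be illustrated with a deliberately artificial sketch. It is not the authors' algorithm and uses no data from the paper: the two binary "biomarkers", the XOR-style response rule, and the names `patients`, `mean_difference`, and `interaction_accuracy` are all assumptions made for illustration. In this toy setting each marker shows zero difference of means between responders and non-responders, yet a rule that combines both markers classifies every patient correctly.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data: two binary marker levels (0 = low, 1 = high).
# Response is 1 only when exactly one marker is high (an XOR-style interaction).
patients = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(5)]

def mean_difference(index):
    """Difference in mean marker level between responders and non-responders."""
    responders = [p[index] for p in patients if p[2] == 1]
    non_responders = [p[index] for p in patients if p[2] == 0]
    return mean(responders) - mean(non_responders)

def interaction_accuracy():
    """Accuracy of a rule that uses both markers jointly (a XOR b)."""
    correct = sum(1 for a, b, y in patients if (a ^ b) == y)
    return correct / len(patients)

diff_a = mean_difference(0)   # marker A alone: no mean difference
diff_b = mean_difference(1)   # marker B alone: no mean difference
accuracy = interaction_accuracy()
print(diff_a, diff_b, accuracy)
```

Real expression data are continuous and noisy, so the contrast would rarely be this clean; the sketch only shows why a per-marker difference of means can miss a signal that lives in the combination.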

2. Were all 8 genes used in the final predictive model? If not, why not and how were some eliminated?

None of the eight biomarkers was eliminated. We discovered two distinct sets of four biomarkers, where each set contains the minimum number of biomarkers required for the best combination of accuracy and robustness (that is, reproducibility across patient heterogeneity). We discovered the first minimal set of four, whose algorithmic model is described in three different but equivalent representations in the paper. We then set aside the first four, and repeated the analytic discovery process on the remaining thousands of biomarkers in the dataset. That repeat analysis yielded another set of four biomarkers with similar accuracy and robustness. We then validated all eight biomarkers, again in minimally required sets, across several validation datasets created from independent patient cohorts.

3. I was a bit unclear on why placebo patients were being considered. I would find it helpful if the authors showed the 2x2 contingency table used to generate sensitivity and specificity.

Placebo patients were considered because the discovery dataset was derived from a clinical trial with a treatment arm and a placebo arm, and our task was to predict response using the treatment/placebo variable plus biomarkers reflecting the baseline (pre-treatment) state of the patients. A key question in drug development and clinical use is whether treatment response surpasses placebo response. The ability to accurately predict individual responses to both treatment and placebo would be extremely useful in both settings. The sensitivity and specificity report is contained in the paper as Table 2. This includes the patient counts, in a slightly different format than a contingency table but with the same information content.
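For readers who want the contingency-table view the reviewers asked about, the following minimal sketch shows how sensitivity and specificity follow from the four cells of a 2x2 table. The counts are hypothetical placeholders, not the values in the paper's Table 2, and the function name `sensitivity_specificity` is an illustrative assumption.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from 2x2 contingency-table counts.

    tp: responders predicted as responders
    fn: responders predicted as non-responders
    fp: non-responders predicted as responders
    tn: non-responders predicted as non-responders
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for illustration only, NOT taken from the paper.
table = {"tp": 18, "fn": 2, "fp": 3, "tn": 27}
sens, spec = sensitivity_specificity(**table)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Substituting the patient counts reported for each subset (training, selection, out-of-sample) would reproduce the corresponding metrics, assuming the counts map onto the four cells as labeled in the docstring.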

4. The authors need to comment on the heterogeneity of the cohorts.

All of the datasets were obtained from a publicly available repository. The various datasets were produced by different researchers, and we have cited the papers associated with each of the datasets, so that our readers can review subject selection criteria and subject composition information described in the original papers. Given the number of datasets we used, reproducing this information in our paper would be redundant and greatly increase the length of our paper without contributing to the key point we did include in the paper, which is that we used a diverse set of patient cohorts to discover and validate our predictive biomarkers. We state “These studies were done in different geographies, used different methodologies to process the samples and gene expression data, had different endpoints, and in some cases, used different anti-TNF therapies.”

5. Is a 14 week follow-up adequate?

Follow-up time frames can vary according to clinical trial and research practices, as can response metrics (e.g. DAS28, ACR50, etc.). We used datasets from seven studies described in peer-reviewed papers. These papers indicated that each of the datasets was derived from a competently constructed study using broadly accepted outcome measures and treatment follow-up time frames, even with some differences across studies. Each of these studies was performed to derive clinically useful predictors of treatment response, and six of the seven studies had a 14-week primary endpoint. Despite some variety across datasets, we found that the biomarkers predicted the binary responder/non-responder outcome variable with high accuracy and consistency across datasets.

6. Is the work clearly and accurately presented and does it cite the current literature? Yes

7. Is the study design appropriate and is the work technically sound? Yes

8. Are sufficient details of methods and analysis provided to allow replication by others? Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

9. If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.

A central point of this paper is that biological systems are complex, emergent, and nonlinear. The availability of omic data with overwhelming information content presents a challenge to conventional statistical analytic approaches. The need to model potential interactions between thousands of variables rather than individual variables is particularly daunting. While our methods and models reflect an approach that addresses this complexity, with an algorithm that can be verified, we have also presented the accuracy results in terms of the sensitivity and specificity metrics familiar to statisticians, and have also shown these metrics according to training, selection, and out-of-sample subsets of the dataset(s).

10. Are all the source data underlying the results available to ensure full reproducibility? Partly

All of the source data are available from a single public data source, as cited in our paper. Further, we have provided the predictive algorithm from the discovery dataset, in three different representations, so that any party can apply the algorithm’s calculations to the discovery dataset and verify the accuracy metrics we reported.

11. Are the conclusions drawn adequately supported by the results? Partly

It would be helpful to understand which conclusions are not perceived as adequately supported by the results, so that we can either elucidate or plan for additional studies.

Competing Interests

Author review. Kevin Horgan and Patrick Lilley have equity interests in Liquid Biosciences, the developer and owner of the software used in the study.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

1. Joyner MJ, Paneth N: Promises, promises, and precision medicine. J. Clin. Invest. 2019; 129(3): 946–948.

2. Barker AD, Compton CC, Poste G: The National Biomarker Development Alliance accelerating the translation of biomarkers to the clinic. Biomark. Med. 2014; 8(6): 873–876.

3. Barker AD, Alba MM, Mallick P, et al.: An Inflection Point in Cancer Protein Biomarkers: What was and What’s Next. Mol. Cell. Proteomics. 2023; 22(7): 100569.

4. Hopfield JJ: Physics, Computation, and Why Biology Looks so Different. J. Theor. Biol. 1994; 171: 53–60.

5. Nurse P: Life, logic and information. Nature. 2008; 454(7203): 424–426.

6. Nurse P: Complexity and Biology. Cell. 2014; 157(1): 272–273.

7. Berger SI, Iyengar R: Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip. Rev. Syst. Biol. Med. 2011; 3(2): 129–135.

8. Savic S, Caseley EA, McDermott MF: Moving towards a systems-based classification of innate immune-mediated diseases. Nat. Rev. Rheumatol. 2020; 16: 222–237.

9. Coffey DS: Self-organization, complexity and chaos: The new biology for medicine. Nat. Med. 1998; 4: 882–885.

10. Goldberger AL: Giles F. Filley lecture. Complex systems. Proc. Am. Thorac. Soc. 2006; 3(6): 467–471.

11. Rea TJ, Brown CM, Sing CF: Complex adaptive system models and the genetic analysis of plasma HDL-cholesterol concentration. Perspect. Biol. Med. 2006; 49(4): 490–503.

12. Mazzocchi F: Complexity in biology. Exceeding the limits of reductionism and determinism using complexity theory. EMBO Rep. 2008; 9(1): 10–14.

13. Herrington D, Wang Y: Clinical heterogeneity in the age of big data, advanced analytics and complexity theory. Trans. Am. Clin. Climatol. Assoc. 2023; 133: 56–68.

14. Wilkinson J, Arnold KF, Murray EJ, et al.: Time to reality check the promises of machine learning-powered precision medicine. Lancet Digit Health. 2020; 2: e677–e680.

15. Price WN: Big data and black-box medical algorithms. Sci. Transl. Med. 2018; 10: 471.

16. MacIsaac K, Baumgartner R, Kang J, et al.: Pre-treatment whole blood gene expression is associated with 14-week response assessed by dynamic contrast enhanced magnetic resonance imaging in infliximab-treated rheumatoid arthritis patients. PLoS ONE. 2014; 9(12): e111937.

17. Beals C, Baumgartner R, Peterfy C, et al.: Magnetic resonance imaging of the hand and wrist in a randomized, double-blind, multicenter, placebo-controlled trial of infliximab for rheumatoid arthritis: Comparison of dynamic contrast enhanced assessments with semi-quantitative scoring. PLoS ONE. 2017; 12(12): e0187397.

18. Lequerre T, Gauthier-Jauneau A, Bansard C, et al.: Gene profiling in white blood cells predicts infliximab responsiveness in rheumatoid arthritis. Arth. Res. Ther. 2006; 8: R105.

19. Julia A, Erra A, Palacio C, et al.: An eight-gene blood expression profile. PLoS ONE. 2009; 4: e7556.

20. Bienkowska J, Dagin G, Batliwalla F, et al.: Convergent random forest predictor: Methodology for predicting drug response from genome-scale data applied to anti-TNF response. Genomics. 2009; 94: 423–432.

21. Toonen EJ, Gilissen C, Franke B, et al.: Validation study of existing gene expression signatures for anti-TNF treatment in patients with rheumatoid arthritis. PLoS ONE. 2012; 7(3): e33199.

22. Nakamura S, Suzuki K, Iijima H, et al.: Identification of baseline gene expression signatures predicting therapeutic responses to three biologic agents in rheumatoid arthritis: a retrospective observational study. Arth. Res. Ther. 2016; 18: 159.

23. Tanino M, Matoba R, Nakamura S, et al.: Prediction of efficacy of anti-TNF biologic agent, infliximab, for rheumatoid arthritis patients using a comprehensive transcriptome analysis of white blood cells. Biochem. Biophys. Res. Commun. 2009; 387(2): 261–265.

24. Ramírez-Moya J, Wert-Lamas L, Acuña-Ruíz A, et al.: Identification of an interactome network between lncRNAs and miRNAs in thyroid cancer reveals SPTY2D1-AS1 as a new tumor suppressor. Sci. Rep. 2022; 12(1): 7706.

25. Li K, Kong R, Ma L, et al.: Identification of potential M2 macrophage-associated diagnostic biomarkers in coronary artery disease. Biosci. Rep. 2022; 42(12): BSR20221394.

26. Zhang J, Huang X, Wang X, et al.: Identification of potential crucial genes in atrial fibrillation: a bioinformatic analysis. BMC Med. Genet. 2020; 13(1): 104.

27. Kim S, Noh JH, Lee MJ, et al.: Effects of mitochondrial transplantation on transcriptomics in a polymicrobial sepsis model. Int. J. Mol. Sci. 2023; 24(20): 15326.

28. Zheng C, Yu X, Xu T, et al.: KCTD4 interacts with CLIC1 to disrupt calcium homeostasis and promote metastasis in esophageal cancer. Acta Pharm. Sin. B. 2023; 13(10): 4217–4233.

29. Colletti KS, Smallenburg KE, Xu Y, et al.: Human cytomegalovirus UL84 interacts with an RNA stem-loop sequence found within the RNA/DNA hybrid region of oriLyt. J. Virol. 2007; 81: 7077–7085.

30. Davis JM, Knutson KL, Strausbauch MA, et al.: Immune response profiling in early rheumatoid arthritis: discovery of a novel interaction of treatment response with viral immunity. Arthritis Res. Ther. 2013; 15(6): R199.

31. Cui J, Saevarsdottir S, Thomson B, et al.: Rheumatoid arthritis risk allele PTPRC is also associated with response to anti-tumor necrosis factor alpha therapy. Arthritis Rheum. 2010; 62(7): 1849–1861.

32. Plant D, Prajapati R, Hyrich KL, et al.: Replication of association of the PTPRC gene with response to anti-tumor necrosis factor therapy in a large UK cohort. Arthritis Rheum. 2012; 64(3): 665–670.

33. Poulsen TBG, Damgaard D, Jørgensen MM, et al.: Identification of Novel Native Autoantigens in Rheumatoid Arthritis. Biomedicine. 2020; 8(6): 141.

34. Lambert MR, Gussoni E: Tropomyosin 3 (TPM3) function in skeletal muscle and in myopathy. Skelet. Muscle. 2023; 13(1): 18.

35. Galea LA, Hildebrand MS, Witkowski T, et al.: ALK-rearranged renal cell carcinoma with TPM3::ALK gene fusion and review of the literature. Virchows Arch. 2023; 482(3): 625–633.

36. Ritter SY, Subbaiah R, Bebek G, et al.: Proteomic analysis of synovial fluid from the osteoarthritic knee: comparison with transcriptome analyses of joint tissues. Arthritis Rheum. 2013; 65(4): 981–992.

37. Lequerré T, Bansard C, Vittecoq O, et al.: Early and long-standing rheumatoid arthritis: distinct molecular signatures identified by gene-expression profiling in synovia. Arthritis Res. Ther. 2009; 11: R99.

38. Badr MT, Häcker G: Gene expression profiling meta-analysis reveals novel gene signatures and pathways shared between tuberculosis and rheumatoid arthritis. PLoS ONE. 2019; 14(3): e0213470.

39. Mellors T, Withers JB, Ameli A, et al.: Clinical validation of a blood-based predictive test for stratification of response to tumor necrosis factor inhibitor therapies in rheumatoid arthritis patients. Network and Systems Medicine. 2020; 3(1): 91–104.

40. Jones A, Rapisardo S, Zhang L, et al.: Analytical and clinical validation of an RNA sequencing-based assay for quantitative, accurate evaluation of a molecular signature response classifier in rheumatoid arthritis. Expert. Rev. Mol. Diagn. 2021; 21(11): 1235–1243.

41. Anderson C: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired 16.07. 2008. Retrieved Jan 13th 2025.

42. Pontzen A: The Universe in a Box: Simulations and the Quest to Code the Cosmos. Riverhead Books; 2023. 9780593330487.

43. Doyne Farmer J: Making Sense of Chaos: A Better Economics for a Better World. Yale University Press; 2024. 9780300273779.

44. Naughton J: Machine-learning systems are problematic. That’s why tech bosses call them ‘AI’. The Guardian. 2022. Retrieved June 22nd 2025.

45. Sipper M, Olson RS, Moore JH: Evolutionary computation: the next major transition of artificial intelligence? BioData Min. 2017; 10: 26.

46. Gleick J: The Information: A History, a Theory, a Flood. Pantheon Books (US); 2011. 978-0-375-423-72-7.

47. Robertson DS: Phase Change: The Computer Revolution in Science and Mathematics. Oxford University Press; 2003. 0195157486.

48. Zenil H, Schmidt A, Tenner J: Causality, information and biological computation: an algorithmic software approach to life, disease and the immune system. Chapter in From Matter to Life: Information and Causality. Walker SI, Davies PCW, Ellis G, editors. Cambridge University Press; 2017; pages 244–280. 9781107150539.

50. Arthur WB: Algorithms and the Shift in Modern Science. Beijer Discussion Paper Series No. 269. 2020.

51. Lim WA, Lee CM, Tang C: Design principles of regulatory networks: searching for the molecular algorithms of the cell. Mol. Cell. 2013; 49(2): 202–212.

52. Alon U: Network motifs: theory and experimental approaches. Nat. Rev. Genet. 2007; 8(6): 450–461.

53. Gallo E, De Renzis S, Sharpe J, et al.: Versatile system cores as a conceptual basis for generality in cell and developmental biology. Cell Syst. 2024; 15(9): 790–807.

54. Hartwell LH, Hopfield JJ, Leibler S, et al.: From molecular to modular cell biology. Nature. 1999; 402(6761 Suppl): C47–C52.

55. Rajapakse I: Conversation with Dr Steve Smale and Dr. Lee Hartwell. Not. Am. Math. Soc. 2021; 68(9): 1578–1582.

56. Angus DC, Huang AJ, Lewis RJ, et al.: The Integration of Clinical Trials With the Practice of Medicine: Repairing a House Divided. JAMA. 2024; 332(2): 153–162.

57. Sawyers C: The cancer biomarker problem. Nature. 2008; 452: 548–552.

58. Obermeyer Z, Emanuel EJ: Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N. Engl. J. Med. 2016; 375(13): 1216–1219.

Abstract

Background

Methods

Findings

Conclusions

Keywords

Introduction

Methods

Discovery cohort

Validation cohorts

Table 1. Discovery and validation studies of blood-based gene expression data of anti-TNF responsiveness in RA.

Microarray data

Analysis

Table 2. Discovery algorithm metrics.

Results

Discovery analysis: Derivation of MacIsaac et al. algorithm

Table 3. Discovery algorithm as series of operations.

Figure 1. Discovery algorithm as an equation (a) and in schematic form (b).

Table 4. Gene expression variables present in discovery algorithms.

Validation of discovery gene expression variables by analysis of six independent datasets

Table 5. Discovery and validation studies of blood-based gene expression data of anti-TNF responsiveness in RA.

Figure 2. Performance metrics of discovery algorithmic biomarker predictor compared to other predictors.

Discussion

Ethical considerations

Data availability

Reporting guidelines

References
