Characterizing gene expression in an in vitro biomechanical strain model of joint health [version 2; peer review: 1 approved with reservations, 1 not approved]

Background: Both genetic and environmental factors appear to contribute to joint health and disease. For example, pathological levels of biomechanical stress on joints play a notable role in initiation and progression of osteoarthritis (OA), a common chronic degenerative joint disease affecting articular cartilage and underlying bone. Population-level gene expression studies of cartilage cells experiencing biomechanical stress may uncover gene-by-environment interactions relevant to human joint health. Methods: To build a foundation for population-level gene expression studies in cartilage, we applied differentiation protocols to develop an in vitro system of chondrogenic cell lines (iPSC-chondrocytes). We characterized gene regulatory responses of three human iPSC-chondrocyte lines to cyclic tensile strain treatment. We measured the contribution of biological and technical factors to gene expression variation in this system. Results: We identified patterns of gene regulation that differ between strain-treated and control iPSC-chondrocytes. Differentially expressed genes between strain and control conditions are enriched for gene sets relevant to joint health and OA. Furthermore, even in this small sample, we found several genes that exhibit inter-individual expression differences in response to mechanical strain, including genes previously implicated in OA. Conclusions: Expanding this system to include iPSC-chondrocytes from a larger number of individuals will allow us to characterize and better understand gene-by-environment interactions related to joint health. sources of gene expression variation in the and their for quantitative genetic studies of gene regulation in the same incubator for the same period as a control. We performed three technical replicates of this experiment, starting with MSCs from the same cryopreservation batch and carrying out an independent differentiation of the MSCs to iPSC-chondrocytes in each replicate. We extracted bulk RNA from all biological and technical iPSC-chondrocyte treated and untreated replicates (n = 9). We also collected single cell RNA sequencing (scRNA-seq) data from one technical replicate (n = 3) using the 10X Genomics platform. consistent between individuals in the control condition but differs in magnitude or direction between individuals in the strain condition (in response to CTS). These heterogenous responses do not all involve differing magnitudes of changesin the same direction.Inthe caseof MMP14, while NA18855does notseem to respond toCTS throughaltering MMP14 expression, NA18856 responds through upregulation of this gene and NA19160 responds through down-regulation. Ifheterogenous responses toCTS such astheseexistmore broadly between individuals andare associated withgenotypicdifferences,theyshouldbeidentifiableinpopulation-leveleQTLstudiesusingthisexperimentalsystem. The Hung et al manuscript seeks to define the gene regulatory responses of three human iPSC lines that have been differentiated into ‘chondrocytes’ to cyclic tensile strain treatment. The premise is that measuring how chondrocytes from different individuals respond to mechanical stress could uncover gene-environment interactions that are relevant to human joint health. This is a question worthy of study – understanding how individual differences influence joint health could identify new therapeutic targets to treat osteoarthritis, an increasingly common and debilitating disease. The paper includes a lot of complex bioinformatic and statistical analyses of single cell RNAseq and bulk RNAseq data generated in this study and also includes new analyses of publicly available RNAseq data that, in the first instance, are designed to demonstrate that ‘the process of chondrocyte differentiation is robust to individual differences.’ It seems this complex analysis was undertaken because the cell populations being compared contain only a small proportion of chondrocytes. To draw robust conclusions about how chondrocytes respond to mechanical forces it is critical to start the study with a well-defined chondrocyte population. Hence, my reservations about this manuscript stem from the cell population that was studied. The study uses iPSCs from three individuals, 2 male and 1 female. The iPSCs were first differentiated to mesenchymal stem cells (MSCs) and after several passages the MSCs were cryopreserved at passage 5-7. A single batch of cryopreserved MSCs from each iPSC line was used for multiple technical replicate experiments where the MSCs were differentiated in chondrogenic medium for 14 days. The greatest source of variability in iPSC


Introduction
Disorders of the joints can often lead to pain and disability and have far-reaching impacts on quality of life. For example, osteoarthritis (OA) is a chronic degenerative joint disease characterized by defects in articular cartilage integrity and alterations to underlying bone structure. 1 OA is a major cause of disability in older adults and impacts approximately 300 million people worldwide. 2 There are no disease-modifying treatments for this painful disorder, and its specific pathogenic mechanisms are still under investigation.
Genome-wide association studies (GWAS) have identified over 86 genetic loci associated with OA risk. 3 Most of these loci fall within non-coding regions of the genome and have eluded functional characterization. Therefore, it remains unclear how associated genetic factors modulate OA onset and progression. One possibility is that regulatory changes in key structural and metabolic genes may modulate OA-related outcomes. Regulatory changes occurring in response to relevant environmental factors, including biomechanical stress, may be particularly important. Indeed, gene expression studies have identified broad patterns of gene expression that differ markedly between healthy and osteoarthritic human cartilage. 4,5 These gene expression differences reflect activation of biological pathways associated with joint disease, suggesting that studies of gene regulation in cartilage and other skeletal tissues are valuable for understanding normal joint health and joint disease and pathogenesis in joint conditions like OA.
However, few studies have measured gene regulatory phenotypes in human skeletal tissues or cells. Even the Genotype-Tissue Expression (GTEx) Project, one of the largest efforts to examine gene expression variation across human tissues and cell types, does not include samples from cartilage. 6 This is partially due to the practical limitations and ethical issues associated with collecting healthy, high-quality cartilage samples from human donors. Nevertheless, protocols to differentiate induced pluripotent stem cells (iPSCs) into cells relevant to joint health and disease, such as chondrocytes (the primary cells of cartilage), exist, 7,8 and these methods can circumvent some of the challenges associated with inaccessible primary tissues.
iPSC-derived cells also allow for the study of dynamic cellular responses to specific environmental conditions. It has become increasingly evident that studying gene regulation in disease-relevant states is crucial for understanding the genetic basis of disease. 9 Thus, numerous studies have begun identifying dynamic regulatory expression quantitative trait loci (eQTLs) in various cell types and contexts, including drug-induced cardiotoxicity, 10 cardiomyocyte differentiation, 11 vitamin D exposure, 12 and response to infection. [13][14][15][16][17] These studies highlight the merits of exploring gene regulation beyond steady-state conditions.
In human joints, biomechanical stress is a particularly relevant environmental condition. Joint health deteriorates in response to excessive or insufficient amounts of mechanical loading. [18][19][20][21][22] Further, biomechanical factors may impact gene expression regulation in joint tissues and may interact with genetic factors to impact risk for joint diseases. 23 Such interactions are difficult to examine in vivo. However, iPSC-derived chondrogenic cells (iPSC-chondrocytes) provide an alternative system in which to study the effects of cyclic tensile strain (CTS), a type of controlled biomechanical stress regimen designed to induce joint disease-like phenotypes. [24][25][26][27] Thus, iPSC-chondrocytes offer an opportunity to study gene expression responses to joint disease-relevant states. Studies of iPSC-chondrocytes may also help uncover mechanisms through which OA-associated genetic loci modulate OA outcomes.
Both chondrocyte differentiation protocols and methods for inducing CTS in vitro existed prior to this study. Still, little is known about the suitability of this system for studies of gene regulatory dynamics. Therefore, we designed a study using human iPSC-chondrocytes to examine the combined effects of genetic variation and biomechanical stress on gene regulation during CTS. Through this study, we evaluated whether the process of chondrocyte differentiation is robust to individual differences between three individuals. We also ascertained whether iPSC-chondrocytes exhibit a robust gene expression response to CTS. Finally, we determined whether expanding the sample size of this experimental system might further improve our understanding of gene-by-environment interactions within joint health.

REVISED Amendments from Version 1
In the revised version of the manuscript, we make minor additions and changes to address the feedback and suggestions of two reviewers. We add additional extended data figures that reflect further morphologic and gene expression characterization of iPSC-chondrocytes. We also expand the Discussion section to address sources of gene expression variation in the data and their implications for quantitative genetic studies of gene regulation in this system.
Any further responses from the reviewers can be found at the end of the article Methods Chondrogenic differentiation iPSCs in this study were previously generated from lymphoblastoid cell lines (LCLs) derived from three Yoruba individuals (NA18855, female; NA18856, male; and NA19160; male). 28 These LCLs were originally derived from individuals collected as part of the HapMap project. 29 Undifferentiated iPSCs were cultured on Matrigel-coated plates (Corning 356230) in Essential 8 (E8) medium at 37°C, 5% CO 2 , and atmospheric O 2 until iPSCs reached 30% confluency. E8 medium was subsequently changed to mesenchymal stem cell (MSC) culture medium, which consists of low glucose Dulbecco's Modified Eagle Medium (DMEM) with 20% stem cell-qualified fetal bovine serum (FBS), and 100 mg/mL Penicillin/Streptomycin. The MSC medium was changed every day for 3 days, at which point cells were 80-100% confluent. On day 3, cells were detached from the Matrigel-coated petri dishes using a 0.05% Trypsin/EDTA solution and cultured on uncoated polystyrene flasks in MSC medium. The medium was changed every 2-3 days until the cells reached 90% confluency. The cells were then sub-cultured at a ratio of 1:3 until passage 4, at which point cells were classified as iPSC-derived MSCs (iPSC-MSCs). iPSC-MSCs were cryopreserved with cryopreservation media (80% FBS, 10% MSC culture medium, 10% Dimethyl Sulfoxide) in liquid nitrogen at passage 5 to 7.
iPSC-MSCs were detached from culture flasks using 0.05% Trypsin/EDTA and seeded at a density of 250,000 cells/well onto the center of wells of BioFlex Type I Collagen coated 6-well Culture Plates (FlexCell International BF-3001C) using BioFlex cell seeders (FlexCell International). Cells were seeded using a regimen of 15% elongation for 2 hours followed by overnight culture in MSC culture medium. After seeding, cells were cultured in serum-free chondrogenic differentiation medium, 7 consisting of high glucose DMEM, 100 mg/mL Penicillin/Streptomycin, 50mg/mL L-Proline, 200 mM GlutaMax, 50 mg/mL L-Ascorbic acid-2-Phosphate, 11g/L Sodium pyruvate, 5mM Dexamethasone, 1x ITS Premix, and supplemented with 10 ng/mL TGF-β3. The chondrogenic medium was changed every 2-3 days for 14 days.
Standard phenotyping of iPSC-derived cells Flow cytometry of iPSC-MSCs was performed using the BD Biosciences Human MSC Analysis Kit (BD Biosciences 562245), in combination with the Zombie Violet Fixable Viability Kit (BioLegend 423113). The Human MSC Analysis Kit assesses the surface markers CD90, CD105, CD73, CD34, CD45, CD11b or CD14, CD19, CD79α, and HLA-DR. In each flow experiment, matched iPSCs from the same line as each iPSC-MSC were included as a negative staining control. Samples were run on a BD LSRII Special Order System machine at the University of Chicago Cytometry and Antibody Technology Core Facility.
iPSC-chondrocytes were fixed using 4% paraformaldehyde in phosphate-buffered saline (PBS) before staining using Alcian blue and Nuclear Fast Red. Alcian blue binds proteoglycans, which are found in connective tissue, particularly in cartilage. 30 Stained iPSC-chondrocytes and matched iPSC-MSCs from the same individuals were imaged using an Olympus dissecting microscope.
Immunostaining for COL2A1 iPSC-chondrocytes differentiated in chondrogenic media from MSCs for 14 days in either monolayer or pellet culture, MSCs, and primary cartilage tissue were fixed using 4% paraformaldehyde in PBS. iPSC-chondrocyte pellets were generated as in Nejadnik et al. (2015), 7 fixed in 4% paraformaldehyde in PBS, dehydrated sequentially in 15% sucrose in PBS and 30% sucrose in PBS, and then embedded in optimal cutting temperature compound (OCT). Primary human articular cartilage samples were obtained from a patient undergoing hip replacement surgery (University of Chicago BSD/UCMC IRB Protocol 19-0990). The IRB granted a waiver of informed consent for these samples as they were deidentified. Under sterile conditions, cartilage scrapings were obtained from the medial portion of the femoral head and cut into small pieces using a scalpel. Samples were washed with PBS twice and flash frozen. Primary cartilage tissues were then thawed and embedded in OCT. Cartilage tissue and iPSC-chondrocyte pellets were sectioned on a cryostat to a thickness of 18 μm and 5 μm respectively and sections were mounted on slides prior to staining. Cells and sections were immunostained using a rabbit Collagen II polyclonal primary antibody (Thermo PA5-85108, RRID:AB_2792256) and a secondary antibody HRP/DAB detection IHC kit (Abcam ab64261, RRID:AB_2810213). Immunostained cells and sections were imaged using an EVOS microscope under the brightfield setting.
Cyclic tensile strain regimen iPSC-chondrocytes were treated with a cyclic tensile strain (CTS) regimen that is known to induce an OA-like phenotype using the Flexercell FX6000 Tension System (Flexcell International). [24][25][26][27] Plates were loaded onto the Flexercell baseplate (located in an incubator at 37°C, 5% CO 2 , and atmospheric O 2 ), and a vacuum was used to deform the cell culture plate membrane and create uniform biaxial cyclic tensile strain. Specifically, 2.5% elongation (15kPa) of CTS was applied to the cells at a rate of 0.5 Hz for 24 hours.
Quantitative real-time reverse transcription PCR (RT-PCR) of chondrocyte hypertrophy-related marker genes RNA was extracted from iPSC-chondrocytes following CTS or control treatments using the ZR-Duet™ DNA/RNA MiniPrep kit according to manufacturer instructions (Zymo D7001). To quantify target gene expression of GUSB, MMP1, MMP13, and TIMP2, we used RT-PCR with a QuantStudio 6 Flex Real-Time PCR System and SYBR Green reagents according to manufacturer instructions (Applied Biosystems). The sequences of primers used for each marker gene are shown in Extended Data Table 9. 82 The cycle threshold (Ct) values were measured, and relative transcript levels were calculated for each target gene in each sample. The efficiency (E) of each PCR amplification reaction was calculated based on the slope of a linear curve of a series of dilutions of target DNA with known concentrations (E =10 (-1/slope) ). Data were plotted as relative expression, which is calculated as E (À ΔΔCt) , using GUSB as a housekeeping gene in all cases. 31 Droplet-based single cell RNA sequencing iPSC-chondrocytes were dissociated from adherent conditions into single cell suspension as follows: first, cells were rinsed once with 1X PBS. Then, 1mg/mL of collagenase II) in 1X HBSS was added to cell culture wells at room temperature for 5 minutes. The collagenase II was neutralized with MSC medium and removed before further processing of the cells. Cells were rinsed once again with 1X PBS. A 0.25% Trypsin/EDTA solution was added to wells at room temperature for 2 minutes until cells detached. The trypsin was neutralized with MSC culture medium, and the cells were pelleted at 1000 rpm for 5 minutes and resuspended in FBS Stain Buffer. Cells were counted separately for each sample and combined in equal proportions before loading into a Chromium Single Cell A Chip kit according to manufacturer instructions (10X Genomics, 120236). To ensure that collection batch, individual, and treatment conditions were not confounded, samples were pooled strategically. One GEM well of a Chromium single cell chip targeting a collection of 5000 cells contained NA19160 control, NA18856 control, and NA18855 strain-treatment cells. A second GEM well of the same Chromium single cell chip targeting a collection of 5000 cells contained NA19160 strain-treatment and NA18855 control cells. The cells collected from sample NA18856 strain-treatment were not processed due to viability issues. Single cell cDNA libraries were established following the 10x Genomics Chromium Single Cell protocol. 32 In brief, the RNA of the captured cells was released by lysis, barcoded via a reverse transcription process, and amplified to produce gene expression libraries. The libraries were sequenced to 100 base pairs, paired-end on one lane using the Illumina HiSeq4000 at the University of Chicago Genomics Core Facility according to manufacturer instructions.
Single cell data processing FastQC (RRID:SCR_014583) was used to confirm that the reads were of high quality. Using an in-house computational pipeline, we extracted 10X cell barcodes and UMIs from raw scRNA-seq reads and mapped remaining reads to genes in the hg38 genome using STARsolo from the STAR software with default parameters (version 2.6.1b, RRID: RRID: SCR_004463). 33 The software demuxlet was used to deconvolute sample identity of individual cell droplets and detect multiplets in multiplexed samples with default parameters. 34 Previously collected and imputed genotype data for the three Yoruba individuals from the HapMap and 1000 Genomes Project were used as input for demuxlet. 29,35 Processed gene count per cell barcode matrices were imported into R using the Seurat package (v3.2.0, RRID: SCR_007322). 36,37 Data were filtered to remove cells with fewer than 2000 UMIs detected and more than 10% of reads mapping to mitochondrial genes. Cells assigned as multiplets by demuxlet were also removed. A Uniform Manifold Approximation and Projection (UMAP) plot of the merged and unintegrated data shows that cells originating from the same individual cluster with each other (Extended Data Figure 11 82

).
Integration of individual level scRNA-seq data and characterization of cell clusters Filtered scRNA-seq data was integrated across individuals using Seurat. Cells that were assigned as singlets by demuxlet were treated as individual datasets. Specifically, we focused on just those datasets deriving from control cell culture conditions (n = 3), as opposed to strain-treated conditions (n = 2). Using Seurat, the SCTransform normalization function was applied to each of these datasets, and then datasets were integrated using integration anchors identified using the FindIntegrationAnchors function. Five-thousand features were selected as integration features for the SCT integration.
Seurat's FindClusters function was used with 38 gene expression principal components (PCs) and a resolution of 0.4 as parameters to perform unsupervised clustering of transformed and integrated data. Thirty-eight gene expression PCs were chosen by locating the elbow in an elbow plot of PCs. To characterize the resulting three clusters that emerged, a Poisson adaptive shrinkage model was fit to the raw count data from the cells in each pseudo-sample described above using the ashR package. 38 Poisson ashR models were fit separately for cell droplets assigned to each unsupervised cluster or separately for cell droplets from each individual. The cumulative density function of the inferred prior distributions for each of the fitted Poisson ashR models was plotted as in Sarkar and Stephens 2020, 39 for chondrogenic gene markers.
Topic modeling of single cell RNA sequencing data An unsupervised topic model with k = 7 topics was fit to the scRNA-seq raw count data from several published sources and data from iPSCs and iPSC-derived cell types generated by our laboratory. Briefly, single cell data from iPSCs, iPSC-MSCs, iPSC-chondrocytes, and iPSC-osteoblasts collected by our group from a single human cell line were combined with single cell data from primary human hepatocytes, 40 iPSC-chondrocytes from an iPSC-chondrogenic pellet timecourse, 41 primary human chondrocytes, 42,43 and the iPSC-chondrocytes from the present study.
First, the 10X Genomics Chromium Single Cell Gene Expression platform was used to collect scRNA-seq data from iPSCs, iPSC-MSCs, iPSC-chondrocytes, and iPSC-osteoblasts. All of these cells, which were derived from a single human cell line, were included in the topic modeling analysis. This human iPSC line was previously generated and characterized in our lab. 44 The same protocol used to generate other iPSC-MSCs in the current study was also used to generate iPSC-MSCs here, with the exception that DMEM:F-12 (Thermo fisher 11330032) was used instead of low glucose DMEM (See Methods). The same chondrogenic media formulation was used to differentiate iPSCchondrocytes here as in this current study. iPSC-osteoblasts were generated by culturing iPSC-MSCs in osteogenic differentiation medium, consisting of high glucose DMEM (Gibco 11965092), 100 mg/mL Penicillin/Streptomycin (Corning 30002Cl), 10% stem cell-qualified fetal bovine serum (FBS, Thermo fisher 10567014), 50ug/mL Vitamin C, 100nM Dexamethasone, 10mM β-glycerophosphate, and 1uM Vitamin D. The osteogenic medium was changed every 2-3 days. Our iPSC-chondrocyte and iPSC-osteoblast protocols each included a total of 21 days of differentiation in their respective media before isolation and data collection, compared to 14 days for iPSCchondrocytes in the current study.
Also included in the analysis were single-cell data from 3,490 hepatocytes published in MacParland et al., 2018, which were subset from a larger dataset of single cell results from whole liver homogenate. 40  Genes with non-zero counts in at least one cell in any of the six single cell datasets were included in the raw count matrix used to fit the topic model. A Poisson non-negative matrix factorization (NMF) model with 7 ranks was fit to the data using the fit_poisson_nmf function in the fastTopics R package with default parameters (v0.4.35). 45 After fitting the Poisson NMF model, the fitted loadings and factors matrices were rescaled to sum to a total of 1 across each barcode for the loadings matrix and across each gene for the factors matrix to convert the Poisson NMF model into a topic model. The rescaled loadings matrix became the topic probabilities, and the rescaled factors matrix became the word probabilities in the resulting topic model.
The diff_counts_analysis function in fastTopics was applied to the topic model to evaluate differential expression of individual genes in each topic. Briefly, the function calculates a β statistic, which represents the log-fold change in relative occurrence of a gene in a single topic compared to its occurrence in all other topics. The function also calculates a standard error and z-score for each β statistic based on a Laplace approximation to the likelihood at the MLE.
Bulk RNA extraction and sequencing RNA was extracted from cells following CTS or control treatments using the ZR-Duet™ DNA/RNA MiniPrep kit (Zymo D7001). RNA concentration and quality were measured using the Agilent 2100 Bioanalyzer. Library preparation was performed over two batches using the Illumina TruSeq RNA Sample Preparation Kit v2 (RS-122-2001& -2002. Samples were sequenced to 50 base pairs, single-end on one lane using the Illumina HiSeq4000 at the University of Chicago Genomics Core Facility according to manufacturer instructions. A minimum of 17,284,094 raw reads were generated per sample. We used FastQC to confirm that the reads were of high quality. One bulk RNA-seq sample was found to have a very low proportion of mapped reads (38.40%) and was excluded from subsequent analyses.
Quantifying the number of bulk RNA-seq reads mapping to genes Reads were mapped to the hg38 genome using STAR (version 2.6.1b). 33 Gene expression levels were quantified using the featureCounts function in Subread (v1.6.5 RRID:SCR_009803) using standard parameters. 46 All downstream processing and analysis steps were performed in R (v3.6.1, RRID:SCR_001905) unless otherwise stated.
Transformation and normalization of bulk RNA-seq reads Log2-transformed counts per million (CPM) were calculated from raw counts for each sample using the edgeR package (RRID:SCR_012802). 47 Lowly expressed genes were filtered such that only genes with an expression level of log2 (CPM) > 2.5 in at least 4 samples were kept for downstream analyses. For the remaining 10,486 genes, the raw read counts were normalized using the relative log expression (RLE) method to account for the median number of reads sequenced across samples.

Removing unwanted variation from bulk RNA-seq data
To account for batch effects arising between technical replicates before differential expression analysis, we modeled factors of unwanted variation using the RUVs correction method (RRID:SCR_006263) 48 with k = 2. RUVs is a method that uses technical replicate samples to estimate factors of unwanted variation from RNA-seq data. Individual-treatment pairs were constant within replicate blocks, which are used for the RUVs correction. RUVg is distinct from RUVs in that it uses negative control genes to estimate factors of unwanted variation from RNA-seq data rather than knowledge of technical replicate samples in the data.
Differential expression analysis with bulk RNA-seq data Differential expression (DE) was measured using a linear-model-based empirical Bayes method in the limma R package (RRID:SCR_010943). The voom function from the limma package was also used to calculate weights to account for the mean-variance relationship in the RNA-seq count data.
Replicate batch was modeled as a random effect while treatment, individual, two RUVs coefficients, and RIN score were modeled as fixed effects in the linear mixed model for DE comparisons as in equation (1). The ashR package 38 was used to perform multiple testing correction on the DE tests using an adaptive shrinkage method. Genes with an FDR-adjusted p value < 0.05 were considered DE.
Enrichment of DE genes in biological pathways and OA-related gene sets Using topGO (RRID:SCR_014798), we assessed enrichment of Gene Ontology (GO) biological processes among DE genes. A Kolmogorov-Smirnov test using ashR adjusted p-values was used for assessing enrichment of GO processes, and the top 20 most enriched terms were reported. To test for enrichment of sets of OA-related genes in our DE genes, 3,5 a Fisher's exact test was used. In all enrichment tests, the background gene set was the complete set of genes tested for DE in our analyses (n = 10,486 genes).
There is the possibility that the enrichment of DE genes for GO categories and outside gene sets is driven by differential power due to some genes having higher expression in our samples. We therefore repeated the DE and enrichment analyses ten times, permuting the treatment condition labels for the samples each time.
Analysis of sources of variation in bulk RNA-seq data Principal component analysis (PCA) was performed on the normalized log2(CPM) values from above. A linear regression analysis was then performed between each of the top 5 PCs and several biological and technical variables. These variables included number of reads sequenced, library preparation batch, RIN score, treatment condition, replicate, and individual. P values from the regression were corrected using the Benjamini Hochberg (BH) procedure. Results with a BH-adjusted p value < 0.05 were considered significant.
The variancePartition (RRID:SCR_019204) package was applied to the filtered and RLE-normalized CPM values. 49 variancePartition uses a linear mixed model to quantify the contribution of variance from different sources. Our linear mixed model included variation due to individual cell line, treatment status, replicate batch, and library preparation batch. In addition, a single coefficient of unwanted variation was determined using the RUVg correction method 48 with k = 1; this coefficient was also included in the model. The RUVg correction method estimates factors of unwanted variation in RNA-seq data through negative control genes, which have the lowest variation in expression between samples. The 100 least variable genes in the data ranked by coefficient of variation were used as the set of control genes for the RUVg correction.
Power curves for expression QTL (eQTL) and dynamic eQTL mapping To ascertain the power to detect eQTLs and dynamic eQTLs across a range of sample sizes and standardized effect sizes, we followed the example presented in Ward et al., 2021. 50 In brief, for the power analysis we assumed a simple linear regression for eQTL mapping and a conservative Bonferroni correction for multiple testing (FWER = 0.05). Standardized effect sizes are defined as the true additive effect size of genotype on gene expression divided by the phenotype standard deviation. To estimate the false positive rate of calling a dynamic eQTL, we computed the probability of a SNP being called as significant in only one of the two treatment conditions, assuming the standardized effect size was in fact identical in both conditions.

Reanalysis of previous dynamic eQTL studies
We used summary statistics from eQTL mapping in three prior dynamic eQTL studies 13,50,51 to determine standardized effect sizes for eQTL association tests in each treatment condition. Briefly, p values from association tests were converted into Z-scores using the appropriate quantile function. Z-scores were then converted to standardized effect sizes by adjusting for the square root of the sample size of the study. For summary statistics from Alasoo et al., 2018, and Caliskan et al., 2015, an adaptive shrinkage model was fit to the distribution of effect sizes and standard errors using ashR. 38 The ashR posterior estimates of effect sizes and standard deviations were used to compute the standardized effect size. Standardized effect size thresholds for at least 0.8 power under a sample size of 10, 30, 58, or 100 individuals were determined as described above. The number of genes with at least one association test that meets each of these thresholds in each condition were tabulated. Empirical distribution functions were fit to the distributions of the standardized effect sizes from each condition in each of the three studies. The 99 th percentile of these standardized effect sizes was determined from the empirical distribution function.

Analysis code
The analysis code used in this study is available on GitHub and is archived with Zenodo. 83

Results
We designed this study to determine whether iPSC-chondrocytes are a useful system for studying gene-by-environment regulatory interactions relevant to joint health. First, we asked whether the efficiency of chondrocyte differentiation is similar in different individuals. Next, we evaluated the effects of CTS on gene regulation in iPSC-chondrocytes to determine whether this system is suitable for studying gene regulatory effects on joint health. Finally, we estimated the contribution of sample and batch effects to variation in gene expression response to CTS, to assess the suitability of our iPSC-chondrocyte system for response eQTL mapping studies.
Study design and data collection in the iPSC-based in vitro system We used three human iPSC lines that were previously established and characterized as part of a panel of iPSCs derived from Yoruba individuals. 28  We treated iPSC-chondrocytes from each individual with 24 hours of CTS, which is known to induce a hypertrophic phenotype in cartilage [24][25][26][27] ; Methods). We simultaneously kept a second, matched set of untreated iPSC-chondrocytes in the same incubator for the same period as a control. We performed three technical replicates of this experiment, starting with MSCs from the same cryopreservation batch and carrying out an independent differentiation of the MSCs to iPSCchondrocytes in each replicate. We extracted bulk RNA from all biological and technical iPSC-chondrocyte treated and untreated replicates (n = 9). We also collected single cell RNA sequencing (scRNA-seq) data from one technical replicate (n = 3) using the 10X Genomics platform.
We could not detect differential expression between treatment conditions for markers of chondrocyte degeneration and hypertrophy in the bulk RNA sequencing data. This is because these genes are expressed at a low level across all samples and are filtered out in pre-processing steps. To evaluate the changes in the expression of these gene markers between treatment conditions more sensitively, we performed quantitative real-time reverse transcription PCR (RT-PCR) on control and strained iPSC-chondrocytes. We detected modest increases in expression of gene markers of chondrocyte hypertrophy MMP1 and MMP13 in response to CTS, 25,53 while gene marker TIMP2 did not change in expression, in line with previous reports (Extended Data Figure 3, Extended Data Table 1 82 ). 25 iPSC-chondrocytes likely represent an early stage of chondrogenesis As a next step in our analysis, we confirmed that iPSCs successfully differentiated to chondrogenic cells.  In addition to discerning heterogeneity in our samples, we also evaluated the relative maturity of our iPSC-chondrocytes. To do this, we used topic modeling, analyzing our single cell data along with single cell datasets from multiple cell types, including primary adult chondrocytes (Methods). Topic modeling is an unsupervised classification approach that, when applied to single cell gene expression data, allows one to find recurring patterns of gene expression, or topics, present across a collection of cells. By allowing each cell to have grades of membership in multiple topics simultaneously, rather than assigning cells to only one cluster, 57 topic modeling can identify both discrete and continuous variation between cells.  retain a large proportion of gene expression patterns characteristic of iPSC-MSCs (topic 1), but they also possess certain gene expression patterns seen in adult primary chondrocytes and in iPSC-chondrocytes differentiated through a chondrogenic pellet (topics 4, 6, and 7) (Figure 2d). Topics 4-7 display relatively high levels of expression of several markers of chondrogenesis or cartilage fate, including SOX9, SOX5, SOX6, 58 and COL9A1 59 (Extended Data Figure 5a 82 ). Furthermore, differential expression analyses across topics identified type IX collagen gene COL9A3 59 as highly occurring in topic 7 relative to all other topics (Extended Data Figure 5b 82 ). Similarly, in topics 4 and 6, the chondrogenic marker COMP 60 has a high occurrence relative to all other topics. Thus, we conclude our iPSC-chondrocytes are likely in the early stages of chondrogenesis and are readily distinguishable from iPSCs and iPSC-MSCs.
Analysis of bulk RNA sequencing data After confirming we can generate chondrogenic cells from iPSCs, we next sought to understand gene expression variation in this system. For this we focused on bulk RNA-seq data collected from all replicates. We generated an average of 22.3M raw reads per sample (s.d. 4M reads). We excluded one sample from further analyses because it displayed a particularly low percentage of mapped reads (Extended Data Table 2 82 ). We note the mapped reads from this sample cluster as expected with other technical replicates from the same individual and treatment (Extended Data Figure 6 82 ), but we still excluded it because it failed standard QC metrics. We filtered the remaining data for lowly expressed genes and standardized gene counts with respect to library size (Methods).
As a first step of our analysis of the bulk data, we identified gene expression responses to strain treatment. We used the limma R package to fit a linear mixed model for each of the 10,486 expressed genes in the filtered bulk RNA-seq data, accounting for the random effect of experimental batch and the fixed effects for individual cell line, sex, treatment status, two factors of unwanted variation, and RIN score (Methods). Using this model, we tested for differential expression between treated and untreated cultures. At an FDR of 0.05, 987 genes are significantly differentially expressed (DE) between treated and untreated chondrogenic cells (Figure 3a-b; Extended Data Table 3 82 ).
To evaluate the potential relevance of the DE genes to joint health and OA, we first considered their enrichment among gene ontology (GO) terms. The top 20 most highly enriched GO biological process terms include those related to ECM organization and metabolism of ECM structure (Figure 3c). These functions make intuitive sense given that ECM homeostasis is important for joint cartilage health; moreover, imbalances in this homeostasis are associated with OA. 61,62 We also determined whether the DE genes may be overrepresented in gene sets previously implicated in OA. We examined results from one of the largest GWAS for OA susceptibility, which identified 64 independent significant associations with OA. 3 We used a Fisher's exact test to assess enrichment of DE genes among a set of 553 genes located within 500 kb of the 64 associated loci. These 553 genes were also identified as having prior evidence of involvement in animal models of skeletal disease or human bone diseases. 3 We found that DE genes in our study are significantly enriched within the set of 553 genes previously associated with OA (p = 0.002; Extended Data Table 4 82 ).
Next, we evaluated results from a separate study, which profiled mRNA and protein samples in low-grade and high-grade osteoarthritic cartilage from 115 patients undergoing joint replacement. 5 Steinberg et al., 2019 found 409 genes with evidence of significant differential expression between patients with low-grade and high-grade osteoarthritic cartilage, at both the RNA and protein levels. Though causality is difficult to infer, this observation suggests at least a subset of these genes is involved in OA cartilage degradation (Extended Data Table 5). A Fisher's exact test reveals that our DE genes are also significantly enriched among this gene set (p = 0.02). Of note, the two genes that overlap between the two external gene sets, LTBP3 and LAMC1, are also DE our study. LTBP3 is a regulator of the TGF-ß signaling family, which plays roles in cartilage formation and development. 63 LAMC1 has been identified as a blood-based biomarker for detecting mild knee OA, with lower RNA expression identifying mild OA. 64 Based on these GO and gene set enrichment results we concluded that the DE genes identified between strain-treated and control iPSC-chondrocytes are relevant to joint health.
Due to differential power, highly expressed genes are more likely to be detected as DE than lowly expressed genes in any RNA sequencing dataset. 65 Therefore, it is possible the magnitude of expression of different genes in our data can explain our enrichment results. To assess the robustness of our findings, we permuted the labels of treatment condition among our samples and re-performed DE and enrichment analyses a total of ten times (Extended Data Table 6 82 ). In nine permutations, we failed to identify ECM-related GO terms among the top 20 enriched terms (one permutation revealed two ECM-related GO terms). Further, we did not find any enrichment of DE genes within the two OA-relevant gene sets using permuted data. As our permuted data do not display the same enrichment patterns as our actual data, we concluded our results are not due to differential power to detect DE.

Certain gene expression responses to stress are heterogeneous between individuals
In our DE analysis, we focused on identifying inter-treatment differences in gene expression rather than inter-individual differences. Ultimately, we would like to use this system to study gene-by-environment interactions, which occur at the intersection of inter-treatment and inter-individual differences. A gene-by-environment interaction occurs when the magnitude or direction of gene expression response to an environmental stimulus is associated with an individual's genotype at a particular locus. The sample size of this current study is far too small to detect gene-by-environment interactions. Still, we identified several genes that exhibit inter-individual differences in expression in response to CTS (Figure 4).
For example, MMP14 displays a different pattern of expression in each cell line before and after CTS (Figure 4a): MMP14 expression remains constant between control and strain-treated NA18855 cells, is upregulated in strain-treated NA18856 cells, and is downregulated in strain-treated NA19160 cells. MMP14 is expressed constitutively in adult joint cartilage and upregulated in diseased states. 66 In addition, EXOSC8 and COPG1 (Figure 4b-c) are both involved in the formation of secretory vesicles originating from the Golgi complex. These genes also display differences in direction or magnitude of gene expression response to CTS between individuals. If heterogenous responses to biomechanical stress exist more broadly and are associated with genotypic differences, this experimental system will be able to identify them in population-level eQTL studies.

Sources of variation in bulk RNA sequencing data
Thus far, our results show that CTS elicits a robust, joint health-relevant gene expression response in iPSC-chondrocytes, and that, anecdotally, this response can differ between individuals. Next, we sought to more generally evaluate the utility of this system for studying the effects of genetic variation and biomechanical stress on gene regulation. Specifically, we considered dynamic expression quantitative trait loci (dynamic eQTLs), which are genetic variants associated with a change in gene expression in response to a treatment. For our system to be useful in identifying dynamic eQTLs, individual differences should drive a substantial amount of the variation in gene expression response to the treatment. To quantify the contribution of individual differences to gene expression variation in iPSC-chondrocytes, we estimated how much of this variation is attributable to technical and biological factors. Our study design allows us to do so, as we collected bulk RNA-seq data from three independent technical replicates of each cell line. As each technical replicate used MSCs from the same cryopreservation batch from each individual, we are not able to differentiate between variation introduced by individual level differences and variation introduced by differences in iPSC-MSC differentiation. Thus, these two sources of variation are captured together by the single variable of "individual" in our analysis.  Table 7 82 ). Our results indicate the primary source of gene expression variation is individual (regression of PC1 by individual, p = 1.34 Â 10 -7 , regression of PC2 by individual, p = 4.61 Â 10 -4 ). The second largest source of variation in the data is sex (regression of PC2 by sex, p = 2.67 Â 10 -4 ), which is unsurprising given our study included one female and two male cell lines. Although treatment shows a minor correlation with PC2 (R 2 = 0.14), PC3 (R 2 = 0.14), and PC4 (R 2 = 0.3), none of these correlations are statistically significant.
Encouragingly, we did not find technical replicate (or 'batch') to be significantly associated with any of the first five PCs in the data. Nevertheless, we took advantage of our replicated experimental design to account for two factors of unwanted technical variation in the data 48  Next, we took a more systematic approach to modeling the contribution of biological and technical factors to gene expression variation. Our goal was to leverage the total amount of variation in our data rather than focusing only on a few major axes of variation, as in the PCA above. We quantified the contributions of several experimental variables to gene expression variation on the level of individual genes using a linear mixed model (Methods). To do so, we modeled a single factor of unwanted variation in the data by using a set of 100 genes with the least amount of variation in the data as negative control genes 48 (Methods). We then included the filtered and normalized gene expression data and this single factor of unwanted variation in the model (Methods; Figure 5c).
We determined that individual cell line contributes the largest amount of variance to the data (median of 42% variance explained). The additional factor of unwanted variation explains a median of 3.6% of the variance, and treatment explains a median of 3.5% of the variance. In contrast, technical replicate batch and cDNA library preparation batch explain a (a-c) MMP14, COPG1, and EXOSC8 each demonstrate inter-individual differences in gene expression in our dataset. Dot plots are the expression level (log2 cpm) of each gene in each sample, with each individual and treatment condition plotted separately. Lines represent AE one standard deviation. For each candidate gene, expression is relatively consistent between individuals in the control condition but differs in magnitude or direction between individuals in the strain condition (in response to CTS). These heterogenous responses do not all involve differing magnitudes of changes in the same direction. In the case of MMP14, while NA18855 does not seem to respond to CTS through altering MMP14 expression, NA18856 responds through upregulation of this gene and NA19160 responds through downregulation. If heterogenous responses to CTS such as these exist more broadly between individuals and are associated with genotypic differences, they should be identifiable in population-level eQTL studies using this experimental system. negligible amount of variance (median of 8.7 Â 10 -7 % and 3.5 Â 10 -7 % variance explained, respectively). We observed similar results when running a model that did not include the factor of unwanted variation (Extended Data Figure 10 82 ). Therefore, the biological variables of individual cell line and treatment contribute more to gene expression variation than technical variables. However, unwanted variation still seems to contribute to gene expression variation. Therefore, gene expression studies using this system should account for potential latent sources of variation.

A power analysis
Our results are encouraging for our goal of verifying the feasibility of using iPSC-chondrocytes to study gene-byenvironment interactions in joint health. One possible way to study these interactions would be to use this system to map static eQTLs and dynamic response eQTLs (i.e., eQTLs that emerge in response to CTS). We conducted a power analysis to determine the potential impact of expanding this system to include 58 iPSC lines (Figure 6; we chose n = 58 because this is the number of YRI iPSCs available to us). Under the assumptions of a simple linear regression to map eQTLs and a conservative Bonferroni correction for multiple testing (FWER = 0.05; Methods), we estimated that a sample of 58 individuals will provide 80% power to detect eQTLs with a standardized effect size of 0.7 in each of the control and treatment conditions. At this effect size, the power to detect eQTLs comes with a correspondingly low FDR (0.22).
To contextualize these results, we reanalyzed eQTL summary statistics from a set of previous dynamic eQTL studies that come from a variety of different research contexts 13,50,51 (Figure 6; Extended Data Table 8 82 ). In each of these studies, hundreds to thousands of genes in each treatment condition have at least one eQTL which meets the standardized effect size threshold of 0.7 above. While none of these examples perfectly recapitulates the results of our system, the fact these estimates are conservative and come from eQTL studies in three different stimulus conditions demonstrates the potential effectiveness of this approach.

Discussion
We conducted a study to establish the feasibility of an in vitro iPSC-chondrocyte model for studying gene-by-environment interactions in joint health. Gene-by-environment interactions, particularly those related to biomechanical stress, may play a role in pathogenesis in joint related diseases such as OA. However, numerous ethical and logistical obstacles limit the study of these interactions and their effects on gene regulation in human chondrocytes. iPSC-chondrocytes may be a suitable alternative to circumvent these obstacles when paired with in vitro CTS models. Overall, our in vitro system allows for both the precise control of iPSC-chondrocyte environmental exposures and measurement of gene expression responses relevant to human joint health. While no in vitro model can completely accurately mimic in vivo disease, our results demonstrate this system has tremendous potential to increase our understanding of human joint health.
iPSC-chondrocytes are a valuable system to address the current lack of gene expression studies in human joint cells. Although iPSC-chondrocytes do not completely emulate mature, primary human chondrocytes, they do exhibit protein and gene expression patterns characteristic of both adult chondrocytes and developing chondrocytes. The relatively early differentiation stage of our cells may be due to a variety of factors, including a shorter differentiation time and the culturing of cells as a monolayer as opposed to a 3-dimensional pellet. 67 Nevertheless, iPSC-chondrocytes provide a unique opportunity to learn about gene regulation in human joints and the basis of adult joint disease phenotypes. For instance, the ability to generate cells along the trajectory between iPSC-MSCs and mature chondrocytes allows for gene expression studies at a level infeasible with human primary tissues. Furthermore, prior studies have shown that studying iPSC-derived cells can uncover potentially important and transient forms of gene regulation masked in terminal cell types. 11 Our results point to a robust gene expression response to CTS in iPSC-chondrocytes. We detected 987 DE genes in our study between treated and control cultures. These DE genes are enriched for gene sets relevant to joint health and OA. Thus, our results highlight the potential of this system as a platform for gene expression studies of human joint cells that circumvent the limitations of primary tissues. Our observations also suggest that studying gene regulation in this relatively simple system may provide insight into more complex biological processes relevant to human joint disease.
One potential cause for reservation in our GWAS analysis is the mismatch in population ancestries between our DE results (African ancestry) and OA GWAS results (European ancestry). However, prior studies have found that genetic associations between causal variants and complex traits are largely shared between populations. 68-70 Furthermore, Figure 6. Power analysis for eQTL and dynamic eQTL study with two conditions. Power curves are derived under the assumption of a simple linear regression for expression quantitative trait locus (eQTL) mapping and plotted over standardized effect sizes (effect sizes divided by the phenotype standard deviation) for a range of sample sizes. Dynamic QTL false positive rates are computed as the probability of a SNP being called as significant in only one of two treatment conditions, assuming the standardized effect size was in fact identical in both conditions. The horizontal red line represents a power to detect eQTLs of 0.80. Vertical transparent lines represent the 99 th percentile of the standardized effect size estimated from an empirical cumulative distribution function fit to eQTL summary statistics from 3 published dynamic eQTL studies from different contexts, with the mean value over all the conditions in each study plotted. our analyses focus on the genes implicated in the OA GWAS results, rather than the causal variants themselves. As such, we expect the relevance of these gene sets to carry over more faithfully between populations. Finally, the population mismatch would likely have the effect of biasing our results towards the null rather than introducing false positive findings. Therefore, we believe the observed enrichment is meaningful despite the current lack of equivalent OA GWAS results from a more comparable African ancestry population.
We acknowledge that in vitro CTS models do not directly mimic the compressive biomechanical stress felt by joint chondrocytes in vivo. However, CTS models are recognized as a valid method for studying the effects of extraphysiological stresses in cultured cells. Others have previously used CTS models to measure responses of joint cells to controlled biomechanical stress treatments. [24][25][26][27][71][72][73][74][75] Other groups have also developed models that use specific types and patterns of biomechanical strain to induce transcriptional and biochemical changes characteristic of early human OA. [24][25][26][27] Our RT-PCR and bulk RNA-seq results further attest to the utility of CTS as a model of biomechanical stress in studies of human joint health.
We also found that an in vitro iPSC-chondrocyte system may be useful for studying the effects of genetic variation on gene regulation; moreover, it offers a way to study how biomechanical stress interacts with genetic factors to affect gene regulation. Indeed, individual-level differences drive a substantial amount of gene expression variation in this system. Furthermore, previous multi-species studies of regulatory sequence variation in cartilage have found a high degree of constraint in these sequences, but that this conservation is violated in human-specific changes, which may bring about OA. 76 The presence of associations between sequence variation in human non-coding genomic regions and OA further supports the importance of gene regulation in this disease. 3 Therefore, eQTL and dynamic eQTL studies, including those carried out in iPSC-chondrocytes, may be fruitful for explaining the connection between human genetic variation and OA or joint health.
We identified specific differences in the gene expression response to CTS between individuals in this study. Additionally, one individual (NA18856) demonstrates a stronger transcriptional response to CTS than that found in the other two individuals in this study, and this stronger response is consistent across three replicates. We anticipate that there will be similar levels of heterogeneity in the severity of response to CTS in other individuals not included in this study. We further expect that these stronger responses represent a difference in the severity, but not the nature, of the gene expression changes during CTS. Therefore, we do not expect this heterogeneity to limit the ability of this system to map eQTLs in future studies, especially it is driven by genotype. As such, iPSC-chondrocytes may be fruitful for uncovering gene-by-environment interactions involved in pathogenesis of joint diseases.
Investigating dynamic and context-specific gene regulatory effects may reveal the mechanisms contributing to joint disease development and progression, as this approach has been successfully applied to a variety of other cell types and trait contexts. 10,[13][14][15][16][17]51,77 Previous studies have found that dynamic eQTLs are more enriched for relevant significant GWAS alleles than non-dynamic ('standard') eQTLs, which show consistent effects between conditions. 10,11,14,16 Our power analysis suggests that a study with a few dozen individuals may grant sufficient power to detect many static and dynamic eQTLs. Dynamic eQTLs may be more useful for identifying candidate susceptibility genes in joint diseases than steady state eQTLs, and they may also improve our understanding of gene-by-environment interactions related to joint health and disease.
Future studies using the iPSC-chondrocyte system should account for the possibility that transcriptional heterogeneity between and within individual iPSC-chondrocyte lines may confound association results in an eQTL study. Our analysis of the scRNA-seq data from control iPSC-chondrocytes suggests that differentiation efficiency does not differ substantially between individuals. Nonetheless, it is possible that differentiation efficiency may differ for other individuals not included in this study. There may also still exist transcriptional heterogeneity between iPSC-chondrocytes in their response to CTS that bulk RNA-seq would not adequately capture. Measuring and accounting for transcriptional heterogeneity in iPSC-chondrocytes will also allow future gene expression studies to focus specifically on iPSCchondrocytes, which represent only a minority of cells in each culture. This will increase power to detect both standard and dynamic eQTLs.
Our study does not distinguish between gene expression variation introduced by individual level differences and variation introduced by differences in iPSC-MSC differentiation. However, prior work demonstrates that gene expression patterns between separate iPSC-MSC lines differentiated independently from the same individual are highly consistent, and this consistency persists across species and across variations of MSC differentiation media. 78 Therefore, we anticipate that iPSC-MSC differentiation does not contribute significantly to gene expression variation on the individual level.
Our iPSC-chondrocyte system also facilitates investigations beyond those involving only human cells. The existence of panels of human and nonhuman primate iPSCs 44 introduces the possibility of inter-species comparisons of response to CTS. Comparative studies may help uncover gene-by-environment interactions that contribute to the differential prevalence of OA and other joint diseases observed across primate species. [79][80][81] We believe the in vitro iPSC-chondrocyte CTS model shows great promise when applied to studies of gene expression in human joints. We hope such a system enables future studies of gene regulation in joint cells and their connections to joint health and disease. This project contains the following underlying data:

Data availability
-Extended Data Table 1. Raw RT-PCR amplification -Extended Data Table 2. Bulk RNA sequencing library metadata and quality control metrics.
-Extended Data Table 3. Differential expression results from limma analysis between strain-treated and control samples. Only genes that passed filters for low expression were kept.
-Extended Data Table 4. Genes from Tachmazidou et al., 2019 -Extended Data Table 5. Genes with significant cross-omics differences between high-grade and low-grade cartilage in Steinberg et al. 2019.
-Extended Data Table 6. Results from 10 random permutations of sample treatment condition labels.
-Extended Data Table 7. Full results of linear regression analysis between experimental variables and top 5 principal components in the bulk RNA sequencing data.
-Extended Data Table 8. Results from reanalysis of prior dynamic eQTL studies for eQTL power analysis.
-Extended Data The Hung et al manuscript seeks to define the gene regulatory responses of three human iPSC lines that have been differentiated into 'chondrocytes' to cyclic tensile strain treatment. The premise is that measuring how chondrocytes from different individuals respond to mechanical stress could uncover gene-environment interactions that are relevant to human joint health. This is a question worthy of study -understanding how individual differences influence joint health could identify new therapeutic targets to treat osteoarthritis, an increasingly common and debilitating disease.
The paper includes a lot of complex bioinformatic and statistical analyses of single cell RNAseq and bulk RNAseq data generated in this study and also includes new analyses of publicly available RNAseq data that, in the first instance, are designed to demonstrate that 'the process of chondrocyte differentiation is robust to individual differences.' It seems this complex analysis was undertaken because the cell populations being compared contain only a small proportion of chondrocytes.
To draw robust conclusions about how chondrocytes respond to mechanical forces it is critical to start the study with a well-defined chondrocyte population. Hence, my reservations about this manuscript stem from the cell population that was studied. The study uses iPSCs from three individuals, 2 male and 1 female. The iPSCs were first differentiated to mesenchymal stem cells (MSCs) and after several passages the MSCs were cryopreserved at passage 5-7. A single batch of cryopreserved MSCs from each iPSC line was used for multiple technical replicate experiments where the MSCs were differentiated in chondrogenic medium for 14 days.
The greatest source of variability in iPSC differentiation protocols are technical parameters (eg different differentiation batch) rather than the cell line (see Phipson et al (2018) and this means that comparisons must be made using samples from concurrent carefully matched differentiations. 1 It is not clear if concurrent differentiations to MSCs were part of the experimental design but the images in extended data figure 1 show cells at very different densities suggesting differences in cell growth or seeding density. The MSCs differentiated from the three lines are compared by FACS using 8 different markers (extended data Fig 1) and this shows considerable variation between the three lines. Given the importance of starting the chondrocyte differentiation from matched cell populations it is imperative to show that this first step is robust and reproducible and that the cell population is similar at the start of the chondrocyte differentiation step. This is not shown in the manuscript. The authors conclude that "individual cell line contributes the largest amount of variance to the data" but this is based on a single differentiation to MSCs then technical replicates of the chondrocyte differentiation stage and it is unclear how reproducible the MSC differentiation is. The variance could have been generated by technical variation in the early differentiation stages.
Chondrocyte differentiation. Figure 1b  means that comparisons must be made using samples from concurrent carefully matched differentiations. 1 It is not clear if concurrent differentiations to MSCs were part of the experimental design but the images in extended data figure 1 show cells at very different densities suggesting differences in cell growth or seeding density. The MSCs differentiated from the three lines are compared by FACS using 8 different markers (extended data Fig 1) and this shows considerable variation between the three lines. Given the importance of starting the chondrocyte differentiation from matched cell populations it is imperative to show that this first step is robust and reproducible and that the cell population is similar at the start of the chondrocyte differentiation step. This is not shown in the manuscript. The authors conclude that "individual cell line contributes the largest amount of variance to the data" but this is based on a single differentiation to MSCs then technical replicates of the chondrocyte differentiation stage and it is unclear how reproducible the MSC differentiation is. The variance could have been generated by technical variation in the early differentiation stages.
Author Response to comment 1: The reviewer indicated that they believe that the study is not sound and that our conclusions are not supported by the data. We do not understand these statements considering the more specific concerns detailed by the reviewer. We agree with the reviewer that our cellular model is not optimal, and that the differentiated chondrocytes are not mature and represent a small proportion of the culture. These details, however, are known to the reviewer because we clearly reported them in our paper and explained our conclusions in relationship to these caveats. We are aware of the limitations of the study and believe we reported them accordingly. In our opinion, this is the requirement for a sound study -being aware of the limitations. We provide further details in our responses below.
Cell heterogeneity is critically important to consider, and we appreciate the reviewer's request for further clarification. While it is true that iPSC differentiation batch contributes greatly to variation in many complex organoid differentiation protocols, our observations in monoculture differentiations of iPSCs have not found this to be as sizeable of an issue. Data from Housman, Briscoe, and Gilad 2022 demonstrate that gene expression patterns between iPSC-MSC lines differentiated from the same individual but in different batches and even using different formulations of MSC media are remarkably similar (97% correlation between matched iPSC-MSC lines). This high degree of correlation is observed between human (98% correlation) and chimpanzee (93% correlation) iPSC-MSC lines, demonstrating the robustness of this observation.
Furthermore, our gene expression measurements across technical replicates of the experiment allow us to measure the contribution of individual variables (encompassing both individual cell line differences and individual donor differences) to variation in the system. These technical replicates of the experiment also allow us to account for individual factors in downstream analyses, which demonstrate reliable differences between treatment groups. We have modified our textual descriptions of individual cell line contributions to gene expression variation to reflect the fact that this variable encompasses both cell line and donor variation in the Results section.
If any of the individual level gene expression variation observed in this study segregates with genotype, this will be detected in an eQTL mapping study. While much can and has been achieved through this small scale study to determine the feasibility of an eQTL approach in this system, to definitively establish this feasibility, a full scale experiment will need to be performed. Therefore, we believe this study has achieved its intended purpose.
In this study, we made the choice to pause the differentiation at the iPSC-MSC state for each individual by cryopreserving cells in order to facilitate the design of larger-scale experiments including many more individuals. During the differentiation of each individual to MSCs, we were careful to keep reagent lot numbers consistent to minimize technical differences between MSC lines. The differentiation from iPSC-MSC to iPSC-chondrocyte consistently occurs over a period of 14 days, and the differentiation from iPSC to iPSC-MSC occurs over a period of around 28 days but is variable between individuals. Cryopreservation of iPSC-MSC lines enables larger scale studies to more reliably synchronize the collection of data from multiple individuals simultaneously to reduce the impact of batch effects from multiple sample collections. Figure 1 were taken prior to seeding of cells for differentiation and were not standardized to the same confluency, though this was not made adequately clear in the previous figure description. In Extended Data Figure 1, we now include images of iPSC-chondrocytes from each of the lines after 14 days of differentiation, demonstrating equal confluency of cells between individuals, in an updated version of the text.

Images of MSCs in Extended Data
Reviewer comment 2: Chondrocyte differentiation. Figure 1b and extended data Figure 2a show alcian blue staining. Alcian blue stains proteoglycan side chains so the staining doesn't show a collagenous ECM as claimed in the results. The staining is localised and very weak and in one line, not visible at all in the mechanically unstimulated cells. Data from one individual shows slight collagen II staining (extended data figure 2b). This staining pattern suggests that the chondrocyte differentiation protocol has produced a small proportion of immature chondroprogenitors and the scRNAseq data confirms this. Between 5% and 8% of the cells from each line form a cluster that is 'most like chondrocytes'. The only gene expression data shown that supports the chondrocyte like nature of this cell cluster is COL11A1 in figure 2c and TIMP2 and TIMP3 in extended data figure 4. It would be much easier to judge this if this is a chondrocyte population if expression data for a range of genes characteristically expressed in chondrocytes was shown -for example SOX9, SOX5, SOX6, COL2A1, ACAN, COL9A1, MATN3, SPARC, COL11A1, COL10A1, PRG4 -a comparison with the scRNAseq data published in the Wu et al 2021 paper would also be helpful when judging what stage of chondrocyte development has been induced. 2 To substantiate any conclusions about how individual differences influence how chondrocytes respond to mechanical forces the authors would need to compare chondrocytes rather than a mixed population of cells where immature chondrocyte progenitors are <10% of the total cell population. When >90% of the cell population are not chondrocytes it is difficult to see how the differential gene expression in mechanically stimulated cultures can be attributed to the way chondrocytes respond.

Author Response to comment 2:
We appreciate the note about Alcian blue staining indicating proteoglycan side chains rather than collagen production. This wording is changed in our revised manuscript. We now include expression plots for several other chondrogenic marker genes as suggested. However, we were not able to plot expression data for COL2A1, ACAN, or COL9A1 due to zero counts measured for these genes. . Thus, we do agree with the fact that our model does not capture completely the responses of primary chondrocytes to mechanical strain, but that this does not necessarily invalidate it as a useful approach to study gene regulation in OA.

2022).
Reviewer comment 2: I am concerned that the NA18856 cells appear to be more dramatically changed by the cyclic tensile strain on the axes of PC1 and especially PC2 than the other cell lines. This is dismissed as "treatment shows a minor correlation with PC2 (R2 = 0.14), PC3 (R2 = 0.14), and PC4 (R2 = 0.3)", but I wonder if this result implies concerted individual variability across many genes in this system, rather than random variability in responses across individuals at specific loci. It would help to also show the samples plotted along PC3 and PC4 to evaluate if the response of this single individual to treatment is driving the correlations, and to discuss possible implication of a single individual having distinct concerted responses.
Author Response to comment 2: While the small scale of this study makes it difficult to generalize to other individuals not included in the study, we anticipate that there will be some degree of heterogeneity in the severity of response to CTS in a larger group of individuals. We also think it is more likely that NA18856 does not represent concerted individual variability across many genes specific to this one individual but rather a more severe response that would be found across other individuals. Therefore, we do not expect this heterogeneity to limit our ability to map eQTLs in this system, especially if many other individuals show a similar large response to the strain treatment. We present PCA plots of samples plotted along additional PCs beyond PC1 and 2 in the Extended Data figures (Extended Data Figures 7 and 9), and in an updated version of this text, we include more details about the implications of a subset of individuals having strong responses to CTS in the Discussion section.
Reviewer Minor comment 1: This claim in the introduction is too strong "Through this study, we evaluated whether the process of chondrocyte differentiation is robust to individual differences" given that only used 3 individuals were used in the present study.
Author Response to Minor comment 1: Thank you for the suggestion. We modified the sentence to read: "Through this study, we evaluated whether the process of chondrocyte differentiation is robust to individual differences between three individuals." Reviewer Minor comment 2: The authors could cite analyses of human sequence variation in chondrocyte regulatory elements: e.g. .1 Author Response to Minor comment 2: Thank you for the suggestion. We mention prior studies of known variation in chondrocyte regulatory regions in an updated version of the text.

Reviewer Minor comment 3:
For the topic modeling, this is discussed in the text, but it would be helpful to show an additional panel in the figure or supplement on the key genes in each topic and their preservation, especially for markers characteristic of adult chondrocytes.