A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation

Isaiah Bartlett; Jeff Kinne; Grace Martin; Kevin Coombes; Kristopher R. Schwab

doi:10.12688/f1000research.154762.1

Home Browse A high-throughput gene expression analysis software tool for developmental...

ALL Metrics

-

Views

-

Downloads

Get PDF

Get XML

Export

▬

✚

Software Tool Article

A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation

[version 1; peer review: 2 approved with reservations]

Isaiah Bartlett¹, Jeff Kinne^1-3, Grace Martin⁴, Kevin Coombes⁵, Kristopher R. Schwab ^2-4

Isaiah Bartlett¹, Jeff Kinne^1-3, [...] Grace Martin⁴, Kevin Coombes⁵, Kristopher R. Schwab ^2-4

PUBLISHED 26 Sep 2024

Author details Author details

¹ Department of Electronics and Computer Engineering, Indiana State University, Terre Haute, Indiana, USA
² The Rich and Robin Porter Cancer Research Center, Indiana State University, Terre Haute, Indiana, USA
³ The Center for Genomic Advocacy, Indiana State University, Terre Haute, Indiana, USA
⁴ Department of Biology, Indiana State University, Terre Haute, Indiana, USA
⁵ Augusta University Medical College of Georgia, Augusta, Georgia, USA

Isaiah Bartlett
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Software, Visualization, Writing – Original Draft Preparation

Jeff Kinne
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Grace Martin
Roles: Data Curation, Investigation, Writing – Original Draft Preparation

Kevin Coombes
Roles: Conceptualization, Formal Analysis, Methodology

Kristopher R. Schwab
Roles: Conceptualization, Investigation, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

OPEN PEER REVIEW

REVIEWER STATUS

Abstract

Publicly available high-throughput gene expression data enable the investigation of biological processes by the scientific community. Although several bioinformatics tools offer methodologies for basic differential gene expression analysis, difficulties arise in the analysis of multiple sample groups comprising a developmental time series, especially when the identification and classification of unique gene expression patterns is the primary goal of the study. Data analysis using these tools requires programming experience, which limits the accessibility of these tools to the broader community. To streamline developmental time-series investigations, we created the Developmental Gene Expression Analysis (devGEA) tool. This environment can be implemented locally or via web browsers to expedite differential gene expression analysis. This tool provides gene signature determination methods that can classify differentially expressed genes based on their correlation with gene expression patterns.

devGEA was used to characterize cardiac development gene expression signatures from high-throughput RNA-seq datasets profiling small-molecule directed cardiomyocyte differentiation of human pluripotent stem cell lines (hiPSCs). After pre-processing, discrete gene expression criteria-based expected changes were used to classify the genes into developmental signatures. Several cardiomyocyte differentiation markers and candidate cardiac genes representing different developmental signatures were experimentally validated using the GIBCO^TM hiPSC line. This method was then compared to a gene signature correlation approach that classified expressed genes based on their degree of similarity with key cardiac developmental signatures representing the stages of cardiomyocyte differentiation. Therefore, devGEA provides a robust workflow for investigator-driven analysis of developmental time series, allowing for the identification of differentially expressed genes and gene signatures for extensive experimental investigation. We also introduced a method for classifying genes by their correlation with genes or developmental patterns of interest. Our correlation-based method takes advantage of a priori knowledge of an experiment and is conceptually simpler than unsupervised clustering approaches.

Keywords

transcriptomics, differential gene expression, RNA-seq, heart development, cardiomyocyte differentiation, clustering, shiny application

Corresponding author: Kristopher R. Schwab

Competing interests: No competing interests were disclosed.

Grant information: National Institute on Minority Health and Health Disparities
5R25MD011712-04
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2024 Bartlett I et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Bartlett I, Kinne J, Martin G et al. A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation [version 1; peer review: 2 approved with reservations]. F1000Research 2024, 13:1097 (https://doi.org/10.12688/f1000research.154762.1) First published: 26 Sep 2024, 13:1097 (https://doi.org/10.12688/f1000research.154762.1) Latest published: 26 Sep 2024, 13:1097 (https://doi.org/10.12688/f1000research.154762.1)

Introduction

Differential gene expression analysis of developmental processes during embryogenesis can be enhanced by combining online resources that provide curated data from the scientific literature with independent investigations of previously published data. Several online databases exist, such as the Mouse Genome Informatics (MGI) online database, which aggregates in situ hybridization gene expression data (Genepaint), high-throughput gene expression datasets profiling a diverse set of tissues (Genotype-Tissue Expression (GTEx) Project), and the biocuration of gene expression data from diverse sources integrating anatomy and expression.¹^–³ However, the expression profiles of many genes remain ambiguous or poorly characterized during the spatiotemporal expression changes that occur during organogenesis. For example, according to the MGI Gene Expression Database (GXD), only 8,660 of the 16,038 genes identified by the resource have annotated gene expression data for the developing heart at any stage or expression level, using conventional molecular techniques.³^,⁴ To fill these gaps in gene expression knowledge, the large quantity of high-throughput gene expression data aggregated within the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) provides an incredible opportunity for investigators to enhance their investigation of differential gene expression during developmental processes and thereby obtain preliminary data to develop new hypotheses for future investigations.

Data analyses involving two experimental groups are relatively straightforward to complete using GEO2R, a differential analysis software integrated into GEO.⁵ However, an in-depth study of the differentiation and development of a tissue or organ involves analysis of multiple experimental conditions comprising a developmental time series. Several bioinformatics tools and the requisite programming experience are required to complete the appropriate differential gene expression analysis and data visualization. We introduced developmental Gene Expression Analysis (devGEA), a new software solution that combines common bioinformatics tools necessary to investigate gene expression changes during cellular differentiation or other developmental processes, with an emphasis on classifying unique gene expression signatures. The workflow provides an investigator with the ability to evaluate and classify gene expression changes in a developmental or time-series dataset, such as differentiation from a pluripotent cell to a terminally differentiated cellular state. DevGEA extends the common steps of raw data processing by introducing several methods for analysis and classification based on a gene model or specific developmental gene expression. These methods are direct, reproducible, and conceptually simpler than many other clustering methods often used in this area. devGEA can be run in the browser from http://cs.indstate.edu/devgea or downloaded as an R source code from https://gitlab.com/jkinne/devgea and executed locally. The Gitlab repository also contains a standalone R Markdown file that performs the analysis in this study.

To demonstrate the use of this software tool, we used devGEA to perform differential gene expression analysis and developmental gene signature classification of cardiomyocyte differentiation from human pluripotent stem cells (hPSCs), including human embryonic stem cells (hESC) and human induced pluripotent stem cell (hiPSC) lines. The two prior datasets utilized in our investigation profile the cardiomyocyte differentiation of several cell lines, providing an opportunity to classify cardiac developmental signature genes that are reproducible across multiple cell lines. Genes identified within these developmental signatures were validated in the laboratory for comparable expression changes in an independent hiPSC line.

hPSCs have revolutionized biology, allowing in vitro investigation of developmental, physiological, and pathological processes within a human tissue-based experimental system.⁶^,⁷ The study of heart development and disease has undergone a paradigm shift in the last decade, evolving from the use of experimental organisms as models of mammalian heart development, such as the mouse, to the use of in vitro hPSC differentiation procedures to directly investigate human cardiac development.⁸^–¹⁰ Furthermore, these breakthroughs have enabled scientists to study human cardiomyocyte differentiation and the physiology of hiPSCs in both healthy patients and those with congenital heart defects. Directed differentiation of hESCs or hiPSCs using either recombinant growth factors or small-molecule modulators of the WNT signaling pathway enables robust and efficient differentiation of hPSCs into cardiomyocytes, which recapitulates the differentiation of the cardiac lineage in vivo.¹¹^–¹⁴ Similar to embryonic heart development, hPSCs undergo mesoderm, cardiac progenitor, and cardiomyocyte differentiation, which are characterized by the stepwise expression of specific developmental mesodermal and cardiac regulatory genes.⁸^,⁹^,¹¹ In this study, the devGEA analysis suite was used to classify gene expression profiles during human cardiomyocyte differentiation based on known cardiac developmental regulators. Together, these results aid our understanding of discrete gene expression profiles representing the major gene expression states of cardiomyocyte differentiation, and form a robust method to investigate novel cardiac genes.

Background - cardiomyocyte differentiation

in vitro directed cardiomyocyte differentiation using small molecule modulators of the WNT signaling pathway has greatly enhanced our ability to study human cardiomyocyte development.¹³^,¹⁵^,¹⁶ This procedure offers a robust and reproducible experimental assay for studying cardiomyocyte differentiation that recapitulates the in vivo cardiac developmental gene regulatory network. Compared to previous methods, such as embryoid body generation, which produces a heterogeneous mixture of developing tissues, directed cardiomyocyte differentiation enables the generation of cardiomyocytes at an efficiency varying from 80 to 98%.⁸^,¹⁵^,¹⁶ Since directed cardiomyocyte differentiation involves the introduction of defined culture media, developmental gene expression changes can be correlated with the introduction of new media formulations containing specific small molecules and supplements. These datasets consist of at least four developmental time points that correspond to specific culture media that induce a stepwise cardiomyocyte differentiation program. Standard bioinformatic analyses using pairwise statistical tests and fold-change cutoff criteria between groups can be unwieldy and difficult to interpret. Thus, differential gene expression analyses of such datasets have traditionally used statistical tools consisting of the ANOVA statistical test and pair-wise fold-change cutoff criteria, followed by data visualization using clustering tools such as k-means or hierarchical clustering. Given our prior knowledge of key regulatory genes within the cardiac network, a guided analytical approach can instead be employed to classify cardiac genes based on their unique developmental gene expression signatures. Figures 1 and 2 summarize the key timepoints of the cardiomyocyte differentiation experiment: the two prior datasets that we used for analysis and expression patterns of key genes from our study.

Figure 1. hiPSC directed cardiomyocyte differentiation procedures used in this study.

A) Methods of small molecule-directed cardiomyocyte differentiation of hiPSCs in prior studies and the present study. Wnt signaling pathway modulators are utilized in a stepwise manner to induce cardiomyocyte differentiation and to maintain contractile cardiomyocytes over the course of several days. Although the addition of specific small-molecule inhibitors and supplements varies slightly, the overall result efficiently generated a large population of autorhythmic contractile cardiomyocytes.

B) Gene expression profiling of three experimental datasets used in this study. The major developmental time points were identified along the cardiac differentiation stage timeline with comparable dates of gene expression profiling in each experiment identified in the table above. Key cardiomyocyte gene expression signatures as well as the expected cardiac developmental markers were identified.

Figure 2. Directed cardiomyocyte differentiation of Gibco hiPSCs and RT-qPCR analysis of developmental marker expression.

A) Micrographs of morphological changes in hiPSCs during cardiomyocyte differentiation. hiPSC colonies, characteristic of feeder-free human PSCs, were identifiable on day 0. Cell morphological changes are identifiable as differentiating cells proliferate to confluency by days 2 and 4, generating significant cellular debris. By day 8, the confluent multilayer cellular morphology of the well had undergone remodeling into rhythmically contractile cardiomyocyte syncytia.

B) Relative gene expression profiling using RT-qPCR of key cardiomyocyte differentiation markers obtained from the basic gene expression signature analysis for pluripotency and mesodermal, early-cardiomyocyte, mid-cardiomyocyte, and late-cardiomyocyte signatures within the GSE85331 (Liu et al., 2017) dataset. Note that ISL1 is the only mid-cardiomyocyte expression signature, whereas MEF2C, NKX2-5, and TBX5 exhibit late-cardiomyocyte expression.

Datasets

To begin the in silico analysis of directed cardiomyocyte differentiation, we identified two publicly available GEO datasets that profile human cardiomyocyte differentiation (Figure 1). Both Refs. 17 and 18 utilize Illumina RNA-seq technology to profile gene expression changes during hPSC-directed cardiomyocyte differentiation. Although these reports utilized slightly different modifications to the original procedure, similar results were obtained with the robust and efficient differentiation of cardiomyocytes.¹¹^,¹³^,¹⁷^,¹⁸ Both datasets evaluated the major differentiation steps that were initiated by the addition of GSK3 inhibitors (day 0), the addition of WNT inhibitors (day 2 or 3), the addition of media supplemented with B27 and insulin (day 4 or 5), and then beating cardiomyocyte syncytia at the end of differentiation (day 15 or 30). Furthermore, these high-quality datasets evaluate cardiomyocyte differentiation using multiple hPSC lines, which allows the identification of cardiac developmental genes demonstrating consistent regulation and the exclusion of genes that exhibit variable expression between cell lines. Note that the bioinformatics analysis within these papers used specialized scripts in R and other languages to conduct their experimental analyses. The GEO-provided analysis software GEO2R can be used to define sample groups for analysis, set P value and fold change cutoff thresholds, select pairs of sample groups for differential expression analysis, and allow minimal plot functionality, data export, and the ability to download an R script that can be run locally. devGEA provides enhanced functionality for developmental time-series analysis in contrast to GEO2R, which will be explored in this report.

Experimental setup

hiPSC line

The human induced pluripotent stem cell (hiPSC) Gibco^TM Episomal line (A18945) was purchased from Thermo Fisher Scientific (Waltham, MA, USA). The hiPSC line was derived from cord blood CD34⁺ progenitors using seven episomally expressed factors (Oct4, Sox2, Klf4, Myc, Nanog, Lin28, and SV40 T) and adapted to feeder-free, serum-free conditions. This line is indistinguishable from human embryonic stem cells and can be robustly differentiated into cardiac lineages.¹²

hiPSCs were maintained according to the manufacturer’s recommendations. Cells were cultured on Vitronectin (VTN-N) Recombinant Human Protein, Truncated-coated 6-well tissue culture plates (A14700, Thermo Fisher Scientific) using Gibco StemFlex™ Medium (A1517001, Thermo Fisher Scientific) maintained under normoxic conditions in 5.0% CO₂. hiPSCs were maintained and renewed and passaged at least twice per weekusing Versene clump dissociation reagent(15040, Thermo Fisher Scientific) at multiple split ratios. The weekend-free Stemflex media passaging protocol was completed as described below. On day 1 cell passaging occurred as described below. VTN-N-coated 6-well plate were prepared by diluting 60-μL of the VTN stock solution (0.5 mg/mL) in 6 mL 1X Dulbecco’s Phosphate-Buffered Saline (DPBS) without calcium or magnesium (Thermo Fisher Scientific, 14190) to a final concentration of 0.5 μg/mL, adding 1 mL of diluted solution to each well, incubating the plate at room temperature for one hour, aspirating the solution, and then adding 1 mL of StemFlex Medium. Clump dissociation was performed by rinsing each well with 2 mL of DPBS, adding 1 mL of 1X Versene clump dissociation reagent(15040, Thermo Fisher Scientific) for 3-5 mins at room temperature, aspirating the Versene solution, and then removing cell clumps from the surface by gentle pipetting in 2 mL StemFlex Medium and transfer to a sterile 15 mL conical tube. The split ratios varied from 1:4 to 1:10 (0.5 to 0.2 mL) depending on the confluency and density of the hPSC colonies. The appropriate cell suspension volume was transferred to a new 15 mL conical tube, Stemflex media added to a final volume of 1 mL, mixed gently by pipetting, and then transferred into a well containing 1 mL of media. On day 2, wells were renewed with 2 mLs of Stemflex medium. On day 4, cells were Versene clump dissociated, passaged as described above. On day 5, cells were renewed with 4 mL StemFlex Medium. On day 8, cells were then passaged as described above.

Aliquots of cells were cryopreserved using the Gibco PSC Cryopreservation Kit (A2644601, Thermo Fisher Scientific) and recovered in stemflex media supplemented with 1X RevitaCell^TM supplement (A2644501, Thermo Fisher Scientific) according to the manufacturer’s recommendations. For cryopreservation, an ~80% confluency well containing hiPSCs were Versene clump dissociated as described above except the entire cell suspension was centrifuged at 200 × g for 4 mins using an Eppendorf Centrifuge 5810 (Hamburg, Germany) followed by media aspiration. 2 mL of chilled PSC Cryomedia was then added to the cell pellet dropwise, gently mixed with 1 mL of cryo suspension aliquoted to cryovials (430488, Corning^TM, New York, New York). Vials were then transferred to a Mr. Frosty™ freezing container and slowly frozen overnight at -80°C and then cryopreserved in the vapor phase of liquid nitrogen for long-term storage. For cryopreserved hiPSCs recovery, a vial was floated in a 37°C water bath until thawed and then transferred into a 15-mL conical tube. 10 mL of Stemflex media was added dropwise and then centrifuged. The cell pellet was then resuspended in 1 mL of Stemflex media supplemented 1X RevitaCell and transferred to one VTN-N-coated well of a 6-well plate containing 1 mL of Stemflex media supplemented with 1X RevitaCell. The next morning the well was renewed with 2 mL of Stemflex media and cultured as described above. Cell imaging was performed using an Olympus CKX41 Inverted Phase Contrast Microscope with a DP23M digital camera system and CellSens software (Evident, Tokyo, Japan). Micrographs were collected at 100× using the automatic settings.

Directed cardiomyocyte differentiation

Gibco Episomal hiPSCs were differentiated using the Gibco PSC Cardiomyocyte Differentiation Kit (A29212-01, Thermo Fisher Scientific) according to the manufacturer’s recommendations. Briefly, 70-85% confluent hiPSC cultures were dissociated into single cells usingTrypLE^TM Select Enzyme reagent (12563029, ThermoFisher Scientific), triturated, and resuspended in Stemflex media supplemented with 1X RevitaCell. Each well of a 6-well plate containing hiPSCs is rinsed with 2 mL of DPBS, 1 mL of TrypLE^TM Select Enzyme reagent (12563029, ThermoFisher Scientific) solution is added to each plate, incubated at 37°C for approximately 5 minutes until cells have detached from the well, and inactivated by adding 1 mL of Stemflex media. Cell suspensions are combined into a single 50-mL conical tube, pelleted by centrifuged at 100 × g for 5 minutes, and resuspended with 6 mL of Stemflex media supplemented with 1X RevitaCell. A 500 μL aliquot of cell suspension was removed to determine live cell counts and cell viability using the Bio-RAD TC20 automated cell counter and cell counting kit with trypan blue (Hercules, CA, USA). 20 μL of this suspension was combined with 20 μL of trypan blue solution with 10 μL transferred into one well of the counting slide. All cell counts were completed in duplicate. A cell suspension of 6 × 10⁴ cells/mL was prepared in Stemflex media supplemented with 1X RevitaCell. One mL of cell suspension per well (6 × 10⁴ cells/well) was plated onto a 12-well tissue plate coated with Geltrex^TM LDEV-Free, hESC-Qualified, Reduced Growth Factor Basement Membrane Matrix (A14133, Thermo Fisher Scientific) after aspiration of the Geltrex solution. Geltrex-coated plates were prepared according to manufacturer’s recommendation. First, the concentrated Geltrex stock solution (15 mg/mL) was diluted in an equal volume of Dulbecco’s Modified Eagle Medium (DMEM) (11965092, Thermo Fisher Scientific) on ice to prevent gelling. This 180 μL of this solution was then added to a cold 15 mL conical containing 9 mL of DMEM on ice and mixed gently. 750 μL of this solution was added to each well of a 12-well plate and incubated at 37°C for 1 hour prior to aspiration and plating of the cells.

Twelve well plates were renewed with 1 mL of StemFlex medium each day until they density approached 60-80% confluency. Upon reaching the appropriate density (day 0), the wells were renewed with 1 mL of Cardiomyocyte Differentiation Medium A incubated for two days. On day 2, the wells were renewed with 1mL of Cardiomyocyte Differentiation Medium B. On days 6 and 8, the wells were renewed with 1 mL of Cardiomyocyte Maintenance Medium every two days. Small areas of beating cardiomyocytes were observed on day 6. Two days later (day 8), the cultures appeared to have undergone extensive remodeling.

Total RNA isolation and cDNA synthesis

Total RNA was isolated on the day of cardiomyocyte differentiation (day 0) and subsequently on days 2, 4, and 8. For each cardiomyocyte differentiation time point, three separate wells were harvested so that downstream gene expression analysis could be performed in triplicate. Cells were directly lysed by adding 0.3 mL TRIzol RNA isolation reagent (15596016, Thermo Fisher Scientific) to each well, vigorously rocking the plate for 5 minutes, and then freezing the lysate at -80.0 °C. Total RNA was isolated using the Direct-Zol^TM RNA MicroPrep Kit (R2061, Zymo Research, Irvine, CA) according to the manufacturer’s recommendations, which included in-column DNase I treatment to remove genomic DNA. Total RNA was eluted in 10 μL of RNase/DNase-free water. One microliter was used to quantify the total RNA concentration and quality using a ThermoFisher Scientific NanoDrop™ One Microvolume UV-Vis Spectrophotometer. cDNA was prepared using the SuperScript^TM IV VILO^TM Master Mix with ezDNase^TM Kit (11756050, Thermo Fisher Scientific) using 0.2 μg of total RNA according to the manufacturer’s recommendations, including ezDNase^TM pre-treatment to remove genomic DNA. cDNA synthesis reactions (20 μL) were diluted 1:10 to a final volume of 200 μL using RNase/DNase-free water to obtain a concentration of 1 ng/μL total RNA.

Reverse transcription quantitative PCR (RT-qPCR)

RT-qPCR was used to quantify the relative transcript expression between cardiomyocyte differentiation time points. Oligonucleotide primers were designed using Primer3 with standard parameters and amplicon product sizes ranging from 75 to 300 nucleotides, ensuring that each primer spanned at least one intron to ensure cDNA-specific application, and were produced by Integrated DNA Technologies using standard synthesis procedures (IDT, Coralville, IA).¹⁹ Table 1 lists the design and sequence information of each primer set. qPCR was performed using the PowerUp^TM SYBR^TM Green Master Mix (100029283, Thermo Fisher Scientific) system in an Applied Biosystems^TM QuantStudio^TM 3 Real-Time PCR System (Thermo Fisher Scientific). qPCR reactions (10 μL qPCR reactions containing 2 μL (2 ng) of the reverse-transcribed total RNA, 1 pmol each of the relevant forward and reverse primers, and 5 μL of the 2X master mix PowerUp^TM SYBR^TM Green Master Mix (100029283, Thermo Fisher Scientific). qPCR reactions were performed in technical triplicates to evaluate the reproducibility of the quantification cycle (Cq) for each amplified condition. QuantStudio^TM Design and Analysis Software VERSION 1.4.3 (Thermo Fisher Scientific) was used to run the experimental method, analyze, and determine the Cq values. This software is downloadable from https://www.thermofisher.com/us/en/home/technical-resources/software-downloads/quantstudio-3-5-real-time-pcr-systems.html. The manufacturer’s standard cycling method consisted of a UDG activation step at 50.0 °C for 2 min, DNA polymerase activation step at 95.0 °C for 2 min, PCR amplification stage (40 cycles × 95.0 °C for 15 s, 58.0 °C, 72.0 °C for 1 min), and a dissociation curve stage (60.0–95.0° C at 0.1 °C/s) was used for all qPCR reactions. The qPCR experimental parameters utilized the standard curve experimental type setting, which determined the Cq value using both Auto-Threshold and Auto-Baseline analysis functions. The qPCR results were exported to Microsoft^TM Excel^TM for relative gene expression quantification. Relative gene expression was calculated using the 2^-ΔΔCT method as described previously.²⁰ The average standard deviation for all samples for the qPCR triplicate reactions was 0.159 Cq; the range between triplicate reactions for each sample varied from 0.003 to 0.544, demonstrating reproducibility near a standard deviation of 0.5 Cq. At least three No Template Controls (NTC) qPCR for each primer set were included in each experiment, validating the absence of template-independent amplification or template contamination. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as a reference gene to normalize targets because it demonstrated a low Cq variance of 20.19 ± 0.73 (Mean ±).²⁰^,²¹

Table 1. Gene-specific primers for RT-qPCR analysis of GIBCO hiPSC cardiomyocyte differentiation.

This table provides gene information, oligonucleotide sequences, and amplicon lengths for each gene-specific RT-qPCR experiment.

Ensembl Gene ID	Gene name	Gene symbol	Left primer sequence (5'-3')	Right primer sequence (5'-3')	Amplicon size
ENSG00000077522	Actinin Alpha 2	ACTN2	AGGAGGCCCTTGTCTTTGAC	CGCATCTCTCGTCAGGATCT	134
ENSG00000134508	Cdk5 And Abl Enzyme Substrate 1	CABLES1	AACAGGCGAGAACTGATTGC	GGCCAGTGCTAGGAACTCTG	125
ENSG00000113722	Caudal Type Homeobox 1	CDX1	CAGCCGTTACATCACAATCC	TGTTGCTGCTGCTGTTTCTT	138
ENSG00000165556	Caudal Type Homeobox 2	CDX2	CAGTCGCTACATCACCATCC	TTTCCTCTCCTTTGCTCTGC	109
ENSG00000164309	Cardiomyopathy Associated 5	CMYA5	AGCTTTGGTGAGAGGAGACG	TATGCTGGGCAGTCTGTGAC	110
ENSG00000129170	Cysteine And Glycine Rich Protein 3	CSRP3	GTGCCATCTGTGGGAAGAGT	TTGTGTAAGGCCTCCAAACC	125
ENSG00000137090	Doublesex And Mab-3 Related Transcription Factor 1	DMRT1	GAGGATGCTCCCTCTTACCC	AGCACTGCCTTTGTGCTCTT	125
ENSG00000174804	Frizzled Class Receptor 4	FZD4	AACTTTCACACCGCTCATCC	CCAATGGGGATGTTGATCTT	105
ENSG00000111640	Glyceraldehyde-3-Phosphate Dehydrogenase	GAPDH	ATGTTCGTCATGGGTGTGAA	ATGTTCGTCATGGGTGTGAA	239
ENSG00000285109	GATA Binding Protein 4	GATA4	GGAGATGCGTCCCATCAAGA	CTTGTGGGGAGAGCTTCAGG	149
ENSG00000110422	Homeodomain Interacting Protein Kinase 3	HIPK3	ATGGCTTAAATGCCGATGAG	TTGCTGACGAGTACCCACTG	119
ENSG00000144730	Interleukin 17 Receptor D	IL17RD	CCAGTGTTTCGCCTACTTCC	GGACTCGTGGATCTTCTGGA	124
ENSG00000016082	ISL LIM Homeobox 1	ISL1	GTAGAGATGACGGGCCTCAG	TTTCCAAGGTGGCTGGTAAC	234
ENSG00000081189	Myocyte Enhancer Factor 2C	MEF2C	AGCGCTCTTCACCTTGGTTC	GGTTCTGACTTGATGTTGAGGC	164
ENSG00000185155	Mix Paired-Like Homeobox	MIXL1	CTGCTGGAGCTCGTCTTCC	ATTTCCCACTCTGACGCCG	169
ENSG00000146147	Muscular LMNA Interacting Protein	MLIP	TCTCTCCTCACCATCTTCTACAG	GGGTTGGGTTCATAGACTTCC	123
ENSG00000163132	Msh Homeobox 1	MSX1	AGAAGATGCGCTCGTCAAAG	GGCTTACGGTTCGTCTTGTG	144
ENSG00000197616	Myosin Heavy Chain 6	MYH6	CATCAAGGAGCTCACCTACC	TAGGCCTTGACCTTCAGTTG	96
ENSG00000092054	Myosin Heavy Chain 7	MYH7	CACCAACCTGTCCAAGTTCC	GGGCTGAGCAGATCAAGATG	167
ENSG00000101605	Myomesin 1	MYOM1	GAGGTCTCCCAGACGTGGT	TCGAACTTGAGGTTGCAGTG	145
ENSG00000111704	Nanog Homeobox	NANOG	ATACCTCAGCCTCCAGCAGA	TGCGTCACACCATTGCTATT	147
ENSG00000162614	Nexilin F-Actin Binding Protein	NEXN	CGACCATCTCTCAAGGAAGC	CAGCTTGCTTCTTTCGGTTT	154
ENSG00000183072	NK2 Homeobox 5	NKX2-5	CAAGTGTGCGTCTGCCTTTC	GTTGTCCGCCTCTGTCTTCT	148
ENSG00000162878	Protein Kinase Domain Containing, Cytoplasmic	PKDCC	AGAACTCCACGGCAAGCA	CAGCCAGGTTGAACACTGAA	129
ENSG00000206454	POU Class 5 Homeobox 1	POU5F1	AGTGAGAGGCAACCTGGAGA	TCTCGTTGTGCATAGTCGCT	168
ENSG00000077157	Protein Phosphatase 1 Regulatory Subunit 12B	PPP1R12B	TCTGACCGATCATCAGTGCT	TTCATTTCTTCCTCCATTTCTGA	86
ENSG00000089225	T-Box Transcription Factor 5	TBX5	TGCCTCCACCCAACCCATA	GCTGTGGATAGCTAGAGCGG	168
ENSG00000164458	T-Box Transcription Factor T	TBXT	AATTGGTCCAGCCTTGGAAT	CGTTGCTCACAGACCACAG	112
ENSG00000118194	Troponin T2, Cardiac Type	TNNT2	GGCTTTGTCCAACATGATGC	TCGTTGATCCTGTTTCGGAG	282
ENSG00000116729	Wnt Ligand Secretion Mediator	WLS	TCCTCATGCTTATCACCTTGG	GCCTGTGAAAAAGGCACTGT	128
ENSG00000015153	YY1 Associated Factor 2	YAF2	AGTCACCGCCTGCATCTAGT	TCCTCTGGGTGAAGATGACC	107
ENSG00000169554	Zinc Finger E-Box Binding Homeobox 2	ZEB2	TGGCATGTATGCATGTGACT	AGTGAAGCCTTGAGTGCTCG	161
ENSG00000165655	Zinc Finger Protein 503	ZNF503	GAACTGGCCACATTTTGCAC	CCGATCTGCGAACATGTTTG	124

Identification of key marker genes and expression patterns

Figures 1 and 2 describe the small-molecule-directed cardiomyocyte differentiation used in these experiments¹¹^,¹³ our own study. Here, we highlight the key genes involved in developing our list of key marker genes and their expression patterns. First, the procedure involves seeding, recovering, and growing hiPSCs dissociated into single cells in pluripotency culture media.

The initial time point (day 0) of these cells in pluripotency culture media prior to subsequent media changes represents proliferative hiPSCs expressing the expected pluripotency markers POU Class 5 Homeobox 1 (POU5F1, also known as OCT4) and Nanog Homeobox (NANOG).²²^–²⁴ On day 0, hiPSC media was replaced with cardiomyocyte differentiation media containing a GSK3 inhibitor, such as CHIR99021, which activates the canonical WNT signaling pathway and initiates mesodermal specification, as identified by T-Box Transcription Factor T (TBXT, also known as BRACHYURY [T]) and the early cardiac marker GATA Binding Protein 4 (GATA4).²⁵^,²⁶ GATA4 is necessary for mesoendodermal development during gastrulation and is then expressed within the mesoderm and later in the cardiac lineage, in which it directs cardiac development and postnatal cardiac function.²⁷^–²⁹ The divergence of day 2 induction of TBXT and GATA4 gene expression signatures enabled the classification of transient mesodermal expression from early cardiomyocyte gene signatures because GATA4 is robustly activated during mesodermal differentiation and then persists throughout cardiomyocyte differentiation (Figures 1B and 2B). Therefore, TBXT expression identifies transient mesodermal expression describing the mesodermal gene signature, whereas GATA4 represents the earliest state of cardiomyocyte gene induction, describing the early cardiomyocyte gene signature (Figure 1).

After 48-72 hours of incubation (days 2 or 3), the GSK3 inhibitor-containing media was replaced with media containing canonical WNT signaling inhibitors, such as IWP2 or IWP4 (Figure 1B). Canonical WNT inhibition induces the expression of several important progenitor genes that mark the cardiac lineages of early embryos.¹²^,¹³ The cardiac progenitor markers ISL LIM Homeobox 1 (ISL1), Myocyte Enhancer Factor 2C (MEF2C), NK2 Homeobox 5 (NKX2-5), and T-Box Transcription Factor 5 (TBX5) exhibit unique expression profiles during later cardiomyocyte differentiation stages (Figure 1 and 2).³⁰^–³³ Although these four genes are robustly expressed at high levels at the end of differentiation (days 8, 15, or 30, see Figure 1), the initial detection of transcripts occurs earlier in differentiation at lower levels.¹²^,¹³ For example, while ISL1 transcripts in the GSE85331 (Liu et al., 2017) experiment demonstrated the greatest change in expression from day 2 to day 4, NKX2-5, MEF2C, and TBX5 genes were not maximally expressed until day 30 (Figure 6). A similar result was identified via RT-qPCR analysis of Gibco hiPSC cardiomyocyte differentiation (Figure 2B). Therefore, these progenitor markers represent two different gene expression signatures: a mid-cardiomyocyte gene expression signature with a maximum change in expression at day 4 (ISL1) or a late-cardiomyocyte gene expression signature on day 8 (MEF2C, NKX2-5, and TBX5) (Figures 1 and 2).

Figure 3. Gene expression signatures of clustered genes.

Median gene expression patterns of the five cardiomyocyte gene expression signatures were determined by LOESS, median, and basic analysis for GSE122380 (Strober et al., 2019) and GSE85331 (Liu et al., 2017). Each graph plots the median expression values and the 95% confidence interval error bars for pluripotency, mesodermal, early-cardiomyocyte, mid-cardiomyocyte, and late-cardiomyocyte-grouped genes.

Figure 4. RT-qPCR analysis of candidate genes.

RT-qPCR analysis of candidate gene expression signatures during directed cardiomyocyte differentiation in Gibco hiPSCs. The plots are from the present study, with samples taken on Days 0, 2, 4, and 8. The grouping of genes/plots was based on the basic/descriptive gene analysis of the GSE85331 dataset (i.e., CABLES1 is listed under mesodermal because it was identified using the basic/descriptive analysis of GSE85331). RT-qPCR profiling approximates the expected gene expression signature for each stage of differentiation in this single analysis. However, discrepancies in these signature calls exist when these results are compared to those of some of the classification methods discussed in the article.

Figure 5. Model fits and RT-qPCR of two sample genes, ZEB2 and ACTN2.

Comparison of correlation curve fitting models for GSE122380 and GSE85331 to RT-qPCR analysis of ACTN2 (a late-cardiomyocyte signature call) and ZEB2 (an early cardiomyocyte signature call). These two genes have been used as examples in the text to describe our classification methods. To evaluate the LOESS (left panel) and median curve-fitting correlation methods (middle panel), the log2 expression values for each time point were plotted for both datasets, and the line overlay identified the respective curve-fitting model. The additional time points located within GSE122380 (16 time points) enhanced the gene expression curve compared with GSE85331 (four time points). Although both methods generated similar curves for each gene, the LOESS function provided additional smoothing to the plot. The right panel shows log2 relative gene expression changes during cardiomyocyte differentiation of GIBCO hiPSCs using RT-qPCR. Note the gene expression signature similarities among the three analytical methods.

Figure 6. Gene expression profiles of mid-cardiomyocyte marker candidates obtained from the GSE85331 RNA-seq experiment.

Graphs of day 0, 2, 4, and 30 log2 expression values for ISL1, MEF2C, NKX2-5, and TBX5. Each line represents an individual cell line and replicates. Note that ISL1 and TBX5 demonstrated marked expression at day 4, whereas MEF2C and NXK2-5 exhibited baseline expression until day 30. Therefore, ISL1 and TBX5 were the best candidates for representing the mid-cardiomyocyte gene expression signature.

After 48 h of cardiomyocyte progenitor induction (day 4-5), a cardiomyocyte-maintenance medium containing B-27 supplement and insulin was added, which supports cardiomyocyte differentiation and viability. Subsequently, key contractile cardiomyocyte markers, such as MYH6, MYH7, and TNNT2, are highly expressed.²⁴^,³⁴ After the onset of contraction on approximately day 6-7 (Figures 1 and 2), the differentiated cardiomyocytes underwent remodeling into syncytia, with rhythmic contraction apparent at the gross level (Figure 2A). The expression patterns of the genes described above were utilized as key cardiac gene expression signatures for the classification of differentially expressed genes during cardiomyocyte differentiation in the preceding experiments (Figures 1 and 2).

Methods

Here, we describe the main features and workflow of devGEA. We look at these features in more detail in the Results and Discussion section where we use the software to analyze the prior datasets (Refs. 17 and 18).

Implementation

devGEA is a shiny web application written completely in the R programming language. We also provide a self-contained R markdown file with the analysis that is contained in the Results and Discussion section.

Data import, normalization, and plots. The devGEA shiny web application can import data as raw counts or already normalized. Data files can be imported by uploading a file or specifying a GEO ID. The web application allows for normalization using limma. Data can be scaled by taking the logarithm of the data or scaling it to have a mean of 0 and standard deviation of 1. Data can be filtered based on a raw count threshold and whether the gene information is missing. The data and results can be viewed in an interactive table format so that the user can verify the data import, analytical methods, and results. The table can be searched using gene symbols or identifiers, or by setting ranges for columns. Gene selection by mouse clicks provides a basic gene expression plot.

Additional metadata for the samples (e.g., time point, cell line) can be specified by importing a csv/tsv file, using a regular expression based on the column headings, or manually entered within the app. Once the sample metadata have been imported, the sample groupings used for analysis are specified (e.g., groups based on time point or cell line). Additional metadata for genes (e.g., gene symbols or names) can be imported automatically from Ensembl BioMart³⁵ through the app.

Additional data normalization plots were provided to evaluate the quality of the dataset. Gene expression box plots or histograms for each sample were used to identify outliers. The PCA of the samples or groups provides a linear representation of variation within the data set. Samples identified as outliers or with poor gene expression preparation can then be marked for exclusion, and the data can be reanalyzed.

Analysis. Once a dataset is loaded, the user can choose to add an analysis. For a basic analysis, the software computes for each gene – medians of each time point, maximum and minimum time points based on expression, and other simple descriptive statistics of the data. These results are displayed within a table and can be exported; results can be filtered based on fold change, maximum/minimum time point, or time point of maximum change. For a correlation analysis, the user specifies either a set of marker genes or a set of gene expression patterns. For each gene in the dataset, the software computes the correlation of the gene with each marker gene (or given gene expression pattern) and then classifies each gene based on the highest correlation. The results of this analysis are the correlations and classifications for each of the genes. We discuss this analysis technique further in the Results and Discussion section. For an ANOVA analysis, the user specifies a sample property (e.g., time point or sample group) to define a linear model, and the software computes adjusted p values for each gene based on this property. The results of the analysis are the adjusted p values for each gene. For a subset analysis, the software combines results from previous analyses within the app (i.e., by taking intersections or unions of result sets). By default, the software takes the intersection of each successive analysis (i.e., first filtering based on fold change with basic analysis, and then filtering based on adjusted p values with ANOVA analysis, and so forth).

Saving and exporting results. The entire project can be saved as an R data file and then later loaded into the software again. Result sets can be exported in csv format. Plots can be exported as svg files.

We note that the main novelty in devGEA is the correlation analysis, which is a novel form of analysis in this setting (as far as we are aware). We have built the standard steps of differential gene expression analysis around the correlation analysis so that users with little or no programming experience can use this form of analysis themselves.

Operation

devGEA is written completely within the R programming. The code has been tested on macOS 13.4, Windows 10, and Ubuntu Linux.

The shiny web application can be downloaded and run locally or can be run from our server. To run locally, the user runs the file app. R that is within the shiny_app directory of our gitlab repository (https://gitlab.com/jkinne/devgea).

When the shiny web application is run from our server (https://cs.indstate.edu/devgea), the user does not require to install any software on their own system. We note that we use ShinyProxy³⁶ as the backend on our server; we provide example configuration files within the shinyproxy directory of our gitlab repository.

To run the R markdown file that contains the analysis in this paper, the user should knit the file devGEA. Rmd that is in the gitlab repository.

Results and discussion

Here, we describe our analysis of the prior datasets (Refs. 17 and 18). We focus our discussion on the analysis provided within the Rmd. After describing our analysis, we discuss validation with our study results.

Data import, normalization, filtering, scaling

Many researchers argue for using transcripts per kilobase million (TPM) as the preferred normalization for RNA seq count data (see Ref. 37 which introduced the term, and more recently³⁸ which evaluated different normalization methods). The National Center for Biotechnology Information (NCBI) has recently begun to provide both raw counts and normalized files for all GEO datasets. We used the TPM-normalized data available from NCBI as the input for our analysis. The NCBI database also provides supporting gene annotations.

We filtered the data so that we only considered protein-coding genes, and we also only considered genes that were non-zero in at least two samples and had a total TPM sum of at least 10.

We extracted metadata about the samples using series matrix files that can be downloaded from the NCBI GEO. The design matrix contains the following for each sample: cell line, time point (day #), replicate #, sample ID, whether to exclude the sample from analysis, and sample name for use on plots.

As is standard for biological data, we take the logarithm (base 2) and use these values for further analysis.

Evaluating cell lines - Ref. 18 used a Gaussian process model to identify clusters of cell lines that share gene clusters. One of the cell line clusters was noted to have less robust expression of cardiac troponin. Thus, we removed these cell lines from the analysis. Many approaches can be used to cluster cell lines and identify outliers. We considered an approach similar to Ref. 39, but decided to stick with the clustering of cell lines identified in Ref. 18.

Combined analysis - We considered whether the two datasets could be combined and analyzed together.⁴⁰ lists a number of potential problems when attempting to combine datasets from separate experiments or laboratories. In this case, it was determined that the datasets should not be combined. As can be seen from the principal components analysis (PCA) plot in Figure 7, the samples from the datasets are well separated on the PCA plot. Notably, Refs. 18 and 17 used different laboratory methods to filter out ribosomal RNA -¹⁷ used rRNA depletion, while¹⁸ used poly(A)+ selection. This is one of the potential inconsistencies that Ref. 40 warns.

Figure 7. Principal components analysis of combined GSE122380/GSE85331.

A principal component analysis plot combining the datasets GSE122380 and GSE85331. We considered whether the two datasets could be combined in a single analysis. On the left, each sample is colored by day. Note that the samples are clustered by day, going from top to bottom and left to right. On the right, each sample is colored according to the experiment, and we see that GSE122380 and GSE85331 are clearly separated on the PCA plot.

Analysis overview

We identify and cluster significant genes using a variety of methods.

• Linear model ANOVA was performed using a linear model over the key time points to obtain adjusted p-values.
• Basic/descriptive statistical analysis - using simple conditions related to the highest time points of expression and the largest change in expression.
• Correlation analysis using the correlation of each gene with key marker genes or expression patterns of interest.

We suggest that our basic analysis and correlation analysis as types of analysis should be considered when known patterns of interest or marker genes are known. Our precise method of using correlation to cluster is novel (to our knowledge at the time of writing), although it is well known that using correlation is preferred for computing distances with biological data (see e.g., Ref. 41). These methods are discussed in detail in the following sections.

Some parts of our analysis looked at gene expression at only the key time points as described earlier: days 0, 2, 4, and 30 in Ref. 17 and days 0, 3, 5, and 15 in Ref. 18. Other parts of the analysis are time-series methods that use all the available time points.

Linear model ANOVA analysis

For each gene, we used a linear model over the four key time points. The resulting adjusted p-values were used to indicate if a gene had a statistically significant difference between at least two of the key time points. This is the standard step in the analysis process.

Selection of a priority cardiomyocyte differentiation gene list

Differential gene expression analysis described above with standard cutoff for adjusted p value of 0.05 and log2 fold change of at least 1 identified 8634 genes within the GSE85331 dataset and 7117 genes within the GSE122380 dataset that had a significant difference among the four key time points, with a total of 5241 genes common between both lists. We will further filter by requiring that each gene has a correlation of at least 0.75, one of our key marker genes (POU5F1 (pluripotency), TBXT (mesodermal), GATA4 (early-cardiomyocyte), TBX5 (mid-cardiomyocyte), and TNNT2 (late-cardiomyocyte)), using the correlation analysis that is described later. This provided a list of 3259 genes for further investigation.

Table 2 and Figure 5 present the main analysis results for two example genes: ZEB2 and ACTN2. As indicated in the table, the adjusted p-values for both genes were 0 for both datasets. Both genes also passed the log2 fold change criteria for both datasets, and had a correlation of at least 0.75 for their closest matches, for the correlation analysis. Therefore, both genes were in the 3259 gene set.

Table 2. Analysis summary for ZEB2 and ACTN2.

A summary of the analysis of the two genes is provided.

ZEB2	gse122380	gse85331	ACTN2	gse122380	gse85331
p_adj_day	0	0	p_adj_day	0	0
min day	0	0	min day	3	0
max day	15	4	max day	15	30
FC	4.43	5.59	FC	7.38	7.51
delta max day	3	2	delta max day	15	30
delta max direction	1	1	delta max direction	1	1
class (descriptive)	early	early	class (descriptive)	late	late
loess class	early	early	loess class	late	mid
loess corr	0.97	0.94	loess corr	0.97	0.99
loess_group_2	mid	meso	loess_group_2	mid	late
loess_corr_2	0.75	0.47	loess_corr_2	0.95	0.96
loess_group_3	late	mid	loess_group_3	early	early
loess_corr_3	0.70	0.29	loess_corr_3	0.74	0.65
loess_group_4	meso	late	loess_group_4	meso	meso
loess_corr_4	-0.58	0.09	loess_corr_4	-0.83	-0.46
loess_group_5	pluri	pluri	loess_group_5	pluri	pluri
loess_corr_5	-0.78	-0.50	loess_corr_5	-0.96	-0.98
medians class	early	early	medians class	late	mid
medians corr	0.94	0.96	medians corr	0.96	0.99
day_medians_group_2	mid	meso	day_medians_group_2	mid	late
day_medians_corr_2	0.66	0.47	day_medians_corr_2	0.93	0.95
day_medians_group_3	late	mid	day_medians_group_3	early	early
day_medians_corr_3	0.61	0.37	day_medians_corr_3	0.70	0.59
day_medians_group_4	meso	late	day_medians_group_4	meso	meso
day_medians_corr_4	-0.28	0.04	day_medians_corr_4	-0.65	-0.46
day_medians_group_5	pluri	pluri	day_medians_group_5	pluri	pluri
day_medians_corr_5	-0.67	-0.47	day_medians_corr_5	-0.96	-0.99

The expected classification for ZEB2 is early, and each analysis method (in bold) classifies ZEB2 as early. The expected classification for ACTN2 is late, but some classification methods classify ACTN2 as mid-cardiac.

The class (descriptive) is based on the time points of the maximum expression and the time point at which there is a maximum change. For gse122380, the major time points used for descriptive analysis were days 0, 3, 5, and 15; for gse85331, the major time points were days 0, 2, 4, and 30. ZEB2 is classified as early because it has the maximum change going into the second major time point and its maximum expression in one of the last two time points. ACTN2 is classified as late because it has the maximum change at the last major time point and has the maximum expression at the last major time point.

The loess class was based on the correlation of the loess fit of the gene with the loess fits of the marker genes for each class (pluri - POU5F1, meso-TBXT, early - GATA4, mid - TBX5, late - TNNT2). The median class is based on the correlation of the day median of the gene with the day median of the marker genes. ZEB2 was classified as early in each of these methods with both datasets because its correlation was closer to GATA4 than to the other marker genes. For example, for gse122380 and the loess fit, the loess fit of ZEB2 was 0.96 with GATA4, and the second highest correlation was with the mid-cardiac marker gene TBX5 (0.75). ACTN2 is classified as late according to the loess and day median fit for gse122380, but it is classified as mid-cardiac for both fits for gse85331. For example, for gse85331 and the median fit, ACTN2 had a correlation of 0.99 with the mid-cardiac gene GATA4, while it had a correlation of 0.95 with the late marker gene TNNT2. Note that ACTN2 has a high correlation with both, and this is also the case for many genes that are classified as either mid- or late-cardiac.

Basic/descriptive statistical analysis

Small-molecule cardiomyocyte differentiation is an easy and effective procedure involving three media changes that allows transcriptomic profiling throughout differentiation. Coinciding with each medium change, subsequent gene expression changes should proceed in a straightforward manner, representing the four major stages of cardiomyocyte differentiation. Therefore, the following rules can classify the major gene expression signatures according to the day of maximum expression (Day_Max) and the day of maximum difference in the time series (Day_{Max-difference}) (Figures 1 and 2).

• Pluripotency – Day_Max occurring on day 0 similar to POU5F1 and NANOG expression.
• Mesodermal – Day_Max at day 2/3 and maximum difference centered on day 2/3 (either positive Day_{Max-difference} occurring on day 2/3, or negative Day_{Max-difference} occurring on day 4/5), similar to TBXT and MIXL1 expression.
• Early-cardiomyocyte – Day_Max occurred later in differentiation on days 4/5 or 8/15/30 since expression persisted throughout cardiomyocyte differentiation and a positive Day_{Max-difference} on day 2/3, similar to GATA4 expression.
• Mid-cardiomyocyte – Day_Max occurred on days 4/5 or 8/15/30 and a positive Day_{Max-difference} was observed on day 4/5, similar to ISL1 expression.
• Late-cardiomyocyte – Day_Max occurring on days 8/15/30 and positive Day_{Max-difference} on day 8/15/30 gene, similar to TNNT2 expression.

Consider the example genes, ZEB2 and ACTN2. The middle plots in Figure 5 show the median values at each time point. For GSE85331, ZEB2 had its maximum value on day 4 and the maximum difference in expression on day 2. ZEB2 was thus classified as an early cardiomyocyte in GSE85331. For GSE122380, we analyzed a subset of time points: days 0, 3, 5, and 15. Among these time points, the maximum expression of ZEB2 was at day 15, and the maximum difference in expression increased from day 0 to day 3. ZEB2 was thus classified as an early cardiomyocyte for GSE122380. For ACTN2 in GSE85331, the maximum expression was observed at day 30, with the maximum difference in expression occurring at day 30. In GSE122380, considering only days 0, 3, 5, and 15, the maximum expression of ACTN2 was on day 15, with the maximum difference going up from day 5 to day 30. ACTN2 was classified as a late cardio for both datasets. Table 2 contains a summary of the analysis for ZEB2 and ACTN2, with the “class (descriptive)” given for each matching our discussion here (early for ZEB2 and late for ACTN2).

Classification using simple descriptive statistical rules is understandable. If a gene of interest is misclassified via user-defined rules, it is easy to determine the cause. In contrast, sophisticated unsupervised clustering methods may generate unexpected gene cluster patterns that require additional filtering or parameter modifications to generate results usable by the investigator. We propose that the simple basic analysis described here may aid investigators in data-exploration tasks.

The basic descriptive analysis of GSE85331 and GSE122380 datasets identified 1711 genes within the 3259 gene list that possessed identical signature calls for this basic/descriptive analysis for the POU5F1 (pluripotency), TBXT (mesodermal), GATA4 (early-cardiomyocyte), TBX5 (mid-cardiomyocyte), and TNNT2 (late-cardiomyocyte) classifications in this basic analysis. As indicated in Table 2, the example genes ZEB2 and ACTN2 are given the same “class (descriptive)” for both datasets (early for ZEB2 and late for ACTN2); therefore, both are on the 1711 gene list. The median expression patterns of each gene signature classification for descriptive analysis of both datasets are shown in Figure 3. These cardiac gene signatures approximate previous investigations of cardiomyocyte differentiation, demonstrating a robust method for precisely classifying gene expression patterns in a developmental system.¹²^,¹³^,¹⁵

devGEA shiny web application: The shiny web application computes the basic statistical properties of a dataset and allows the user to define clustering groups and rules for the clustering groups. The shiny web application can thus be used to easily perform the basic/descriptive analysis discussed here, as well as filtering based on p values and fold changes without requiring any programming or software installed locally.

Experimental confirmation

An independent experimental analysis of cardiomyocyte differentiation using a commercially available hiPSC cell line was performed to evaluate the success of gene expression signature calls and to validate the candidate cardiac genes identified in this analysis. The episomal GIBCO hiPSC line was obtained, expanded, and differentiated into cardiomyocytes using commercially available reagents. As expected, hiPSC differentiation proceeded as expected: on day 0, independent hiPSC colonies were visible; from days 2 to 4, differentiating cells had proliferated to confluency with detaching and dead cells visible in the culture dish; on day 6, small contracting areas within the cellular lawn were identifiable and proceeded to undergo remodeling to large rhythmically contracting web-like syncytia (Figure 2A). At equivalent time points to the GSE85331 experiment, total RNA was extracted, and RT-qPCR was used to validate relative gene expression across the experiment. The relative expression of each gene was determined by the 2^-ΔΔCT method using Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) expression for normalization. The set of 32 genes tested in this way are listed in Figures 2 and 3 (RT-qPCR results) and Table 3 (summary of our analysis of these genes in the prior datasets). We now describe the set of 32 genes, their RT-qPCR results, and the analysis of the prior datasets.

Table 3. Summary of analysis for 32 genes from ISU gene set.

Summary of the analysis of the 32 genes for which we performed qPCR measurements. The results are given for each of the two datasets under which we analyzed the genes. First, the expected classification of the gene, gene id, and gene symbol are given. Then, for each dataset (gse122380, gse85311), the following are given: adjusted p values, log2 fold change between the highest and lowest major time point, classification by basic/descriptive analysis, classification by loess fit, correlation of loess fit to classified marker gene (pluri - POU5F1, meso-TBXT, early - GATA4, mid - TBX5, late - TNNT2), classification by day median correlation, and correlation of day median correlation classification.

		GSE122380							GSE85331							Total
Symbol	Expected	p (adj)	FC	basic class	loess class	corr	medians class	corr	p (adj)	FC	basic class	loess class	corr	medians class	corr	#predicted - expected	pass filtering
ACTN2	late	0.0e+00	7.38	late	late	0.97	late	0.96	0.0e+00	7.50	late	mid	0.99	mid	0.99	4	yes
CABLES1	meso	2.5e-05	0.85	pluri	meso	0.93	pluri	0.86	1.1e-09	2.10	meso	meso	0.96	meso	0.96	4	no
CDX1	meso	2.7e-11	3.75	meso	meso	0.99	meso	0.95	0.0e+00	7.00	meso	meso	0.99	meso	0.99	6	yes
CDX2	meso	3.5e-09	3.31	meso	meso	0.97	meso	0.96	0.0e+00	5.30	meso	meso	1.00	meso	1.00	6	yes
CMYA5	late	7.3e-10	3.59	late	late	0.99	late	0.98	0.0e+00	6.30	late	mid	0.99	mid	0.99	4	yes
CSRP3	late	0.0e+00	7.65	late	late	0.94	late	0.92	0.0e+00	9.50	late	mid	0.98	mid	0.98	4	yes
DMRT1	meso	3.9e-03	0.26	meso	meso	0.61	meso	0.92	1.7e-07	3.30	meso	meso	0.95	meso	0.96	6	no
FZD4	mid	0.0e+00	3.12	mid	early	0.90	early	0.85	4.2e-11	2.10	mid	mid	0.75	mid	0.83	4	yes
GATA4	early	0.0e+00	7.44	early	early	1.00	early	1.00	5.2e-15	5.50	early	early	1.00	early	1.00	6	yes
HIPK3	late	0.0e+00	3.05	late	late	0.99	late	0.98	0.0e+00	2.80	late	mid	0.99	mid	0.99	4	yes
IL17RD	meso	1.5e-06	2.12	meso	meso	0.98	meso	0.90	5.0e-12	2.50	meso	meso	0.91	meso	0.90	6	yes
ISL1	mid;late	1.0e-11	4.40	early	early	0.95	early	0.91	1.1e-11	3.00	mid	early	0.77	mid	0.79	2	yes
MEF2C	mid;late	0.0e+00	4.54	mid	mid	0.98	mid	0.98	0.0e+00	4.80	late	late	0.99	late	0.98	6	yes
MIXL1	meso	0.0e+00	7.25	meso	meso	0.97	meso	0.97	0.0e+00	9.20	meso	meso	0.96	meso	0.96	6	yes
MLIP	late	0.0e+00	3.57	late	late	0.74	late	0.72	0.0e+00	5.60	late	mid	0.98	mid	0.98	4	no
MSX1	meso;early	0.0e+00	6.89	meso	early	0.38	meso	0.53	5.2e-15	7.20	meso	meso	0.75	early	0.78	6	no
MYH6	late	0.0e+00	8.69	late	late	1.00	late	0.99	0.0e+00	11.60	late	mid	0.99	mid	0.98	4	yes
MYH7	late	0.0e+00	9.23	late	late	0.99	late	0.98	0.0e+00	11.90	late	mid	0.98	mid	0.98	4	yes
MYOM1	late	0.0e+00	7.30	late	late	0.95	late	0.94	0.0e+00	8.10	late	mid	0.99	mid	0.98	4	yes
NANOG	pluri	0.0e+00	4.47	pluri	pluri	0.94	pluri	0.90	6.4e-11	6.70	pluri	pluri	0.97	pluri	0.96	6	yes
NEXN	late	7.1e-14	2.28	late	late	0.98	late	0.97	5.2e-15	2.80	late	mid	0.99	mid	0.99	4	yes
NKX2-5	mid;late	0.0e+00	7.30	late	late	1.00	late	0.99	0.0e+00	6.40	late	mid	0.99	mid	0.98	6	yes
PKDCC	mid;late	1.8e-03	1.24	mid	early	0.94	early	0.75	5.4e-11	3.30	mid	early	0.57	early	0.60	2	no
POU5F1	pluri	0.0e+00	8.31	pluri	pluri	1.00	pluri	1.00	0.0e+00	8.60	pluri	pluri	1.00	pluri	1.00	6	yes
PPP1R12B	late	0.0e+00	3.92	late	late	0.98	late	0.98	0.0e+00	5.10	late	mid	0.99	mid	0.99	4	yes
TBX5	mid	0.0e+00	5.30	mid	mid	1.00	mid	1.00	1.3e-12	4.00	late	mid	1.00	mid	1.00	5	yes
TBXT	meso	0.0e+00	5.47	meso	meso	1.00	meso	1.00	0.0e+00	9.20	meso	meso	1.00	meso	1.00	6	yes
TNNT2	late	0.0e+00	8.39	late	late	1.00	late	1.00	0.0e+00	11.70	late	late	1.00	late	1.00	6	yes
WLS	early	0.0e+00	5.99	early	early	0.89	early	0.83	0.0e+00	5.90	early	early	0.94	early	0.96	6	yes
YAF2	early	0.0e+00	3.10	early	early	0.96	early	0.94	0.0e+00	3.90	early	early	0.98	early	0.98	6	yes
ZEB2	early	0.0e+00	4.43	early	early	0.97	early	0.94	0.0e+00	5.60	early	early	0.94	early	0.96	6	yes
ZNF503	early	0.0e+00	5.83	early	early	0.99	early	0.98	4.6e-14	4.70	early	early	1.00	early	0.99	6	yes

POU5F1 and NANOG expression was identified to verify pluripotency and the subsequent downregulation of pluripotency factors during differentiation (Figures 1 and 2B). Ten key cardiomyocyte differentiation markers were selected for evaluation and compared with the 1711 gene descriptive analysis results (Figure 2B). All ten differentiation markers were identified in the 3259 descriptive analysis list, while all but three (ISL1, MEF2C, TBX5) were present in the 1711 gene list. While pluripotency and mesodermal and early cardiomyocyte expression patterns were comparable between the RT-qPCR and descriptive analysis results, several discrepancies were identified within the mid-cardiomyocyte and late-cardiomyocyte expression pattern groups. For example, ISL1 demonstrated robust activation at day 4 (a mid-cardiomyocyte signature) in the RT-qPCR data but was absent from the 1711 gene list, indicating variable expression between the two prior datasets. Interestingly, MEF2C and NKX2-5 levels determined via RT-qPCR were extremely low or undetectable until day 8, which is consistent with a late cardiomyocyte signature rather than a mid-cardiac signature (Figure 2B). Within the 3259 gene list, NKX2-5 was classified as a late-cardiomyocyte signature in both datasets, while the MEF2C classification was mismatched between mid- and late-cardiomyocyte signatures in the prior data sets. The mid-cardiac genes ISL1, MEF2C, and NKX2-5 are important cardiac regulatory genes necessary for cardiac lineage development, cardiomyocyte differentiation, and proliferation. This inconsistency may result from the delayed induction of key cardiac genetic regulatory genes during cardiomyocyte differentiation. Additionally, biological variation or experimental differences between hiPSC lines and/or experimental methods may be responsible for these discrepancies. Therefore, these expected mid-cardiomyocyte genes possess similar expression profiles to late-cardiomyocyte genes, such as sarcomeric TNNT2 which is highly expressed prior to contraction. Interestingly, NKX2-5 expression marks the cardiac mesoderm, which later gives rise to a linear heart tube in early mouse embryonic heart development.¹¹^,³⁰^,³³^,⁴² The later-than-expected expression of these genes may represent a lagging entrance of differentiating cells into the cardiac progenitor phase or alterations to the overall cardiogenic program.

Next, twenty genes with previously unvalidated gene expression patterns in hiPSC cardiomyocyte differentiation representing various cardiomyocyte differentiation signatures as called by the GSE85331 basic/descriptive analysis were selected (six mesodermal genes, four early-cardiomyocyte genes, two mid-cardiomyocyte genes, and eight late-cardiac genes) and investigated for reproducibility using RT-qPCR (Figure 3). RT-qPCR was performed on each gene to establish the gene expression signature pattern in the GIBCO hiPSC line. The RT-qPCR expression patterns for the 20 genes were classified according to the descriptive signature call in the GSE85331 dataset (Figure 3). RT-qPCR expression patterns approximately matched the descriptive signature classifications within the RNA-seq data sequence.

To increase the rigor of this analysis, the 32 RT-qPCR genes investigated were filtered, requiring descriptive signature calls to match the GSE85331 and GSE122380 analyses. 29 of the 32 genes possessed the same descriptive signature classification between the two data sets. Interestingly, CABLES, ISL1, MEF2C, and TBX5 did not meet these criteria, indicating variability between the datasets. The expected mid-cardiomyocyte genes, ISL1 and TBX5, were not successfully replicated in these datasets, indicating potential difficulty in distinguishing mid- and late-cardiomyocyte genes because of their similarity in gene expression, perhaps due to differences in the activation of the cardiomyocyte differentiation program. Although discrepancies exist between the datasets, this stringent approach demonstrates the advantages of utilizing the descriptive analysis method to identify gene expression patterns.

Correlation analysis

Whereas gene signature classification utilizing hierarchical clustering, k-means clustering, and self-organizing maps relies on classification techniques that randomly sample the dataset, we utilized a method that classifies genes based on correlation with user-specified gene or expression profiles that model the gene signature patterns of interest. This method enables the user to utilize a priori knowledge of gene expression or to explore gene expression patterns of interest based on their analyses. It also provides reproducibility between analyses and flexibility in selecting genes based on the extent of correlation with a gene model. In correlation analysis of cardiac signatures during differentiation, specific gene models can be selected within the dataset based on prior knowledge.

Key marker genes - The following results were obtained from using the following cardiac marker gene signatures represented by a specific cardiomyocyte differentiation gene: POU5F1 (pluripotency), TBXT (mesodermal), GATA4 (early-cardiomyocyte), TBX5 (mid-cardiomyocyte), and TNNT2 (late-cardiomyocyte). Although ISL1 is a potential mid-cardiomyocyte signature due to its expression during early cardiac progenitor development, ISL1 was not considered a gene signature because the signature calls using our basic/descriptive statistical analysis did not match between the GSE85331 and GSE122380data sets (and thus ISL1 is not on the 1711 gene list). Instead, we arbitrarily selected TBX5 as the mid-cardiomyocyte gene expression signature for the following correlation analyses because it demonstrated moderate upregulation on day 4 or 5 within the RNA-seq and RT-qPCR data sets compared to later upregulation of the late-cardiomyocyte marker TNNT2 after this time point (Figure 5). Within the RT-qPCR data, TBX5 relative expression is moderately increased from day 2 (0.45) to day 4 (7.68); however, this expression was upregulated nearly 50 times on day 8 (49.12), whereas TNNT2 demonstrated several magnitudes of upregulation from day 4 (0.88) to day 8 (1899.00).

LOESS and medians correlation: Two different methods were used to determine correlations between each gene and the cardiac marker genes within the 3259 priority cardiomyocyte gene list (see above). The LOESS smoothing technique uses a model to fit each gene to the cardiomyocyte differentiation marker gene and then computes the correlation between the fit of each gene and the fit of each cardiac marker gene. The second method computes a median value for each time point for each gene, and then correlates the results with the medians of each cardiac marker gene. After computing the correlations (using either LOESS fit or medians), each gene was classified according to the cardiac marker gene with the highest correlations. Each gene was correlated to a specific gene model, with the highest correlation selected for a cardiac expression signature.

devGEA shiny web application: We note that the shiny web application includes gene clustering based on median correlation but does not yet (at the time of writing) contain LOESS correlation clustering.

Example genes: Consider example genes ZEB2 and ACTN2. The LOESS fits are given in the plots on the left side of Figure 5, and the median values are given in the middle plots. Table 2 shows the correlation between these values and those of each of the key marker genes. For example, ZEB2’s LOESS model fit for GSE122380 was 0.966721808 with the model fit of the early marker gene (GATA4), with the second-highest correlation being 0.745102698 with the model fit of the mid-cardiac marker gene (TBX5). ACTN2 was assigned a different classification using the LOESS fit for the two datasets (late-cardiomyocytes for GSE122380 and mid-cardiomyocytes for GSE85331). For GSE85331 and LOESS fit, the LOESS fit for ACTN2 had a correlation of 0.987544522 with the mid-cardiomyocyte marker gene (TBX5), with the second highest correlation of 0.95867473 with the late-cardiomyocyte marker gene (TNNT2). Note that both mid- and late-cardiomyocyte signatures are reasonably good fits for ACTN2 in the correlation analysis, while the drop-off in correlation with the other marker genes is much larger.

Filtering: A minimum correlation value of 0.75 was selected as an arbitrary cut-off to further filter the differentially expressed genes (the LOESS fit and median values should have a correlation of at least 0.75, assigned to least one of the key marker genes). The mean correlation factor for this gene list was 0.931 with a standard deviation of 0.057, demonstrating the combined effectiveness of this filtering and classification strategy. The mean expression patterns of all genes in a particular signature classification reflected the corresponding model in both the datasets (Figure 4).

Comparisons: The comparison of the LOESS and median correlation methods within each dataset was surprisingly robust. 93% of the 3259 genes within the GSE85331 dataset were classified into the same signature classification (same classification using LOESS correlation and using median correlation), while 91% of the genes in the GSE122380 dataset were similarly classified using LOESS and median correlation. The comparison between GSE85331 and GSE122380 datasets was less robust. Using the LOESS method, 58% of the 3259 genes were successfully classified into the same signature using the LOESS method between the GSE85331 and GSE122380, while 59% of the genes were successfully classified using the median correlation method in the two data sets. Consider two example genes: ZEB2 and ACTN2. As indicated in Table 2, ZEB2 has the same correlation classification using both methods and both datasets, so ZEB2 is among the genes counted for all these percentages. ACTN2 has the same classification using LOESS and medians fit for GSE122380 (so is among the 91%) and has the same LOESS and median classifications for GSE85331 (so is among the 93%). ACTN2 has different classifications for both datasets for median correlation (not among the 59% mentioned) and has different classifications between the two datasets for LOESS correlation (not among the 58% mentioned).

Overall results - all methods of classification

The median gene expression signatures of each method and RNA-seq dataset are shown in Figure 4. Discrepancies may reflect a combination of biological and technical variations and analysis limitations due to differing numbers of time points between the RNA-seq experiments of the differentiation procedures. Further contrast of these analyses requiring both datasets to give the same classification for both methods (GSE122380 provides the same classification for both LOESS and medians, and GSE85331 provides the same classification for both LOESS and medians) identifies a total of 1704 of the 3259 gene lists reproducibly assigned to the same signature.

Next, we compared our RT-qPCR results with the LOESS and median correlation results of the inter-RNA-seq dataset comparisons. A summary of the analysis of these 32 genes is provided in Table 3. Of the 32 genes investigated, 27 RT-qPCR genes were identified in 3259 genes. Interestingly, 13 passed all filters and were classified the same using all six of our classification methods (basic/descriptive on both datasets, loess correlation on both datasets, and medians correlation on both datasets). Of these, TNNT2 was the only gene with an expected classification as mid or late. Most of the other mid/late genes were not classified consistently or, as expected, in GSE85331. This points to the difficulty in distinguishing the later time points in the experiment, although we expect that we can obtain a higher degree of matching by either selecting different key marker genes or using constructed gene patterns rather than the model fits of the marker genes. We report the results based on our initial plans to avoid simply selecting the parameters that yield the best results. Even so, most of the 32 genes in the mid/late phase that do not have 6-way agreement have high correlation values for both mid and late marker genes, meaning that we should consider not only the marker gene with the highest correlation, but also the second highest correlation and group.

Finally, we considered which of the 3259 genes were classified as the same using all six methods. Of the 3259 genes, 840 were classified as the same using all six methods.

Overall, this investigation demonstrates the unique analytical methods that devGEA provides to classify gene expression signatures in order to organize and study high-throughput developmental time-series gene expression studies. In addition, these studies suggest that caution must be taken when interpreting these analyses because of the inherent variability between experiments. Multiple methodological approaches may be necessary to evaluate the biological processes that are investigated to obtain the best representation of gene expression data. Because of the inherent gene expression similarity between the later cardiomyocyte differentiation time points, further investigation is warranted to determine whether the different key marker genes or the constructed gene expression patterns would better differentiate these time points in all datasets.

Conclusion

Differential gene expression analysis of high-throughput RNA-seq gene expression data represents a challenge, as datasets grow to several time points or conditions containing many biological replicates. We have developed a devGEA bioinformatics software suite that combines basic statistical analysis techniques, data visualization, and novel gene expression signature classification methods to improve the analysis of developmental time series. Additionally, we demonstrated the effective use of our methods in classifying developmental signatures that represent key stages of cardiomyocyte differentiation by filtering genes based on statistical descriptions and correlation to specific gene signatures. The robustness of the software and analysis was validated via RT-qPCR analysis of gene expression using the commercially available Gibco hiPSC cardiomyocyte differentiation procedure.

The devGEA analysis of directed hiPSC-cardiomyocyte differentiation provides an effective methodology for identifying and exploring gene expression signatures that characterize cardiac development, which is applicable to other domains of research. Three different classification methods used either basic statistics or correlations to identify similarly expressed genes. Additional experiments can be used to enhance or validate data analysis.

The GSE85331 and GSE122380 datasets offer a means to robustly identify gene expression signatures across cardiomyocyte differentiation and to explore the effectiveness of each method. To validate these findings, 32 genes comprising differentiation markers and understudied genes of the 3259 priority gene list were evaluated using RT-qPCR profiling of the cardiomyocyte differentiation of the GIBCO hiPSC line when using stringent criteria requiring the same gene signatures to be called in all three datasets across both experiments. Thirteen of the 32 genes were validated in this manner, indicating the reliability of signature identification between these diverse methods. Of the remaining genes on the 32 gene list, most were mid or late-cardiac genes, where at least one of the six classifications placed the gene in an adjacent cluster (i.e., classifying mid genes as late or late genes as mid). The discrepancies between these experiments may be due to the biological nature of cardiomyocyte differentiation or the technical challenges of discriminating unique developmental expression signatures characterizing a differentiating subpopulation from a heterogeneous population of cells. Nevertheless, our RT-qPCR investigation of important developmental cardiac markers from the correlation analysis demonstrated the robust classification of differentially expressed genes into specific cardiac developmental signatures.

Furthermore, these experiments identified unique gene expression patterns occurring during human cardiomyocyte differentiation, with several of these genes associated with mammalian heart development or congenital cardiovascular defects.

The earliest cell state transition involves specification of the mesodermal lineage from the pluripotency state. The mesodermal signature genes were transiently upregulated on day 2 and then repressed during the remainder of cardiomyocyte differentiation. The cadual-type homeobox genes CDX1 and CDX2, are early embryonic genes that control the anterior-posterior patterning of the developing embryo.⁴³^,⁴⁴ Recent studies in mice have identified important roles for CDX genes in early embryonic fate decisions that drive differentiating cells to the cardiac lineage and in epicardial development.⁴⁵^,⁴⁶ Additional genes associated with the mesodermal signature include msh homeobox MSX1 and doublesex and mab-3 related DMRT1 transcription factors. MSX1 and MSX2 double homozygous null mutant mice developed abnormalities in the atrioventricular valves and myocardium.⁴⁷ Two additional genes include the CDK/ABL kinase interaction protein CABLES1 and the interleukin 17 receptor IL17RD which are yet to be investigated in heart development.

The early cardiomyocyte signature is characterized by transcriptional upregulation at day 2, with persistent or increasing gene expression persisting throughout the entire cardiomyocyte differentiation process. Examples of genes classified as early cardiomyocyte signatures include the Wnt cargo receptor WLS, YY1-associated factor YAF2, and transcriptional regulators ZEB2 and ZNF503. WLS encodes a conserved protein necessary for WNT secretion, which is required for early embryonic axis and heart field formation.⁴⁸^–⁵⁰ While developmental functions for YAF2, ZEB2, and ZNF503 have been identified in mice, embryonic cardiac developmental roles for these genes are yet to be reported.⁵¹^–⁵³

Mid-cardiomyocyte genes represent the induction of gene expression on day four of cardiomyocyte differentiation. The protein kinase PKDCC gene and frizzled-class WNT receptor FZD4 are examples of candidate genes that have yet to be investigated for cardiac developmental functions. PKDCC is an essential mesenchymal regulator of lung development.⁵⁴^,⁵⁵ FZD4 is expressed within the embryonic vasculature, and mutations are responsible for the incomplete vascularization of the retina occurring in human exudative vitreoretinopathy I disorder.⁵⁶^,⁵⁷ Furthermore, FZD4 was recently found to be a marker of the lateral plate mesoderm and to contribute to cardiomyocytes and other cardiac cell lineages.⁵⁸^,⁵⁹

Late-cardiomyocyte signature candidates are associated with later stages of differentiation involving myofibril organization and contractility. The actinin ACTN2, cardiomyopathy-associated protein CMYA5, sarcomeric protein CSRP3, nuclear lamin-interacting protein MLIP, sarcomeric protein MYOM1, and actin-binding protein NEXN were markedly enriched at the endpoint of the developmental series. Interestingly, each of these genes is associated with a human cardiomyopathy disorder, produces cardiomyopathy in a mouse model, or localizes to the cardiac myofibrils.⁶⁰^–⁶⁴ Additionally, protein kinase HIPK3 and protein phosphatase regulatory subunit PPP1R12B were classified as late-cardiac signature genes. HIPK3 is expressed in the early developing heart; however, the function of the gene may be dispensable because homozygous null adult mice survive into adulthood and develop impaired insulin secretion and glucose tolerance.⁶⁵^,⁶⁶ The PPP1R12B gene encodes a subunit of MYPT2, a muscle phosphatase that regulates cardiac myosin within contractile myofibrils.⁶⁷

Overall, devGEA provides an effective analysis tool that enables the reproducible identification of unique cardiac signatures occurring throughout hiPSC-cardiomyocyte differentiation. Given the reproducibility of several genes across multiple experiments and cell lines, these genes may represent important regulatory nodes in the human developmental cardiac network, warranting further investigations. The study of heart development is of critical importance to biomedical, clinical, and translational research, as it allows us to understand and intervene in human cardiovascular disease. Directed cardiomyocyte differentiation from hiPSCs allows researchers to answer important biological questions regarding the differentiation of these specialized cells, the pathogenesis underlying the disease, and their potential therapeutic treatment for human cardiovascular disease. Since differential gene expression during hiPSC-directed cardiomyocyte differentiation recapitulates the in vivo cardiogenic program of the embryo, this experimental system allows us to advance and expand our knowledge of gene regulation in human cardiogenesis from other model organisms such as mice. The translational medicine effectiveness of this technique is already being realized by the generation and differentiation of hiPSCs into cardiomyocytes from patients with congenital cardiovascular disease, which allows the determination of the molecular, cellular, developmental, and physiological dysfunction underlying gene mutations.⁶⁸^,⁶⁹ Additionally, hiPSCs can be genetically modified using RNAi, homologous recombination, or Crispr/Cas9 techniques to inactivate gene function during cardiomyocyte differentiation, allowing the study of gene function in the human developmental context. High-throughput gene expression profiling of hiPSC differentiation allows the classification of discrete gene expression patterns during cardiomyocyte differentiation, which may prove useful for the analysis of genetically modified hiPSCs and patient-derived hiPSCs that disrupt normal cardiac regulatory gene function.

Ethics and consent

This research did not involve human subjects or animals.

Data availability

The data and full results of this study are available in the Gitlab repository https://gitlab.com/jkinne/devgea with DOI https://doi.org/10.5281/zenodo.13118760.⁷⁰ The analysis code and shiny web app are also included in this repository. The entire repository, including the dataset, was provided using Apache License Version 2.0. The following data and result files were included:

• data/GIBCO_qPCR_cardiac_genes.csv – RT-qPCR measurements for genes studied across four timepoints.
• results/all_results.xlsx - analysis results for all genes within the previous datasets studied.
• results/results_pass_filters.csv - analysis results for all genes that passed the filters used in our study (p value, log2 fold change, correlation to one of the marker genes).
• results/results_6way_agreement.csv - analysis results for all genes that passed all filters and also produced the same classification under all of the methods used in the study.

All analyses were performed using R Statistical Software (v4.4.1 (2024-06-14)).⁷¹ Packages used for computations and analysis were matrixStats, ⁷² ClassComparison, ⁷³ and base R packages.⁷¹

Extended data

Tables are available as extended data via the above repository link.https://doi.org/10.5281/zenodo.13118760

Software availability

Source code available from: https://gitlab.com/jkinne/devgea/

Archived software available from: https://doi.org/10.5281/zenodo.13118760 ⁷⁰

License: Apache License Version 2.0.

Acknowledgements

We thank the following for useful discussions, technical assistance, and/or preliminary work related to this project: Dwayne Tally, Andrew Williamson, Mirian Alvarez-Dubon, Ishmeet Kaur, Joseph Dalloul, Danielle Muse, Basundhara Raj, Josh Soto, Laura Cochran, Michaela Ward, Hayden Fell, Garett Oxford, Om Dalal, AJ Farmer, Shaad Ahmad, and Rusty Gonser.

References

1. Smith CM, Hayamizu TF, Finger JH, et al.: The mouse Gene Expression Database (GXD): 2019 update. Nucleic Acids Res. 2019; 47(D1): D774–D779. Epub 2018/10/20. PubMed Abstract | Publisher Full Text | Free Full Text
2. Visel A, Thaller C, Eichele G: GenePaint.org: an atlas of gene expression patterns in the mouse embryo. Nucleic Acids Res. 2004; 32(Database issue): D552–6. Epub 2003/12/19. PubMed Abstract | Publisher Full Text | Free Full Text
3. Consortium GT: Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348(6235): 648–60. Epub 2015/05/09. PubMed Abstract | Publisher Full Text | Free Full Text
4. Bult CJ, Blake JA, Smith CL, et al.: Mouse Genome Database G. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 2019; 47(D1): D801–6. Epub 2018/11/09. PubMed Abstract | Publisher Full Text | Free Full Text
5. Barrett T, Wilhite SE, Ledoux P, et al.: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013; 41(Database issue): D991–5. Epub 2012/11/30. PubMed Abstract | Publisher Full Text | Free Full Text
6. Takahashi K, Yamanaka S: Induced pluripotent stem cells in medicine and biology. Development. 2013; 140(12): 2457–2461. PubMed Abstract | Publisher Full Text
7. Shi Y, Inoue H, Wu JC, et al.: Induced pluripotent stem cell technology: a decade of progress. Nat. Rev. Drug Discov. 2017; 16(2): 115–30. Epub 2016/12/16. PubMed Abstract | Publisher Full Text | Free Full Text
8. Mummery CL, Zhang J, Ng ES, et al.: Differentiation of human embryonic stem cells and induced pluripotent stem cells to cardiomyocytes: a methods overview. Circ. Res. 2012; 111(3): 344–58. Epub 2012/07/24. PubMed Abstract | Publisher Full Text | Free Full Text
9. Doyle MJ, Lohr JL, Chapman CS, et al.: Human Induced Pluripotent Stem Cell-Derived Cardiomyocytes as a Model for Heart Development and Congenital Heart Disease. Stem Cell Rev. Rep. 2015; 11(5): 710–27. PubMed Abstract | Publisher Full Text | Free Full Text
10. Zhu Z, Huangfu D: Human pluripotent stem cells: an emerging model in developmental biology. Development. 2013; 140(4): 705–17. Epub 2013/01/31. PubMed Abstract | Publisher Full Text | Free Full Text
11. Burridge PW, Keller G, Gold JD, et al.: Production of de novo cardiomyocytes: human pluripotent stem cell differentiation and direct reprogramming. Cell Stem Cell. 2012; 10(1): 16–28. PubMed Abstract | Publisher Full Text | Free Full Text
12. Burridge PW, Thompson S, Millrod MA, et al.: A universal system for highly efficient cardiac differentiation of human induced pluripotent stem cells that eliminates interline variability. PLoS One. 2011; 6(4): e18293. Epub 2011/04/16. PubMed Abstract | Publisher Full Text | Free Full Text
13. Lian X, Hsiao C, Wilson G, et al.: Robust cardiomyocyte differentiation from human pluripotent stem cells via temporal modulation of canonical Wnt signaling. Proc. Natl. Acad. Sci. USA. 2012; 109(27): E1848–57. Epub 2012/05/31. PubMed Abstract | Publisher Full Text | Free Full Text
14. Kattman SJ, Witty AD, Gagliardi M, et al.: Stage-specific optimization of activin/nodal and BMP signaling promotes cardiac differentiation of mouse and human pluripotent stem cell lines. Cell Stem Cell. 2011; 8(2): 228–40. PubMed Abstract | Publisher Full Text
15. Burridge PW, Matsa E, Shukla P, et al.: Chemically defined generation of human cardiomyocytes. Nat. Methods. 2014; 11(8): 855–60. Epub 2014/06/15. PubMed Abstract | Publisher Full Text | Free Full Text
16. Lian X, Zhang J, Azarin SM, et al.: Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/beta-catenin signaling under fully defined conditions. Nat. Protoc. 2013; 8(1): 162–75. Epub 2012/12/20. PubMed Abstract | Publisher Full Text | Free Full Text
17. Liu Q, Jiang C, Xu J, et al.: Genome-Wide Temporal Profiling of Transcriptome and Open Chromatin of Early Cardiomyocyte Differentiation Derived From hiPSCs and hESCs. Circ. Res. 2017; 121(4): 376–91. Epub 2017/07/01. PubMed Abstract | Publisher Full Text | Free Full Text
18. Strober BJ, Elorbany R, Rhodes K, et al.: Dynamic genetic regulation of gene expression during cellular differentiation. Science. 2019; 364(6447): 1287–90. Epub 2019/06/30. PubMed Abstract | Publisher Full Text | Free Full Text
19. Koressaar T, Remm M: Enhancements and modifications of primer design program Primer3. Bioinformatics. 2007; 23(10): 1289–91. Epub 2007/03/22. PubMed Abstract | Publisher Full Text
20. Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001; 25(4): 402–8. PubMed Abstract | Publisher Full Text
21. Bustin SA, Benes V, Garson JA, et al.: The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin. Chem. 2009; 55(4): 611–22. Epub 2009/02/26. PubMed Abstract | Publisher Full Text
22. Takahashi K, Yamanaka S: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006; 126(4): 663–76. Epub 2006/08/15. PubMed Abstract | Publisher Full Text
23. Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced pluripotent stem cells. Nature. 2007; 448(7151): 313–7. Epub 2007/06/08. PubMed Abstract | Publisher Full Text
24. Yu J, Vodyanik MA, Smuga-Otto K, et al.: Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007; 318(5858): 1917–20. Epub 2007/11/20. PubMed Abstract | Publisher Full Text
25. Kouskoff V, Lacaud G, Schwantz S, et al.: Sequential development of hematopoietic and cardiac mesoderm during embryonic stem cell differentiation. Proc. Natl. Acad. Sci. USA. 2005; 102(37): 13170–5. Epub 2005/09/02. PubMed Abstract | Publisher Full Text | Free Full Text
26. David R, Jarsch VB, Schwarz F, et al.: Induction of MesP1 by Brachyury(T) generates the common multipotent cardiovascular stem cell. Cardiovasc. Res. 2011; 92(1): 115–22. Epub 2011/06/01. PubMed Abstract | Publisher Full Text
27. Kuo CT, Morrisey EE, Anandappa R, et al.: GATA4 transcription factor is required for ventral morphogenesis and heart tube formation. Genes Dev. 1997; 11(8): 1048–60. PubMed Abstract | Publisher Full Text
28. van Berlo JH , Elrod JW, van den Hoogenhof MM , et al.: The transcription factor GATA-6 regulates pathological cardiac hypertrophy. Circ. Res. 2010; 107(8): 1032–40. Epub 2010/08/12. PubMed Abstract | Publisher Full Text | Free Full Text
29. Molkentin JD, Lin Q, Duncan SA, et al.: Requirement of the transcription factor GATA4 for heart tube formation and ventral morphogenesis. Genes Dev. 1997; 11(8): 1061–72. PubMed Abstract | Publisher Full Text
30. Lints TJ, Parsons LM, Hartley L, et al.: Nkx-2.5: a novel murine homeobox gene expressed in early heart progenitor cells and their myogenic descendants. Development. 1993; 119(2): 419–31. PubMed Abstract | Publisher Full Text
31. Edmondson DG, Lyons GE, Martin JF, et al.: Mef2 gene expression marks the cardiac and skeletal muscle lineages during mouse embryogenesis. Development. 1994; 120(5): 1251–63. PubMed Abstract | Publisher Full Text
32. Bu L, Jiang X, Martin-Puig S, et al.: Human ISL1 heart progenitors generate diverse multipotent cardiovascular cell lineages. Nature. 2009; 460(7251): 113–7. PubMed Abstract | Publisher Full Text
33. Bruneau BG, Logan M, Davis N, et al.: Chamber-specific cardiac expression of Tbx5 and heart defects in Holt-Oram syndrome. Dev. Biol. 1999; 211(1): 100–8. PubMed Abstract | Publisher Full Text
34. He X, Liu Y, Sharma V, et al.: ASK1 associates with troponin T and induces troponin T phosphorylation and contractile dysfunction in cardiomyocytes. Am. J. Pathol. 2003; 163(1): 243–51. PubMed Abstract | Publisher Full Text | Free Full Text
35. Martin FJ, Amode MR, Aneja A, et al.: Ensembl 2023. Nucleic Acids Res. 2023; 51(D1): D933–41. PubMed Abstract | Publisher Full Text | Free Full Text
36. NV OA: ShinyProxy.2024.
37. Wagner GP, Kin K, Lynch VJ: Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci. 2012; 131(4): 281–5. Epub 2012/08/08. PubMed Abstract | Publisher Full Text
38. Abrams ZB, Johnson TS, Huang K, et al.: A protocol to evaluate RNA sequencing normalization methods. BMC Bioinformatics. 2019; 20(Suppl 24): 679. Epub 2019/12/20. PubMed Abstract | Publisher Full Text | Free Full Text
39. Coombes KR, Fritsche HA Jr, Clarke C, et al.: Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin. Chem. 2003; 49(10): 1615–23. PubMed Abstract | Publisher Full Text
40. Zhao S, Ye Z, Stanton R: Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA. 2020; 26(8): 903–9. Epub 2020/04/13. PubMed Abstract | Publisher Full Text | Free Full Text
41. Nakamura T, Fidler IJ, Coombes KR: Gene expression profile of metastatic human pancreatic cancer cells depends on the organ microenvironment. Cancer Res. 2007; 67(1): 139–48. PubMed Abstract | Publisher Full Text
42. Lyons I, Parsons LM, Hartley L, et al.: Myogenic and morphogenetic defects in the heart tubes of murine embryos lacking the homeo box gene Nkx2-5. Genes Dev. 1995; 9(13): 1654–66. Epub 1995/07/01. PubMed Abstract | Publisher Full Text
43. Savory JG, Bouchard N, Pierre V, et al.: Cdx2 regulation of posterior development through non-Hox targets. Development. 2009; 136(24): 4099–110. Epub 2009/11/11. PubMed Abstract | Publisher Full Text
44. Savory JG, Pilon N, Grainger S, et al.: Cdx1 and Cdx2 are functionally equivalent in vertebral patterning. Dev. Biol. 2009; 330(1): 114–22. Epub 2009/03/27. PubMed Abstract | Publisher Full Text
45. Chu M, Wang L, Wang H, et al.: A novel role of CDX1 in embryonic epicardial development. PLoS One. 2014; 9(7): e103271. Epub 2014/07/28. PubMed Abstract | Publisher Full Text | Free Full Text
46. Foley TE, Hess B, Savory JGA, et al.: Role of Cdx factors in early mesodermal fate decisions. Development. 2019; 146(7). Epub 2019/04/01. PubMed Abstract | Publisher Full Text
47. Chen YH, Ishii M, Sucov HM, et al.: Msx1 and Msx2 are required for endothelial-mesenchymal transformation of the atrioventricular cushions and patterning of the atrioventricular myocardium. BMC Dev. Biol. 2008; 8: 75. Epub 2008/07/30. PubMed Abstract | Publisher Full Text | Free Full Text
48. Fu J, Jiang M, Mirando AJ, et al.: Reciprocal regulation of Wnt and Gpr177/mouse Wntless is required for embryonic axis formation. Proc. Natl. Acad. Sci. USA. 2009; 106(44): 18598–603. Epub 2009/10/19. PubMed Abstract | Publisher Full Text | Free Full Text
49. Banziger C, Soldini D, Schutt C, et al.: Wntless, a conserved membrane protein dedicated to the secretion of Wnt proteins from signaling cells. Cell. 2006; 125(3): 509–22. PubMed Abstract | Publisher Full Text
50. Miyamoto M, Kannan S, Anderson MJ, et al.: Cardiac progenitors instruct second heart field fate through Wnts. Proc. Natl. Acad. Sci. USA. 2023; 120(4): e2217687120. Epub 2023/01/17. PubMed Abstract | Publisher Full Text | Free Full Text
51. Van de Putte T, Maruhashi M, Francis A, et al.: Mice lacking ZFHX1B, the gene that codes for Smad-interacting protein-1, reveal a role for multiple neural crest cell defects in the etiology of Hirschsprung disease-mental retardation syndrome. Am. J. Hum. Genet. 2003; 72(2): 465–70. Epub 2003/01/09. PubMed Abstract | Publisher Full Text | Free Full Text
52. Eid A, Torres-Padilla ME: Characterization of non-canonical Polycomb Repressive Complex 1 subunits during early mouse embryogenesis. Epigenetics. 2016; 11(6): 389–97. Epub 2016/04/15. PubMed Abstract | Publisher Full Text | Free Full Text
53. Chen SY, Liu FC: The Fgf9-Nolz1-Wnt2 axis regulates morphogenesis of the lung. Development. 2023; 150(16). Epub 2023/08/17. PubMed Abstract | Publisher Full Text
54. Kinoshita M, Era T, Jakt LM, et al.: The novel protein kinase Vlk is essential for stromal function of mesenchymal cells. Development. 2009; 136(12): 2069–79. PubMed Abstract | Publisher Full Text
55. Brutsch SM, Madzharova E, Pantasis S, et al.: Mesenchyme-derived vertebrate lonesome kinase controls lung organogenesis by altering the matrisome. Cell. Mol. Life Sci. 2023; 80(4): 89. Epub 2023/03/15. PubMed Abstract | Publisher Full Text | Free Full Text
56. Xu Q, Wang Y, Dabdoub A, et al.: Vascular development in the retina and inner ear: control by Norrin and Frizzled-4, a high-affinity ligand-receptor pair. Cell. 2004; 116(6): 883–895. PubMed Abstract | Publisher Full Text
57. Summerhurst K, Stark M, Sharpe J, et al.: 3D representation of Wnt and Frizzled gene expression patterns in the mouse embryo at embryonic day 11.5 (Ts19). Gene Expr. Patterns. 2008; 8(5): 331–48. Epub 2008/02/13. PubMed Abstract | Publisher Full Text | Free Full Text
58. Chen Q, Zhang H, Liu Y, et al.: Endothelial cells are progenitors of cardiac pericytes and vascular smooth muscle cells. Nat. Commun. 2016; 7: 12422. Epub 2016/08/12. PubMed Abstract | Free Full Text Publisher Full Text |
59. Yoon C, Song H, Yin T, et al.: FZD4 Marks Lateral Plate Mesoderm and Signals with NORRIN to Increase Cardiomyocyte Induction from Pluripotent Stem Cell-Derived Cardiac Progenitors. Stem Cell Reports. 2018; 10(1): 87–100. Epub 2017/12/14. PubMed Abstract | Publisher Full Text | Free Full Text
60. Liu C, Spinozzi S, Chen JY, et al.: Nexilin Is a New Component of Junctional Membrane Complexes Required for Cardiac T-Tubule Formation. Circulation. 2019; 140(1): 55–66. Epub 2019/04/15. PubMed Abstract | Publisher Full Text | Free Full Text
61. Wu T, Xu Y, Zhang L, et al.: Filamin C is Essential for mammalian myocardial integrity. PLoS Genet. 2023; 19(1): e1010630. Epub 2023/01/27. PubMed Abstract | Publisher Full Text | Free Full Text
62. Huang ZP, Kataoka M, Chen J, et al.: Cardiomyocyte-enriched protein CIP protects against pathophysiological stresses and regulates cardiac homeostasis. J. Clin. Invest. 2015; 125(11): 4122–34. Epub 2015/10/05. PubMed Abstract | Publisher Full Text | Free Full Text
63. Tsoupri E, Kostavasili I, Kloukina I, et al.: Myospryn deficiency leads to impaired cardiac structure and function and schizophrenia-associated symptoms. Cell Tissue Res. 2021; 385(3): 675–96. Epub 2021/05/26. PubMed Abstract | Publisher Full Text
64. Broadway-Stringer S, Jiang H, Wadmore K, et al.: Insights into the Role of a Cardiomyopathy-Causing Genetic Variant in ACTN2. Cells. 2023; 12(5). Epub 2023/02/24. PubMed Abstract | Free Full Text Publisher Full Text |
65. Shojima N, Hara K, Fujita H, et al.: Depletion of homeodomain-interacting protein kinase 3 impairs insulin secretion and glucose tolerance in mice. Diabetologia. 2012; 55(12): 3318–30. Epub 2012/09/16. PubMed Abstract | Publisher Full Text
66. Isono K, Nemoto K, Li Y, et al.: Overlapping roles for homeodomain-interacting protein kinases hipk1 and hipk2 in the mediation of cell growth in response to morphogenetic and genotoxic signals. Mol. Cell. Biol. 2006; 26(7): 2758–2771. PubMed Abstract | Publisher Full Text | Free Full Text
67. Lee E, May H, Kazmierczak K, et al.: The MYPT2-regulated striated muscle-specific myosin light chain phosphatase limits cardiac myosin phosphorylation in vivo. J. Biol. Chem. 2024; 300(2): 105652. Epub 2024/01/13. PubMed Abstract | Publisher Full Text | Free Full Text
68. Sun RH, Hu BC, Li Q: Stress-induced cardiomyopathy complicated by multiple organ failure following cephalosporin-induced anaphylaxis. Intern. Med. 2012; 51(8): 895–9. Epub 2012/04/17. PubMed Abstract | Publisher Full Text
69. Ang YS, Rivas RN, Ribeiro AJS, et al.: Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis. Cell. 2016; 167(7): 1734–1749.e22. Epub 2016/12/17. PubMed Abstract | Publisher Full Text | Free Full Text
70. Isaiah Bartlett JK, Martin G, Coombes K, et al.: devGEA 1.0.1 - source code and dataset.2024. Publisher Full Text
71. Team RC: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2024.
72. Bengtsson H: matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors).2024. Publisher Full Text
73. Coombes KR: ClassComparison: Classes and Methods for “Class Comparison” Problems on Microarrays.2019. ClassComparison. Publisher Full Text

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 26 Sep 2024

Author details Author details

¹ Department of Electronics and Computer Engineering, Indiana State University, Terre Haute, Indiana, USA
² The Rich and Robin Porter Cancer Research Center, Indiana State University, Terre Haute, Indiana, USA
³ The Center for Genomic Advocacy, Indiana State University, Terre Haute, Indiana, USA
⁴ Department of Biology, Indiana State University, Terre Haute, Indiana, USA
⁵ Augusta University Medical College of Georgia, Augusta, Georgia, USA

Isaiah Bartlett
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Software, Visualization, Writing – Original Draft Preparation

Jeff Kinne
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Grace Martin
Roles: Data Curation, Investigation, Writing – Original Draft Preparation

Kevin Coombes
Roles: Conceptualization, Formal Analysis, Methodology

Kristopher R. Schwab
Roles: Conceptualization, Investigation, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Competing interests

No competing interests were disclosed.

Grant information

National Institute on Minority Health and Health Disparities
5R25MD011712-04
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (1)

version 1

Published: 26 Sep 2024, 13:1097

https://doi.org/10.12688/f1000research.154762.1

Copyright

© 2024 Bartlett I et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download

Export To

metrics

	Views	Downloads
F1000Research	-	-
PubMed Central Data from PMC are received and updated monthly.	-	-

Citations

0

SEE MORE DETAILS

CITE

how to cite this article

Bartlett I, Kinne J, Martin G et al. A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation [version 1; peer review: 2 approved with reservations]. F1000Research 2024, 13:1097 (https://doi.org/10.12688/f1000research.154762.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

track

receive updates on this article

Track an article to receive email alerts on any updates to this article.

Open Peer Review

Version 1

VERSION 1

PUBLISHED 26 Sep 2024

Views

7

Reviewer Report 28 Dec 2024

Stephanie Agioti, Department of Life Sciences, School of Sciences, European University Cyprus, Nicosia, Cyprus

Apostolos Zaravinos, Department of Life Sciences, School of Sciences, Basic and Translational Cancer Research Center (BTCRC), Cancer Genetics, Genomics and Systems Biology Group, European University Cyprus, Nicosia, Cyprus

Approved with Reservations

https://doi.org/10.5256/f1000research.169830.r345653

High-throughput gene expression analysis is very important in the investigation of developmental time series of human hi- PSC cardiomyocyte differentiation and can be performed using various bioinformatic tools. Despite this, there are still several difficulties and gaps in the procedure ... Continue reading

High-throughput gene expression analysis is very important in the investigation of developmental time series of human hi- PSC cardiomyocyte differentiation and can be performed using various bioinformatic tools. Despite this, there are still several difficulties and gaps in the procedure of analysis of combined sample groups. The Developmental Gene Expression Analysis (devGEA) tool was developed by the team to provide an effective method for classifying cardiac signatures during hi-PSC cardiomyocyte differentiation. Enhancing gene expression and gene signature analysis is an important goal in cardiomyocyte differentiation, where the identification of key developmental markers is stimulated to disrupt cardiac regulatory gene function. The manuscript is highly informative and presents a promising tool for bioinformaticians studying developmental biology. It is well-designed, cites relevant literature, and presents accurate results with significant academic merit. However, there are still some shortcomings that need to be addressed, and the authors must revise the manuscript accordingly:

1. Figures 1 and 2 would be more appropriately presented in the results section, as they depict the research findings. I can understand that the reason researchers add figure 1 in background is because it has results from the previous studies by Liu et. al., 2017 and Strober et al., 2019, but for me you cannot present your results’ as a background. A good idea would be to leave only the diagrams of differentiation stage and developmental markers with the results from previous studies at the background section and the rest of the results move in the results’ section. Additionally, Figures 3, 4, 5, and 6 also present results from the study and should be moved to the results section. The datasets and experimental setup sections are part of the methodology and should be relocated accordingly. Results section must be re-organized with the figures 1-6 in order to present accurately only the research’s’ results and discussion part must be a different section. Furthermore, the analysis overview should be moved to the methodology section, clearly outlining the three methods used: linear model ANOVA, basic descriptive statistical analysis, and correlation analysis. Finally, overall results – all methods of classification must be used as a discussion of the study and last paragraph must be categorized as the conclusions of the study.

2. On page 5 (datasets section), the authors refer to the original procedure without explaining it in sufficient detail, leaving gaps in the readers' understanding. Also, authors don’t mention the multiple hPSC lines which they referred to (line 8). Moreover, the manuscript does not clearly explain why day 8 was selected as the 4th time point, while other studies used day 15 and day 30 as corresponding time points.

On page 21, paragraph 2, authors refer to twenty genes without mentioned them name by name, while figure 4 must be referred to paragraph 3 since it describes its results. In page 23, paragraph 2 (the devGA…), authors did not refer which are the three different classification methods were used for basic statistics or correlation. Finally, they are not clear in the manuscript the criteria used in the developmental markers for each stage as well as how the markers that are between two different development stages can be used.

3. The environment plays a critical role in influencing developmental processes, including the differentiation of human pluripotent stem cells (hPSCs) into cardiomyocytes. Given that gene expression and signaling are key environmental factors that influence the expression of other factors and molecules, it would be valuable for the authors to experimentally study how the overexpression and/or inhibition of these genes in the environment affects cardiomyocyte differentiation, and at which stages or levels this impact occurs.

4. A future perspectives section should be added to discuss the potential role of devGEA in identifying differentially expressed genes in cardiomyocyte differentiation in humans as a predictive tool. Additionally, its applicability to other developmental processes should be explored to highlight the broader value of the tool.
Team presents very clear the methodology used in their study, allowing the replication of analyses by others with very easy access to the source data used for the results to ensure full reproducibility. In addition, they provide a very clear guidance about devGA and how they designed it in order to give to the readers clear instructions how to use it. In addition, interpretation of the statistical analysis of the results was very appropriate and it is clear how statistically significant results were ‘’extracted’’ from the study.

Team presents very clear the methodology used in their study, allowing the replication of analyses by others with very easy access to the source data used for the results to ensure full reproducibility. In addition, they provide a very clear guidance about devGA and how they designed it in order to give to the readers clear instructions how to use it. In addition, interpretation of the statistical analysis of the results was very appropriate and it is clear how statistically significant results were ‘’extracted’’ from the study.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Cancer Genetics, Computational Biology

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

CITE

Report a concern

Respond or Comment

Views

12

Reviewer Report 12 Oct 2024

Qing Liu, Clemson University, Clemson, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.169830.r327630

The manuscript "A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation" by Bartlett et al. developed a novel bioinformatic tool named Developmental Gene Expression Analysis (devGEA), which can be be ... Continue reading

The manuscript "A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation" by Bartlett et al. developed a novel bioinformatic tool named Developmental Gene Expression Analysis (devGEA), which can be be implemented locally or via web browsers to analyze gene expression for developmental biology study, such as developmental time series profiling, identification of differentially expressed genes and markers.
Overall this study is promising for those scientists who is focusing on stem cell biology or developmental biology but with limited expertise in bioinformatics. However, the manuscript has some pitsfalls in some parts that the author need to be clarified or revised.

1. Datasets used for this study is limited to stem-cell-based differentiation. If the authors state that this tool can be used for analyzing gene expression of the developmental biology, the dataset from in-vivo studies should be included. Otherwise, this will downgrade the significance and importance of tool.

2. Can this tool be used for single-cell RNA-seq? Or just for the bulk RNA-seq? One of major challenges of studying cell differentiation is the heterogeneity of the cell population. For instance, despite of using monolayer cardiomyocyte differentiation, the whole cell populations still include various cell types, such as ventricle-like, atrial-like, endothelial, epithelial cells, etc.
The authors need to state of challenge or potential pitfall of this tool

3. Choosing the markers are kind of arbitrary. For example, on page 16, I don't think it is a good idea to use GATA4 as the major marker for early cardiac stage. Actually, GATA4, TBX5 and TNTT2 can be detected from the early beating cardiomyocytes to late stages. This issue is also seen in using ZEB2 as a marker. Based on Liu's study (2017), those genes showed highly weight in the network analysis, but they are not used as typical makers for early stages. In addition, using only one marker for filtration has very low power.
Usually, for mesoderm, TBXT, EMOS, TBX6, et al,
For early cardiac lineage/cardiac progenitors, MESP1, NKX2.5,
For cardiac markers: GATA4, TNNT2, TBX5, MYL7, et al
Reference:
https://www.rndsystems.com/research-area/mesodermal-lineage-markers
https://www.rndsystems.com/research-area/cardiac-stem-cell-markers

4. If this tool can provide GO enrichment analysis or KEGG analysis, it will be a plus to increase the significance of this tool.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Stem cell biology, genomics, cardiovascular

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

CITE

Report a concern

Respond or Comment

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 26 Sep 2024

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2
Version 1 26 Sep 24	read	read

Qing Liu, Clemson University, Clemson, USA
Stephanie Agioti, European University Cyprus, Nicosia, Cyprus

Apostolos Zaravinos, European University Cyprus, Nicosia, Cyprus

Comments on this article

All Comments(0)

Add a comment

Sign up for content alerts

Browse by related subjects

Back to all reports

Reviewer Report

7 Views

28 Dec 2024 | for Version 1

Stephanie Agioti, Department of Life Sciences, School of Sciences, European University Cyprus, Nicosia, Cyprus

Apostolos Zaravinos, Department of Life Sciences, School of Sciences, Basic and Translational Cancer Research Center (BTCRC), Cancer Genetics, Genomics and Systems Biology Group, European University Cyprus, Nicosia, Cyprus

7 Views Cite this report Responses(0)

Approved With Reservations

High-throughput gene expression analysis is very important in the investigation of developmental time series of human hi- PSC cardiomyocyte differentiation and can be performed using various bioinformatic tools. Despite this, there are still several difficulties and gaps in the procedure of analysis of combined sample groups. The Developmental Gene Expression Analysis (devGEA) tool was developed by the team to provide an effective method for classifying cardiac signatures during hi-PSC cardiomyocyte differentiation. Enhancing gene expression and gene signature analysis is an important goal in cardiomyocyte differentiation, where the identification of key developmental markers is stimulated to disrupt cardiac regulatory gene function. The manuscript is highly informative and presents a promising tool for bioinformaticians studying developmental biology. It is well-designed, cites relevant literature, and presents accurate results with significant academic merit. However, there are still some shortcomings that need to be addressed, and the authors must revise the manuscript accordingly:

1. Figures 1 and 2 would be more appropriately presented in the results section, as they depict the research findings. I can understand that the reason researchers add figure 1 in background is because it has results from the previous studies by Liu et. al., 2017 and Strober et al., 2019, but for me you cannot present your results’ as a background. A good idea would be to leave only the diagrams of differentiation stage and developmental markers with the results from previous studies at the background section and the rest of the results move in the results’ section. Additionally, Figures 3, 4, 5, and 6 also present results from the study and should be moved to the results section. The datasets and experimental setup sections are part of the methodology and should be relocated accordingly. Results section must be re-organized with the figures 1-6 in order to present accurately only the research’s’ results and discussion part must be a different section. Furthermore, the analysis overview should be moved to the methodology section, clearly outlining the three methods used: linear model ANOVA, basic descriptive statistical analysis, and correlation analysis. Finally, overall results – all methods of classification must be used as a discussion of the study and last paragraph must be categorized as the conclusions of the study.

2. On page 5 (datasets section), the authors refer to the original procedure without explaining it in sufficient detail, leaving gaps in the readers' understanding. Also, authors don’t mention the multiple hPSC lines which they referred to (line 8). Moreover, the manuscript does not clearly explain why day 8 was selected as the 4th time point, while other studies used day 15 and day 30 as corresponding time points.

On page 21, paragraph 2, authors refer to twenty genes without mentioned them name by name, while figure 4 must be referred to paragraph 3 since it describes its results. In page 23, paragraph 2 (the devGA…), authors did not refer which are the three different classification methods were used for basic statistics or correlation. Finally, they are not clear in the manuscript the criteria used in the developmental markers for each stage as well as how the markers that are between two different development stages can be used.

3. The environment plays a critical role in influencing developmental processes, including the differentiation of human pluripotent stem cells (hPSCs) into cardiomyocytes. Given that gene expression and signaling are key environmental factors that influence the expression of other factors and molecules, it would be valuable for the authors to experimentally study how the overexpression and/or inhibition of these genes in the environment affects cardiomyocyte differentiation, and at which stages or levels this impact occurs.

4. A future perspectives section should be added to discuss the potential role of devGEA in identifying differentially expressed genes in cardiomyocyte differentiation in humans as a predictive tool. Additionally, its applicability to other developmental processes should be explored to highlight the broader value of the tool.
Team presents very clear the methodology used in their study, allowing the replication of analyses by others with very easy access to the source data used for the results to ensure full reproducibility. In addition, they provide a very clear guidance about devGA and how they designed it in order to give to the readers clear instructions how to use it. In addition, interpretation of the statistical analysis of the results was very appropriate and it is clear how statistically significant results were ‘’extracted’’ from the study.

Team presents very clear the methodology used in their study, allowing the replication of analyses by others with very easy access to the source data used for the results to ensure full reproducibility. In addition, they provide a very clear guidance about devGA and how they designed it in order to give to the readers clear instructions how to use it. In addition, interpretation of the statistical analysis of the results was very appropriate and it is clear how statistically significant results were ‘’extracted’’ from the study.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Cancer Genetics, Computational Biology

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Respond to this report

Responses (0)

Back to all reports

Reviewer Report

12 Views

12 Oct 2024 | for Version 1

Qing Liu, Clemson University, Clemson, USA

12 Views Cite this report Responses(0)

Approved With Reservations

The manuscript "A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation" by Bartlett et al. developed a novel bioinformatic tool named Developmental Gene Expression Analysis (devGEA), which can be be implemented locally or via web browsers to analyze gene expression for developmental biology study, such as developmental time series profiling, identification of differentially expressed genes and markers.
Overall this study is promising for those scientists who is focusing on stem cell biology or developmental biology but with limited expertise in bioinformatics. However, the manuscript has some pitsfalls in some parts that the author need to be clarified or revised.

1. Datasets used for this study is limited to stem-cell-based differentiation. If the authors state that this tool can be used for analyzing gene expression of the developmental biology, the dataset from in-vivo studies should be included. Otherwise, this will downgrade the significance and importance of tool.

2. Can this tool be used for single-cell RNA-seq? Or just for the bulk RNA-seq? One of major challenges of studying cell differentiation is the heterogeneity of the cell population. For instance, despite of using monolayer cardiomyocyte differentiation, the whole cell populations still include various cell types, such as ventricle-like, atrial-like, endothelial, epithelial cells, etc.
The authors need to state of challenge or potential pitfall of this tool

3. Choosing the markers are kind of arbitrary. For example, on page 16, I don't think it is a good idea to use GATA4 as the major marker for early cardiac stage. Actually, GATA4, TBX5 and TNTT2 can be detected from the early beating cardiomyocytes to late stages. This issue is also seen in using ZEB2 as a marker. Based on Liu's study (2017), those genes showed highly weight in the network analysis, but they are not used as typical makers for early stages. In addition, using only one marker for filtration has very low power.
Usually, for mesoderm, TBXT, EMOS, TBX6, et al,
For early cardiac lineage/cardiac progenitors, MESP1, NKX2.5,
For cardiac markers: GATA4, TNNT2, TBX5, MYL7, et al
Reference:
https://www.rndsystems.com/research-area/mesodermal-lineage-markers
https://www.rndsystems.com/research-area/cardiac-stem-cell-markers

4. If this tool can provide GO enrichment analysis or KEGG analysis, it will be a plus to increase the significance of this tool.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Stem cell biology, genomics, cardiovascular

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Respond to this report

Responses (0)

[1] 1. Smith CM, Hayamizu TF, Finger JH, et al.: The mouse Gene Expression Database (GXD): 2019 update. Nucleic Acids Res. 2019; 47(D1): D774–D779. Epub 2018/10/20. PubMed Abstract | Publisher Full Text | Free Full Text

[2] 2. Visel A, Thaller C, Eichele G: GenePaint.org: an atlas of gene expression patterns in the mouse embryo. Nucleic Acids Res. 2004; 32(Database issue): D552–6. Epub 2003/12/19. PubMed Abstract | Publisher Full Text | Free Full Text

[3] 3. Consortium GT: Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348(6235): 648–60. Epub 2015/05/09. PubMed Abstract | Publisher Full Text | Free Full Text

[4] 4. Bult CJ, Blake JA, Smith CL, et al.: Mouse Genome Database G. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 2019; 47(D1): D801–6. Epub 2018/11/09. PubMed Abstract | Publisher Full Text | Free Full Text

[5] 5. Barrett T, Wilhite SE, Ledoux P, et al.: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013; 41(Database issue): D991–5. Epub 2012/11/30. PubMed Abstract | Publisher Full Text | Free Full Text

[6] 6. Takahashi K, Yamanaka S: Induced pluripotent stem cells in medicine and biology. Development. 2013; 140(12): 2457–2461. PubMed Abstract | Publisher Full Text

[7] 7. Shi Y, Inoue H, Wu JC, et al.: Induced pluripotent stem cell technology: a decade of progress. Nat. Rev. Drug Discov. 2017; 16(2): 115–30. Epub 2016/12/16. PubMed Abstract | Publisher Full Text | Free Full Text

[8] 8. Mummery CL, Zhang J, Ng ES, et al.: Differentiation of human embryonic stem cells and induced pluripotent stem cells to cardiomyocytes: a methods overview. Circ. Res. 2012; 111(3): 344–58. Epub 2012/07/24. PubMed Abstract | Publisher Full Text | Free Full Text

[9] 9. Doyle MJ, Lohr JL, Chapman CS, et al.: Human Induced Pluripotent Stem Cell-Derived Cardiomyocytes as a Model for Heart Development and Congenital Heart Disease. Stem Cell Rev. Rep. 2015; 11(5): 710–27. PubMed Abstract | Publisher Full Text | Free Full Text

[10] 10. Zhu Z, Huangfu D: Human pluripotent stem cells: an emerging model in developmental biology. Development. 2013; 140(4): 705–17. Epub 2013/01/31. PubMed Abstract | Publisher Full Text | Free Full Text

[11] 11. Burridge PW, Keller G, Gold JD, et al.: Production of de novo cardiomyocytes: human pluripotent stem cell differentiation and direct reprogramming. Cell Stem Cell. 2012; 10(1): 16–28. PubMed Abstract | Publisher Full Text | Free Full Text

[12] 12. Burridge PW, Thompson S, Millrod MA, et al.: A universal system for highly efficient cardiac differentiation of human induced pluripotent stem cells that eliminates interline variability. PLoS One. 2011; 6(4): e18293. Epub 2011/04/16. PubMed Abstract | Publisher Full Text | Free Full Text

[13] 13. Lian X, Hsiao C, Wilson G, et al.: Robust cardiomyocyte differentiation from human pluripotent stem cells via temporal modulation of canonical Wnt signaling. Proc. Natl. Acad. Sci. USA. 2012; 109(27): E1848–57. Epub 2012/05/31. PubMed Abstract | Publisher Full Text | Free Full Text

[14] 14. Kattman SJ, Witty AD, Gagliardi M, et al.: Stage-specific optimization of activin/nodal and BMP signaling promotes cardiac differentiation of mouse and human pluripotent stem cell lines. Cell Stem Cell. 2011; 8(2): 228–40. PubMed Abstract | Publisher Full Text

[15] 15. Burridge PW, Matsa E, Shukla P, et al.: Chemically defined generation of human cardiomyocytes. Nat. Methods. 2014; 11(8): 855–60. Epub 2014/06/15. PubMed Abstract | Publisher Full Text | Free Full Text

[16] 16. Lian X, Zhang J, Azarin SM, et al.: Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/beta-catenin signaling under fully defined conditions. Nat. Protoc. 2013; 8(1): 162–75. Epub 2012/12/20. PubMed Abstract | Publisher Full Text | Free Full Text

[17] 17. Liu Q, Jiang C, Xu J, et al.: Genome-Wide Temporal Profiling of Transcriptome and Open Chromatin of Early Cardiomyocyte Differentiation Derived From hiPSCs and hESCs. Circ. Res. 2017; 121(4): 376–91. Epub 2017/07/01. PubMed Abstract | Publisher Full Text | Free Full Text

[18] 18. Strober BJ, Elorbany R, Rhodes K, et al.: Dynamic genetic regulation of gene expression during cellular differentiation. Science. 2019; 364(6447): 1287–90. Epub 2019/06/30. PubMed Abstract | Publisher Full Text | Free Full Text

[19] 19. Koressaar T, Remm M: Enhancements and modifications of primer design program Primer3. Bioinformatics. 2007; 23(10): 1289–91. Epub 2007/03/22. PubMed Abstract | Publisher Full Text

[20] 20. Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001; 25(4): 402–8. PubMed Abstract | Publisher Full Text

[21] 21. Bustin SA, Benes V, Garson JA, et al.: The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin. Chem. 2009; 55(4): 611–22. Epub 2009/02/26. PubMed Abstract | Publisher Full Text

[22] 22. Takahashi K, Yamanaka S: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006; 126(4): 663–76. Epub 2006/08/15. PubMed Abstract | Publisher Full Text

[23] 23. Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced pluripotent stem cells. Nature. 2007; 448(7151): 313–7. Epub 2007/06/08. PubMed Abstract | Publisher Full Text

[24] 24. Yu J, Vodyanik MA, Smuga-Otto K, et al.: Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007; 318(5858): 1917–20. Epub 2007/11/20. PubMed Abstract | Publisher Full Text

[25] 25. Kouskoff V, Lacaud G, Schwantz S, et al.: Sequential development of hematopoietic and cardiac mesoderm during embryonic stem cell differentiation. Proc. Natl. Acad. Sci. USA. 2005; 102(37): 13170–5. Epub 2005/09/02. PubMed Abstract | Publisher Full Text | Free Full Text

[26] 26. David R, Jarsch VB, Schwarz F, et al.: Induction of MesP1 by Brachyury(T) generates the common multipotent cardiovascular stem cell. Cardiovasc. Res. 2011; 92(1): 115–22. Epub 2011/06/01. PubMed Abstract | Publisher Full Text

[27] 27. Kuo CT, Morrisey EE, Anandappa R, et al.: GATA4 transcription factor is required for ventral morphogenesis and heart tube formation. Genes Dev. 1997; 11(8): 1048–60. PubMed Abstract | Publisher Full Text

[28] 28. van Berlo JH , Elrod JW, van den Hoogenhof MM , et al.: The transcription factor GATA-6 regulates pathological cardiac hypertrophy. Circ. Res. 2010; 107(8): 1032–40. Epub 2010/08/12. PubMed Abstract | Publisher Full Text | Free Full Text

[29] 29. Molkentin JD, Lin Q, Duncan SA, et al.: Requirement of the transcription factor GATA4 for heart tube formation and ventral morphogenesis. Genes Dev. 1997; 11(8): 1061–72. PubMed Abstract | Publisher Full Text

[30] 30. Lints TJ, Parsons LM, Hartley L, et al.: Nkx-2.5: a novel murine homeobox gene expressed in early heart progenitor cells and their myogenic descendants. Development. 1993; 119(2): 419–31. PubMed Abstract | Publisher Full Text

[31] 31. Edmondson DG, Lyons GE, Martin JF, et al.: Mef2 gene expression marks the cardiac and skeletal muscle lineages during mouse embryogenesis. Development. 1994; 120(5): 1251–63. PubMed Abstract | Publisher Full Text

[32] 32. Bu L, Jiang X, Martin-Puig S, et al.: Human ISL1 heart progenitors generate diverse multipotent cardiovascular cell lineages. Nature. 2009; 460(7251): 113–7. PubMed Abstract | Publisher Full Text

[33] 33. Bruneau BG, Logan M, Davis N, et al.: Chamber-specific cardiac expression of Tbx5 and heart defects in Holt-Oram syndrome. Dev. Biol. 1999; 211(1): 100–8. PubMed Abstract | Publisher Full Text

[34] 34. He X, Liu Y, Sharma V, et al.: ASK1 associates with troponin T and induces troponin T phosphorylation and contractile dysfunction in cardiomyocytes. Am. J. Pathol. 2003; 163(1): 243–51. PubMed Abstract | Publisher Full Text | Free Full Text

[35] 35. Martin FJ, Amode MR, Aneja A, et al.: Ensembl 2023. Nucleic Acids Res. 2023; 51(D1): D933–41. PubMed Abstract | Publisher Full Text | Free Full Text

[36] 36. NV OA: ShinyProxy.2024.

[37] 37. Wagner GP, Kin K, Lynch VJ: Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci. 2012; 131(4): 281–5. Epub 2012/08/08. PubMed Abstract | Publisher Full Text

[38] 38. Abrams ZB, Johnson TS, Huang K, et al.: A protocol to evaluate RNA sequencing normalization methods. BMC Bioinformatics. 2019; 20(Suppl 24): 679. Epub 2019/12/20. PubMed Abstract | Publisher Full Text | Free Full Text

[39] 39. Coombes KR, Fritsche HA Jr, Clarke C, et al.: Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin. Chem. 2003; 49(10): 1615–23. PubMed Abstract | Publisher Full Text

[40] 40. Zhao S, Ye Z, Stanton R: Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA. 2020; 26(8): 903–9. Epub 2020/04/13. PubMed Abstract | Publisher Full Text | Free Full Text

[41] 41. Nakamura T, Fidler IJ, Coombes KR: Gene expression profile of metastatic human pancreatic cancer cells depends on the organ microenvironment. Cancer Res. 2007; 67(1): 139–48. PubMed Abstract | Publisher Full Text

[42] 42. Lyons I, Parsons LM, Hartley L, et al.: Myogenic and morphogenetic defects in the heart tubes of murine embryos lacking the homeo box gene Nkx2-5. Genes Dev. 1995; 9(13): 1654–66. Epub 1995/07/01. PubMed Abstract | Publisher Full Text

[43] 43. Savory JG, Bouchard N, Pierre V, et al.: Cdx2 regulation of posterior development through non-Hox targets. Development. 2009; 136(24): 4099–110. Epub 2009/11/11. PubMed Abstract | Publisher Full Text

[44] 44. Savory JG, Pilon N, Grainger S, et al.: Cdx1 and Cdx2 are functionally equivalent in vertebral patterning. Dev. Biol. 2009; 330(1): 114–22. Epub 2009/03/27. PubMed Abstract | Publisher Full Text

[45] 45. Chu M, Wang L, Wang H, et al.: A novel role of CDX1 in embryonic epicardial development. PLoS One. 2014; 9(7): e103271. Epub 2014/07/28. PubMed Abstract | Publisher Full Text | Free Full Text

[46] 46. Foley TE, Hess B, Savory JGA, et al.: Role of Cdx factors in early mesodermal fate decisions. Development. 2019; 146(7). Epub 2019/04/01. PubMed Abstract | Publisher Full Text

[47] 47. Chen YH, Ishii M, Sucov HM, et al.: Msx1 and Msx2 are required for endothelial-mesenchymal transformation of the atrioventricular cushions and patterning of the atrioventricular myocardium. BMC Dev. Biol. 2008; 8: 75. Epub 2008/07/30. PubMed Abstract | Publisher Full Text | Free Full Text

[48] 48. Fu J, Jiang M, Mirando AJ, et al.: Reciprocal regulation of Wnt and Gpr177/mouse Wntless is required for embryonic axis formation. Proc. Natl. Acad. Sci. USA. 2009; 106(44): 18598–603. Epub 2009/10/19. PubMed Abstract | Publisher Full Text | Free Full Text

[49] 49. Banziger C, Soldini D, Schutt C, et al.: Wntless, a conserved membrane protein dedicated to the secretion of Wnt proteins from signaling cells. Cell. 2006; 125(3): 509–22. PubMed Abstract | Publisher Full Text

[50] 50. Miyamoto M, Kannan S, Anderson MJ, et al.: Cardiac progenitors instruct second heart field fate through Wnts. Proc. Natl. Acad. Sci. USA. 2023; 120(4): e2217687120. Epub 2023/01/17. PubMed Abstract | Publisher Full Text | Free Full Text

[51] 51. Van de Putte T, Maruhashi M, Francis A, et al.: Mice lacking ZFHX1B, the gene that codes for Smad-interacting protein-1, reveal a role for multiple neural crest cell defects in the etiology of Hirschsprung disease-mental retardation syndrome. Am. J. Hum. Genet. 2003; 72(2): 465–70. Epub 2003/01/09. PubMed Abstract | Publisher Full Text | Free Full Text

[52] 52. Eid A, Torres-Padilla ME: Characterization of non-canonical Polycomb Repressive Complex 1 subunits during early mouse embryogenesis. Epigenetics. 2016; 11(6): 389–97. Epub 2016/04/15. PubMed Abstract | Publisher Full Text | Free Full Text

[53] 53. Chen SY, Liu FC: The Fgf9-Nolz1-Wnt2 axis regulates morphogenesis of the lung. Development. 2023; 150(16). Epub 2023/08/17. PubMed Abstract | Publisher Full Text

[54] 54. Kinoshita M, Era T, Jakt LM, et al.: The novel protein kinase Vlk is essential for stromal function of mesenchymal cells. Development. 2009; 136(12): 2069–79. PubMed Abstract | Publisher Full Text

[55] 55. Brutsch SM, Madzharova E, Pantasis S, et al.: Mesenchyme-derived vertebrate lonesome kinase controls lung organogenesis by altering the matrisome. Cell. Mol. Life Sci. 2023; 80(4): 89. Epub 2023/03/15. PubMed Abstract | Publisher Full Text | Free Full Text

[56] 56. Xu Q, Wang Y, Dabdoub A, et al.: Vascular development in the retina and inner ear: control by Norrin and Frizzled-4, a high-affinity ligand-receptor pair. Cell. 2004; 116(6): 883–895. PubMed Abstract | Publisher Full Text

[57] 57. Summerhurst K, Stark M, Sharpe J, et al.: 3D representation of Wnt and Frizzled gene expression patterns in the mouse embryo at embryonic day 11.5 (Ts19). Gene Expr. Patterns. 2008; 8(5): 331–48. Epub 2008/02/13. PubMed Abstract | Publisher Full Text | Free Full Text

[58] 58. Chen Q, Zhang H, Liu Y, et al.: Endothelial cells are progenitors of cardiac pericytes and vascular smooth muscle cells. Nat. Commun. 2016; 7: 12422. Epub 2016/08/12. PubMed Abstract | Free Full Text Publisher Full Text |

[59] 59. Yoon C, Song H, Yin T, et al.: FZD4 Marks Lateral Plate Mesoderm and Signals with NORRIN to Increase Cardiomyocyte Induction from Pluripotent Stem Cell-Derived Cardiac Progenitors. Stem Cell Reports. 2018; 10(1): 87–100. Epub 2017/12/14. PubMed Abstract | Publisher Full Text | Free Full Text

[60] 60. Liu C, Spinozzi S, Chen JY, et al.: Nexilin Is a New Component of Junctional Membrane Complexes Required for Cardiac T-Tubule Formation. Circulation. 2019; 140(1): 55–66. Epub 2019/04/15. PubMed Abstract | Publisher Full Text | Free Full Text

[61] 61. Wu T, Xu Y, Zhang L, et al.: Filamin C is Essential for mammalian myocardial integrity. PLoS Genet. 2023; 19(1): e1010630. Epub 2023/01/27. PubMed Abstract | Publisher Full Text | Free Full Text

[62] 62. Huang ZP, Kataoka M, Chen J, et al.: Cardiomyocyte-enriched protein CIP protects against pathophysiological stresses and regulates cardiac homeostasis. J. Clin. Invest. 2015; 125(11): 4122–34. Epub 2015/10/05. PubMed Abstract | Publisher Full Text | Free Full Text

[63] 63. Tsoupri E, Kostavasili I, Kloukina I, et al.: Myospryn deficiency leads to impaired cardiac structure and function and schizophrenia-associated symptoms. Cell Tissue Res. 2021; 385(3): 675–96. Epub 2021/05/26. PubMed Abstract | Publisher Full Text

[64] 64. Broadway-Stringer S, Jiang H, Wadmore K, et al.: Insights into the Role of a Cardiomyopathy-Causing Genetic Variant in ACTN2. Cells. 2023; 12(5). Epub 2023/02/24. PubMed Abstract | Free Full Text Publisher Full Text |

[65] 65. Shojima N, Hara K, Fujita H, et al.: Depletion of homeodomain-interacting protein kinase 3 impairs insulin secretion and glucose tolerance in mice. Diabetologia. 2012; 55(12): 3318–30. Epub 2012/09/16. PubMed Abstract | Publisher Full Text

[66] 66. Isono K, Nemoto K, Li Y, et al.: Overlapping roles for homeodomain-interacting protein kinases hipk1 and hipk2 in the mediation of cell growth in response to morphogenetic and genotoxic signals. Mol. Cell. Biol. 2006; 26(7): 2758–2771. PubMed Abstract | Publisher Full Text | Free Full Text

[67] 67. Lee E, May H, Kazmierczak K, et al.: The MYPT2-regulated striated muscle-specific myosin light chain phosphatase limits cardiac myosin phosphorylation in vivo. J. Biol. Chem. 2024; 300(2): 105652. Epub 2024/01/13. PubMed Abstract | Publisher Full Text | Free Full Text

[68] 68. Sun RH, Hu BC, Li Q: Stress-induced cardiomyopathy complicated by multiple organ failure following cephalosporin-induced anaphylaxis. Intern. Med. 2012; 51(8): 895–9. Epub 2012/04/17. PubMed Abstract | Publisher Full Text

[69] 69. Ang YS, Rivas RN, Ribeiro AJS, et al.: Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis. Cell. 2016; 167(7): 1734–1749.e22. Epub 2016/12/17. PubMed Abstract | Publisher Full Text | Free Full Text

[70] 70. Isaiah Bartlett JK, Martin G, Coombes K, et al.: devGEA 1.0.1 - source code and dataset.2024. Publisher Full Text

[71] 71. Team RC: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2024.

[72] 72. Bengtsson H: matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors).2024. Publisher Full Text

[73] 73. Coombes KR: ClassComparison: Classes and Methods for “Class Comparison” Problems on Microarrays.2019. ClassComparison. Publisher Full Text

A high-throughput gene expression analysis software tool for developmental time series and gene signature analysis of human cardiomyocyte differentiation

Abstract

Keywords

Introduction

Background - cardiomyocyte differentiation

Figure 1. hiPSC directed cardiomyocyte differentiation procedures used in this study.

Figure 2. Directed cardiomyocyte differentiation of Gibco hiPSCs and RT-qPCR analysis of developmental marker expression.

Datasets

Experimental setup

hiPSC line

Directed cardiomyocyte differentiation

Total RNA isolation and cDNA synthesis

Reverse transcription quantitative PCR (RT-qPCR)

Table 1. Gene-specific primers for RT-qPCR analysis of GIBCO hiPSC cardiomyocyte differentiation.

Identification of key marker genes and expression patterns

Figure 3. Gene expression signatures of clustered genes.

Figure 4. RT-qPCR analysis of candidate genes.

Figure 5. Model fits and RT-qPCR of two sample genes, ZEB2 and ACTN2.

Figure 6. Gene expression profiles of mid-cardiomyocyte marker candidates obtained from the GSE85331 RNA-seq experiment.

Methods

Implementation

Operation

Results and discussion

Data import, normalization, filtering, scaling

Figure 7. Principal components analysis of combined GSE122380/GSE85331.

Analysis overview

Linear model ANOVA analysis

Selection of a priority cardiomyocyte differentiation gene list

Table 2. Analysis summary for ZEB2 and ACTN2.

Basic/descriptive statistical analysis

Experimental confirmation

Table 3. Summary of analysis for 32 genes from ISU gene set.

Correlation analysis

Overall results - all methods of classification

Conclusion

Ethics and consent

Data availability

Extended data

Software availability

Acknowledgements

References

Comments on this article Comments (0)

Open Peer Review

Comments on this article Comments (0)

Open Peer Review

Reviewer Status

Reviewer Reports

Comments on this article

Browse by related subjects

Competing Interests Policy

Stay Updated