An R-based reproducible and user-friendly preprocessing pipeline for CyTOF data

Mass cytometry (CyTOF) has become a method of choice for in-depth characterization of tissue heterogeneity in health and disease, and is currently implemented in multiple clinical trials, where higher quality standards must be met. Currently, preprocessing of raw files is commonly performed in independent standalone tools, which makes it difficult to reproduce. Here, we present an R pipeline based on an updated version of CATALYST that covers all preprocessing steps required for downstream mass cytometry analysis in a fully reproducible way. This new version of CATALYST is based on Bioconductor's SingleCellExperiment class and is fully unit tested. The R-based pipeline includes file concatenation, bead-based normalization, single-cell deconvolution, spillover compensation and live cell gating after debris and doublet removal. Importantly, this pipeline also includes different quality checks to assess machine sensitivity and staining performance, while also allowing for batch correction. The pipeline is based on open source R packages and can be easily adapted to different study designs. It therefore has the potential to significantly facilitate the work of CyTOF users while increasing the quality and reproducibility of data generated with this technology.


Introduction
Over the past decade, mass cytometry (CyTOF) has advanced our understanding of a wide range of cellular processes, particularly in the fields of immunology and tumor biology 1,2, by enabling the simultaneous measurement of 40+ parameters at the single-cell level. Currently, mass cytometry is transitioning from an exploratory research approach toward a diagnostic tool used in clinical laboratories, and this transition is associated with an increased need for standardization 3. Various studies have already suggested improvements to experimental workflows to increase the robustness of mass cytometry data, such as working with frozen antibody cocktails or including shared reference samples in each independent experiment to enable batch correction 4,5. Similarly, advanced downstream analyses benefit from the large number of analysis tools and algorithms implemented in R, which allow for fully reproducible analyses 6.
Between data generation and downstream data analysis, data preprocessing is a multi-step procedure required to convert raw FCS files into data objects that can be input to downstream statistical analysis and visualization 7. Upon data collection, the first step consists of concatenating files from sequential CyTOF acquisitions and removing events with unstable signal, which are usually caused by an uneven flow rate or the introduction of air into the fluidic system. As a second step, CyTOF data need to be corrected for time-dependent signal drift, which is mostly due to cone contamination, mass calibration drift or loss of detector sensitivity over time. This correction is performed by acquiring metal-tagged polystyrene beads together with the cell suspension, whose signals can be used as a reference to normalize the cell signals 8. Another potential artefact in CyTOF data is signal spillover between channels. Although lower than what is usually observed in fluorescence-based flow cytometry, spillover in mass cytometry can still account for up to 4% of the signal in some channels and needs to be corrected using signal compensation 9. Sample barcoding prior to staining is a common approach used in mass cytometry to combine multiple samples in a single experiment, minimizing experimental variation due to staining and CyTOF acquisition. In this case, individual cells have to be assigned to their respective sample via a process called single-cell debarcoding 10. In large studies where samples are collected over a long period of time by different users, on different machines or at different sites, an important step is to correct for batch effects, which can be achieved by including a shared control sample in each independent batch 11,12. Finally, only live, intact single cells are relevant for the downstream analysis. Beads, doublets, debris and dead cells are excluded by gating on scatter plots 7.
Each step of the preprocessing pipeline requires expert decisions to determine the parameters that achieve optimal signal correction and cell selection. Moreover, all chosen parameters should be recorded for reproducibility purposes. Despite these requirements, many current preprocessing pipelines still rely on switching between platforms that include, for example, MATLAB applications and (at least partially) closed source online platforms (e.g., Cytobank 13). This approach necessitates uploading the data to different platforms and carrying out certain steps in a purely manual fashion, which makes it time-consuming and difficult to reproduce. This is particularly limiting in a clinical setting, where reproducibility and large-scale data analysis are required. Thus, we propose a semi-automated R-based preprocessing pipeline for CyTOF data that: i) is fully reproducible; ii) includes quality checks; and iii) requires limited supervision once the original setup has been made. This pipeline is developed around an updated version of CATALYST, an R package designed for preprocessing and differential analysis of mass cytometry data 9,14. This new version of CATALYST is based on Bioconductor's SingleCellExperiment class, the standard for high-dimensional single-cell data analysis. The pipeline can easily be adapted to each CyTOF user's needs and will accelerate CyTOF data preprocessing while improving the quality of the mass cytometry data generated.

Data description
The data used in this pipeline were generated in the context of the Tumor Profiler project, a multi-center observational study investigating the relevance of different innovative technologies, including CyTOF, imaging mass cytometry, single-cell DNA and RNA sequencing, as well as ex vivo drug testing, for improving the diagnosis of advanced cancer patients 15.
The samples of interest included tumor biopsies and blood samples collected at the University Hospital Zurich in spring 2020. These samples were assessed by mass cytometry in the context of a set of references including commercially available cell lines, PBMCs from healthy donors and PHA-activated PBMCs. PBMCs from patients and healthy donors were collected based on a Ficoll gradient 16, and tumor samples were dissociated as previously described 17. Once in single-cell suspension, all samples were stained for 5 min on ice with Cell-ID™ Cisplatin-194Pt (#201194, Standard BioTools) to identify dead cells, and subsequently fixed with 1.6% PFA (#15710, Electron Microscopy Sciences). Samples were stored as dry pellets at −80°C until CyTOF measurement.
The dataset used in this study was obtained from a single CyTOF experiment, also called a batch, where nine references, two blood samples and two tumor samples were barcoded with a 20-well barcoding plate 17. Reference samples were selected to contain positive and negative populations for each marker included in the study's antibody panel. This design was chosen to enable quality control and batch correction across independent experiments based on quantile scaling, as described in 11. Pooled cells were stained with a 40-Ab panel designed to perform an in-depth characterization of the samples' immune compartment. DNA intercalation was performed with a 1 h incubation in Cell-ID™ Intercalator-Ir (#201192B, Fluidigm). Finally, the cell suspension was diluted 1:10 in Maxpar® Cell Acquisition Solution (#201240, Fluidigm) with 10% EQ Four Element Calibration Beads (#201078, Fluidigm), and acquired on a Helios™ upgraded CyTOF 2 system at a flow rate of 150 events per second.
Throughout this workflow, we will make use of a set of metadata for standard preprocessing steps (normalization, debarcoding and compensation), as well as various quality controls previously acquired over seven independent experiments. An overview of the metadata used is given in Table 1.

Table 1. Overview of metadata files used throughout this pipeline, including each file's description, dimensionality (if appropriate), and purpose for preprocessing or quality control.

normalization_beads.fcs
Beads identified using CATALYST during the normalization step of a previous CyTOF experiment.
Used as reference beads to correct for changes in signal sensitivity over time.

ref_bead_counts.csv
A table of mean dual counts for the 6 different bead channels (columns) obtained from 7 previous CyTOF experiments (rows).
Used as a reference to assess the measurement sensitivity.

spillover_matrix.csv
A spillover matrix calculated with CATALYST from beads singlestained with each of the 40 antibodies included in the panel used in this study. The matrix contains, for each measurement channel (rows), the percentage of signal received by all other channels (columns).
Used for correction of spillover.

ref_cell_counts.csv
A table of cell counts for each reference sample obtained from 7 previous CyTOF experiments.
Used to assess the cell yield of the current experiment.

ref_marker_levels.csv
A table of 98th expression percentiles for each marker (columns) over the pooled references, obtained from 7 previous CyTOF experiments (rows).
Used to assess the staining efficiency of the current experiment.

Data organization
Most data used and returned throughout this workflow are kept in an object of Bioconductor's SingleCellExperiment (SCE) class from the SingleCellExperiment package 18 . This data structure can store all single-cell related data (measurement data and transformations thereof; cell, feature and experiment-wide metadata; dimensionality reductions), allowing for synchronized and thus less error-prone data manipulation.
The key components of SCEs are matrix-like assays, where rows are features (targets) and columns are observations (cells), that store the measurement data and any data derived thereof. Metadata associated with cells are stored under colData, feature metadata under rowData, and any experiment-wide metadata may be stored in the metadata slot. Lastly, the SCE can store an arbitrary number of dimensionality reductions under reducedDims. For a more detailed description of the usage and structure of SCEs, we refer to the SingleCellExperiment package's documentation.
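As a brief orientation, the following sketch (not part of the original workflow; it assumes an SCE named sce with the assays constructed in the next section) shows the accessors used throughout:

library(SingleCellExperiment)
# available data matrices (e.g., counts, exprs)
assayNames(sce)
# features x cells measurement data
assay(sce, "counts")[1:3, 1:3]
# cell metadata (one row per cell)
colData(sce)
# feature metadata (one row per channel/target)
rowData(sce)
# experiment-wide metadata (a named list)
metadata(sce)
# available dimensionality reductions
reducedDimNames(sce)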

Results
The pipeline presented here describes all steps required to process raw mass cytometry data to a state where the user may proceed with downstream analyses (e.g., dimensionality reduction, differential analysis, trajectory inference). The process includes the concatenation of individual acquisitions, the exclusion of parts of the acquisition with unstable signal, the correction for time-dependent signal drift via bead normalization, the correction for signal spillover via compensation, the selection of cells of interest via automated gating, and the correction for batch effects. The workflow is exemplified on data from a single CyTOF experiment collected via three successive acquisitions (individual FCS files) of 15 barcoded samples mixed with calibration beads.
Throughout, raw measurement data (FCS files) as well as all metadata (for debarcoding, normalization, compensation, and quality control) are expected to be located inside a data/ subdirectory (relative to where the code is being run); otherwise, the presented file paths require modification.

Constructing a SingleCellExperiment
By default, flowCore's read.FCS() function, which underlies read.flowSet() for reading in a set of FCS files, transforms channel intensities and removes events with extreme values. To avoid this behavior, we recommend reading in files with arguments transformation = FALSE and truncate_max_range = FALSE; by default, CATALYST's prepData() function reads in files with these settings.
As described above, the SCE class allows keeping multiple data transformations in a single object. Thus, when applying a transformation to arrive at expression-like data, we can store the transformed data in a separate assay without overwriting the raw ion count data. In this way, any data generated and used throughout preprocessing (e.g., normalized, compensated or batch-corrected counts and their arcsinh-transformed counterparts) can in principle be retained, and written to intermediate FCS files for backup or quality control outside of R. However, it is worth noting that this procedure could lead to a shortage of memory for large datasets, in which case overwriting the data at each step is advisable; if not specified otherwise, CATALYST overwrites by default.
A SCE can be constructed using CATALYST's prepData() function, which accepts a path to a directory with one or many FCS files, a character vector of FCS filenames, a single flowFrame or a list of flowFrames, or a flowSet (flowCore package 23). By default (transform = TRUE), an arcsinh-transformation with cofactor = 5 is applied to the input (count) data, and the resulting expression matrix is stored in the exprs assay slot of the output SCE:

# construct 'SingleCellExperiment'
fcs <- list.files("data", "acquisition", full.names = TRUE)
(sce <- prepData(fcs, transform = TRUE, cofactor = 5))

Initially, our SCE has two assays containing dual ion counts (assay counts) and cofactor-5 arcsinh-transformed counts (assay exprs). The cofactor used for transformation is stored inside the object's internal metadata (int_metadata(sce)$cofactor), and the FCS file of origin for each cell under cell metadata column sample_id (accessible via colData(sce)$sample_id or, equivalently, sce$sample_id). In our dataset, FCS files correspond to acquisitions rather than biological samples. Thus, we rename the cell metadata variable sample_id to file_id to avoid ambiguity:

i <- match("sample_id", names(colData(sce)))
names(colData(sce))[i] <- "file_id"

In both mass and flow cytometry, each feature has both a channel and a target associated with it. As can be seen from printing the sce variable above, prepData() defaults to using targets as rownames (when available). We can retrieve each feature's measurement channel using the channels() accessor, and use channel metals and masses to extract the indices of features that are relevant to different preprocessing steps. Namely, we assign channels measuring DNA to the variable dna (here, Ir191 and Ir193), and channels for live gating (here, Ir191 for DNA and Pt194 for cisplatin) to live:

# store character vector of channel names
chs <- channels(sce)
# store DNA & live channel indices
dna <- grep("^Ir", chs)
live <- grep("191|194", chs)

Filtering for stable signal
High quality data generation requires a stable signal throughout the acquisition. Various issues can lead to signal changes over time, including an unstable flow rate, the introduction of air, or metal contamination in the system. These changes in signal intensity can vary in duration and strength, and can affect all or only a subset of channels simultaneously. In order to detect regions of the acquisition affected by signal instability, we display the signal for selected channels as a function of time in a scatter plot (Figure 1).
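As an illustration of this inspection step, a minimal sketch (not verbatim from the workflow; the file name and the Ir191Di channel name are assumptions) that plots DNA signal against acquisition time for one raw file:

library(flowCore)
# read one acquisition without transformation or range truncation
ff <- read.FCS("data/acquisition_1.fcs",
    transformation = FALSE, truncate_max_range = FALSE)
x <- exprs(ff)
# arcsinh-transformed DNA intensity vs. acquisition time
smoothScatter(x[, "Time"], asinh(x[, "Ir191Di"] / 5),
    xlab = "time", ylab = "DNA (Ir191, arcsinh)")

Regions with sudden drops or spikes in signal can then be excluded by filtering on the Time channel.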

Normalization
In the case of mass cytometry, signal drift during acquisition due to a progressive loss of sensitivity must be accounted for and normalized. A widely established strategy is to mix samples with polystyrene beads embedded with metal lanthanides, allowing monitoring of instrument performance throughout data acquisition 8. These beads are in turn used to estimate and correct for the signal's drift over time. When independent experiments have to be analyzed in the same context, variation due to changes in instrument performance over time, combined with intervals between scheduled maintenance, has to be taken into account as well. In this case, the bead signal should be normalized to a set of reference beads from an earlier experiment. This ensures that different experiments are normalized to the same level, independent of the CyTOF's sensitivity.
A MATLAB tool to perform normalization outside of R was available until recently at nolanlab/beadnormalization; current R implementations are available through CATALYST and premessa. CATALYST provides an extension of bead-based normalization as described by Finck et al. 8, with automated identification of bead singlets (used for normalization), as well as of bead-bead and cell-bead doublets (to be removed), thus eliminating the need for manual gating. This is implemented as follows:
1. beads are initially identified as those events that have their highest signals in the bead channels;
2. cell-bead doublets are removed by applying a separation cutoff on the distance between the lowest bead and the highest non-bead channel signal;
3. events passing all vertical gates defined by the lower bounds of bead signals are removed (these include bead-bead and bead-cell doublets);
4. bead-bead doublets are removed by applying a default median ± 5 MAD rule to events identified in step 2; the remaining bead events are used for normalization.
The above procedure is carried out by a single function, normCytof(), which takes as input a SCE and a set of arguments that control the normalization parameters and output format. Here, we specify dna = 191 (Ir191) and beads = "dvs", corresponding to DVS Science beads (lanthanides Ce140, Eu151, Eu153, Ho165, Lu175). Secondly, we provide the path to a set of reference beads (argument norm_to) that are used to compute baseline intensities for normalization. Lastly, we set overwrite = FALSE to retain both raw and normalized data, and remove_beads = TRUE to exclude bead and doublet events:

# specify path to reference beads
ref_beads <- file.path("data", "normalization_beads.fcs")
# apply bead-based normalization
system.time(res <- normCytof(sce, beads = "dvs", dna = 191,
    norm_to = ref_beads, remove_beads = TRUE, overwrite = FALSE))
##    user  system elapsed
##  20.134   1.343  21.963

When remove_beads = TRUE (the default), normCytof() will return a list of three SCEs containing filtered, bead and removed events, respectively, as well as two ggplot objects:

names(res)
## [1] "data" "beads" "removed" "scatter" "lines"

The first SCE (res$data) contains the filtered data with the additional assay slot "normed" housing normalized expressions. The remaining two SCEs are data subsets that contain any events identified as beads (slot beads) and all removed events (including beads, bead-bead and bead-cell doublets; slot removed), respectively; thus, the beads themselves are a subset of the removed events. Here, we compare the number and percentage of cells contained in each subset:

# view no. of remaining, bead & removed events
ns <- sapply(res[1:3], ncol)
ps <- sprintf("%1.2f", ns / ncol(sce) * 100)
data.frame(t(cbind("# events" = ns, "% of total" = ps)))
##              data beads removed
## # events   337525 27544   30627
## % of total  91.68  7.48    8.32

As a first quality control plot, res$scatter (Figure 2) renders scatter plots of bead channels (x-axis) versus DNA (y-axis), where events identified as beads, as well as their expression range, are highlighted in color; bead events should have low DNA intensity (since they are not cells) and high intensities across all bead channels. Secondly, res$lines (Figure 3) displays smoothed median bead intensities before and after normalization; these typically decrease with time prior to normalization, and should be approximately constant and centered around the baseline after normalization.
In our dataset, normalization is performed based on previously acquired reference beads. Thus, baseline values correspond to the reference beads' mean bead channel intensities. As shown in Figure 3, the bead channel levels are considerably lower after normalization, indicating higher sensitivity in the current experiment. Importantly, the slight decrease in signal over time is no longer present after normalization. In order to assess the sensitivity of the CyTOF during acquisition and to identify potential issues that would have remained undetected during the tuning of the instrument, we compute the mean bead channel counts across events identified as beads (res$beads subset). A logical vector indicating which channels correspond to beads is stored under rowData column bead_ch, which we can use to subset the counts assay to include bead channels only.
# compute mean bead channel counts for current experiment
is_bead <- rowData(res$beads)$bead_ch  # get bead channels
bead_cs <- counts(res$beads)[is_bead, ]  # bead channel counts of bead events
bead_ms <- rowMeans(bead_cs)  # mean dual count per bead channel

To assess the measurement sensitivity during the current experiment, we compare the mean bead channel counts computed above to those obtained from 7 previously acquired experiments, available in metadata table ref_bead_counts.csv. The resulting boxplot (Figure 4) shows that the current experiment's sensitivity is relatively high, but well within the range of previous experiments.
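One possible way to render such a comparison (a sketch; it assumes the reference table's columns are ordered to match the bead channels, and uses bead_ms from above):

# compare current mean bead counts to 7 reference experiments
ref <- read.csv(file.path("data", "ref_bead_counts.csv"))
boxplot(ref, xlab = "bead channel", ylab = "mean dual count")
# overlay the current experiment's means in red
points(seq_len(ncol(ref)), bead_ms, col = "red", pch = 19)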

Debarcoding
In mass cytometry, samples are often labeled with unique sample-specific barcodes and pooled together for processing and measurement, an approach termed multiplexing 26. The most widely used barcoding scheme is based on Zunder et al. 10. The single cell debarcoding (SCD) algorithm first sorts each cell's barcode intensities to assign preliminary barcode IDs, such that a cell is assigned to the barcode population for which its barcode intensities are highest. Next, intensities within each barcode population are scaled using the 95th expression quantiles, and thereby brought to a comparable scale. Finally, events whose separation between the highest negative and lowest positive barcode intensity is below a threshold value (separation cutoff) are left unassigned.
In the initial SCD algorithm, sample yields are determined by a single global cutoff on the separation between positive and negative barcode populations. Naturally, this procedure is suboptimal when yields as a function of the applied cutoff do not decline simultaneously. To optimize cell yields in such cases, CATALYST provides an option to automatically estimate or specify sample-specific separation cutoffs.
The SCD algorithm is implemented in CATALYST as a three-step procedure: i) preliminary barcode assignment (assignPrelim()); ii) automated estimation of sample-specific separation cutoffs (estCutoffs()); and, iii) application of cutoffs to arrive at final barcode assignments (applyCutoffs()).

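The chunk below sketches how the debarcoding scheme may be read in prior to assignment (the file name follows the Data availability section; the final check, whose output is shown underneath, verifies that every barcode channel mass matches a panel channel):

# read in debarcoding scheme
bc_key <- read.csv(file.path("data", "debarcoding_scheme.csv"),
    row.names = 1, check.names = FALSE)
# check that all barcode masses are found among the panel's channels
all(vapply(colnames(bc_key), function(m) any(grepl(m, chs)), logical(1)))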
## [1] TRUE
During this first debarcoding step, each event is preliminarily assigned to a barcode according to its top-k expressed barcode channels. Here, events whose expression is highest for a combination of barcode channels that does not appear in the debarcoding scheme (bc_key) will be given barcode ID 0 (for "unassigned"). Thus, we can remove empty barcodes from the bc_key variable such that events assigned to these barcodes are left unassigned from the start. Alternatively, one could perform debarcoding using the non-subsetted key, and filter out empty barcodes downstream.
# remove empty barcodes from debarcoding scheme
is_empty <- grepl("empty", rownames(bc_key))
bc_key <- bc_key[!is_empty, ]
bc_ids <- rownames(bc_key)

For preliminary barcode assignment, we use CATALYST's assignPrelim() function, providing the input data (sce) and debarcoding scheme (bc_key). If not specified otherwise, assignPrelim() will default to using the exprs assay slot (argument assay). Because we ran normCytof() with overwrite = FALSE, this assay contains arcsinh-transformed raw counts; we set assay = "normexprs" in order to use the normalized values instead:

# do preliminary barcode assignments
system.time(sce <- assignPrelim(sce, bc_key, assay = "normexprs"))
##    user  system elapsed
##  14.290   0.347  14.770

In the returned SCE, feature metadata (rowData) column is_bc indicates whether or not a channel corresponds to a barcode channel. For each event, barcode identifiers are stored in colData column bc_id. After this preliminary round of assignment, 57980/337525 events (17.18%) have been left unassigned:

# tabulate number of (un)assigned events
table(sce$bc_id == 0)

Furthermore, for each cell, the barcode channel expressions are scaled relative to the 95th expression percentiles of its respective barcode population. The scaled data are stored in assay slot scaled. Based on these scaled barcode channel intensities, a separation value is computed as the distance between the highest negative and lowest positive barcode channel; separations are stored in colData column delta.

Estimation of separation cutoffs
To decide on separation cutoffs, we consider yields upon debarcoding as a function of the applied cutoff ( Figure 6). Commonly, this function will be characterized by an initial weak decline, where doublets are excluded, and subsequent rapid decline in yields to zero. In-between, low numbers of counts with intermediate barcode separation give rise to a plateau. Ideally, the applied separation cutoffs should provide a balance between high cell yield and low assignment uncertainty, marking the approximate midpoint of the yield function's plateau region.
plotYields(sce, which = "0") Instead of a single global cutoff, we estimate a sample-specific cutoff to account for barcode population yields that decline in an asynchronous fashion. To this end, we fit both a linear and a three-parameter log-logistic model to each yield function. For the linear fit, we estimate the cutoff as the value for which yields have declined to 50%. For the log-logistic fit, we compute the cutoff as the value for which there is minimal yield decline by minimizing each yield function's 1st derivative. For each barcode, the final cutoff estimate is computed as the mean of both estimates, weighted with the goodness (residual sum of squares) of each fit (see Methods for details). Thus, the choice of thresholds for the distance between negative and positive barcode populations is: i) automated and ii) independent for each barcode. Nevertheless, reviewing barcode-specific yield plots and, in rare cases, refining the estimated separation cutoffs is advisable (see Figure 7). Cutoff estimation is performed by CATALYST's estCutoffs() function, which takes as input a SCE as returned by assignPrelim(); that is, preliminary barcode assignments are required to estimate separation cutoffs. estCutoffs() will store sample-specific cutoff estimates under metadata slot sep_cutoffs, but will leave barcode assignments unchanged.  We can visually inspect the estimated cutoffs using plotYields() with argument which specifying the barcode ID of interest ( Figure 7). In our example, the cutoff estimate nicely marks the midpoint of the yield function's plateau or, equivalently, the valley between peaks of cell yields. To decrease the stringency of the applied cutoff, and thus increase the resulting cell yield, we could set the sample's cutoff to e.g. 0.1. Vice versa, a more stringent cutoff of e.g. 0.2 would decrease the cell yield but yield a purer population.
As an alternative to inspecting the cutoff estimate for each sample in R, we could specify which = bc_ids to obtain a list of yield plots for all barcodes; the generated set of plots may be written to a single PDF file by providing plotYields() with an out_path, allowing for easy review of the separation cutoffs currently stored within the object.
plotYields(sce, which = "PBMC_R1") Besides a cutoff on the separation between positive and negative barcode populations, to trim outliers, the SCD algorithms applies an additional cutoff on the Mahalanobis distance (argument mhl_cutoff), a metric that quantifies the distance of a given event to the expression distribution of the barcode population it has been assigned to.
In Figure 6, we can observe that population yields decline synchronously with increasing separation cutoffs, and that we might consider applying a global separation cutoff, e.g., at ∼ 0.15. For this data, yields are in fact similar, independent of whether we apply sample-specific cutoffs or a single global one. Nevertheless, applying sample-specific cutoffs is recommended in order to maximize cell yields while minimizing uncertainty in barcode assignments.
# store preliminary barcode IDs
bc_ids0 <- sce$bc_id
# apply global & sample-specific separation cutoff(s)
sce_glob <- applyCutoffs(sce, sep_cutoffs = 0.15, mhl_cutoff = 30)
sce_spec <- applyCutoffs(sce, mhl_cutoff = 30)
# compare cell yields for both cutoff strategies
c(global = mean(sce_glob$bc_id == 0),
    specific = mean(sce_spec$bc_id == 0))
##    global  specific
## 0.3573839 0.3584979

After debarcoding, we compare the number of events assigned to each barcode population before and after applying separation cutoffs, and filter out events that have been left unassigned (barcode ID 0). As shown in Figure 8, after applying the separation cutoffs, the number of unassigned cells (0) increases, while the number of cells assigned to each barcoding well decreases. We also observe a larger decrease in assigned cells for tumor samples, which underwent a dissociation process and contain more debris. Conversely, highly viable cell lines and PBMCs have a higher recovery yield.

Compensation
Mass cytometry utilizes heavy metals (usually from the lanthanide series) as reporters to label antibodies. As a result, channel crosstalk originating from spectral overlap and autofluorescence is significantly less pronounced in mass cytometry compared to flow cytometry. Yet, spillover due to abundance sensitivity, isotopic impurities, and oxide formation still exists, giving rise to artefactual signal that can impede data interpretability.
A combined experimental-computational pipeline to correct for spillover in mass cytometry data has been proposed by Chevrier et al. 9 and is implemented in the CATALYST package. In brief, compensation is achieved via a three-step approach (see Methods for details): i) identification of single-positive populations via deconvolution of a set of single-stained beads; ii) estimation of a spillover matrix from the identified populations; and iii) compensation via multiplication of the measurement data with the matrix's inverse, or via non-negative linear least squares (NNLS).
We will apply a pre-acquired spillover matrix (metadata file spillover_matrix.csv). Thus, we enter at step 3, which involves only compensating the input dataset using CATALYST's compCytof() function. By default, compCytof() will reuse the cofactor stored in int_metadata(sce)$cofactor for computing arcsinh-transformed data from the compensated counts, thus applying the same transformation as during data preparation and normalization:

# read in pre-computed spillover matrix
sm <- file.path("data", "spillover_matrix.csv")
sm <- read.csv(sm, row.names = 1)
# apply NNLS compensation
system.time(
    sce <- compCytof(sce, sm,
        method = "nnls",
        assay = "normcounts",
        overwrite = FALSE))
##    user  system elapsed
##  63.538   5.880  70.095

To visually inspect how compensation affects signal intensities, we can generate scatter plots pre- and post-compensation; an exemplary pair of channels is shown in Figure 9. In such a plot, we can observe a slight positive association between the signals of spill-affected channels, which should be removed upon compensation.
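One way to produce such a pair of plots is CATALYST's plotScatter(); a sketch for two hypothetical neighboring channels (Yb171/Yb172; substitute a spill-affected pair from your own panel):

# signal before vs. after compensation for an exemplary channel pair
chs2 <- c("Yb171Di", "Yb172Di")
plotScatter(sce, chs2, assay = "normexprs")  # pre-compensation
plotScatter(sce, chs2, assay = "compexprs")  # post-compensation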

Gating
Many events acquired in mass cytometry may in fact be debris, doublets or dead cells, and should be filtered out through a gating step. Here, we suggest a strategy that first applies an elliptical gate on cell events, defined as double positive for the DNA channels Ir191/Ir193. This allows the exclusion of debris and doublets. As a second step, we discard cells that are positive for the dead cell marker Pt194.

Gating on cells
In order to apply sample-specific gates, we first convert the SCE into a flowSet with a separate frame for each sample (argument split_by = "bc_id"). As gating should be performed on expression-like data (not ion counts), we further specify assay = "compexprs" to retain the compensated, arcsinh-transformed assay slot. Thirdly, since conversion from SCE to flowCore data structures requires matrix transposition (rows correspond to targets in the SCE, but to events in flowFrames/flowSets), we retain only those channels that are relevant for gating on (live) cells: the DNA and dead-cell channels, whose indices are stored in variables dna and live:

# retain only DNA & dead-cell channels
sub <- sce[unique(c(dna, live)), ]
# split SCE by sample
fs <- sce2fcs(sub, assay = "compexprs", split_by = "bc_id", keep_cd = TRUE)
# construct 'GatingSet'
gs <- GatingSet(fs)

We apply an elliptical gate (gating_method = "flowClust.2d") to exclude the two lowest density percentiles (quantile = 0.98). Because the input GatingSet contains a separate frame for each barcode, the gate will be computed separately for each sample. In the case of a single DNA channel (e.g., Rh103), one-dimensional gates (i.e., thresholds on minimum and maximum values) would be applicable instead.
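A sketch of this cell gate using openCyto's gs_add_gating_method() (the alias and DNA channel names are assumptions based on the parameters given above):

# elliptical gate on the two DNA channels, fit separately per sample
library(openCyto)
gs_add_gating_method(gs,
    alias = "cells", pop = "+", parent = "root",
    dims = "Ir191Di,Ir193Di",
    gating_method = "flowClust.2d",
    gating_args = "K=1,quantile=0.98")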

Gating on live cells
The wrapper function .live_gate() defines a polygonal gate comprised of a line and a bivariate standard normal density Z, such that cells pass gating when i) their expression is within the qth quantile of Z; and ii) their expression falls below a line parameterized by intercept i and slope s. In this way, the gate is centered around the expression peak, while excluding cells whose dead-channel intensity increases with DNA content.
# define live cell gate plug-in
# x = expression matrix, q = quantile, i = intercept, s = slope
.live_gate <- function(x, q = 0.99, i = 1, s = 0.5) {
    # specify gating function
    .gating_fun <- function(fr, pp_res, channels = NA, id = "", ...) {
        # subset channels of interest
        x <- exprs(fr[, channels])
        # scale data for comparison w/ 'qnorm()'
        x0 <- scale(x)
        # set boundary level as q-th quantile of standard normal
        z <- qnorm(q)
        # find p(x) for that level
        pd <- dmvnorm(c(z, z))[1]
        px <- dmvnorm(x0)
        # find points above boundary level
        keep1 <- px > pd
        # find points below line y = i + s * x
        keep2 <- x0[, 2] < i + s * x0[, 1]
        # construct polygonal gate from the convex hull of retained events
        pts <- x[keep1 & keep2, ]
        polygonGate(.gate = pts[chull(pts), ], filterId = id)
    }
    # register plug-in gating method for use with openCyto
    foo <- register_plugins(fun = .gating_fun,
        methodName = "liveGate", dep = "mvtnorm", "gating")
}

In contrast to the cell gates above, we apply live gates with sample-specific gating parameters. To this end, we specify a list l containing quantiles q, intercepts i and slopes s for each sample. These parameters are updated iteratively to remove dead cells while retaining cell yields as high as possible (Figure 11), arriving at a final set of sample-specific gating parameters. We display the yields of the "cell" and "live" gates on each sample to quickly assess the cell losses occurring at the two gating steps (Figure 12). As expected, the "cell" gate leads to a systematic loss of around 1% of cells across all samples. The "live" gate leads to a stronger reduction of cell yield in the tumor samples, consistent with the fact that those samples, which underwent enzymatic dissociation, contain more dead cells.

Quality control
Having completed the standard preprocessing steps, we proceed to investigate how the current experiment compares to prior experiments in terms of the number of cells in each reference and sample, and the expression levels of each target. Large parts of the metadata generated by now may no longer be needed, and unnecessarily increase output file sizes for large-scale datasets. Therefore, we retain only two key cell metadata variables: file_id, containing the FCS file each cell originates from, and bc_id, containing the barcode population assignments. We then rename the latter to make the following quality control steps more intuitive.
# drop all cell metadata except file of origin & barcode IDs
colData(sce) <- colData(sce)[c("file_id", "bc_id")]
# rename cell metadata variable
i <- match("bc_id", names(colData(sce)))
names(colData(sce))[i] <- "sample"

In the debarcoding scheme used for deconvolution of the multiplexed samples (Section Debarcoding), barcode identifiers were chosen to contain all information relevant for each sample. This setup allows us to extract sample metadata directly from the bc_ids. Alternatively, and especially for more complex experimental designs, this information could be stored in a separate metadata table. Such a table could then be used to match the bc_ids with the listed samples, and to add arbitrary metadata (e.g., batch, patient ID, treatment).
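For example, if barcode identifiers follow a scheme such as "PBMC_R1" (sample type, then replicate; this naming is an assumption for illustration), the metadata can be split out directly:

# derive sample-level metadata from barcode IDs (naming scheme assumed)
ss <- strsplit(sce$sample, "_", fixed = TRUE)
sce$type <- vapply(ss, `[`, character(1), 1)  # e.g., "PBMC"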

Quality control (QC) on reference cell counts
As a first quality control, we compare the cell counts of each reference sample (R) to those obtained from 7 previous experiments (Figure 14). Since the references are obtained from pre-barcoded aliquots of cells, the number of reference cells acquired gives direct information on the cell yield throughout the whole experiment, from cell barcoding to acquisition on the CyTOF. As shown in Figure 14, the current experiment tends to have a lower yield compared to the average of previous experiments.

QC on mean marker intensities
As the third and final quality control, we compare the 98th expression quantiles across all targets of interest over the pooled references to those obtained from 7 previously acquired experiments available in metadata table ref_marker_levels.csv ( Figure 16). We chose the 98th percentile to account for the fact that some populations are rare, and we are particularly interested in assessing signal stability for positive cells rather than the median of the population. Since the pooled references are identical from one experiment to another, this gives a direct indication of the current experiment's staining efficacy and enables early identification of antibody degradation over time.

Batch alignment
Each CyTOF experiment contains the same set of references. Similar to the approach used by Schuyler et al. 11 , we use these references as anchors to calculate a channel-specific correction factor by dividing the 98th percentile measured in the current experiment by the average 98th percentile obtained across the first seven experiments of the project. The signal observed in each channel for the samples of interest is then divided by these correction factors derived from the reference samples.
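A minimal sketch of this scaling (file, assay and variable names are assumptions; is_ref is taken to be a logical vector marking cells from the pooled references, and the reference table's columns are assumed to match the SCE's markers):

# channel-specific correction factors from the shared references
ref <- read.csv(file.path("data", "ref_marker_levels.csv"))
q98 <- apply(assay(sce, "exprs")[, is_ref], 1, quantile, probs = 0.98)
cf <- q98 / colMeans(ref)  # current / average reference 98th percentile
# divide all cells' signals by the channel-wise factors
assays(sce)$batchexprs <- sweep(assay(sce, "exprs"), 1, cf, "/")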

Discussion
In this workflow, we have presented a pipeline for reproducible and highly-automated preprocessing of CyTOF data, based on an updated version of CATALYST. Our pipeline covers four standard steps: Normalization for signal time-drift using bead standards (Section Normalization), single-cell deconvolution of multiplexed samples (Section Debarcoding), correction for spillover via compensation (Section Compensation), and gating for live cells (Section Gating). Moreover, we have included various quality control steps that compare the current experiment to a set of reference data (Section Quality control). These steps ensure that measurement sensitivity, gating cell yields, sample cell counts, and expression levels lie within the expected range.
A key advantage of both using and developing Bioconductor packages is that they utilize common data structures, thereby greatly facilitating interaction between them. For example, many of the data structures used in scRNA-seq data analysis have only become established relatively recently. Meanwhile, the cytometry community has been relying on the FCS file format for data storage, and on flowCore's flowFrame/flowSet as well as flowWorkspace's GatingSet classes for computational analyses. While a lot of infrastructure exists around these data structures, they impede method development for newly emerging standards, and act as a barrier to interoperability with currently developed packages. This is particularly visible in the context of other fast-growing single-cell data types such as scRNA-seq, where most current methods are being developed around Bioconductor's SingleCellExperiment class. To name just two examples, an extensive collection of visualization tools for SCEs is available through scater 27, including a variety of dimensionality reduction methods; and methods for differential abundance (DA) analysis (to detect subpopulations that are differentially abundant between conditions) and differential state (DS) analysis (to test for subpopulation-specific expression changes across conditions) are implemented in diffcyt 28.
The SCE class allows storing multiple assays that can, for example, contain raw counts, expression-like data obtained upon arcsinh-transformation, as well as any intermediate data matrices obtained after normalization, compensation and batch correction. Moreover, any event (cell) and feature (marker) metadata generated in the process can be added to the object's colData/rowData, alongside an arbitrary number of dimensionality reductions. Thus, SCEs present an overall more compact and less error-prone data structure for both preprocessing and downstream analysis when compared to storing the various data matrices or metadata in separate variables, which would have to be combined for certain computations, separately subsetted and saved to independent outputs.
There is an obvious benefit for the mass cytometry community in taking advantage of these new infrastructure developments. However, it is equally important to maintain backward compatibility with well-established standards in the field. For example, it can be desirable to write out intermediate outputs (FCS files) after each preprocessing step, or to make use of available tools that build around flowCore's flowFrame and flowSet classes, or other classes derived thereof (e.g., flowWorkspace's GatingSet). Thus, while CATALYST's transition to a more recent and arguably advantageous data structure is motivated by the ability to leverage many existing and newly developed tools, a complete dismissal of the large infrastructure that is available in the realm of cytometry data analysis is impossible at this time. To facilitate conversion between SCEs and conventional cytometry data structures, CATALYST provides the sce2fcs() function, which allows the user to specify which assay data to retain, whether to drop or keep available cell metadata and dimensionality reductions, and (optionally) to split the input dataset by a non-numeric variable (to, e.g., export each sample to a separate FCS file).
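For instance, a round trip back to FCS files might look as follows (a sketch; the output directory is hypothetical):

# export each sample to a separate FCS file
fs <- sce2fcs(sce, assay = "exprs", split_by = "sample")
write.flowSet(fs, outdir = "fcs_out")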
Although the current version of this pipeline constitutes a comprehensive approach to generate high-quality data for downstream analysis, further developments could be added in the future. In particular, it could be useful to implement an automated way of identifying and removing part of the data with unstable signal, similar to the approach proposed by flowClean 29 , an R package designed to exclude fluorescent anomalies in flow cytometry data. Given that selection of anomalies in the dataset by the user is subjective, or that they may be altogether undetectable by eye, the advantage of such an approach would be to further standardize the process while decreasing manual work.
Recently, batch normalization has become increasingly important to enable the integration of datasets acquired at different times, by different users and on different instruments. Here, we use scaling normalization, where references serve as anchors to correct all samples included in the analysis in a channel-specific manner, similar to the strategy proposed by Schuyler et al. 11. While this approach requires a well-defined experimental procedure, in which references with positive and negative subsets for each marker have to be included in each experiment, it does not make any assumptions on sample composition. Thus, since the dataset used in this pipeline was acquired on the same instrument and stained with the same frozen antibody panel as previous experiments, scaling by expression quantiles provides an efficient way to correct for batch effects.
To increase the flexibility of batch correction in cases where the experimental variation is higher, CATALYST could integrate different methods that have the potential to increase batch correction efficiency. For example, CytoNorm 12 computes quantiles for every metacluster and for every marker after aggregation of control samples from each batch. Such an approach could be more appropriate in cases where the references' expression distributions are less aligned. An alternative method, CytofRUV 30 , exploits the concept of pseudo-replicates to remove unwanted variation (RUV) between proteins and cells. Here, cells are grouped into subpopulations using FlowSOM 31 clustering. Groups of cells present across all batches are considered to be pseudo-replicates, and their deviation (residuals) from the average signal across batches is used to estimate and correct for the batch effects.
Although various methods to correct for batch effects in both the presence and absence of references have been proposed, a systematic comparison of batch correction tools for mass cytometry data is missing. Thus, whether the approach used in this study to align batches on the basis of shared references is the most accurate remains unresolved.
Our pipeline is entirely R-based and does not rely on switching between platforms. Thus, it omits the need for heavy data transfers between online cloud services, graphical user interfaces (GUI), and programming environments for different parts of preprocessing and downstream analysis. As a result, each step in the analysis is fully reproducible and any parameters used throughout can be easily modified and documented. This transition from manual, GUI-based to largely automated, programmatic data processing is indispensable for clinical and other large-scale studies, where sample throughput is high and reproducibility ever so important.
Since its first submission to Bioconductor in 2017, CATALYST has undergone continuous maintenance and development. The most noteworthy changes include implementation of a comprehensive visualization suite based on Nowicka et al. 14 's workflow for differential discovery; and, the transition from custom data structures to using Bioconductor's SingleCellExperiment class for differential analysis with Bioconductor v3.11, and for preprocessing with v3.12. Taken together, these developments have transformed CATALYST into a one-stop tool for cytometry data analysis, enabling both data preprocessing and in-depth downstream analysis.

Normalization
Identification of bead events. Commonly, bead events are identified by manual gating on scatter plots of DNA vs. bead channels where DNA should be low, and expression should be high across all bead channels. Instead, we propose a programmatic way to identify beads that includes detection of bead-bead and cell-bead doublets.
Our normalization strategy leverages the already established SCD algorithm for preliminary tagging of events as beads. In this context, the debarcoding scheme is a 2×(2+m) matrix (Figure 18). Here, columns correspond to the two DNA channels and the m bead channels; rows correspond to barcodes 0 (not a bead) and 1 (is a bead), where non-bead events are positive for the DNA channels only (barcode 1100...0), while bead events are negative for DNA and positive for all bead channels (barcode 0011...1). Upon initial assignment of bead events, we apply a median ± n median absolute deviations (MAD) rule to remove low- and high-signal events from the bead population used for estimating normalization factors. As n decreases, bead populations become narrower and bead-bead doublets are excluded. The extent to which bead populations are trimmed can be adjusted via argument trim (default 5).
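For illustration, such a key could be written down as follows (a sketch using the two iridium DNA channels and the five DVS bead channels from above):

# bead key: rows = barcode 0 (not a bead) / 1 (is a bead)
key <- rbind(
    "0" = c(1, 1, 0, 0, 0, 0, 0),  # cells: DNA+, bead channels-
    "1" = c(0, 0, 1, 1, 1, 1, 1))  # beads: DNA-, bead channels+
colnames(key) <- c("Ir191", "Ir193",
    "Ce140", "Eu151", "Eu153", "Ho165", "Lu175")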
Notably, slight over-trimming does not affect normalization. It is therefore recommended to choose a trim value small enough to ensure the removal of doublets, at the cost of reduced bead population sizes.

Correcting for signal-decrease over time.
To correct for the effect of event acquisition time on signal intensity, we follow the method proposed by Finck et al. 8. In essence, bead intensities are smoothed using a median sliding window of width k (default 500 bead events). At each timepoint, the slope of a line with intercept zero is computed by minimizing the squared error between smoothed bead and mean bead intensities (= baseline); alternatively, a reference set of beads from which to compute the baseline can be provided. Slopes for non-bead timepoints are obtained via constant interpolation. Here, slopes far from one correspond to significant deviation from the baseline, and slopes close to one indicate that the signal is already similar to the baseline. Raw bead counts are then normalized by multiplication with the fitted slopes at each timepoint.
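For reference, a sketch of the per-timepoint fit in formulas (notation introduced here for illustration): with smoothed bead intensities $\tilde{b}_{tj}$ for bead channel $j$ at timepoint $t$ and baseline intensities $\bar{b}_j$, the slope of the line through the origin is the least-squares solution

$$m_t = \arg\min_m \sum_j \left( \bar{b}_j - m\,\tilde{b}_{tj} \right)^2 = \frac{\sum_j \tilde{b}_{tj}\,\bar{b}_j}{\sum_j \tilde{b}_{tj}^2},$$

and normalized counts are obtained as $x'_t = m_t\,x_t$.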
Debarcoding
Preliminary barcode assignment. The debarcoding process commences by assigning each event a preliminary barcode ID. This requires specification of a binary barcoding scheme (or debarcoding key) $B \in \{0, 1\}^{n \times m}$, where $i = 1, \ldots, n$ is the barcode index, $j = 1, \ldots, m$ a barcode channel, and $n, m$ denote the number of unique barcodes and barcoding channels, respectively. Further, let $k_i$ denote the number of positive barcoding channels for barcode $i$. Assuming a constant $k_i = k$ for all $i = 1, \ldots, n$ (i.e., every barcode has the same number of positive barcoding channels), the $k$ channels with the highest signal in a given event are considered to be positive, the remaining $m - k$ to be negative. The separation $\delta$ of positive and negative events is then defined as the difference between the $k$th highest and the $(m - k)$th lowest scaled intensity for that event.
Separation cutoff estimation. When the separation between positive and negative barcoding channels is low, we cannot be confident in the barcode assignment.
For the estimation of cutoff parameters, we consider yields upon debarcoding as a function of the applied cutoffs. Commonly, this function will be characterized by an initial weak decline, where doublets are excluded, and a subsequent rapid decline in yields to zero. In between, low numbers of counts with intermediate barcode separation give rise to a plateau. To facilitate robust estimation, we fit a linear and a three-parameter log-logistic function 32 to the yield function using the drc R package 33 (Figure 19). As an adequate cutoff estimate, we target a point that marks the end of the plateau regime and the onset of yield decline, to appropriately balance confidence in barcode assignment and cell yield. We define the linear model cutoff estimate $c_{\mathrm{LM}}$ as the value for which the cell yield $Y$ has declined to half of the initial yield $\beta_0$:

$$c_{\mathrm{LM}} = -\frac{\beta_0}{2\beta_1}$$

where $\beta_0$, $\beta_1$ are the intercept and slope of the linear model fit, respectively.
We define the log-logistic model cutoff estimate $c_{\mathrm{LLM}}$ as the value for which the log-logistic function's decline is minimal relative to its value:

$$c_{\mathrm{LLM}} = \arg\min_c \left| \frac{Y'(c)}{Y(c)} \right|$$

The final cutoff estimate $c$ is defined as the mean of both estimates, weighted with the goodness (residual sum of squares) of each fit:

$$c = w\,c_{\mathrm{LM}} + (1 - w)\,c_{\mathrm{LLM}}$$

where the weight $w$ increases with the relative goodness of the linear fit.

Compensation
Retrieval of real signal. As in conventional flow cytometry, we model spillover linearly, with the channel stained for as predictor, and spill-affected channels as response. Thus, the intensity observed in a given channel $j$ is a linear combination of its real signal and contributions from other channels that spill into it. Let $I$ denote the (unknown) real and $J$ the observed signal. Further, let $s_{ij}$ be the proportion of channel $j$ signal that is due to channel $i$, and $w_j$ the set of channels that spill into channel $j$. Then

$$J_j = I_j + \sum_{i \in w_j} s_{ij} I_i$$

or, in matrix notation, $J = I \cdot S$, such that $I = J \cdot S^{-1}$. While mathematically exact, the solution to this equation will yield negative values, and does not account for the fact that ion counts are strictly non-negative. A computationally efficient way to address this is to instead use non-negative linear least squares (NNLS), which optimizes the least squares criterion under the constraint of non-negativity:

$$\min_{I \ge 0} \; \lVert I \cdot S - J \rVert_2^2$$

To solve for $I$, we apply the Lawson-Hanson algorithm 34,35 for NNLS implemented in the nnls package.
Spillover estimation. Because any signal not in a single-stain experiment's primary channel $j$ results from channel crosstalk, each spill entry $s_{ij}$ can be approximated by the slope of a linear regression with channel $j$ signal as the response, and channel $i$ signal as the predictor, where $i \in w_j$. computeSpillmat() offers two alternative ways for spillover estimation (Figure 20).
The default method approximates this slope with the following single-cell derived estimate: let $i^+$ denote the set of cells that are positive in channel $i$, and $s_{ij}^c$ be the channel $i$ to $j$ spill computed for a cell $c$ that has been assigned to this population. We approximate $s_{ij}^c$ as the ratio between the signal in the unstained spillover-receiving and the stained spillover-emitting channel, $I_j$ and $I_i$, respectively, after subtracting the expected background in these channels:

$$s_{ij}^c = \frac{I_j - bg_j}{I_i - bg_i}$$

On the basis of their additive nature, spill values are estimated independently for every pair of interacting channels. Hereby, we take into account only interactions that are sensible from a chemical and physical point of view: $M \pm 1$ channels (abundance sensitivity), $M + 16$ channels (oxide formation), and channels measuring isotopes of the same metal (impurities; Figure 21). Alternatively, interactions = "all" will compute a spill estimate for all $n \cdot (n - 1)$ possible interactions, where $n$ denotes the number of measurement parameters. Estimates falling below the threshold specified by th will be set to zero. Lastly, note that diagonal entries $s_{ii} = 1$ for all $i \in 1, \ldots, n$, so that spill is relative to the total signal measured in a given channel.

Data availability
This project contains the following underlying data:
• CyTOF_acquisition_1-3.fcs (40-Ab panel CyTOF data of 2 blood and 2 tumor samples, and 9 reference samples selected to contain positive and negative populations for each marker included in the study's Ab panel. Samples were multiplexed with a 20-well barcoding plate, and obtained from a single experiment provided as 3 FCS files.)
• normalization_beads.fcs (Beads identified using 'CATALYST' during the normalization step of a previous CyTOF experiment. Used as reference beads to correct for changes in signal sensitivity over time across multiple CyTOF experiments.)
• ref_bead_counts.csv (A table of mean dual counts for the six different bead channels (columns) obtained from 7 previous experiments (rows). Used as a reference to assess the measurement sensitivity for the current experiment.)
• debarcoding_scheme.csv (A binary barcoding scheme of 6-choose-3 = 20 barcodes, with columns corresponding to barcode channel masses (101, 104, 105, 106, 108, 110) and rows corresponding to barcode (sample) IDs.)

Software availability
Analyses were run in R v4.

By adjusting the quantile and the target value defining the center of the ellipse, the user can control how many cells are excluded from the gate and ensure that most cycling cells are kept in the analysis.
To gate on alternative DNA stains, a different pair of channels could be assigned to the "dna" variable in the corresponding code chunk. In the case of a single DNA channel, one-dimensional gating (i.e., thresholding) could be applied (as opposed to the currently used elliptical gate). We have added a comment mentioning this to the text under "Gating on cells".
Gating on live cells: The approach suggested by the authors worked well on my test data, however, it takes a while to manually adjust values for every file to fit the gates closely to the data. While I see the value of automating this step, I also think that some manual gating could simplify the process and further increase downstream data quality. Potentially, the authors could adopt an approach like the gate_draw function from the CytoRSuite library.
Indeed, the approach depicted in this paper works well in cases where a limited number of samples are included in a run and the live/dead cell profile is well defined and consistent between samples. The process can become tedious when hundreds of samples are included in a run or when the live/dead cell profile is more complex. In the latter case, including a function similar to the gate_draw function from CytoRSuite could be helpful. However, our aim here was to propose an automated pipeline; manual gating would defeat this purpose.
As a side note: We have attempted applying CytoRSuite, however, encountered several confusing issues that we have been unable to resolve: The CytoRSuite site (https://dillonhammill.github.io/CytoRSuite) lists a GH repository that no longer exists; we could find an installable version at https://github.com/gfinak/cytoRSuite (is this the same?), but 'drawGate()' gave an error that we have not been able to resolve; meanwhile, none of CytoRSuite, cytoRSuite and cytoSuite (from which the latter was forked) has been changed in 4 years. Taken together, this gave us the feeling that the tool is no longer maintained and likely to be inapplicable with current versions of R and Bioconductor.
Of course, there may be other tools available at this point for manual gating, and we leave it to the user to incorporate these into their workflow should that be of interest. A possible strategy then might be to i) perform manual gating and export the resulting gates into a table (gating scheme); ii) apply that scheme in an automated fashion (e.g., using the code we presented); and, iii) do manual adjustments to refine gates according to the current experiment.

Compensation: The workflow includes compensation as a preprocessing step, which the authors have shown in separate publications to improve data quality, but which is currently not routinely performed by many researchers working with mass cytometry. I therefore assume that most users of this pipeline would be relying on published spillover matrices that reflect estimates of isotope purity and oxidation. While I agree with the usefulness of this function, I believe that adding additional quality control functions could improve acceptance of and trust in this approach. For example, in flow cytometry, overcompensation is often easily spotted by the occurrence of overly negative values; however, using their NNLS approach, this is not readily apparent in compensated mass cytometry data. It would be very helpful to have a quality metric that would alert users to such potential issues introduced by the compensation step.
These are all very good points and legitimate concerns. As indicated in the original paper, the spillover matrix used to compensate mass cytometry data should be calculated based on the antibodies included in the panel. We should stress here that, based on the single-stained bead acquisition approach presented in the original paper, the experimental procedure required to generate a compensation matrix is fairly straightforward and can be completed rapidly. Using a previously published spillover matrix is a risky strategy that can indeed lead to inaccurate compensation. The user should instead first run the compensation in classic mode and perform a visual inspection to ensure no overcompensation can be detected before using the NNLS method; this is a valuable option to avoid this specific type of artefact. Automating this step is a good suggestion, but it is out of the scope of this publication and comes with some disadvantages. The risk we see is that this process could be imperfect and potentially misleading for the user: such an approach would only identify overcompensation in channels where a single positive population is present, but not in the case of a double positive population; in other words, it would highly depend on the user's data type. Furthermore, it would not identify undercompensated signals. As a consequence, an approach to alert users of potential issues would likely be imperfect and could give the wrong impression that the data are correctly compensated whenever no alert is raised, which is not necessarily the case. Moreover, to the best of our knowledge, no such approach exists in fluorescence flow cytometry either, most likely because ensuring accurate compensation on a fully stained dataset is a challenging task. We should also mention that spillover coefficients in mass cytometry rarely exceed 4%, so the consequences of slight over- or under-compensation are less severe in mass cytometry than in flow cytometry.
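To make the suggested procedure concrete, a hedged sketch using CATALYST's compensation functions is given below; 'sce_beads' (assumed to hold debarcoded single-stained bead data) and the channel names passed to the scatter plot are hypothetical:

library(CATALYST)
# spillover matrix estimated from single-stained beads
sm <- computeSpillmat(sce_beads)
# compensate in classic ("flow") mode first, keeping original counts
sce <- compCytof(sce, sm, method = "flow", overwrite = FALSE)
# visually inspect for overcompensation (e.g., overly negative signals);
# the channels shown here are hypothetical examples
plotScatter(sce, chs = c("Yb171Di", "Yb172Di"), assay = "compexprs")
# once no overcompensation is apparent, switch to NNLS
sce <- compCytof(sce, sm, method = "nnls")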
3.
Minor comments:

1. While this might only be needed in rare cases, a function to rename channels and potentially match these names across multiple FCS files could enhance the adaptability of this package. In my test case, conflicting channel names prevented the import of the files into the workflow. In other cases, it might help to match channel names between batches. The authors could look to the premessa package for inspiration.

We very much appreciate this comment, as we have encountered various discrepancies between panels, especially in long-term projects. To date, we have used a custom R script to i) read in files separately; ii) fix panels according to a reference file (i.e., removing/adding additional/missing channels); and iii) write out a new set of FCS files with concordant panels.
However, this solution is suboptimal, as it leads to a duplication of files (or, in case the original files were overwritten, the process would no longer be reproducible). Similarly, a GUI solution (such as 'premessa') would defeat the purpose of providing an automated, reproducible preprocessing solution. Thus, taken together, we propose (and have now implemented) the following strategy:
> 'prepData' now exposes additional arguments to be passed to 'flowCore::read.FCS' via '...'
> 'read.FCS' provides an argument 'channel_alias': "an optional 'data.frame' used to provide the alias of the channels to standardize and solve the discrepancy across FCS files. [...]"
> Independent of whether or not this option is used, 'prepData' will check whether panels (FCS channel names) match between files; in case of any discrepancy, the newly added 'fix_chs' argument determines how discrepancies are resolved:
○ "all" keeps all channels (i.e., the union across files); any missing channels are added to the respective samples, and a channels x samples matrix is stored in the object to track which channels were originally present in which samples
○ "common" keeps shared channels (i.e., the intersection across files); any other channels are dropped from the respective files
○ 'prepData' will, in any case, return a 'SingleCellExperiment', i.e., no altered FCS files or 'flowFrame's are written out or returned
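A minimal sketch of this strategy follows; the contents of the alias table are hypothetical, and 'fix_chs' is used as described above:

library(CATALYST)
# alias table forwarded to flowCore::read.FCS via '...':
# divergent channel names ('channels') are standardized to a common 'alias'
chs <- data.frame(
  alias = c("Ir191Di", "Ir193Di"),
  channels = c("191Ir_DNA1", "193Ir_DNA2"))
fcs <- list.files("data", "acquisition", full.names = TRUE)
# harmonize names on import; keep only channels shared across files
sce <- prepData(fcs, channel_alias = chs, fix_chs = "common")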
We thank the reviewer for noticing this. Indeed, while the workflow allows for specification of the DNA channels used (via variable 'dna'), these were fixed internally in CATALYST's 'normCytof()' function. We have added an additional argument to allow passing custom DNA channel masses (default 'dna = c (191,193)' for Ir191/3; for Rh103, the argument would be 'dna = 103' instead); the output scatter plot of DNA vs. bead intensities ('res$scatter') is now generated based on the first matching DNA channel (see updated '?normCytof' documentation).
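For illustration, a sketch of normalization with a rhodium intercalator, following the updated argument described above ('sce' is assumed to hold the imported data):

library(CATALYST)
# bead-based normalization with a single rhodium DNA channel
res <- normCytof(sce, beads = "dvs", dna = 103)
# DNA vs. bead scatter, based on the first matching DNA channel
res$scatter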

3. Sample-specific debarcoding is appreciated. Figure 6 and the plotYields function return a debarcoding percentage. I believe this percentage refers to the percent of initial assignments, but it is not specifically stated. It might be helpful to get a feeling for the percentage of cells (out of total cells) that are assigned after refining the initial assignments.
That is correct: As in the outputs of the original Finck et al. MATLAB application, yields (left-hand y-axis) correspond to the proportion of cells that would be retained upon applying a given cutoff (x-axis). In Figure 8, we compare the absolute barcode population sizes before vs. after debarcoding. Analogously, it would be straightforward for users to generate such a barplot from cell counts obtained when applying various thresholding schemes (e.g., no filtering compared to global vs. sample-specific separation cutoffs).
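As a rough sketch of how such counts could be obtained with CATALYST's debarcoding functions (the global cutoff value of 0.35 is illustrative, and 'debarcoding_scheme' is assumed to hold the barcoding scheme read in from CSV):

library(CATALYST)
# initial barcode assignments, then sample-specific separation cutoffs
sce <- assignPrelim(sce, bc_key = debarcoding_scheme)
sce <- estCutoffs(sce)
# cell counts per barcode under different thresholding schemes
n_init <- table(sce$bc_id)  # initial assignments, no filtering
n_global <- table(applyCutoffs(sce, sep_cutoffs = 0.35)$bc_id)  # global cutoff
n_sample <- table(applyCutoffs(sce)$bc_id)  # sample-specific cutoffs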

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I work in Bioinformatics and especially in the normalization and batch correction of CyTOF datasets.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 23 Jun 2022
Helena Crowell, University of Zurich, Zurich, Switzerland

It would be useful to clearly define the differences between successive acquisitions, a single CyTOF run, and a batch.
Indeed, the meaning behind the concepts of successive acquisitions, single CyTOF run and batch was not fully clear, and these terms were not used in a consistent way. A "CyTOF run" corresponds to an independent experiment in which samples are stained and acquired simultaneously on the CyTOF; we replaced the term "run" with "experiment" to clarify the meaning. Each CyTOF experiment corresponds to one "batch", and this term is used to refer to the batch correction performed across the different CyTOF experiments. The data from a single CyTOF experiment are usually acquired over multiple "successive acquisitions", each leading to the generation of a single FCS file. We have made the use of these terms consistent throughout the paper.

1. The different samples and runs listed throughout the different examples could be better presented with a table containing all runs and samples. The data description explains that "The dataset used in this study was obtained from a single CyTOF run containing nine references, three blood samples and three tumor samples barcoded with a 20-well barcoding plate". However, in the quality checks section, additional data from other runs (sometimes 7 runs, other times 8) is analyzed, which is confusing.
The pipeline described in this paper was designed to preprocess CyTOF data acquired over a long period of time, with a focus on ensuring data consistency over time. The aim of the workflow is to guide readers through the preprocessing steps required to convert FCS files obtained in a given CyTOF experiment into a format suitable for downstream analysis, while presenting key quality checks to ensure the reliability of the data generated in the experiment of interest. Therefore, the whole analysis is based on a dataset obtained from a single CyTOF experiment, which is benchmarked against data acquired during a preparatory phase. For consistency, we now systematically include the data from seven previous CyTOF experiments to benchmark the data of the CyTOF experiment preprocessed in this paper.

2. Batch alignment: Could you provide an additional plot showing the effect of applying this correction factor? How are you assessing the performance of your batch alignment method?
The batch alignment presented in this paper is a percentile-based linear scaling that uses reference samples as anchoring points, similar to a previously published method (Schuyler et al., 2019). To assess the performance of our batch alignment method, we have now included a figure comparing expression distributions before and after batch correction (including their 98th percentiles and those of the references). As intended, the 98th percentiles align with the references' upon correction, while the expression distributions otherwise remain virtually unchanged.
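As a minimal, self-contained sketch of such a percentile-based scaling (the intensity vectors are simulated purely for illustration; per channel, the batch's 98th percentile is matched to the reference's):

# simulated intensities for one channel (illustrative only)
set.seed(1)
ref <- rexp(1e4, rate = 1)  # shared reference sample
batch <- rexp(1e4, rate = 0.8)  # current batch
# per-channel scaling factor anchored at the 98th percentile
sf <- quantile(ref, 0.98) / quantile(batch, 0.98)
batch_corrected <- batch * sf
# the 98th percentile now aligns with the reference's
quantile(batch_corrected, 0.98)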

3. Argument norm_to in the normCytof() function: please explain how normalization is computed when reference beads are given; in particular, how is the new baseline computed, and does it take into account both the beads from the reference and the current experiment by averaging them?
Normalization using reference beads follows the methodology originally introduced in Finck et al. (2013): the baseline is computed as the mean intensity of the reference beads only, not including the current experiment. Were the average taken over both, intensities would not be aligned between the current and reference experiments. While the statement "[...] We provide the path to a set of reference beads (argument `norm_to`) that are used to compute baseline intensities for normalization" explains this only briefly, we believe that the method is well established, and readers should refer to the original publication for more detail.
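In code, this amounts to pointing 'norm_to' at the reference-bead FCS file (the path follows the workflow's assumed "data" subdirectory layout):

library(CATALYST)
# reference beads from a previous experiment define the baseline
ref_beads <- file.path("data", "normalization_beads.fcs")
res <- normCytof(sce, beads = "dvs", norm_to = ref_beads)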
4. Can it be used to normalize data from different batches? If so, how does it deal with distinguishing and ordering bead events over time, given that event times would be similar in separate batches?
Yes, certainly. The normalization aims to correct the signal time-drift caused by progressive loss of sensitivity during acquisition. This is a technical effect that is independent of batch effects and should be accounted for regardless of whether or not batch effects are present; the latter should be corrected for in downstream analysis.
Events from different FCS files (independent of whether these are different acquisitions of the same experiment or batches) are concatenated. How event times are dealt with depends on prepData()'s input arguments. When by_time = TRUE, files are ordered according to their acquisition time (stored under each flowFrame's $BTIM description field). Otherwise, they are kept in the order provided by the input metadata table (argument md).
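For illustration ('fcs' and 'md' are assumed to be the workflow's FCS file paths and metadata table):

library(CATALYST)
# order files by acquisition time ($BTIM) when concatenating
sce <- prepData(fcs, md = md, by_time = TRUE)
# alternatively, keep the order given in the metadata table
sce <- prepData(fcs, md = md, by_time = FALSE)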

5. Figure 4: Could you please give more explanations on how to assess run sensitivity, and how the user decides what is acceptable and what is not?
Instrument sensitivity is an important parameter that should be closely monitored. This parameter is assessed during tuning, but those data cannot easily be exported and compared between experiments. The aim was to take advantage of the beads, which are acquired together with the samples, to report on instrument sensitivity. Figure 3 provides key information on how the sensitivity evolves during an experiment, while the point of Figure 4 is to show how the average sensitivity evolves from one experiment to another. Instrument sensitivity varies from machine to machine, and deciding what is acceptable will depend on the requirements of the users. The point of this plot is to offer the user an easy way to identify when sensitivity is getting low compared to previous experiments, and to link the quality of the data generated in a specific experiment to the sensitivity of the instrument.
Also, you need to load the library(reshape2) to run this part.
Yes, thank you for catching this; we've added reshape2 to the list of dependencies, and it is now loaded along the other required libraries.
The wrap_plots function is missing here.
Yes, thank you for catching this; we've added patchwork to the list of dependencies, and it is now loaded along the other required libraries.
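A hedged sketch of this QC step with both dependencies loaded (the file path and table layout, experiments as rows and bead channels as columns, follow the data description):

library(reshape2)
library(ggplot2)
library(patchwork)
# mean dual counts per bead channel (columns) across experiments (rows)
ref <- read.csv(file.path("data", "ref_bead_counts.csv"))
# reshape wide -> long for plotting
df <- melt(ref, variable.name = "channel", value.name = "count")
p1 <- ggplot(df, aes(channel, count)) + geom_boxplot()
p2 <- ggplot(df, aes(channel, count)) + geom_point()
# combine panels side by side
wrap_plots(p1, p2)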

6. I got an error when running the QC on reference cell counts: "Error: Can't combine `1$CellLine_R1` and `2$CellLine_R1`."

True, thank you; I could reproduce this with the current R and package versions. It has been fixed by converting the 'run' object from class 'table' to class 'integer' using c().
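In other words (per the fix described; in recent R versions, c() drops the 'table' class):

# a contingency table of reference cell counts (illustrative)
run <- table(c("ref1", "ref1", "ref2"))
class(run)  # "table"
# coerce to a plain (named) integer vector
run <- c(run)
class(run)  # "integer"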

7. Minor comments: When running the code using the data provided, the directory name should be modified to "CyTOF_acquisition_1-3.FCS/" instead of data: fcs <- list.files("CyTOF_acquisition_1-3.FCS/", "acquisition", full.names = TRUE)

We are not sure we understand this comment: 'list.files(path, pattern, ...)' expects the first argument to be a directory (where the FCS files are located), not the file names themselves ("xxx.FCS").
Also, it should be specified that the directory containing all the data should be called "data" and that this refers to a directory name; an alternative is to use the local directory "." instead of data, as in: # specify path to reference beads ref_beads <- file.path(".", "normalization_beads.fcs")

Thank you, yes; we forgot to mention that, in the presented code, all data used throughout the workflow is expected to sit inside a "data" subdirectory relative to where the .Rmd file is being run. We have now added a note explaining this in the 2nd paragraph under "Results".
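That is, with the expected layout, the workflow's snippets read:

# all input files sit in a "data" subdirectory relative to the .Rmd
fcs <- list.files("data", "acquisition", full.names = TRUE)
ref_beads <- file.path("data", "normalization_beads.fcs")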

1. Introduction: "an important step is to correct for batch effects, which can be achieved by