Flow cytometry of bone marrow aspirates from neuroblastoma patients is a highly sensitive technique for quantification of low-level neuroblastoma

Background: Bone marrow involvement is an important factor in determining disease stage and treatment for childhood neuroblastoma. The current standard of care relies on microscopic examination of bone marrow trephine biopsies and aspirates to define involvement. Flow cytometric analysis of disaggregated tumour cells, using a panel of neuroblastoma-specific markers, allows a potentially less subjective determination of the presence of tumour cells. Methods: A retrospective review of sequential bone marrow trephine biopsies and aspirates, performed at Great Ormond Street Hospital, London, between 2015 and 2018, was undertaken to assess whether the addition of flow cytometric analysis to these standard of care methods provided concordant or additional information. Results: There was good concordance between all three methods for negative results, 216/302 (72%). Positive results had a concordance of 52/86 (61%), comparing samples positive by flow cytometry and positive by either or both cytology and histology. Of the remaining samples, 20/86 (23%) were positive by either or both cytology and histology but negative by flow cytometry, whereas 14/86 (16%) were positive only by flow cytometry. Conclusions: Our review highlights the ongoing importance of expert cytological and histological assessment of bone marrow results. Flow cytometry is an objective, quantitative method to assess the level of bone marrow disease in aspirates. In this study, flow cytometry identified low-level residual disease that was not detected by cytology or histology. The clinical significance of this low-level disease warrants further investigation.


Introduction
Neuroblastoma is the most common extracranial solid tumour of childhood (Xie, Onyskio and Morrison, 2018). A combination of stage of disease, patient age, tumour histology and tumour biology is used to risk stratify patients for treatment (Monclair et al., 2008). Metastatic disease in patients more than 18 months of age places a patient in the high-risk category. Consequently, accurate staging at the time of diagnosis is critical. These patients receive multimodal treatment with chemotherapy, myeloablative chemotherapy and autologous stem cell rescue, surgery, radiation therapy and immunotherapy. Approximately 50% of those diagnosed with neuroblastoma have high-risk stage M disease, with poor overall survival of <50% (Tas et al., 2020). Importantly, GD2 is not expressed by normal bone marrow cells (Swerts et al., 2004). CD56 is present on a subset of CD4+ and CD8+ T-cells and NK cells in peripheral blood, as well as on neural derived cells and tumours (Beiske et al., 2005). CD45 is present on all human leukocytes but absent on neuroblastoma cells. Using flow cytometry, Komada et al. (1998) were able to detect a single neuroblastoma cell in up to 1 × 10^4/10^5 mononuclear cells. Szantho et al. (2018) analysed 36 samples from 16 patients and concluded that flow cytometry was highly specific and more sensitive than immunohistochemistry, as more cells can be evaluated. However, other studies have suggested that flow cytometry is 10-fold less sensitive than immunocytology or quantitative reverse transcriptase polymerase chain reaction (RTqPCR). The role of minimal residual disease (MRD) in neuroblastoma is increasingly under investigation, although its clinical utility is yet to be defined. In haematological malignancies, PCR-based detection of MRD has become part of the routine method for risk stratification and ongoing monitoring of patients during treatment, with an escalation in treatment if there is an inadequate MRD response.
[...] localised neuroblastoma and found no significant difference in overall survival of patients with MRD compared to those without detectable MRD in bone marrow. In patients with metastatic disease there was no difference in overall survival by bone marrow disease detected by MRD using either immunocytology or PCR techniques. The Children's Oncology Group (COG) also showed no difference in overall survival for patients with localised disease that had bone marrow involvement detected by immunocytology alone at diagnosis (Seeger et al., 2000). In the same study, COG demonstrated a clear correlation between increasing tumour burden in bone marrow and poor event free survival in those patients with stage M disease, but no difference in survival if bone marrow infiltration was only detected by immunohistochemistry and not by cytology (Seeger et al., 2000). Conversely, others have shown a poorer prognosis in those patients with neuroblastoma detected by flow cytometry but negative by immunophenotyping (Popov et al., 2019) and poor overall survival in those with neuroblastoma detectable by RTqPCR after induction therapy (Druy et al., 2018). These studies have been limited by the small number of analysed samples. Flow cytometry does have an advantage over immunocytology as it helps identify cases that have lost GD2 expression. This is increasingly important as future treatment concentrates on targeting GD2 expression, either through GD2-antibodies or experimentally through GD2-targeting CAR T-cells (Schumacher-Kuckelkorn et al., 2016).
In this study, our aim was to compare flow cytometry with the combination of histological and immunohistological assessment of trephines and cytological review of bone marrow aspirates, to determine if there is a difference in detection of positive results between the various methods and if flow cytometry can provide any additional information.

Method
The study was performed as an internal evaluation of bone marrow results by flow cytometry in neuroblastoma, which had been introduced as a standard additional technique at Great Ormond Street Hospital, London in 2015. Samples from consecutive patients diagnosed with neuroblastoma at our institution between June 2015 and March 2018 were evaluated. Samples taken at any time point of treatment/surveillance were included in the review. At each time point, samples for cytology of aspirate, flow cytometry of aspirate, and histology/immunocytology of trephine biopsy were taken from the left and/or right side, and were then grouped by side of collection. Bone marrow aspirate and trephine reports issued as part of routine care were reviewed, which included morphological and flow cytometric assessment of aspirates, and morphology plus immunohistochemical staining of trephine biopsies. Flow cytometry was performed 12-60 hours after collection of bone marrow aspirates. Neuroblastoma cells were identified by live/dead gating followed by identification of CD45−/Lin neg/CD56+/GD2+ stained populations.
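The gating logic described above can be illustrated with a minimal sketch. It is not the laboratory's analysis pipeline: the `Event` fields, the single shared cut-off `THRESHOLD`, and the choice of live events as the denominator are all assumptions made for illustration, and real gates are drawn per marker against controls.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One acquired flow cytometry event (hypothetical field names)."""
    live: bool    # passed live/dead gating
    cd45: float   # fluorescence intensity, arbitrary units
    lin: float
    cd56: float
    gd2: float


# Hypothetical positivity cut-off; in practice each marker has its own gate.
THRESHOLD = 1000.0


def is_neuroblastoma(e: Event, threshold: float = THRESHOLD) -> bool:
    """True for a live CD45-negative, Lin-negative, CD56+, GD2+ event."""
    return (
        e.live
        and e.cd45 < threshold
        and e.lin < threshold
        and e.cd56 >= threshold
        and e.gd2 >= threshold
    )


def percent_neuroblastoma(events, threshold: float = THRESHOLD) -> float:
    """Gated neuroblastoma events as a percentage of live events (assumed denominator)."""
    live = [e for e in events if e.live]
    if not live:
        return 0.0
    hits = sum(is_neuroblastoma(e, threshold) for e in live)
    return 100.0 * hits / len(live)
```

The percentage returned here corresponds to the kind of numerical value quoted in the Results (for example 0.008% to 2.37%), under the stated assumption about the denominator.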
For final analysis, any patients with missing data for flow, aspirate or trephine analyses were excluded (Figure 1). Any differences between the results of the trephine histology/immunohistochemistry, aspirate morphology and flow cytometry were recorded. Significance testing was performed using an unpaired t-test with Welch's correction, with a p-value ≤ 0.05 considered significant.
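For readers unfamiliar with Welch's correction, the following sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom for two unpaired groups. The function name `welch_t` is hypothetical, and the study's own analysis was run in Prism rather than with this code.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unpaired t-test with Welch's correction.

    Unlike Student's t-test, the two groups are not assumed to share
    a common variance.
    """
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

The p-value is then read from a t-distribution with `df` degrees of freedom; in Python, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.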
Patients with high-risk neuroblastoma treated on the European HR-NBL1/SIOPEN trial (ClinicalTrials.gov registration number: NCT01704716) (Viprey et al., 2014) also had bone marrow aspirates collected for RNA testing, performed by RTqPCR. The results from RTqPCR and flow cytometry analyses were compared in order to establish whether there are any correlations between the two assays. RNA was extracted and RTqPCR for the neuroblastoma mRNAs paired like homeobox 2B (PHOX2B) and tyrosine hydroxylase (TH) was performed according to standard operating procedures (Viprey et al., 2007, 2014). For statistical analysis, the Log2 delta Ct values from the RTqPCR were converted to linear values for correlation with flow values by Pearson coefficient, and correlation of flow with aspirate morphology or trephine immunohistochemistry was performed using Welch's t-test. Statistical analysis was performed using Prism software version 9.
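The conversion and correlation step can be sketched as follows. The text states only that Log2 delta Ct values were converted to linear values; the sign convention used here, relative expression = 2^(-ΔCt), is an assumption, and both function names are hypothetical.

```python
from math import sqrt


def delta_ct_to_linear(delta_ct: float) -> float:
    """Convert a log2-scale delta Ct to a linear value.

    Assumes the common convention 2 ** (-delta_ct); the source does not
    state which sign convention was used.
    """
    return 2.0 ** (-delta_ct)


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In use, each matched sample would contribute one linearised RTqPCR value and one flow cytometry percentage, and `pearson_r` would be applied to the two resulting lists.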

Results
A total of 392 bone marrow samples from 72 patients were analysed. Complete bone marrow, trephine and flow cytometry data was available for 302 samples (Figure 1). RTqPCR results were available for 26 samples from 15 patients. A total of 15 samples from eight patients had both flow cytometry and RTqPCR data available (see Underlying data).
Correlation between cytology, histology, and flow cytometry
There was concordance in a negative result across all three modalities for 216/302 samples and a concordance of 38/86 for positive results across all three modalities (Figure 2A), with a further 14/86 (16%) samples positive by flow cytometry and either cytology of aspirates or histology of trephine. Of the 86 samples that were positive by at least one test, 14/86 (16%) were positive by flow cytometry alone. Taken together, trephine and aspirate morphology detected 20/86 (23%) positives that were negative by flow cytometry (trephine only n = 11, cytology of aspirates only n = 3, both trephine and cytology of aspirates n = 6) (Figure 2A).
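The counts in this paragraph can be cross-checked with a short tally. The dictionary keys are labels chosen for this sketch; the numbers are those reported above. Note that 52/86 evaluates to roughly 60.5%, which the text reports as 61%.

```python
# Sample counts by pattern of positivity, as reported in the text.
counts = {
    "all_negative": 216,
    "positive_all_three": 38,
    "flow_plus_one_morphological": 14,  # flow + cytology or trephine, not both
    "flow_only": 14,
    "trephine_only": 11,
    "cytology_only": 3,
    "trephine_and_cytology_only": 6,
}

total = sum(counts.values())
positives = total - counts["all_negative"]

# Positive by flow cytometry and by at least one morphological method.
flow_and_morphology = (
    counts["positive_all_three"] + counts["flow_plus_one_morphological"]
)
# Positive by cytology and/or trephine but negative by flow cytometry.
morphology_only = (
    counts["trephine_only"]
    + counts["cytology_only"]
    + counts["trephine_and_cytology_only"]
)


def pct(numerator: int, denominator: int) -> float:
    """Percentage helper."""
    return 100.0 * numerator / denominator


for label, n, d in [
    ("negative concordance", counts["all_negative"], total),
    ("flow and morphology positive", flow_and_morphology, positives),
    ("morphology positive, flow negative", morphology_only, positives),
    ("flow positive only", counts["flow_only"], positives),
]:
    print(f"{label}: {n}/{d} = {pct(n, d):.1f}%")
```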
Flow cytometry provides the additional benefit of allowing enumeration of the neuroblastoma cells within the bone marrow sample by calculating the positively gated events and negative gated events. We performed an absolute numerical comparison of flow cytometry results against the binary trephine and aspirate results (Figure 2B and 2C) to determine if numerical flow cytometry results correlate with the aspirate morphology or trephine categorisation. Bone marrow samples that were positive by analysis of trephines were significantly more likely to be positive than negative on flow cytometry (p = 0.0027) and the same was true for samples positive for cytology (p = 0.0056), suggesting a good concordance between these modalities. When comparing trephine and flow cytometry, 18 samples were positive by flow cytometry but not positive on trephine histology. These samples had a percentage detection range of 0.0130% to 5.3% (Figure 2B). Similarly, when comparing flow cytometry and cytology, there were 24 samples positive by flow cytometry, which were negative by cytology (Figure 2C). These samples had a percentage detection range from 0.0041% to 3.75%. Therefore, flow cytometry of bone marrow aspirates detects low-level disease not reported after analysis of trephines or cytology of bone marrow aspirates.
Figure 3. Correlation between RTqPCR and flow cytometry. A) PHOX2B: R² co-efficient 0.8090 (p-value < 0.0001) and B) TH: R² co-efficient 0.8697 (p-value < 0.0001). Where PHOX2B is homeobox 2B, and TH is tyrosine hydroxylase.
All comparisons involved evaluation of separate aspirate or trephine from left and right iliac crests respectively. We found low levels of intrapatient discordance between results from the two sites for both morphology (8 discordant results from 163 sampling episodes = 5%) and trephine (10/163 = 6%). Interestingly only one sampling event showed discordance in both aspirate and trephine.
Patient disease course of flow cytometry-only positive samples
A total of 14 samples from nine patients were positive solely on flow cytometry. These patients represent potential cases where flow cytometry may be useful for detecting bone marrow disease below the combined threshold of cytology and trephines. All nine of these patients were diagnosed as high-risk (Table 1). The level of disease detected by flow cytometry was low, ranging from 0.008% to 2.37%. Only two patients (patients 5 and 7) had no radiological evidence of metastatic skeletal disease at the time of bone marrow sampling. Patient 5 had radiologically localised disease and had a bone marrow sample taken at diagnosis. This patient was treated as high-risk due to having a MYCN-amplified tumour and is now 42 months post diagnosis with no evidence of progression or relapse. Patient 7 had stage M high-risk neuroblastoma with positive bone marrow morphology at diagnosis, which became negative following induction. Following high dose chemotherapy with busulfan and melphalan, the left sided aspirate was positive on flow cytometry but negative on morphology. Subsequent analyses were all negative. This patient is now 38 months post diagnosis with no evidence of relapse. Thus, the current study adds to the existing understanding that in patients with no bone marrow metastases detected by conventional techniques, low levels of disease in the form of mRNA, DNA or neuroblastoma cells can be detected. There is no evidence that this low-level disease can alter outcomes, and the clinical follow up of these cases in the current study does not provide any support for altering staging or treatment in such patients.

Correlation between flow cytometry and RTqPCR
To further evaluate the results of flow cytometry, we compared 15 samples from eight patients who had corresponding RTqPCR performed for mRNA using PHOX2B and TH markers. We performed simple linear regression modelling on RTqPCR and flow cytometry data for matched samples (Figure 3). For PHOX2B the R² co-efficient was 0.8090 (p-value < 0.0001) and for TH the R² co-efficient was 0.8697 (p-value < 0.0001). This excellent correlation between RTqPCR and flow cytometry further validates the flow cytometry results.
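As an illustration of how an R² co-efficient arises from simple linear regression, the sketch below fits a least-squares line and reports the fraction of variance explained. The function name `linear_fit_r2` is hypothetical, the study's analysis was run in Prism, and no study data are reproduced here.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is
    1 - SS_residual / SS_total.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

For a single predictor, R² equals the square of the Pearson coefficient, so an R² of about 0.81 corresponds to a correlation coefficient of magnitude about 0.90.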

Discussion and conclusion
In comparing the results of flow cytometry, histology and cytology of aspirates, our investigations show good concordance across all three modalities for negative samples (72%). Taking positivity for trephine and/or cytology of aspirates together, there was also good concordance for positive results, 52/86 (61%), though both flow cytometry (23%) and the combination of histology/cytology (16%) missed samples that were positive by the other modality. Flow cytometry is a routine test in diagnostic laboratories, although it does require the development of expertise for analysis of results. Our results show some discordance between cytology, histology and flow cytometry. This discordance could be related to sampling differences, as different samples may be taken for analysis by various parts of diagnostic laboratories. Further, neuroblastoma cells have a propensity to aggregate. During flow cytometry analysis, clots are removed and samples filtered, which may lead to removal of some neuroblastoma aggregates. Bone marrow aspirates and trephine samples are not disaggregated, which may account for some disparity in results. Further, an element of subjectivity is present in the histological/cytological analysis of bone marrow trephines and aspirates, whereas flow cytometry provides an unequivocal characterisation of individual neuroblastoma cells.
Flow cytometry may be particularly useful for defining disease in patients who do not have adequate trephine biopsies or cells available for review on aspirates. It could serve as an additional quick and cost-effective tool for detection of low-level disease in patients with neuroblastoma. However, the presence of 20/86 samples that were positive by either cytology or histology but had no detectable neuroblastoma by flow cytometry, whilst possibly attributable to sampling differences, highlights the importance of expert haematological and histopathological analysis of samples from these children. The clinical significance of low-level disease in neuroblastoma, detected using different methods, continues to be explored globally and remains to be established.

Ethics statement
The evaluation of results from bone marrow flow cytometry was a routine retrospective evaluation of standard of care procedures and not a formal research study. As such it did not require ethics committee approval. Consent for marrow aspirates and standard of care analysis was obtained from all patients using standard hospital consent procedures.

This is an interesting article that describes an analysis of bone marrow flow cytometry (FC) to identify bone marrow (BM) minimal residual disease (MRD) in consecutive patients diagnosed with neuroblastoma (NB) from a single centre, Great Ormond Street Hospital (GOSH). This is a laboratory-based comparative study focused on the comparison of NB BM MRD with morphology on BM aspirate, BM trephine and in a small number of cases to correlate with qRT-PCR for NB specific transcripts.

MRD has an established role in the management of acute lymphoblastic leukaemia (ALL) and is routinely used in ALL clinical management. MRD in combination with clinical and genetic factors remains one of the most important clinical tools to guide therapy and risk stratification in both newly diagnosed and relapsed ALL 1,2,3,4,5,6,7. The two major techniques used to measure MRD in ALL are flow cytometry or quantitative PCR 8,9,10. The European Scientific foundation for Laboratory Hemato Oncology (ESLHO, https://eslho.org/about/) has established consortia to promote the innovation, standardisation, quality control and education of laboratory diagnostics including flow MRD (EuroFlow) 10 and quantitative PCR MRD (EuroMRD) 8,9. The International Neuroblastoma Risk Group (INRG) Task Force developed standard recommendations for the detection of residual neuroblastoma using immunocytology on BM cytospins or smears and/or qRT-PCR on BM 11. The INRG Task Force felt that the volume of bone marrow for analysis postchemotherapy by flow cytometry may limit the sensitivity to detect residual NB to approximately 1 NB per 10^4 BM cells 11, however, this is a level of sensitivity similar to what can be achieved using FC MRD in ALL MRD 10. However, the use of FC in the assessment of residual NB in BM remains clinically underdeveloped and it is a technology that may be easily and widely adapted to assessing treatment response in HR-NB.
Currently, MRD and liquid biopsy are at an earlier stage of development in paediatric solid tumours and NB compared to ALL. The role of MRD in solid tumours continues to be explored within clinical research and/or clinical trials but is not currently used to modify risk classification or treatment. There are many evolving technologies that have the capacity to detect and quantify either residual tumour cells (e.g. immunocytology, flow cytometry, qRT-PCR for genes expressed in NB cells 12, DNA based assays to detect NB cells 13,14,15) or products secreted by tumour cells (e.g. cell-free tumour DNA 16,17, exosomal miRNA 18,19) which could be used to assess NB treatment response. So a challenge in the field is to clearly identify the methods and techniques of MRD/liquid biopsy which are prognostic and/or predictive in NB and which can be implemented in the clinical setting.
The strengths of the current study are that it describes using FC to measure BM MRD in a consecutive cohort of 72 children with NB treated at Great Ormond Street Hospital. 392 BM samples were collected from these patients at various time points during their treatment. The study team has compared FC with immunocytology (BM aspirate) and histology (BM trephine). There were a limited number of patients where qRT-PCR analysis was also available. In terms of the patient cohort, there is very limited detail on the clinical characteristics of the cohort, treatment or exact timing of the BM sample in relation to treatment received and clinical outcome.
The only clinical information that is presented is for a subset of 9 patients with high-risk neuroblastoma (HR-NB) with FC positive marrows. In terms of FC methodology, the timing of FC varied from 12-60 hours post collection of the BM sample. The major experimental aim was a comparison of results between the different methodologies, so there is a focus on the
Overall, the data shows that there is promise in using FC to assess NB MRD. As noted by the study team, FC is widely available and routinely used to assess haematologic malignancy. Undertaking these studies within the context of routine clinical practice is important as ultimately it can facilitate transfer into clinical practice. The EuroFlow, EuroMRD, and INRG consortia highlight the importance of the standardisation of sample collection, processing, workflow, analysis and reporting of samples 8,9,10. In this study, it's not clear whether the time between collection and running the FC to assess NB MRD, between 12-60 hours, may have had an impact on the results. Specifically, are there differences in samples run at 12 versus 60 hours?
In the GOSH cohort, it's not clear how the FC assay performs with different input BM cell numbers and whether there were adequate cell numbers for each sample at each time point for the analysis. Assay performance and correlation are likely to be related to the amount of BM sample available for analysis and the levels of residual disease present within the sample. In ALL FC MRD, setting a threshold for the minimum number of BM cells to be analysed by FC is a key feature to permit sensitive detection of rare ALL cells after starting treatment 10 . The performance of the FC, IC and qRT-PCR assays can be directly compared using spike-in experiments with NB cells and BM MNC, which, although not without limitations, allows an estimation of assay performance under different controlled conditions (e.g. BM cell number and dilution of NB cells).
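The dependence of sensitivity on input cell number can be made concrete with a rough sketch. It assumes that tumour events follow a Poisson distribution and that a sample is called positive once at least `min_events` gated events are seen; both the model and the function name are illustrative assumptions, not part of the study or of any consortium guideline.

```python
from math import exp, factorial


def prob_at_least(min_events: int, n_cells: int, frequency: float) -> float:
    """Probability of acquiring at least `min_events` tumour events.

    Assumes a Poisson model with mean n_cells * frequency, where
    `frequency` is the true fraction of tumour cells in the sample.
    """
    lam = n_cells * frequency
    below = sum(exp(-lam) * lam ** i / factorial(i) for i in range(min_events))
    return 1.0 - below
```

Under these assumptions, acquiring 10^4 cells from a sample containing 1 NB cell per 10^4 BM cells yields at least one tumour event in only about 63% of runs, which illustrates why a minimum acquired cell number matters for detecting rare cells.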
As discussed, it's known that solid tumours and NB form clumps in the BM and it's possible that a false negative FC result may occur from NB clumps being filtered out of the sample prior to the FC analysis (identified by the study team as a limitation) or as a consequence of NB being a "patchy" disease within the BM, so there may have been minimal or no NB at the site sampled by BMA, yet there may be clear evidence of BM disease at other sites on functional scanning (e.g. MIBG or FDG-PET CT scan). It would be helpful to understand how each of the marrow assessment methodologies compares with the measured INRC metastatic response 20 in the cohort.
Whether there is additional value arising from FC NB MRD can't really be assessed in this cohort due to the limited amount of clinical and laboratory data. It's likely that the potential impact of NB MRD measured by any technique will be strongly influenced by the underlying genetics and risk classification of the individual patient. It's known in ALL, MRD cutoffs are strongly influenced by the underlying ALL biology and genetics 21.
Although GOSH had introduced BM FC as a routine standardised assessment from 2015 onwards, the underlying objective(s) of assessing NB MRD by BM FC isn't clearly stated in the submission. In managing ALL, assessing treatment response by MRD forms an integral component of the final ALL risk classification which occurs at the end of induction chemotherapy (i.e. after treatment has commenced). In ALL, rapid clearance of ALL and MRD negativity at the end of induction chemotherapy identifies patients with improved survival, and conversely slow ALL response with persisting high MRD levels at end induction identifies patients with decreased survival. In NB, risk classification is completed at diagnosis, prior to treatment initiation and is not modified by treatment response 22. Therefore, a potential role for NB MRD (measured by any methodology) to modify INRG risk stratification may not be likely. However, the clinical assessment of NB treatment response is critical and determines post-induction treatment allocation. So analogous to treatment response measurement in ALL, the tempo and completeness of treatment response in HR-NB is important and prognostic. HR-NB patients with inadequately responding disease have a poor chance of survival. One of the major clinical issues facing clinicians managing HR-NB is the early identification of patients with (i) inadequately responding disease or (ii) those who ultimately have a high risk of disease recurrence and poor survival.
So standardised response assessment and MRD assessment at earlier time points in HR-NB induction chemotherapy might provide an opportunity to stratify patients to different treatments before the end of induction chemotherapy. Conversely, the role of MRD at later time points in treatment will be different, e.g. the early detection of relapsing disease in patients with a prior good response to treatment. So, in this context, whilst it's important to assess the concordance of different laboratory assays which measure the same variable, it remains difficult to define the clinical significance of NB BM FC in this heterogeneous cohort with samples taken at multiple time points and where the linked clinical data isn't available. The value of this dataset will come into its own when it is more comprehensively linked and analysed with matching clinical and response information, specifically with clinically used and/or validated endpoints, such as INRC disease response, progression-free survival and overall survival.
The data presented here represent an incremental step in using NB FC for MRD assessment. However, to take the field of liquid biopsy/MRD assessment forward in neuroblastoma, the ideal will be to prospectively define clear objectives for MRD assessment at different treatment time points (e.g. early or mid-induction versus later in therapy), collect appropriate samples at standardised time points, and directly compare multiple liquid biopsy/MRD methodologies against each other and also with the use of clinically validated endpoints.
In the methods, the authors describe that 8 patients had BM aspirates collected for RNA testing, and a few sentences later in the results they describe that PCR results were available for 26 samples from 15 patients. Please correct and align the actual numbers here.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Author Concordance between left and right sites. We have added this information to the first results section with the following additional text: "All comparisons involved evaluation of separate aspirate or trephine from left and right iliac crests respectively. We found low levels of intrapatient discordance between results from the two sites for both morphology (8 discordant results from 163 sampling episodes = 5%) and trephine (10/163 = 6%). Interestingly only one sampling event showed discordance in both aspirate and trephine."
Competing Interests: None