Accurate cytogenetic biodosimetry through automation of dicentric chromosome curation and metaphase cell selection

Software to automate digital pathology relies on image quality and the rates of false positive and negative objects in these images. Cytogenetic biodosimetry detects dicentric chromosomes (DCs) that arise from exposure to ionizing radiation, and determines radiation dose received from the frequency of DCs. We present image segmentation methods to rank high quality cytogenetic images and eliminate suboptimal metaphase cell data based on novel quality measures. Improvements in DC recognition increase the accuracy of dose estimates by reducing false positive (FP) DC detection. A set of chromosome morphology segmentation methods selectively filtered out false DCs, arising primarily from extended prometaphase chromosomes, sister chromatid separation and chromosome fragmentation. This reduced FPs by 55% and was highly specific to the abnormal structures (≥97.7%). Additional procedures were then developed to fully automate image review, resulting in 6 image-level filters that, when combined, selectively remove images with consistently unparsable or incorrectly segmented chromosome morphologies. Overall, these filters can eliminate half of the FPs detected by manual image review. Image selection is optimized and FP DCs are minimized by combining multiple feature-based segmentation filters and a novel image sorting procedure based on the known distribution of chromosome lengths. Applying the same segmentation filtering procedures to both calibration and test sample image data reduced the average dose estimation error from 0.4Gy to <0.2Gy, obviating the need to first manually review these images. This reliable and scalable solution enables batch processing of multiple samples of unknown dose, and meets current requirements for triage radiation biodosimetry of high quality metaphase cell preparations.


INTRODUCTION
The analysis of microscopy images of cells is the basis of several types of analysis of the effects of damage by ionizing radiation. The gold standard radiation biodosimetry method, the dicentric chromosome assay (DCA), involves measuring the frequency of aberrant dicentric chromosomes in a patient sample. While some aspects of the assay have been successfully automated and streamlined, its overall throughput remains limited by the labour-intensive dicentric (DC) scoring step, potentially affecting timely estimation of radiation exposures of multiple affected individuals, for example, in a large accident or a mass casualty event 1,2 .
One issue with automated analysis is the selection of images of adequate quality for accurate identification of the chromosome damage. With DCA, the decision to select or exclude microscope images for analysis has traditionally been made manually; yet current automated image capture approaches make this impractical due to the growing size of datasets. Image quality assessment often evaluates new data in relation to reference images 3 , complex mathematical models 4 , or distortions from a training set recognized by machine learning 5 . Generic methods of assessing image quality are not appropriate in our situation. Features tailored for ranking chromosome images cannot be generalized to entropy measures based on applying a frequency filter to intensity distributions. To be useful, quality assurance for evaluation of specific microscopic biological objects in an image may require expert-derived rules to categorize preferred images.
To address issues with automation of the DCA, we have been developing the Automated Dicentric Chromosome Identifier (ADCI) software to automate DC scoring and radiation dose estimation. The algorithms underlying ADCI have been described and experimentally validated [6][7][8][9][10][11] . Briefly, foreground objects are extracted from the metaphase cell image by thresholding intensities above background levels.
Preprocessing filters remove most (but not all) non-chromosomal objects (e.g. debris, nuclei, overlapping chromosomes). Each remaining object is regarded as a single, intact, post-replication "chromosome" object. Each chromosome is processed to determine a contour (chromosome boundary) and its centerline (chromosome long axis). The Intensity-Integrated Laplacian method 9,10 constructs a width profile from consecutive vector field tracelines running approximately orthogonal to the centerline, and potential centromere locations ("centromere candidates") are identified from constrictions in the said width profile (see Fig. 1) 12 . Machine learning (ML) modules use image segmentation features derived from each chromosome to classify centromeres and dicentric chromosomes 6,11 . The first Support Vector Machine (SVM) ranks potential centromere candidates in each chromosome according to their corresponding hyperplane distances; then another SVM scores the chromosome as either monocentric (MC) or dicentric (DC) using features derived from the top two candidates.
Samples exposed to known radiation doses (in Gy) are processed by ADCI to construct a dose-response calibration curve. The average frequency of DCs per cell in dose-calibrated samples (the radiation response) is fit to a linear-quadratic function. The response for test samples exposed to unknown radiation levels can then be analyzed with this equation to estimate their corresponding doses.
We noticed that metaphase cell images of inconsistent quality can affect the accuracy of dose estimation by ADCI. Previous studies evaluated the efficacy of ADCI at chromosome classification and dose estimation 10,11 . While the sensitivity (recall) for DCs was acceptable (~70%) and relatively constant at all radiation exposure levels, precision showed a strong dependence on dose. Chromosome misclassifications, in particular false positive dicentrics (FPs), were more prevalent at low (≤1Gy) than at high doses; at 1Gy, FPs could outnumber true positive dicentrics (TPs) by a factor of 4 to 5. Consequently, ADCI-processed samples exhibited a reduced range of accurate responses to radiation compared to manually scored samples. Although use of the same algorithm to derive the calibration curve compensates for some of these differences, reliability of dose estimation ultimately hinges on DC classification accuracy. As DCs are greatly outnumbered by MCs (background frequency in normal, unexposed individuals is one DC per 1000 cells 6 ), this study focuses on improving the distinction between TP and FP DCs without compromising recall.
FPs reflect inadequacies in interpreting certain chromosome morphologies or non-chromosomal objects. Selective targeting and removal of these instances would reduce FPs without limiting TP identification, improving overall classification accuracy. We investigated FP morphologies to identify problematic cases, and devised a set of post-processing object segmentation filters to eliminate them.
Then, to ensure consistent performance, segmentation filters were developed to remove poor quality cell images. These images are usually characterized by an absent or incomplete complement of metaphase chromosomes, interphase nuclei or micronuclei misclassified as metaphases, or sister chromatids incorrectly segmented as individual chromosomes. Each proposed filter was tested individually, and the best performing filters were integrated and tested on actual cytogenetic dosimetry data from samples exposed to various radiation doses. The effects of these filters on classification performance were evaluated on image sets from two independent biodosimetry laboratories, and their impact on dose estimation was assessed on cells obtained from an international biodosimetry exercise.
We present a hybrid approach that selects images based on a combination of optimal global image properties for scoring metaphase cells, and customized object segmentation, identification and elimination of false positive DCs. These improvements in ADCI ensure timely, reproducible, and accurate quantitative assessment of acute radiation exposure.

METHODS & MATERIALS
Cytogenetic data were obtained by biodosimetry laboratories at Health Canada (HC) and Canadian Nuclear Laboratories (CNL) according to IAEA guidelines. Blood samples were irradiated by an XRAD-320 (Precision X-ray, North Branford, CT) at Health Canada and processed at both laboratories.
Peripheral blood lymphocyte samples were cultured, fixed, and stained at each facility according to established protocols 2,12 . Metaphase images from Giemsa-stained slides were captured independently by each lab using an automated microscopy system (Metasystems). One set of metaphase images from CNL and two sets from HC (Table 1) were used for development and initial testing of the proposed algorithms.
After image processing by ADCI, called DCs were manually reviewed and the consensus scores of TPs or FPs by 3 trained individuals were determined. Calibration curves were prepared based on 6 samples of known radiation dose (Table 2). An additional 6 samples 11 were initially blinded to the actual radiation exposures as test samples (Table 3). Test samples were exposed to a range of radiation doses bounded by the doses of samples used to construct the calibration curve. The sample naming convention is the lab name followed by the sample identifier, e.g. HC1Gy signifies the 1 Gy calibration sample prepared at HC, whereas CNL-INTC03S04 represents the INTC03S04 international exercise test sample (exposed to 1.8 Gy) prepared at CNL. Data consisted either of all "metaphase" images captured by the microscopy system, or a manually curated set of 500 high quality images. Selection of raw metaphase images for inclusion in samples was done automatically at HC using the default image classifier of the Metafer slide scanning system, while CNL selected images manually according to IAEA guidelines 12 . Experts from CNL selected for images deemed analyzable by humans with respect to chromosome count, spatial distribution and morphology.

1) ADCI settings & metaphase image data
ADCI software (V1.0) 11 was used for DC detection and dose prediction, with the MC-DC SVM tuning parameter, σ , set to 1.5. ADCI libraries were initially written in MATLAB (R2014a) to develop and test the proposed DC FP filters, and were subsequently rewritten in C++ and integrated into ADCI. For development and validation of segmentation filters, three independent sets of roughly 200 images each (2 low dose, 1 high dose) were prepared from larger image sets that were originally used for validation of previous versions of ADCI (see Table 1; HC-mixed image set).

2) Morphological characterization of FPs
FPs and TPs were compared according to their respective segmentation features, including contour, width profile, centerline placement, centromere candidate placement, and total pixel area (Table 1). FPs were grouped by common distinguishing traits and assigned to one or more of the following morphological classes:

I. Sister chromatid separation:
Sister chromatid separation (SCS) of a chromosome refers to the loss of sister chromatid cohesion at the telomeres, and often along the sister chromatids, excluding the centromeres.
Due to inherent limitations of a centerline derived from contour skeletonization in chromosomes, SCS often resulted in partial or complete localization of the centerline along a single chromatid, rather than along the long axis of the full-width chromosome [8][9][10] . Complete centerline localization to chromatids of the q arm was common among acrocentric chromosomes (see Fig. 2A). This resulted in a width profile in which the displaced centerline did not accurately represent the width of the chromosome, and compromised centromere determination.

II. Chromosome fragmentation:
Sister chromatid pairs were completely dissociated in some metaphase images, resulting in incorrect labeling of each chromatid as a separate chromosome. Occasionally, segmentation fragmented images of intact non-uniform chromosomes into multiple chromosomal artifacts 6 (see Fig. 2B). Artifactual fragmentation into incomplete chromosome fragments led to unpredictable results, increasing FPs and FNs.

III. Chromosome overlap:
Poor spatial separation of chromosomes produced clusters of overlapping or touching chromosomes that were inseparable. Occasionally, the cluster was segmented as a single contiguous object (see Fig. 2C). As with chromosome fragments, analysis of these clusters produced erroneous results. FP DCs were produced from clusters comprising two underlying monocentric chromosomes, each contributing a centromere to the combined object.

IV. Noisy contour:
Poor image contrast at the chromosomal boundary produced "noisy," jagged chromosome contours contributing multiple small constrictions to the width profile (see Fig. 2D). These artifactual constrictions were incorrectly identified as multiple centromeres if their magnitudes were similar to the true centromere, leading to FP assignment.

V. Cellular debris:
Non-chromosomal objects such as nuclei and cellular debris were generally removed by preprocessing based on thresholding relative size and pixel intensity. However, aggregated cellular debris were occasionally labelled as a chromosome and naively analyzed by the software (see Fig. 2E).

VI. Machine learning error:
A "catch-all" subclass for MCs with no identifiable morphological traits and reasonable contours and centerlines (see Fig. 2F). These cases reflect deficiencies in the feature set or training data of the machine learning (ML) classifiers, rather than image segmentation errors.

3) Filtering out False Positive Objects
Quantitative filters were created and tested to delineate FP DCs. Each formula targets one or more of the morphological classes described above, and generates a unitless filter score for each object, independent of the biodosimetry reference laboratory source. For any metaphase image, {c 1 ,…,c N } denotes the set of N chromosomes within the image and c* denotes the predicted DC of interest. Each filter classifies c* as either a TP or FP by comparing its filter score against a heuristically-defined threshold that is independent of laboratory provenance. Thresholds were established empirically to maximize elimination of FPs without altering recognition of TPs. FPs generally produce lower filter scores than TPs (i.e. lower area, lower width, less oblong footprint, more asymmetrical), so FPs were selectively targeted by eliminating candidate DCs with scores below a threshold. Due to the low frequency of DCs in any given sample, minimizing the loss of TPs is paramount. For each filter, corresponding filter scores were calculated for all DCs in the HC-mixed image set (Table 1), and a heuristic threshold (to 2 significant digits; see below) was set to the minimum value observed in TPs.
Thresholds for filters VI to VIII were calculated by repeating the same procedure on a chromosome set of 244 TPs from the MC-DC SVM training set, and the final thresholds were set to the lower of each pair of values.

I. Area filter:
A(c) denotes the pixel area occupied by chromosome c (see Fig. 3B).

II. Mean width filter:
W mean (c) denotes the mean value of the width profile of chromosome c (see Fig. 3C). c* was classified as FP if W mean (c*)/median({W mean (c 1 ),…,W mean (c N )}) < 0.80 or as TP otherwise. This filter targets SCS and chromosome fragments.
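The mean width rule above can be illustrated with a minimal Python sketch. It assumes each chromosome's width profile is already available as a list of pixel widths; the function names are hypothetical and are not ADCI's API. Only the 0.80 threshold and the median normalization come from the text.

```python
from statistics import median

MEAN_WIDTH_THRESHOLD = 0.80  # threshold quoted in the text for filter II


def mean_width(profile):
    """Mean of one chromosome's width profile (pixel widths along the centerline)."""
    return sum(profile) / len(profile)


def mean_width_score(candidate_profile, image_profiles):
    """W_mean(c*) divided by the median W_mean of all chromosomes in the image."""
    image_means = [mean_width(p) for p in image_profiles]
    return mean_width(candidate_profile) / median(image_means)


def is_fp_by_mean_width(candidate_profile, image_profiles,
                        threshold=MEAN_WIDTH_THRESHOLD):
    """True if the candidate DC is flagged as a false positive."""
    return mean_width_score(candidate_profile, image_profiles) < threshold


# Toy image with four chromosomes (made-up widths, for illustration only).
profiles = [[10, 10, 10], [12, 12, 12], [8, 8, 8], [5, 5, 5]]
print(is_fp_by_mean_width(profiles[3], profiles))  # True: 5 / 9 is below 0.80
print(is_fp_by_mean_width(profiles[0], profiles))  # False: 10 / 9 is above 0.80
```

Because the score is a ratio to the within-image median, it is unitless and does not depend on magnification.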

III. Median width filter:
W med (c) denotes the median value of the width profile of chromosome c (see Fig. 3C).

VI. Oblongness filter:
S(c) denotes the pair of side lengths of the minimum bounding rectangle enclosing the contour of chromosome c (see Fig. 3D). c* was classified as FP if 1 − min(S(c*))/max(S(c*)) < 0.28 or as TP otherwise. This filter targets acrocentric chromosomes with SCS and some cases of overlapping chromosomes.
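As a small illustration of the oblongness score, the sketch below takes the two side lengths of the minimum bounding rectangle as given (computing that rectangle from a contour is outside the scope of this example). The function names are hypothetical; the 0.28 threshold is the value stated above.

```python
OBLONGNESS_THRESHOLD = 0.28  # threshold quoted in the text for filter VI


def oblongness_score(side_lengths):
    """1 - (short side / long side) of the minimum bounding rectangle.

    Returns 0 for a square footprint and approaches 1 for a very elongated one.
    """
    short_side, long_side = min(side_lengths), max(side_lengths)
    return 1.0 - short_side / long_side


def is_fp_by_oblongness(side_lengths, threshold=OBLONGNESS_THRESHOLD):
    """True if the candidate DC footprint is too close to square."""
    return oblongness_score(side_lengths) < threshold


# Made-up rectangle sizes in pixels, for illustration only.
print(is_fp_by_oblongness((30, 40)))  # True: score is 0.25
print(is_fp_by_oblongness((20, 60)))  # False: score is about 0.67
```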

VII. Contour symmetry filter:
L(c) denotes the pair of arc lengths of contour halves produced by partitioning the contour of chromosome c at its centerline endpoints (see Fig. 3E). c* was classified as FP if min(L(c*))/max(L(c*)) < 0.51 or as TP otherwise. This filter targets SCS.
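The contour symmetry score can be sketched as follows, assuming the contour is an ordered, closed list of (x, y) points and that the two centerline endpoints are supplied as indices into that list. This representation and the function names are assumptions made for illustration; ADCI's internal data structures are not described here.

```python
import math

CONTOUR_SYMMETRY_THRESHOLD = 0.51  # threshold quoted in the text for filter VII


def arc_length(points):
    """Sum of straight-line distances between consecutive points of an open path."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def contour_halves(contour, i, j):
    """Split a closed contour at point indices i and j into two open paths."""
    i, j = sorted((i, j))
    first = contour[i:j + 1]
    second = contour[j:] + contour[:i + 1]  # wraps around the end of the list
    return first, second


def contour_symmetry_score(contour, i, j):
    """min/max ratio of the two contour-half arc lengths (1.0 means symmetric)."""
    a, b = (arc_length(half) for half in contour_halves(contour, i, j))
    return min(a, b) / max(a, b)


# Unit square as a toy contour: splitting at opposite corners is symmetric,
# splitting at adjacent corners is not.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(contour_symmetry_score(square, 0, 2))  # 1.0
print(contour_symmetry_score(square, 0, 1) < CONTOUR_SYMMETRY_THRESHOLD)  # True
```

The same ratio applies to the intercandidate variant if the split points are the traceline endpoints of the top two centromere candidates instead of the centerline endpoints.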

VIII. Intercandidate contour symmetry filter:
L C (c) denotes the pair of arc lengths of the contour regions of chromosome c that run between the traceline endpoints of its top 2 centromere candidates (see Fig. 3F). c* was classified as FP if min(L C (c*))/max(L C (c*)) < 0.42 or as TP otherwise. This filter targets SCS and some cases of overlapping chromosomes.
Incorporation into existing algorithms: After chromosome processing and MC-DC SVM classification 11 but prior to dose determination, all DC chromosomes inferred by ADCI were analyzed with the proposed DC filters. DCs whose filter scores met all TP thresholds were included in the dose determination, whereas DCs classified as FPs by any filter (inclusive "or") were eliminated. DCs that were filtered out are outlined in yellow in the ADCI cell image viewer 11 (Fig. 4).
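The inclusive "or" combination can be sketched in a few lines of Python. The dictionary keys are hypothetical labels, and only the four thresholds stated explicitly in the filter definitions above are included; which filters make up the final combined set is not assumed here.

```python
# Thresholds quoted in the text for filters II, VI, VII and VIII.
# The dictionary keys are hypothetical names chosen for this sketch.
TP_THRESHOLDS = {
    "mean_width": 0.80,
    "oblongness": 0.28,
    "contour_symmetry": 0.51,
    "intercandidate_symmetry": 0.42,
}


def failed_filters(scores, thresholds=TP_THRESHOLDS):
    """Names of the filters whose score falls below the corresponding threshold."""
    return [name for name, limit in thresholds.items()
            if name in scores and scores[name] < limit]


def keep_for_dose_estimation(scores, thresholds=TP_THRESHOLDS):
    """A candidate DC is kept only if no filter flags it (inclusive 'or')."""
    return not failed_filters(scores, thresholds)


# Made-up filter scores for one candidate DC.
candidate = {"mean_width": 0.95, "oblongness": 0.20,
             "contour_symmetry": 0.70, "intercandidate_symmetry": 0.60}
print(failed_filters(candidate))            # ['oblongness']
print(keep_for_dose_estimation(candidate))  # False
```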

Determination of optimal filter subset:
The proposed filters were not completely independent of one another, as some measures were related to the same chromosome segmentation features (i.e. width for filters II-V, contour symmetry for VII-VIII) and/or targeted the same morphological subclass (notably SCS). Thus, the "optimal" filter subset (termed "FP filters") was defined as the subset of filters which maximized FP removal while minimizing redundancy among the FPs removed. Performance for a given set of filters was the total percentage of FPs removed by any of its filters (inclusive "or") in the HC-mixed image set (see Table 1). Using a forward selection approach, individual filters were added iteratively to identify those which produced the largest improvement in performance.
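A greedy forward selection of this kind can be sketched as below. The input format (each filter mapped to the set of FP identifiers it removes) and the stopping rule (stop when no remaining filter removes any additional FP) are assumptions for illustration; the text does not state the stopping criterion that was actually used.

```python
def forward_select(removed_by_filter, total_fps):
    """Greedy forward selection of filters by cumulative FP removal.

    removed_by_filter maps a filter name to the set of FP identifiers it removes.
    Returns the selected filter names in order of addition and the percentage of
    FPs removed by their union (inclusive 'or').
    """
    selected, covered = [], set()
    remaining = dict(removed_by_filter)
    while remaining:
        # Pick the filter that yields the largest union with the FPs covered so far.
        best = max(remaining, key=lambda name: len(covered | remaining[name]))
        gain = len(covered | remaining[best]) - len(covered)
        if gain == 0:  # every remaining filter is fully redundant
            break
        covered |= remaining.pop(best)
        selected.append(best)
    return selected, 100.0 * len(covered) / total_fps


# Toy data: 10 FPs in total, identified by integers (made up for illustration).
toy = {
    "area": {1, 2, 3},
    "mean_width": {2, 3},
    "oblongness": {4},
    "contour_symmetry": {3, 4, 5},
}
print(forward_select(toy, total_fps=10))
# (['area', 'contour_symmetry'], 50.0)
```

In this toy case "mean_width" and "oblongness" are dropped because every FP they remove is already removed by a selected filter.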

Evaluation of FP specificity on HC test samples: All objects removed by the FP filters in each image in HC samples INTC03S01, INTC03S08 and INTC03S10 (Table 3) were manually reviewed (Fig. 4). Filtered TPs and filtered objects with ambiguous classifications (TP or FP) were reviewed with another expert before final classification. For each sample, the number of filtered FPs was determined by subtracting the number of filtered TPs from the total filtered count, and FP specificity was defined as the ratio of the number of filtered FPs to the number of all filtered objects.
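The FP specificity defined above is a simple ratio; a minimal sketch follows. The function name and the example counts are hypothetical and are not taken from the study's tables.

```python
def fp_specificity(total_filtered, filtered_tps):
    """Fraction of filtered objects that were FPs.

    filtered FPs = total filtered objects - filtered TPs
    specificity  = filtered FPs / total filtered objects
    """
    if total_filtered <= 0:
        raise ValueError("at least one filtered object is required")
    if not 0 <= filtered_tps <= total_filtered:
        raise ValueError("filtered_tps must lie between 0 and total_filtered")
    filtered_fps = total_filtered - filtered_tps
    return filtered_fps / total_filtered


# Hypothetical counts: 40 objects removed by the filters, 1 of them a TP.
print(f"{fp_specificity(40, 1):.1%}")  # 97.5%
```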

4) Dose estimation analysis
In ADCI, a pre-computed dose-response calibration curve is used to estimate the radiation absorbed by samples with unknown exposures 11 . For a given sample, ADCI calculates the mean response as the total number of detected DCs divided by the number of cell-containing images. Calibration curves can be generated from a set of calibration samples either by processing each sample and calculating its response, or by allowing the user to input the corresponding response; the dose-response pairs are then fitted to a linear-quadratic curve by regression. Because sample preparation protocols vary between laboratories, dose estimation of test samples was performed with calibration curves generated by the same source 11 .
Distinct calibration curves were generated for each laboratory, either enabling or disabling FP filters, for the 0, 0.5, 1, 2, 3 and 4Gy calibration samples (see Table 2). Radiation doses of images obtained by HC for test samples (Table 3) were estimated using the HC calibration curve derived by ADCI after applying the same FP filters. A similar analysis was carried out for the 5 CNL test samples using the CNL calibration curve data.
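The calibration and dose estimation steps can be illustrated with the following sketch. It assumes an unweighted ordinary least-squares fit of Y = c + aD + bD² and inverts the fitted curve with the quadratic formula; ADCI's actual regression method (for example, any weighting scheme) is not specified in this section, and the coefficient values used in the example are synthetic.

```python
import math


def mean_response(dc_count, cell_image_count):
    """DCs per cell: detected DCs divided by the number of cell-containing images."""
    return dc_count / cell_image_count


def fit_linear_quadratic(doses, responses):
    """Unweighted least-squares fit of Y = c + a*D + b*D**2 (an assumption).

    Solves the 3x3 normal equations by Gaussian elimination with partial pivoting
    and returns the coefficients (c, a, b).
    """
    rows = [[1.0, d, d * d] for d in doses]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, responses))] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    p = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        p[i] = (m[i][3] - sum(m[i][j] * p[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(p)


def estimate_dose(response, c, a, b):
    """Invert Y = c + a*D + b*D**2 for the larger root D."""
    if b == 0:
        return (response - c) / a
    disc = a * a - 4.0 * b * (c - response)
    if disc < 0:
        raise ValueError("response lies below the minimum of the fitted curve")
    return (-a + math.sqrt(disc)) / (2.0 * b)


# Synthetic calibration data at the doses used in the text (0 to 4 Gy),
# generated from made-up coefficients so the fit can be checked.
cal_doses = [0, 0.5, 1, 2, 3, 4]
cal_resp = [0.001 + 0.03 * d + 0.06 * d * d for d in cal_doses]
c, a, b = fit_linear_quadratic(cal_doses, cal_resp)
test_resp = 0.001 + 0.03 * 1.8 + 0.06 * 1.8 ** 2
print(round(estimate_dose(test_resp, c, a, b), 3))  # 1.8
```

The same fitting and inversion would be run twice per laboratory, once with the FP filters enabled and once with them disabled, so that each test sample is always evaluated against a curve built under the same settings.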

5) Effect of filtering on manually selected HC image data
To investigate the impact of manual image selection on dose accuracy, we compared HC calibration curves derived from manually curated samples with the FP filters either enabled or disabled (Table 2).
Manual curation of the HC samples was similar to manual image selection performed by CNL. Images were selected according to the following requirements:
I) Complete complement of approximately 46 chromosomes, >40 segmented objects, <5 segmented objects from different nuclei if multiple nuclei present;
II) Exclusion of "harlequin" chromosomes. Cells with unevenly stained sister chromatids cultured in the presence of bromodeoxyuridine (BrdU), which is indicative of 2nd-division metaphases, were excluded 10 ;
III) Well-spread, sharply-contrasted chromosomes with minimal sister chromatid dissociation. Only images with <5 incorrectly-segmented chromosomes were included, where incorrect segmentation was defined as chromosome overlaps (indicating poor spread), fragments (indicating sister chromatid dissociation) and overly-noisy contours (indicating poor image contrast);
IV) Adequate chromatid condensation. Depending on the stage of metaphase arrest, the degree of chromosome condensation can differ 13,14 . Prometaphase cells have longer chromosomes that are less rigid, exhibit greater overlap and have less well-defined centromere constrictions, all of which pose significant challenges for automated chromosome classifiers 14,15 . Metaphase images with longer, thinner chromosomes (roughly corresponding to >500-band level 14 ) were also excluded.
Guidelines I-III and a minimum sample size of 500 cells were adopted from IAEA recommendations 12 , whereas guideline IV was added after preliminary inspection of HC calibration samples. Manual curation was performed within ADCI by retrospectively excluding images in processed samples from dose analysis (Fig. 4). For each sample, consecutive images meeting all criteria were evaluated until 500 images were accrued. DC classifications were hidden during image selection to minimize bias. After generation of the curated HC calibration curves, the radiation doses of the three HC test samples (Table 3) were re-estimated on the new curves, with and without the FP filters enabled.

6) Automating removal of suboptimal images by morphology filtering
Reference biodosimetry laboratories screen for interpretable metaphase cell images prior to DC analysis.
Manual selection of images assures consistency and reliability of metaphase data, which increases analytic accuracy. As automated DC analysis can also be affected by variable cell image quality, excluding undesirable images from a sample would be expected to reduce FPs and yield more accurate estimates of radiation exposure.
Image segmentation filters used empirically determined criteria to eliminate metaphase cells with characteristics that increased FP DCs. Image-level segmentation filters that threshold features I and II (below) were used to detect cells in prometaphase (relatively long and thin chromosome morphology), prominent sister chromatid dissociation, and highly bent and twisted chromosomes; another filter (III) denotes the threshold SD common to all 3 filters that identifies outlier images. This SD value was set heuristically to 1.5 by varying T when applying these filters to the HC2Gy calibration sample (Table 2). Similarly, suggested thresholds in filters IV-VI are derived from experience in testing multiple samples.

VI. Classified object ratio (ClassifiedRatio) filter:
This filter defines the ratio of objects recognized as chromosomes to the total number of segmented objects. It prevents images in which ADCI fails to process most chromosomes from being included. An image is removed if the value is less than a threshold of either 0.6 or 0.7, which is determined by the desired level of stringency for application of this filter.
Combining filters. Applying these filters sequentially to the same image distinguished the metaphase images suitable for dose estimation from less optimal cells with increased FPs. This was done by combining the Z-scores of the image filters in a linear expression of features I-VI that provides an assessment of image quality. The resultant total score represents the degree to which a particular image deviates from the population of images in a sample: Each feature has a positive free parameter, its weight, to adjust its contribution to the total score. Through the LW term, longer and thinner chromosomes in the image increase the score, as do bent and twisted chromosomes through the CD term. Lower chromosome concavity also drives the score higher because of the FD term. The object count and segmented object count terms reflect chromosome positioning, the degree of sister chromatid separation, etc. Assuming the majority of images in a sample are good images, these terms will result in higher scores for images exhibiting incomplete cells, multiple cells or severe sister chromatid separation. The last terms produce high scores for images that the algorithm does not process accurately. Images with smaller combined scores are of higher quality. The weights are identified by evaluating many possible weights and selecting those that minimize the error in curve calibration. The weights obtained are optimal for calibration samples and are expected to perform well on test samples, provided that the calibration and test samples have comparable chromosome morphologies. The score, however, cannot be used for inter-sample image quality comparisons, as z-scores are normalized within a sample.
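The idea of a weighted combination of within-sample z-scores can be sketched as follows. The exact expression used by ADCI is not reproduced in this text, so the sketch makes several assumptions: each feature is supplied already oriented so that larger values indicate poorer quality, z-scores use the population standard deviation, and the feature names and weights shown are hypothetical placeholders.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value relative to the other images in the same sample."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def combined_scores(features, weights):
    """Weighted sum of per-feature z-scores for every image in one sample.

    features maps a feature name to a list with one value per image;
    weights maps the same names to positive weights.
    """
    n_images = len(next(iter(features.values())))
    totals = [0.0] * n_images
    for name, values in features.items():
        for i, z in enumerate(z_scores(values)):
            totals[i] += weights[name] * z
    return totals


def top_ranked(totals, k):
    """Indices of the k images with the smallest (best) combined scores."""
    return sorted(range(len(totals)), key=lambda i: totals[i])[:k]


# Three images, two hypothetical features (values made up for illustration).
sample = {"LW": [1.0, 2.0, 3.0], "CD": [0.0, 0.0, 0.0]}
w = {"LW": 2, "CD": 1}
scores = combined_scores(sample, w)
print(top_ranked(scores, 2))  # [0, 1]
```

Because each z-score is computed within one sample, the resulting totals rank images inside that sample only, which matches the caveat above about inter-sample comparisons.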
Another, more general method was also developed to assess metaphase images separately from other images in the same sample. Image morphology is the primary consideration in assessing metaphase image (determined by the longest C group chromosome) but more than 2% (determined by the shortest C group chromosome) of total base-pairs in the set. Any chromosome in category DC contains fewer than 2% (determined by the longest D group chromosome) of the total base-pairs. These thresholds 2.9% and 2% are acceptable for the X and Y chromosomes, respectively. We apply these thresholds to object areas to count the number of chromosomes in each category in a metaphase image. An ideal metaphase image will have 10 AB chromosomes, 16  When images in a sample are sorted, by either combined z-score or by chromosome group bin area measurement, a certain number of top ranked images can then be selected for dicentric chromosome analysis. Complex image selection models can be created by filtering images first with filters and then selecting a certain number of top scoring images.

7) Sample Quality Confidence Measurement
Metaphase image artifacts such as sister chromatid separation and chromosome fragmentation interfere with the ability to correctly identify dicentric chromosomes, and compromise the reliability of dose estimates. This dependence of dose estimation accuracy on sample image quality motivates objective tests to evaluate and flag data from lower quality samples and exclude such images from analysis. Samples

Application of chromosome morphology filters to remove FPs
False positive DCs (n=97) from a low dose set of metaphase images were classified to uniquely identify, and ultimately eliminate, these objects. Chromosomal morphological subclasses (Fig. 3)  Segmentation filtering criteria were applied to these images. Scale-invariant filters were tested to determine thresholds that selectively removed subclasses I-III without eliminating any TPs. Of the 51 SCS cases, 35 involved short, acrocentric chromosomes. FPs were distinguished from TPs based on either their lower relative pixel area or width (filters I-V), substantially non-oblong footprint (filter VI), or substantial contour asymmetry across the centerline (filters VII and VIII). For filters I-V, normalization to median scores of other objects in the same image performed similarly to normalization to other measures of central tendency (e.g. z-score, mean, and mode after binning scores). FPs were eliminated for each morphological subclass (Table 4), with most of the segmentation filters acting on the targeted subclass; however, the effects of each filter were not exclusive to those subclasses (Methods 3).
To evaluate individual filter performance, the percentage of FPs removed by each filter was calculated for the HC-mixed image set (Table 5). A two-sample Kolmogorov-Smirnov test (K-S) was also performed for each filter (α=0.05) on the same data, where one sample consisted of the filter scores of all TPs (n=183) and the other sample consisted of the scores of all FPs (n=158). All 8 filters rejected the null hypothesis (  (Table 7), consisting of 2 HC image sets (HC-low and HC-high, which were used during filter development) and an independent low dose image set from CNL. On average, 55 ± 9.6% of FPs were removed among all sets; individually the filters eliminated 52% of FPs from the CNL set, which was comparable to the HC sets (66% and 48% for low and high dose sets, respectively). All TPs were retained in each of the sets after processing of FPs (i.e. 100% specificity).
Dose-response calibration curves for HC and CNL data were generated in ADCI to investigate the effect of the filters on dose estimation accuracy (Fig. 5). Dose accuracy was assessed by determining the absolute error (absolute difference between dose estimate and true physical dose). The dose estimates of 6 test samples (3 from HC, 3 from CNL) were compared with and without the combined FP filters applied (Table 8). In samples that were manually curated by CNL, accuracy was improved >2-fold by applying the 5 combined FP filters (average error decreased from 0.43Gy to 0.18Gy).
The dose accuracy in the HC samples was adversely impacted by addition of these filters (mean absolute error increased from 0.85Gy to 1.03Gy). Possible explanations were that either the filters were removing many TPs inadvertently, or FPs removed by the filters had been offsetting previously undetected DCs (false negatives) in the HC samples. All objects eliminated with these filters in the 3 HC samples were reviewed and classified as either TP or FP, and the FP specificity across the samples was determined (Table 9). Similar to earlier findings, the FP filters exhibited very high specificity for FPs (97.7-100%), indicating that the filters rarely removed TPs from the HC samples.
We hypothesized that the difference in image selection protocols was responsible for the discrepancies seen in classification performance and dose estimation accuracy between the two sources. While CNL manually selected for images deemed suitable for DCA analysis, image selection at HC was done with an automated metaphase classifier that effectively removed only images lacking metaphases (see Methods 1). Manual review of images in the HC and CNL samples confirmed noticeable differences in image quality: In concordance with findings from our previous study 1 , CNL data contained more images with well-spread, minimally-overlapping chromosomes, and fewer images with extreme SCS and chromosome fragments (complete dissociation of sister chromatids). The HC data contained a greater percentage of high-band-level (less condensed) chromosomes, characteristic of prometaphase/early-metaphase cell images. These chromosomes were the source of many unfiltered FPs, due to the lack of a strong primary constriction at the centromere.
A new set of HC calibration curves was then generated from manually curated, selected images from calibration samples (Fig. 6). Images were excluded based on IAEA criteria 17 , along with cells exhibiting long chromosomes in early prometaphase 16 (Methods 5). (Table 10). Dose estimation accuracy of the HC samples (INTC03S01, INTC03S08 and INTC03S10) was significantly improved by enabling the 5 FP segmentation filters (mean unfiltered absolute error was 0.37Gy, and was 0.15Gy with the filters; Table 10). Therefore, application of FP filters to both CNL and curated HC data led to a >2-fold reduction in the mean absolute error of the estimated dose (p = 0.024, paired two-tailed t-test).
To rank images with the combined z-score method, a weight vector with one weight for each of the 6 filters comprising the total score was first determined. Optimal weights were obtained by searching a large number of possible values among the set of HC calibration samples for those exhibiting the smallest residuals when fit to the curve. The potential weights were defined as integers in the range [1,5]. This limited the search space and eased computational complexity, but nevertheless ensured that diverse combinations of weights were evaluated. In experiments, three optimal weight vectors, namely [5,2,4,3 After images were assigned scores and sorted according to their combined z-scores (or by the chromosome group bin method; see below), the 250 top ranked images were subsequently selected to determine dicentric aberration frequency for that sample. An adequate number of top ranked images is selected to provide sufficient images to generate a reproducible DC frequency for that sample. The top ranked image set also has to effectively remove poor quality images that could distort the DC frequency.

Application of Image Selection Models
The IAEA has recommended that at least 100 DCs be detected for samples with physical doses >1 Gy. In practice, laboratories score at least 250 images, and often more. Considering that the total number of images in a sample ranges from 500 to 1500, we found that selecting the 250 top-scoring images gave satisfactory results. Figure 7 indicates that the DC frequency for the HC3Gy calibration sample stabilizes once at least 250 to 300 top-ranked images are included. Similar results were obtained for other test and calibration samples (not shown). DC frequencies can differ between image selection methods because each method can select different images. When the number of top-ranked images significantly exceeds 300, differences between the specific image selection methods are minimized, as they share increasing numbers of selected images. Unfiltered, randomly sampled images from this sample tend to exhibit higher overall DC frequencies due to increased numbers of FP DCs.
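The stabilization of DC frequency as more top-ranked images are included can be examined with a running average. The sketch below assumes a list of DC counts per image, already sorted from best to worst rank; the `window` and `tolerance` values are illustrative choices, not parameters taken from ADCI.

```python
from itertools import accumulate


def running_dc_frequency(dc_counts):
    """DC frequency (DCs per cell) after including the top 1, 2, ..., N images.

    `dc_counts` lists the number of DCs detected in each image, best rank first.
    """
    return [total / n for n, total in enumerate(accumulate(dc_counts), start=1)]


def first_stable_n(frequencies, window=50, tolerance=0.01):
    """Smallest N at which the next `window` running frequencies, starting
    with the N-th, span a range of at most `tolerance`.

    Returns None if no such stretch exists. Both defaults are illustrative.
    """
    for start in range(len(frequencies) - window + 1):
        chunk = frequencies[start:start + window]
        if max(chunk) - min(chunk) <= tolerance:
            return start + 1
    return None
```

Plotting `running_dc_frequency` against the number of included images gives a curve of the kind described for Figure 7.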
The deviations of estimated doses of all of the HC and CNL test samples, respectively, from physical doses, were determined for various ADCI image selection models (Tables 10 and 11 sample was relatively lower quality than others in this set, and the unfiltered set of metaphase images was smaller than the recommended minimum, consisting of 477 cells (Table 12).

Sample Quality Assessment after Image Selection
To evaluate whether the image selection models improved sample quality, a chi-squared goodness-of-fit test was performed on the observed DC/cell vs. Poisson distributions for the CNL and HC samples, both prior to and after automated and manual image selection (Table 12). Manual image selection for CNL samples was performed by CNL during sample preparation, while image selection for HC samples was performed on unselected datasets (see Methods 5; samples HC-INTC03S01, HC-INTC03S08 and HC-INTC03S10 were analyzed, despite <500 images being available). For each laboratory, the best performing image selection models were used for FP and image-level filtering (Tables 10 and 11). Image selection with filters I-III and the chromosome group bin method was applied to HC sample data, whereas filters I-VI were applied to the CNL data. At the 1% significance level (i.e. Poisson goodness-of-fit, p
Scale invariance is an obligate property for any object-level filter, since chromosome structures may vary between cells, individuals, and laboratory preparations. Scale invariance is also necessary to control for pixel-based chromosome measurements affected by condensation differences over the course of metaphase and by differences in optical magnification. This principle was achieved either by normalizing filter scores to the median "raw" score of all objects within the same cell image (i.e. filters I-V), or by deriving scores from ratios of two pixel-based measurements (i.e. filters VI-VIII).
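The two routes to scale invariance can be illustrated with a short sketch. The raw scores and measurements used here are hypothetical stand-ins for the actual filter features, which are not specified in this section. The point is only that a uniform rescaling of all pixel-based values in an image leaves both kinds of score unchanged.

```python
from statistics import median


def median_normalized(raw_scores):
    """Divide each object's raw score by the median raw score of its image."""
    med = median(raw_scores)
    if med == 0:
        raise ValueError("median raw score is zero; cannot normalize")
    return [score / med for score in raw_scores]


def ratio_score(measure_a, measure_b):
    """Score defined as the ratio of two pixel-based measurements of one object."""
    return measure_a / measure_b


# Hypothetical raw scores for three objects, then the same image at twice the scale.
image = [2.0, 4.0, 8.0]
rescaled = [2 * s for s in image]
same_normalized = median_normalized(image) == median_normalized(rescaled)
same_ratio = ratio_score(3.0, 12.0) == ratio_score(6.0, 24.0)
```

Both `same_normalized` and `same_ratio` evaluate to `True`, since the scale factor cancels in each case.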
Limitations of the current set of filters were revealed by differences in dose estimation accuracy between the manually and automatically selected images. For the previously manually curated CNL and HC samples, the FP object filters reduced the average dose estimation error from 0.4Gy to <0.2Gy (with a maximum error of 0.4Gy). This placed the accuracy of our software comfortably within the ±0.5Gy requirement for triage purposes 17 . However, applying the FP object filters alone to unselected HC metaphase data did not improve accuracy (average error increased by 0.15Gy). Thus, FP object filters alone did not correct for inaccurate dose response estimates in all cases.
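The error measures used in this comparison are simple to compute. The sketch below assumes paired lists of estimated and physical doses in Gy; the function names are illustrative and the ±0.5Gy limit is the triage requirement cited above.

```python
TRIAGE_LIMIT_GY = 0.5  # triage accuracy requirement cited in the text


def absolute_errors(estimated, physical):
    """Absolute dose estimation error (Gy) for each sample."""
    if len(estimated) != len(physical):
        raise ValueError("estimated and physical doses must be paired")
    return [abs(e - p) for e, p in zip(estimated, physical)]


def mean_absolute_error(estimated, physical):
    """Average absolute dose estimation error (Gy) across samples."""
    errors = absolute_errors(estimated, physical)
    return sum(errors) / len(errors)


def all_within_triage(estimated, physical, limit=TRIAGE_LIMIT_GY):
    """True if every sample's estimate lies within +/- limit of its physical dose."""
    return max(absolute_errors(estimated, physical)) <= limit
```

Reporting the maximum error alongside the mean, as done above, shows whether any single sample falls outside the triage limit even when the average is acceptable.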
Variable cell image quality in some samples contributed to this source of error. Some unselected HC samples contained images with high levels of SCS, which upon processing produced large numbers of incorrectly classified chromosome fragments. Image level filters I-V targeted these fragments; however, they were not excluded based on their threshold values, because they comprised the predominant morphology within these particular cells. For similar reasons, object-level filtering was not suitable for removal of prometaphase images containing high resolution chromosomes (>800 band level). These observations suggested the need for image-level filters, in addition to the object-level filters, to select low quality images for removal.
Image quality is critical to accurate DC detection. Manual inspection and quality control are common practice in cytogenetics and biodosimetry laboratories, but are labor-intensive. Image-level filtering was automated to address this problem. These methods apply statistical thresholds to morphological features of chromosomes and non-chromosomal objects throughout a metaphase cell image. Image scoring methods select a defined number of top-ranked, processed images for dose estimation. The combined z-score method uses a weighted sum of the standard deviations below or above the mean score of objects in an image for each of the filters, and indicates relative image quality. The chromosome group bin method is a more general criterion that is calibrated to relative chromosome lengths (and area) in base pairs. ADCI evaluates the morphological deviation of chromosome area and ranks cell images relative to that expected from the standard, normal karyotype. These FP filtering and image scoring methods, which are referred to collectively as image selection models, can be applied either individually or in combination within ADCI.
Table 2). The CNL curves consistently show a more pronounced quadratic component than the HC curves, which exhibit a nearly linear response. After applying FP filters (cyan), the curves show a diminished dose-response (green), due to elimination of some detected FP DCs.
Figure 6. Original vs. manually curated calibration curves for HC samples. The dose-response calibration curves for HC sample data, with and without FP filters applied, before and after curation. Response (mean DC frequency) is on the vertical axis and the corresponding radiation dose (Gy) is on the horizontal axis. The green curve is not curated and includes all images, the cyan curve is not curated and applies FP DC filters, the red curve is curated but unfiltered, and the blue curve is curated with FP filters applied.
Uncurated curves were generated from 0, 0.5, 1, 2, 3 and 4Gy calibration image data (see Table 2). Curated curves were generated from the same data, except that 0.5Gy was not included, after lower quality images were manually removed (see Methods 6). After manual curation, the curves show a stronger quadratic component, similar to the CNL curves (see Fig. 5).
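A dose-response curve with linear and quadratic components is conventionally written as Y = c + αD + βD², where Y is the DC frequency and D the dose. Given a fitted curve, a dose estimate is obtained by solving for D. The sketch below assumes this standard linear-quadratic form with α > 0 and β ≥ 0; the coefficients in the example are illustrative and are not the fitted values for the CNL or HC curves.

```python
import math


def estimate_dose(dc_frequency, c, alpha, beta):
    """Solve Y = c + alpha*D + beta*D**2 for the non-negative dose D (Gy).

    Assumes alpha > 0 and beta >= 0. Returns 0.0 when the observed frequency
    does not exceed the background term c.
    """
    excess = dc_frequency - c
    if excess <= 0:
        return 0.0
    if beta == 0:
        # Purely linear response, as in a nearly linear calibration curve.
        return excess / alpha
    discriminant = alpha ** 2 + 4 * beta * excess
    return (-alpha + math.sqrt(discriminant)) / (2 * beta)


# Illustrative coefficients only: Y = 0.02*D + 0.06*D**2 gives Y = 0.6 at D = 3 Gy.
example_dose = estimate_dose(0.6, c=0.0, alpha=0.02, beta=0.06)
```

Because FP DCs inflate Y, removing them from both the calibration and the test samples shifts the curve and the observed frequency together, which is why the same filters are applied to both.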