Reliability and reproducibility of spectral and time domain optical coherence tomography images before and after correction for patients with age-related macular degeneration

Purpose: To evaluate the reproducibility and reliability of optical coherence tomography scans obtained using the time domain (TD-OCT) Stratus™ OCT and the spectral domain (SD-OCT) Spectralis™ and Cirrus™ OCT devices before and after manual correction in eyes with either neovascular (NV-AMD) or non-neovascular (NNV-AMD) age-related macular degeneration. Design: Prospective observational study. Methods: Setting: University-based retina practice. Patients: Thirty-six patients (50 eyes) with NV-AMD or NNV-AMD. Procedure: OCT scans were taken during the same session using one TD-OCT and two SD-OCT devices. Main Outcome Measures: Macular thickness measurements were assessed before and after manual correction of algorithm errors by constructing Bland-Altman plots for agreement and by calculating intraclass correlation coefficients (ICCs) and coefficients of repeatability (CORs) to evaluate intra-session repeatability. Results: Spectralis™ had the highest number of images needing manual correction. All machines had high ICCs, with Spectralis™ having the highest. Bland-Altman plots indicated low agreement between Cirrus™ and Stratus™ and between Spectralis™ and Stratus™, but good agreement between Cirrus™ and Spectralis™. The CORs were lowest for Spectralis™; they were higher, and similar to each other, for Cirrus™ and Stratus™. Agreement, CORs, and ICCs generally improved after manual correction, but only minimally. Conclusion: Agreement is low between devices, except between the two SD-OCT machines. Manual correction tends to improve results.


Introduction
Optical coherence tomography (OCT) is a non-invasive imaging modality that allows acquisition of cross-sectional images of the retina. OCT is useful in monitoring and evaluating retinal thickness in many retinal disorders. One example is age-related macular degeneration (AMD), a progressive, blinding disease that is mostly non-neovascular (NNV-AMD) but can be associated with choroidal neovascularization (NV-AMD). Currently, OCT is also being employed as an outcome measure in many multicenter clinical trials of AMD, with time domain OCT (TD-OCT) devices being the most commonly used 1,2.
As this technology is increasingly being used by ophthalmologists to evaluate and monitor patients and to guide treatment decisions 2, it is important to understand the reliability and accuracy of thickness measurements obtained with the various devices currently available. Recent studies have shown that in patients with AMD, and specifically in NV-AMD, automated retinal thickness measurements from TD-OCT machines have a high frequency of errors due to incorrect segmentation of the retina 2,3. Using a spectral domain OCT (SD-OCT) device, Menke et al. found that NNV-AMD had fewer errors than NV-AMD, mostly because the pathology of the disease results in changes to the retinal pigment epithelial (RPE) layer 4.
Manual correction of algorithm errors is an option in newer generations of the review software. As more OCT devices come to market, it is important to understand the clinical importance of manual correction and the agreement of thickness measurements from different machines before and after correction. In our study, we evaluated the intra-session repeatability and agreement of retinal thickness measurements in patients with NV-AMD and NNV-AMD before and after manual correction using three different OCT devices: the Stratus™ TD-OCT and two SD-OCTs, the Spectralis™ and the Cirrus™.

Methods
Institutional Review Board (IRB)/Ethics Committee approval was obtained and HIPAA guidelines were followed for the study.
Informed consent was obtained from study subjects.

Patients and scanning
Patients with confirmed diagnosis of AMD were enrolled in the study. Two senior retina specialists (QDN and DVD) made the diagnosis of AMD. Patients under treatment with intravitreal injections of anti-vascular endothelial growth factor (VEGF) agents were also allowed to participate in the study.
Patients were scanned twice by certified OCT operators on a TD-OCT device (Stratus™ OCT) and on two SD-OCT devices (Spectralis™ and Cirrus™ OCT), in random order and with 5-10 minutes between devices. The same operator performed all the scans on any given patient. The two scans on a single device were performed consecutively, 5 minutes apart.

Optical Coherence Tomography
One TD-OCT machine, Stratus™ (software version 4), and two SD-OCT machines, Spectralis™ (software version 5.0) and Cirrus™ (software version 5.0.0.326), were used. Stratus™ is a TD-OCT machine that uses a superluminescent diode with a wavelength of 820 nm. It provides an axial resolution of 10µm and an image acquisition speed of 400 A-scans per second. Using the Stratus™, two fast macular thickness maps (FMTM) were acquired from each eye. The FMTM is created by acquiring six radial B-scans, each consisting of 512 A-scans, at an angle of 30° from each other with the point of intersection centered on the fovea.
Spectralis™ uses a superluminescent diode with a wavelength of 870 nm. It provides an axial resolution of 4µm and image acquisition speeds of up to 40,000 A-scans per second. Two volume scans were acquired from each eye using a raster scan of 19 lines covering 20°×15° of the fundus. Using the TruTrack™ functionality of the Spectralis™ OCT, each line was averaged 15 times or more. Cirrus™ HD-OCT also uses a superluminescent diode, with a wavelength of 840 nm. It provides images with an axial resolution of 5µm and acquisition speeds of 27,000 A-scans per second. We acquired two 512×128 macular cube scans (128 B-scans of 512 A-scans each, covering a retinal area of 6.0×6.0 mm) from each eye.

Error determination, manual correction, and exclusion of scans
Scans from each of the three devices were reviewed at the Ocular Imaging Research and Reading Center at the Stanley M. Truhlsen Eye Institute by two independent graders. Segmentation errors due to incorrect identification of the inner and outer retinal boundaries by the automated algorithms of the Spectralis™ and Cirrus™ devices were identified and manually corrected by these graders. Stratus™ images could not be corrected because the operating system provided with the machine at the time of the study lacked editing capabilities. Only 5 patients had Stratus™ scans that required correction, and these scans were excluded from the analysis. The proprietary software of each device identifies device-specific retinal boundaries for the measurement of retinal thickness. While each device identifies the inner limiting membrane (ILM) as the inner boundary of the retina, identification of the outer boundary differs between devices. Stratus™ identifies the junction between the inner and outer segments of the photoreceptors (IS/OS) as the outer boundary, Spectralis™ identifies the posterior border of the retinal pigment epithelium (RPE), and Cirrus™ identifies the inner border of the RPE.
Whenever the foveal center could be identified, grids were repositioned for scans with off-center positioning of the ETDRS grid. However, in some cases, morphological changes associated with the advanced disease made identification of the foveal center unreliable. Adjustment of grid position was not possible for Stratus™ OCT. Scans were excluded from analysis only if identification of retinal layers and determination of the retinal thickness was not possible.
OCT scans from which extraction of thickness data for the central 1mm sub-field was not reliable, due to missing data in the image or the scan being out of range, were also excluded from analysis.
The retinal thickness measurements of the nine standard ETDRS subfields (Appendix A illustrates the nine-subfield abbreviations) were recorded from each device before and after correction of the algorithm errors in the scans.

Statistical analysis
No formal sample size calculation was performed before the conduct of the study. Bland-Altman plots were constructed to determine agreement between devices; both 95% confidence intervals and limits of agreement were calculated. Reproducibility of measurements was determined by calculating the coefficient of repeatability (COR) for each machine. Intraclass correlation coefficients (ICCs) were used to determine the reproducibility for each device. The statistical significance of differences in thickness before and after correction of images across devices was determined via Student's t-test with α = 0.05 and Bonferroni correction for multiple comparisons. STATA version 10 and Microsoft Excel 2007 were used for data management and analysis. The statistical analysis was performed both before and after manual correction of the algorithm errors described above.
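The agreement and repeatability statistics named above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study, which was run in STATA and Excel. The function names, the sample thickness values, the use of 1.96 as a normal-approximation multiplier, and the definition of the COR as 1.96 times the standard deviation of the differences between repeated scans are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def bland_altman(device_a, device_b):
    """Bias, 95% CI of the bias, and 95% limits of agreement.

    device_a and device_b are paired thickness values (µm) for the same
    eyes measured on two devices. The 1.96 multiplier is a normal
    approximation (an assumption; a t-quantile is more exact for small n).
    """
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = mean(diffs)                      # mean difference between devices
    sd = stdev(diffs)                       # sample SD of the differences
    half_ci = 1.96 * sd / math.sqrt(len(diffs))
    return {
        "bias": bias,
        "ci": (bias - half_ci, bias + half_ci),
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


def coefficient_of_repeatability(first_scan, second_scan):
    """COR taken here as 1.96 x SD of test-retest differences (assumed definition)."""
    diffs = [x - y for x, y in zip(first_scan, second_scan)]
    return 1.96 * stdev(diffs)


if __name__ == "__main__":
    # Hypothetical central-subfield thicknesses in µm, not study data.
    sd_oct = [300.0, 310.0, 290.0, 305.0]
    td_oct = [280.0, 285.0, 275.0, 280.0]
    print(bland_altman(sd_oct, td_oct))
    print(coefficient_of_repeatability(sd_oct, [302.0, 307.0, 291.0, 309.0]))
```

A smaller COR means that two scans of the same eye on the same device are expected to differ less, while narrower limits of agreement mean that two devices can more safely be used interchangeably.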

Results
Fifty eyes from 36 patients were included in the study; 29 eyes had NV-AMD and 21 eyes had NNV-AMD. The mean age of the study subjects was 76.6 years.

Exclusion and corrections
Stratus™
Scans from four eyes could not be recovered from the database, and scans from three eyes had algorithm errors with incorrect identification of retinal boundaries and were excluded from analysis. Scans were not corrected for off-center positioning, as moving the ETDRS grid was not possible with the available software version.

Cirrus™
Six first scans and eight second scans were corrected either for off-center fixation of the eye or for incorrect automated identification of retinal boundaries. The differences between thickness measurements before and after correction were not statistically significant (α = .05) for any of the subfields, including when stratified by diagnosis.

Spectralis™
Thirty-three scans in the first set and 32 in the second set were corrected. The inner inferior subfield for NV-AMD was the only subfield with a statistically significant difference between measurements before and after correction. Figure 1 plots the frequency of the differences before and after correction for the central subfield for all scans; 77% of the differences were less than 48µm and 50% were less than 10µm.

OCT characteristics
The mean (±SD) of the macular thickness of all of the subfields, including the central 1mm subfield (FTH) for Stratus™, Cirrus™, and Spectralis™ before and after manual correction of scans, stratified by diagnosis of NV-AMD and NNV-AMD, is shown in Table 1.
For NV-AMD, the FTH values for the central 1mm before correction were 375µm (±129µm), 253µm (±74µm), and 312µm (±110µm) for Spectralis™, Stratus™, and Cirrus™, respectively. After correction, the values were 335µm (±106µm) for Spectralis™ and 318µm (±110µm) for Cirrus™. For NNV-AMD, the FTH values in the central 1mm before correction were 298µm (±87µm), 193µm (±32µm), and 229µm (±30µm).
Table 2 shows the ICC values between images for all three machines before and after correction, both combined and stratified by diagnosis. All of the machines had ICC values >90% for the central subfield, and the Spectralis™ had no subfield with an ICC below 99% after correction.
In the central subfield, Spectralis™ had a COR of 20µm for NV-AMD, which increased to 23µm after correction; Cirrus™ and Stratus™ had relatively larger CORs of 64µm (reduced to 49µm after correction) and 35µm, respectively. For NNV-AMD, the COR for the central subfield was 15µm for both Cirrus™ and Spectralis™ and 24µm for Stratus™. After correction, the value decreased to 12µm for Spectralis™ and increased to 36µm for Cirrus™. The COR values of all subfields for each device before and after correction, stratified by disease, are given in Table 3.
Overall, Spectralis™ had the lowest COR, with values ranging from 5-30µm. Cirrus™ and Stratus™ had similar values ranging from 5-70µm, even after correction. The COR for Cirrus™ increased by 15-40µm after correction for NNV-AMD. Also, Cirrus™ COR values were 10-30µm higher than Stratus™ values for both NV-AMD and NNV-AMD. Agreement between machines was poor, except between Spectralis™ and Cirrus™ after correction. Tables 4 and 5 show the 95% confidence intervals and limits of agreement of the Bland-Altman plots between devices before and after manual correction.
Figures 2a-f show Bland-Altman plots with 95% confidence intervals for the FTH comparison of the machines before and after correction. Before correction, the mean difference between the machines was 32µm for Spectralis™ vs. Cirrus™, 52µm for Cirrus™ vs. Stratus™, and 84µm for Spectralis™ vs. Stratus™. Manual correction reduced the differences to 15µm for Spectralis™ vs. Cirrus™, 51µm for Cirrus™ vs. Stratus™, and 67µm for Spectralis™ vs. Stratus™. When stratified by diagnosis, the values before correction for NV-AMD and NNV-AMD, respectively, were 34µm and 29µm for Spectralis™ vs. Cirrus™, 53µm and 47µm for Cirrus™ vs. Stratus™, and 88µm and 79µm for Spectralis™ vs. Stratus™. After manual correction, the values fell to 17µm and 14µm for Spectralis™ vs. Cirrus™ and to 70µm and 61µm for Spectralis™ vs. Stratus™ for NV-AMD and NNV-AMD, respectively. The confidence interval widths between the two SD-OCT machines were, on average, 5-10µm smaller than those between an SD-OCT and the TD-OCT machine. The average interval width decreased by 5-10µm after correction for any disease and comparison, except for the Cirrus™ vs. Stratus™ comparison.

Discussion
The advent of OCT has revolutionized the way patients with retinal disorders are evaluated and monitored. However, like every new device, the current devices employing time- or spectral-domain technology have certain limitations. One common and clinically relevant issue is the presence of random error in the identification of the inner and outer boundaries of the retina by the algorithm. With respect to AMD, studies have shown that in lesions such as fibrotic scars, choroidal neovascularization disrupting the RPE, and subretinal fluid, the automated segmentation algorithms produce errors because the software does not correctly delineate the outer retinal boundary 3,5. In our study, we found that 66% of the Spectralis™, 14% of the Cirrus™, and 6.5% of the Stratus™ scans had algorithm errors. Giani et al. reported similar results; for Cirrus™, they reported 25% and 16% algorithm error rates for NNV-AMD and NV-AMD, respectively. However, for Spectralis™, they reported 16.67% and 57.6% algorithm error rates, and for Stratus™ 8.33% and 62.5%, for NNV-AMD and NV-AMD, respectively 5. Other studies have reported Stratus™ outer boundary algorithm errors of approximately 43% for both forms of AMD and 60% for NV-AMD 3,6.
Reasons for the differences between our error rates and those of previous studies include the lack of a standard definition of an algorithm error. Rather than applying an exact definition of an algorithm error, which may not be clinically significant 5, in our study the decision was made by two masked observers who determined whether the correction would be important. In addition, even though Spectralis™ segments the outer border of the RPE, a study by Jaffe et al. reported that it may also include Bruch's membrane in its calculation, thus including sub-RPE pathology such as drusen when segmenting the outer border of the retina 7. The differences may also be due to the fact that our study was prospective and that, while acquiring scans, the operators tried their best to ensure no errors occurred during scan acquisition. Lastly, we did not exclude scans if the signal strength was low or if the machine gave a low analysis confidence message, as other studies have done 8-10.
The thickness measurements for the Spectralis™ and Cirrus™ scans after correction were not significantly different from those before correction. This may be because the majority of the scans required only minor corrections. For example, more than 50% of the Spectralis™ scans showed a change of 10µm or less in the central subfield thickness. Krebs et al. have also previously reported no significant differences in retinal thickness measurements before and after correction of segmentation errors in scans taken using Cirrus™ 11.
The differences in the mean thickness values before and after correction in scans taken using Spectralis™ were most obvious in the central subfields of the retina (C1, N1, S1, T1, and I1) with the peripheral subfields being spared (N2, S2, T2 and I2). This may be attributed to the fact that the pathology of AMD is located centrally and therefore pathology related inaccuracies in segmentation are more likely to occur in these subfields.
Retinal thickness measurements were similar in both SD-OCT machines and were greater than those from Stratus™. Correction reduced the difference in thickness measurements between the two SD-OCT devices to less than 20µm; in some cases, as noted above, the difference was no longer statistically significant. Other studies in normal and pathologic eyes, including eyes with DME and macular degeneration, have also demonstrated that the difference in retinal thickness between SD-OCT machines can be attributed to differences in the segmentation performed by the automated algorithms 7,10,12.
Despite the large number of scans with algorithm errors, the COR of Spectralis™ was lower for every subfield than that of Stratus™ or Cirrus™. The COR of Cirrus™ was equal to or larger than that of Stratus™ for both forms of the disease. In all three devices, the COR was generally better for NNV-AMD than for NV-AMD, especially after correction. The difference between disease forms can be attributed to the pathology of NV-AMD disrupting the outer border, which makes it difficult for the automated algorithm to accurately segment the retinal layers 13,14. Krebs et al. evaluated the repeatability of retinal thickness measurements using Spectralis™ and Cirrus™ in patients with AMD. For images taken using Spectralis™, the mean difference between repeated measurements was found to be within 11µm before correction and within 1µm after correction. For images taken using Cirrus™, the mean difference between repeated measurements was found to be within 6µm before correction and within 4µm after correction 15. Previous studies on normal eyes have reported a high repeatability of measurements with Spectralis™, with differences between repeated measurements being within 1µm 12,16. For Stratus™ OCT images, other studies have found central subfield repeatability values of 50µm in patients with NV-AMD and 32-35µm in patients with NNV-AMD after correction or exclusion of scans with errors 8,17; our study confirms this finding. One other published study has examined the repeatability of Cirrus™ OCT in NV-AMD; it found a central subfield repeatability value of 42µm before correction and 26µm after exclusion of scans with significant segmentation errors 18. The difference between that study and our measurements may be associated with our smaller sample size. In addition, we chose not to exclude any poor-quality scans, which may have led to larger differences.
In addition to a lower COR, Spectralis™ also had the highest ICC values for both NV-AMD and NNV-AMD, before and after correction. Compared with Stratus™, Cirrus™ had higher coefficients after correction for NV-AMD and lower coefficients for NNV-AMD. While no previous studies have reported ICC values for AMD patients, Pierro et al. found comparable results in normal eyes, with Cirrus™ ICC values ranging from 83-97% and Stratus™ ICC values from 72-95% 19. The most likely reason for the low COR and high ICC values of Spectralis™ is its eye-tracking capability, which ensures that artifacts due to eye movement are minimized and that the machine scans only when the tracking software identifies the same position on the fundus 16.
Bland-Altman plots indicate that there is agreement between the SD-OCT machines. Correcting images also influenced agreement between machines. We found that the 95% confidence intervals between the two SD-OCT machines were narrower than those between an SD-OCT and the TD-OCT machine, and that correcting the algorithm errors narrowed the intervals further. The mean differences between machines were lowest between Spectralis™ and Cirrus™, especially after correction. This is most likely due to the effects of manually correcting the Spectralis™ images and to the fact that both machines have similar scanning technologies. The limits of agreement were very wide for all three machines and were narrower after correction of images, especially for the two SD-OCT machines. Jaffe et al. reported similar results for NV-AMD, with limits of agreement of approximately 225µm between an SD-OCT and a TD-OCT 7. The poor agreement warrants caution for clinicians when trying to use the data from different machines interchangeably especially in the central 1mm of retina since most clinicians.
Our study is not without its limitations. All images were taken at a single imaging center; this might have introduced some bias. The version of software used for the Stratus™ images did not allow correction of segmentation errors, and therefore these images had to be excluded from the analysis. Two independent graders manually corrected all the images; this may have resulted in some inaccuracies in segmentation line correction. In addition, in a subset of patients whose eyes differed in disease severity, both eyes were included in the analysis; this may also have introduced bias. The Cirrus™ device that was used to capture the images did not have eye tracking, which may have led to the slightly larger COR values when compared to Spectralis™.
In summary, we found that although Spectralis™ had the highest frequency of errors in AMD patients, correction of images did not result in significant changes in retinal thickness because the errors were very small. Spectralis™ had the lowest COR values. Thus, Spectralis™ may be the best suited for examining minute morphological and thickness changes. Also, because of the wide Bland-Altman 95% intervals, there is little agreement between the SD-OCT and TD-OCT machines. Based on our findings, we recommend that scans be carefully analyzed at reading centers before the thickness values are accepted as reliable.

Author contributions
AR, YJS, and RC conceived the study. AR, MAS, RC, EH, and MS carried out the research. AR, MAS, YJS, RC, and MS prepared the manuscript. YJS and MAS provided statistical support. DVD and QDN supervised the project. All authors were involved in the revision of the draft manuscript and have agreed to the final content.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.