Using different methods to process forced expiratory volume in one second (FEV 1) data can impact on the interpretation of FEV 1 as an outcome measure to understand the performance of an adult cystic fibrosis centre: A retrospective chart review

Background: Forced expiratory volume in one second (FEV 1) is an important cystic fibrosis (CF) prognostic marker and an established endpoint for CF clinical trials. FEV 1 is also used in observational studies, e.g. to compare outcomes between centres. We wished to evaluate whether different methods of processing FEV 1 data can impact on centre outcome.
Methods: This is a single-centre retrospective analysis of routinely collected data from 2013-2016 among 208 adults. Year-to-year %FEV 1 change was calculated by subtracting best %FEV 1 at Year 1 from best %FEV 1 at Year 2 (i.e. negative values indicate a fall in %FEV 1), and compared using the Friedman test. Three methods were used to process %FEV 1 data. First, %FEV 1 calculated with the Knudson equation was extracted directly from spirometer machines. Second, FEV 1 volumes were extracted then converted to %FEV 1 using clean height data and the Knudson equation. Third, FEV 1 volumes were extracted then converted to %FEV 1 using clean height data and the GLI equation. In addition, year-to-year variation in %FEV 1 calculated using the GLI equation was adjusted for baseline %FEV 1 to understand the impact of case-mix adjustment.
Results: Year-to-year fall in %FEV 1 reduced with all three data processing methods but the magnitude of this change differed. Median change in %FEV 1 for 2013-2014, 2014-2015 and 2015-2016 was –2.0, –1.0 and 0.0 respectively using %FEV 1 calculated with the Knudson equation, whereas the median change was –1.1, –0.9 and –0.3 respectively using %FEV 1 calculated with the GLI equation. A statistically significant p-value (0.016) was only obtained when using %FEV 1 calculated with the Knudson equation and extracted directly from spirometer machines.
Conclusions: Although the trend of reduced year-to-year fall in %FEV 1 was robust, different data processing methods yielded varying results when year-to-year variation in %FEV 1 was compared using a standard related group non-parametric statistical test. Observational studies with year-to-year variation in %FEV 1 as an outcome measure should carefully consider and clearly specify the data processing methods used.


Keywords
Cystic fibrosis, epidemiology, patient outcome assessment, forced expiratory volume

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Introduction
Cystic fibrosis (CF) is a multi-system genetic condition but the two main affected organs are lungs (resulting in recurrent infections and respiratory failure) and gastrointestinal tract (resulting in fat malabsorption and poor growth) 1 . Median survival has improved to 45 years, in part because of improvement in care quality 2 . An important quality improvement initiative is benchmarking, which involves identifying high-performing centres and the practices associated with outstanding performance [3][4][5] . Since forced expiratory volume in one second (FEV 1 ) is an important CF prognostic marker 6-9 , it is often used as an outcome measure for benchmarking [3][4][5]10 .
Different statistical methods of analysing FEV 1 data can yield different results 11 , but there is scant attention paid to the methods of processing FEV 1 data. We previously reported a statistically significant reduction in year-to-year %FEV 1 fall for our CF centre from 2013-2016 12 . We now set out to understand the impact of using different FEV 1 data processing methods on our CF centre's outcome.

Methods
This is a single-centre retrospective analysis of routinely collected clinical data from 2013-2016. Regulatory approval for the analysis was obtained from NHS Health Research Authority (IRAS number 210313). All adults with CF diagnosed according to the UK CF Trust criteria aged ≥16 years were included, except those with lung transplantation or on ivacaftor. These treatments have transformative effects on %FEV 1 13-15 , thus may affect the interpretation of year-to-year variation in %FEV 1 .
Demographic data (age, gender, genotype, pancreatic status, CF related diabetes, Pseudomonas aeruginosa status), body mass index (BMI) and FEV 1 data were collected by two investigators (HZH and RC / HZH and MEG) independently reviewing paper notes and electronic records. Where data from the two investigators differed, the original data from paper notes or electronic records were reviewed by both investigators. This process ensures the accuracy of abstracted data and helps avoid potential bias from inaccurate or inconsistent data collection 16 . FEV 1 data were processed with three different methods prior to analysis. First, %FEV 1 readings (calculated with the Knudson equation 17 and available in whole numbers) were directly extracted from spirometer machines. Second, FEV 1 volumes (in litres, to two decimal places) were extracted and clean height data were used to calculate %FEV 1 (as whole numbers) with the Knudson equation 17 . Third, FEV 1 volumes (in litres, to two decimal places) were extracted and clean height data were used to calculate %FEV 1 with the GLI equation 18 using an Excel Macro (Microsoft Excel 2013).
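The second processing method can be illustrated with a short sketch. It follows the rounding steps described in the table footnotes (age rounded down to a whole number, predicted FEV 1 to two decimal places, %FEV 1 to a whole number). The coefficients are an assumption: they are the Knudson values for males aged 25 years and over that reproduce the form 5.1228 - 0.0292 x age quoted in the peer review report for a height of 175 cm, and should be checked against the original reference before any real use. The half-up rounding rule and the function names are also assumptions, not the spirometer's documented behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP
import math

# Assumed Knudson coefficients for males aged >= 25 years. At 175 cm they
# give 5.1228 - 0.0292 * age, the form quoted in the peer review report.
HEIGHT_COEF = Decimal("0.0665")  # litres per cm of height
AGE_COEF = Decimal("0.0292")     # litres per year of age
INTERCEPT = Decimal("-6.5147")   # litres


def predicted_fev1(age_years, height_cm):
    """Predicted FEV1 in litres, rounded to two decimal places."""
    whole_age = math.floor(age_years)  # age rounded down to a whole number
    raw = HEIGHT_COEF * Decimal(str(height_cm)) - AGE_COEF * whole_age + INTERCEPT
    # Half-up rounding is an assumption about the machine's behaviour.
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def percent_predicted(fev1_litres, age_years, height_cm):
    """%FEV1 as a whole number: observed volume over predicted volume."""
    pred = predicted_fev1(age_years, height_cm)
    pct = Decimal(100) * Decimal(str(fev1_litres)) / pred
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical subject: the same FEV1 volume with two recorded heights.
print(percent_predicted(3.00, 26.7, 175))  # 69
print(percent_predicted(3.00, 26.7, 170))  # 74
```

In this hypothetical example a 5 cm discrepancy in recorded height moves %FEV 1 by 5 percentage points for an unchanged FEV 1 volume, which is why height cleaning matters for year-to-year comparisons.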
Best %FEV 1 , i.e. the highest %FEV 1 reading in a calendar year for each study subject, was used for analysis since it is most reflective of the true baseline %FEV 1 .
Year-to-year %FEV 1 change was calculated by subtracting best %FEV 1 at Year 1 from best %FEV 1 at Year 2 (i.e. negative values indicate a fall in %FEV 1 and positive values indicate an increase in %FEV 1 ). In addition to calculating year-to-year %FEV 1 change using three different FEV 1 data processing methods, %FEV 1 change calculated with the GLI equation was also adjusted for baseline %FEV 1 using reference values from the Epidemiologic Study of CF (ESCF) 20 . The ESCF study found median %FEV 1 change of -3%/year, -2%/year and -0.5%/year for baseline %FEV 1 ≥100%, 40-99.9% and <40% respectively 20 . Adjusted %FEV 1 change was calculated by subtracting median ESCF %FEV 1 change from actual %FEV 1 change. Thus, an adjusted %FEV 1 change >0 meant the subject's year-to-year fall in %FEV 1 was less than expected (indicating better health outcome) whilst an adjusted %FEV 1 change <0 meant the subject's year-to-year fall in %FEV 1 was more than expected (indicating worse health outcome). %FEV 1 changes from 2013-2014 to 2015-2016 calculated using different FEV 1 data processing methods were compared using the Friedman test. Bland-Altman analyses 21 were also used to compare year-to-year variation in %FEV 1 as calculated with the Knudson equation against year-to-year variation in %FEV 1 as calculated with the GLI equation, to understand the impact of using different reference equations. Analyses were performed using SPSS v24 (IBM Corp) and Prism v7 (GraphPad Software). P-value <0.05 was considered statistically significant.
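The year-to-year change and its baseline adjustment can be sketched as follows. This is a minimal illustration using only the ESCF medians quoted above. It assumes that "baseline" means the best %FEV 1 at Year 1, and that values between 99.9% and 100% fall in the middle band; neither detail is stated in the text, and the function names are hypothetical.

```python
def escf_expected_change(baseline_pct):
    """Median ESCF %FEV1 change per year for a baseline %FEV1 band."""
    if baseline_pct >= 100:
        return -3.0
    if baseline_pct >= 40:
        return -2.0
    return -0.5


def year_to_year_change(best_year1, best_year2):
    """Best %FEV1 at Year 2 minus best %FEV1 at Year 1 (negative = fall)."""
    return best_year2 - best_year1


def adjusted_change(best_year1, best_year2):
    """Actual change minus the ESCF median change for the baseline band."""
    actual = year_to_year_change(best_year1, best_year2)
    # Assumption: the Year 1 value is used as the baseline %FEV1.
    return actual - escf_expected_change(best_year1)


# Hypothetical subjects, not study data.
print(adjusted_change(75.0, 74.0))    # 1.0: fall smaller than expected
print(adjusted_change(105.0, 100.0))  # -2.0: fall larger than expected
print(adjusted_change(35.0, 35.0))    # 0.5: no fall where -0.5 was expected
```

A subject whose %FEV 1 falls by exactly the ESCF median for their band has an adjusted change of zero, so the sign of the adjusted value is read relative to the expected fall, not relative to no change.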

Results
This analysis included 208 adults, with 147 adults providing data for all four years. Overall, the cohort was ageing but baseline %FEV 1 increased from 2014 onwards (see Table 1).
The %FEV 1 increase was in part due to younger adults with higher %FEV 1 transitioning from paediatric care because %FEV 1 tended to decline from year to year (see Table 2). However, different year-to-year change in %FEV 1 results were obtained with different FEV 1 data processing methods. There was statistically significant reduction in year-to-year fall in %FEV 1 using %FEV 1 readings as recorded in spirometer machines (p=0.016). Cleaning of height data and standardisation of %FEV 1 calculation with Knudson equation 17 did not alter the magnitude of year-to-year variation in %FEV 1 , but the p-value was no longer statistically significant (p=0.062). The use of

Amendments from Version 1
As recommended by Prof McKone, we have used Bland-Altman analyses to compare different reference equations (Knudson vs GLI).
As recommended by Prof Burgel, we have:
1. Performed a sensitivity analysis for the results in Table 2 using only adults aged 18 years and above; we have also done the same for the Bland-Altman analyses that were added following the suggestion from Prof McKone.
2. Replaced the term "FEV1 decline" with "year-to-year FEV1 variation".

ESCF - Epidemiologic Study of Cystic Fibrosis
† The vast majority of the %FEV 1 data were from spirometer machines at the Sheffield Adult cystic fibrosis (CF) centre, which were calculated with the Knudson equation 17 in whole numbers. Some %FEV 1 data were from spirometer machines at the Pulmonary Function Unit, which operationalised the Knudson equation differently by calculating age to one decimal place to determine the predicted FEV 1 . These spirometer machines also provided %FEV 1 to two decimal places, but this was rounded to whole numbers for the purpose of analysis. These results were presented at the 2017 North American CF Conference and were published as an abstract in Pediatric Pulmonology 12 .
‡ FEV 1 volumes were available in litres to two decimal places from spirometer machines. Height data were also extracted to allow the calculation of predicted FEV 1 . This led us to uncover inconsistent recording of height, which affected 30-40% of the study subjects and would have introduced erroneous variability to the %FEV 1 because all equations for predicted FEV 1 are dependent on height. Height data were cleaned to remove errors. Where there was uncertainty regarding the height, the higher value was used to obtain a conservative estimate of %FEV 1 . To replicate the calculation process of the spirometer machines at the Sheffield Adult CF centre, age was rounded down to a whole number and predicted FEV 1 volumes were calculated to two decimal places using the Knudson equation 17 . This was used to derive the %FEV 1 , which was then rounded to whole numbers for the purpose of analysis.

ϕ FEV 1 and height data were extracted as above. %FEV 1 was calculated using the GLI equation 18 using an Excel Macro available at the European Respiratory Society website.
§ %FEV 1 calculated using the GLI equation 18 as described above, then adjusted for baseline %FEV 1 as described in the 'Methods' section. An adjusted %FEV 1 change of >0 meant the subject's year-to-year fall in %FEV 1 was less than expected for his / her baseline %FEV 1 , indicating better health outcomes.

Table 3. Discrepancies in year-to-year %FEV 1 variation with different methods of processing forced expiratory volume in one second (FEV 1 ) data among adults aged ≥18 years.

Similar results were obtained when restricting the analyses to those aged ≥18 years (see Table 3). Bland-Altman analyses comparing year-to-year variation in %FEV 1 calculated from clean FEV 1 data using the Knudson equation 17 vs year-to-year variation in %FEV 1 calculated from clean FEV 1 data using the GLI equation 18 indicate the tendency for the Knudson equation 17 to over-estimate the magnitude of year-to-year fall in %FEV 1 by a mean difference of 0.1-0.4% (see Figure 1).
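The quantities behind a Bland-Altman comparison can be sketched as below: the mean difference (bias) between paired measurements and the conventional 95% limits of agreement at 1.96 standard deviations either side. The data are hypothetical, the function name is an assumption, and the sketch does not reproduce the Prism or SPSS output used in the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)          # mean difference between the two methods
    spread = 1.96 * stdev(diffs)  # half-width of the limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical year-to-year %FEV1 changes for four subjects.
knudson_change = [-2.0, -1.0, 0.0, -3.0]
gli_change = [-1.5, -1.0, 0.5, -2.0]

bias, lower, upper = bland_altman(knudson_change, gli_change)
print(round(bias, 2))  # -0.5
```

With changes defined as Year 2 minus Year 1, a negative bias for Knudson minus GLI means the Knudson-based values show a larger fall on average, which is the direction reported above.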

Discussion
We demonstrated that different centre-level year-to-year variation in %FEV 1 results were obtained using different FEV 1 data processing methods. In particular, year-to-year fall in %FEV 1 was smaller in magnitude when %FEV 1 was calculated using GLI equation 18 instead of Knudson equation 17 . This is in part due to the demographic of our centre which has a relatively young adult population. A previous study found a near-linear %FEV 1 decline from childhood to adulthood with GLI equation, whereas there was accelerated %FEV 1 decline during adolescence and young adulthood when %FEV 1 was calculated with Knudson equation 24 . One advantage of using the GLI equation, which is seamless across all ages, is that it improves the interpretation of %FEV 1 decline 24,25 . Another advantage is that %FEV 1 decline can be adjusted for baseline %FEV 1 using ESCF reference values (since the ESCF values for %FEV 1 decline were calculated using the GLI equation 20 ).
A limitation of all single-centre analyses is the potential lack of generalisability. Another limitation of our analysis is that the ESCF reference values used to adjust year-to-year variation in %FEV 1 were derived using a cohort from around 15 years ago 20 , and may not represent the current population. Our results nonetheless highlighted that year-to-year variation in %FEV 1 can be extremely sensitive to the FEV 1 data processing methods. This is one of the challenges of using year-to-year variation in %FEV 1 to infer quality of care. Another challenge is that %FEV 1 lacks sensitivity as an outcome measure. A recent sample size estimation using the UK CF registry data suggests that 273 adults per centre are needed to detect a 5% FEV 1

Given the limitations of FEV 1 as an outcome measure in CF, results of centre comparisons based on FEV 1 data should be carefully interpreted. Observational studies with year-to-year variation in %FEV 1 as an outcome measure should carefully consider and clearly specify the data processing methods used.

Figure 1. ... Table 2) against year-to-year variation in %FEV 1 as calculated with GLI equation (i.e. "Method 3" for processing FEV 1 data according to Table 2).

Ethical considerations
Regulatory approval for the analysis was obtained from NHS Health Research Authority (IRAS number 210313).

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this piece of work.

Open Peer Review

Clive Osmond, MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK

Thank you for sending me this interesting note. A few thoughts on the analysis from a statistician.

It's an interesting, though sobering, fact that between 30 and 40 percent of the machine-entered heights are incorrect. Normally the tendency would be for such errors to obscure, rather than generate, associations. This now-known, high error rate makes it less interesting to explore this section of the results.
What does a Friedman test measure? It's a non-parametric version of a repeated measures one-way analysis of variance. Two issues are worth considering. It requires a complete table, so only those subjects with all four years of data may be included. Secondly it produces a three degree of freedom test, which is not very well directed to address the most likely question of interest. We might be most interested in detecting a smooth, linear trend over time. However just as much weight is being given to detect non-linear patterns such as curvature {low, high, high, low} and saw-tooth {low, high, low, high}. I don't know how centres are compared officially. Comparison of neighbouring years' data would be unstable. Also, using these non-linear components could be very misleading. I hope that the linear trend is used.

Are there any alternative analyses that would address these two issues? Certainly. To begin with, let's use the original data for FEV1% and not worry about their normality. You could fit a mixed model to these data once you stack them in long format ("varstocases" in SPSS). This would enable you to use all data, not just data for those with a complete set. It would also enable you to extract a one degree of freedom test for trend across the four years. This should be a more powerful approach. I now see that the other referees refer to this as well, though I don't agree that you need to have at least three observations per subject.
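The two points above can be made concrete with a small sketch: the Friedman statistic is computed from within-subject ranks and so only uses subjects with a complete row, whereas stacking to long format keeps every available observation. This is an illustration with hypothetical data and hypothetical function names. It computes the tie-corrected Friedman statistic only (no p-value) and does not fit the mixed model itself.

```python
def average_ranks(values):
    """Rank values 1..k within one subject, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(complete_rows):
    """Tie-corrected Friedman chi-square statistic (k - 1 degrees of freedom)."""
    n, k = len(complete_rows), len(complete_rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in complete_rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    return q / correction


def complete_cases(wide):
    """Rows usable by the Friedman test: no missing period."""
    return [vals for vals in wide.values() if None not in vals]


def to_long(wide, labels):
    """Stack to (subject, period, value) rows, keeping partial subjects."""
    return [(subject, label, value)
            for subject, vals in wide.items()
            for label, value in zip(labels, vals)
            if value is not None]


# Hypothetical %FEV1 changes; subject D is missing one period.
periods = ["2013-2014", "2014-2015", "2015-2016"]
wide = {"A": [-2.0, -1.0, 0.0], "B": [-3.0, -2.0, -1.0],
        "C": [-1.5, -0.5, 0.5], "D": [-2.0, None, 0.0]}

print(friedman_statistic(complete_cases(wide)))  # 6.0, using 3 of 4 subjects
print(len(to_long(wide, periods)))               # 11 observations retained
```

In this example subject D contributes nothing to the Friedman statistic but contributes two rows to the long-format table, which is the form a mixed model with a linear term for time would take as input.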
If you are worried about the normality (though the published quartiles are not that alarming) then two alternatives would be (1) to find a normalising transformation that would apply to the stacked column of FEV1% values, or (2) to use a rank-based transformation ("Fisher-Yates") available in SPSS as "rank y /normal into z."

What might be the mathematics underlying any difference in slope obtained by the Knudson and GLI methods? I have tried to abstract the formulae used by Knudson and by GLI in deriving the predicted FEV1 that is used in the calculation of FEV1%. For a specific example I have chosen males (slightly more common in this study) aged 25 to 28 years (somewhere near the median age) with height of 175cm (just below mean UK adult height). The Knudson equation has a functional form FEV1 predicted = 5.1228 - 0.0292 × age. FEV1% = FEV1/(FEV1 predicted) can then be differentiated to see how it varies with changes in age and FEV1. However the GLI equivalent is given as point estimates from a Cole-Green LMS fitting procedure. The penalised cubic splines are not given, so no functional form is available.
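For the Knudson form quoted above, the differentiation can be written out explicitly. The following is a sketch, not the referee's own working: F denotes observed FEV1 in litres, a denotes age in years, P(a) denotes predicted FEV1, and FEV1% is expressed as a percentage (the factor of 100 is a notational assumption).

```latex
\[
P(a) = 5.1228 - 0.0292\,a,
\qquad
\mathrm{FEV1\%} = 100\,\frac{F}{P(a)}
\]
\[
\frac{\partial\,\mathrm{FEV1\%}}{\partial F} = \frac{100}{P(a)},
\qquad
\frac{\partial\,\mathrm{FEV1\%}}{\partial a} = \frac{100 \times 0.0292\,F}{P(a)^{2}}
\]
\[
d(\mathrm{FEV1\%}) = \frac{100}{P(a)}\,dF + \frac{2.92\,F}{P(a)^{2}}\,da
\]
```

As a hypothetical worked example, take F = 3.00 L at a = 25 years, so that P = 5.1228 - 0.73 = 4.3928 L. The age term is then 2.92 × 3.00 / 4.3928² ≈ 0.45, meaning that with an unchanged FEV1 volume the Knudson-based FEV1% would drift upwards by roughly 0.45 percentage points per year of age for this height and sex. Any observed fall in FEV1% is therefore the net result of the fall in F and this age term.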

The table shows, just for this combination, how the predicted values compare. Those from Knudson are slightly lower and decrease slightly more rapidly with age. Such differences, and those from other combinations, will work together to determine how FEV1% might be expected to change with age and observed FEV.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Partly

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

Referee Expertise: Stats

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? No source data required
Are the conclusions drawn adequately supported by the results? Partly

Competing Interests: No competing interests were disclosed.

Referee Expertise: Adult pulmonologist with experience in the care of adults with cystic fibrosis. Researcher.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

The paper supports the standardization of FEV1 collection and reference equations which is currently in development by CF International Registries. It also highlights that different approaches to data collection can impact the interpretation of statistical analyses.

Comments:
Differences in FEV1 percent predicted using different equations are well known (Rosenfeld et al and more recently in the cited UK/US comparison study). For this reason, the GLI equations have recently been accepted as the standard for most CF registries.
Although year to year subtraction is a method of looking at longitudinal changes, regression methodology is preferable to analyse these changes, especially, as in this case, where you have 3 time points. This also allows adjustment for baseline factors such as lung disease severity.
The method of adjustment for baseline lung function is a bit crude. The medians subtracted are from a US population over 10 years ago and are likely to overestimate lung function decline in this population. In the Morgan et al, J Pediatr 2016 paper cited, the benefits of using this type of adjustment were shown using regression.
Did their statistical approach factor in that these were repeated measures in the same patients?
Bland & Altman plots comparing different reference equations could be considered.
The results suggest that height inaccuracy is impacting the results. As this is a single centre study, it is difficult to determine if this is a more universal problem.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results?