Further investigations of gateway effects using the PATH study



Abstract
Background: Considerable interest exists in whether e-cigarette use ("vaping") by youths increases the risk of initiating cigarette smoking. Based on Waves 1 and 2 of the Population Assessment of Tobacco and Health study we reported that adjustment for propensity for vaping using Wave 1 variables explained about 80% of the unadjusted relationship. This analysis may have been over-adjusted if vaping at Wave 1 had affected some of the variables recorded then. Here we present analyses using Waves 1 to 3 to avoid this possibility.
Methods: Our main analysis M1 concerned those who had never smoked by Wave 2 and never vaped by Wave 1. Wave 2 vaping was linked to smoking initiation by Wave 3, adjusting for Wave 1 predictors. Sensitivity analyses excluded other tobacco product users at Wave 1, included other tobacco product use as an additional predictor, or were based on propensity for ever smoking or ever any tobacco use, rather than ever vaping. Other analyses adjusted for propensity as derived originally, or ignored Wave 1 data. Other analyses used grouped age (only available originally) or exact age (available now) as a confounder variable, attempted residual confounding adjustment by modifying values of predictor variables using data later recorded, or considered interactions with age.
Results: In M1, propensity adjustment removed about 50% of the excess odds ratio (i.e. OR-1), the unadjusted OR of 5.60 (95% CI 4.52-6.93) becoming 3.37 (2.65-4.28), 3.11 (2.47-3.92) or 3.27 (2.57-4.16) depending on whether adjustment was for propensity as a continuous variable, as quintiles, or for the 16 variables making up the propensity score. Many factors studied hardly affected the results, including using grouped or exact age, consideration of other tobacco products, including interactions, or using predictors of smoking or tobacco use rather than vaping. The clearest conclusion was that analyses avoiding over-adjustment explained only about 50% of the excess OR, whereas analyses subject to over-adjustment explained about 80%.
Conclusions: Although much of the unadjusted gateway effect results from uncontrolled confounding, our current analysis provides stronger evidence of a causal effect of vaping than did our earlier analysis. However, some doubts remain about the completeness of confounder adjustment.

Background
In youths, use of e-cigarettes ("vaping") has increased considerably in recent years in many countries (e.g. [1][2][3]). It is generally recognized that vaping significantly reduces exposure to harmful constituents compared to smoking [4], so one might expect risks from vaping to be much lower [5]. However, there are concerns about the rise in vaping. The concern of interest here is the possibility that vaping may encourage some individuals to start smoking who would otherwise not have done so, often referred to as the "gateway" effect. The concern that vaping may act as a gateway into smoking was originally brought sharply into focus by a 2017 meta-analysis [6] based on nine US cohort studies in young people linking previous vaping to subsequent initiation of smoking. This paper reported that among baseline never-smokers, ever vaping at baseline strongly predicted initiation in the next 6 to 18 months, with an odds ratio (OR) of 3.62 (95% confidence interval (CI) 2.42-5.41) after adjustment for various predictors of initiation. Similarly baseline past 30-day vaping also predicted subsequent 30-day cigarette use (OR 4.25, 95% CI 2.52-7.37).
We have previously published two papers relating to the gateway effect. Our first paper [7] considered various general issues. It made a number of relevant points: Although studies reported that vaping significantly predicts smoking initiation following adjustment for various other predictors, the sets of predictors considered were generally quite incomplete. No study considered residual confounding arising from inaccurate measurement of predictors. More precise adjustment may have substantially reduced the association. Any true gateway effect would likely have affected smoking prevalence only modestly.
Smoking prevalence in US and UK youths in 2014-2016 declined somewhat faster than predicted by the preceding trend, whereas a substantial gateway effect would suggest the opposite. Even were some gateway effect to exist, introducing e-cigarettes would still be likely to reduce deaths from smoking-related diseases.
Our second paper [8] described results of our own analyses, based on data from Waves 1 and 2 of the Population Assessment of Tobacco and Health (PATH) study, a nationally representative longitudinal cohort study in the United States of tobacco use and how it affects the health of people. Wave 1 was conducted from 12 September 2013 to 15 December 2014, with Wave 2 the first annual follow-up. For each Wave, data are available separately for Youths (aged 12-17 years) and Adults (aged 18+ years), the Youth data including some information from the parents. Publicly available data files include extensive information on use of various types of tobacco products and on a range of variables linked to initiation of tobacco use. Note that where youths reach the age of 18 between successive Waves of the survey, their data will be available in the Adult data rather than the Youth data. Also, additional youths who were under 12 at the time of Wave 1 are added into the Youth data when they reach the age of 12 at a subsequent Wave.
The main analyses we described considered Wave 1 never cigarette smoking youths who, at Wave 2, had information available on smoking initiation. Having constructed a propensity score for ever e-cigarette use from Wave 1 variables, we found that adjustment markedly reduced the unadjusted OR of 5.70 (95% CI 4.33-7.50) to 2.48 (1.85-3.31), 2.47 (1.79-3.42) or 1.85 (1.35-2.53), depending on whether adjustment was made for propensity as quintiles, for propensity as a continuous variable, or for the individual variables making up the propensity score. Various sensitivity analyses confirmed that adjustment removed most of the gateway effect.
Although we found that confounding was a major factor, explaining most of the observed gateway effect, we were particularly concerned about the possibility of over-adjustment, if taking up e-cigarettes had affected the values of some of the Wave 1 predictor variables considered. At the time, we noted that the possibility of over-adjustment could be avoided using data from Waves 1, 2 and 3 of the PATH study, by relating initiation of cigarette smoking at Wave 3 to vaping at Wave 2, restricting attention to those who, at Wave 1, had never vaped, and using propensity indicators recorded at Wave 1 linked to uptake of e-cigarettes by Wave 2.
Here we describe the results of extensive analyses conducted based on Waves 1, 2 and 3 which not only include the main analyses envisaged at the time of our earlier paper [8], but also a variety of sensitivity and alternative analyses.

Methods
Some aspects of the analyses described here are the same as those described earlier [8] and are not presented again here. The selection of demographic and other predictor variables is the same as before, except that in some analyses we use exact age (12, 13, 14, 15, 16 or 17), which could now be estimated from the age group (12-14 or 15-17) at the three Waves and the Wave when youths became adults (18+) for the first time. Use of the person-level weights provided in the PATH study database is as before, as is the process by which a sequence of logistic regression analyses is used to develop the shorter list of demographic variables to be used in forming the propensity scores.
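The estimation of exact age from grouped age can be illustrated with a short sketch. It is not the authors' code. It assumes that the Youth files report age only in two bands (12-14 and 15-17), that the Adult files mark 18+, and that Waves are exactly one year apart; the band labels and the function name `estimate_wave1_age` are hypothetical.

```python
from typing import Optional, Sequence

# Assumed band labels: "12-14" and "15-17" (Youth data), "18+" (Adult data).
BAND_START = {"12-14": 12, "15-17": 15, "18+": 18}


def estimate_wave1_age(bands: Sequence[str]) -> Optional[int]:
    """Estimate exact age at Wave 1 from the age band at Waves 1, 2 and 3.

    Assumes one year between Waves, so a youth first seen in a higher band
    at wave index k (0-based) has just reached that band's starting age and
    was k years younger at Wave 1.
    """
    first = bands[0]
    for k, band in enumerate(bands):
        if band != first:
            return BAND_START[band] - k
    # No band change over three Waves: only the youngest age in a
    # three-year band is compatible (e.g. ages 12, 13, 14 in "12-14").
    if first in ("12-14", "15-17") and len(bands) == 3:
        return BAND_START[first]
    return None  # ambiguous, e.g. already adult at Wave 1


print(estimate_wave1_age(["15-17", "15-17", "18+"]))  # youth aged 16 at Wave 1
```

Because real interview dates are not spaced exactly one year apart, ages derived this way are estimates rather than recorded values.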
Our main analysis M1 is based on those with data at Waves 1, 2 and 3 who had never smoked cigarettes by Wave 2 and had never used e-cigarettes by Wave 1. This analysis predicts Wave 3 ever smoking from Wave 2 ever e-product use, with adjustment based on Wave 1 predictors used to derive a propensity index for taking up e-products between Waves 1 and 2, and exact age being used in preference to grouped age. Note that, whereas in Wave 1 questions in PATH related only to e-cigarette use, in Waves 2 and 3 questions related to ever e-product use, which also included use of e-cigars, e-pipes and e-hookahs.
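Adjusted results for M1 are reported with the propensity index entered in more than one form, one of which is quintiles. A minimal sketch of rank-based quintile assignment follows. It is unweighted, whereas the PATH analyses use person-level weights, and the function name `assign_quintiles` is hypothetical.

```python
def assign_quintiles(scores):
    """Return a quintile label (1 to 5) for each propensity score, by rank.

    Unweighted sketch: each quintile holds about one fifth of individuals.
    A survey-weighted version would cut on cumulative weight instead.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile


# Ten illustrative (made-up) propensity values, two per quintile.
example = [0.02, 0.40, 0.05, 0.31, 0.09, 0.12, 0.25, 0.01, 0.18, 0.07]
print(assign_quintiles(example))
```

The quintile label can then be entered in the logistic regression as a categorical adjustment variable in place of the continuous index.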
Associated with main analysis M1 are four sensitivity analyses (S1 to S4) which are otherwise similar, except that: S1. Those who had ever used other tobacco products at Wave 1 are excluded; S2. Ever use of other tobacco products at Wave 1 is included as an additional predictor variable; S3. The analysis is based on a propensity score for ever cigarette smoking rather than for ever vaping; or S4. The analysis is based on a propensity score for ever use of any tobacco product rather than for ever vaping.
Note that in our original paper [8] we also presented results of a further sensitivity analysis, based on linking current vaping to current smoking. This was not repeated here as numbers of new current smokers in current vapers were very low.
Main analysis M2 is similar to M1, except that the analysis adjusts for the propensity index as originally derived [8], based on 12 variables recorded at Wave 1. Alternative versions of M2 substitute exact age for grouped age in deriving the propensity index, and/or include Wave 1 vapers in the analysis.
Main analysis M3 adjusts for a propensity index derived by linking Wave 2 predictors to Wave 2 e-product use. This is a replicate of the analysis conducted originally [8], but using a different period of taking up cigarettes. Data for Wave 1 were ignored, except that where the data for a characteristic was "ever in last 12 months", Wave 1 data were used to define "ever". An alternative version of M3 replaces grouped age by exact age in deriving the propensity index.
Apart from analyses linking Wave 2 e-product use to additional cigarette smoking at Wave 3 in those who had never smoked at Wave 2, two additional analyses (A1 and A2) were also conducted.
Additional analysis A1 relates e-cigarette use at Wave 1 to cigarette smoking at Wave 2 as in our earlier publication [8], but based on individuals who provided data at all three Waves. One version of this uses the same 12 variables as before to develop the propensity index, the other replaces grouped age by exact age. The OR from this analysis can be combined with that reported for main analysis M2 to give a combined estimate of the gateway effect for Wave 1 to 2 initiation and Wave 2 to 3 initiation based on the same set of variables determined at Wave 1.
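The text does not state how the two ORs would be combined. One common choice, assumed here, is fixed-effect inverse-variance pooling on the log scale, with each standard error recovered from the 95% CI. The function name `pool_odds_ratios` is hypothetical, and the method treats the two estimates as independent, which may not hold exactly when the same individuals contribute to both follow-up periods.

```python
import math


def pool_odds_ratios(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of ORs given as (or, lower, upper).

    The standard error of each log OR is recovered from the CI width.
    Returns (pooled OR, lower limit, upper limit).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    log_or = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se_pooled),
        math.exp(log_or + z * se_pooled),
    )


# Sanity check only: pooling an estimate with itself keeps the OR
# and narrows the interval.
print(pool_odds_ratios([(3.37, 2.65, 4.28), (3.37, 2.65, 4.28)]))
```

The example input is used purely as a sanity check of the arithmetic, not as a combined result from the study.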
Additional analysis A2 ignores Wave 2 data and relates e-cigarette use at Wave 1 to cigarette smoking at Wave 3 using the same 12 variables as before, but replacing grouped age by exact age.
Residual confounding was also considered for three of the analyses described above (M1, M3, A1), all involving exact age. In each case, the list of predictor variables was unaltered from that used originally, but the values of the predictor variables and of the propensity index were revised based on data available at all three Waves. For age, individual year of age at Wave 1 was used, while gender and Hispanic origin did not change between Waves. For the other variables used to form the propensity index, we used all the available data, generally choosing the response most associated with increased e-cigarette use where the response varied between Waves (see Additional File Table 1 for further details). For analyses M1, M3 and A1, alternative versions were also run in which the number of variables adjusted for was increased by also including interactions of age with each of the other three predictors most strongly linked to the relevant gateway effect.
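The revision rule for predictor values can be sketched as follows. This is a simplified illustration, assuming each predictor has an ordering of responses from least to most associated with e-cigarette use. The text says this response was "generally" chosen, so the actual rules in Additional File Table 1 may differ by variable. The function name `revise_predictor` and the example responses are hypothetical.

```python
def revise_predictor(responses, risk_order):
    """Choose one value for a predictor from its responses at Waves 1 to 3.

    responses: one response per Wave, with None where missing.
    risk_order: possible responses, ordered from least to most associated
        with increased e-cigarette use (an assumed input).
    Returns the most strongly associated response observed at any Wave,
    or None if the predictor was missing at every Wave.
    """
    observed = [r for r in responses if r is not None]
    if not observed:
        return None
    return max(observed, key=risk_order.index)


# Hypothetical yes/no item, missing at Wave 1 and answered "yes" at Wave 3.
print(revise_predictor([None, "no", "yes"], ["no", "yes"]))
```

A side effect of this rule is that values missing at Wave 1 are filled from later Waves, which changes the numbers of individuals available for analysis.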

M1 Relating initiation of cigarette smoking between Waves 2 and 3 to ever e-product use at Wave 2, with adjustment for Wave 1 predictors linked to uptake of e-cigarettes between Waves 1 and 2
Initial analyses linked exact age, four other demographic variables (gender, Hispanic origin, race and census region) and 60 other selected predictor variables to ever e-product use at Wave 2 in those who had not smoked or used e-cigarettes at Wave 1. A propensity index based on 16 variables was derived using the three-step process described earlier [8]. Additional File Table 2 shows the steps at which different variables were eliminated from consideration, while Table 1 gives the fitted equation for the propensity index.
Notes to Table 2: The table shows the effects of adjustment based on the Wave 1 predictors used to derive a propensity index for taking up e-products between Waves 1 and 2. The analyses are based on those with data at Waves 1, 2 and 3 who had never smoked cigarettes by Wave 2 and had never used e-cigarettes by Wave 1. Between Waves 2 and 3, 261/7367 (3.54%) of never users of e-products at Wave 2 took up smoking, while 148/893 (16.57%) of ever users did so. For individuals who were 16-17 at Wave 1, adult data were used to determine e-product use and cigarette smoking at later Waves. The table includes the results of a stepwise regression based on successively including the most significant adjustment variables, given that ever e-product use at Wave 2 was included in the model.
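As a rough cross-check on the counts quoted for M1 (148/893 ever users and 261/7367 never users taking up smoking), a crude OR can be computed from the 2x2 table. The sketch below is unweighted and uses a Woolf (log-based) confidence interval, so it is not expected to reproduce the weighted unadjusted OR of 5.60 exactly. The function name `crude_odds_ratio` is hypothetical.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Unweighted OR with a Woolf (log-based) 95% CI from group counts."""
    a = exposed_cases
    b = exposed_total - exposed_cases
    c = unexposed_cases
    d = unexposed_total - unexposed_cases
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts quoted for M1: ever users 148/893, never users 261/7367.
odds_ratio, lower, upper = crude_odds_ratio(148, 893, 261, 7367)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

The crude estimate is about 5.41, slightly below the reported 5.60 (95% CI 4.52-6.93). A difference of this kind would be expected if the reported estimate applies the PATH person-level weights, as described in the Methods.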
Four sensitivity analyses of M1 were carried out, fuller details being given in Tables 3 to 6 of the Additional File. Compared to M1, S1 excluded those who had ever used products other than cigarettes or e-cigarettes at Wave 1, both in the construction of the propensity index and in estimating the gateway effect. Whereas M1 involved 8260 youths, of whom 409 initiated smoking between Waves 2 and 3, S1 involved 7945, of whom 359 took up smoking. The propensity index developed for S1 involved all the 16 variables shown in Table 2, except for "Number of times seen Movie 4" and "Think you will try a cigarette soon". Here, the pattern of results is similar to that for

M2 Relating initiation of cigarette smoking between Waves 2 and 3 to ever e-product use at Wave 2, with adjustment for the same Wave 1 predictors as previously reported [8]
Here, instead of deriving the Wave 1 predictors linked to uptake of e-cigarettes between Waves 1 and 2, analysis M2 uses the same set of Wave 1 predictors used in our earlier work [8], the results being shown in Table 3.
Notes to Table 3: The table shows the effects of adjustment based on the same Wave 1 predictors as used in our original paper [8]. The analyses are based on those with data at Waves 1, 2 and 3 who had never smoked cigarettes by Wave 2 and had never used e-cigarettes by Wave 1. Between Waves 2 and 3, 249/7133 (3.49%) of never users of e-products at Wave 2 took up smoking, while 146/880 (16.59%) of ever users did so. For individuals who were 16-17 at Wave 1, adult data were used to determine e-product use and cigarette smoking at later Waves. The table includes the results of a stepwise regression based on successively including the most significant adjustment variables, given that ever e-product use at Wave 2 was included in the model.
Similar analyses were also run which did not exclude those who had used e-cigarettes by Wave 1. This increased the number of ever e-product users who took up smoking from 146 to 201, and slightly increased the unadjusted OR to 5.95 (4.89-7.23). However, the pattern of decline following adjustment was quite similar. For example, the OR adjusted for the individual variables reduced to 3.31 (2.65-4.12) using grouped age and to 3.26 (2.62-4.06) using exact age.

M3 Relating initiation of cigarette smoking between Waves 2 and 3 to ever e-product use at Wave 2, with adjustment for Wave 2 predictors
As noted in the Methods section, M3 is essentially a replicate of our earlier work [8], but using a different period for taking up cigarettes. The propensity score developed was based on 18 variables, using age group or exact age as alternatives. The results, shown in

A1 Relating initiation of cigarette smoking between Waves 1 and 2 to ever e-cigarette use at Wave 1, based on individuals who provided data at all three Waves
Table 5 summarizes the main results of these analyses and compares them with those reported earlier [8]. While the original analyses were based on 9423 youths, 421 of whom initiated smoking, the new analyses were based on 8700 youths, 389 of whom initiated smoking. As can be seen, the results in the original analysis, based on grouped age, were similar to those from the new analyses, whether grouped or exact age was used. The results from analysis A1 for grouped age may theoretically be combined with those from analysis M2 shown in Table 3, as they both use the Wave 1 predictors from our original paper [8].

A2 Relating Wave 3 ever smoking to Wave 1 e-cigarette use, ignoring Wave 2 data
This analysis is similar to that reported originally [8] but relates to a longer follow-up period, and uses exact rather than grouped age. The results of this analysis, shown in Table 6, are quite similar to those shown in Table 5. Again, the unadjusted OR is markedly reduced by adjusting for propensity, whether as quintiles or as a continuous variable, and is further reduced by adjusting for all the 12 individual variables considered.
Notes to Table 6: The table shows the effects of adjustment based on the same Wave 1 predictors as used in our original paper [8] but replacing age range by exact age. The set of ORs is based on those with data at Waves 1, 2 and 3 who had never smoked cigarettes by Wave 1. Between Waves 1 and 3, 716/8334 (8.59%) of never users of e-cigarettes at Wave 1 took up smoking, while 123/366 (33.61%) of ever users did so. The table includes the results of a stepwise regression based on successively including the most significant adjustment variables, given that ever e-product use at Wave 1 was included in the model.

Attempting to account for residual confounding
Table 7 summarizes the main results shown in Table 2 for main analysis M1, which make no allowance for residual confounding, and compares them with the results of an analysis using the same list of predictor variables, but with values modified in an attempt to adjust for residual confounding. As can be seen, markedly more of the unadjusted association was explained when allowance for residual confounding was made, with the adjusted ORs in the range 2.36 to 2.46 when allowance was made, compared with 3.11 to 3.37 when it was not. Note that the unadjusted ORs in the two sets of results vary slightly, as missing values in some individuals in the original analyses were replaced by estimates taken from other Waves.
Notes to Table 7: The "no allowance" results correspond to those in Table 2. The analyses are based on those with data at Waves 1, 2 and 3 who had never smoked cigarettes by Wave 2 and had never used e-cigarettes by Wave 1. Between Waves 2 and 3, 261/7367 (3.54%) of never users of e-products at Wave 2 took up smoking, while 148/893 (16.57%) of ever users did so in the population considered in the "no allowance" analyses. The corresponding figures in the "allowance" analyses were 267/7682 (3.48%) and 150/915 (16.39%). For individuals who were 16-17 at Wave 1, adult data were used to determine e-product use and cigarette smoking at later Waves. The table includes the results of a stepwise regression based on successively including the most significant adjustment variables, given that ever e-product use at Wave 2 was included in the model.
While allowance for residual confounding had quite a marked effect for analysis M1, the analysis which avoided the possibility of over-adjustment, it did not for analyses M3 and A1, which did not avoid this possibility. Detailed results are shown in Tables 7 and 8 in the Additional File.

Investigating whether introducing some interactions explains more of the gateway effect
Versions of analyses M1, M3 and A1 were also run, in which the number of variables adjusted for was extended by also including interactions of age with each of the other three predictors most strongly linked to the gateway effect. For analysis M1, allowance for these interactions had virtually no effect, the original estimate of 3.27 (95% CI 2.57-4.16) shown in Table 2

Summary of results
The difference between these two groups is that the first set of results are subject to the problem of over-adjustment, with the values of the predictors used possibly having been affected by having used e-cigarettes. This is mainly so where the baseline wave was Wave 1, but was also true for analysis M3 where Wave 1 data were essentially ignored. In contrast, the second set of results avoided over-adjustment by considering follow-up from Wave 2 to 3, with predictors based on Wave 1 data in youths who had never used e-cigarettes. However, in this second set of results the variables used were not as up-to-date as in the first analyses.
The variant analysis of M1, allowing for residual confounding (row P) gives an intermediate result with about 70% of the excess risk being explained, whether by the full set of variables or by propensity. This analysis, however, does not avoid the problem of over-adjustment as it incorporates some information from waves where individuals were already using e-cigarettes.
It is clear from Table 8 that many of the variables studied had little effect on the pattern of results. These included use of grouped or exact age, taking into account use of other products, and using predictors of cigarette smoking or any tobacco use rather than predictors of e-cigarette use.
Two other conclusions may be drawn from The other is that adjustment for the first six variables in the model generally explained a very substantial part of the unadjusted excess OR explained by the full set. Though this was not true for analysis M2, it was still true that adjustment for the last eight or nine variables explained far less of the excess OR than did the first eight or nine.

Discussion
In our publication based on Waves 1 and 2 [8] our analyses showed that an unadjusted estimate of the gateway effect, an OR of 5.70 (95% CI 4.33-7.50), could be considerably reduced by adjustment, to 1.59 (1.14-2.20) in the most striking case. Because of the marked reduction in the OR following adjustment, and the possibility of incomplete control for confounding, we regarded it as "unclear whether prior vaping actually increases uptake of cigarette smoking". However, we did note the possibility of over-adjustment, with vaping at Wave 1 possibly having affected the recorded values of some of the variables used for adjustment.
At that time we noted that this possibility of over-adjustment could be addressed in analyses relating initiation of cigarette smoking at Wave 3 to vaping at Wave 2, restricting attention to those youths who, at Wave 1, had never vaped, and using adjustment variables recorded at Wave 1. This we have done in the analyses reported here, and our major finding is that adjustment reduced the excess risk far less, by only about 50% rather than about 80%, in our main analysis M1.
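The "about 50%" and "about 80%" figures refer to the proportion of the excess OR (OR-1) removed by adjustment. A minimal sketch of that arithmetic, using ORs quoted in this paper, is given below; the function name `excess_or_explained` is hypothetical.

```python
def excess_or_explained(unadjusted, adjusted):
    """Proportion of the excess odds ratio (OR - 1) removed by adjustment."""
    return (unadjusted - adjusted) / (unadjusted - 1.0)


# Main analysis M1: unadjusted OR 5.60; adjusted for propensity as a
# continuous variable, as quintiles, or for the 16 individual variables.
for adjusted in (3.37, 3.11, 3.27):
    print(f"M1, adjusted OR {adjusted}: {excess_or_explained(5.60, adjusted):.0%}")

# Earlier Wave 1 to 2 analysis [8]: unadjusted OR 5.70, reduced to 1.85
# after adjustment for the individual variables.
print(f"Earlier analysis: {excess_or_explained(5.70, 1.85):.0%}")
```

For M1 the three forms of adjustment explain roughly 48%, 54% and 51% of the excess OR, while the earlier adjustment for individual variables explains about 82%.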
While these results more strongly support the existence of a true gateway effect of taking up vaping, there must still remain doubt about its magnitude. One reason is that predictors recorded a year before the baseline may not fully account for the characteristics of the youth at the start of follow-up. A second reason is that, although the PATH study records data on a whole range of possibly relevant characteristics, there may be some relevant predictors not considered. A third reason is that the answers to some of the questions may have been inaccurate. We have attempted to address this problem of residual confounding by amending values of predictors recorded at Wave 1 to take into account data recorded at later Waves. However, this approach reintroduces the problem of over-adjustment, as Wave 2 and 3 values may have been affected by vaping. Theoretically, one could use data from Waves 1 to 4, using data for Waves 1 and 2 from youths who have never vaped to produce more accurate estimates of the predictors to use for a study of gateway effects between Waves 3 and 4. But this would add to the problem of using predictors recorded some time before follow-up.
Since the time that we published our earlier analysis [8] and our paper on general considerations relating to vaping as a possible gateway into cigarette smoking [7], a number of other authors have presented evidence from prospective studies [9][10][11][12][13][14][15]. The studies vary in the extent to which potential confounding variables have been adjusted for, with large OR estimates tending to be reported in studies with more limited control. Thus, a study in the Netherlands [15], which adjusted only for sex, age, education and a single indicator of propensity to smoke, reported an OR of 11.90 (95% CI 3.36-42.11) for the relationship between ever use of e-cigarettes with nicotine and initiation of cigarette smoking during follow-up. Also, a study in the US [9], which adjusted only for demographic variables and use of other tobacco products, reported ORs of 7.08 (2.34-21.42) and 3.87 (1.86-8.06) depending on the follow-up period studied, while another US study [13], with limited control for confounding variables, reported an OR of 3.57 (1.96-6.45). Apart from a US study [14], which reported an OR of 6.8 (1.7-28.3) following adjustment for ten covariates independently associated with initiation of smoking, most of the other studies which appear to have better control for confounding gave lower estimates. These included a study in Taiwan [10] which reported an OR of 2.14 (1.66-2.75), a study in Germany [12] which reported an OR of 2.18 (1.65-2.87) and a study in Finland [11] which reported that adjustment reduced the OR from 11.52 (4.91-26.56) to 2.92 (1.09-7.85). Notably, a study in Great Britain [16] reported an OR of 11.89 (3.56-39.72) estimated using the usual logistic method, but a reduced value of 1.34 (1.05-1.72) using causal mediation analysis.
Generally, our results are consistent with the literature in confirming that a substantial proportion, but not all, of the observed association between e-cigarette use and subsequent initiation of cigarette smoking can be explained by adjustment for factors linked to susceptibility to tobacco. However, large cohort studies with high-quality, accurate data on a wide range of predictive factors recorded at regular intervals will be needed to gain better insight into the magnitude of any true causal effect of vaping. The PATH study, with its multiple waves and comprehensive questionnaire, should prove increasingly useful in the future.
There are a number of theoretical beneficial and adverse effects of e-cigarettes [7]. Beneficial effects relate to individuals who would otherwise have smoked vaping instead, smokers who would otherwise have continued to smoke switching instead to vaping, vaping helping established smokers to quit, and vaping helping established smokers to reduce their cigarette consumption. Apart from vaping encouraging initiation of smoking, theoretical adverse effects occur if smokers intending to quit switch to vaping instead, or smokers add vaping to their normal cigarette consumption. It is clearly important, therefore, to take all these considerations into account when attempting to estimate the health impact of e-cigarettes.

Conclusions
By using data from three Waves of the PATH study, the analyses of the gateway effect reported here improve on those reported earlier [8] based on the first two Waves by allowing potential confounding variables to be determined at a time before vaping started. Whereas the earlier analyses suggested that the adjustment for confounding explained about 80% of the unadjusted relationship between vaping and subsequent initiation of smoking, our current analyses suggest that adjustment explains only about 50%. This provides stronger evidence of a true effect of vaping, although doubt still remains about its true magnitude for reasons discussed.

Availability of data and materials
The data from the PATH study are publicly available at https://pathstudyinfo.nih.gov