Keywords
electronic health records; electronic nicotine delivery systems; tobacco use; oral tobacco; real-world evidence
This article is included in the Health Services gateway.
Due to the lack of standardized codes, documentation of electronic nicotine delivery system (ENDS) use is non-existent in healthcare claims data and severely limited in structured electronic health record (EHR) data. We conducted a comprehensive assessment of the documentation of terms related to ENDS, nicotine pouch, and combustible tobacco use in unstructured EHR clinical notes.
In retrospective data from Optum’s EHR and Clinical Notes Databases (2007–2023), 13.6 million patients had ≥1 diagnosis code for combustible tobacco use history, and 5.2 million of these also had ≥1 clinical note. The clinical notes were searched for terms for ENDS, nicotine pouches, or combustible tobacco.
There were 4.4 million patients with ≥1 clinical note containing a term for ENDS, nicotine pouches, or combustible tobacco, accounting for 66.1 million notes. Seventy-seven ENDS-related terms were identified, in addition to 27 ENDS brand terms, and the terms used evolved over time. There were 8 terms and 10 brand names for nicotine pouches, and 118 terms for combustible tobacco.
Extensive ENDS use data are available in unstructured EHR clinical notes that can be extracted for research. When combined with additional data sources (structured EHR, claims, and survey linkage), these data may enable robust studies on the health impacts associated with the use of ENDS. Standardization and consistent documentation practices will further enhance the utility of these data for research.
There is a general consensus that using smoke-free products (SFP) like electronic nicotine delivery systems (ENDS) and nicotine pouches is associated with lower health risks than cigarette smoking.1,2 Compared to combustible cigarettes, SFPs have much lower levels of harmful and potentially harmful constituents (HPHC).3–8,9–11 Adults who switched from cigarettes to SFP not only substantially reduced their exposure to the corresponding HPHCs12–15 but also show either significant reductions or positive trends in the biomarkers of potential harm (BoPH).16,17 Epidemiological evidence on the health impact of switching from cigarette smoking to SFP is emerging but still limited, reporting mixed results.18–22 Interpretation of results from these early epidemiological studies often faces challenges in the cross-sectional design, limited number of participants in the studies, as well as varying degrees of control for important confounding factors, including history of cigarette smoking. Real-world data from healthcare claims and electronic health records (EHR) may enable a new approach to assess the health impact associated with the use of ENDS as they capture data on product usage patterns and health outcomes in diverse, real-world settings.19,23–25 However, due to the lack of standardized codes, the documentation of ENDS use is non-existent in healthcare claims data26 and severely limited in the structured fields of EHR data,27,28 in contrast to documentation for combustible tobacco products.
Previous studies using data from the EHR system of the Department of Veterans Affairs (VA) reported that documentation of ENDS use has increased substantially in recent years in the clinical notes and the terms used to describe ENDS are diverse and changed over time.29,30 The ENDS product landscape in the US has continued to evolve rapidly.31 More recently, new SFPs like nicotine pouches have also emerged, for which standardized codes are not available either. We conducted a survey on the documentation of ENDS, nicotine pouches as well as combustible tobacco products in clinical notes using recent data from much larger commercial EHR datasets that more closely reflect the general US population. This study aimed to describe the documentation of these tobacco products’ use in the unstructured clinical notes between 2007 and 2023 among a population of patients with a history of combustible tobacco use, since most adults who use ENDS and nicotine pouches had current or prior smoking history.19,32
This was a retrospective study using data from January 2007 through August 2023 (study period) from the Optum® de-identified Electronic Health Record data set (Optum EHR Database),33 which is derived from the electronic health records of a network of healthcare provider organizations across the United States. Clinical encounter data are aggregated from more than 2,000 hospitals and 7,000 clinics. The Optum EHR Database currently has more than 107 million unique patients across the United States, with an average of 56 months of observed data per patient. The Optum EHR Database is robust, longitudinally linked, HIPAA compliant, and statistically de-identified. The EHR clinical notes database comprises a subset of more than 45 million patients from the EHR Database34 that includes free-text clinical notes sourced from more than 50 large healthcare systems throughout the US. This study did not involve any human subjects, identifiable personal information, other HIPAA identifiers, or any other information that could lead to the identification of a patient. No patient’s or provider’s identity, medical records, electronic notes, or facility identifiers were disclosed for the purposes of this study. The study protocol was reviewed by the WCG Institutional Review Board (WCG IRB ID: IRB00000533) and determined to be exempt from IRB review; therefore, informed consent was not required.
The study population (Table 1) consisted of all patients who met the following criteria.
• Patients with at least one EHR record with an International Classification of Diseases (ICD) diagnosis code (Table 2) for the history of combustible tobacco use during the study period, and
• Presence of at least one clinical note during the study period, and
• Presence of at least one term of interest related to ENDS, nicotine pouches, or combustible tobacco in the clinical notes during the study period.
Terms of interest used to define the study population were classified into three product categories: ENDS, nicotine pouch, and combustible tobacco. ENDS terms fall into two groups: variants of vape and variants of e-cigarette. Combustible tobacco terms included terms related to current and former cigarette smoking, cigar smoking, and the use of water pipes. An initial list of terms for each category was constructed by subject matter experts incorporating data from existing literature. Additional terms were identified during the manual annotation of clinical notes in the process of developing a novel natural language processing (NLP) algorithm for extracting ENDS use data from EHRs (to be reported separately). The final lists of terms searched in this study for ENDS are listed in Table 3.
When searching for the terms of interest in clinical notes, capital and lower-case letters were treated as equivalent, i.e., the following terms were all interchangeable: e-cig, E-cig, and E-CIG. Additionally, the query looked for any sequence containing the term of interest, i.e., notes containing “vapes” or “vaped” would all be flagged as containing “vape”. Therefore, patient and note counts for longer terms (e.g., vapes, vaped) represent subsets of the overlapping shorter terms (e.g., vape).
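The matching rule described above can be illustrated with a minimal Python sketch. It assumes a plain case-insensitive substring test; the function names and sample notes are hypothetical, and the actual query used against the Optum databases is not described in this article.

```python
def note_contains(note_text: str, term: str) -> bool:
    """Case-insensitive substring match (assumed behavior of the query)."""
    return term.casefold() in note_text.casefold()


def terms_found(note_text: str, terms: list[str]) -> list[str]:
    """Return every search term that occurs anywhere in the note."""
    return [term for term in terms if note_contains(note_text, term)]


# Hypothetical notes and a small subset of search terms, for illustration only.
sample_notes = [
    "Pt reports using an E-CIG daily.",
    "Patient vaped for two years, quit in 2021.",
    "Denies tobacco use.",
]
search_terms = ["e-cig", "vape", "vaped", "vapes"]

for note in sample_notes:
    print(terms_found(note, search_terms))
```

In this sketch the second note is flagged for both “vape” and “vaped”, which is why counts for a longer term form a subset of the counts for the shorter term it contains.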
Patients with at least one clinical note containing a term (e.g., “vape”) or a set of terms (e.g., all vaping-related terms) (Table 3) were flagged, and the count of notes containing terms of interest was retained. Analyses were conducted for each year in the study period (2007–2023) to better understand how ENDS use documentation has evolved in recent years (excluding January to August 2023, which is a partial year). No statistical testing was performed. A patient with multiple notes containing term(s) of interest within each product category (i.e., ENDS, nicotine pouch, or combustible tobacco) was counted only once. Similarly, a note containing multiple instances of term(s) of interest within each product category was counted only once.
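The once-per-category counting rule can be sketched as follows. This is an illustrative example, not the study's code: the category-to-term map is a hypothetical subset of the lists in Table 3, and the (patient_id, note_id, text) record layout is an assumption.

```python
from collections import defaultdict

# Hypothetical subset of terms per product category (full lists are in Table 3).
CATEGORY_TERMS = {
    "ENDS": ["vape", "e-cig"],
    "nicotine pouch": ["nicotine pouch"],
    "combustible": ["current smoker", "former smoker"],
}


def count_by_category(notes):
    """notes: iterable of (patient_id, note_id, text) tuples.

    Returns {category: (unique_patients, unique_notes)}. Sets ensure that a
    patient or note is counted once per category, however many terms match.
    """
    patients = defaultdict(set)
    flagged_notes = defaultdict(set)
    for patient_id, note_id, text in notes:
        lowered = text.casefold()
        for category, terms in CATEGORY_TERMS.items():
            if any(term in lowered for term in terms):
                patients[category].add(patient_id)
                flagged_notes[category].add(note_id)
    return {c: (len(patients[c]), len(flagged_notes[c])) for c in CATEGORY_TERMS}


# Hypothetical records: patient p1 has two notes, patient p2 has one.
sample = [
    ("p1", "n1", "Former smoker, now vapes; vape use daily."),
    ("p1", "n2", "Continues vaping. Uses e-cig at night."),
    ("p2", "n3", "Current smoker."),
]
counts = count_by_category(sample)
print(counts)
```

Patient p1 has two ENDS notes but contributes one patient to the ENDS count, and note n1 is counted once for ENDS even though it contains two matching strings.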
Among the 107 million unique patients in the Optum EHR Database, 13.6 million patients had at least one diagnosis code for history of combustible tobacco use (EHR Population, Table 1). Of these, 5.2 million met the inclusion criterion of having at least one clinical note (Notes Population, Table 1). A total of 4.4 million patients had at least one term of interest related to ENDS, nicotine pouches, or combustible tobacco products in 66.1 million clinical notes; these patients formed the Study Population (Table 1). A detailed breakdown of patient and note counts in the full EHR Database by ICD diagnosis code is provided in Table 2.
| Criteria | Patients, n | Patients, % remaining | Notes, n | Notes, % remaining |
|---|---|---|---|---|
| EHR Population*: Patients with at least 1 EHR record with an ICD dx code for combustible tobacco use | 13,583,236 | | | |
| Notes Population: At least 1 clinical note in the observation period | 5,194,000 | 38.2% | 862,491,151 | |
| Study Population: At least 1 note containing a term for ENDS, nicotine pouches or combustible tobacco | 4,371,006 | 84.2% | 66,101,820 | 7.7% |
| ≥1 combustible tobacco term** | 3,443,072 | 78.8% | 48,432,066 | 73.3% |
| ≥1 ENDS term OR ≥1 ENDS brand term** | 1,207,884 | 27.6% | 7,738,198 | 11.7% |
| ≥1 nicotine pouch term OR ≥1 nicotine pouch brand term** | 1,217 | 0.03% | 2,653 | 0.004% |
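
The percentages in Table 1 can be re-derived from the reported counts. The short Python sketch below does so under the assumption that each population row is expressed relative to the row above it and each product-category row relative to the Study Population; the variable names are illustrative.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100 * part / whole, digits)


# Counts as reported in Table 1.
ehr_patients = 13_583_236
notes_patients = 5_194_000
study_patients = 4_371_006
all_notes = 862_491_151
study_notes = 66_101_820

attrition = {
    "Notes Population, patients": pct(notes_patients, ehr_patients),
    "Study Population, patients": pct(study_patients, notes_patients),
    "Study Population, notes": pct(study_notes, all_notes),
    "Combustible term, patients": pct(3_443_072, study_patients),
    "Combustible term, notes": pct(48_432_066, study_notes),
    "ENDS term, patients": pct(1_207_884, study_patients),
    "ENDS term, notes": pct(7_738_198, study_notes),
}

for label, value in attrition.items():
    print(f"{label}: {value}%")
```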
A total of 222 combustible tobacco terms, 78 ENDS terms plus 52 ENDS brand terms, and 10 nicotine pouch terms plus 12 nicotine pouch brand terms were pre-identified. Searching for these terms in the clinical notes found 118 combustible tobacco product terms, 77 ENDS terms plus 27 ENDS brand terms (104 total terms), and 8 nicotine pouch terms plus 10 nicotine pouch brand terms (18 total terms). The remaining pre-identified search terms were not found in the notes. A summary of patient and note counts for each of the three product categories among the Study Population is provided in Table 1.
3.2.1. ENDS-Related Terms
Of the 77 ENDS-related terms found in notes, 31 were variations of “e-cigarette” (e-cig variant terms) and 46 were variations of “vape” (vape variant terms). An additional 27 ENDS brand terms were identified. The full set of ENDS-related and ENDS brand terms searched in this study, and the corresponding number of patients and notes identified, can be found in Table 3.
The distributions of unique patient and unique note counts associated with key ENDS-related terms (vape, e-cig, and ENDS brand terms) are enumerated in Table 4. Because a patient could have multiple terms (e.g., both e-cig and vape variant terms) in their notes, a patient/note may appear in more than one row (e.g., both “≥1 Vape term” and “≥1 E-cig term”) in Table 4. Overall, during the 2007–2023 study period, vape variant terms were more frequently used than e-cigarette terms by both patient count (1,114,448 vs. 718,117) and note count (7,270,473 vs. 3,265,350), whereas ENDS brand names were infrequently captured in the notes (5,458 patients, 16,544 notes). Combining the 718,117 patients with “≥1 E-cig term” and the 1,114,448 patients with “≥1 Vape term” yielded a total of 1,205,838 patients with “≥1 Vape term OR ≥1 E-cig term”, an increase of only 91,390 over the count for vape variant terms alone, indicating that 87% [1 – (91,390/718,117)] of patients with e-cig variant terms also had vape variant terms in their clinical notes. Similarly, 86% [1 – (7,727,143 – 7,270,473)/3,265,350] of notes with e-cig variant terms also included vape variant terms. With its broadest definition, the bottom row of Table 4 represents the total numbers of unique patients (1,207,884, 27.6%) and notes (7,738,198, 11.7%) with documentation of ENDS use in the Study Population.
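The overlap figures in this paragraph follow from the inclusion-exclusion principle: the number of patients (or notes) with both kinds of term equals the two individual counts minus the count for either. A minimal Python sketch of that arithmetic, using only counts quoted in the text (the function name is illustrative):

```python
def overlap_share(count_a: int, count_b: int, count_either: int) -> float:
    """Share of group A that also belongs to group B.

    |A and B| = |A| + |B| - |A or B|, so the share is |A and B| / |A|.
    This equals 1 - (|A or B| - |B|) / |A|, the form used in the text.
    """
    both = count_a + count_b - count_either
    return both / count_a


# Patients: e-cig variant terms (A), vape variant terms (B), either.
patient_share = overlap_share(718_117, 1_114_448, 1_205_838)

# Notes: e-cig variant terms (A), vape variant terms (B), either.
note_share = overlap_share(3_265_350, 7_270_473, 7_727_143)

print(f"Patients with e-cig terms who also have vape terms: {patient_share:.0%}")
print(f"Notes with e-cig terms that also have vape terms: {note_share:.0%}")
```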
From 2007 to 2011, only e-cig variants were used in clinical notes. Documentation of e-cig variant terms continued to increase rapidly until 2021 and decreased thereafter. Vape variants first occurred in clinical notes in 2012, remained sparse until 2018 when their documentation increased dramatically, and surpassed e-cig variants in 2022 [Figure 1]. Overall, vape variant terms were more frequently used than e-cig variants during the 2007–2023 study period by patient count. Note count evolved in a pattern closely mirroring that for patient count shown in Figure 1, with much higher counts as expected (data not shown).

The same patient may be counted under multiple terms if these terms are found in the patient’s clinical notes. Patient counts for longer terms (e.g., vapes) represent a subset of overlapping shorter terms (e.g., vape). Patient counts are cumulative over time, i.e., counts in earlier years are subsumed in those for subsequent years.
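One possible reading of the cumulative counting described in this note is that each patient enters the count in the first year a term appears in their notes and remains counted in every later year. The Python sketch below illustrates that reading only; the function name, record layout, and sample data are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import accumulate


def cumulative_patients_by_year(mentions, years):
    """mentions: iterable of (patient_id, year) pairs for a single term.

    Each patient is counted from the earliest year of mention onward, so the
    count for any year includes the patients from all earlier years.
    """
    first_year = {}
    for patient_id, year in mentions:
        if patient_id not in first_year or year < first_year[patient_id]:
            first_year[patient_id] = year
    new_per_year = Counter(first_year.values())
    running = accumulate(new_per_year.get(year, 0) for year in years)
    return dict(zip(years, running))


# Hypothetical mentions of one term across three patients.
mentions = [("p1", 2012), ("p1", 2014), ("p2", 2013), ("p3", 2014), ("p2", 2012)]
series = cumulative_patients_by_year(mentions, range(2012, 2016))
print(series)
```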
In addition to the shift from e-cig to vape variants over time, the specific terms used within both the e-cig and vape variants have also evolved. For example, “e smoking” was the most popular e-cig variant up to 2018, when its use peaked; it was overtaken in 2019 by “e-cig”, which has remained the most popular e-cig variant since then. “Vape” was the most popular vape variant between 2012 and 2015, was surpassed by “vaping” in 2016, and regained the lead between 2019 and 2020. “Vaping” has been the dominant term since 2021.
Twenty-seven ENDS brand terms were identified (Table 3, with corresponding patient and note counts). There were no ENDS brand terms in the clinical notes until 2011. The highest patient count for the most popular brand remained below 2,500 per year throughout 2011–2023. Compared to the e-cig and vape variant terms discussed above, ENDS brand names were rarely documented in notes, limiting the feasibility of conducting brand-specific ENDS studies using EHR.
3.2.2. Nicotine Pouch Terms
Documentation of nicotine pouch use was first recorded for one patient in 2011 and remained almost non-existent between 2011 and 2018, with fewer than 10 patients per year. While increasing, documentation of nicotine pouch use in clinical notes is still sparse. Only 1,217 patients in the study population had such documentation in clinical notes, compared to 1,207,884 for ENDS (Table 1).
3.2.3. Combustible Tobacco Terms
The 118 combustible tobacco terms documented in clinical notes can be divided into 4 general groups: 49 for current smoking, 34 for former smoking, 21 for cigar, and 14 for water pipe. The two terms with the highest patient counts in each of these 4 groups are, respectively, “current smoker” (238,277) and “continues to smoke” (173,614); “former smoker” (1,152,965) and “ex-smoker” (109,708); “cigar” (44,446) and “smokes cigar” (8,083); and “hookah” (6,015) and “smokes hookah” (929).
While all 13,583,236 patients in the EHR Population (Table 1) had at least one ICD code indicating combustible product use history, combustible tobacco use was documented in clinical notes for only 3,443,072 of these patients (25.3%), in 48,432,066 of the 862,491,151 clinical notes (5.6%) (Table 1).
In this retrospective survey, we investigated terms used to document ENDS, nicotine pouch, and combustible tobacco use in EHR clinical notes among patients with a history of combustible tobacco use, and changes in the terms used over time. The number of terms used for each product category shows large variation in how tobacco product use is documented in clinical notes. This variation likely results from inconsistencies in documentation between EHR systems as well as among individual practitioners, and it presents major challenges in utilizing such data for studies on the health impact of SFP use.
Documentation of ENDS use in clinical notes has increased substantially in recent years to become a valuable data source for assessing the real-world impact of transitioning from combustible tobacco to ENDS, as well as the potential health impact of ENDS use among adults who did not have a history of using combustible products. However, while ENDS use was documented at least once for 27.6% of the study population, it was documented in only 11.7% of the notes, indicating ENDS use is still under-documented in clinical notes, which is consistent with earlier reports.28,29 Clinicians used a wide variety of terms to document ENDS use, and the terms used evolved substantially over time, which poses challenges for extracting consistent ENDS use information from clinical notes. E-cigarette variant terms were used exclusively before 2012, but by 2022, the latest full year of data available, vape variants were more commonly used (2.3 million notes and 599.6 thousand patients vs. 2.1 million notes and 567.3 thousand patients for e-cig variants). In contrast, ENDS brand names were captured infrequently in clinical notes (e.g., 4,400 notes, 2,600 patients in 2022). This evolution reflects changes in product popularity and terminology used by both consumers and healthcare providers. Understanding these changes is crucial for accurately capturing ENDS use history in future research that utilizes clinical notes. Results on ENDS use documentation from this study using data from a general US population are in line with an earlier report based on data from the Department of Veterans Affairs29 and can inform the creation of an ontology for ENDS, supporting further research into ENDS-related outcomes.
Consistent with earlier reports,28–30,35 combustible tobacco product use is also substantially under-documented in clinical notes, appearing for only about a quarter of the patients with ICD codes for combustible product use history in the Optum EHR dataset. Nevertheless, clinical notes contain details on combustible tobacco product use not captured by ICD codes and therefore still have value for real-world evidence studies. For example, since ICD diagnosis codes do not differentiate combustible product types, information from clinical notes will be particularly useful for studies on products other than cigarettes (e.g., cigars and water pipes). Combining both structured and unstructured data from EHR would likely yield more accurate data on combustible tobacco use than from either source alone.
Documentation of nicotine pouch use, while increasing, is still too limited for meaningful real-world evidence studies, with the highest annual patient count in the hundreds. It will likely take several years for patient counts with nicotine pouch documentation in the EHR to reach a level that makes them valuable for health outcomes research. More diligent and consistent documentation of tobacco product use in EHR by clinicians will enhance the utility of these data for real-world evidence studies. Standardization of SFP use documentation in EHRs (e.g., through the adoption of ICD codes, addition of structured fields to EHR systems, or the use of consistent terms) can substantially increase the efficiency as well as the quality of such real-world evidence studies.
A strength of this study is the use of a large EHR database comprising 107 million unique patients across the United States, with more than 45 million patients having clinical notes as the data source. This data source enables the results reported here to closely reflect the current practices of tobacco product use documentation in clinical notes in the US. Another strength is the study period; trends in documentation were captured from 2007 through 2023. The study has several limitations. The search strategy did not consider the context in which the terms appeared within the clinical notes, which would lead to overestimation of the patients and notes documenting actual ENDS use, as cases were counted even when the notes indicated no ENDS use. However, data on positive confirmation of no ENDS use would also be of value for identifying such cohorts for comparison in future studies. The study population identification approach does not capture patients with documentation of tobacco use in their clinical notes but without ICD diagnosis codes for combustible tobacco in their EHR. However, the missed population is likely to be relatively small since combustible tobacco use has been more broadly documented with ICD codes than in clinical notes.28–30,35 Since the terms identified in the study were based on an initial list from subject matter experts plus additional terms identified during the annotation of a subset of randomly sampled clinical notes, there might be other terms in the clinical notes not captured by this approach. The number of such terms is likely to be relatively small due to the large number of notes manually annotated during the study.
A large variation of terms has been used to document ENDS use in the unstructured clinical notes. Such documentation has increased substantially in recent years, generating a large body of ENDS use data. Importantly, these data can be extracted with tools like NLP algorithms for real-world evidence studies on the impact of transitioning from combustible tobacco to ENDS use. Clinical notes also contain additional information on combustible tobacco use not captured by ICD codes. In contrast, documentation of nicotine pouch use in clinical notes is still very limited. Findings from this study on the variation, distribution, and evolution over time, of terms used to document combustible tobacco, ENDS, and nicotine pouches in clinical notes can help researchers in planning and designing real-world evidence studies on the health impact of transitioning from combustible tobacco to modern smoke-free tobacco products, including ENDS and nicotine pouches. Combining data from clinical notes with data from other sources (e.g., structured EHR, claims, consumer surveys) will enable more robust real-world evidence studies. Standardization in the documentation of modern SFP use in EHR will further enhance the utility of these data for real-world evidence research.
Ethical review and approval were waived for this study because it did not involve any human subjects, identifiable personal information, other HIPAA identifiers, or any other information that could lead to the identification of a participant. All database records used in the study are statistically de-identified and certified to be fully compliant with US patient confidentiality requirements set forth in the Health Insurance Portability and Accountability Act of 1996. The study protocol was reviewed by the WCG Institutional Review Board (WCG IRB ID: IRB00000533), which determined it to be exempt from IRB review; informed consent was therefore not required.
This study was reviewed by an Institutional Review Board and was determined to be exempt from human subjects research oversight. The underlying electronic health record (EHR) data used are not publicly available due to restrictions on access to proprietary EHR data and the need to protect patient privacy. The study used de-identified data from Optum’s EHR and Clinical Notes databases, which are subject to data use agreements that prohibit public sharing. These datasets contain sensitive health information and are licensed for research use under specific contractual and ethical restrictions. Researchers interested in the data may request a license from Optum, subject to license agreements and approval by Optum. Requests should be directed to Pope, Derek at [email protected]. Aggregate results generated in this study are provided within the manuscript. Additional aggregate results can be made available upon reasonable request.
The authors gratefully acknowledge Abhinav Nayyar, Shashi Khan, Shikha Anand, Tamanna Tara, and Pankaj Kang of Optum for their contributions to data curation and model development and Noelle Gronroos and Ami Buikema of Optum for their contributions to study design and interpretation. The authors would also like to thank Dr. Yan Wang for her comments on the draft manuscript.
Third party trademarks are the property of their respective owners, are used for reference only, and are not intended to suggest any affiliation.