<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="data-paper" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.20193.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Data Note</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Curation of an intensive care research dataset from routinely collected patient data in an NHS trust.</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>McWilliams</surname>
                        <given-names>Chris</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3816-5217</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Inoue</surname>
                        <given-names>Joshua</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wadey</surname>
                        <given-names>Philip</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Palmer</surname>
                        <given-names>Graeme</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Santos-Rodriguez</surname>
                        <given-names>Raul</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Bourdeaux</surname>
                        <given-names>Christopher</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Engineering Mathematics, University of Bristol, Bristol, UK</aff>
                <aff id="a2">
                    <label>2</label>University Hospitals Bristol NHS Foundation Trust, Bristol, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:chris.mcwilliams@bristol.ac.uk">chris.mcwilliams@bristol.ac.uk</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:christopher.bourdeaux@uhbristol.nhs.uk">christopher.bourdeaux@uhbristol.nhs.uk</email>
                </corresp>
                <fn fn-type="con">
                    <p>GP, PW, JI and CM together extracted and pre-processed the data. GP, JI, PW and CB provided intimate knowledge of data collection procedures and systems. RS and CB conceived of the research dataset and oversaw its curation. CM conducted the coding and analysis. All authors contributed to writing the manuscript and approved the final version.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>19</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>1460</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>7</day>
                    <month>8</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 McWilliams C et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-1460/pdf"/>
            <abstract>
                <p>In this data note we provide the details of a research database of 4831 adult intensive care patients who were treated in the Bristol Royal Infirmary, UK between 2015 and 2019. The purposes of this publication are to describe the dataset for external researchers who may be interested in making use of it, and to detail the methods used to curate the dataset in order to help other intensive care units make secondary use of their routinely collected data. The curation involves linkage between two critical care datasets within our hospital and the accompanying code is available online. For reasons of data privacy the data cannot be shared without researchers obtaining appropriate ethical consents. In the future we hope to obtain a data sharing agreement in order to publicly share the de-identified data, and to link our data with other intensive care units who use a Philips clinical information system.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Intensive care</kwd>
                <kwd>electronic health record</kwd>
                <kwd>medical database</kwd>
                <kwd>research data</kwd>
                <kwd>critical care data</kwd>
                <kwd>ICNARC</kwd>
                <kwd>Philips</kwd>
                <kwd>clinical information system</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Above and Beyond</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/501100000266">
                    <funding-source>Engineering and Physical Sciences Research Council</funding-source>
                    <award-id>EP/R511663/1</award-id>
                </award-group>
                <funding-statement>CM was funded by the EPSRC Impact Acceleration Account (EP/R511663/1) with a contribution from Above and Beyond.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>The increasing use of clinical information systems on intensive care units (ICUs) means that large amounts of patient data are being generated as part of routine care. These data are stored in electronic health records (EHR) and represent a valuable resource with huge potential to improve patient care. Collaboration between clinicians, researchers and industry stakeholders is required to realise the potential of these data by developing new methodologies and digital technologies. However, there exists a more fundamental set of barriers to making the required data available for secondary use and until these barriers are overcome the ability to maximise patient benefit via data-driven approaches will be limited. Here we introduce what we see as the four main barriers, and then explain how the publication of this data note (and its associated methodology for data curation) contributes to overcoming these barriers.</p>
            <sec>
                <title>Barrier 1: Data format</title>
                <p>There is no standard format for storing intensive care EHR data. This is mainly due to two factors: differences between the proprietary formats used by different clinical information systems, and the high level of configurability of each system. EHR data are stored in proprietary formats designed by the companies who provide the data collection and storage software. In the intensive care units at our hospital we use the Philips ICCA clinical information system (CIS), which is currently the most widely deployed system across the NHS, with installation at 27 sites at the time of writing. Although the various available critical care CIS products do facilitate secondary data usage to some extent, they were all designed primarily as charting systems and therefore secondary use of the data is always a challenge. The main issue with ICCA is the high level of configurability of the system, meaning that data encoding can vary extensively between sites but can also change over time at a single site. The consequence of this configurability is that it can be challenging to locate and harmonise even a single simple data element, such as heart rate, for a cohort of patients over a period of time.</p>
            </sec>
            <sec>
                <title>Barrier 2: Data linkage</title>
                <p>There are two related issues around data linkage: 1) different types of data from different sources within the hospital (or beyond) need to be linked in order to make the data more useful to researchers; and 2) data from different hospitals need to be combined to increase data volume and therefore statistical power.</p>
                <p>The first issue relates to the scope of individual data sources. The ICCA database contains data collected routinely as part of patient care on ICU, but does not contain any information about what happened to the patient before or after their ICU stay. Therefore, taken in isolation, the data in ICCA are of limited use for research purposes. In order to make the data more useful they must be linked to other datasets that capture diagnoses, past medical history, outcomes etc. For this purpose we use data that is compiled locally for national audit by ICNARC (see Methods for more details). Linkage of our ICCA data to the local ICNARC data is a procedure that should be simple but is in fact challenging because of several error sources relating to the way that the data are collected. Developing a robust data linkage procedure has required an intimate knowledge of the data. Exposition of this data linkage procedure is one of the main purposes of this paper, because it will help other NHS trusts unlock secondary value from their data.</p>
                <p>The second issue relates to the fact that individual intensive care datasets are relatively small. The general intensive care unit at University Hospitals Bristol (UHB) has 20 beds and treats around 1300 patients each year. To date the research database contains 4831 patients, and this number will increase to approximately 6100 with the update at the end of 2019. Most machine learning algorithms need more cases than this to achieve good performance, hence the motivation to link datasets across hospitals. Two US-based critical care datasets have achieved high volumes of data via different means. MIMIC-III
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup> contains around 60,000 ICU admissions, collected from a single large teaching hospital with multiple units over a period of 12 years. Conversely the eICU database
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>, produced by Philips, contains around 200,000 patient stays from different hospitals over a period of two years
                    <sup>
                        <xref ref-type="other" rid="FN2">a</xref>
                    </sup>. The eICU data were collected with purpose-built software to facilitate high-frequency data collection in a coherent format. Both the MIMIC and eICU datasets are publicly available and their widespread use by researchers will be hugely beneficial to patients. In the UK the CCHIC
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup> has worked on linking data from multiple hospitals with different CIS products. The challenges posed by linking data from the different proprietary systems are significant, but the data have begun to be used by researchers affiliated with the CCHIC. We feel that focusing solely on data from a single CIS (e.g. ICCA) would significantly simplify the linkage process and that, given the widespread deployment of ICCA across the NHS, there is good potential to produce a large high-quality intensive care research database by linking data from ICCA sites only. The first stage in this process is to encourage and facilitate local pre-processing of the data at each site.</p>
            </sec>
            <sec>
                <title>Barrier 3: Data privacy</title>
                <p>There is a growing consensus that the best way to unlock value from data is to share them widely and openly with researchers. Given the sensitive nature of medical data there are important ethical issues to consider in this context. However, we are ultimately of the opinion that it is unethical 
                    <italic toggle="yes">not</italic> to use routinely collected data to improve patient care. Therefore, addressing the issues around data privacy requires the development of information governance frameworks to facilitate data sharing while ensuring transparency, trust and safeguarding of patient data. The public data sharing agreements of MIMIC and the eICU represent precedents in this area that the NHS should pursue in order to unlock maximum value from their data.</p>
                <p>In this data note we outline the steps we have taken to make our routinely collected critical care data &#x2018;research ready&#x2019; and provide some related resources via GitHub. Our intention is that this will contribute to overcoming the above barriers, particularly by helping other ICUs with the ICCA system to link and process their data for secondary use. Curating our data using the methods described here has expanded our capacity for clinical reporting. We now regularly review a wide range of practices such as proning, pressure area care and prescribing. We use clinical dashboards to show the status of beds on the unit in real time, and generate retrospective reports to study trends over time. We have previously published work on the effectiveness of our clinical dashboards in improving ventilation practice via behavioural nudges
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>,
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. Since then we have continued to expand the capabilities of the dashboards to support clinical decision making and improve the quality of care. We have collaborated with Philips on the development of a dashboard intervention for acute kidney injury
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>,
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup> and have begun to explore machine learning methods for the automatic classification of ward-dischargeable patients
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>.</p>
                <p>In the future, under the correct information governance framework, linkage between several ICUs with ICCA could produce a large high quality critical care research dataset. In the meantime we encourage researchers to consider using our data by obtaining the appropriate ethical consents (see 
                    <italic toggle="yes">Data availability</italic>) and provide a brief summary of the data that would be available to them.</p>
            </sec>
        </sec>
        <sec sec-type="materials | methods">
            <title>Materials and methods</title>
            <p>In this section we describe the processing that we have done so far to make our routinely collected data &#x2018;research ready&#x2019;. We first detail the two sources of our research data, then outline the procedure for linking data from these two sources and finally discuss the importance of further processing, including data harmonisation, to increase the general usability of these data. In the text we refer to open-source SQL and Python scripts that we have shared on our group GitHub account for readers wanting to process their own data in a similar way.</p>
            <sec>
                <title>Data sources</title>
                <p>
                    <bold>
                        <italic toggle="yes">ICCA.</italic>
                    </bold> Philips IntelliSpace Critical Care and Anesthesia information system (ICCA) is a patient monitoring, documentation and prescribing system used in the four intensive care units at our hospital
                    <sup>
                        <xref ref-type="other" rid="FN3">b</xref>
                    </sup>. ICCA collects rich data about a patient&#x2019;s condition, both via automated data streams from bedside monitors and manually input by health care providers. These data include ventilation details, medications and regular notes from medical staff. The data are stored in a reporting database, which is managed using Microsoft SQL Server and follows a star-schema that is well documented by Philips.</p>
                <p>The ICCA data are used by medical staff to monitor patients while they are on the unit, and secondary usage has traditionally focused on financial reporting within the trust to capture the value of care provided in each ICU stay. More recently we have started to make use of the data for clinical reporting and have established regular meetings to schedule work on reporting requests from clinicians.</p>
                <p>
                    <bold>
                        <italic toggle="yes">ICNARC.</italic>
                    </bold> The Intensive Care National Audit and Research Centre (ICNARC) is an independent national charity set up with funding from the Department of Health and the Welsh Health Common Services Authority in 1993. The Case Mix Programme (CMP)
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>, started in 1994, is one of ICNARC&#x2019;s main national audits and today provides a comprehensive dataset across 268 critical care units, covering 99% of all adult critical care units in the UK and Northern Ireland. The CMP dataset (currently version 3.1) consists of 209 data fields (as listed in Table S1, 
                    <italic toggle="yes">Extended data</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>), which overlap with most of the 34 data fields in the Critical Care Minimum Data Set (CCMDS)
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup> and include the CCMDS subset of all 14 mandatory data fields used to generate the Healthcare Resource Group (HRG). This data is collected for every patient that passes through a CMP participating ICU and covers: basic demographic information; pre-admission details including past medical history and reason for ITU admission (using the ICNARC Coding Method); severity during the first 24 hours; number of days of organ support during their ICU stay and outcomes on both leaving the unit and then final discharge from hospital. The purpose of the audit is to provide a national resource for research and a local and national benchmarking tool for individual critical care units.</p>
                <p>Ward Watcher
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup> is the bespoke proprietary software (provided by Critical Care Audit Ltd) we use in the trust to collect this CMP dataset before sending it off to ICNARC. This software allows us to collect extra information for each patient that is not sent to ICNARC but is used within the Trust to generate detailed custom reports. It has been configured to automatically generate new records when a new admission is entered into a bed space on the Philips ICCA system and will pull data from the flowsheet and completed forms in ICCA for manual verification.</p>
            </sec>
            <sec>
                <title>Data linkage</title>
                <p>A careful procedure is required to link datasets from different sources to produce valid and usable data. Here we describe our procedure for linking data from ICCA and ICNARC to produce patient records with both routinely collected ICU data and outcome descriptors. This method will be useful for any intensive care unit using the ICCA system that wants to make secondary use of its data in-house. The method is also detailed step by step in an IPython notebook (see Script S1, 
                    <italic toggle="yes">Software availability</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>).</p>
                <p>The main challenge to overcome is that erroneous entries in both datasets prevent a clean link. Without these errors the linkage would be a simple case of joining data tables on a unique identifier corresponding to each ICU stay. Therefore, we must first identify the erroneous entries and handle them according to the type of error that produced them. This procedure would not be possible without an intimate first-hand knowledge of the data and the way they are generated. There are three stages in the data linkage: first we handle the errors in the ICNARC data, then we handle the errors in the ICCA data and finally we link the two datasets together.</p>
            </sec>
            <sec>
                <title>Handling ICNARC errors</title>
                <p>Every patient record in the ICNARC data
                    <sup>
                        <xref ref-type="other" rid="FN4">c</xref>
                    </sup> is manually validated by the data team, so we can be sure that each record corresponds to a real ICU stay and contains valid patient data. In the Ward Watcher software each ICNARC patient record links to an identifier in ICCA called the 
                    <italic toggle="yes">encounterId</italic>. In theory the 
                    <italic toggle="yes">encounterId</italic> uniquely identifies each ICU stay that has been captured in the CIS. However, there are various sources of error in the ICCA 
                    <italic toggle="yes">encounterIds</italic> which break the one-to-one mapping with patient records in Ward Watcher. For a small number of cases the patient record in Ward Watcher points to an empty or corrupt ICU stay in ICCA. In these cases we simply redirect the record in Ward Watcher to point to the correct stay in ICCA. For completeness we also create a new column to record the erroneous ICU stay that was pointed to originally.</p>
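                <p>The redirection step described above can be sketched as follows. This is an illustration only and is not the code from Script S1: the record layout, the function name <italic toggle="yes">redirect_records</italic>, the column name <italic toggle="yes">originalEncounterId</italic> and the example identifiers are all hypothetical, and we assume that the mapping from erroneous to correct <italic toggle="yes">encounterIds</italic> has already been established by manual review.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from typing import Dict, List


def redirect_records(records: List[dict], corrections: Dict[int, int]) -> List[dict]:
    """Return copies of the records with erroneous encounterIds redirected.

    `corrections` maps an erroneous encounterId (an empty or corrupt stay)
    to the encounterId of the correct stay. The mapping is assumed to come
    from manual review; nothing here detects errors automatically.
    """
    fixed = []
    for rec in records:
        new = dict(rec)  # copy, so the input records are left untouched
        bad_id = rec["encounterId"]
        if bad_id in corrections:
            new["encounterId"] = corrections[bad_id]
            # Hypothetical audit column: the stay that was pointed to originally.
            new["originalEncounterId"] = bad_id
        else:
            new["originalEncounterId"] = None
        fixed.append(new)
    return fixed


# Made-up example: record "B" points to an empty stay (999) instead of stay 102.
records = [
    {"icnarcRecord": "A", "encounterId": 101},
    {"icnarcRecord": "B", "encounterId": 999},
]
fixed = redirect_records(records, {999: 102})
```
]]></code>
                <p>Keeping the original identifier in a separate column means that the correction can be audited or reversed later.</p>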
            </sec>
            <sec>
                <title>Handling ICCA errors</title>
                <p>When patients are admitted to ICU, a record with a unique 
                    <italic toggle="yes">encounterId</italic> is manually created in ICCA. All data associated with that ICU stay is linked with this 
                    <italic toggle="yes">encounterId</italic> until the patient is discharged from ICU, at which point they are manually removed from the system. Since the admission and discharge actions in ICCA are conducted manually and are not retrospectively validated, there is potential for a number of different types of error. For example, patients can be admitted and discharged erroneously leading to phantom, nested or disjointed stays. All the potential types of error are listed in Table S2 (
                    <italic toggle="yes">Extended data</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>), but there are broadly two classes of error, which are handled differently: 1) multiple 
                    <italic toggle="yes">encounterIds</italic> corresponding to a single ICU stay; and 2) multiple actual ICU stays with a single 
                    <italic toggle="yes">encounterId</italic>. For the first class of error, we replace the duplicate 
                    <italic toggle="yes">encounterIds</italic> with the original 
                    <italic toggle="yes">encounterId</italic> that was created for that stay such that a single coherent record is produced. We again produce a new column (specifically in the 
                    <italic toggle="yes">D_Encounters</italic> table) to record the duplicate 
                    <italic toggle="yes">encounterIds</italic> that have been replaced. For the second class of error there is no simple solution that could be robustly automated, so we leave these cases for manual processing by individual researchers
                    <sup>
                        <xref ref-type="other" rid="FN5">d</xref>
                    </sup>. To facilitate manual processing we introduce another column (to the table 
                    <italic toggle="yes">D_Encounters</italic>) which specifies the type of error, if any, associated with each 
                    <italic toggle="yes">encounterId</italic>.</p>
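                <p>A minimal sketch of how the two classes of error might be handled is given below. It is not taken from our shared scripts. The function name, the column names <italic toggle="yes">replacedEncounterIds</italic> and <italic toggle="yes">errorType</italic>, and the error labels are hypothetical, and we assume that the duplicate and multiple-stay cases have already been identified.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def build_encounter_table(encounter_ids, duplicate_of, multi_stay_ids):
    """Build one row per coherent ICU stay, keyed by encounterId.

    encounter_ids  : all encounterIds found in the CIS
    duplicate_of   : {duplicate encounterId: original encounterId} (class 1)
    multi_stay_ids : encounterIds that cover several real stays (class 2)

    Assumes every original id in `duplicate_of` is present in `encounter_ids`
    and is not itself listed as a duplicate.
    """
    table = {}
    for eid in encounter_ids:
        if eid in duplicate_of:
            continue  # duplicates are folded into the original record
        table[eid] = {"replacedEncounterIds": [], "errorType": None}

    # Class 1: record which duplicate ids were replaced by the original id.
    for dup, orig in sorted(duplicate_of.items()):
        table[orig]["replacedEncounterIds"].append(dup)
        table[orig]["errorType"] = "duplicate_ids_merged"

    # Class 2: no robust automatic fix, so only flag for manual processing.
    for eid in multi_stay_ids:
        table[eid]["errorType"] = "multiple_stays_manual_review"
    return table


# Made-up example: id 2 duplicates stay 1, and id 4 covers several stays.
table = build_encounter_table(
    encounter_ids=[1, 2, 3, 4],
    duplicate_of={2: 1},
    multi_stay_ids={4},
)
```
]]></code>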
            </sec>
            <sec>
                <title>Linking</title>
                <p>Having handled the errors in both datasets, we now have a one-to-one mapping between ICNARC records and stays in ICCA. We then extract all the CMP patient data from Ward Watcher in a standard XML format and use it to produce another table in our research database called 
                    <italic toggle="yes">D_Icnarc</italic>. This table has one row for each ICU stay and one column for each of the 209 variables in the CMP dataset, and links to other tables via 
                    <italic toggle="yes">encounterId</italic> and 
                    <italic toggle="yes">ptCensusId</italic>
                    <sup>
                        <xref ref-type="other" rid="FN6">e</xref>
                    </sup>.</p>
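                <p>Once the mapping is one-to-one, the link itself reduces to a join on the shared identifiers. The sketch below illustrates this with an in-memory SQLite database rather than the Microsoft SQL Server reporting database. The table and key names follow the text, but the column <italic toggle="yes">unitOutcome</italic> and all values are hypothetical placeholders, not real CMP fields or patient data.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_Encounters (encounterId INTEGER PRIMARY KEY, ptCensusId INTEGER);
CREATE TABLE D_Icnarc (encounterId INTEGER, ptCensusId INTEGER, unitOutcome TEXT);
INSERT INTO D_Encounters VALUES (1, 10), (2, 20);
INSERT INTO D_Icnarc VALUES (1, 10, 'outcome_a'), (2, 20, 'outcome_b');
""")

# Check 1: no encounterId may appear more than once in D_Icnarc.
duplicates = conn.execute(
    "SELECT encounterId FROM D_Icnarc GROUP BY encounterId HAVING COUNT(*) > 1"
).fetchall()

# Check 2: every D_Icnarc row must match a stay in D_Encounters.
orphans = conn.execute(
    """SELECT i.encounterId
       FROM D_Icnarc AS i
       LEFT JOIN D_Encounters AS e ON e.encounterId = i.encounterId
       WHERE e.encounterId IS NULL"""
).fetchall()

# The link: one row per ICU stay, joined on both identifiers.
linked = conn.execute(
    """SELECT e.encounterId, e.ptCensusId, i.unitOutcome
       FROM D_Encounters AS e
       JOIN D_Icnarc AS i
         ON i.encounterId = e.encounterId AND i.ptCensusId = e.ptCensusId
       ORDER BY e.encounterId"""
).fetchall()
```
]]></code>
                <p>Running the two checks before the join makes any remaining violation of the one-to-one mapping visible, instead of letting it silently duplicate or drop rows in the linked table.</p>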
            </sec>
            <sec>
                <title>Data harmonisation</title>
                <p>The configurability of ICCA means that the way interventions are encoded can change over time. For retrospective studies it is necessary to search for medical concepts and variables in the SQL database, which can be time consuming. We have provided a well commented SQL script (see Script S2, 
                    <italic toggle="yes">Software availability</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>) for locating variables in the back end of ICCA which should be useful for anyone working with the system. In general the best strategy is to search on the 
                    <italic toggle="yes">longLabel</italic> for interventions and on the 
                    <italic toggle="yes">shortLabel</italic> for the corresponding attributes, and then to calculate usage frequency to confirm that the variable located is in use. In the future we hope to produce a software tool for variable location that is usable by those without knowledge of SQL or experience of working with ICCA.</p>
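                <p>The search strategy can be illustrated with the sketch below, which is not Script S2. It uses an in-memory SQLite database in place of the ICCA reporting database, and the table names, the <italic toggle="yes">chartTime</italic> column and all rows are hypothetical stand-ins; only the <italic toggle="yes">longLabel</italic> field is taken from the text. The query searches intervention labels for a term and counts how often each match has been charted.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_Intervention (interventionId INTEGER PRIMARY KEY, longLabel TEXT);
CREATE TABLE PtAssessment (interventionId INTEGER, chartTime TEXT);
INSERT INTO D_Intervention VALUES
    (1, 'Heart Rate'), (2, 'Heart rate (legacy)'), (3, 'Respiratory Rate');
INSERT INTO PtAssessment VALUES
    (1, '2019-01-01 10:00'), (1, '2019-01-01 11:00'), (3, '2019-01-01 10:00');
""")


def locate_interventions(conn, term):
    """Find interventions whose longLabel contains `term`, with usage counts.

    The LEFT JOIN keeps labels that were never charted (count 0), so that
    unused encodings can be told apart from the ones actually in use.
    """
    query = """
        SELECT d.interventionId, d.longLabel, COUNT(p.interventionId) AS n_uses
        FROM D_Intervention AS d
        LEFT JOIN PtAssessment AS p ON p.interventionId = d.interventionId
        WHERE d.longLabel LIKE ?
        GROUP BY d.interventionId, d.longLabel
        ORDER BY n_uses DESC, d.interventionId
    """
    # SQLite's LIKE is case-insensitive for ASCII characters by default.
    return conn.execute(query, (f"%{term}%",)).fetchall()


matches = locate_interventions(conn, "heart rate")
```
]]></code>
                <p>In this made-up example two labels match the search term but only one has any charted values, which is the situation the usage-frequency check is intended to reveal.</p>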
            </sec>
            <sec>
                <title>Ethics</title>
                <p>The full database is stored on a secure hospital server to which only UHB data managers have access. We follow the guidelines of the NHS Health Research Authority Confidentiality Advisory Group
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. Curation of the data for internal audit and service evaluation does not require research ethics approval. For projects that extend beyond routine reporting, we produce de-identified extracts of the required data with sensitive information removed (names, dates of birth, addresses, rare diagnoses, etc.).</p>
            </sec>
        </sec>
        <sec>
            <title>Dataset validation</title>
            <p>The ICNARC data are validated internally at our hospital and externally at the national office, so we can have confidence in their validity. The above procedure for data linkage also removes erroneous entries in the ICCA data. However, users of the data must be aware that there are other sources of error in CIS data. In particular, some data are entered manually (medical notes, free-form laboratory results, etc.) and are therefore vulnerable to data-entry errors. Certain data fields are populated automatically (e.g. from bedside monitors) but are not stored until a nurse confirms that the value is representative. Such fields are therefore valid when recorded but subject to missing values.</p>
            <p>In 
                <xref ref-type="table" rid="T1">Table 1</xref> we provide a brief summary of 27 selected physiological variables to indicate the type of data contained in the database, including the frequency of recording of different variables and the extent of missing data values. We also provide a demographic summary of the patients represented in the data (
                <xref ref-type="table" rid="T2">Table 2</xref>). Readers are referred to Supplementary Figures S1&#x2013;S4 (see 
                <italic toggle="yes">Extended data</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>) for further demographic information, and Supplementary Figures S5&#x2013;S7 for distributions of variable values.</p>
            <table-wrap id="T1" orientation="portrait" position="anchor">
                <label>Table 1. </label>
                <caption>
                    <title>Summary of selected variables.</title>
                    <p>&#x2018;Record completeness&#x2019; is the proportion (between 0 and 1) of ICU stays that contain at least one recording of the variable. &#x2018;Frequency recorded&#x2019; is the number of times the variable is recorded per hour for the ICU stays that contain records of that variable. (Note: these frequencies are calculated over the full length of stay and so may understate the recording rate when a variable is measured only during a subset of the stay.)</p>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1">Variable</th>
                            <th align="center" colspan="1" rowspan="1">Value, mean (&#x00b1;1
                                <italic toggle="yes">s</italic>.
                                <italic toggle="yes">d</italic>.)</th>
                            <th align="center" colspan="1" rowspan="1">Record completeness, proportion</th>
                            <th align="center" colspan="1" rowspan="1">Frequency recorded, mean (&#x00b1;1
                                <italic toggle="yes">s</italic>.
                                <italic toggle="yes">d</italic>.)</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Heart rate</td>
                            <td align="center" colspan="1" rowspan="1">85.88 (&#x00b1;19.06)</td>
                            <td align="center" colspan="1" rowspan="1">0.997</td>
                            <td align="center" colspan="1" rowspan="1">0.836 (&#x00b1;0.311)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">GCS</td>
                            <td align="center" colspan="1" rowspan="1">10.47 (&#x00b1;4.75)</td>
                            <td align="center" colspan="1" rowspan="1">0.993</td>
                            <td align="center" colspan="1" rowspan="1">0.284 (&#x00b1;0.133)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Central Temperature</td>
                            <td align="center" colspan="1" rowspan="1">36.10 (&#x00b1;1.80)</td>
                            <td align="center" colspan="1" rowspan="1">0.245</td>
                            <td align="center" colspan="1" rowspan="1">0.547 (&#x00b1;0.666)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Peripheral Temperature</td>
                            <td align="center" colspan="1" rowspan="1">37.06 (&#x00b1;0.96)</td>
                            <td align="center" colspan="1" rowspan="1">0.984</td>
                            <td align="center" colspan="1" rowspan="1">0.292 (&#x00b1;0.123)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Respiratory rate</td>
                            <td align="center" colspan="1" rowspan="1">18.43 (&#x00b1;11.19)</td>
                            <td align="center" colspan="1" rowspan="1">0.996</td>
                            <td align="center" colspan="1" rowspan="1">1.310 (&#x00b1;0.923)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">FiO2</td>
                            <td align="center" colspan="1" rowspan="1">36.50 (&#x00b1;14.57)</td>
                            <td align="center" colspan="1" rowspan="1">0.841</td>
                            <td align="center" colspan="1" rowspan="1">0.922 (&#x00b1;0.789)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">PEEP</td>
                            <td align="center" colspan="1" rowspan="1">8.02 (&#x00b1;2.75)</td>
                            <td align="center" colspan="1" rowspan="1">0.509</td>
                            <td align="center" colspan="1" rowspan="1">0.535 (&#x00b1;0.387)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Airway</td>
                            <td align="center" colspan="1" rowspan="1">-</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.671 (&#x00b1;0.297)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">pO2</td>
                            <td align="center" colspan="1" rowspan="1">10.87 (&#x00b1;5.71)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.348 (&#x00b1;0.313)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">pCO2</td>
                            <td align="center" colspan="1" rowspan="1">5.62 (&#x00b1;1.41)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.350 (&#x00b1;0.312)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">SpO2</td>
                            <td align="center" colspan="1" rowspan="1">95.71 (&#x00b1;3.57)</td>
                            <td align="center" colspan="1" rowspan="1">0.995</td>
                            <td align="center" colspan="1" rowspan="1">0.810 (&#x00b1;0.309)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Non-Invasive BP Mean</td>
                            <td align="center" colspan="1" rowspan="1">83.91 (&#x00b1;19.37)</td>
                            <td align="center" colspan="1" rowspan="1">0.834</td>
                            <td align="center" colspan="1" rowspan="1">0.254 (&#x00b1;0.367)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Non-Invasive BP Systolic</td>
                            <td align="center" colspan="1" rowspan="1">124.32 (&#x00b1;26.62)</td>
                            <td align="center" colspan="1" rowspan="1">0.839</td>
                            <td align="center" colspan="1" rowspan="1">0.259 (&#x00b1;0.364)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Non-Invasive BP Diastolic</td>
                            <td align="center" colspan="1" rowspan="1">65.87 (&#x00b1;18.17)</td>
                            <td align="center" colspan="1" rowspan="1">0.838</td>
                            <td align="center" colspan="1" rowspan="1">0.259 (&#x00b1;0.364)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Arterial BP Mean</td>
                            <td align="center" colspan="1" rowspan="1">80.04 (&#x00b1;18.34)</td>
                            <td align="center" colspan="1" rowspan="1">0.953</td>
                            <td align="center" colspan="1" rowspan="1">0.700 (&#x00b1;0.357)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Arterial BP Systolic</td>
                            <td align="center" colspan="1" rowspan="1">119.99 (&#x00b1;24.73)</td>
                            <td align="center" colspan="1" rowspan="1">0.954</td>
                            <td align="center" colspan="1" rowspan="1">0.698 (&#x00b1;0.356)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Arterial BP Diastolic</td>
                            <td align="center" colspan="1" rowspan="1">59.31 (&#x00b1;14.07)</td>
                            <td align="center" colspan="1" rowspan="1">0.954</td>
                            <td align="center" colspan="1" rowspan="1">0.698 (&#x00b1;0.356)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum sodium</td>
                            <td align="center" colspan="1" rowspan="1">137.27 (&#x00b1;5.57)</td>
                            <td align="center" colspan="1" rowspan="1">0.999</td>
                            <td align="center" colspan="1" rowspan="1">0.454 (&#x00b1;0.452)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum pH</td>
                            <td align="center" colspan="1" rowspan="1">7.40 (&#x00b1;0.09)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.350 (&#x00b1;0.312)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum potassium</td>
                            <td align="center" colspan="1" rowspan="1">4.38 (&#x00b1;0.60)</td>
                            <td align="center" colspan="1" rowspan="1">0.999</td>
                            <td align="center" colspan="1" rowspan="1">0.451 (&#x00b1;0.451)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum ionised calcium</td>
                            <td align="center" colspan="1" rowspan="1">1.13 (&#x00b1;0.15)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.351 (&#x00b1;0.314)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum bicarbonate</td>
                            <td align="center" colspan="1" rowspan="1">25.65 (&#x00b1;4.84)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.622 (&#x00b1;0.492)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum urea</td>
                            <td align="center" colspan="1" rowspan="1">9.13 (&#x00b1;6.91)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.107 (&#x00b1;0.184)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Serum creatinine</td>
                            <td align="center" colspan="1" rowspan="1">105.55 (&#x00b1;89.35)</td>
                            <td align="center" colspan="1" rowspan="1">0.990</td>
                            <td align="center" colspan="1" rowspan="1">0.107 (&#x00b1;0.184)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Bilirubin</td>
                            <td align="center" colspan="1" rowspan="1">23.92 (&#x00b1;48.74)</td>
                            <td align="center" colspan="1" rowspan="1">0.990</td>
                            <td align="center" colspan="1" rowspan="1">0.098 (&#x00b1;0.161)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Platelets</td>
                            <td align="center" colspan="1" rowspan="1">246.14 (&#x00b1;151.25)</td>
                            <td align="center" colspan="1" rowspan="1">0.992</td>
                            <td align="center" colspan="1" rowspan="1">0.111 (&#x00b1;0.337)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Haemoglobin</td>
                            <td align="center" colspan="1" rowspan="1">101.87 (&#x00b1;22.86)</td>
                            <td align="center" colspan="1" rowspan="1">0.991</td>
                            <td align="center" colspan="1" rowspan="1">0.109 (&#x00b1;0.337)</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <table-wrap id="T2" orientation="portrait" position="anchor">
                <label>Table 2. </label>
                <caption>
                    <title>Demographic summary of the cohort represented in the research dataset.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="center" colspan="1" rowspan="1">Variable</th>
                            <th align="center" colspan="1" rowspan="1">Value</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Total ICU stays</td>
                            <td align="center" colspan="1" rowspan="1">4831</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Gender, proportion female</td>
                            <td align="center" colspan="1" rowspan="1">0.396</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Age, median years (IQR)</td>
                            <td align="center" colspan="1" rowspan="1">64.2 (50.8, 63.4)</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">LOS, median days (IQR)</td>
                            <td align="center" colspan="1" rowspan="1">2.9 (1.7, 5.4)</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Readmission to ICU, # (%)</td>
                            <td align="center" colspan="1" rowspan="1">147 (3.0)</td>
                        </tr>
                        <tr>
                            <td align="center" colspan="1" rowspan="1">Mortality, # (%)</td>
                            <td align="center" colspan="1" rowspan="1">905 (18.7)</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
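            <p>The two recording statistics reported in Table 1 can be computed as in the following Python sketch, which is an illustration and not the code used to produce the table. The input dictionaries and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the estimator is not stated.</p>
            <code language="python" xml:space="preserve"><![CDATA[
```python
from statistics import mean, stdev


def summarise_variable(stay_hours, record_counts):
    """Summarise how completely and how often one variable is recorded.

    stay_hours:    {stay_id: length of stay in hours}
    record_counts: {stay_id: number of recordings of the variable}
    """
    # Stays with at least one recording of the variable.
    recorded = [s for s in stay_hours if record_counts.get(s, 0) > 0]
    # Record completeness: fraction of all stays with at least one recording.
    completeness = len(recorded) / len(stay_hours)
    # Recordings per hour over the full length of stay, for recorded stays only.
    rates = [record_counts[s] / stay_hours[s] for s in recorded]
    # Sample standard deviation (an assumption); needs at least two stays.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return completeness, mean(rates), spread


# Four hypothetical stays; stay 3 has no recordings of the variable.
stay_hours = {1: 24.0, 2: 48.0, 3: 10.0, 4: 72.0}
record_counts = {1: 24, 2: 24, 4: 18}
completeness, mean_rate, sd_rate = summarise_variable(stay_hours, record_counts)
```
]]></code>
            <p>Because the rate is taken over the full length of stay, a variable recorded hourly for only part of a stay yields a rate below one per hour, which is the effect noted in the caption of Table 1.</p>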
        </sec>
        <sec>
            <title>Future work</title>
            <p>The curation of these data has highlighted to us the importance of close collaboration between the people and teams responsible for collecting, administering and validating the data. The more that is known about an intensive care dataset (the way the data are collected, the way they are affected by clinical practice, idiosyncrasies in the digital systems involved, operational factors), the more information can be extracted from it and, ultimately, the more value we can deliver to patients. In the future we will continue to improve and expand this research database. In particular, we will work with colleagues in NICU, PICU and CICU to link data from the other intensive care units in our hospital. We will also look to include datasets from across the trust to capture information about patient hospital admissions outside the ICU.</p>
            <p>We hope to work with external collaborators to develop a robust method for de-identifying medical notes. Finally, we will explore the possibility of linking with data from other NHS trusts that also use ICCA in their ICUs. Eventually, the expansion of these research data will require more extensive data harmonisation to combine multiply-defined clinical concepts and, crucially, a bespoke information governance framework to allow us to bring the data to researchers. We note that there is a precedent for such governance agreements in other projects referenced previously
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Underlying data</title>
                <p>The sensitive nature of these data means that they are available only internally to UHB staff, for clinical audit and service evaluation activities under the CAG guidelines. For external researchers, ethical approval may be obtained via formal application to the NHS Integrated Research Application System (IRAS) for a specific research project. The IRAS website (
                    <ext-link ext-link-type="uri" xlink:href="https://www.myresearchproject.org.uk/">www.myresearchproject.org.uk</ext-link>) has full instructions; however, interested parties are advised to contact the corresponding author (
                    <email xlink:href="mailto:christopher.bourdeaux@uhbristol.nhs.uk">christopher.bourdeaux@uhbristol.nhs.uk</email>) to discuss the application.</p>
            </sec>
            <sec>
                <title>Extended data</title>
                <p>Zenodo: UHBristolDataScience/data-note-extended-data 
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.3361287">https://doi.org/10.5281/zenodo.3361287</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>.</p>
                <p>This project contains the following extended data:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>Table S1: extended_tables/icnarc_cmp_dataset_properties.xlsx</p>
                    </list-item>
                    <list-item>
                        <p>Table S2: extended_tables/icca_encounterid_error_types.xlsx</p>
                    </list-item>
                    <list-item>
                        <p>Figure S1: extended_figures/admisson_types_discharge_reasons.png</p>
                    </list-item>
                    <list-item>
                        <p>Figure S2: extended_figures/discharge_time_histograms.png</p>
                    </list-item>
                    <list-item>
                        <p>Figure S3: extended_figures/reasons_for_admission.png</p>
                    </list-item>
                    <list-item>
                        <p>Figure S4: extended_figures/stay_length_histograms.png</p>
                    </list-item>
                    <list-item>
                        <p>Figures S5&#x2013;S7: extended_figures/variable_hists[1-3].png</p>
                    </list-item>
                </list>
                <p>Extended data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>

                <bold>Script S1 available from:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/UHBristolDataScience/ICNARC-to-Philips-Linkage/blob/v1.0.1/dataset_curation.ipynb">GitHub (file &#x2018;clean_encounterids.py&#x2019;)</ext-link>.</p>
            <p>

                <bold>Script S2 available from:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/UHBristolDataScience/ICNARC-to-Philips-Linkage/blob/master/variable_location_in_ICCA.sql">GitHub (file &#x2018;variable_location_in_ICCA.sql&#x2019;)</ext-link>.</p>
            <p>

                <bold>Archived code at time of publication:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.3358750">https://doi.org/10.5281/zenodo.3358750</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>.</p>
            <p>

                <bold>Licence:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/UHBristolDataScience/ICNARC-to-Philips-Linkage/blob/v1.0.0/LICENSE">MIT licence</ext-link>.</p>
        </sec>
        <sec>
            <title>Notes</title>
            <p id="FN2">
                <sup>a</sup>This is the publicly available component of the eICU dataset. The full dataset held by Philips is much larger.</p>
            <p id="FN3">
                <sup>b</sup>The use of the same database by the four units is one source of error in the data (e.g. erroneous transfers or patients being attached to the wrong unit identifier).</p>
            <p id="FN4">
                <sup>c</sup>Note that in some very rare cases stays are excluded from the ICNARC data.</p>
            <p id="FN5">
                <sup>d</sup>For example, researchers may wish to simply remove such cases, although removal would likely introduce some bias since these cases usually represent readmissions to ICU. Alternatively they may wish to manually split the stay into two records.</p>
            <p id="FN6">
                <sup>e</sup>The 
                <italic toggle="yes">ptCensusId</italic> in ICCA uniquely identifies spells in different units during the same ICU stay.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to thank all of our colleagues within UHB who have made these data curation activities possible and who continue to support secondary use of the data. In particular: Russell McDonald-Bell, Matt Rogers, Colin Salandy, Amy Smith, all the members of UHBDataScience and our colleagues in NICU, PICU and CICU who work with ICCA. The expertise of Brian Millar (author of the Ward Watcher software) and Phil Stuart-Douek (Philips) has also been essential.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pollard</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shen</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MIMIC-III, a freely accessible critical care database.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2016</year>;<volume>3</volume>: 160035.
                    <pub-id pub-id-type="pmid">27219127</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2016.35</pub-id>
                    <pub-id pub-id-type="pmcid">4878278</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pollard</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>AEW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raffa</surname>
                            <given-names>JD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The eICU Collaborative Research Database, a freely available multi-center database for critical care research.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2018</year>;<volume>5</volume>: 180178.
                    <pub-id pub-id-type="pmid">30204154</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2018.178</pub-id>
                    <pub-id pub-id-type="pmcid">6132188</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Harris</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brealey</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database.</article-title>
                    <source>

                        <italic toggle="yes">Int J Med Inform.</italic>
</source>
                    <year>2018</year>;<volume>112</volume>:<fpage>82</fpage>&#x2013;<lpage>89</lpage>.
                    <pub-id pub-id-type="pmid">29500026</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ijmedinf.2018.01.006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bourdeaux</surname>
                            <given-names>CP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thomas</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gould</surname>
                            <given-names>TH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Increasing compliance with low tidal volume ventilation in the ICU with two nudge-based interventions: evaluation through intervention time-series analyses.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2016</year>;<volume>6</volume>(<issue>5</issue>):<fpage>e010129</fpage>.
                    <pub-id pub-id-type="pmid">27230998</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2015-010129</pub-id>
                    <pub-id pub-id-type="pmcid">4885280</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bourdeaux</surname>
                            <given-names>CP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Birnie</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trickey</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of an intervention to reduce tidal volumes in ventilated ICU patients.</article-title>
                    <source>

                        <italic toggle="yes">Br J Anaesth.</italic>
</source>
                    <year>2015</year>;<volume>115</volume>(<issue>2</issue>):<fpage>244</fpage>&#x2013;<lpage>251</lpage>.
                    <pub-id pub-id-type="pmid">25979150</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bja/aev110</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pachucki</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Palanisamy</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>1354: Acute kidney injury (AKI) progression during the first five days of an ICU stay.</article-title>
                    <source>

                        <italic toggle="yes">Critical Care Medicine.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>(<issue>1</issue>):<fpage>660</fpage>.
                    <pub-id pub-id-type="doi">10.1097/01.ccm.0000529357.08587.74</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pachucki</surname>
                            <given-names>MA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Eshelman</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Descriptive study of differences in acute kidney injury progression patterns in general and cardiac intensive care units.</article-title>
                    <source>

                        <italic toggle="yes">Journal of the Intensive Care Society.</italic>
</source>
                    <year>2018</year>; 1751143718771261.
                    <pub-id pub-id-type="doi">10.1177/1751143718771261</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McWilliams</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lawson</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Santos-Rodriguez</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards a decision support tool for intensive care discharge: machine learning algorithm development using electronic healthcare data from MIMIC-III and Bristol, UK.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>2019</year>;<volume>9</volume>(<issue>3</issue>):<fpage>e025925</fpage>.
                    <pub-id pub-id-type="pmid">30850412</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2018-025925</pub-id>
                    <pub-id pub-id-type="pmcid">6429919</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <collab>ICNARC</collab>:
                    <article-title>A brief history of the CMP.</article-title><year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.icnarc.org/Our-Audit/Audits/Cmp/About/History">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McWilliams</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>UHBristolDataScience/ICNARC-to-Philips-Linkage: Software resources for data curation.</article-title>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.3358750">http://www.doi.org/10.5281/zenodo.3358750</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <collab>NHS Digital</collab>:
                    <article-title>Critical care minimum data set overview</article-title>.<year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.datadictionary.nhs.uk/data_dictionary/messages/supporting_data_sets/data_sets/critical_care_minimum_data_set_fr.asp">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Millar</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Ward watcher</article-title>.<year>2014</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.sicsag.scot.nhs.uk/Data/wardWatcher.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <collab>Health Research Authority</collab>:
                    <article-title>Guidance for CAG applicants</article-title>.<year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.hra.nhs.uk/about-us/committees-and-services/confidentiality-advisory-group/guidance-confidentiality-advisory-group-applicants">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McWilliams</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Inoue</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>UHBristolDataScience/data-note-extended-data.</article-title>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.3361287">http://www.doi.org/10.5281/zenodo.3361287</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report52675">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22180.r52675</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Harris</surname>
                        <given-names>Steve</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52675a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4982-1374</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lee</surname>
                        <given-names>Min Ji</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52675a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r52675a1">
                    <label>1</label>University College Hospital, London, UK</aff>
                <aff id="r52675a2">
                    <label>2</label>Bloomsbury Institute of Intensive Care Medicine, University College London, London, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>9</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Lee MJ and Harris S</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport52675" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20193.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors describe a curated dataset of 4831 adult intensive care patients treated at the Bristol Royal Infirmary between 2015 and 2019. Two critical care data sources (ICCA and ICNARC) are linked and curated to create a single comprehensive &#x2018;research ready&#x2019; dataset. By publishing the curation process, the authors aim to help external researchers make secondary use of their own routinely collected data. Fundamental barriers to making the required data available for secondary research use are discussed. Due to privacy constraints, the dataset is not fully published, but external researchers may gain access through a formal application process.</p>
            <p>From the perspective of a novice data scientist and clinician, this was an insightful and informative paper. The data note explains the rationale, barriers and methodologies, allowing transparency and reproducibility for interested external researchers. Scripts outlining the dataset curation process were easy to follow, with step-by-step commentary. Making this information accessible provides the opportunity for deeper understanding, in particular for those new to data science but curious about its potential. This is important given the need for close collaboration between clinicians, researchers and industry stakeholders to realise the full potential of routinely collected data to improve patient care.</p>
            <p>The authors discuss how the publication of the data note and curation methodology contributes to overcoming the barriers of data format and data linkage. However, its role in mitigating barriers associated with data privacy is less clear. Further explanation may be of interest, as the tension between maintaining data privacy and the usability of data for researchers is highly relevant in this field.</p>
            <p>To illustrate the barriers related to data format, the authors describe the challenges in locating and harmonising a single data element, such as heart rate, within the Philips ICCA clinical information system (CIS). The high level of configurability, where data elements can be renamed and relabelled between sites, can prevent cross-site collaboration and the sharing of these modifications through code review, even when sites use the same CIS. As these factors are beyond the researcher&#x2019;s direct control, we would welcome the authors&#x2019; perspective on how commercial companies could make this process easier.</p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>No</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Intensive Care Medicine.</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report52676">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22180.r52676</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Young</surname>
                        <given-names>J Duncan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52676a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6838-4835</uri>
                </contrib>
                <aff id="r52676a1">
                    <label>1</label>Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Young JD</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport52676" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20193.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>General comments</bold>
            </p>
            <p>Thank you for the opportunity to review this manuscript. I have reviewed the text and figures. However, I am not an expert on IT/computing and so have not commented on the details of the SQL and Python code in the repositories.</p>
            <p>This paper describes the curation and linking of two databases containing information on patients treated on intensive care units in Bristol. The first database (ICCA) contains detailed information on patients&#x2019; stay in a single Trust&#x2019;s intensive care unit collected as part of routine care. The second contains data submitted on the same patients to a national comparative audit programme (ICNARC CMP). The manuscript is well written and clear.</p>
            <p>Similar challenges have been addressed elsewhere (notably by the CCHIC teams in the UK and the MIMIC-III team in the USA), though very little detailed information has been published on the processes and problems these teams encountered. This paper addresses some of that lack of detail.</p>
            <p> The paper contains descriptions of the curation and linking processes. There are no data linked directly to the paper, as publishing identifiable patient data is not permitted. However, contact details are given to allow interested researchers to explore obtaining appropriate permissions to interrogate the data.</p>
            <p>In general, the paper is informative and useful. It might benefit from a brief comment on how generalisable these methods are to other patient groups for which highly granular data are collected, such as patients treated in emergency departments or those undergoing surgery or invasive procedures.</p>
            <p> 
                <bold>Minor detailed comments</bold>
            </p>
            <p> The sentence on ICNARC&#x2019;s origin should probably be modified to read &#x201c;The Intensive Care National Audit and Research Centre (ICNARC) is an independent national charity 
                <italic>originally</italic> set up with funding from the Department for Health and the Welsh Health Common Services Authority in 1993&#x201d; as funding now comes from different sources.</p>
            <p>&#x201c;Barrier 3: Data privacy&#x201d;. The MIMIC-III and eICU programmes are able to share data publicly because the data are anonymised. There is no mention of this approach or of the difficulties with true anonymisation; rather, this paper assumes data will be accessed using ethical approvals.</p>
            <p> It might be helpful to emphasise that the XML file format that Wardwatcher software uses to export ICNARC CMP data is common to all the different software packages used to collect ICNARC data, and is not a software-specific format.</p>
            <p>Use of &#x2018;intensive care&#x2019; as an adjective (e.g. &#x201c;&#x2026;intensive care EHR data&#x201d;) is common in published papers but is probably best avoided.</p>
            <p> 
                <bold>Supplementary material graphics comments</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Discharge reasons bar chart: No X axis labels.</p>
                    </list-item>
                    <list-item>
                        <p>Discharge time histogram: X axis labels at 5h 33m 20s intervals. Why this unusual spacing?</p>
                    </list-item>
                    <list-item>
                        <p>Stay length: Unusual to use a logged Y axis for these graphs, though I assume this is because of the high frequency of single day stays.</p>
                    </list-item>
                    <list-item>
                        <p>Variables histogram 1: FiO
                            <sub>2</sub> is fractional, not %. Units needed for heart rate, haemoglobin, respiratory rate, SpO
                            <sub>2</sub>, and blood pressures on X axes.</p>
                    </list-item>
                    <list-item>
                        <p>Variable histograms 2&amp;3: Attention to all X axis units as above. SI notation for partial pressures (PO
                            <sub>2</sub>, PCO
                            <sub>2</sub>) uses a capitalised &#x201c;P&#x201d;.</p>
                    </list-item>
                </list>
            </p>
            <p>Are sufficient details of methods and materials provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Is the rationale for creating the dataset(s) clearly described?</p>
            <p>Yes</p>
            <p>Are the datasets clearly presented in a useable and accessible format?</p>
            <p>No</p>
            <p>Are the protocols appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>My expertise is in critical care and associated interrogation of routinely collected healthcare data. I have no expertise in coding and have made this clear in the report. Please note the wording below (&#x201c;I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard&#x201d;) has been added to my peer review report by the publishers and was neither a part of the report I submitted nor do I have any control over this addition.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment4861-52676">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>McWilliams</surname>
                            <given-names>Chris</given-names>
                        </name>
                        <aff>University of Bristol, UK</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>29</day>
                    <month>8</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Professor Young - Thank you for taking the time to read and review our manuscript. We appreciate your comments and will act on your suggestions to produce a revised version once we have received the other peer review reports. We will then also provide a detailed response to your review.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
