Assessing the development and implementation of the Global Trigger Tool method across a large health system in Sicily

Background: The Institute for Healthcare Improvement (IHI) has proposed a new method, the Global Trigger Tool (IHI GTT), to detect and monitor adverse events (AEs) and provide information to implement improvement. In 2015, the Sicilian Health System adopted IHI GTT to assess the number, types and severity levels of AEs. The GTT was implemented in 44 of 73 Sicilian public hospitals and 18,008 clinical records (CRs) were examined. Here we present the standardized application of the GTT and the preliminary results of 14,706 reviews of CRs.
Methods: IHI GTT was adapted to the local context, and developed and implemented. Reviews of CRs were conducted by 199 professionals divided into 71 review teams consisting of three individuals: two with clinical knowledge and expertise, and a physician to authenticate the AE. The reviewers entered data into a dedicated IT-platform. All 44 of the public hospitals were included, with approximately 300,000 inpatient yearly admissions out of a population of approximately 5 million. In total, 14,706 CRs of inpatients from medicine, surgery, obstetric and ICU wards, from June 2015 to June 2018, were reviewed.
Results: In 975 (6.6%) CRs at least one AE was found. Approximately 20,000 patients of the 300,000 discharged each year in Sicily have at least one AE. In 5,574 (37.9%) CRs at least one trigger was found. A total of 1,542 AEs were found. The analysis of the ROC curve shows that the presence of two triggers in a CR indicates an AE with a high probability. The most frequent type of AE was in-hospital related infection.
Conclusions: The GTT is an efficient method to identify AEs and to track improvement of care. The analysis and monitoring of some triggers is important to prevent AEs. However, it is a labor-intensive method, particularly if the CRs are paper based.


Introduction
Safety is one of the domains of quality in healthcare. Improving the safety of patients is a political priority worldwide, as studies on the safety of patients have drawn attention to the high rates of health care-related harm 1-3 . Improving patient safety requires effective and reliable methods to identify and monitor adverse events (AEs) so that learning can take place and improvements can be made.
Even though several methods to detect AEs are available, there is no universally recognized method that reliably provides a comprehensive overview of the extent of the problem. These methods include incident reporting, clinical records (CRs) review and automated extraction using hospital administrative data, for example Patient Safety Indicators (PSI) as developed by the Agency for Healthcare Research and Quality (AHRQ). Incident reporting is the most commonly used method to detect AEs in hospitals, but is based on voluntary reporting. Despite considerable efforts by local hospitals, reporting systems only detect a limited number of AEs 4 . The effectiveness of automated extraction using hospital administrative data for detecting AEs depends on the accuracy of data compilation. CRs review, as used in the Harvard Medical Practice Study, is very labor intensive, thereby limiting its use 4 . As a result, health services, governments and researchers have focused on developing harm detection tools.
This paper is one of the first to report the findings of the application of the Global Trigger Tool, developed by the Institute for Healthcare Improvement (IHI), across the whole health system in Sicily. In 2015, the Sicilian Health System adopted IHI GTT 5 to assess the number, types and severity levels of AEs.

IHI GTT
The measurement instrument used in our study is the Italian version of the IHI GTT 6 . The Italian version was adapted to be appropriate to the regional context 7 . The triggers are grouped into the same seven categories as the original version (care, medication, surgical, intensive care, obstetric, pediatrics and emergency care); however, changes to some triggers have been introduced. We did not consider triggers and AEs that were present on admission and we added three new triggers: change in procedure anesthesia, duration of surgery greater than 6 hours, and hospital stay greater than five days after delivery 7 .

Sampling selection
The sampling method followed that recommended by the IHI GTT, with 10 inpatient CRs randomly selected monthly. From the ordered sequence of the numbering of the CRs of the period under evaluation, every 10th CR was selected (i.e. the 10th, 20th, 30th, 40th, 50th, etc.). If fewer than 100 patients were discharged during that month, we removed the previously selected CRs from the sequence and selected further CRs in the same way (one CR every 10). Eligibility criteria were an admission lasting more than 24 hours and complete administrative data. In the Intensive Care Unit (ICU), the CRs of all patients discharged during the reference period were reviewed.
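The selection rule described above can be sketched in Python. This is an illustrative sketch only, not the procedure implemented on the IT-platform: the function name `systematic_sample`, its parameters, and the assumption that selection is repeated over the remaining records until 10 CRs are reached are hypothetical.

```python
def systematic_sample(record_ids, step=10, target=10):
    """Select every `step`-th record from an ordered list of CR identifiers.

    Assumption (not stated in the protocol text): when one pass yields fewer
    than `target` records, the selected records are removed and the same rule
    is applied again to the remaining records.
    """
    remaining = list(record_ids)
    selected = []
    while len(selected) < target:
        # 10th, 20th, 30th, ... record of the current ordered sequence
        picks = remaining[step - 1::step]
        if not picks:
            break  # fewer than `step` records left: nothing more to select
        picks = picks[:target - len(selected)]
        selected.extend(picks)
        chosen = set(picks)
        remaining = [r for r in remaining if r not in chosen]
    return selected


# A month with 100 eligible discharges: one pass is enough.
print(systematic_sample(range(1, 101)))
# A month with 55 eligible discharges: a second pass over the remainder is needed.
print(systematic_sample(range(1, 56)))
```

With 55 records the first pass returns only five CRs, so the sketch makes a second pass over the 50 records that were not selected.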

Review team
As per the IHI protocol, each review team was composed of three individuals: two with clinical knowledge and expertise on patient clinical documentation, and a physician whose role was to authenticate the findings and the severity rating of the AEs. In total, 199 reviewers were divided into 71 teams. Where possible, the review team remained consistent over time.

Review process
We excluded triggers and/or events that took place outside the time of the patient admission to the hospital and considered only triggers and AEs that occurred during hospitalization. The two clinical reviewers each audited all the CRs independently. We used five worksheets: general care, medication, surgical, obstetric and intensive care, with some changes in accordance with the IHI GTT. The CRs were examined following the order of the sections described in the IHI GTT. The limit for the review of each patient record was 20 minutes. The "20-minute rule" was applied to all records regardless of size 5. The reviewers entered data into a dedicated IT-platform developed by our IT team (based on JavaScript, HTML and PHP) 8.

Statistical analysis
For the statistical analysis we used SPSS ver. 20. We also used it for the Receiver Operating Characteristic (ROC) curve analysis.

Results
From June 2015 to June 2018, 18,008 CRs from 105 wards of 44 Sicilian public hospitals were examined. In this study, we analyzed 14,706 CRs relating to patients discharged from 89 medicine, surgery, obstetrics and intensive care wards. In 5,574 (37.9%) CRs at least one trigger was found. In 7 CRs an AE was detected without a trigger being present. A total of 1,542 AEs were detected in 975 CRs (Table 1). The identification of triggers allowed us to identify corresponding AEs (Table 2).
This analysis allowed us to highlight that isolated triggers are not always a good indicator of AEs. A Receiver Operating Characteristic (ROC) curve analysis shows that the presence of two triggers in a CR indicates a high probability that an AE has occurred (Figure 1). In CRs with a high frequency of triggers, a corresponding number of AEs was not always detected. On the contrary, as indicated in Table 3, some triggers were associated with a large number of AEs. Triggers and AEs were also analyzed when isolated triggers were identified (Table 4). For example, the isolated trigger C01 (Blood products use) was present in 483 cases, but identified only two AEs, and the trigger M05 (Rising BUN or serum creatinine >2 times the baseline) did not identify any AEs. AEs were classified using the 2009 edition of the WHO International Classification for Patient Safety (ICPS) 9, and a clinical classification developed by our group (Table 5). The most frequent types of AEs observed were in-hospital related infections, surgical complications, pressure ulcers, acute kidney injury, and procedure complications.
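The headline percentages can be re-derived from the counts reported in this paper (14,706 CRs reviewed, 5,574 with at least one trigger, 975 with at least one AE, 1,542 AEs in total). The short Python check below is only an arithmetic illustration; the variable names and the helper `pct` are hypothetical, and the yearly projection assumes that the sampled rate applies to all 300,000 yearly discharges.

```python
# Counts as reported in the abstract and results
total_crs = 14_706          # CRs reviewed
crs_with_trigger = 5_574    # CRs with at least one trigger
crs_with_ae = 975           # CRs with at least one AE
total_aes = 1_542           # AEs detected overall
yearly_discharges = 300_000


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


trigger_rate = pct(crs_with_trigger, total_crs)        # 37.9
ae_rate = pct(crs_with_ae, total_crs)                  # 6.6
ae_rate_among_triggered = pct(crs_with_ae, crs_with_trigger)  # 17.5
aes_per_affected_cr = round(total_aes / crs_with_ae, 2)       # 1.58

# Projection to the yearly discharges, assuming the sampled rate holds
projected_patients = round(yearly_discharges * crs_with_ae / total_crs)  # 19890

print(trigger_rate, ae_rate, ae_rate_among_triggered,
      aes_per_affected_cr, projected_patients)
```

The projection of roughly 19,900 patients is consistent with the figure of approximately 20,000 patients per year given in the abstract.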

Discussion
The evaluation of the quality and safety of health systems is difficult, but has become a priority of healthcare funders and organizations. Outcome, management and patient satisfaction indicators are available to measure the different dimensions of health care quality, but reliable measurements of safety have been elusive. Many methodologies and indicators, such as the PSI developed by AHRQ, and the review of health documentation and incident reporting are currently used. The documentation and study of AEs, i.e. where they occur, and the type and degree of harm, is essential to promote specific opportunities for improvement interventions and to evaluate effectiveness of any intervention over time.
The IHI GTT is one methodology proposed to detect and monitor AEs and provide information to implement improvement. At present, compared to other methods, it may be the best methodology to use 4. A systematic review reported the use of GTT methodology in 15 countries in 44 hospitals, with 79,004 clinical records examined 10. The data are an underestimation, as the report did not include some comprehensive Swedish and Norwegian studies 11. Recently, papers have been published in Italy, Austria, China and Russia 12-16. A critical appraisal of the studies and their results is difficult, as the methodology used is heterogeneous, protocols are often adapted to the local context, the populations studied are different, and the skills of the reviewers vary. For this study, we adapted the IHI GTT to the local context in Sicily: we did not consider triggers and AEs identified at the admission of the patient, and we modified some triggers. In this study, the triggers were analyzed both when associated with other triggers and when isolated. In both cases the correlation with AEs was analyzed. In general, it would appear that little attention is paid to triggers if they are not related to an AE. Instead, many triggers of the IHI GTT protocol could be considered to be a measure of near misses and potential AEs. These include a decrease of Hb or Ht >25%, readmission within 30 days, transfer to a higher level of care, Clostridium difficile-positive stool, PTT >100 seconds, INR >6, glucose <50 mg/dl, rising BUN or serum creatinine >2 times baseline, blood loss >500 mL (after vaginal delivery) or >1000 mL (Cesarean section), and readmission to ICU.

Rates of AEs
In a systematic review, de Vries et al. 18 reported that in 8 studies that included 74,485 CRs, the median overall incidence of in-hospital AEs was 9.2%. Another systematic review, covering 44 hospitals with 79,004 CRs, reported an incidence between 7 and 51% 10. In the Sicilian public hospitals, 1,542 AEs were detected in 975 clinical records, corresponding to an incidence of 6.6% of CRs examined, and to 17.5% of CRs with triggers. It would appear that the percentage of clinical records with triggers increases with the increase of percentage of CRs examined, compared to the patients discharged, while the percentage of CRs with AEs remains more stable (Figure 2).
ICUs have the highest incidence of AEs, both with respect to CRs examined (30.4%) and those with triggers (35.9%) (Table 1). This could be because patients are transferred to the ICU after an AE whose cause lay in another clinical setting. We analyzed the AEs by comparing them with the triggers that allowed their identification:
▪ Most AEs are associated with general care triggers (n=6,103) (Table 2). If triggers are isolated, AEs are more frequently associated with care triggers (n=80) (Table 4).
▪ The triggers related to general care have been identified 7,497 times, with an AE in 6,103 (81.4%).
▪ Medications-related triggers have been identified 2,878 times, with an AE in 1,525 (53%).
▪ Intensive-care-related triggers have been identified 1,817 times with an AE in 1,924 (i.e. it is very common for triggers to identify more AEs in the same patient) (Table 1 and Table 2).
However, if isolated triggers are considered, the intensive-care-related triggers were detected 46 times and were correlated with only one AE (Table 4). This observation suggests that isolated triggers rarely allow an AE to be identified and that the strength of the IHI GTT methodology is linked to the association of triggers. The detection of many triggers in a CR is associated with a high probability of AEs. The analysis of the ROC curve (Figure 1) shows that detecting two triggers in a CR is sufficient to indicate, with high probability, that an AE is present in that CR.
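How a cut-off on the number of triggers per CR can be read off a ROC analysis is sketched below in Python. The sketch is not the SPSS analysis used in this study: the records in `toy_records` are invented for illustration and are not study data, the function names are hypothetical, and choosing the cut-off by Youden's index (sensitivity + specificity - 1) is an assumption, since the criterion used to read the cut-off from Figure 1 is not stated here.

```python
def roc_points(records):
    """Sensitivity and specificity for each rule "at least k triggers".

    `records` is a list of (trigger_count, has_ae) pairs, one per CR.
    Assumes the list contains CRs both with and without an AE.
    """
    positives = sum(1 for _, has_ae in records if has_ae)
    negatives = len(records) - positives
    points = []
    for k in sorted({count for count, _ in records}):
        true_pos = sum(1 for count, has_ae in records if has_ae and count >= k)
        false_pos = sum(1 for count, has_ae in records if not has_ae and count >= k)
        sensitivity = true_pos / positives
        specificity = 1 - false_pos / negatives
        points.append((k, sensitivity, specificity))
    return points


def best_cutoff(points):
    """Cut-off maximizing Youden's index (an assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


# Invented example: 16 CRs as (number of triggers, AE confirmed?)
toy_records = (
    [(0, False)] * 6 + [(1, False)] * 3 + [(1, True)] * 1
    + [(2, False)] * 1 + [(2, True)] * 3 + [(3, True)] * 2
)

for k, sens, spec in roc_points(toy_records):
    print(f"k>={k}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print("cut-off:", best_cutoff(roc_points(toy_records)))
```

On this invented data the rule "at least two triggers" gives the best trade-off, which mirrors the shape of the argument above but does not reproduce the study's result.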
We classified the AEs using the ICPS 2009 classification and a clinical classification developed by our group. In both classifications, hospital acquired infections are the most frequent AEs (Table 5), observed in the ICU in 625 clinical records (84.3%). Surgical complications (n=175) were observed in 55.9% (n=99) in ICU, i.e. they were AEs in patients undergoing surgery and then transferred to ICU due to the onset of a complication. In 44.6% (n=80), the AEs are represented by hemorrhagic complications (intra- and post-operative hemorrhages or hematomas). Pressure ulcer lesions were detected in 172 cases, usually in the ICUs (n=112; 65.1%). The complications from procedures (n=109) were also observed mainly in the ICU (n=78; 71.5%). In total, 33 complications from procedures are related to orotracheal intubation, 22 to central venous catheters, and 10 to childbirth analgesia. The complications of childbirth (n=47) were most frequently bleeding (59.5%; n=28) and lacerations (29.8%; n=14).

Limitations
Our study has several limitations. The first is that no inter-rater reliability assessment of the review teams is available. The second limitation is the underlying quality of CRs, which may have affected the results. However, all reviewers received the same training and each team followed the same protocols to ensure reliability.

Conclusion
The Global Trigger Tool is an effective method to identify AEs and track improvement of care. It gives clinical teams an understanding of the patient safety issues that are present in their clinical area, as well as opportunities to improve. With active involvement of clinical teams, it places patient safety at the centre of clinical activity and fosters a culture of safety. It also provides an effective way to assess the quality of the clinical records. The drawback is that the process is labor intensive, particularly if the clinical records are paper based. The introduction of electronic medical records would allow a quicker process, with automated identification of triggers and the possibility to link triggers together in the identification of adverse events and near misses, especially where there has been more than one trigger 19. Finally, we conclude that the analysis and monitoring of some triggers, as potential indicators of near misses, is important to prevent adverse events.

Ethical considerations
Since the data used in this study were gathered during routine practice and are used for analysis of hospital procedures, no ethical approval was obtained. Every patient gave written informed consent on admission to hospital for the use of their data for scientific research. This consent is the "Information on the processing of personal data" with reference to the Italian law n.

The Harvard Medical Practice Study (HMPS - cited by the authors) used post-discharge chart review to detect care-associated injuries (adverse events - AEs) that occurred during hospitalization. Under HMPS, trained nurses reviewed a random selection of charts. If those nurses discovered what they judged to be care-associated injuries, they flagged those events in the chart as a potential AE. Charts that contained one or more AEs were forwarded to 2 independent physician reviewers. If both physician reviewers judged, for each AE, that event had occurred, then it was reported as a confirmed AE. Those physicians also judged whether each AE was avoidable, and whether each could be considered negligent (substandard) care.
The IHI Global Trigger Tool (IHI GTT) built on the HMPS methodology. It attempted to improve the ability of initial nurse reviewers to detect potential AEs, by providing a set of 51 "review triggers," falling into 7 major subcategories -explicit initial events that, if detected in the chart, chained to examination of other specific events. Initial assessments of the IHI GTT showed much higher detection rates for AEs than did the original HMPS methodology, which itself far exceeded typical voluntary reporting mechanisms.
This study reviews the use of the IHI GTT in 44 Sicilian public hospitals across a 3 year time period -June 2015, through June 2018. The study's authors adapted the IHI GTT to their specific environment. They also added 3 additional review triggers, beyond those included in the original IHI GTT. They focused their analysis on a subset of all charts assessed with their modified IHI GTT: They analyzed only charts for patients hospitalized on medicine, surgery, obstetrics, and intensive care wards.
As this study notes, many other groups are using the IHI GTT to detect AEs. This work is useful because it supplies empiric observation of one such use, across a large system of hospitals and an extended period of time.
This report has at least 2 major differences from other work in the field. First, only 6.6% of records reviewed had at least one care-associated adverse event. Other studies have shown much higher AE rates. The authors note this in their text, and suggest it may reflect differences in local environments and chart review methods. Second, most other IHI GTT reviews show adverse drug events (medication-related events - overdoses, drug-drug interactions, and allergic or idiosyncratic reactions) as the dominant category of AEs detected. In this study, they are a distant number 3. Why?

Specific suggestions:
It appears that the entire IHI GTT program assessed a total of 18,008 records from among 105 hospital wards from June 2015 through June 2018. However, this analysis examines only medicine, surgery, obstetrics, and intensive care wards. Those represented a total 89 wards, with a total of 14,706 records reviewed.
This is unclear in both your abstract and your text. It would be very helpful to more clearly explain how you derived the CRs included in your analysis.
You mention that you had a total of 199 individuals who participated in 71 3-person review teams. Obviously, some people participated in more than 1 team. It is not clear how the 2 initial reviewers shared their work, before their findings were submitted to a physician for final validation. Did they both separately review all records? Did they divide their assigned records between them? Please clarify.
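In your introduction, 2nd paragraph, you review alternative methods for detecting AEs, including incident reporting, CR review, and automated administrative data review (e.g., PSIs). Consider adding "prospective (concurrent) clinical trigger systems," that track possible real-time clinical responses to AEs then track back to see if an AE actually occurred. Dr. R. Scott Evans at LDS Hospital in Salt Lake City, Utah; and Dr. David Bates at Brigham & Women's Hospital in Boston, Massachusetts, developed and demonstrated such systems. Such approaches find AEs that never make their way into a traditional clinical record, and may detect far more events.
Under Methods, please change the heading "Sampling selection" to say something like "Sample selection." More importantly, the frame within which "10 inpatient CRs" were selected monthly is not clear. Working the total number of charts reviewed backward, you were probably sampling "10 inpatient CRs" each month for each of the 44 Sicilian public hospitals. Please clarify. How was record selection balanced across the types of wards (medicine, surgery, obstetrics, and ICUs) in each hospital?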
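Under Results, please add clarity regarding the number of patients that had at least 1 AE during their index hospitalization (975); then break out the number of patients who experienced a single AE versus those who had more than 1 AE during their index hospitalization.
Table 1: Please clarify - I can't understand what you mean by the phrases "CRs with triggers per inpatient wards" versus "CRs with trigger isolated per wards"; or "CRs with trigger" versus "CRs with trigger isolated".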
Consider moving the first 2 paragraphs of the section entitled "Rates of triggers" from the Discussion section into the Results section.
Clarify your text in the first paragraph of your "Rates of triggers" section. For example, it might read: "In this study, 37.9% (n=5,574) of all CRs examined had at least 1 positive trigger. Of those, 2,778 CRs had a single positive trigger (49.8% of all CRs with positive triggers) while 2,796 CRs had more than one positive trigger (50.2% of all CRs with positive triggers)."
Consider modifying Table 2: a) Label each of the subsections in Table 2. For example, in your text you cite the Table relative to "general care triggers". What are "general care triggers"? Presumably you mean the subset of triggers with "C" labels. It would be quite helpful if you put headings on each of the subsections (for C, M, S, P, and I) that labelled the contents of each subsection.
b) You are (quite appropriately) approaching AE detection as a screening test with 2 main steps.
Step 1 involves evaluation of the initial triggers.
Step 2 involves following up on positive initial triggers, to see if a validated AE emerges. This is done in the context of a general review by a trained clinical expert, who can (and sometimes does) detect AEs that fall outside of the trigger system.
It would be very useful if you separated the total time spent on CR review into (a) initial assessment of the triggers; versus (b) follow-up analysis of the positive triggers.
It would similarly be very useful if, in the 3rd column of Table 2 (labelled "Times associated with AEs, n (%)"), you showed the proportion of positive triggers that yielded a confirmed AE related to/deriving from the initial positive trigger. The Table, as currently constructed, is quite confusing: in 10 instances (C03, C04, C08, C11, C12, C14, M02, M04, I01, I03) the count in the 3rd column is larger than the count in the 2nd column. Presumably, this is because you are not connecting the trigger to derivative AEs (but I can't really tell, with any certainty).
The section entitled "Rates of AEs", first paragraph, contains the sentence: "It would appear that the percentage of clinical records with triggers increases with the increase of percentage of CRs examined, compared to patients discharged, while the percentage of CRs with AEs remains more stable." I can't translate quite what that means. Please simplify and clarify. It may relate to the 2 upper lines in Figure 2, which both appear to be increasing over time. However, you supply no statistical analysis to show that trends over time in the 2 lines are statistically associated, and you offer no reasoning as to why such a relationship may have meaning.
Table 3 appears to identify high frequency triggers, then count the number of AEs of any sort that occur in CRs that had a high frequency trigger. It would be much more useful if you tracked each trigger to related or derivative AEs, rather than treating AEs generically.
In the section labelled "Rates of AEs" you have a bulleted list. The last item in that list says: "Intensive-care-related triggers have been identified 1,817 times with an AE in 1,924 (i.e., it is very common for triggers to identify more AEs in the same patient)".
I think I get the gist of what you mean, but (a) this sentence needs more clarity; and (b) the main purpose for identifying AEs is to move toward intervention and prevention (better, safer care). In that circumstance, treating AEs as generic items is not very useful. It is necessary to list them by specific causes. In that framing, associating triggers (which are cause specific) with a total generic AE count is not very informative or useful. You might want to rethink your framing.

Spelling corrections:
In the Abstract, Methods subsection, it should say "300,000 inpatient yearly admissions".
In the Methods section, it should say "Sample selection," not "Sampling selection".
In the Discussion section, 2nd paragraph, it should say "A critical appraisal of the studies and their results is difficult, as the methodologies used are heterogeneous,".
In Figure 1, the label for the blue line should say "Sensitivity (%)," not "Sensibility (%)".
In the section labelled "Rates of AEs" there is a bulleted list. The 4th bullet should say "Surgery-related" (not "Surgery-elated").

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? No source data required

We appreciate the review by Professor Btrent of our paper, the accuracy of his observations and the relevance of his suggestions. We are honored that he considers our paper "useful because it supplies empiric observation of one such use, across a large system of hospitals and an extended period of time."
We are aware that some of the points require further clarity, which may have been lost as we limited the word count.

Comment
The Harvard Medical Practice Study (HMPS - cited by the authors) used post-discharge chart review to detect care-associated injuries (adverse events - AEs) that occurred during hospitalization. Under HMPS, trained nurses reviewed a random selection of charts. If those nurses discovered what they judged to be care-associated injuries, they flagged those events in the chart as a potential AE. Charts that contained one or more AEs were forwarded to 2 independent physician reviewers. If both physician reviewers judged, for each AE, that event had occurred, then it was reported as a confirmed AE. Those physicians also judged whether each AE was avoidable, and whether each could be considered negligent (substandard) care. The IHI Global Trigger Tool (IHI GTT) built on the HMPS methodology. It attempted to improve the ability of initial nurse reviewers to detect potential AEs, by providing a set of 51 "review triggers," falling into 7 major subcategories - explicit initial events that, if detected in the chart, chained to examination of other specific events. Initial assessments of the IHI GTT showed much higher detection rates for AEs than did the original HMPS methodology, which itself far exceeded typical voluntary reporting mechanisms.

Response
The Harvard Medical Practice Study is a key pillar in patient safety and care research. We agree that the IHI Global Trigger Tool is an attempt to improve the Harvard Medical Practice Study methodology. In our opinion, the IHI Global Trigger Tool has two significant differences from the Harvard Medical Practice Study. The first is that the IHI Global Trigger Tool is conducted on a random sample of clinical records that are analyzed only if triggers are present. The second is that the IHI Global Trigger Tool "focuses on only those adverse events related to the active delivery of care (commission) and excludes, as much as possible, issues related to substandard care" (omission).

Comment
This report has at least 2 major differences from other work in the field. First, only 6.6% of records reviewed had at least one care-associated adverse event. Other studies have shown much higher AE rates. The authors note this in their text, and suggest it may reflect differences in local environments and chart review methods.

Response
The percentages of adverse events found in other studies vary widely. In a recent systematic review of the literature, the percentage of adverse events is between 7% and 51% (Hibbert PD, Molloy CJ, Hooper TD, et al.: The application of the Global Trigger Tool: a systematic review. Int J Qual Health Care. 2016; 28 (6): 640-649). We think that there could be two main reasons for these wide differences. The first could be the different care settings (medicine, surgery, obstetrics, ICU) and therefore the different complexity of the patients studied. A risk adjustment system would be needed to evaluate the results homogeneously. In our experience, for example, there are significant differences in the AE rate between apparently homogeneous wards belonging to different hospitals. We have not included these tables for the sake of brevity. Since this seems very interesting to us, this aspect will be the subject of other publications. The second reason could be changes in the detection protocol. The survey methods are, in fact, different in the different studies. We, for example, have turned our attention only to the triggers observed during admission and not to the triggers present at the time of admission. Therefore the adverse events found do not include those present at the time of admission. This could explain the differences in AEs observed compared to other studies.

Comment
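Second, most other IHI GTT reviews show adverse drug events (medication-related events - overdoses, drug-drug interactions, and allergic or idiosyncratic reactions) as the dominant category of AEs detected. In this study, they are a distant number 3. Why?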

Comment
You mention that you had a total of 199 individuals who participated in 71 3-person review teams. Obviously, some people participated in more than 1 team. It is not clear how the 2 initial reviewers shared their work, before their findings were submitted to a physician for final validation. Did they both separately review all records? Did they divide their assigned records between them? Please clarify.

Response
The total number of reviewers was 199 divided into 71 teams. In some teams, medical reviewers from another team were the supervisors. The two primary record reviewers should each review all records independently. The third reviewer was always a physician. The physician, who did not review the records, authenticated the consensus of the two primary record reviewers.

Comment
In your introduction, 2nd paragraph, you review alternative methods for detecting AEs, including incident reporting, CR review, and automated administrative data review (e.g., PSIs). Consider adding "prospective (concurrent) clinical trigger systems," that track possible real-time clinical responses to AEs then track back to see if an AE actually occurred. Dr. R. Scott Evans at LDS Hospital in Salt Lake City, Utah; and Dr. David Bates at Brigham & Women's Hospital in Boston, Massachusetts, developed and demonstrated such systems. Such approaches find AEs that never make their way into a traditional clinical record, and may detect far more events.

Response
This is very interesting. In part, this concept is reported in the conclusions: "The introduction of electronic medical records would allow a quicker process with the automation of the identification of triggers and the possibility of linking together in the identification of adverse events and near misses, especially where there has been more than one trigger". We will review the work and attempt to explain this concept better.

Comment
Under Methods, please change the heading "Sampling selection" to say something like "Sample selection." More importantly, the frame within which "10 inpatient CRs" were selected monthly is not clear. Working the total number of charts reviewed backward, you were probably sampling "10 inpatient CRs" each month for each of the 44 Sicilian public hospitals. Please clarify. How was record selection balanced across the types of wards (medicine, surgery, obstetrics, and ICUs) in each hospital?

Response
We will change the heading "Sampling selection" to "Sample selection." We randomly selected 10 CRs for each department that participated, not for each hospital. The wards that participated were not homogeneously represented in the different hospitals. In some hospitals only the ICU CRs were analyzed. In some hospitals there were no medical, surgical, obstetrics or ICU wards. In others there was more than one medicine, surgery, obstetrics or ICU ward. In addition, hospitals were recruited at different times. The CR sample is therefore not representative. The aim of our study was not to provide a true representation of the patient population present in hospitals during the observation period. Consistent with the IHI GTT protocol, the main purpose of applying GTT methodology is to produce a sampling approach that is sufficient for the design of safety work in the hospital.

Comment
Under Results, please add clarity regarding the number of patients that had at least 1 AE during their index hospitalization (975); then break out the number of patients who experienced a single AE versus those who had more than 1 AE during their index hospitalization.

Response
We will indicate the number of patients who experienced a single AE versus those who had more than 1 AE during their index hospitalization.

Comment
Table 1: Please clarify. I can't understand what you mean by the phrases "CRs with triggers per inpatient wards" versus "CRs with trigger isolated per wards"; or "CRs with trigger" versus "CRs with trigger isolated".

Response
We will modify the table. The second line "CRs with triggers for inpatient wards" indicates the number and percentage of CRs with one or more triggers. The third line "CRs with trigger isolated for wards" indicates the number and percentage of CRs with only triggers. The fourth and fifth lines are repetitions. This is a printing error.

Comment
Consider moving the first 2 paragraphs of the section entitled "Rates of triggers" from the Discussion section into the Results section.

Response
We will move the first 2 paragraphs of the section entitled "Rates of triggers" from the Discussion section into the Results section.

Comment
Clarify your text in the first paragraph of your "Rates of triggers" section. For example, it might read: "In this study, 37.9% (n=5,574)

Response
Thank you for this good advice. We will modify accordingly.

Comment
Table 2. For example, in your text you cite the Table relative to "general care triggers". What are "general care triggers"? Presumably you mean the subset of triggers with "C" labels. It would be quite helpful if you put headings on each of the subsections (for C, M, S, P, and I) that labelled the contents of each subsection.

Response
We will put headings on each of the subsections (for C, M, S, P, and I) that label the contents of each subsection.

Comment
b) You are (quite appropriately) approaching AE detection as a screening test with 2 main steps.
Step 1 involves evaluation of the initial triggers.
Step 2 involves following up on positive initial triggers, to see if a validated AE emerges. This is done in the context of a general review by a trained clinical expert, who can (and sometimes does) detect AEs that fall outside of the trigger system.

It would be very useful if you separated the total time spent on CR review into (a) initial assessment of the triggers versus (b) follow-up analysis of the positive triggers.

Response
We are unclear about the changes required on this point.

Comment
It would similarly be very useful if, in the 3rd column of Table 2 (labelled "Times associated with AEs, n (%)"), you showed the proportion of positive triggers that yielded a confirmed AE related to or deriving from the initial positive trigger. The Table, as currently constructed, is quite confusing: in 10 instances (C03, C04, C08, C11, C12, C14, M02, M04, I01, I03) the count in the 3rd column is larger than the count in the 2nd column. Presumably, this is because you are not connecting the trigger to derivative AEs (but I can't really tell, with any certainty).

Response
In the third column of Table 2 we reported the number of times a trigger was found. In the fourth column we reported the number of times an adverse event was related to that trigger. The title of the fourth column (Times associated with AE) does not express this clearly. The third column therefore counts triggers, while the fourth column counts adverse events. We found some CRs with more than one adverse event. In the third column of Table 2 we intended to report that the C03 trigger was found in 279 of the CRs examined. Instead, in the fourth column of Table 2 we wanted to indicate that in 438 adverse events the C03 trigger was present.

Response
We mean that we did not find a correlation between the number of CRs examined in the period and the number of AEs observed in the same period. We will try to verify whether the differences are statistically significant.

Comment
Table 3 appears to identify high frequency triggers, then count the number of AEs of any sort that occur in CRs that had a high frequency trigger. It would be much more useful if you tracked each trigger to related or derivative AEs, rather than treating AEs generically.

Response
This would be very interesting to describe, but we do not know if it will be possible to provide this kind of detail due to editorial guidelines. It could be the subject of a new study to be published later.

Comment
In the section labeled "Rates of AEs" you have a bulleted list.

Response
This would also be interesting to describe. We were required to fit within the word count and to comply with the editorial guidelines. We will clarify this section.

No competing interests were disclosed.
issues where some further explanation would clarify the methods and results, and aid the reader in evaluating the contributions of this paper. In addition, there were a couple of minor editing points:

In my first reading of the paper, I was confused by Table 2. It is not clear how a trigger could be associated with AEs more often than it is detected. This happened with C03, C04, C08, etc. Table 3 presents some of the same information, but labels the second column as AEs with triggers. To evaluate the effectiveness of GTT as a screening tool, it is useful to see both what percent of records with a specific trigger had an AE as well as how many AEs were associated with that trigger. The authors should clarify in Methods how they identified AEs (role of physician, why record without a trigger was reviewed) and clarify whether results they present are based on the count of AEs or the percent of records with AEs (or both). I'd recommend both, as in Table 1.

The first paragraph of the results states that 18,008 records were reviewed, but only 14,706 were analyzed. The authors should explain why the records were excluded, to show they did not bias the results. A flow chart might help. They also state that 7 CRs had AEs without triggers. My understanding of the GTT method is that only records with triggers are reviewed for AEs. Please explain how these AEs were found.

The first paragraph under "Rates of triggers" in the discussion, which should be moved to the Results section, refers to significant differences. The method of testing these statistics should be added to the "Statistical analysis" section of the Methods.

Figure 1: style of sensitivity line in graph does not match that of label.
Table 5: Procedure complication has an asterisk, but no explanation.

After reading the manuscript, it appears that there are two main questions that can be answered from the results:
How does patient safety in Sicily, based on the GTT, compare to other published studies?
Can we learn anything from the analysis of the triggers that can help us improve the review process?

The first question is addressed in the discussion (page 4). Adding a confidence interval to the reported rate from this study (6.6%) would clarify that these hospitals are as good or better than other reported rates.
The most important conclusion I would draw from the trigger analysis is that few AEs were identified by isolated triggers and many isolated triggers are not associated with AEs. Although not highly useful for identifying AEs, some isolated triggers may be direct measures of "near misses". I would restructure the results and discussion to emphasize these points.
Overall, this is a good description of a broad-based standardized implementation of the Global Trigger Tool. The authors have presented suggestions for making improvements to the method.

Is the study design appropriate and is the work technically sound?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
No source data required

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests: Dr. Naessens was an invited speaker at the initial meeting of Sicilian hospital personnel when this project was being implemented. His travel expenses were reimbursed, but no honorarium was necessary.

We appreciate the review by Professor Naessens of our paper, the accuracy of his observations and the relevance of his suggestions. We are honored that he considers our paper "a good description of a broad standardized implementation of the Global Trigger Tool" and that our work presented suggestions for improving the method.
We are aware that some of the points require further clarity, which may have been lost as we limited the word count.

Comment
In my first reading of the paper, I was confused by Table 2. It is not clear how a trigger could be associated with AEs more often than it is detected. This happened with C03, C04, C08, etc. Table 3 presents some of the same information, but labels the second column as AEs with triggers. To evaluate the effectiveness of GTT as a screening tool, it is useful to see both what percent of records with a specific trigger had an AE as well as how many AEs were associated with that trigger. The authors should clarify in Methods how they identified AEs (role of physician, why record without a trigger was reviewed) and clarify whether results they present are based on the count of AEs or the percent of records with AEs (or both). I'd recommend both, as in Table 1.

Response
Table 2
In the third column of this table we reported the number of times a trigger was found. In the fourth column we reported the number of times an adverse event was related to that trigger. The title of the fourth column (Times associated with AE) does not express this clearly. The third column therefore counts triggers, while the fourth column counts adverse events. We found some CRs with more than one adverse event. In the third column of Table 2 we intended to report that the C03 trigger was found in 279 of the CRs examined. Instead, in the fourth column of Table 2 we wanted to indicate that in 438 adverse events the C03 trigger was present.

Table 3
We have tried to elucidate this concept in Table 3. This table lists the triggers that, in our experience, require more focus. Finding one of the triggers on this list, linked to other triggers, could indicate the presence of more than one adverse event.

Effectiveness of a GTT
We agree that to evaluate the effectiveness of the GTT as a screening tool, it is useful to see both what percentage of CRs with a specific trigger have an AE and how many AEs have been associated with that trigger. We have a table that explains this, but due to space constraints it was not included. We can add it if the editorial staff agree.
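The two quantities discussed here can be sketched on invented data. The records below are hypothetical and only the trigger labels are borrowed from Table 2; the sketch also assumes that every AE in a CR is counted against every trigger found in that CR, which is one reading of how the tables were built, not a confirmed description of the authors' method.

```python
from collections import defaultdict

# Hypothetical reviewed CRs: triggers found and number of confirmed AEs.
records = [
    {"triggers": {"C03", "M02"}, "aes": 2},
    {"triggers": {"C03"}, "aes": 0},
    {"triggers": {"C03", "C04"}, "aes": 3},
    {"triggers": {"M02"}, "aes": 0},
    {"triggers": set(), "aes": 0},
]

def trigger_summary(records):
    """Per trigger: CRs with it, CRs with it and at least one AE, and total AEs."""
    stats = defaultdict(lambda: {"crs": 0, "crs_with_ae": 0, "aes": 0})
    for rec in records:
        for trig in rec["triggers"]:
            s = stats[trig]
            s["crs"] += 1
            s["crs_with_ae"] += 1 if rec["aes"] > 0 else 0
            s["aes"] += rec["aes"]  # all AEs in the CR, not only derivative ones
    for s in stats.values():
        s["pct_crs_with_ae"] = 100 * s["crs_with_ae"] / s["crs"]
    return dict(stats)

summary = trigger_summary(records)
```

In this toy data the C03 trigger appears in 3 CRs but is associated with 5 AEs, because two of those CRs contain more than one AE. This is the same mechanism by which an AE count can exceed a trigger count in a single table row.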

Definitions and process
Adverse events were identified based on the IHI protocol definition: "unintended physical injury resulting from or contributed to medical care that requires additional monitoring, treatment or hospitalization, or that results in death." The physician reviewed the consensus with the two records and reached a final agreement on the type, number, and severity of events. The physician did not review the CRs, only the summary sheet in the IT-platform.

Table 1
Table 1 shows the number of CRs examined, the number of CRs with triggers and the percentage of discharged patients. One line is missing, which provides the number of discharged patients. If the editorial staff allows, we can integrate Table 1 with the number of patients discharged and the number of AEs compared to the CRs examined and with triggers.

Comment
The first paragraph of the results states that 18,008 records were reviewed, but only 14,706 were analyzed. The authors should explain why the records were excluded, to show they did not bias the results. A flow chart might help.

Response
From June 2015 to June 2018, 18,008 CRs were examined. In this study, we analyzed 14,706 CRs relating to patients discharged from 89 medicine, surgery, obstetrics and intensive care wards. The remaining 3,302 CRs concerned pediatrics and emergency department wards.
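The record accounting in this response can be checked with a few lines, which is also the arithmetic a flow chart would display. The sketch assumes that the pediatrics and emergency department CRs are the only exclusions; the variable names are illustrative.

```python
examined = 18_008   # CRs examined, June 2015 to June 2018
excluded = 3_302    # pediatrics and emergency department CRs
analyzed = examined - excluded

# The difference should equal the 14,706 CRs analyzed in the study.
print(f"examined={examined}, excluded={excluded}, analyzed={analyzed}")
```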

Comment
They also state that 7 CRs had AEs without triggers. My understanding of the GTT method is that only records with triggers are reviewed for AEs. Please explain how these AEs were found.

Response
Examination of the CRs was performed in accordance with the IHI protocol:
Discharge codes, particularly infections, complications, or certain diagnoses
Discharge summary
Medications administration record
Laboratory results
Prescriber orders
Operative records (operative report and anesthesia record, if applicable)
Nursing notes
Physician progress notes
If time permits, any other areas of the record

In seven CRs, reading the discharge codes and the discharge summary allowed us to identify the adverse events directly: two cases of invasive procedure complications, two cases of surgical complications, a case of hypoglycemia and a case of adverse reaction to the administration of a drug. Having identified the adverse event, the reviewers therefore did not look for triggers.

Comment
The first paragraph under "Rates of triggers" in the discussion, which should be moved to the Results section, refers to significant differences. The method of testing these statistics should be added to the "Statistical analysis" section of the Methods.

Response
The term "significantly" used in the first paragraph under "Rates of triggers" in the discussion is misleading. We did not intend to claim that there was a statistically significant difference.

Comment
Style of sensitivity line in graph does not match that of label of Figure 1.

Response
If possible, we will correct the figure.

Response
The total number of reviewers was 199, divided into 71 teams. In some teams, medical reviewers from another team were the supervisors. In line 2, the % refers to the total number of patients discharged; in line 4 the % refers to CRs with triggers. For example, from June 2015 to June 2018, 1,571 medical cases with triggers were found in the medical departments, equal to 34.3% of all patients discharged in the same period (n = 4,527) and 33.4% of all triggered CRs (n = 5,574). I think it would be clearer if we put a line with the number of patients discharged.

Comment
Table 5: Procedure complication has an asterisk, but no explanation.

Comment
After reading the manuscript, it appears that there are two main questions that can be answered from the results: How does patient safety in Sicily, based on the GTT, compare to other published studies? Can we learn anything from the analysis of the triggers that can help us improve the review process? The first question is addressed in the discussion (page 4). Adding a confidence interval to the reported rate from this study (6.6%) would clarify that these hospitals are as good or better than other reported rates.

Response
Regarding the addition of a confidence interval to the rate reported by this study (6.6%) to clarify whether these hospitals are equal to or better than other reported rates, we think that the methodologies used in the various studies are too heterogeneous: protocols are often adapted to the local context, the populations studied are different, and the skills of the reviewers vary. For these reasons we decided not to explore this topic from a statistical point of view.
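For reference, the interval the reviewer asked for can be computed from the figures reported in the abstract (975 of 14,706 CRs with at least one AE). The sketch below uses the Wilson score interval, which is an assumption here, since neither the reviewer nor the authors name a method. It also treats the CRs as a simple random sample, which the authors state they are not, so the result is illustrative only.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 975 CRs with at least one AE out of 14,706 CRs reviewed.
low, high = wilson_interval(975, 14_706)
print(f"{975 / 14_706:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Under these assumptions the interval is narrow, roughly 6.2% to 7.0%, because the sample is large. It reflects sampling error only and says nothing about the heterogeneity between studies that the authors raise.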

Comment
The most important conclusion I would draw from the trigger analysis is that few AEs were identified by isolated triggers and many isolated triggers are not associated with AEs. Although not highly useful for identifying AEs, some isolated triggers may be direct measures of "near misses". I would restructure the results and discussion to emphasize these points.

Response
We agree and we will emphasize this point.

Competing Interests: No competing interests were disclosed.