Multicenter observational study on the reliability of the HEART score
Abstract
Objective
To rapidly and safely identify the risk of developing acute coronary syndrome in patients with chest pain who present to the emergency department, the clinical use of the History, Electrocardiogram, Age, Risk Factors, and Troponin (HEART) score has recently been proposed. This study aimed to assess the inter-rater reliability of the HEART score calculated by a large number of Italian emergency physicians.
Methods
The study was conducted in three academic emergency departments using clinical scenarios obtained from medical records of patients with chest pain. Twenty physicians, who took the HEART score course, independently assigned a score to different clinical scenarios, which were randomly administered to the participants. Data were collected and recorded in a spreadsheet by an independent investigator who was blinded to the study’s aim.
Results
After applying the exclusion criteria, 53 scenarios were included in the analysis. The general inter-rater reliability was good (kappa statistics [κ], 0.63; 95% confidence interval, 0.57 to 0.70), and a good inter-rater agreement for the high- and low-risk classes (HEART score, 7 to 10 and 0 to 3, respectively; κ, 0.60 to 0.73) was observed, whereas a moderate agreement was found for the intermediate-risk class (HEART score, 4 to 6; κ, 0.51). Among the different items of the HEART score, history and electrocardiogram had the worst agreement (κ, 0.37 and 0.42, respectively).
Conclusion
The HEART score had good inter-rater reliability, particularly among the high- and low-risk classes. The modest agreement for history suggests that major improvements are needed for objectively assessing this component.
INTRODUCTION
Chest pain is one of the most frequent symptoms leading to emergency department (ED) admission, and it may be triggered by several causes, ranging from mostly harmless to immediately life-threatening disorders. From the perspective of an emergency physician (EP), rapid identification of high-risk patients and concomitant ruling out of low-risk conditions are important. According to previous evidence, acute coronary syndrome (ACS) may not be identified as an underlying cause in approximately 20% to 25% of patients with chest pain who visited the ED [1] and in nearly 45% of those admitted to a chest pain unit [2]. The leading aspects in the EPs’ toolbox that can help identify the probability of ACS are patient history, electrocardiogram (ECG) findings, and cardiac troponin testing results, which are often combined with diagnostic algorithms designed for the rapid rule-in or rule-out of ACS [3]. However, a definitive and universally accepted strategy has not yet been established.
Some official documents, endorsed by eminent scientific societies, have recently encouraged the use of clinical scores for evaluating patients with chest pain suggestive of ACS who present to the ED [4,5]. In particular, a recent systematic review comprehensively analyzed the leading clinical prediction rules for chest pain, including the Thrombolysis in Myocardial Infarction (TIMI) risk score, the History, ECG, Age, Risk Factors, and Troponin (HEART) score, and the Global Registry of Acute Coronary Events (GRACE) scores [6]. Among the aforementioned risk stratification tools, the HEART score was found to be useful for managing patients with chest pain who present to the ED because it is simple, easy, and quick to use and it has also been validated in several studies conducted in the ED [7-12]. Five parameters contribute to the final HEART score: clinical history, ECG findings, age, risk factors, and troponin testing results (Table 1). Each of these five items is assigned 0 to 2 points, and the points are summed to obtain the final score, which ranges from 0 to 10 [7,8].
According to the final HEART score, patients can be classified into three groups: low (score of 0 to 3), intermediate (score of 4 to 6), and high (score of 7 to 10) risk for major adverse cardiac event (MACE) within 6 weeks. Notably, the definition of MACE includes acute myocardial infarction, percutaneous coronary intervention, coronary artery bypass grafting, coronary angiography revealing significant stenosis, and death due to any cause. A different management is then advocated for patients at low, intermediate, and high risk who require discharge, admission, and early invasive strategies, respectively.
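The scoring arithmetic and the three risk classes described above can be sketched in a few lines of Python. This is only an illustration of the summation and the cutoffs stated in the text; the function and key names (`heart_total`, `risk_class`, `history`, `ecg`, and so on) are hypothetical, and the clinical rules for assigning 0, 1, or 2 points to each item (Table 1) are not reproduced here.

```python
from typing import Dict

# Hypothetical key names for the five HEART items; each item is scored 0, 1, or 2.
HEART_ITEMS = ("history", "ecg", "age", "risk_factors", "troponin")


def heart_total(points: Dict[str, int]) -> int:
    """Sum the five item scores into the final HEART score (0 to 10)."""
    missing = set(HEART_ITEMS) - set(points)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    for item in HEART_ITEMS:
        if points[item] not in (0, 1, 2):
            raise ValueError(f"{item} must be 0, 1, or 2")
    return sum(points[item] for item in HEART_ITEMS)


def risk_class(total: int) -> str:
    """Map a final HEART score to the risk class used in the text."""
    if not 0 <= total <= 10:
        raise ValueError("HEART score must be between 0 and 10")
    if total <= 3:
        return "low"           # score of 0 to 3
    if total <= 6:
        return "intermediate"  # score of 4 to 6
    return "high"              # score of 7 to 10
```

For example, a scenario scored as history 1, ECG 1, age 2, risk factors 1, and troponin 0 sums to 5 and falls in the intermediate-risk class.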
The different clinical scores should be accurately evaluated and clinically validated [13]. However, to the best of our knowledge, only a single study has assessed the reliability of the HEART score [14]. This clearly represents a major drawback since the interpretation of both history and ECG is arbitrary to some extent (Table 1), and as a result, the assignment of the final score may be biased by a substantial degree of heterogeneity. Therefore, this multicenter study aimed to evaluate the inter-rater reliability of the HEART score calculated by a large number of Italian EPs.
METHODS
Study design and setting
This multicenter study was conducted in three Italian academic EDs (university hospitals in Bologna, Parma, and Modena) between March 2017 and December 2017. All three EDs used a harmonized triage procedure and managed a high number of patients, including those with chest pain.
The study was approved by the ethical committee of the Azienda Ospedaliero-Universitaria Policlinico Modena, Italy (Dnr: 96/17; 1977) and was conducted according to the Declaration of Helsinki under the terms of the relevant local legislation. Consent was obtained from the physicians who participated in the study.
Data collection
The method suggested by Rotondi and Donner [15] was used to calculate the minimum sample size needed for a reliable estimation of kappa statistics (κ) according to multiple raters and multinomial outcomes.
According to the preliminary calculation (i.e., κ, 0.80; estimate prevalence of 0.2, 0.4, and 0.4 for low-, intermediate-, and high-risk scores, respectively), we planned to collect at least 53 different clinical scenarios. Hence, paper scenarios were obtained from the medical records of patients with chest pain who were admitted to the ED of the university hospital in Modena during a 2-month period (i.e., from January 1, 2017 to February 28, 2017).
Information about 59 patients was collected (one randomly selected patient with chest pain per day): demographic and clinical characteristics, nurse triage category, discharge data, clinical setting of admission, history, previous diseases, vital signs, and pain score. Additional information included age, gender, ECG data, and results of the cardiac troponin I testing (Ortho Vitros ECi, Ortho-Clinical Diagnostics, Raritan, NJ, USA; 99th upper reference limit, <34 ng/L). The exclusion criteria were as follows: (1) incomplete demographic and clinical data, (2) patients presenting with only dyspnea or palpitations, and (3) patients presenting with chest pain and significant ST segment elevation on ECG.
Study protocol
Twenty physicians who were recruited from the EDs, internal medicine wards, and postgraduate emergency medicine school were randomly assigned to a 5-hour training for the utilization of the HEART score. These participants were selected by the directors of the wards according to their willingness to participate in the study.
After completing the training on HEART score, each physician independently assigned a score to the different clinical scenarios. To prevent intercommunication among participants, they were asked to calculate the score on the same day and in the presence of the principal investigators. The clinical scenarios were randomly administered to the participants, who had access to the HEART score rules, with a 2-hour limit for completing the scoring process. Data were collected and recorded in a spreadsheet by an independent investigator, who was blinded to the aim of the study.
Data analysis and outcome
Each participant was asked to assign a score (i.e., 0, 1, or 2) to each of the five items of the HEART score to obtain the final score, which was used to classify patients as being at low (score of 0 to 3), intermediate (score of 4 to 6), or high (score of 7 to 10) risk for MACE [7,8].
The main endpoint of this study was the estimation of inter-rater agreement in the calculation of the HEART score (κ value and 95% confidence interval [CI]) among physicians. Whether clinical experience could help improve the inter-rater reliability of calculating the HEART score was the secondary endpoint. This second aspect was assessed by comparing inter-rater reliability among expert EPs (i.e., those with more than 10 years of experience in emergency medicine) and students or physicians with no experience in emergency medicine.
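As an illustration of how a chance-corrected agreement statistic can be computed when many raters classify the same scenarios, the sketch below implements Fleiss' kappa in plain Python. The text does not state which multi-rater kappa estimator was used in Stata, so the choice of Fleiss' kappa is an assumption made for illustration only; the function name `fleiss_kappa` and the input layout are hypothetical, and the confidence interval is not computed.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]], categories: Sequence[str]) -> float:
    """Fleiss' kappa for several raters and nominal categories.

    ratings[i] holds the category label each rater gave to scenario i.
    Every scenario must be rated by the same number of raters.
    """
    n_subjects = len(ratings)
    if n_subjects == 0:
        raise ValueError("at least one scenario is required")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    # counts[i][j] = number of raters who put scenario i in category j
    counts = []
    for row in ratings:
        row_counts = [list(row).count(c) for c in categories]
        if len(row) != n_raters or sum(row_counts) != n_raters:
            raise ValueError("unequal rater counts or unknown category label")
        counts.append(row_counts)

    # Observed agreement: mean proportion of agreeing rater pairs per scenario
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_i) / n_subjects

    # Expected agreement from the overall share of each category
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_exp = sum(p * p for p in p_j)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Here each inner list would hold the risk class (or an item score converted to a label) assigned by each of the raters to one scenario. Restricting the input to a subgroup of raters gives a subgroup estimate, which is one way the senior versus junior comparison could be approached.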
According to the literature, poor, fair, moderate, good, and very good agreements were defined as a κ value between 0.00 and 0.20, 0.21 and 0.40, 0.41 and 0.60, 0.61 and 0.80, and 0.81 and 1.00, respectively. Statistical significance was set at a 0.05 alpha level. The Stata ver. 14.2 (StataCorp., College Station, TX, USA) was used for statistical analysis.
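The agreement bands listed above can be expressed as a small lookup. This is a sketch with a hypothetical function name (`interpret_kappa`); it assumes the upper bound of each band is an inclusive cutoff, so a value such as 0.205 is labeled fair, and it rejects values outside 0 to 1 because the text defines no band for them.

```python
def interpret_kappa(kappa: float) -> str:
    """Label a kappa value using the agreement bands stated in the text."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("the bands are defined only for kappa between 0 and 1")
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

With these cutoffs, an overall κ of 0.63 is labeled good, 0.51 moderate, and 0.37 fair.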
RESULTS
The three centers were all university hospitals, with similar characteristics in terms of patient volume and case mix (Table 2). The EPs recruited from the three centers also had similar experience in emergency medicine practice. Overall, 6 of the 59 clinical scenarios were excluded since they did not fulfill all our inclusion/exclusion criteria, leaving 53 clinical scenarios for the analysis. The characteristics of the 53 clinical scenarios are shown in Table 3. The mean age of the patients was 56 (range, 16 to 92) years. Of the patients, 27 were men and 26 were women. Hypertension and smoking were the most frequent cardiovascular risk factors.
The final HEART score of each scenario was similar among all participants (Fig. 1). The distribution of the final HEART scores was similar to that observed in previous studies [7,8], with 20%, 40%, and 40% of clinical scenarios assigned to the high-, intermediate-, and low-risk classes, respectively. The general inter-rater reliability was good (κ, 0.63; 95% CI, 0.57 to 0.70) and was similar between senior physicians (κ, 0.65; 95% CI, 0.57 to 0.73) and junior physicians (κ, 0.60; 95% CI, 0.51 to 0.72) (Table 4).
Overall, the study participants also had a good inter-rater agreement for high- and low-risk classes (HEART scores of 7 to 10 and 0 to 3; κ, 0.70 and 0.72, respectively), whereas moderate agreement was observed for the intermediate-risk class (HEART score of 4 to 6; κ, 0.51) (Table 4).
Importantly, history was characterized by the worst agreement (κ, 0.37) among the different HEART score items, with an extremely modest reliability among all participants. Modest agreement was also found for ECG score (κ, 0.37 to 0.46) (Table 4), whereas a significantly better concordance was observed for the remaining three parameters of the HEART score (i.e., risk factors, age, and troponin), as shown in Table 4.
DISCUSSION
Results showed that the calculation of the HEART score was similar among all participants, with comparable scores obtained by senior and junior physicians. In particular, a good inter-rater agreement was found for high- and low-risk classes, whereas the agreement was only modest for the intermediate-risk class.
The hypothesis that the subjective interpretation of history and ECG may influence the final calculation of the HEART score is supported by our data since larger heterogeneity was observed in scoring these two variables compared to risk factors, age, and troponin.
The HEART score has only been validated in a single center retrospective study [7] and in an ensuing multicenter study [8], which both analyzed the predictive value of the score for the combined end point of acute myocardial infarction, percutaneous coronary intervention, coronary artery bypass grafting, or death (MACE) within 6 weeks after initial assessment.
More recently, another study has compared the performance of the HEART score with that of the GRACE and TIMI scores in predicting MACE in 1,748 patients with chest pain who were admitted to the ED. Results have shown that the HEART score outperformed the other two risk assessment tools and reliably and safely identified a larger group of low-risk patients [16]. The impact of the HEART score on health care resources and expenditure has also been assessed in another study [11], which confirmed that the use of this score is safe in patients with chest pain, although a high non-compliance rate with management recommendations mitigates its otherwise favorable impact on the utilization of healthcare resources. Based on this evidence, the HEART score may have a good performance in the diagnosis and prognosis of patients with ACS in several clinical settings; hence, it may be reliable when used for estimating the risk of MACE in this category of patients.
In particular, Van Den Berg and Body [17] have conducted a recent systematic review and meta-analysis of the literature, which included 12 studies and 11,217 patients, and concluded that the HEART score identifies patients with a suspected diagnosis of ACS who have a low probability (1.6%) of developing MACE and who could be safely discharged from the ED. The area under the curve and the pooled sensitivity of the HEART score for predicting MACE were both excellent (i.e., 0.81 and 0.97, respectively), whereas the pooled specificity was modest (i.e., 0.47) [17].
To the best of our knowledge, only a single study about the inter-rater reliability of the HEART score has been previously conducted [14]. Although the study design was similar to that of our investigation (i.e., retrospective observational study that used clinical scenarios), the conclusion was quite different. In particular, Wu et al. [14] have found a substantial disagreement in the assignment of the HEART score to 33 clinical scenarios between EPs and cardiologists. Unlike these findings, we found a good agreement in the assignment of the HEART score to clinical scenarios among all raters. According to their findings, history was the primary source of disagreement (κ, 0.13; 95% CI, -0.1 to 0.40). In addition, a better agreement was observed among EPs and cardiologists for risk factors, age, and troponin.
The findings of our study may have some practical implications for managing patients with chest pain in the ED. In fact, our data showed that the HEART score may be used by both senior and junior physicians, with good inter-rater agreement (at least for patient classification in high- and low-risk classes). Notably, the score assignment for history should be modified to allow a more objective interpretation and ultimately mitigate the impact of subjectivity.
Since the HEART score was a reliable tool for classifying patients who are at low or high risk for MACE, it may be safely used in ruling out patients who are at low risk for ACS and encouraging additional investigations on high-risk patients. Nevertheless, the modest agreement found in classifying patients with an intermediate risk suggests that its usefulness for deciding whether patients should be continuously monitored is uncertain. Further studies should compare the reliability of other assessment tools (e.g., GRACE and TIMI) using a similar set of clinical scenarios.
Interestingly, the good inter-rater reliability among all participants may allow an accurate communication among the users of the HEART score, promote a better standardization in health care, help obtain more reliable information for benchmarking, enhance patient safety, and encourage larger support for clinical research for national surveillance.
This study has some limitations. First, the use of clinical scenarios rather than actual clinical settings may be considered a drawback. However, performing actual trials with patients in the ED remains challenging, and more importantly, the clinical scenario approach has been used and validated in other studies that aimed to estimate the inter-rater reliability of other clinical scores [18]. Second, the participants had only relatively short experience in using the HEART score. Finally, we did not compare the reliability of the HEART score with that of the other scores since this will be the focus of our next investigation.
In summary, the HEART score had a good inter-rater reliability among a large number of Italian physicians, whereas a less satisfactory agreement was found in assigning the score to history. The experience of our participants did not substantially influence their scoring reliability. Overall, our participants had a good inter-rater agreement for high- and low-risk classes based on the HEART score, whereas the agreement was only modest for the intermediate-risk class. In particular, the modest agreement for assigning the score to history suggests that additional efforts are needed to achieve a more objective assessment of this parameter.
Notes
No potential conflict of interest relevant to this article was reported.
References
Capsule Summary
What is already known
The History, Electrocardiogram, Age, Risk Factors, and Troponin (HEART) score is useful for management of chest pain patients presenting to the emergency department because it is simple, easy and rapid, and has also been validated to predict major adverse cardiac events in many studies conducted in the emergency department.
What is new in the current study
In this study, we found that the inter-rater reliability of the HEART score is moderate to good, but the history item showed only fair inter-rater reliability because its interpretation is partly arbitrary.