Predicting 30-day mortality of patients with pneumonia in an emergency department setting using machine-learning models
Article information
Abstract
Objective
This study aimed to confirm the accuracy of a machine-learning-based model in predicting the 30-day mortality of patients with pneumonia and in evaluating whether they required admission to the intensive care unit (ICU).
Methods
The study conducted a retrospective analysis of pneumonia patients at an emergency department (ED) in Seoul, Korea, from January 1, 2016 to December 31, 2017. Patients aged 18 years or older with a pneumonia registry designation on their electronic medical record were enrolled. We collected their demographic information, mental status, and laboratory findings. Three models were used: the pre-existing CURB-65 model, and the CURB-RF and Extensive CURB-RF models, which were machine-learning models that used a random forest algorithm. The primary outcomes were ICU admission from the ED or 30-day mortality. Receiver operating characteristic curves were constructed for the models, and the areas under these curves were compared.
Results
Of the 1,974 pneumonia patients, 1,732 were eligible for inclusion in the study; of these, 473 died within 30 days or were initially admitted to the ICU from the ED. The areas under the receiver operating characteristic curves of the CURB-65, CURB-RF, and Extensive CURB-RF models were 0.615 (0.614–0.616), 0.701 (0.700–0.702), and 0.844 (0.843–0.845), respectively.
Conclusion
The proposed machine-learning models could predict the mortality of patients with pneumonia more accurately than the pre-existing CURB-65 model and can help decide whether the patient should be admitted to the ICU.
INTRODUCTION
Pneumonia remains the number one cause of death from infectious diseases worldwide [1]. As many as four million cases of pneumonia are reported annually, and nearly one-fifth of these cases require hospitalization [2]. In the outpatient setting, the mortality rate of pneumonia remains low, within the range of 1% to 5%; however, among patients with pneumonia who require hospitalization, the mortality rate approaches 25%, particularly if the patient requires admission to an intensive care unit (ICU) [3-9].
Patients suffering from fever, dyspnea, and upper/lower respiratory symptoms (e.g., coughing) often visit the emergency department (ED). Emergency physicians play an important role in the initial evaluation, assessment, management, and disposition of these patients. The CURB-65 score and the pneumonia severity index (PSI) score are the most commonly used predictive models for the classification of such patients.
However, many predictive models for pneumonia have different variables with a dichotomous and artificial cut-off [10-12]; thus, they have limited predictive powers. CURB-65 takes considerably less time for calculations and it is also more convenient to use in an ED setting than the PSI; however, it has a disadvantage in that it consists of only five variables.
Machine-learning methods have received significant attention in the medical fields, especially in diagnosis, radiology, pathology, and prediction [13-16]. Although studies on the usefulness of machine-learning models for pneumonia diagnosis have been conducted recently, their results have been insufficient [17-20]. There have been few studies directly comparing a machine-learning model to CURB-65 [18,20]. This study aimed to confirm the accuracy of a machine-learning-based model to predict the 30-day mortality of pneumonia patients as compared to CURB-65 and to determine whether pneumonia patients were required to be admitted to the ICU.
METHODS
Study setting
This study is based on a retrospective analysis of adult medical patients with a pneumonia registry designation in their electronic medical record (EMR) who arrived at the ED of a tertiary referral center in Seoul, Korea, which was established in 1994. This center has a 73-bed emergency unit with approximately 70,000 patient visits each year. This study was approved by the institutional review board of the study site (IRB number 2018-09-047-002).
Pneumonia registry
The pneumonia registry is an EMR designation that has been documented by physicians in the ED of the tertiary referral center since 2011. It includes information on the CURB-65 score, the pneumonia type, smoking status, and Streptococcus pneumoniae vaccination preference. Patients were diagnosed with pneumonia if they exhibited acute lower respiratory symptoms accompanied by newly documented infiltrations on chest radiographs at the time of their ED visit [21]. Clinical diagnoses were made by a physician.
Inclusion and exclusion criteria
Patients aged 18 years or older with a pneumonia registry designation in their EMR were enrolled in the study from January 1, 2016 to December 31, 2017. Exclusion criteria were duplicated data, patients whose consent regarding the use of their EMR could not be obtained, and patients referred from or to another hospital.
Data collection
The EMRs of all the enrolled patients were reviewed by three physicians. The following data were collected: demographic information (age, sex, and past medical history, including hypertension, diabetes mellitus, chronic lung disease, chronic liver disease, congestive heart failure, cerebrovascular accidents, chronic kidney disease, and cancer history), mental status at the ED, laboratory findings (Appendix 1), radiological findings such as the presence of pleural effusion, microbiological results, and in-hospital treatment data (ICU admission from the ED and 30-day mortality).
Primary outcomes
The primary outcomes were ICU admission from the ED or 30-day mortality, which was defined as documented death from any cause within 30 days of visiting the ED. Patients who were discharged after 30 days of their visit were considered to be alive. The case group included death within 30 days or admission to the ICU from the ED. The control group was composed of the other patients.
Data analysis
Preprocessing
We addressed the imbalance in the outcome variable by randomly under-sampling the control group, at a ratio of 1:2 between the case and control groups. Missing data were handled with a multiple imputation method [22].
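The under-sampling step can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R environment used for the analysis, and the function name, the seed, and the toy patient identifiers are assumptions made for illustration. Whether under-sampling was applied before or after the split into training and test sets is not stated, so the sketch simply operates on whatever set of cases and controls it is given.

```python
import random

def undersample_controls(cases, controls, ratio=2, seed=0):
    """Keep every case and draw `ratio` controls per case without replacement."""
    rng = random.Random(seed)
    n_keep = min(len(controls), ratio * len(cases))
    return cases + rng.sample(controls, n_keep)

# Toy identifiers sized like the cohort reported in this study:
# 473 cases and 1,732 - 473 = 1,259 controls (not real patient data).
cases = [("case", i) for i in range(473)]
controls = [("control", i) for i in range(1732 - 473)]

balanced = undersample_controls(cases, controls)
n_controls = sum(1 for label, _ in balanced if label == "control")
print(len(balanced), n_controls)  # 1419 946
```

With a 1:2 ratio, all 473 cases are kept and 946 of the 1,259 controls are drawn at random.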
Developing prediction models
In this study, three models were established and compared. The first is a pre-existing model, CURB-65, consisting of five clinical and laboratory characteristics (confusion, blood urea nitrogen >7 mmol/L, respiratory rate >30 breaths/minute, diastolic blood pressure <60 mmHg or systolic blood pressure <90 mmHg, age ≥65 years old) [23].
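As an illustration, the CURB-65 score can be computed as a simple count of criteria met. The sketch below is in Python with hypothetical function and argument names, and it uses the cut-offs exactly as written in the paragraph above. The commonly cited definition of CURB-65 uses a respiratory rate of 30 or more and a diastolic pressure of 60 or less, so values exactly at those boundaries would score differently under that definition.

```python
def curb65_score(confusion, bun_mmol_l, resp_rate, sbp, dbp, age):
    """Count the CURB-65 criteria met, using the cut-offs stated in the text."""
    criteria = [
        bool(confusion),       # C: confusion
        bun_mmol_l > 7,        # U: blood urea nitrogen > 7 mmol/L
        resp_rate > 30,        # R: respiratory rate > 30 breaths/minute
        dbp < 60 or sbp < 90,  # B: diastolic < 60 mmHg or systolic < 90 mmHg
        age >= 65,             # 65: age >= 65 years
    ]
    return sum(criteria)

# Two invented patients (illustrative values only).
print(curb65_score(False, 5.0, 20, 127, 72, 67))  # 1 (age criterion only)
print(curb65_score(True, 9.5, 32, 85, 55, 80))    # 5 (all criteria met)
```

The CURB-RF model described next uses the same inputs but passes the raw values to the random forest instead of these yes/no criteria.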
The second is a CURB-RF model consisting of the same variables as CURB-65 but with continuous, non-dichotomized values. Here, the term “RF” indicates the use of the random forest method. The final model is an extensive-CURB-RF (E-CURB-RF) model; this model contains more variables than CURB-65, also uses continuous values, and is likewise built with the random forest method. Appendix 1 shows the variables used to compose these models.
We used the basic random forest model and tuned its parameters automatically by ten-fold cross validation. The dataset was divided into a training set (70%) and a test set (30%); the training set underwent ten-fold cross validation. The number of trees was fixed at 500, and the number of randomly selected features evaluated at each tree node was searched from 1 to 15. The optimal number of features per node obtained through the ten-fold cross validation was 2. The randomForest and caret packages were used for modeling. The CURB-RF and E-CURB-RF models were developed based on this process (Fig. 1).
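The structure of this procedure, a 70/30 split followed by a ten-fold cross-validated search over candidate values, can be sketched as follows. This is a Python illustration using the standard library only and is not the R code used in the study. The function names are hypothetical, and the scoring function is left to the caller because fitting a random forest is outside the scope of the sketch.

```python
import random

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Shuffle row indices 0..n-1 and split them into training and test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]

def k_fold_indices(indices, k=10):
    """Yield (fit, validate) index lists; every index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, held_out

def grid_search(train_idx, candidates, cv_score, k=10):
    """Return the candidate with the highest mean cross-validated score.

    cv_score(fit_idx, valid_idx, candidate) is supplied by the caller; in the
    workflow described above it would fit a 500-tree random forest that tries
    `candidate` features per node and score it on the held-out fold.
    """
    best, best_mean = None, float("-inf")
    for candidate in candidates:
        scores = [cv_score(fit, val, candidate)
                  for fit, val in k_fold_indices(train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_mean:
            best, best_mean = candidate, mean_score
    return best

# Toy run on 1,000 row indices (not the study data).
train_idx, test_idx = train_test_split_indices(1000)
fold_sizes = [len(val) for _, val in k_fold_indices(train_idx)]
print(len(train_idx), len(test_idx), fold_sizes)  # 700 300 and ten folds of 70
```

Calling `grid_search(train_idx, range(1, 16), cv_score)` would mirror the search from 1 to 15 described above; the test set is left untouched until the tuned model is evaluated.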
Statistical analysis
Continuous variables were expressed as the median and interquartile range. Categorical data were presented as absolute numbers and percent frequencies. Differences between the continuous variables were analyzed using a Wilcoxon test and differences between the categorical variables were analyzed using a chi-square test.
After the random forest algorithm learned from the training set, the prediction performance for the primary outcomes was measured on the test set for each developed model. Because dividing the data into training and test sets introduces randomness, the random forest models were created 1,000 times to compensate for this weakness. The areas under the receiver operating characteristic curves (AUROCs) for CURB-65, CURB-RF, and E-CURB-RF were then calculated and compared using the Kruskal-Wallis test. In addition, we calculated the sensitivities, specificities, positive predictive values, negative predictive values, accuracies, and F1 scores to compare the performance of the three models. All analyses were conducted using the software package R ver. 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).
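For reference, the performance measures named above can be written out in a few lines. The sketch below is a Python illustration with hypothetical function names and invented toy data; it is not the R code used for the analysis. The AUROC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half, which is equivalent to the area under the empirical ROC curve.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise case-control comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(labels, scores, threshold):
    """Metrics when a score at or above `threshold` is called positive."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(labels)),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }

# Toy example: 1 = case, 0 = control; scores are invented CURB-65-like values.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [3, 2, 1, 1, 0, 2, 0]
metrics = classification_metrics(labels, scores, threshold=2)
print(round(auroc(labels, scores), 3))  # 0.833
print(round(metrics["f1"], 3))          # 0.667
```

In the design described above, these quantities would be recomputed on the test set of each of the 1,000 repetitions and then summarized across repetitions.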
RESULTS
Of the 1,974 pneumonia patients originally considered for the study, 1,732 patients were eligible for inclusion and were analyzed. We excluded patients younger than 18 years old (n=3), patients without consent regarding the use of their EMR (n=81), patients with duplicated data that were mistakenly included in the original data (n=50), patients who cancelled ED care (n=1), and patients referred from or to other hospitals (n=107) (Fig. 2).
Patient characteristics
Primary information regarding the study subjects is presented in Table 1. Among the 1,732 patients considered, the total number in the case group was 473. Among them, 358 subjects died within 30 days and 178 were transferred to an ICU from the ED (Appendix 2). Among all the study patients, a total of 1,087 (62.7%) people had community-acquired pneumonia, 89 (5.0%) had hospital-acquired pneumonia, and 546 (31.5%) had healthcare-associated pneumonia. There were significant differences in the distributions of these types of pneumonia between the groups (P<0.001). In total, 695 (40.1%) patients had some type of cancer, and a history of cancer was significantly more common in the case group (P<0.05), with the exception of lymphoma.
In a comparison between the control and case groups, patients in the case group were older (69 vs. 67 years old, P=0.010) and were predominantly male (72.7% vs. 59.2%, P<0.001). In terms of the initial vital signs, the case group had lower blood pressure and SpO2 (systolic blood pressure, 121 vs. 127 mmHg; diastolic blood pressure, 69 vs. 72 mmHg; SpO2, 94% vs. 96%; P<0.001) as well as higher heart rate and respiratory rate (heart rate, 108 vs. 99/min; respiratory rate, 21 vs. 20/min; P<0.001).
Table 2 shows the presence of pleural effusion and various initial laboratory findings in the study subjects. The case group included more patients with pleural effusion (P<0.001) and had significantly lower hemoglobin, platelet, and albumin levels, as well as significantly higher lactic acid, procalcitonin, C-reactive protein, blood urea nitrogen, and creatinine levels, than the control group.
The distribution of 30-day mortality or ICU admission from the ED according to the CURB-65 score is shown in Table 3. Scores of 0 and 1 were more common in the control group, whereas scores of ≥2 were more common in the case group. Because previous studies have set only 28- or 30-day mortality as the primary outcome, it is difficult to create a pure distribution of the mortality according to the CURB-65 scores for comparison with results from previous studies [20,23].
Comparison of three models
The AUROCs for predicting the primary outcome were 0.615 (95% confidence interval [CI], 0.614–0.616), 0.701 (95% CI, 0.700–0.702), and 0.844 (95% CI, 0.843–0.845) for the CURB-65, CURB-RF, and E-CURB-RF models, respectively (Fig. 3). The ROC curves for 30-day mortality are shown in Appendix 3; the corresponding AUROCs were 0.581 (95% CI, 0.579–0.582) for CURB-65, 0.638 (95% CI, 0.636–0.639) for CURB-RF, and 0.822 (95% CI, 0.821–0.823) for E-CURB-RF. A comparison of the performance of the three models is presented in Table 4. The performance of the CURB-65 model was evaluated at a score of 2, which is the original cut-off point [23]. For the CURB-RF and E-CURB-RF models, we chose, for each of the 1,000 models, the cut-off with the highest F1 score among those with a sensitivity of 0.8 or more and a specificity of 0.2 or more. The Kruskal-Wallis test was used to test for differences among the three models, and a post-hoc test was performed using Bonferroni correction. Statistically significant differences were observed among the three models (P<0.001), except for the negative predictive value between the two random forest models (P=0.083). A model with higher sensitivity is preferable when early treatment and the admission of patients who are likely to worsen are the priority. A model with higher specificity is preferable when reducing the medical costs incurred by hospitalizing low-risk pneumonia patients is the more important factor [24,25].
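One possible reading of the selection rule for the random forest models is sketched below: every observed predicted probability is tried as a cut-off, cut-offs that fail the sensitivity or specificity floor are discarded, and the one with the highest F1 score is kept. This is a Python illustration with a hypothetical function name and invented toy data; how candidate cut-offs were actually enumerated in the study is not stated.

```python
def select_cutoff(labels, probs, min_sens=0.8, min_spec=0.2):
    """Return the admissible cut-off with the highest F1 score, or None."""
    best = None
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < t)
        fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if sens < min_sens or spec < min_spec:
            continue  # cut-off violates one of the two constraints
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if best is None or f1 > best["f1"]:
            best = {"cutoff": t, "sensitivity": sens,
                    "specificity": spec, "f1": f1}
    return best

# Toy predicted probabilities for five cases (1) and five controls (0).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.35, 0.2, 0.1]
chosen = select_cutoff(labels, probs)
print(chosen["cutoff"])  # 0.6
```

In this toy example the cut-off of 0.6 gives a sensitivity and specificity of 0.8 each; higher cut-offs drop below the sensitivity floor, and lower ones have a lower F1 score.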
All the variables in Appendix 1 were used for the machine-learning algorithm. Among the 1,000 E-CURB-RF models, the top five variables with the highest area under the curve values were serum lactic acid, serum albumin, hemoglobin, D-dimer, and peripheral capillary oxygen saturation. Although the top-10 variables may differ among the 1,000 models, the differences are not expected to be significant.
DISCUSSION
There have been a few studies using machine-learning methods to predict mortality from pneumonia [17-20], and a few studies have included patients who visited an ED [20]. Previous studies on predicting mortality from pneumonia for CURB-65 demonstrated an AUROC range of approximately 0.6 to 0.75 [18,20,26,27], and this study showed an AUROC of 0.615 (95% CI, 0.614–0.616). In fact, the AUROC value of CURB-65 for 30-day mortality was only 0.581 (95% CI, 0.579–0.582), which is considered to be an extremely low predictive power compared to other reported studies.
Machine learning is the scientific study of how computers learn trends from data [13]. One of the remarkable characteristics of machine learning is the improvement observed with additional learning [16]. This study suggests that machine-learning methods perform better than the existing CURB-65 model in predicting the 30-day mortality or ICU admission of patients with pneumonia. The comparison of the CURB-65 and CURB-RF models shows that, for improving predictive power, feeding continuous values to a machine-learning method is more advantageous than dichotomizing them at clinician-defined cut-offs. Furthermore, the comparison of the CURB-RF and E-CURB-RF models confirms that the AUROC values are higher when a greater number of variables are considered. Although the CURB-RF model has higher sensitivity than the E-CURB-RF model (0.924 vs. 0.803), its specificity is only 0.270, which makes it difficult to use as a predictive model. In practice, there has been a preference in the ED setting toward quick and convenient models, such as CURB-65, which consists of five variables, and q-SOFA, which consists of three variables. It is expected that the inconvenience of a complex model comprising more variables will be offset by advances in machine-learning methods.
Unlike other studies, this study included both 30-day mortality and ICU admission from the ED in the primary outcomes. Because it is crucial not only to predict 30-day mortality and decide whether patients require admission but also to decide whether they should be admitted to an ICU, this choice of primary outcomes seems meaningful. Unfortunately, the ICU admission rate of patients with pneumonia was underestimated. The rate was measured only when patients with pneumonia were admitted directly from the ED to an ICU; consequently, patients who were transferred to an ICU from a general ward were missed. Therefore, analyzing cases of ICU transfer within 24 or 48 hours after a general ward admission is necessary to determine the factors that predict patient deterioration.
According to Appendix 2, of the 358 patients who died within the 30-day period, 295 were initially admitted to the general ward and 63 to the ICU. In general, as in other hospitals, physicians decided on admission to an ICU if patients were intubated, had high vasopressor requirements, or needed intensive intervention such as continuous renal replacement therapy [28,29]. However, whether the patient has “do not resuscitate (DNR)” status is also an important factor in terms of ICU care and the limits of ICU capacity. It can be assumed that the number of patients with DNR status is high owing to the high proportion of patients with a history of cancer or chronic medical disease in the hospital where this study was conducted. Furthermore, within the “Death within 30-day” group, the “ICU admission” subgroup showed worse laboratory findings, such as significantly lower albumin and higher lactic acid, procalcitonin, C-reactive protein, and creatinine, than the “No ICU” subgroup. It can be inferred that patients with a possibility of deterioration, or a possible need for hemodialysis due to high creatinine, were preferentially selected for ICU admission. In addition, the higher the CURB-65 score, the more patients were admitted to an ICU. To identify the ICU admission criteria in this study, it is necessary to add several variables, such as the use and dosage of vasopressors and whether a patient has DNR status.
This study has several limitations, one of which is that it was a retrospective analysis conducted at a single tertiary referral center. As can be seen in Table 1, 40.1% of patients had a history of cancer and 36.5% were non–community-acquired pneumonia patients, and this distribution is likely to differ from that of other hospitals. Because a machine-learning method selects the appropriate variables to create a model from an original dataset through a learning process, it can be inferred that machine-learning models created from the data of individual hospitals will reflect the specific characteristics of each hospital. In other words, the validity of the E-CURB-RF model created in this study cannot be assumed for the datasets of other hospitals. For this reason, an analysis of data from multiple centers of the same level is necessary.
Furthermore, it is expected that each time another machine-learning method such as deep learning is used, new models comprising different variables with different weighted values will appear. This characteristic can cast doubt on reliability; however, random forest is the most popular ensemble technique used to solve classification problems based on large data and is widely used in various fields, including medicine [30-32]. Random forest builds a set of decision trees based on the bagging and bootstrap technique. In general, when the number of trees is sufficient, the model’s error rate is low and its predictions are stable [30], and this robust nature meets medical needs. In addition, because the highly weighted variables used herein are already known from previous studies to be important factors [11,33], the results are reliable to a certain degree.
In this study, there might be a concern in that it is difficult to explain the cause of death within 30 days as being from pneumonia alone because the case group had significantly more patients with a history of cancer. In fact, however, history of cancer was not included in the highly weighted variables and likely did not have a significant impact on the outcome.
Owing to the limitations of retrospective studies, many patients who were diagnosed with pneumonia but not recorded in the pneumonia registry might have been missed. Another limitation is that the 1,974 records were reviewed by three clinicians during the data collection process. Even if certain criteria are determined prior to an EMR review, it is possible that each clinician evaluates the data differently.
The results were not compared with other pre-existing prediction models such as the PSI score or the SOFA score, which are frequently used in ICU care. However, this study was conducted in an ED setting that routinely calculates the CURB-65 score, and obtaining the components of other pre-existing models, such as the Glasgow coma scale score, vasoactive dosage, and partial pressure of arterial oxygen, is difficult. The resulting missing values would make comparison difficult. In the future, prospective studies are needed to apply a new machine-learning-based model that addresses the above-mentioned limitations to patients with pneumonia visiting an ED.
In summary, we established that a machine learning-based model can predict the mortality of patients with pneumonia in an ED more accurately than pre-existing CURB-65 and help decide whether ICU care needs to be pursued.
Notes
No potential conflict of interest relevant to this article was reported.
References
Appendices
Appendix 2.
Basic characteristics of the study subjects stratified by death within 30 days and ICU admission
ceem-19-052-appendix2.pdf
Appendix 3.
Comparison of receiver operating characteristics curves among the three models for 30-day mortality.
ceem-19-052-appendix3.pdf
Notes
Capsule Summary
What is already known
Pneumonia is the leading cause of death from infectious diseases, and thus importance has been given to patient disposition based on different severity scores.
What is new in the current study
This study suggests that a machine-learning-based model can predict the mortality of pneumonia patients in an emergency department more accurately than pre-existing CURB-65 and help decide whether to pursue intensive care unit care.