
Journal Article

Citation

Boo Y, Choi Y. BMC Public Health 2022; 22(1): e1476.

Copyright

(Copyright © 2022, Holtzbrinck Springer Nature Publishing Group - BMC)

DOI

10.1186/s12889-022-13719-3

PMID

35918672

Abstract

BACKGROUND: Injuries caused by road traffic accidents (RTAs) are classified under the International Classification of Diseases-10 as 'S00-T99'. They form an imbalanced sample, with a mortality rate of only 1.2% among all RTA victims. To predict the characteristics of external causes of RTA injuries and mortality, we compared performance across different correction and classification techniques for imbalanced samples.

METHODS: The present study used data spanning a 5-year period (2013-2017) from the Korean National Hospital Discharge In-depth Injury Survey (KNHDS), a national-level survey conducted by the Korea Disease Control and Prevention Agency. A total of eight variables were used in the prediction, including patient, accident, and injury/disease characteristics. As the data were imbalanced, a sample consisting of only severe injuries was constructed and compared against the total sample. Preprocessing was tailored to the characteristics of the samples: because they contained many variables with different units, the samples were first standardized. Among the ensemble techniques for classification, the present study used Random Forest, Extra-Trees, and XGBoost. Four different over- and under-sampling techniques were applied, and the performance of the algorithms was compared using "accuracy", "precision", "recall", "F1", and "MCC".
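To illustrate why the five metrics named above behave differently on a sample with roughly 1.2% mortality, the following minimal sketch computes them from confusion-matrix counts. The cohort size, the counts, and the function name `binary_metrics` are hypothetical and chosen only for illustration; they are not taken from the KNHDS data or from the authors' code.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, F1, and MCC from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard each ratio against a zero denominator (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical cohort: 10,000 victims, 120 deaths (1.2%).
# Baseline classifier that predicts "survived" for every victim.
baseline = binary_metrics(tp=0, fp=0, fn=120, tn=9880)

# Hypothetical classifier that finds half of the deaths at the cost of 100 false alarms.
detector = binary_metrics(tp=60, fp=100, fn=60, tn=9780)

for name, scores in (("baseline", baseline), ("detector", detector)):
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

In this hypothetical case the baseline has the higher accuracy even though it never identifies a death, while recall, F1, and MCC all favor the second classifier. This is the usual argument for reporting metrics beyond accuracy on imbalanced samples.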

RESULTS: Among the prediction techniques, XGBoost had the best performance. While the synthetic minority oversampling technique (SMOTE), a type of over-sampling, also demonstrated a certain level of performance, under-sampling performed best. Overall, prediction by the XGBoost model with samples using SMOTE produced the best results.
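The two families of resampling compared here can be sketched on toy data as follows. This is a simplified illustration, not the authors' implementation: the abstract does not list all four sampling techniques, so the sketch shows only random under-sampling and a bare-bones SMOTE-style interpolation. The helper names `random_undersample` and `smote_like`, the two-feature toy data, and the choice of `k=5` are all assumptions.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    kept = rng.sample(majority, k=len(minority))
    return kept, list(minority)


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a randomly
    chosen minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, target)))
    return synthetic


# Toy data (assumed): 988 survivors and 12 deaths with two standardized features.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(988)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(12)]

# Under-sampling shrinks the majority class to the minority size.
under_major, under_minor = random_undersample(majority, minority, rng)

# Over-sampling grows the minority class to the majority size.
synthetic = smote_like(minority, n_new=len(majority) - len(minority), k=5, rng=rng)
over_minor = minority + synthetic

print(len(under_major), len(under_minor))  # balanced by discarding majority rows
print(len(majority), len(over_minor))      # balanced by adding synthetic minority rows
```

Under-sampling discards most of the majority rows, whereas the SMOTE-style step keeps every original row and adds synthetic minority rows that lie on segments between existing minority rows. In practice, resampling is normally applied to the training split only, so that evaluation reflects the original class ratio.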

CONCLUSION: This study presented an empirical comparison of sampling techniques and classification algorithms, used in combination, and of how they affect prediction accuracy on imbalanced samples. The findings could be used as reference data in classification analyses of imbalanced data in the medical field.


Language: en

Keywords

Machine learning; Ensemble method; Imbalanced data; Mortality prediction; Road traffic accident injury
