Predicting and analyzing injury severity: a machine learning-based approach using class-imbalanced proactive and reactive data

Sarkar, Sobhan; Pramanik, Anima; Maiti, J.; Reniers, Genserik

doi:10.1016/j.ssci.2020.104616

Journal Article

Citation	Sarkar S, Pramanik A, Maiti J, Reniers G. Safety Sci. 2020; 125: e104616.
Copyright	(Copyright © 2020, Elsevier Publishing)
DOI	10.1016/j.ssci.2020.104616
PMID	unavailable
Abstract	Although the utility of machine learning (ML) techniques is established in the occupational accident domain using reactive data, their exploration in predicting injury severity using both reactive and proactive data is new. This necessitates investigating the significance of both types of data in the prediction of injury severity using ML techniques. In addition, unstructured texts and class imbalance in the data often create difficulty in analysis. Therefore, to address the above-mentioned issues, two types of data, namely investigation reports (i.e., reactive data) and inspection reports (i.e., proactive data), collected from a steel plant, are used in this study. The datasets are merged to generate a mixed dataset. Topic modeling is used to handle the unstructured texts. A total of four oversampling algorithms, namely Synthetic Minority Over-sampling Technique (SMOTE), borderline SMOTE (BLSMOTE), Majority Weighted Minority Oversampling Technique (MWMOTE), and k-means SMOTE (KMSMOTE), have been used separately to handle the class imbalance issue. Thereafter, a set of six prediction algorithms, namely support vector machine, artificial neural network, Naïve Bayes, k-nearest neighbour, classification and regression tree analysis, and random forest, have been used on the reactive and mixed datasets separately for injury severity prediction. The results reveal that KMSMOTE performs better than the others in balancing the datasets and therefore helps in achieving higher prediction performance in terms of average recall, F1-score and geometric mean. In addition, it is also statistically shown that prediction performance for injury severity is significantly higher using the mixed dataset than using the reactive dataset only. Finally, a set of 19 crisp safety decision rules is generated using the tolerance rough set approach (TRSA); these rules can explain the factors responsible for the injury severity outcomes, namely 'Fatal', 'Medical case', and 'First-aid'. Language: en
Keywords	Class-imbalance; Classification algorithms; Injury severity prediction; Oversampling techniques; Reactive and proactive data; Tolerance rough set approach (TRSA)
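
The oversampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows plain SMOTE only (not BLSMOTE, MWMOTE, or KMSMOTE), and it assumes the geometric mean is taken over per-class recalls, a common multi-class convention that the abstract does not confirm. The function names (`smote`, `recall_per_class`, `geometric_mean`), the toy feature vectors, and the example predictions are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE idea: build n_new synthetic points, each lying on the
    segment between a minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall_per_class(y_true, y_pred):
    """Recall of each class: correctly predicted / actual members."""
    recalls = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls[c] = tp / support
    return recalls


def geometric_mean(recalls):
    """Geometric mean of per-class recalls (assumed G-mean definition)."""
    vals = list(recalls.values())
    return math.prod(vals) ** (1 / len(vals))


# Hypothetical toy data: three 'Fatal' cases described by two numeric features.
fatal = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_points = smote(fatal, n_new=4, k=2)

# Hypothetical labels and predictions for the three severity classes.
y_true = ["Fatal", "Fatal", "Medical case", "Medical case", "First-aid", "First-aid"]
y_pred = ["Fatal", "Medical case", "Medical case", "Medical case", "First-aid", "Fatal"]
recalls = recall_per_class(y_true, y_pred)
print(recalls, geometric_mean(recalls))
```

Because the geometric mean multiplies the per-class recalls, it falls to zero whenever any one class (for example the rare 'Fatal' class) is never predicted correctly, which is why it is a common companion to average recall and F1-score on class-imbalanced data.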