Abstract

Despite enormous improvements in vehicle safety, roadway design, and operations, traffic crashes still occur in excessive numbers, resulting in injuries and major productivity losses. Although many studies have examined the factors that contribute to crash frequency and injury severity, further research is needed. Tree and utility pole/other pole related (TUOP) crashes account for approximately 12 to 15% of all roadway departure (RwD) fatal crashes in the U.S. and nearly 22% of all fatal crashes in Louisiana. From 2010 to 2016, 55,857 TUOP crashes were reported in Louisiana. Examining each of these crash reports individually is not a realistic way to investigate crash factors. Therefore, this study employed text mining and interpretable machine learning (IML) techniques to analyze all TUOP crashes (with available crash narratives) that occurred in Louisiana from 2010 to 2016. The study has two major goals: 1) to develop a framework for applying machine learning models to classify injury levels from unstructured textual content, and 2) to apply an IML framework that provides probability measures of keywords and their association with the injury classification. Three machine learning algorithms were used to classify injury levels based on the crash narrative data. Of the modeling techniques used, the eXtreme gradient boosting (XGBoost) model showed the best performance, with accuracy ranging from 0.70 to 24% for the training data and from 0.30% to 16% for the test data.
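To make the second goal concrete, the following minimal sketch shows one simple way a keyword can be associated with an injury class: the empirical share of narratives containing the keyword that fall into each class. It is an illustration only and is not the IML framework or the XGBoost model used in the study. The narratives, the labels, and the function name `keyword_class_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy narratives with made-up injury labels.
# These are illustrative assumptions, not data from the study.
NARRATIVES = [
    ("vehicle ran off road and struck tree driver ejected", "severe"),
    ("vehicle left roadway struck utility pole driver unrestrained", "severe"),
    ("vehicle struck utility pole minor damage no injury", "none"),
    ("driver swerved struck tree at low speed no injury", "none"),
    ("vehicle ran off road struck pole driver complained of pain", "minor"),
]


def keyword_class_probabilities(records, min_docs=1):
    """Estimate P(label | keyword appears in the narrative) from document counts.

    Each narrative counts a keyword at most once, so the estimate is the
    share of narratives containing the keyword that carry each label.
    """
    doc_counts = Counter()               # narratives containing each keyword
    label_counts = defaultdict(Counter)  # per-keyword counts of each label
    for text, label in records:
        for token in set(text.lower().split()):
            doc_counts[token] += 1
            label_counts[token][label] += 1
    return {
        token: {
            label: n / doc_counts[token]
            for label, n in label_counts[token].items()
        }
        for token in doc_counts
        if doc_counts[token] >= min_docs
    }


probs = keyword_class_probabilities(NARRATIVES)
print(probs["tree"])
print(probs["ejected"])
```

In this toy data, "tree" is split evenly between the severe and no-injury classes, while "ejected" appears only in a severe-injury narrative. Raw conditional frequencies like these ignore interactions between words, which is one reason a trained classifier paired with an interpretation method is used instead.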

