Date: 2023-04-01

Abstract

Motor vehicle crashes are the leading cause of death among teenagers in the United States. Young drivers show a higher propensity to be involved in crashes because of cellphone use while driving, speeding, and reckless driving. This study analyzed motor vehicle crashes involving young drivers using four years of New Jersey crash data (2016-2019). Several machine learning (ML) methods, namely Random Forest, LightGBM, CatBoost, and XGBoost, were used to predict injury severity. Model performance was evaluated using accuracy, precision, and recall scores. In addition, interpretable ML techniques, namely sensitivity analysis and Shapley values, were applied to assess the impact of the most influential factors on young driver-related crashes. The results revealed that XGBoost performed better than the Random Forest, CatBoost, and LightGBM models in crash severity prediction.



Results from the sensitivity analysis showed that multi-vehicle crashes, angular crashes, crashes at intersections, and dark-not-lit conditions were associated with increased crash severity. A partial dependence plot of SHAP values revealed that speeding in clear weather was associated with a higher likelihood of injury crashes, and that multi-vehicle crashes at intersections resulted in more injury crashes. We expect that the results of this study will help policymakers and practitioners take appropriate countermeasures to improve the safety of young drivers in New Jersey.

