@article{ref1, title="A comparison of statistical learning methods for deriving determining factors of accident occurrence from an imbalanced high resolution dataset", journal="Accident Analysis and Prevention", year="2019", author="Schlögl, Matthias and Stütz, Rainer and Laaha, Gregor and Melcher, Michael", volume="127", pages="134-149", abstract="One of the main aims of accident data analysis is to derive the determining factors associated with road traffic accident occurrence. While current studies mainly use variants of count data regression to achieve this aim, the problem can also be considered as a binary classification task, with the dichotomous target variable indicating events (accidents) and non-events (no accidents). The effects of 45 variables - describing road condition and geometry, traffic volume and regulations, weather, and accident time - are analyzed using a dataset in high temporal (1 h) and spatial (250 m) resolution, covering the whole highway network of Austria over the period of four consecutive years. A combination of synthetic minority oversampling and maximum dissimilarity undersampling is used to balance the training dataset. We employ and compare a series of statistical learning techniques with respect to their predictive performance and discuss the importance of determining factors of accident occurrence from the ensemble of models.

Findings substantiate that a trade-off between accuracy and sensitivity is inherent to imbalanced classification problems. Results show satisfying performance of tree-based methods which exhibit accuracies between 75% and 90% while exhibiting sensitivities between 30% and 50%. Overall, this analysis emphasizes the merits of using high-resolution data in the context of accident analysis.

", language="en", issn="0001-4575", doi="10.1016/j.aap.2019.02.008", url="http://dx.doi.org/10.1016/j.aap.2019.02.008" }