Abstract

Analyzing crash data is a complex and labor-intensive process that requires careful consideration of multiple interdependent modeling aspects, such as functional forms, transformations, likely contributing factors, correlations, and unobserved heterogeneity. Limited time, knowledge, and experience may lead to over-simplified, over-fitted, or misspecified models that overlook important insights. This paper proposes an extensive hypothesis testing framework, comprising a multi-objective mathematical programming formulation and solution algorithms, to estimate crash frequency models that simultaneously consider likely contributing factors, transformations, non-linearities, and correlated random parameters. The mathematical programming formulation jointly optimizes in-sample fit and out-of-sample predictive performance. To address the complexity and non-convexity of the mathematical program, the proposed solution framework employs a variety of metaheuristic solution algorithms. Among these, Harmony Search showed minimal sensitivity to its hyperparameters, enabling an efficient search whose results depend little on the hyperparameter choice. The effectiveness of the framework was evaluated using two real-world datasets and one synthetic dataset. For the two real-world datasets, comparative analyses were performed against the corresponding models published in the literature by independent teams. The proposed framework was able to pinpoint efficient model specifications, produce accurate estimates, and provide valuable insights for both researchers and practitioners. It allows numerous insights to be discovered while minimizing the time spent on model development. By considering a broader set of contributing factors, the framework can generate models with varied qualities.
For instance, when applied to crash data from Queensland, the proposed approach revealed that including medians on sharply curved roads can effectively reduce the occurrence of crashes. When applied to crash data from Washington, the simultaneous consideration of traffic volume and road curvature resulted in a notable reduction in crash variances but an increase in crash means.

