Do larger sample sizes increase the reliability of traffic incident duration models? A case study of east Tennessee incidents

Zhang, Zihe; Liu, Jun; Li, Xiaobing; Khattak, Asad J.

doi:10.1177/0361198121992063

Journal Article

Citation	Zhang Z, Liu J, Li X, Khattak AJ. Transp. Res. Rec. 2021; 2675(6): 265-280.
Copyright	(Copyright © 2021, Transportation Research Board, National Research Council, National Academy of Sciences USA, Publisher SAGE Publishing)
DOI	10.1177/0361198121992063
PMID	unavailable
Abstract	Incident duration models are often developed to assist incident management and traveler information dissemination. With recent advances in data collection and management, large volumes of archived incident data are now available for incident model development. However, a large volume of data may present challenges to practitioners, such as data processing and computation. In addition, data that span multiple years may be inconsistent because of changes in data collection environments and procedures. A practical question arises in the incident modeling community: is it really necessary to use all of the available data (an "all-in" approach) to build models? If not, how much data is necessary? To answer these questions, this study investigates the relationship between data sample size and the reliability of incident duration analysis models. The study proposes and demonstrates a sample size determination framework through a case study using data on over 47,000 incidents. A series of hazard-based duration models was estimated with varying sample sizes. The relationships between sample size and model performance, along with estimation outcomes (i.e., coefficients and significance levels), were examined and visualized. The results showed that the variation of the estimated coefficients decreases as the sample size increases, and stabilizes once the sample size reaches a critical threshold value. This critical threshold value may serve as the recommended sample size. The case study suggested that a sample size of 6,500 is enough for a reliable incident duration model. The critical value may vary significantly with different data and model specifications. More implications are discussed in the paper.
Language	en
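
The idea in the abstract (re-estimate a duration model on subsamples of increasing size and look for the size at which the coefficient estimates stop varying) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic, the model is a simple exponential duration model with a single binary covariate (chosen because its maximum likelihood estimate has a closed form), and every function name, parameter value, and the tolerance rule are assumptions made for illustration only.

```python
import math
import random
import statistics


def simulate_incidents(n, b0=3.5, b1=0.6, seed=0):
    """Generate synthetic (hypothetical) incidents as (duration, x) pairs.

    x = 1 marks an assumed incident attribute (for example, a blocked lane).
    Durations are exponential with mean exp(b0 + b1 * x).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        mean_duration = math.exp(b0 + b1 * x)
        data.append((rng.expovariate(1.0 / mean_duration), x))
    return data


def fit_exponential_model(sample):
    """Closed-form MLE of (b0, b1) for the exponential model above.

    With one binary covariate, the MLE of each group's mean duration is
    simply the group's sample mean.
    """
    t0 = [t for t, x in sample if x == 0]
    t1 = [t for t, x in sample if x == 1]
    b0 = math.log(statistics.fmean(t0))
    b1 = math.log(statistics.fmean(t1)) - b0
    return b0, b1


def coefficient_spread(data, sizes, repeats=30, seed=1):
    """Standard deviation of the b1 estimate across repeated subsamples."""
    rng = random.Random(seed)
    spread = {}
    for n in sizes:
        estimates = [
            fit_exponential_model(rng.sample(data, n))[1]
            for _ in range(repeats)
        ]
        spread[n] = statistics.stdev(estimates)
    return spread


def critical_sample_size(spread, tolerance):
    """Smallest size whose spread, and that of every larger size tested,
    is within the tolerance. Returns None if no size qualifies."""
    sizes = sorted(spread)
    for i, n in enumerate(sizes):
        if all(spread[m] <= tolerance for m in sizes[i:]):
            return n
    return None


data = simulate_incidents(20000)
sizes = [100, 500, 2000, 8000]
spread = coefficient_spread(data, sizes)
for n in sizes:
    print(f"n={n:5d}  sd(b1)={spread[n]:.4f}")
print("critical size:", critical_sample_size(spread, tolerance=0.05))
```

The threshold found this way depends entirely on the assumed tolerance, the grid of sizes, and the model, which mirrors the abstract's caution that the critical value may vary significantly with different data and model specifications.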