Bayesian methods: a useful tool for classifying injury narratives into cause groups

Lehto, Mark R.; Marucci-Wellman, Helen R.; Corns, H.

doi:10.1136/ip.2008.021337

Journal Article

Citation	Lehto MR, Marucci-Wellman HR, Corns H. Inj. Prev. 2009; 15(4): 259-265.
Affiliation	School of Industrial Engineering, Purdue University, 1287 Grissom Hall, West Lafayette, IN 47907, USA. lehto@purdue.edu
Copyright	(Copyright © 2009, BMJ Publishing Group)
DOI	10.1136/ip.2008.021337
PMID	19652000
Abstract	This study compared two Bayesian methods (Fuzzy and Naïve) for classifying injury narratives in large administrative databases into event cause groups. A dataset of 14 000 narratives was randomly extracted from claims filed with a workers' compensation insurance provider. Two expert coders assigned one-digit and two-digit Bureau of Labor Statistics (BLS) Occupational Injury and Illness Classification event codes to each narrative. The narratives were separated into a training set of 11 000 cases and a prediction set of 3000 cases. The training set was used to develop two Bayesian classifiers that assigned BLS codes to narratives, and each model was then evaluated on the prediction set. Both models performed well and tended to predict one-digit BLS codes more accurately than two-digit codes. For the Fuzzy method, overall sensitivity was 78% and 64% for one-digit and two-digit codes, respectively; specificity was 93% and 95%; and positive predictive value (PPV) was 78% and 65%. The Naïve method showed similar accuracy: sensitivity of 80% and 70%, specificity of 96% and 97%, and PPV of 80% and 70%. For large administrative databases, Bayesian methods show significant promise as a means of classifying injury narratives into cause groups. Overall, Naïve Bayes provided slightly more accurate predictions than Fuzzy Bayes.
Language	en
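
The workflow the abstract describes (train a classifier on coded narratives, predict codes for held-out narratives, then report sensitivity, specificity and PPV) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail: it assumes a generic word-count Naive Bayes model with add-one smoothing, does not cover the Fuzzy Bayes method, and uses invented narratives with hypothetical labels ("fall", "struck") standing in for BLS event codes. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real narratives need more cleaning.
    return text.lower().split()


def train_naive_bayes(narratives, codes):
    """Count narratives per code and words per code from the training set."""
    code_counts = Counter(codes)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in zip(narratives, codes):
        for word in tokenize(text):
            word_counts[code][word] += 1
            vocab.add(word)
    return code_counts, word_counts, vocab


def predict(model, text):
    """Return the code with the highest log posterior under add-one smoothing."""
    code_counts, word_counts, vocab = model
    total = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(code_counts):
        score = math.log(code_counts[code] / total)  # log prior
        n_words = sum(word_counts[code].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[code][word] + 1) / (n_words + len(vocab))
                )
        if score > best_score:
            best_code, best_score = code, score
    return best_code


def per_code_metrics(true_codes, predicted_codes, code):
    """Sensitivity, specificity and PPV for one code, treated as one-vs-rest."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(true_codes, predicted_codes):
        if truth == code and guess == code:
            tp += 1
        elif truth != code and guess == code:
            fp += 1
        elif truth == code and guess != code:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    return sensitivity, specificity, ppv


# Invented toy data; the labels are hypothetical, not real BLS codes.
train_texts = [
    "slipped on wet floor and fell",
    "fell from ladder",
    "struck by falling box",
    "hit by forklift",
]
train_codes = ["fall", "fall", "struck", "struck"]
model = train_naive_bayes(train_texts, train_codes)

test_texts = ["fell on wet floor", "hit by a box"]
test_codes = ["fall", "struck"]
predictions = [predict(model, text) for text in test_texts]
for code in sorted(set(train_codes)):
    print(code, per_code_metrics(test_codes, predictions, code))
```

In this sketch each code is scored one-vs-rest, which is one common way to obtain per-category sensitivity, specificity and PPV; how the study aggregated these into its overall figures is not stated in the abstract.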