Intelligent human-machine approaches for assigning groups of injury codes to accident narratives

Nanda, Gaurav; Vallmuur, Kirsten; Lehto, Mark

doi:10.1016/j.ssci.2019.104585

Journal Article

Citation	Nanda G, Vallmuur K, Lehto M. Safety Sci. 2020; 125: e104585.
Copyright	(Copyright © 2020, Elsevier Publishing)
DOI	10.1016/j.ssci.2019.104585
PMID	unavailable
Abstract	In injury surveillance, different aspects of an injury event are captured using injury codes such as the External-cause-of-injury code (E-code), Major Injury Factor (MIF), and Intent. These codes are usually assigned by human coders based on accident narratives. Previous studies have examined automated and semi-automated filtering approaches that use machine learning (ML) models to assign single E-codes to accident narratives. In this study, our goal was to examine the effectiveness of these approaches for assigning groups of injury codes. This was done for three types of injury codes (E-code, MIF, and Intent) using several ML models (Logistic Regression, Support Vector Machine, and a Long Short-Term Memory (LSTM)-based Recurrent Neural Network). Four filtering strategies were also tested, all of which used the probability of prediction correctness assigned by the Logistic Regression model. These approaches were evaluated on a manually coded dataset, provided by the Queensland Injury Surveillance Unit, containing about half a million injury cases. The three ML models performed very similarly. The overall sensitivity of each model was quite high and almost identical for E-code (0.81-0.82), MIF (0.69-0.71), and Intent (0.96-0.97). However, the unweighted sensitivities were lower: E-code (0.67-0.75), MIF (0.59-0.62), and Intent (0.46-0.56), reflecting a general tendency of each model to under-predict small categories. The probability of correctly assigning all three codes was also low (0.58). Filtering approaches produced large improvements both in sensitivity for smaller categories and in the probability of predicting all three codes correctly.
Language	en
Keywords	Coding injury data; Filtering; Machine learning; Predicting multiple injury codes; Text classification
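The abstract contrasts overall with unweighted sensitivity, reports how often all three codes are correct together, and uses predicted probabilities to filter cases. The minimal Python sketch here illustrates those ideas under stated assumptions. Overall sensitivity is taken as the share of all cases coded correctly, and unweighted sensitivity as the plain mean of per-category sensitivities. The filter is a single hypothetical probability threshold that sends low-confidence cases to a human coder assumed to be always correct. The function names, the threshold, and the toy data are invented for illustration and are not the authors' four filtering strategies or their results.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def overall_sensitivity(true: Sequence, pred: Sequence) -> float:
    """Share of all cases coded correctly; large categories dominate."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def unweighted_sensitivity(true: Sequence, pred: Sequence) -> float:
    """Mean of per-category sensitivities, so every category counts equally."""
    totals = Counter(true)  # number of cases in each true category
    hits = Counter(t for t, p in zip(true, pred) if t == p)  # correct per category
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def filter_by_confidence(
    true: Sequence, pred: Sequence, prob: Sequence[float], threshold: float
) -> Tuple[List, float]:
    """Hypothetical filter: predictions whose probability is below `threshold`
    go to a human coder, ASSUMED here to assign the correct code.
    Returns the combined human/machine codes and the share coded manually."""
    combined = [p if q >= threshold else t for t, p, q in zip(true, pred, prob)]
    manual_share = sum(q < threshold for q in prob) / len(prob)
    return combined, manual_share


# Toy data (invented): the model predicts the large "fall" category for
# every case, including two cases from small categories.
true = ["fall", "fall", "fall", "fall", "cut", "burn"]
pred = ["fall"] * 6
prob = [0.95, 0.90, 0.92, 0.88, 0.55, 0.40]

print(overall_sensitivity(true, pred))     # 4 of 6 cases correct
print(unweighted_sensitivity(true, pred))  # (1 + 0 + 0) / 3 categories

coded, share = filter_by_confidence(true, pred, prob, threshold=0.6)
print(share, unweighted_sensitivity(true, coded))

# Applied to (E-code, MIF, Intent) tuples, overall_sensitivity gives the
# share of cases with all three codes correct (labels invented).
groups_true = [("fall", "MIF-A", "unintentional"), ("cut", "MIF-B", "unintentional")]
groups_pred = [("fall", "MIF-A", "unintentional"), ("cut", "MIF-A", "unintentional")]
print(overall_sensitivity(groups_true, groups_pred))  # 0.5
```

Because the toy model predicts only the dominant category, its overall sensitivity stays moderate while its unweighted sensitivity collapses, an exaggerated version of the under-prediction of small categories described in the abstract. Routing the two low-probability cases to a human coder restores both measures here, at the cost of manual review for a third of the cases.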