TY - JOUR
PY - 2022//
TI - Estimating county-level overdose rates using opioid-related Twitter data: interdisciplinary infodemiology study
JO - JMIR formative research
A1 - Cuomo, Raphael
A1 - Purushothaman, Vidya
A1 - Calac, Alec
A1 - McMann, Tiana
A1 - Li, Zhuoran
A1 - Mackey, Tim
SP - ePub
EP - ePub
VL - ePub
IS - ePub
N2 - BACKGROUND: There were an estimated 100,306 drug overdose deaths between April 2020 and April 2021, a three-quarters increase from the prior 12-month period. There is an approximate six-month reporting lag for provisional counts of drug overdose deaths from the National Vital Statistics System, and the highest level of geospatial resolution is at the state level. By contrast, public social media data are available in close to real time and are often accessible with precise coordinates.
OBJECTIVE: We sought to assess whether county-level overdose mortality burden could be estimated using opioid-related Twitter data.
METHODS: ICD codes for poisoning/exposure to overdose at the county level were obtained from CDC WONDER. Demographics were collected from the American Community Survey. The Twitter API was used to obtain tweets that contained any of 36 terms with drug names. An unsupervised classification approach was used to cluster tweets. Population-normalized variables and polynomial population-normalized variables were produced. Furthermore, z-scores of the Getis-Ord Gi clustering statistic were produced, and both these scores and their polynomial counterparts were explored in regression modeling of county-level overdose mortality burden. A series of linear regression models was used for predictive modeling to explore the interpretability of the analytical output.
RESULTS: Modeling of overdose mortality with normalized demographic variables alone explained only 7.4% of the variability in county-level overdose mortality, whereas this was approximately doubled by the use of specific demographic and Twitter data covariates based on a backwards selection approach. The highest adjusted R2 and lowest AIC were obtained for the model with normalized demographic variables, normalized z-scores from geospatial analyses, and normalized topic counts (adjusted R2 = 0.133, AIC = 8546.8). Z-scores of the Getis-Ord Gi statistic appeared to have improved utility over population normalization alone. In this model, median age, female population, and tweets about online drug selling were positively associated with opioid mortality. Asian race and Hispanic ethnicity were significantly negatively associated with county-level burden of overdose mortality.
CONCLUSIONS: Social media data, when transformed using certain statistical approaches, may help produce closer to real-time county-level estimates of overdose mortality. Prediction of opioid-related outcomes can be advanced to inform prevention and treatment decisions. This interdisciplinary approach can facilitate evidence-based funding decisions for various substance use disorder (SUD) prevention and treatment programs.
LA - en
SN - 2561-326X
UR - http://dx.doi.org/10.2196/42162
ID - ref1
ER -