SAFETYLIT WEEKLY UPDATE


Journal Article

Citation

Banerjee A, Mutlu OC, Kline A, Washington P, Wall D, Surabhi S. JMIR Form. Res. 2022; ePub(ePub): ePub.

Copyright

(Copyright © 2022, JMIR Publications)

DOI

10.2196/39917

PMID

35962462

Abstract

BACKGROUND: Implementing automated facial expression recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize facial expressions, including children with developmental behavioral conditions such as autism. Although recent advances have been made in building more accurate facial expression classifiers for children, existing models are too computationally expensive to be deployed on smartphones.

OBJECTIVE: In this study, we explored the deployment of several state-of-the-art facial expression classifiers designed for use on mobile devices. We applied various post-training optimization techniques and evaluated both classification performance and efficiency on a Motorola Moto G6 phone. We additionally explored the importance of training our classifiers on children rather than adults and evaluated the performance of our models across different ethnic groups.

METHODS: We collected images from twelve public datasets and used video frames crowdsourced from the GuessWhat app to train our classifiers. All images were annotated for 7 expressions: neutral, fear, happiness, sadness, surprise, anger, and disgust. We tested three copies of each of five convolutional neural network architectures: MobileNetV3-Small 1.0x, MobileNetV2 1.0x, EfficientNetB0, MobileNetV3-Large 1.0x, and NASNetMobile. The first copy of each was trained on images of children, the second on images of adults, and the third on all datasets. We evaluated each model against the Child Affective Facial Expression (CAFE) set, both in its entirety and by ethnicity. We then performed weight pruning, weight clustering, and quantization-aware training where possible and profiled the performance of each model on the Moto G6.
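The abstract names three compression techniques (weight pruning, weight clustering, and quantization-aware training) without implementation detail. The Python sketch below shows how such a pipeline could look using the TensorFlow Model Optimization Toolkit; the classifier head, sparsity and cluster settings, and the train_ds dataset are illustrative assumptions, not the authors' published code.

# Illustrative sketch only (not the authors' code): a MobileNetV3-Large
# expression classifier plus the three optimizations named in the abstract,
# using the TensorFlow Model Optimization Toolkit. Head design, sparsity and
# cluster settings, and train_ds are assumptions.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

NUM_EXPRESSIONS = 7  # neutral, fear, happiness, sadness, surprise, anger, disgust

def build_classifier() -> tf.keras.Model:
    # ImageNet-pretrained backbone, as described in the results. The head is
    # grafted onto the backbone's graph so the result is one flat functional
    # model (the tfmot wrappers do not accept nested models).
    backbone = tf.keras.applications.MobileNetV3Large(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(NUM_EXPRESSIONS, activation="softmax")(x)
    return tf.keras.Model(backbone.input, outputs)

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # train_ds: labeled face crops (hypothetical)

# Weight pruning: gradually zero out low-magnitude weights during fine-tuning.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000))
# Fine-tuning a pruned model requires the UpdatePruningStep callback:
# pruned.fit(train_ds, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Weight clustering: replace each layer's weights with a small set of centroids.
clustered = tfmot.clustering.keras.cluster_weights(
    model,
    number_of_clusters=16,
    cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.KMEANS_PLUS_PLUS)

# Quantization-aware training: simulate 8-bit arithmetic while fine-tuning.
# As the abstract's "where possible" implies, some architectures contain
# layers this wrapper does not support, so QAT may not apply cleanly.
quantized = tfmot.quantization.keras.quantize_model(model)

Pruning and clustering shrink the stored model, while quantization-aware training prepares it for 8-bit on-device inference; as the abstract's "where possible" suggests, not every architecture supports all three techniques.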

RESULTS: Our best model, a MobileNetV3-Large network pretrained on ImageNet and trained on all data, achieved 65.78% balanced accuracy and 65.31% F1-score on CAFE while achieving a 90-millisecond inference latency on the Moto G6. This balanced accuracy is only 1.12% lower than the current state of the art for CAFE, a model with 13.91x more parameters that was unable to run on the Moto G6 due to its size, even when fully optimized. When trained solely on children, this model achieved 60.57% balanced accuracy and 60.29% F1-score; when trained only on adults, it achieved 53.36% balanced accuracy and 53.10% F1-score. Although the MobileNetV3-Large trained on all datasets achieved nearly 60% F1-score across all ethnicities, balanced accuracy for South Asian and African American children was as much as 11.56% lower, and F1-score as much as 11.25% lower, than for other groups.
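The 90-millisecond figure is an on-device measurement; the abstract does not name the runtime used. The sketch below shows one conventional way to convert a Keras model and time inference with the TensorFlow Lite interpreter; model refers to the hypothetical classifier sketched above, and this desktop timing loop is an assumption about tooling, not the paper's profiling setup.

# Hedged sketch: TensorFlow Lite conversion plus interpreter timing is one
# standard way to obtain on-device latency numbers. On an actual phone the
# equivalent measurement would come from the TFLite benchmark tool or an
# instrumented Android app rather than this loop.
import time
import numpy as np
import tensorflow as tf

def convert_to_tflite(model: tf.keras.Model) -> bytes:
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
    return converter.convert()

def mean_latency_ms(tflite_model: bytes, runs: int = 100) -> float:
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.rand(*inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) * 1000.0 / runs

Latency measured this way on a development machine only indicates relative cost across models; the paper's 90-millisecond figure comes from profiling on the Moto G6 itself.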

CONCLUSIONS: This work demonstrates that, with specialized design and optimization techniques, facial expression classifiers can become lightweight enough to run on mobile devices while still achieving state-of-the-art performance. This study also suggests a potential "data shift" phenomenon between the facial expressions of children and adults, with our classifiers performing much better when trained on children. In addition, we find that the models perform significantly worse for certain underrepresented ethnic groups, such as South Asian and African American children, than for groups such as European Caucasian children, despite similar data quality. The models developed in this study can be integrated into mobile health therapies to help diagnose autism spectrum disorder (ASD) and to provide targeted therapeutic treatment to children.


Language: English
