SAFETYLIT WEEKLY UPDATE

We compile citations and summaries of about 400 new articles every week.

Journal Article

Citation

Cao H, Cooper DG, Keutmann MK, Gur RC, Nenkova A, Verma R. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset. IEEE Trans. Affect. Comput. 2014; 5(4): 377-390.

Affiliation

Radiology Department at the University of Pennsylvania, 3600 Market Street, Suite 380, Philadelphia, PA 19104.

Copyright

(Copyright © 2014, IEEE Computer Society)

DOI

10.1109/TAFFC.2014.2336244

PMID

25653738

PMCID

PMC4313618

Abstract

People convey their emotional state in their face and voice. We present an audio-visual data set uniquely suited for the study of multi-modal emotion expression and perception. The data set consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). A total of 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-valued intensity values for the perceived emotion were collected using crowd-sourcing from 2,443 raters. Human recognition of the intended emotion for the audio-only, visual-only, and audio-visual data is 40.9%, 58.2%, and 63.6%, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion.

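The recognition rates quoted in the abstract are the fraction of crowd-sourced ratings in which the rater's perceived emotion matches the actor's intended emotion, computed separately for each presentation modality. The following is a minimal Python sketch of that per-modality computation; the record layout (clip_id, modality, intended, perceived) and the sample values are assumptions for illustration and are not the dataset's actual release format.

# Hypothetical sketch (not the authors' code): compute per-modality
# recognition rates from crowd-sourced ratings.  Field names and the
# sample records below are illustrative assumptions only.
from collections import defaultdict

# Each rating: (clip_id, modality, intended_emotion, perceived_emotion)
ratings = [
    ("clip_0001", "audio",        "anger", "anger"),
    ("clip_0001", "visual",       "anger", "disgust"),
    ("clip_0001", "audio-visual", "anger", "anger"),
    ("clip_0002", "audio",        "happy", "neutral"),
    ("clip_0002", "visual",       "happy", "happy"),
    ("clip_0002", "audio-visual", "happy", "happy"),
]

hits = defaultdict(int)    # ratings that match the intended emotion
totals = defaultdict(int)  # all ratings, per modality

for _, modality, intended, perceived in ratings:
    totals[modality] += 1
    hits[modality] += int(perceived == intended)

for modality in sorted(totals):
    rate = 100.0 * hits[modality] / totals[modality]
    print(f"{modality}: {rate:.1f}% recognition of intended emotion")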

Language: en

