A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models
S. Koelstra, M. Pantic and I. Patras. In IEEE Trans. Pattern Analysis and Machine Intelligence, 2010. In press.
@article{Koelstra10,
author = {S. Koelstra and M. Pantic and I. Patras},
title = {A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models},
journal = {IEEE Trans. Pattern Analysis and Machine Intelligence},
year = {2010},
note={in press},
abstract = "In this work we propose a dynamic-texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modelling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Non-rigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2\% for the MHI method and of 94.3\% for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener dataset.",
}
Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos
S. Koelstra, A. Yazdani, M. Soleymani, C. Muehl, J.-S. Lee, A. Nijholt, T. Pun, T. Ebrahimi and I. Patras. In Conf. Brain Informatics, 2010.
@inproceedings{Koelstra10_1,
author = {Koelstra, Sander and Yazdani, Ashkan and Soleymani, Mohammad and Muehl, Christian and Lee, Jong-Seok and Nijholt, Anton and Pun, Thierry and Ebrahimi, Touradj and Patras, Ioannis},
booktitle = {Conf. Brain Informatics},
year = 2010,
title = {{Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos}},
abstract = {{Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users' self-assessments of arousal and valence and the frequency powers of their EEG activity. We present methods for single trial classification using both EEG and peripheral physiological signals. For EEG, an average (maximum) classification rate of 55.7\% (67.0\%) for arousal and 58.8\% (76.0\%) for valence was obtained. For peripheral physiological signals, the results were 58.9\% (85.5\%) for arousal and 54.2\% (78.5\%) for valence.}},
}
EEG analysis for implicit tagging of video data
S. Koelstra, C. Muehl and I. Patras. In Workshop on Affective Brain-Computer Interfaces, Proc. ACII, pages 27-32, 2009.
@inproceedings{Koelstra09_2,
author = {S. Koelstra and C. Muehl and I. Patras},
title = {EEG analysis for implicit tagging of video data},
booktitle = {Workshop on Affective Brain-Computer Interfaces, Proc. ACII},
year = {2009},
pages = {27--32},
abstract = "In this work, we aim to find neuro-physiological indicators to validate tags attached to video content.
Subjects are shown a video and a tag and we aim to determine whether the shown tag was congruent with the presented video by detecting the occurrence of an N400 event-related potential.
Tag validation could be used in conjunction with a vision-based recognition system as a feedback mechanism to improve the classification accuracy for multimedia indexing and retrieval.
An advantage of using the EEG modality for tag validation is that it is a way of performing implicit tagging.
This means it can be performed while the user is passively watching the video.
Independent Component Analysis and repeated measures ANOVA are used for analysis.
Our experimental results show a clear occurrence of the N400 and a significant difference in N400 activation between matching and non-matching tags."
}
The FAST-3D Spatio-Temporal Interest Region Detector
S. Koelstra and I. Patras. In Workshop on Image Analysis for Multimedia Interactive Services, pages 242-245, 2009.
@inproceedings{Koelstra09,
author = {S. Koelstra and I. Patras},
title = {The FAST-3D Spatio-Temporal Interest Region Detector},
booktitle = {Workshop on Image Analysis for Multimedia Interactive Services},
year = {2009},
pages = {242--245},
abstract="Spatio-temporal interest region detectors can be used in the analysis of video to determine sparse, informative regions as candidates for feature extraction.
In this paper we compare existing detectors and introduce the new FAST-3D detector, loosely based on the FAST spatial interest region detector.
We compare the invariance of detectors to rotation, scale and compression by measuring the similarity between detected interest regions in original and transformed versions of videos.
We measure repeatability and introduce a new similarity measure based on mutual information.
The FAST-3D detector is shown to be on par with the other detectors, while showing a significant increase in speed."
}
Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics
S. Koelstra and M. Pantic. In Proc. IEEE Conf. Face and Gesture Recognition, pages 1-8, 2008.
@inproceedings{Koelstra08,
author = {S. Koelstra and M. Pantic},
title = {Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics},
booktitle = "Proc. IEEE Conf. Face and Gesture Recognition",
year = {2008},
pages = {1--8},
abstract = "In this paper we propose an appearance-based approach to recognition of facial Action Units (AUs) and their temporal segments in frontal-view face videos. Non-rigid registration using free-form deformations is used to determine motion in the face region of an input video. The extracted motion fields are then used to derive motion histogram descriptors. Per AU, a combination of ensemble learners and Hidden Markov Models detects the presence of the AU in question and its temporal segment in each frame of an input sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, an average sequence classification rate of 94.3\% was achieved."
}



