Research
In my research, I explore the use of facial expressions and EEG analysis for affective and implicit tagging. In other words, I record people's brainwaves and facial expressions as they view multimedia content and try to extract meaningful information about that content from these signals. Below, I give a short overview of various aspects of this work.
EEG analysis for assessment of affective state
- A participant in the DEAP dataset experiment

- Correlations of EEG signals with the various response targets

- Positions of music videos in the valence-arousal space according to the online annotations

This research concerns the use of EEG signal analysis for assessing a user's affective state. I performed three experiments (together with researchers from UniGe, UTwente and EPFL) in which participants watched a set of music videos. After each video, the participant rated it on the well-known valence and arousal scales. I used power spectral density features in combination with a binary SVM classifier to classify the participants' EEG signals into low/high valence and low/high arousal. In the most recent experiment, accuracies of 62.0% and 57.6% were attained for binary valence and arousal classification, respectively. The DEAP dataset collected during this last experiment has been made public and, with 32 participants, is to the best of our knowledge the largest publicly available dataset containing EEG, peripheral signals and face video. Finally, this technique was used as part of a real-time music video recommendation system. See also the slides of a departmental talk I gave on the DEAP dataset experiment (note that this presentation is in SVG format and has only been tested on up-to-date Firefox browsers).
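As a rough illustration of the feature step described above, the following sketch computes band-power features from a single EEG channel with a naive periodogram and maps a rating to a low/high class. It is not the code used in the experiments: the band edges, the rating threshold of 5, the 128 Hz sampling rate and the synthetic signal are all assumptions made for the example, and the SVM itself is left out. In practice the resulting feature dictionaries would be passed to a classifier.

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """Mean periodogram power of `signal` in [f_lo, f_hi) Hz (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            powers.append(abs(coeff) ** 2 / (fs * n))
    return sum(powers) / len(powers) if powers else 0.0


# Hypothetical band edges in Hz; the text above does not list the bands used.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def psd_features(signal, fs):
    """Band-power feature dictionary for one channel."""
    return {name: band_power(signal, fs, lo, hi)
            for name, (lo, hi) in BANDS.items()}


def binarize(rating, threshold=5.0):
    """Map a self-assessment rating to a low (0) or high (1) class.

    The threshold is an assumption for this sketch.
    """
    return 1 if rating > threshold else 0


# Synthetic one-second signal at an assumed 128 Hz, dominated by a 10 Hz rhythm.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs)
          + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
          for t in range(fs)]
features = psd_features(signal, fs)
```

For the synthetic signal, the alpha-band feature dominates, as expected for a 10 Hz rhythm. The naive DFT is quadratic in the signal length, so an FFT-based estimator would be the usual choice for real recordings.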
- DEAP: A Database for Emotion Analysis using Physiological Signals,
S. Koelstra, C. Muehl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras. In IEEE Transactions on Affective Computing, Special Issue on Naturalistic Affect Resources for System Building and Evaluation,
in press
[pdf]
[bibtex]
@article{Koelstra11_2,
author = {S. Koelstra and C. Muehl and M. Soleymani and J.-S. Lee and A. Yazdani and T. Ebrahimi and T. Pun and A. Nijholt and I. Patras},
journal = {IEEE Transactions on Affective Computing, Special Issue on Naturalistic Affect Resources for System Building and Evaluation},
title = {DEAP: A Database for Emotion Analysis using Physiological Signals},
note = {in press},
abstract={We present a multimodal dataset for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence and like/dislike ratings using the modalities of EEG, peripheral physiological signals and multimedia content analysis. Finally, decision fusion of the classification results from the different modalities is performed. The dataset is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.},
}
- Continuous Emotion Detection in Response to Music Videos,
M. Soleymani, S. Koelstra, I. Patras, T. Pun. In International Workshop on Emotion Synthesis, rePresentation, and Analysis in Continuous spacE (EmoSPACE) In conjunction with the IEEE FG 2011,
pages 803-808,
2011.
[pdf]
[bibtex]
@inproceedings{Soleymani,
author = {M. Soleymani and S. Koelstra and I. Patras and T. Pun},
booktitle = {International Workshop on Emotion Synthesis, rePresentation, and Analysis in Continuous spacE (EmoSPACE) In conjunction with the IEEE FG 2011},
title = {{Continuous Emotion Detection in Response to Music Videos}},
pages = {803--808},
year = {2011},
abstract={Viewers' preference for multimedia selection depends highly on their emotional experience. In this paper, we present an emotion detection method for music videos using central and peripheral nervous system physiological signals as well as multimedia content analysis. A set of 40 music clips eliciting a broad range of emotions were first selected. After extracting the one minute long emotional highlight of each video, they were shown to 32 participants while their physiological responses were recorded. Participants self-reported their felt emotions after watching each clip by means of arousal, valence, dominance, and liking ratings. The physiological signals included electroencephalogram, galvanic skin response, respiration pattern, skin temperature, electromyograms and blood volume pulse using plethysmograph. Emotional features were extracted from the signals and the multimedia content. The emotional features were used to train a linear ridge regressor to detect emotions for each participant using a leave-one-out cross-validation strategy. The performance of the personalized emotion detection is shown to be significantly superior to a random regressor.}
}
- Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos,
S. Koelstra, A. Yazdani, M. Soleymani, C. Muehl, J.-S. Lee, A. Nijholt, T. Pun, T. Ebrahimi, I. Patras. In Conference on Brain Informatics,
pages 89-100,
2010.
[pdf]
[bibtex]
@inproceedings{Koelstra10_1,
author = {S. Koelstra and A. Yazdani and M. Soleymani and C. Muehl and J.-S. Lee and A. Nijholt and T. Pun and T. Ebrahimi and I. Patras},
booktitle = {Conference on Brain Informatics},
pages = {89--100},
year = 2010,
title = {{Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos}},
abstract = {{Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users' self-assessments of arousal and valence and the frequency powers of their EEG activity. We present methods for single trial classification using both EEG and peripheral physiological signals. For EEG, an average (maximum) classification rate of 55.7\% (67.0\%) for arousal and 58.8\% (76.0\%) for valence was obtained. For peripheral physiological signals, the results were 58.9\% (85.5\%) for arousal and 54.2\% (78.5\%) for valence.}},
}
EEG analysis for implicit tag validation
- Me, all strapped in for an EEG experiment

- EEG signal differences for matching/nonmatching tags

I investigated the use of EEG analysis for the implicit validation of tags related to multimedia data. That is, we tried to validate tags for multimedia content based on the users' brainwaves as they watched the content. I ran an experiment in which users were shown a set of videos, along with matching or non-matching tags. As it turns out, there are significant differences in EEG signals between the trials where matching tags are displayed and those with non-matching tags. See also the slides of my presentation on this at the ABCI 2009 workshop. The QMUL-UT dataset we collected is now publicly available.
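The comparison between matching and non-matching trials can be sketched as an average over trials followed by a mean amplitude in a time window. This is only a minimal illustration, not the analysis from the paper (which used Independent Component Analysis and repeated measures ANOVA). The 300 to 500 ms window, the 100 Hz sampling rate and the synthetic trials are assumptions chosen for the example.

```python
def average_trials(trials):
    """Point-wise average across equally long single-trial epochs."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def mean_amplitude(erp, fs, t_start, t_end):
    """Mean amplitude of `erp` between t_start and t_end seconds after onset."""
    window = erp[round(t_start * fs):round(t_end * fs)]
    return sum(window) / len(window)


def condition_difference(matching, nonmatching, fs, t_start=0.3, t_end=0.5):
    """Non-matching minus matching mean amplitude in the chosen window.

    The default window is an assumed, commonly used N400 range.
    """
    return (mean_amplitude(average_trials(nonmatching), fs, t_start, t_end)
            - mean_amplitude(average_trials(matching), fs, t_start, t_end))


def make_trial(deflection, n_samples=80):
    """Synthetic epoch: flat, with a deflection in samples 30 to 49."""
    return [deflection if 30 <= i < 50 else 0.0 for i in range(n_samples)]


fs = 100  # assumed sampling rate in Hz, so samples 30 to 49 cover 300 to 500 ms
matching = [make_trial(0.0), make_trial(0.0)]
nonmatching = [make_trial(-1.5), make_trial(-2.5)]
diff = condition_difference(matching, nonmatching, fs)
```

A negative `diff` means the non-matching trials are more negative in the window. Whether such a difference is significant would still need a statistical test across participants.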
- EEG analysis for implicit tagging of video data,
S. Koelstra, C. Muehl and I. Patras. In Workshop on Affective Brain-Computer Interfaces, Proc. ACII,
pages 27-32,
2009.
[pdf]
[bibtex]
@inproceedings{Koelstra09_2,
author = {S. Koelstra and C. Muehl and I. Patras},
title = {EEG analysis for implicit tagging of video data},
booktitle = {Workshop on Affective Brain-Computer Interfaces, Proc. ACII},
year = {2009},
pages = {27--32},
abstract = "In this work, we aim to find neuro-physiological indicators to validate tags attached to video content.
Subjects are shown a video and a tag and we aim to determine whether the shown tag was congruent with the presented video by detecting the occurrence of an N400 event-related potential.
Tag validation could be used in conjunction with a vision-based recognition system as a feedback mechanism to improve the classification accuracy for multimedia indexing and retrieval.
An advantage of using the EEG modality for tag validation is that it is a way of performing implicit tagging.
This means it can be performed while the user is passively watching the video.
Independent Component Analysis and repeated measures ANOVA are used for analysis.
Our experimental results show a clear occurrence of the N400 and a significant difference in N400 activation between matching and non-matching tags."
}
Facial Expression Analysis
- An example of facial Action Units

- A facial expression and its detected motion

- The HMM used to model the timing of expressions

I designed a system for automatic recognition of facial expressions. The idea is to automatically detect facial Action Units (AUs) and their temporal segments in frontal-view face videos.
I used a non-rigid registration technique to estimate the motion in the input videos. Each video was then segmented into a set of regions, from which features were extracted. A combination of an HMM and a boosting algorithm then detects the presence of AUs. See also the slides from my talk at the Face and Gesture Recognition 2008 conference. The accompanying paper won the best student paper award.
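To give an idea of how an HMM can turn noisy per-frame evidence into a consistent sequence of temporal segments, here is a small Viterbi decoding sketch over the segments neutral, onset, apex and offset. It is not the model from the papers. The transition probabilities are hypothetical, and the per-frame emission values stand in for frame classifier outputs that are assumed to have already been converted to likelihoods.

```python
import math

STATES = ["neutral", "onset", "apex", "offset"]

# Hypothetical left-to-right transition probabilities (each row sums to 1).
TRANS = {
    "neutral": {"neutral": 0.9, "onset": 0.1},
    "onset": {"onset": 0.7, "apex": 0.3},
    "apex": {"apex": 0.8, "offset": 0.2},
    "offset": {"offset": 0.7, "neutral": 0.3},
}


def viterbi(emissions, start="neutral"):
    """Most likely state path given per-frame emission probabilities.

    `emissions` is a list of dicts mapping each state to a positive
    P(observation | state); the sequence is assumed to begin in `start`.
    """
    neg_inf = float("-inf")
    score = {s: (math.log(emissions[0][s]) if s == start else neg_inf)
             for s in STATES}
    back = []
    for frame in emissions[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for p in STATES:
                t = TRANS[p].get(s, 0.0)
                if t > 0 and score[p] > neg_inf:
                    cand = score[p] + math.log(t)
                    if cand > best:
                        best_prev, best = p, cand
            new_score[s] = best + math.log(frame[s]) if best_prev else neg_inf
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def peaked(state, p=0.85):
    """Hypothetical frame evidence that favours `state`."""
    return {s: (p if s == state else 0.05) for s in STATES}


# The fourth frame is deliberately noisy: it favours "neutral" mid-expression.
observed = ["neutral", "onset", "apex", "neutral", "apex", "offset", "neutral"]
path = viterbi([peaked(s) for s in observed])
```

Because the assumed transitions do not allow a jump from apex straight back to neutral, the decoded path replaces the noisy fourth frame with apex, which is the kind of temporal smoothing the HMM contributes on top of frame-based detection.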
- Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics,
S. Koelstra and M. Pantic. In Proc. IEEE Conf. Face and Gesture Recognition,
pages 1-8,
2008.
[pdf]
[bibtex]
@INPROCEEDINGS{Koelstra08,
author = {S. Koelstra and M. Pantic},
title = {Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics},
booktitle = "Proc. IEEE Conf. Face and Gesture Recognition",
year = {2008},
pages = {1--8},
abstract = "In this paper we propose an appearance-based approach to recognition of facial Action Units (AUs) and their temporal segments in frontal-view face videos. Non-rigid registration using free-form deformations is used to determine motion in the face region of an input video. The extracted motion fields are then used to derive motion histogram descriptors. Per AU, a combination of ensemble learners and Hidden Markov Models detects the presence of the AU in question and its temporal segment in each frame of an input sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, an average sequence classification rate of 94.3\% was achieved."
}
- A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models,
S. Koelstra, M. Pantic and I. Patras. In IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 32,
number 11,
pages 1940-1954,
2010.
[pdf]
[bibtex]
@article{Koelstra10,
author = {S. Koelstra and M. Pantic and I. Patras},
title = {A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models},
journal = {IEEE Trans. Pattern Analysis and Machine Intelligence},
pages={1940--1954},
year={2010},
volume={32},
number={11},
abstract = "In this work we propose a dynamic-texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modelling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Non-rigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2\% for the MHI method and of 94.3\% for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener dataset.",
}