Research Article

Motion and Geometric Feature Analysis for Real-time Automatic Micro-expression Recognition Systems

[version 1; peer review: 1 approved with reservations, 1 not approved]
PUBLISHED 11 Oct 2021

This article is included in the Research Synergy Foundation gateway.

Abstract

Interest in real-time micro-expression recognition systems has grown with recent advances in human-computer interaction (HCI) for security and healthcare. Most studies in this field have focused on recognition accuracy, while few have addressed computational cost. In this paper, two micro-expression feature extraction approaches are analyzed for real-time automatic recognition: first, a motion-based approach, which computes the motion of subtle changes across an image sequence and presents it as features; and second, a low-cost geometric-based technique that is widely used for real-time facial expression recognition. Both approaches were integrated into a developed system together with a facial landmark detection algorithm and a classifier for real-time analysis. Recognition performance was evaluated on the SMIC, CASMEII, CAS(ME)2 and SAMM datasets. The results show that the optimized Bi-WOOF (leveraging motion-based features) yields the highest accuracy of 68.5%, while the full-face graph (leveraging geometric-based features) yields 75.53%, both on the SAMM dataset. In terms of speed, the optimized Bi-WOOF processes a sample in 0.36 seconds and the full-face graph in 0.10 seconds at a 640×480 image size. All experiments were performed on an Intel i5-3470 machine.

Keywords

micro-expression recognition, facial feature extraction, real-time classification, geometric-based features, facial graph analysis, emotion classification

Introduction

A micro-expression is a brief, spontaneous facial expression that appears on the human face in response to an experienced emotion. Micro-expressions carry a significant amount of information and have attracted the interest of computer vision researchers because of their potential uses in security, interrogation, and healthcare.1-3 However, because facial muscle movements are so fast and subtle, this information is difficult to extract and requires more detailed features; a typical micro-expression lasts 200 milliseconds or less.4 Real-time micro-expression analysis and emotion recognition involves pre-processing, feature extraction, and recognition. This paper examines two popular feature extraction approaches: motion-based features and geometric-based features. Both are reported to extract reliable details from uncontrolled image data, which makes them feasible for real-time analysis.

A motion-based feature is constructed from the non-rigid motion changes of subtle expressions, where motion changes are extracted for spotting purposes. Facial motion analysis was first presented in5 using optical flow to spot micro-expressions. Since then, several studies have explored this approach for facial landmark detection and micro-expression recognition. In,6 the authors proposed Optical Flow Features from Apex-frame Network (OFF-ApexNet), which combines optical flow guided context with a convolutional neural network (CNN) to compute features. The authors in7 presented an algorithm that combines a deep multi-task convolutional network for detecting facial landmarks with a fused deep convolutional network for micro-expression features. In another study,8 the authors employed the Riesz pyramid and a multi-scale steerable Hilbert transform, while Merghani and Yap9 proposed a region-based method with an adaptive mask. Among current studies, the reported recognition accuracy of motion-based features peaks at 74.06% over CASMEII using leave-one-subject-out cross-validation (LOSOCV).6

Geometric facial analysis, on the other hand, deals with the locations and shapes of facial components. As highlighted by Liu et al.,10 the performance of early landmark detection algorithms was limited, so only a few studies utilized landmarks in early facial graph representations. With recent advances in face analysis, however, improved facial landmark detection algorithms have been presented in several studies.11-14 For facial landmark graph features, Lei et al.15 presented a method that employed only 28 brow and lip landmarks, which contribute significantly to micro-expressions, while other studies16-19 presented graph-based methods using action units (AUs) to define landmarks of interest. The recognition accuracies reported by these methods demonstrate that micro-expression features can be extracted using facial graph approaches. However, a general problem with graph-based micro-expression recognition is the lack of large-scale in-the-wild datasets. To date, the recognition accuracy peaks at 87.33% over the SAMM dataset with LOSOCV, as reported in Buhari et al.18

Methods

An automatic micro-expression recognition system was implemented for real-time facial analysis by integrating face landmark detection, feature extraction, and classification. In the developed system, a trained model is generated using publicly available spontaneous micro-expression datasets. Two feature extraction methods were implemented for micro-expression analysis. The first is the Bi-Weighted Oriented Optical Flow (Bi-WOOF) descriptor by Liong et al.20 This motion-based approach uses optical flow to compute features and requires apex-frame spotting before feature computation. Bi-WOOF was selected because of its performance improvement over textural feature extraction methods such as local binary patterns on three orthogonal planes (LBP-TOP), as reported in Liong et al.20 However, its computational cost poses challenges for real-time recognition because it requires apex-frame spotting. The second descriptor is the full-face graph by Buhari et al.18 This geometric-based method requires only facial landmarks to compute features and was selected because its computational time is significantly lower than that of motion-based methods. However, earlier geometric-based methods reportedly could not detect hidden changes in facial components because of their subtleness and brevity.

Motion-based framework

Figure 1 illustrates the implemented real-time micro-expression recognition system using Bi-WOOF. This feature extractor requires at least two frames (i.e., a neutral frame and an apex frame). First, the system captures face images using dlib-19.4.21 Next, apex-frame spotting is applied using the automatic method by Liong et al.22 to identify the frame with the highest expression intensity within the captured image sequence (i.e., the processing sample). As reported in Liong et al.,22 this method performs better than the annotated apex frames provided in the micro-expression databases. Here, image sequences from spontaneous micro-expression datasets were utilized. Upon identifying the onset and apex frames, optical flow vectors are computed to describe the facial motion patterns: (i) magnitude, the pixel movement intensity; (ii) orientation, the flow motion direction; and (iii) optical strain, the intensity of small deformations. Bi-WOOF features are then formed from these optical flow vectors (i.e., the magnitude, orientation, and optical strain). Step-by-step details of this method can be found in Liong et al.20


Figure 1. Framework of the designed real-time micro-expression recognition system.
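To make the flow step concrete, the following minimal C++/OpenCV sketch (an illustration under stated assumptions, not the authors' released code) computes the three flow-derived quantities between the onset and apex frames; Farnebäck flow and the Sobel-based strain approximation are stand-ins chosen for this sketch:

```cpp
// Minimal sketch (assumes OpenCV; not the authors' implementation):
// magnitude, orientation, and optical strain from the onset and apex
// frames, both given as single-channel 8-bit grayscale images.
#include <opencv2/opencv.hpp>
#include <vector>

void computeFlowFeatures(const cv::Mat& onset, const cv::Mat& apex,
                         cv::Mat& magnitude, cv::Mat& orientation,
                         cv::Mat& strain) {
    // Dense optical flow: a 2-channel float map of (u, v) per pixel.
    // Farneback is an assumed stand-in for the flow estimator.
    cv::Mat flow;
    cv::calcOpticalFlowFarneback(onset, apex, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);

    std::vector<cv::Mat> uv(2);
    cv::split(flow, uv);  // uv[0] = horizontal u, uv[1] = vertical v

    // (i) magnitude and (ii) orientation of the flow vectors.
    cv::cartToPolar(uv[0], uv[1], magnitude, orientation, true);

    // (iii) optical strain: norm of the infinitesimal strain tensor,
    // built from the spatial derivatives of the flow field.
    cv::Mat ux, uy, vx, vy;
    cv::Sobel(uv[0], ux, CV_32F, 1, 0);
    cv::Sobel(uv[0], uy, CV_32F, 0, 1);
    cv::Sobel(uv[1], vx, CV_32F, 1, 0);
    cv::Sobel(uv[1], vy, CV_32F, 0, 1);
    cv::Mat shear = 0.5 * (uy + vx);
    cv::Mat sq = ux.mul(ux) + vy.mul(vy) + 2 * shear.mul(shear);
    cv::sqrt(sq, strain);
}
```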

Geometric-based framework

Figure 2 illustrates the implemented real-time micro-expression recognition system using the full-face graph. First, facial landmark detection is applied using dlib-19.4 to detect the coordinates of the facial components. Line segments are then generated from the detected coordinates by connecting each landmark point (denoted as $p$) with every preceding landmark point (denoted as $q$), for $p \in \{1, 2, \ldots, N\}$ and $q \in \{1, 2, \ldots, p-1\}$, where $N = 68$. This concept describes a full-facial graph over the landmark points, whose segments are generated as follows:


Figure 2. Frame-based micro-expression recognition system.

$$\Im(k) = \{p, q\}, \qquad k \in \{1, 2, \ldots, K\},$$

where $K = \frac{N(N-1)}{2}$ is the total number of segments.

The indexes (i.e., $p$, $q$) of every landmark pair are determined and stored as segments in $\Im$ for the feature computations. After the graph is generated, features are computed by calculating the Euclidean distance and gradient of every segment, an idea presented in Buhari et al.18 With two features per segment, the total number of features computed using this technique is $2K = N \times (N-1)$, which translates to 4,556 features at $N = 68$.
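For illustration, the minimal C++ sketch below (our own sketch under the definitions above, not the released source) enumerates the $K = N(N-1)/2$ segments and computes the distance and gradient features; the zero-gradient guard for vertical segments is an assumption:

```cpp
// Minimal sketch of full-face graph feature computation: for every
// unordered pair of the N = 68 facial landmarks, push the Euclidean
// distance and gradient of the connecting segment (2K features total).
#include <cmath>
#include <vector>

struct Landmark { float x, y; };

std::vector<float> fullFaceGraphFeatures(const std::vector<Landmark>& pts) {
    std::vector<float> features;
    const std::size_t N = pts.size();  // 68 dlib landmarks expected
    for (std::size_t p = 1; p < N; ++p) {
        for (std::size_t q = 0; q < p; ++q) {  // q ranges over 1..p-1
            const float dx = pts[p].x - pts[q].x;
            const float dy = pts[p].y - pts[q].y;
            features.push_back(std::sqrt(dx * dx + dy * dy));  // distance
            // Gradient (slope) of the segment; vertical segments are
            // mapped to 0 here as an assumption.
            features.push_back(dx != 0.0f ? dy / dx : 0.0f);
        }
    }
    return features;  // 2 * N * (N - 1) / 2 = 4,556 features at N = 68
}
```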

To further analyse the potential performance improvement of geometric-based features, Eulerian motion magnification (EMM) is applied to the images to amplify micro-expressions prior to landmark detection. Eulerian-inspired approaches23,24 do not require explicit motion vectors; instead, they emulate motion magnification by amplifying property changes such as amplitude (denoted as A-EMM) or phase (denoted as P-EMM). According to Le et al.,24 A-EMM outperforms P-EMM in recognition rate over a broad range of magnification factors. This paper therefore applies A-EMM to the images before feature computation; details of the method are given in Le et al.24 Figure 3 illustrates the integration of this magnification sub-process into the single-frame, geometric-based feature system.


Figure 3. Frame-based micro-expression recognition system with A-EMM.
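To convey the Eulerian principle, the sketch below magnifies temporal amplitude changes on a single band using the difference of two running low-pass averages as a crude band-pass filter. This is a deliberately simplified illustration; the actual A-EMM of Le et al.24 operates per level of a spatial pyramid with a proper temporal filter, and the names, filter, and magnification factor alpha here are assumptions:

```cpp
// Highly simplified amplitude-based Eulerian magnification on one band:
// band-pass the pixel intensities over time (difference of two running
// low-pass averages) and add an amplified copy back to the frame.
#include <opencv2/opencv.hpp>

struct EmmState {
    cv::Mat lowSlow, lowFast;  // running low-pass accumulators (CV_32F)
};

cv::Mat amplifyFrame(const cv::Mat& gray8u, EmmState& s,
                     double alpha = 10.0,  // assumed magnification factor
                     double rSlow = 0.05, double rFast = 0.4) {
    cv::Mat f;
    gray8u.convertTo(f, CV_32F);
    if (s.lowSlow.empty()) { s.lowSlow = f.clone(); s.lowFast = f.clone(); }
    s.lowSlow = (1.0 - rSlow) * s.lowSlow + rSlow * f;  // slow average
    s.lowFast = (1.0 - rFast) * s.lowFast + rFast * f;  // fast average
    cv::Mat band = s.lowFast - s.lowSlow;               // temporal band-pass
    cv::Mat magnified = f + alpha * band;
    cv::Mat out;
    magnified.convertTo(out, CV_8U);  // saturating cast back to 8-bit
    return out;
}
```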

Experiment settings

The experiments were performed using four spontaneous datasets: (i) the spontaneous micro-expression (SMIC) dataset,25 (ii) the Chinese Academy of Sciences Micro-expression (CASMEII) dataset,26 (iii) the spontaneous macro-expressions and micro-expressions (CAS(ME)2) dataset,27 and (iv) the spontaneous actions and micro-movements (SAMM) dataset.28 Full details of these datasets are given in Li et al.,25 Yan et al.,26 Qu et al.,27 and Davison et al.28 The datasets can be acquired at www.oulu.fi/cmvs/node/41319 for SMIC,25 fu.psych.ac.cn/CASME/casme2-en.php for CASMEII,26 fu.psych.ac.cn/CASME/cas(me)2-en.php for CAS(ME)2,27 and personalpages.manchester.ac.uk/staff/adrian.davison/SAMM.html for SAMM.28 Moreover, to evaluate performance on a larger dataset, this paper merged the four datasets to form a COMBINED dataset, created from the raw images of all four datasets. The steps for generating the COMBINED dataset are face detection, face cropping, colour-space conversion to grayscale, and image re-scaling to 140×170. Colour-space conversion is applied to match the SAMM dataset samples, which are provided in grayscale format, while the re-scaling to 140×170 adopts the SMIC dataset's cropped image size (the smallest cropped size considered to provide a reliable feature description while achieving high-speed performance for real-time micro-expression recognition). The re-scaling uses the down-sampling technique of Buhari et al.29 to produce high-quality down-scaled samples. In addition, the COMBINED dataset adopts the SMIC labelling by re-grouping the seven emotion classes (happiness, sadness, anger, surprise, fear, contempt, and disgust) into three classes: positive = {happiness}, negative = {sadness, anger, fear, contempt, disgust}, and surprise = {surprise}. Figure 4 illustrates the COMBINED dataset formation from the four spontaneous datasets; a sketch of this preparation chain is shown after Table 1. Note that the participant images in Figure 4 are publishable images used with the consent of participants, as stated in the documentation of each study. Table 1 summarises the selected spontaneous micro-expression datasets used in this study; the COMBINED dataset is denoted as δ.


Figure 4. COMBINED dataset formation, sample images were taken from SMIC, CASMEII, CAS (ME)2, and SAMM datasets as labelled.

Table 1. Summary of spontaneous micro-expression datasets for analysis.

Datasets | Frame rate | Subjects | Samples | Classes
SMIC | 100 | 20 | 164 | 3
CASMEII | 200 | 35 | 247 | 5
CAS(ME)2 | 30 | 22 | 341 | 3
SAMM | 200 | 32 | 159 | 7
δ | – | 94 | 911 | 3
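The pre-processing chain described above can be sketched as follows, under stated assumptions: a Haar cascade stands in for the face detector, OpenCV's INTER_AREA stands in for the down-sampler of Buhari et al.,29 and the label re-grouping helper is hypothetical:

```cpp
// Minimal sketch of COMBINED-dataset sample preparation: face detection,
// cropping, grayscale conversion, and down-scaling to 140x170.
// Assumes a colour (BGR) input image.
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

cv::Mat prepareSample(const cv::Mat& raw, cv::CascadeClassifier& faceDet) {
    std::vector<cv::Rect> faces;
    faceDet.detectMultiScale(raw, faces);  // assumed face detector
    if (faces.empty()) return cv::Mat();   // no face found

    cv::Mat gray, scaled;
    cv::cvtColor(raw(faces[0]), gray, cv::COLOR_BGR2GRAY);
    // INTER_AREA down-sampling stands in for the method of Buhari et al.
    cv::resize(gray, scaled, cv::Size(140, 170), 0, 0, cv::INTER_AREA);
    return scaled;
}

// Hypothetical re-grouping of the seven emotion labels into the three
// COMBINED classes (positive / negative / surprise).
std::string regroupLabel(const std::string& emotion) {
    if (emotion == "happiness") return "positive";
    if (emotion == "surprise")  return "surprise";
    return "negative";  // sadness, anger, fear, contempt, disgust
}
```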

Recognition accuracy

Table 2 records the recognition accuracy of the baseline Bi-WOOF20 (denoted as BBW), the optimized Bi-WOOF (OBW), the full-face graph (FFG), and the full-face graph with A-EMM (FFG+M). The baseline Bi-WOOF refers to the original method by Liong et al.,20 implemented in MATLAB, while the optimized Bi-WOOF refers to the C++ re-implementation that accelerates computation for real-time analysis. All four experimental setups used a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel. SVM hyper-parameter selection follows the recommendation of Bergstra and Bengio,30 a hyper-parameter search technique that yields better classification performance than sequential tuning when a model has many hyper-parameters. All reported accuracies are based on LOSOCV. As before, the COMBINED dataset is denoted as δ in Table 2.
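As an illustration of this setup, the sketch below trains an RBF-kernel SVM with a random hyper-parameter search in the spirit of Bergstra and Bengio;30 it assumes OpenCV's ml module, and the search ranges, trial count, and the use of training accuracy as the selection score (the paper selects via LOSOCV) are our own simplifications:

```cpp
// Minimal sketch: random search over (C, gamma) for an RBF-kernel SVM,
// assuming OpenCV's ml module. Ranges and scoring are illustrative.
#include <opencv2/ml.hpp>
#include <cmath>
#include <random>

cv::Ptr<cv::ml::SVM> randomSearchSVM(const cv::Mat& feats,   // CV_32F rows
                                     const cv::Mat& labels,  // CV_32S rows
                                     int trials = 50) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> logC(-3.0, 5.0), logG(-7.0, 1.0);
    cv::Ptr<cv::ml::SVM> best;
    double bestAcc = -1.0;
    for (int t = 0; t < trials; ++t) {
        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::C_SVC);
        svm->setKernel(cv::ml::SVM::RBF);
        svm->setC(std::pow(10.0, logC(rng)));      // log-uniform C
        svm->setGamma(std::pow(10.0, logG(rng)));  // log-uniform gamma
        svm->train(feats, cv::ml::ROW_SAMPLE, labels);

        // Training accuracy as a stand-in score; the paper evaluates
        // with leave-one-subject-out cross-validation (LOSOCV).
        cv::Mat pred;
        svm->predict(feats, pred);
        pred.convertTo(pred, CV_32S);
        double acc = cv::countNonZero(pred == labels) /
                     static_cast<double>(labels.rows);
        if (acc > bestAcc) { bestAcc = acc; best = svm; }
    }
    return best;
}
```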

Table 2. Recognition accuracy (%).

Method | SMIC | CASMEII | CAS(ME)2 | SAMM | δ
BBW | 62.20 | 62.52 | 59.11 | 65.22 | 66.01
OBW | 65.29 | 63.32 | 57.89 | 68.50 | 69.15
FFG | 74.62 | 74.41 | 75.11 | 74.33 | 77.05
FFG+M | 75.01 | 74.55 | 76.21 | 75.53 | 77.85

Results

Table 2 presents the recognition accuracies of the baseline Bi-WOOF (BBW), the optimized Bi-WOOF (OBW), the full-face graph (FFG), and the full-face graph with A-EMM (FFG+M). The BBW and OBW achieve their highest recognition accuracies of 66.01% and 69.15%, respectively, over the COMBINED dataset. Similarly, FFG and FFG+M achieve their highest accuracies of 77.05% and 77.85%, respectively, over the COMBINED dataset. These results show that the OBW improved on the BBW by up to 3.28% (over the SAMM dataset), while A-EMM improved the full-face graph by up to 1.20% (also over SAMM). Compared with the optimized Bi-WOOF, the full-face graph with A-EMM improved performance by 9.72%, 11.23%, 18.32%, 7.03% and 8.70% over the SMIC, CASMEII, CAS(ME)2, SAMM and COMBINED datasets, respectively. Table 3 compares the accuracy of the optimized Bi-WOOF with other motion-based methods, while Table 4 compares the accuracy of the full-face graph with other geometric-based methods.

Table 3. Accuracy comparison table: Optimized Bi-WOOF with other motion-based features.

Method | Feature | Classifier | SMIC | CASMEII | CAS(ME)2 | SAMM
Liong et al.20 | Bi-WOOF | SVM | 62.20 | 58.85 | 59.26 | –
Li et al.31 | MESR+LBP | LSVM | 57.93 | 55.87 | – | –
Li et al.31 | MESR+HIGO | LSVM | 65.24 | 57.09 | – | –
Li et al.31 | MESR+HOG | LSVM | 57.93 | 57.49 | – | –
Liong and Wong32 | Bi-WOOF + Phase | SVM | 68.29 | 62.55 | – | –
Liong et al.6 | OFF-ApexNet | Softmax | 67.68 | 74.06 | – | 68.18
Li et al.7 | deep-NN + Revised HOOF | SVM | – | 58.03 | – | –
Merghani and Yap9 | ROI + Adaptive Mask | SVM | – | 68.20 | – | 56.10
Duque et al.8 | Riesz Pyramid + MORF | SVM | 54.88 | 45.93 | – | –
Re-implemented20 | Optimized Bi-WOOF | SVM | 65.29 | 63.32 | 57.89 | 68.50
(Dataset columns report LOSOCV accuracy, %.)

Table 4. Accuracy comparison table: Full-face graph with other geometric-based features.

Method | Feature | Classifier | SMIC | CASMEII | CAS(ME)2 | SAMM
Lei et al.15 | G-TCN | Softmax | – | 73.98 | – | 75.00
Xie et al.16 | AU-GACN | Softmax | – | 56.10 | – | 52.30
Buhari et al.18 | Full-face graph | SVM | 66.90 | 73.45 | 72.83 | 80.23
Buhari et al.18 | FACS-based graph | SVM | 76.67 | 75.04 | 81.41 | 87.33
Zhou et al.17 | MER-auGCN | Softmax | – | 70.80 | – | 66.20
Liu et al.19 | Sparse MDMO | SVM | 70.51 | 66.95 | – | –
Experiment I | Full-face graph | SVM | 74.62 | 74.41 | 75.11 | 74.33
Experiment II | Full-face graph + A-EMM | SVM | 75.01 | 74.55 | 76.21 | 75.53
(Dataset columns report LOSOCV accuracy, %.)

Discussion

Table 3 lists the performance of the benchmark motion-based methods6-9,20,31,32 alongside the optimized Bi-WOOF. The best reported accuracy is 74.06% over the CASMEII dataset.6 For Bi-WOOF+Phase,32 the highest reported performance is 68.29% over the SMIC dataset, which outperforms both the baseline and the optimized Bi-WOOF. However, the optimized Bi-WOOF outperforms the accuracies reported in the other studies,7-9,31 as shown in Table 3.

On the other hand, Table 4 lists the benchmark geometric-based methods15-19 alongside the full-face graph and the full-face graph + A-EMM, denoted as experiment I and experiment II, respectively. Among these, Buhari et al.18 reported the highest accuracies of 76.67%, 75.04%, 81.41%, and 87.33% over the SMIC, CASMEII, CAS(ME)2, and SAMM datasets, respectively, using their FACS-based graph; their full-face graph utilized 68 landmarks from the raw images. The full-face graph in experiments I and II yields 74.62%, 74.41%, 75.11%, 74.33% and 75.01%, 74.55%, 76.21%, 75.53% over the SMIC, CASMEII, CAS(ME)2 and SAMM datasets, respectively. The full-face graph in experiment II thus outperforms the full-face graph of18 by 8.11%, 1.10%, and 3.38% over the SMIC, CASMEII, and CAS(ME)2 datasets, respectively, whereas Buhari et al.18 outperform experiment II by 4.70% over the SAMM dataset. Compared with the accuracies reported in Table 3, the full-face graph in experiment II achieves the highest performance.

Considering Tables 3 and 4 together, Buhari et al.18 registered the highest accuracy of 87.33% over the SAMM dataset, while the full-face graph with A-EMM outperformed the full-face graph performance presented in Buhari et al.18 From these results, it can be concluded that geometric-based features compete closely with motion-based features. In terms of computational time, the optimized Bi-WOOF runs at 0.36 seconds per sample (i.e., 2.7 fps), while the full-face graph runs at 0.10 seconds per sample (i.e., 10 fps), at a 640×480 image resolution on an Intel i5-3470 machine. The reported running times include facial landmark detection and classification.

Conclusions

This paper analyzed the performance of motion-based features (i.e., Bi-WOOF) and geometric-based features (i.e., the full-face graph) for real-time micro-expression recognition systems. The results indicate that the optimized Bi-WOOF improved the recognition accuracy of the baseline Bi-WOOF by up to 3.28% over the SAMM dataset, while full-face graph performance improved by up to 1.20% with A-EMM over the same dataset. Moreover, the full-face graph and the full-face graph with A-EMM outperform the baseline and optimized Bi-WOOF by up to 18.32%. Although the full-face graph improves recognition accuracy, its processing time could limit the readiness of full-face graph features for real-time systems using high-speed cameras.

Data availability

Underlying data

The experiments were performed using four spontaneous micro-expression datasets: (i) the spontaneous micro-expression (SMIC) dataset,25 (ii) the Chinese Academy of Sciences Micro-expression (CASMEII) dataset,26 (iii) the spontaneous macro-expressions and micro-expressions (CAS(ME)2) dataset,27 and (iv) the spontaneous actions and micro-movements (SAMM) dataset.28 Full details of these datasets are given in Li et al.,25 Yan et al.,26 Qu et al.,27 and Davison et al.28 The datasets can be acquired at www.oulu.fi/cmvs/node/41319 for SMIC,25 fu.psych.ac.cn/CASME/casme2-en.php for CASMEII,26 fu.psych.ac.cn/CASME/cas(me)2-en.php for CAS(ME)2,27 and personalpages.manchester.ac.uk/staff/adrian.davison/SAMM.html for SAMM.28 To evaluate performance on a larger dataset, this paper merged the four datasets to form the COMBINED dataset, created from the raw images of all four datasets, with the source code available under Extended data.

Extended data

Zenodo: Implementation of COMBINED micro-expression dataset and Setup files for real-time micro-expression recognition using motion and geometric features. https://doi.org/10.5281/zenodo.5524141.33

The project contains the following extended data:

  • Real-time micro-expression recognition using biwoof features (executable setup for micro-expression recognition using motion-based features).

  • Real-time micro-expression recognition using full-face graph features (executable setup for micro-expression recognition using geometric-based features).

  • Image re-scaler for COMBINED micro-expression dataset formation (Visual Studio 2010 source code written in C++).

Data are available under the terms of the Creative Commons Zero (CC0 v1.0 Universal).

Zenodo: Performance analysis of micro-expression recognition over different sample image sizes. https://doi.org/10.5281/zenodo.5379773.34

This project contains the following extended data:

  • Performance improvement over 140×170 sample size.

  • Performance improvement over 240×340 sample size.

  • Performance improvement over 560×680 sample size.

  • Performance improvement over 1120×1360 sample size.

Data are available under the terms of the Creative Commons Zero (CC0 v1.0 Universal).

Author contributions

AM Buhari: Conceptualization, Investigation, Methodology, Formal Analysis, Software, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing. CP Ooi: Conceptualization, Resources, Formal Analysis, Methodology, Supervision, Writing – Review & Editing. VM Baskaran: Conceptualization, Formal Analysis, Supervision, Validation, Writing – Review & Editing. WH Tan: Conceptualization, Methodology, Writing – Review & Editing.

Competing interests

No competing interests were declared.

Grant information

The authors declared that no grants were involved in supporting this work.
