Research Article

Taking facial expression recognition outside the lab and into the wild by using challenging datasets and improved performance metrics

[version 1; peer review: 2 not approved]
PUBLISHED 23 Mar 2022

Abstract

Background: Facial expression recognition is a challenging field, as evidenced by the ineffectiveness of current state-of-the-art techniques that aim to classify facial expressions. Despite showing high levels of accuracy, these methods perform poorly in real-life implementations. This poor performance arises because the training sets used are usually simple, limited, and collected in controlled lab environments.
Methods: This paper explores newer datasets that consist of images taken in challenging conditions with many variations. Using such datasets improves the accuracy of classification because it exposes the model to a variety of samples. In addition, we used new performance metrics to reflect the challenging conditions for classification. We reviewed the current best techniques for expression recognition and laid out a method to design an improved deep neural network using AffectNet, a newer and more challenging dataset. The implementation method is an iterative process that trains a convolutional neural network on challenging datasets, evaluates the result, and improves the model by tweaking its parameters. The models are also evaluated with new metrics like cross-dataset accuracy and mean accuracy drop.
Results: We found that the best performing model was the Visual Geometry Group 16 layer (VGG16) model, with a training accuracy of 81.05%, an improvement of 9.05% compared to AlexNet, the next best model trained on the same dataset, and testing accuracy of 70.69%, compared to 64% for AlexNet. The proposed model configuration was also assessed with cross-dataset accuracy scoring 42.02% and outperforming Inception V3, the next best model with a score of 28.96%, on the same metric.
Conclusions: The research resulted in improved accuracy of classifying expressions due to a better, more challenging dataset. In addition, we used new metrics that give us a better picture of the model’s robustness.

Keywords

Machine Learning, Deep Learning, Neural Networks, Affect Recognition, In-The-Wild, Facial Expression Recognition, Cross-Dataset Accuracy

Introduction

This paper is an examination of facial expression recognition (FER) in the context of artificial intelligence. It aims to look at the field of FER critically by studying its sub-fields and summarizing its state-of-the-art methods.

There are plenty of methods and techniques for facial expression recognition. However, these methods are not without limitations. The primary issue with current FER methods is that they perform poorly in real-life conditions despite showing high-accuracy results in the lab. This incongruence prevents the adoption of FER in the mainstream industry, and it is the main problem this paper addresses.

The literature review section will start by looking at current research in the field and will identify the trends that the field is taking. The section will summarize several research papers outlining various methods and end with the findings from the review of the current literature.

The methods section will give the theoretical framework needed for the project and outline the research methodology and the criteria used for evaluation. Then it will go to an in-depth layout of the proposed improvements and how they will be arrived at, implemented, and tested.

The evaluation section will examine, evaluate, and compare the results of training and testing to select the best performing model.

The last section will conclude by specifying ways to improve the methods and to implement future work.

Literature review

Facial expression recognition began more than two decades ago, and since then many developments have taken place.1 The general FER process starts with face detection, which is fairly accurate and easy to implement. Next, the dataset images are preprocessed for feature extraction. Before the advent of deep neural networks, this step was performed separately using hand-crafted descriptors such as local binary patterns (LBP),2 local gradient coding (LGC),3 local directional patterns (LDP),4 histograms of oriented gradients (HOG),5 and others.6 Classification is then performed with methods such as k-nearest neighbors (KNN) or support vector machines (SVM). To classify emotions, we first need to categorize them. Two main categorizations are commonly used for classification. The first is the descriptive coding scheme, which uses action units to describe facial features. Action units do not describe facial expressions directly; instead, they describe how a face looks, which in turn helps classify the emotion behind the expression (since there is no direct mapping between a facial expression and the emotion that generates it).1,7 Here, the classification model identifies the action units present in a face, and a given combination of action units is likely to correspond to a particular emotion.
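To make the classical pipeline concrete, the following is a minimal sketch of a hand-crafted-feature approach of the kind described above, combining HOG features with an SVM classifier. The synthetic face crops, image size, and HOG/SVM parameters are illustrative assumptions rather than settings taken from the cited works.

```python
# Minimal sketch of a classical FER pipeline: face crops -> HOG features -> SVM.
# The random arrays stand in for real grayscale face crops and their labels.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
faces = rng.random((40, 96, 96))        # stand-in for grayscale face crops
labels = rng.integers(0, 7, size=40)    # stand-in for 7 expression classes

def extract_hog(face, size=(64, 64)):
    # Resize to a fixed shape, then compute a histogram-of-oriented-gradients descriptor.
    return hog(resize(face, size), orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

features = np.array([extract_hog(f) for f in faces])
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", C=1.0)          # multi-class SVM classifier
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```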

The second is the judgment model, which instead describes expressions based on the latent emotions that generate them. Rather than describing what the face looks like, it directly classifies the emotion behind the expression, resulting in a multi-class classification of six or seven primary emotions, usually based on the Ekman model.8 Classification can be performed on static images or on sequences of images that start from a neutral expression and proceed to the full extent of the emotion.9

Deep neural networks have proven very powerful in FER tasks.1 However, most experiments have been carried out on datasets collected in controlled environments, where subjects were either asked to show a particular emotion or were shown videos that induced it. The accuracy figures reported for these models are very high; however, when the same methods are tested on in-the-wild datasets, the results are comparatively poor.10 In-the-wild datasets contain images captured when people were unaware they were being filmed, making the expressions natural and spontaneous, as well as images with harsh lighting, tilted head angles, and obstructed or occluded subjects, all of which make classification more challenging. In-the-wild datasets used to be rare, but more of them are now publicly available; examples include AffectNet11 and RAF-DB.12 Other trends that are now common include ensemble methods, where multiple techniques are combined, and deep learning models that combine multiple neural networks, each focusing on a different part of the FER process.

Another important thing to note is that most FER models are trained and tested on the same dataset, making these models prone to biases present in the dataset. These models, therefore, perform poorly when tested on datasets that are different from the ones on which they were trained.13

One of the research gaps that we identified from the challenges in the field is the shortage of models trained on challenging in-the-wild datasets. Training is usually done on datasets created in lab-controlled environments, which lack the nuance and variety present in real-life, in-the-wild scenarios. Since these datasets are relatively new, they were not explored enough, and very few models were built and tailored for these datasets.

In addition, the evaluation metrics used for these kinds of datasets are limited to the old metrics that relied only on the accuracy level. These metrics worked well when the datasets were simple. Therefore, there is a need for incorporating more effective performance metrics, like cross-dataset accuracy, cross-dataset accuracy drop, and other similar metrics that better test the robustness of the model by exposing it to different data.13 Combining in-the-wild datasets and enhanced metrics will limit dataset bias and give us a better idea of how the model would perform in novel conditions.

This paper aims to:

  • Study and analyze the best state-of-the-art methods for FER. This is important because an exploration of the field helps improve the understanding of the technologies, patterns, and research methodologies used to improve current methods or introduce new ones

  • Evaluate the current best methods and compare them to one another using various metrics to identify the optimal methods

  • Introduce newer metrics that are not widely adopted but are nonetheless better at evaluating the proposed models

  • Improve the current methods by devising new algorithms or improving the current ones. The goal here is to maximize the best results in at least one of the evaluation metrics generally used in the field or the newly introduced metrics

The proposed method attempts to address the gaps in the literature by using the most extensive in-the-wild dataset available. The model is also a deep neural network, currently the most accurate type of FER classification model.14 Finally, we use metrics that reduce dataset bias and address the current limitations of FER evaluation methods.

Methods

The process used to arrive at the model involves data collection, preprocessing, model building and improvement over existing state-of-the-art deep learning models, model training, and evaluation. We check the results to see if there are any improvements in the metrics measured. If not, we improve upon the model by tweaking the neural network parameters, and the process reiterates until a better model is achieved. Figure 1 gives an overview of the process used.


Figure 1. Research Methodology.

Dataset

The primary dataset used for training the models is AffectNet.11 The dataset contains 450,000 manually annotated images, out of which 27,000 were used for the training. We made this reduction because of the enormous size of the dataset and the long training time, especially without graphical processing units (GPUs).

Another dataset used was the Japanese Female Facial Expression (JAFFE) dataset.15 It is a simple dataset collected in controlled conditions and contains a little over 600 images, in which female subjects posed and acted out expressions to create five classes.

Training the deep learning models on JAFFE yields an accuracy of 100% due to the simplicity of the dataset. However, in this project, it is only used for cross-dataset accuracy. The models were trained on AffectNet, and then JAFFE was used as a validation set. This is because JAFFE is one of the most widely used classical datasets for FER research.

Pre-processing

The AffectNet dataset is divided into multiple directories with randomly allocated images in each. The dataset is indexed using a CSV file whose first column contains the directory and file name of each image; the remaining columns contain the image details: the expression class, valence, arousal, and other properties. Therefore, most of the preprocessing work was done on the index file. This included removing valence, arousal, and other data points not needed for this model, randomly selecting 27,000 images, and finally arranging the selected images for training.
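As an illustration, the following is a minimal sketch of this index-file preprocessing using pandas. The file and column names ("training.csv", "subDirectory_filePath", "expression") and the label range are assumptions about how the AffectNet index is laid out, not verbatim from the paper.

```python
# Sketch of preprocessing the AffectNet index file (file and column names assumed).
import pandas as pd

index = pd.read_csv("training.csv")     # AffectNet index: one row per image

# Keep only the image path and expression label; drop valence, arousal, etc.
index = index[["subDirectory_filePath", "expression"]]

# Keep the seven primary expression classes (assuming labels 0-6).
index = index[index["expression"].between(0, 6)]

# Randomly select 27,000 images, as described above.
subset = index.sample(n=27_000, random_state=42)
subset.to_csv("training_subset.csv", index=False)
```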

A problem arises from the unequal distribution of data. For example, there are many more images of the "Happy" class than of other classes. This bias in the sample distribution would make the model biased towards some classes, since they are statistically more likely to appear, resulting in a model that does not generalize well.16 Therefore, we set a class weight attribute that gives higher weights to classes with fewer images. This helps improve the results and compensates for the low volume of images in certain classes. The class weights are presented in Table 1; a short sketch showing how they can be passed to Keras follows the table.

Table 1. Class weights.

Class        Weight
Neutral      0.2
Happy        0.1
Sad          1.0
Surprised    1.0
Fear         5.0
Disgust      8.0
Anger        1.0
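
The following sketch shows how the weights in Table 1 can be passed to Keras during training. The mapping of class indices to expression names is an assumption based on AffectNet's labelling; the commented `model.fit` call refers to the model and data generators described in the next section.

```python
# Class weights from Table 1, keyed by class index (index-to-name mapping assumed).
class_weight = {
    0: 0.2,  # Neutral
    1: 0.1,  # Happy
    2: 1.0,  # Sad
    3: 1.0,  # Surprised
    4: 5.0,  # Fear
    5: 8.0,  # Disgust
    6: 1.0,  # Anger
}

# Passed to training so that under-represented classes contribute more to the loss:
# model.fit(train_generator, validation_data=val_generator,
#           epochs=10, class_weight=class_weight)
```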

Model

The proposed model is a variant of VGG16. It predicts seven primary expressions: Neutral, Happy, Angry, Sad, Surprised, Afraid, and Disgusted. Below are the specifications of the best model found through the iterative process:

We fine-tuned the model using the ImageNet weights.17,18 Fine-tuning, or transfer learning, is a deep learning technique in which a model is first trained to a high classification accuracy on a general-purpose dataset (general object recognition on ImageNet in this case) and the learned weights are transferred to another model. The model is re-purposed by freezing the earlier layers and retraining only the last few layers on a different dataset for a more domain-specific classification task (facial expression recognition in this case, using AffectNet). This is done to speed up training and to improve classification accuracy.

We did not have to train the VGG16 model on ImageNet because pre-trained ImageNet weights are bundled with Keras. We retrained the last six layers of the model and added four more layers. The top of the model was not included so that the output layer could be changed to match the seven expression classes. The added layers are:

  • 1. Flattening layer. This is done to flatten the output of the previous layer to a one-dimensional vector

  • 2. A Dense layer or a fully-connected layer with an output of 1024 and ReLU activation function

  • 3. A Dropout layer with a dropout rate of 0.5

  • 4. A Dense layer with seven outputs (for seven classes) and a Softmax activation function.

These last ten layers were trained on AffectNet. The loss function used is categorical cross-entropy, the standard choice for multi-class classification with deep neural networks. We set the learning rate to 0.0001 and trained the model for ten epochs with a batch size of 100. We used the Keras library (version 2.3.0) built on top of TensorFlow (version 2.3.0), programmed in Python (version 3.6), to build and train the model.
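Below is a sketch of this configuration in TensorFlow/Keras. The optimizer choice, input size, and data-generator names are assumptions; the paper specifies only the learning rate, epoch count, batch size, loss function, and layer structure.

```python
# Sketch of the fine-tuned VGG16 described above (assumed details marked in comments).
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# Pre-trained ImageNet weights, without the original classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze all but the last six layers of the base model.
for layer in base.layers[:-6]:
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),                        # 1. flatten to a one-dimensional vector
    layers.Dense(1024, activation="relu"),   # 2. fully-connected layer, 1024 outputs
    layers.Dropout(0.5),                     # 3. dropout rate of 0.5
    layers.Dense(7, activation="softmax"),   # 4. seven expression classes
])

model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),  # optimizer is an assumption
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training for ten epochs; the batch size of 100 would be set on the (assumed)
# data generators, and the class weights from Table 1 passed to fit:
# model.fit(train_generator, validation_data=val_generator,
#           epochs=10, class_weight=class_weight)
```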

Evaluation

Evaluation metrics are an essential part of the process. They should give us a good idea of the robustness of the model. Standard metrics like accuracy are helpful, but they do not assess how well the model adapts to settings where the input images are taken in conditions completely different from the controlled environment of the lab. Therefore, we use cross-dataset accuracy and mean cross-dataset accuracy drop as additional metrics.

The main metrics used for evaluation are:

  • Accuracy – This metric measures the percentage of the labels guessed correctly by the model for the testing sample provided. It is the most widely used metric in all of the previously conducted research. The dataset needs to be divided into a training set and a test set to implement this metric. This will be done using k-fold cross-validation.19

    We determine the accuracy by dividing the sum of true positives (TP) and true negatives (TN) by the total number of samples:

    Accuracy = (TP + TN) / total × 100

  • Cross-dataset accuracy – This metric is rarely used (because it is newly introduced in FER) but is important nonetheless. It measures the model’s accuracy by training it on one dataset and testing it on a different dataset. This metric is important because it shows how well the model generalizes in very challenging, dataset-independent conditions. It reduces dataset bias while training models and is a much more solid metric to use for evaluation.20,13

  • Mean cross-dataset accuracy drop – Similar to the previous metric, except that it measures the percentage of the drop in accuracy when the model is trained on one dataset and tested on another.

    If a_d denotes the accuracy of the model when tested on the training dataset and a_i is the accuracy of the model when tested on dataset i, then the mean accuracy drop A_n for n datasets is given by:

    A_n = (1/n) Σ_{i=1}^{n} |a_d − a_i| / a_d

    The lower the percentage of the drop, the better the results. However, this metric is tricky, since it only shows the relative accuracy of the model between datasets and does not show the absolute robustness of the model. Therefore, it must be used with great care and scrutiny of the results. A sketch of how these metrics can be computed follows this list.
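
As a concrete illustration, the following sketch computes these metrics for a trained classifier. The argmax-based accuracy assumes one-hot ground-truth labels, and the numbers in the example at the end simply reuse values from Table 2 to illustrate the mean accuracy drop formula; the paper does not report that figure.

```python
# Sketch of the evaluation metrics described above.
import numpy as np

def accuracy(model, x, y_true_onehot):
    # Percentage of correctly predicted labels (assumes one-hot ground truth).
    y_pred = np.argmax(model.predict(x), axis=1)
    y_true = np.argmax(y_true_onehot, axis=1)
    return 100.0 * np.mean(y_pred == y_true)

def mean_accuracy_drop(a_d, cross_accuracies):
    # a_d: accuracy on the training dataset's own test split.
    # cross_accuracies: accuracies a_i obtained on n other datasets.
    a_i = np.asarray(cross_accuracies, dtype=float)
    return float(np.mean(np.abs(a_d - a_i) / a_d))

# Illustration only, reusing Table 2 values: trained on AffectNet (a_d = 70.69%),
# cross-tested on JAFFE (a_1 = 42.02%).
print(mean_accuracy_drop(70.69, [42.02]))   # ~0.41, i.e. about a 41% relative drop
```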

Results

This section will cover the results of training the VGG16 model on the AffectNet dataset. First, we will look at the accuracy results measured using k-fold cross-validation. Then we will look at the results from the cross-dataset accuracy evaluation using the JAFFE dataset. Figures 2 to 4 show the training accuracy, testing accuracy, and loss, respectively, for the VGG16 model over the number of epochs. Figure 5 shows the cross-dataset accuracy when training on the AffectNet dataset and testing on the JAFFE dataset.


Figure 2. VGG16 train accuracy.


Figure 3. VGG16 test accuracy.


Figure 4. VGG16 loss value.


Figure 5. VGG16 cross-dataset accuracy.

Accuracy

The accuracy of the VGG16 model is higher than that of the other models we tested, as shown in Table 2, and also higher than the models from other studies, as shown in Table 3. It peaks at 81.05% for the training set, as seen in Figure 2, and 70.69% for the test set, as seen in Figure 3.

Table 2. Summary of evaluation metrics.

Model tested    Training accuracy    Testing accuracy    Cross-dataset accuracy
AlexNet11       72%                  54%                 N/A
Inception V3    59.88%               24%                 28.96%
ResNet50        70.66%               15.38%              28.41%
ResNeXt101      29.31%               15.38%              24.88%
VGG16           81.05%               70.69%              42.02%

Also, looking at the loss values in Figure 4, it is clear that the model improves with more epochs: the loss values for both the training and test sets decrease steadily, despite some fluctuations in the test set's loss. These fluctuations can occur because the weights are not reset after each epoch, so the model carries its weights over from previous epochs. The training loss, however, is not affected by this.

Cross-dataset accuracy

Using the AffectNet dataset for training and the JAFFE dataset for cross-dataset testing, the VGG16 model shows improved results compared to the other models we tested, as shown in Table 2. Figure 5 shows that the cross-dataset accuracy increases steadily, peaking at 42.02%. Unfortunately, we cannot use this value to compare our model with models from other studies, since the metric is not yet widely adopted in the field and most of those studies did not report it.

Discussion

As Table 2 shows, the fine-tuned VGG16 model achieved the highest accuracy, owing to its appropriate size and its use of the pre-trained ImageNet weights. We tweaked and evaluated several additional models to arrive at this result. We tested Inception V3, ResNet50, and ResNeXt101 with different configurations and listed the results of their best variations in Table 2. Unfortunately, their performance does not come close to VGG16. The table also includes the AlexNet model proposed by the creators of the AffectNet dataset; to the best of our knowledge, it was the only published model trained on the same dataset we used, making it a natural candidate for comparison.

Table 3 compares our model with similar models trained on different datasets. The first two models, trained on classic datasets (MMI and JAFFE), generally have lower accuracy than models trained on in-the-wild datasets (FER2013 and RAF-DB).

Table 3. Accuracy comparison with other similar models.

Method                    Dataset used    Accuracy
Miao et al.21             MMI + JAFFE     65%
Mayer et al.22            MMI             66%
Wen et al.23              FER2013         76%
DETN14                    RAF-DB          78%
Mollahosseini et al.11    AffectNet       72%
Our Model (VGG16)         AffectNet       81%

Since the AffectNet dataset is relatively new, many more studies must be conducted to better utilize this huge dataset along with all the possibilities it has to offer.

The given models can be improved if the following is applied:

  • Training with a larger number of images, preferably the entire manually annotated dataset

  • Training on an in-the-wild video dataset

  • Utilizing GPUs for the training process

  • Training the model for a larger number of epochs

In addition to improving the models, the findings could be strengthened if existing FER techniques from other papers were tested using cross-dataset accuracy and then compared, since it is a better metric for comparison. This can also be done by cross-testing on multiple datasets to better understand how the models compare to each other.

We can also apply these metrics to individual classes to give us a better idea about which expressions are easier to classify and which ones need more data.

Conclusion

The research in facial expression recognition is an exciting frontier for machine learning. This paper lays out a systematic proposal for a model that attempts to improve the current best models and uses better metrics for evaluation.

The paper started by examining the current literature of the field through a critical lens. We reviewed current trends and state-of-the-art methods used for FER. We then identified a research gap in which a contribution can be made: the lack of models trained on challenging datasets and evaluated through newer, more robust metrics like cross-dataset accuracy.

The proposed method had better accuracy compared to similar models trained on the same dataset. In addition, we evaluated it using cross-dataset accuracy, a metric that is better at assessing the utility of the model in real-life scenarios and challenging conditions.

The field of FER is still primarily confined to the lab. There is a lot more that we can do to improve the robustness of FER models so they can see broader adoption in the mainstream industry. Having systems that can read and understand our emotions is the next big step in human-computer interactions, and this is only the beginning!

Data and software availability

Source data

This paper primarily uses the AffectNet dataset11 for training and accuracy testing. The dataset can be found on the AffectNet dataset website and is strictly for non-commercial research use. Permission to use the dataset is obtained by directly contacting the creators of the dataset through this request form.

The second dataset, used for cross-dataset accuracy testing, is the Japanese Female Facial Expression (JAFFE) dataset.15 It can be found at the JAFFE dataset repository. It is also strictly for non-commercial scientific research, and permission to use it can be obtained through this request form after accepting the terms and conditions of use.
