Research Article

Non-invasive health prediction from visually observable features

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 13 Sep 2021

This article is included in the Research Synergy Foundation gateway.

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background: The unprecedented development of Artificial Intelligence has revolutionised the healthcare industry. In the next generation of healthcare systems, self-diagnosis will be pivotal to personalised healthcare services. During the COVID-19 pandemic, new screening and diagnostic approaches such as mobile health are well positioned to reduce disease spread and overcome geographical barriers. This paper presents a non-invasive screening approach that predicts a person’s health from visually observable features using machine learning techniques. Images of the patient’s face and skin surface are acquired using a camera or mobile device and analysed to support clinical reasoning and predict the person’s health.
Methods: Specifically, a two-level classification approach is presented. The proposed hierarchical model chooses a class by training a binary classifier at the node of the hierarchy. Prediction is then made using a class-specific reduced feature set.
Results: Testing accuracies of 86.87% and 76.84% are reported for the first- and second-level classification, respectively. Empirical results demonstrate that the proposed approach yields favourable prediction results while greatly reducing the computational time.
Conclusions: The study suggests that it is possible to predict a person’s health condition from their facial appearance using cost-effective machine learning approaches.

Keywords

Machine learning, Health prediction, Remote screening and diagnosis

Introduction

As technology advances, machine learning techniques have grown in popularity in recent years and have proven effective in solving many modern problems across different domains. There is increasing research interest in applying machine learning methods to clinical informatics and healthcare systems.1-4 Meanwhile, facial recognition technology has been widely used in various fields; for instance, it has been applied to unlock phones, find wanted fugitives and diagnose diseases. Many studies have investigated disease diagnosis using facial images.5-8 Systems that use only facial features to diagnose illnesses are beneficial for remote medical diagnosis.

In this research, a machine learning approach was developed to detect the health condition of a person based on facial features. The purpose of the health prediction system was to identify images as ‘healthy’, or ‘ill’ with either ‘fever’, ‘sore throat’, or ‘runny nose’ symptoms. Facial images containing healthy and ill faces (fever, sore throat and runny nose) were collected. Then, discriminative facial features were extracted from the images using different feature extraction techniques. These features were used to train several machine learning classifiers for health prediction.

Literature review

In this section, various types of approaches to health prediction using facial features are studied and reviewed to learn about their respective advantages and disadvantages. These approaches are separated into two categories: conventional approaches and deep learning approaches.

Conventional approaches

In 2013, Zhao et al.1 introduced an approach to classify Down Syndrome through image-based facial dysmorphology. Local texture features were extracted using the Contourlet transform and local binary patterns (LBP), and geometric features were derived from landmarks of facial anatomy. Using a support vector machine (SVM) classifier, this technique produced an accuracy of 97.92%.

According to a survey2 of genetic disorder diagnosis based on facial images, Saraydemir et al.3 presented an approach to distinguish subjects with Down Syndrome from healthy subjects using facial images. The Gabor wavelet transform (GWT) was used for feature extraction. Linear discriminant analysis (LDA) and principal component analysis (PCA) were then applied for dimensionality reduction. Accuracies of 96% and 97.34% were reported.

Another study5 developed an approach for identifying Down Syndrome based on the analysis of facial landmarks in 2D images. An independent component analysis-based hierarchical constrained local model (HCLM) was introduced to locate the landmarks of a face. The method was also tested on a mixed-syndromes dataset, and the highest accuracy achieved was 97%.

Another study that applies traditional machine learning methods to health prediction from facial features is an acromegaly identification approach using facial images.6 Several conventional methods, such as SVM, generalized linear models (LM), k-nearest neighbours (k-NN) and random forests (RF) of randomized trees (RT), as well as deep learning methods, were used to train the model. The best performance was attained by SVM, with a 95% positive predictive value (PPV), an 88% negative predictive value (NPV) and an accuracy of 91%. With frontalized faces, k-NN worked best, with 89% PPV and 93% NPV, also at an accuracy of 91%.

Deep learning approaches

In 2018, Sajid et al.7 developed a palsy grading system based on unsymmetrical facial features using deep learning. A convolutional neural network (CNN) was proposed to extract features that exhibited palsy symptoms from a large number of facial images. The results of the model on the improved dataset showed a recognition rate of 92.6%.

A facial analysis framework called DeepGestalt8 was introduced to identify rare genetic syndromes using deep learning. The training process of the DeepGestalt model consisted of two steps. Firstly, an overall representation of the face was learned by the model. In the binary classification problems of identifying Angelman Syndrome (AS) and Cornelia de Lange Syndrome (CdLS) patients, the model achieved accuracies of 92% and 96.88%, respectively.

In 2020, one study9 proposed detecting cancer from the facial features of patients. The authors used a residual neural network (ResNet) architecture comprising 27 convolution layers and two fully connected (FC) layers. Transfer learning was applied to convolution layers 1-5 by directly reusing the weights of a pre-trained face encoding model.10 To describe the traits that distinguish the non-cancer and cancer datasets, they applied gradient-weighted class activation mapping (Grad-CAM) to the trained model. This approach achieved an accuracy of 82%.

In addition, another study11 developed a technique to detect Down Syndrome automatically from facial images using a deep convolutional neural network (DCNN). Firstly, the authors trained a DCNN on a large dataset to obtain a general face encoding network. The network architecture consisted of ten convolutional layers with ReLU activations, followed by three FC layers. This method achieved an accuracy of 95.87%.

Also in 2020, a further study12 used deep learning to diagnose acromegaly and classify its severity from facial images. A CNN was used in this method. For facial recognition, the pre-trained Inception ResNet V1 was used to extract features. The overall prediction accuracy achieved by this method was 90.7%.

Methods

Proposed solution

A two-level classification approach is presented in this paper for health prediction based on facial features. Figure 1 shows how the prediction model was developed. First, facial images of healthy and ill (fever, sore throat and runny nose) persons were collected. Then, these images were pre-processed to clean, standardize and normalize the data. There are two levels of classification. The first-level classification is responsible for classifying samples into ‘healthy’ and ‘ill’ classes, while the second-level classification is in charge of classifying the ‘ill’ samples into ‘fever’, ‘sore throat’, and ‘runny nose’ classes. Therefore, there are two levels of model training in the proposed solution. In this research, conventional machine learning methods were adopted.
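The two-level decision flow described above can be sketched as follows. This is a minimal illustration only: the function names and the toy threshold-based classifiers are hypothetical stand-ins, not the trained models from the study.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def predict_two_level(
    features: Features,
    first_level: Callable[[Features], str],
    second_level: Callable[[Features], str],
) -> str:
    """Return 'healthy', or one of the 'ill' subclasses."""
    if first_level(features) == "healthy":
        return "healthy"
    # Only samples flagged as 'ill' reach the second-level classifier.
    return second_level(features)


# Toy stand-in classifiers (hypothetical rules, for illustration only).
def toy_first_level(x: Features) -> str:
    return "healthy" if x[0] < 0.5 else "ill"


def toy_second_level(x: Features) -> str:
    return ("fever", "sore throat", "runny nose")[int(x[1]) % 3]


print(predict_two_level([0.1, 0], toy_first_level, toy_second_level))
print(predict_two_level([0.9, 1], toy_first_level, toy_second_level))
```

In practice each callable would wrap a trained classifier together with its own feature extractor, so that the second level can use a different feature set from the first.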


Figure 1. Proposed framework.

The feature extraction methods used were local binary pattern (LBP), PCA, LDA, and Gabor filter. The classifiers used were SVM, neural network (NN), k-nearest neighbours (KNN), and random forest (RF). All 16 combinations of these feature extraction techniques and classifiers were evaluated to find the best-performing model.
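The 16 model variants are simply the Cartesian product of the four feature extraction methods and the four classifiers. A short sketch of how such an experiment grid could be enumerated is given below; the variable names are illustrative and not taken from the authors' source code.

```python
from itertools import product

FEATURE_EXTRACTORS = ["LBP", "PCA", "LDA", "Gabor filter"]
CLASSIFIERS = ["SVM", "NN", "KNN", "RF"]

# Every feature extractor is paired with every classifier.
model_names = [f"{feat}+{clf}" for feat, clf in product(FEATURE_EXTRACTORS, CLASSIFIERS)]

print(len(model_names))
print(model_names[:4])
```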

Datasets

In this study, a total of 733 facial images of healthy and ill persons were collected. Of these, 233 were images of ill persons who had fever, sore throat or runny nose, and 500 were images of healthy persons. Of the 500 healthy images, 420 contained normal faces of people aged 1 to 50, while the remaining 80 were healthy throat images. Images of healthy throats and ill persons were manually collected from various online sources, while images of healthy faces were obtained from the UTKFace database.13 The number of images for each class and subclass is listed in Table 1.

Table 1. Number of images for each class and subclass.

Class      Subclass       Number of images
Healthy    -              500
Ill        Fever          78
Ill        Sore throat    80
Ill        Runny nose     75
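As a quick consistency check, the counts quoted in the text and in Table 1 can be tallied as below. The dictionary layout is just one convenient way to hold the numbers and is not part of the original study.

```python
# Image counts as reported in the Datasets section and Table 1.
image_counts = {
    "healthy": {"face": 420, "throat": 80},
    "ill": {"fever": 78, "sore throat": 80, "runny nose": 75},
}

n_healthy = sum(image_counts["healthy"].values())
n_ill = sum(image_counts["ill"].values())
n_total = n_healthy + n_ill

print(n_healthy, n_ill, n_total)
```

The totals agree with the 500 healthy, 233 ill and 733 overall images stated in the text. They also show that the first-level classes are imbalanced, with healthy images making up roughly two thirds of the data.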

Results and discussion

In this section, the experimental results for the different models that consist of the combinations of four feature extraction methods and four classifiers are presented, analysed and discussed. The testing accuracies of the first and second-level classification of each model were recorded for 10 runs.14
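Reporting an average over 10 runs can be sketched as follows. The evaluation function here is a hypothetical placeholder that returns a made-up accuracy; in a real experiment it would split the data, train the model and return the testing accuracy for that run. How the authors seeded or split each run is not stated here, so the seeding scheme is an assumption.

```python
import random
import statistics


def average_accuracy(evaluate_once, n_runs=10, seed=0):
    """Call evaluate_once(run_seed) n_runs times; return (mean, sample std dev)."""
    rng = random.Random(seed)
    scores = [evaluate_once(rng.randrange(2**32)) for _ in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)


def toy_evaluate(run_seed):
    # Hypothetical stand-in: a real version would train and test a model.
    return 85.0 + random.Random(run_seed).uniform(-2.0, 2.0)


mean_acc, std_acc = average_accuracy(toy_evaluate)
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
```

Reporting the spread alongside the mean makes it easier to judge whether differences between models exceed run-to-run variation.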

SVM variants

The first experiment evaluates the performance of the SVM variants. Table 2 shows the results of the SVM variants for the first- and second-level classification tests. Among LBP, PCA, LDA and Gabor filter features, PCA features performed the best with SVM in the first-level classification, achieving a promising average testing accuracy of 85.85% with minimal overfitting. On the other hand, the LBP features performed the best with SVM in the second-level classification, with an average testing accuracy of 73.32%. The SVM variants generally produced results with the least overfitting among all the classifiers.15

Table 2. Experimental results of SVM variants.

Methods               1st-level classification testing (%)    2nd-level classification testing (%)
LBP + SVM             80.88                                   73.32
PCA + SVM             85.85                                   64.05
LDA + SVM             85.37                                   63.01
Gabor filter + SVM    81.29                                   63.45

NN variants

The experimental results for the NN variants are shown in Table 3. Among all the feature extraction techniques, PCA features worked best with NN in the first-level classification, achieving an average testing accuracy of 91.84%. On the other hand, the LBP features performed best with NN in the second-level classification, with an average testing accuracy of 76.84%. In the second-level classification, the LBP model was also the only NN variant that stood out.16

KNN variants

The performance of the KNN variants for the first and second level classifications is given in Table 4. Among all the feature extraction techniques, again, PCA features worked best with KNN in the first-level classification, with an average testing accuracy as high as 90.34%. The same model also performed best in the second-level classification among all the KNN variants, as it obtained an average testing accuracy of 70.03%.

Table 3. Experimental results of NN variants.

The best results are highlighted in bold.

Methods               1st-level classification testing (%)    2nd-level classification testing (%)
LBP + NN              86.87                                   76.84
PCA + NN              91.84                                   66.96
LDA + NN              86.87                                   63.19
Gabor filter + NN     86.05                                   66.93

Table 4. Experimental results of KNN variants.

The best results are highlighted in bold.

Methods               1st-level classification testing (%)    2nd-level classification testing (%)
LBP + KNN             83.54                                   65.33
PCA + KNN             90.34                                   70.03
LDA + KNN             86.53                                   63.78
Gabor filter + KNN    72.26                                   63.02

RF variants

The experimental results for the RF variants are displayed in Table 5. Among all the feature extraction techniques, PCA features once again performed the best with RF in the first-level classification, at an average testing accuracy of 88.57%. This model also scored best in the second-level classification among all the RF variants, with an average testing accuracy of 74.15%.

Table 5. Experimental results of RF variants.

The best results are highlighted in bold.

Methods               1st-level classification testing (%)    2nd-level classification testing (%)
LBP + RF              83.61                                   67.95
PCA + RF              88.57                                   74.15
LDA + RF              85.31                                   64.15
Gabor filter + RF     87.89                                   62.41

First-level classification results

According to the experimental results of all the models shown in Tables 2 to 5, two models achieved average testing accuracies above 90% in the first-level classification. These are the PCA+NN and PCA+KNN models.

PCA+NN

The model that achieved the highest accuracy in the first-level classification was PCA+NN. It obtained a 91.84% average testing accuracy. The high accuracy could be due to the fact that PCA effectively reduces the dimensions of data and it is able to capture the important correlations and patterns that best characterize the data. The misclassified samples were plotted during one of the runs of the finalized PCA+NN model. Out of the 147 samples, there were 15 misclassified samples.
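The accuracy of that single run follows directly from the misclassification count. The short calculation below uses only the figures quoted above, plus the total of 733 images from the Datasets section; the interpretation of the test-set size as a hold-out fraction is an inference, since the split ratio is not stated here.

```python
n_test = 147          # test samples in the plotted run
n_misclassified = 15  # misclassified samples in that run

# Accuracy for this one run, as a percentage.
run_accuracy = 100 * (n_test - n_misclassified) / n_test
print(f"{run_accuracy:.2f}%")  # 89.80%

# Share of the 733 collected images that 147 test samples would represent.
test_fraction = n_test / 733
print(f"{test_fraction:.3f}")
```

A single-run accuracy of about 89.8% sits slightly below the 91.84% average, which is expected when individual runs vary around the mean. The test set of 147 samples would be consistent with holding out roughly 20% of the images.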

PCA+KNN

The PCA+KNN model obtained the second-highest accuracy in the first-level classification after PCA+NN. Its performance was nearly as good as that of PCA+NN, as it achieved a 90.34% average testing accuracy. Figure 2 shows the confusion matrix after running the first-level classification of the PCA+KNN model. There is little difference between the performance of PCA+NN and PCA+KNN, as the two average accuracies differ by only 1.5 percentage points.


Figure 2. Confusion matrix of PCA+KNN at first-level classification.

Overall first-level performance

Apart from PCA+NN and PCA+KNN, the overall results of the first-level classification were rather good as most of the models achieved an average testing accuracy of 80% and above. Even though the other models overfit more than PCA+NN and PCA+KNN, their results were still considered rather satisfactory. The symptoms shown on the faces of ill people or sore throats are important features to help the model classify healthy and ill samples.17

Second-level classification results

Based on the results given in Tables 2 to 5, a total of four models achieved average testing accuracies between 70% and 77% in the second-level classification. These were the LBP+NN, PCA+RF, LBP+SVM, and PCA+KNN models.
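The shortlists for both levels can be reproduced from the tabulated averages. The sketch below transcribes the values from Tables 2 to 5 into a dictionary (the layout and names are illustrative) and filters them by the thresholds mentioned in the text. The first row of Table 4 is taken here to be LBP+KNN, which is an assumption based on the table's title.

```python
# (first-level, second-level) average testing accuracies in %, from Tables 2 to 5.
results = {
    "LBP+SVM": (80.88, 73.32), "PCA+SVM": (85.85, 64.05),
    "LDA+SVM": (85.37, 63.01), "Gabor+SVM": (81.29, 63.45),
    "LBP+NN": (86.87, 76.84), "PCA+NN": (91.84, 66.96),
    "LDA+NN": (86.87, 63.19), "Gabor+NN": (86.05, 66.93),
    "LBP+KNN": (83.54, 65.33),  # assumed label; see note above
    "PCA+KNN": (90.34, 70.03),
    "LDA+KNN": (86.53, 63.78), "Gabor+KNN": (72.26, 63.02),
    "LBP+RF": (83.61, 67.95), "PCA+RF": (88.57, 74.15),
    "LDA+RF": (85.31, 64.15), "Gabor+RF": (87.89, 62.41),
}

# Models above 90% at the first level, best first.
top_first = sorted((m for m, (a1, _) in results.items() if a1 > 90),
                   key=lambda m: results[m][0], reverse=True)

# Models at or above 70% at the second level, best first.
top_second = sorted((m for m, (_, a2) in results.items() if a2 >= 70),
                    key=lambda m: results[m][1], reverse=True)

print(top_first)
print(top_second)
```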

LBP+NN

The model that achieved the highest accuracy in the second-level classification was LBP+NN. It obtained an average testing accuracy of 76.84%. Its performance was considered rather satisfactory, as most of the other models only obtained testing accuracies between 60% and 68% on average. LBP+NN may have performed well because LBP features are robust to illumination changes and highly discriminative.
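The illumination robustness of LBP can be seen in a small example. The sketch below computes the basic 8-neighbour LBP code for the centre pixel of a 3x3 patch; because each neighbour is only compared with the centre, adding a constant brightness offset to every pixel leaves the code unchanged. This is the textbook form of the operator, shown as an assumption: the LBP variant, radius and histogram settings used in the study are not stated here, and the patch values are made up.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting from the top-left corner.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit  # set the bit when the neighbour is not darker
    return code


patch = [[52, 60, 41],
         [70, 55, 30],
         [90, 58, 54]]

# The same patch under a uniform brightness increase.
brighter = [[value + 40 for value in row] for row in patch]

print(lbp_code(patch), lbp_code(brighter))
```

A full LBP descriptor would compute this code at every pixel and summarise the codes as a histogram, which then serves as the feature vector for the classifier.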

PCA+RF

The PCA+RF model performed nearly as well as LBP+NN, with an average testing accuracy of 74.15% in the second-level classification. It performed well owing to the previously mentioned benefits of PCA, combined with RF being a classifier with strong predictive capabilities. Figure 3 shows the confusion matrix produced after running the second-level classification of the PCA+RF model.


Figure 3. Confusion matrix of PCA+RF at second-level classification.

The confusion matrix was generated during one of the runs of the finalized PCA+RF model. The 0 label represents the ‘fever’ class, 1 represents the ‘sore throat’ class and 2 represents the ‘runny nose’ class. It can be seen that the two most frequently confused classes were ‘fever’ (0) and ‘runny nose’ (2), with 15 fever samples misclassified as runny nose and seven runny nose samples misclassified as fever. The reason for this occurrence is the same as for the LBP+NN model’s case. The total number of samples misclassified by PCA+RF for this run was 31, only one more than for the LBP+NN model. Hence, PCA+RF was able to produce results nearly as good as LBP+NN in the second-level classification.
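For readers unfamiliar with how such a matrix is read, the sketch below builds a three-class confusion matrix from label lists and counts the off-diagonal (misclassified) entries. The labels follow the 0/1/2 encoding above, but the sample data are invented for illustration and do not reproduce Figure 3.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes=3):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def count_misclassified(matrix):
    """Sum of all off-diagonal entries."""
    size = len(matrix)
    return sum(matrix[i][j] for i in range(size) for j in range(size) if i != j)


# Hypothetical labels: 0 = fever, 1 = sore throat, 2 = runny nose.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 2, 2, 1, 1, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred)
for row in cm:
    print(row)
print(count_misclassified(cm))
```

In this toy data, entry `cm[0][2]` counts fever samples predicted as runny nose and `cm[2][0]` counts the reverse, which are the two cells the paragraph above singles out.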

LBP+SVM and PCA+KNN

Other than LBP+NN and PCA+RF, the two remaining models that achieved average testing accuracies above 70% were LBP+SVM and PCA+KNN. The LBP+SVM model obtained a 73.32% average testing accuracy in the second-level classification. Its performance can be attributed to the robustness of LBP as well as the fact that SVM is effective in situations where the number of dimensions is larger than the number of samples. In the model’s second-level classification, the number of testing samples was always smaller than the number of dimensions.

Best model for the health prediction system

Among all models, the LBP+NN variant had the best overall performance across the first- and second-level classifications. It achieved the highest average testing accuracy of 76.84% in the second-level classification. It also performed considerably well in the first-level classification, with less overfitting than the other models with similar performance, as it showed 94.38% and 86.87% average training and testing accuracies, respectively.
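One simple way to quantify the overfitting mentioned here is the gap between average training and testing accuracy. The calculation below uses the two LBP+NN figures quoted above; treating this gap as the overfitting measure is an assumption, as the text does not define how overfitting was assessed.

```python
train_accuracy = 94.38  # LBP+NN average training accuracy, first level (%)
test_accuracy = 86.87   # LBP+NN average testing accuracy, first level (%)

# Generalization gap in percentage points.
gap = train_accuracy - test_accuracy
print(f"{gap:.2f} percentage points")
```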

Conclusions

This paper presents a health prediction system using facial features, evaluated with different machine learning models. Datasets containing facial images of healthy and ill (fever, sore throat and runny nose) persons were collected. The facial features of the images were extracted using the LBP, PCA, LDA and Gabor filter feature extraction techniques. SVM, NN, KNN and RF classifiers were then trained on these features. Among the 16 models, the LBP+NN model yielded the best overall performance across the first- and second-level classifications. It obtained average testing accuracies of 86.87% and 76.84% for the first- and second-level classification, respectively.

Data availability

Underlying data

UTKFace Large Scale Face Dataset: https://susanqq.github.io/UTKFace/.

As consent cannot be obtained for the face images retrieved from the UTKFace dataset, the images cannot be shared in this article.

Software availability

Source code available from: https://doi.org/10.5281/zenodo.5266406 (reference 18).

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

how to cite this article
Khong FY, Connie T, Goh MKO et al. Non-invasive health prediction from visually observable features [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2021, 10:918 (https://doi.org/10.12688/f1000research.72894.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 14 Feb 2022
Prabina Kumar Meher, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, Delhi, India 
Approved with Reservations
The authors developed a machine learning-based approach for predicting the health status of an individual by using the image data of their face. Below are my comments for further improvement of the article.
  1. The authors must
... Continue reading
HOW TO CITE THIS REPORT
Meher PK. Reviewer Report For: Non-invasive health prediction from visually observable features [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2021, 10:918 (https://doi.org/10.5256/f1000research.76503.r121344)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 02 Mar 2022
    Tee Connie, Faculty of Information Science and Technology, Multimedia University, Melaka, 75450, Malaysia
    02 Mar 2022
    Author Response
    Dear Reviewer,

    Thank you very much for your time and efforts in reviewing our manuscript “Non-Invasive Health Prediction from Visually Observable Features”. According to your valuable comments and suggestions, ... Continue reading
Reviewer Report 14 Sep 2021
Andrews Samraj, Department of Computer Science and Engineering, Mahendra Engineering College, Namakkal, Tamil Nadu, India;  Mahendra Engineering College, Namakkal, Tamil Nadu, India 
Approved
Non-invasive health prediction is the essential need of the hour due to the pandemic situation in health care. The approach and modeling using image data set of human faces with sound health and with illness are taken for research and ... Continue reading
HOW TO CITE THIS REPORT
Samraj A. Reviewer Report For: Non-invasive health prediction from visually observable features [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2021, 10:918 (https://doi.org/10.5256/f1000research.76503.r94202)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
