Keywords
Dynamic Signature, Hand Gesture Signature, Gesture Recognition, Hand Gesture Signature Database, Image Processing, Forgeries Attack
This updated version of the manuscript incorporates all the feedback provided by the reviewers. Specifically, we have provided a detailed explanation of the hyperparameters used in our learning models, and we have thoroughly revised and elaborated the discussion of the classification and robustness performance analyses. We have also included the latest related work and updated the references accordingly. Moreover, two new figures, Figure 1 and Figure 2, which illustrate the iHGS sample acquisition process from the top and side views, have been included in the manuscript; we believe these figures clarify our methodology and improve the manuscript's overall readability. Finally, we have revised some statements as suggested by the reviewers to avoid confusion or misleading information.
Conventional dynamic signature recognition typically uses a specialized digitizing device to capture the dynamic properties of a signature: a stylus pen is used to sign on the surface of a digital tablet. This leaves a subtle track on the tablet surface, exposing the signature information to others; a forger could learn the pattern from these traces.
Numerous acquisition approaches have been proposed to replace the tablet for dynamic signatures: for instance, two ballpoint pens equipped with sensors to measure pen movement during the signing process,1 a wearable device on the wrist (e.g., a smartwatch) to capture hand motion,2 or a triaxial accelerometer built into a smartphone.3,4
The introduction of low-cost sensor cameras5 opens new research opportunities for contactless human-computer interaction (HCI) in applications such as robotics, healthcare, entertainment, intelligent surveillance, and intelligent environments.6 Recognition of human hand gestures and dynamic signatures is becoming prevalent. This work proposes a hand gesture signature recognition system capable of recognizing a person's identity in a touchless acquisition environment. Additionally, a public database is provided for evaluation purposes.
Several relevant studies have been conducted on self-collected databases. Tian et al.7 introduced a Kinect-based password authentication system to explore the feasibility of a Kinect sensor for authenticating user-defined hand gesture passwords. In Ref. 8, the authors proposed a similar hand gesture signature recognition system in which the hand trajectory was used as the feature. The performance was evaluated on a self-collected database consisting of 50 different classes, and the empirical results demonstrated the feasibility and benefits of depth data in verifying a user’s identity based on a hand gesture signature. Fang et al.9 proposed a fusion-based in-air signature verification in which the user’s fingertip was tracked and the signature trajectory extracted from video captured by a high-speed camera. Malik et al.10 implemented a neural network for recognizing hand gesture signatures for identity authentication: a CNN-based hand pose estimation algorithm estimated the joint position of the index fingertip, and multidimensional dynamic time warping (MD-DTW) was adopted to match the template and test signature data. Tested on a self-collected dataset with 15 classes, the empirical results exhibited promising recognition performance in the presence of depth features. Li and Sato11 proposed an in-air signature authentication using the motion sensors of smart wrist-worn devices; the system captures gyroscope and accelerometer signals and employs a recurrent neural network (RNN) to classify genuine and imposter hand signatures of twenty-two (22) participants, reporting a highly promising equal error rate (EER) of only 0.83%. However, that study tested only random forgeries of the signature.
From the literature, existing studies have mainly relied on self-collected databases, and to the best of our knowledge there is no publicly available hand gesture signature database. A publicly available database provides a freely accessible source of data and can encourage more researchers to enter the field. For this reason, we present an openly available database collected with the Microsoft Kinect sensor camera. To protect the privacy of the contributors, only depth information is shared.
A Microsoft Kinect sensor camera is used as the main acquisition device to collect samples of in-air hand gesture signatures (iHGS) via its built-in IR projector and IR camera. A sample is a video clip containing a sequence of images capturing the hand movement of a signature signing. The Kinect camera captures up to 30 depth frames per second (fps). The number of frames in each sample corresponds to the duration of the hand movement and may vary from signature to signature. Additionally, computational factors such as heavy graphical processing and input latency affect the fps of each enrollment; these latencies may reduce the frame rate and cause information loss. Thus, to ensure data validity, collected samples with a frame rate below 27 fps are discarded and re-captured following the same procedure. Figure 1 and Figure 2 depict the iHGS sample acquisition process from the top and side views. The distances and spaces between the sensor camera and the subject were carefully chosen to ensure the entire body could be captured during acquisition. A more detailed data acquisition protocol can be found in Ref. 12.
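As a concrete illustration of this validity check, a minimal sketch is given below (Python; the function name and interface are our own, since the original acquisition software is not described in code form). It simply flags captures whose effective frame rate fell below the 27 fps threshold:

```python
def is_valid_sample(n_frames: int, duration_s: float, min_fps: float = 27.0) -> bool:
    """Flag captures whose effective frame rate dropped below the threshold.

    The Kinect nominally delivers 30 depth fps; heavy graphical processing or
    input latency can drop frames, so low-rate samples are discarded and
    re-captured (hypothetical helper, illustrating the rule in the text).
    """
    return (n_frames / duration_s) >= min_fps

# e.g. a 3-second signature that yielded only 78 frames (26 fps) is re-captured
assert not is_valid_sample(n_frames=78, duration_s=3.0)
assert is_valid_sample(n_frames=90, duration_s=3.0)
```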
The database is named the iHGS database. Data collection was conducted in two separate sessions, and the entire process took four months to complete. Samples for the second session were collected approximately two to three weeks after the first session. This arrangement is intended to capture the intra-class variation in genuine hand gesture signatures, better reflecting real-world situations. Before enrollment, the flow of the entire enrollment process was explained to each participant, and they were given ample time to practice and familiarize themselves with the process before data acquisition.
A total of 100 participants were successfully enrolled: 69 male and 31 female, aged 18 to 40 years. Ninety percent of the participants were right-handed (signing with their right hand), and only 10% signed with their left hand. Table 1 summarizes the characteristics of the iHGS database.
Our iHGS database has two subsets: (1) a genuine dataset and (2) a skilled forgery dataset. For the genuine dataset, each participant provided 10 genuine samples in each of the two sessions, giving a total of 2000 (10 × 2 × 100) samples.
The skilled forgery dataset contains forged signature samples. Each forger was randomly assigned one genuine signature sample (signed by the genuine user on a piece of paper) and allowed as much time as needed to learn it. Each forger was then asked to imitate the assigned signature 10 times. A total of 1000 skilled forgery signatures were collected; however, 20 samples from two forgers (10 samples each) were corrupted by a hardware error, leaving 980 skilled forgery samples. Table 2 summarizes the number of hand gesture signatures in the two subsets of the iHGS database.
Hand detection and localization techniques were applied to extract the region of interest (ROI) from each depth image in the iHGS database. A predictive hand segmentation technique was used to precisely extract the hand region from the frames; refer to Refs. 12-14 for more information.
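The exact predictive segmentation is detailed in Refs. 12-14; as a simplified, hedged stand-in (not a reimplementation of the authors' method), the sketch below assumes the signing hand is the object nearest to the camera and keeps a thin depth band around it. The band width and the nearest-object assumption are ours:

```python
import numpy as np

def hand_roi(depth_frame: np.ndarray, band_mm: int = 150) -> np.ndarray:
    """Crude hand ROI: keep pixels within `band_mm` of the nearest valid depth.

    A simplified stand-in for the predictive hand segmentation of Refs. 12-14.
    """
    valid = depth_frame > 0                        # Kinect reports 0 for no reading
    nearest = depth_frame[valid].min()             # assume the hand is closest
    mask = valid & (depth_frame < nearest + band_mm)
    ys, xs = np.nonzero(mask)                      # tight bounding box of the mask
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    return np.where(mask, depth_frame, 0)[y0:y1, x0:x1]
```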
An iHGS sample is a collection of depth image sequences comprising n image frames, where n is also the length of the sample. Several basic vector-based features are extracted from each sample. First, a Motion History Image (MHI) process is applied to the preprocessed depth image sequence of each sample over time. This technique condenses the image sequence into a single grey-scale image (coined the MHI template) while preserving the motion information in a more compact form.15,16 Specifically, the MHI template describes the hand location and motion path over time, yielding spatio-temporal information for the iHGS sample. The MHI template is then transformed into a vector space to produce a vector-based feature. The features explored in this work are as follows (a minimal extraction sketch is given after this list):
(a) x-directional summation (VX)
Produced by summing the MHI template in the vertical direction.
(b) y-directional summation (VY)
Produced by summing the MHI template in the horizontal direction.
(c) xy-directional summation (VXY)
The concatenation of the VX and VY features, for a richer one-dimensional summation feature.
(d) Histogram of Oriented Gradient feature (VHOG)
A histogram descriptor is applied to the MHI template to extract the local texture, represented as a distribution of edge and gradient structure.17 It can discover the shape or outline of the template image based on the slope or orientation of the gradient. It is worth noting that each pixel value in the MHI template describes the temporal information of the motion at a particular location; thus, the histogram of orientations of the MHI template represents the intensity of the motion history, which is a useful feature.
(e) Binarized Statistical Image Features (VBSIF)
Statistical features are computed and summarized in a single histogram representation. First, the input image is convolved with a set of predefined filters that maximize the statistical independence of the filter responses.18 Each response is then passed through a nonlinear hashing operator to improve computational efficiency. Next, the generated code map is partitioned into blocks and summarized into block-wise histograms. These regional histograms are finally concatenated into a global histogram representing the underlying distribution of the data. In this work, different BSIF-based features are produced:
• VBSIF-MHI – MHI template is used as input data to the BSIF.
• VBSIF-X – Image sequences of an iHGS sample are projected along the y-axis to generate an X-Profile template. The X-Profile template is used as input data to the BSIF.
• VBSIF-Y – Image sequences of an iHGS sample are projected along the x-axis to generate a Y-Profile template. The Y-Profile template is used as input data to the BSIF.
• VBSIF-XY – Both X-Profile and Y-Profile templates are used as the data input to the BSIF.
• VBSIF-MHIXY – MHI, X-Profile, and Y-Profile templates are used as the data input to the BSIF.
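To make the feature pipeline concrete, here is a minimal Python sketch of the MHI template and the vector-based features above. It is our illustrative reconstruction from the descriptions in the text, not the authors' code: the MHI update follows the standard formulation cited in Refs. 15, 16, HOG is delegated to scikit-image, and the BSIF filter bank is a caller-supplied stand-in (real BSIF uses ICA-learned filters18).

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.feature import hog

def motion_history_image(frames, threshold=10):
    """Collapse a depth-frame sequence into one grey-scale MHI template.
    Recently moving pixels are bright; older motion decays linearly."""
    tau = len(frames)
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for t in range(1, tau):
        moved = np.abs(frames[t].astype(np.int32)
                       - frames[t - 1].astype(np.int32)) > threshold
        mhi[moved] = tau                                  # stamp current time
        mhi[~moved] = np.maximum(mhi[~moved] - 1.0, 0.0)  # decay old motion
    return (255.0 * mhi / tau).astype(np.uint8)

def summation_features(mhi):
    """V_X, V_Y, and their concatenation V_XY."""
    v_x = mhi.sum(axis=0)                 # sum in the vertical direction
    v_y = mhi.sum(axis=1)                 # sum in the horizontal direction
    return v_x, v_y, np.concatenate([v_x, v_y])

def hog_feature(mhi):
    """V_HOG via scikit-image's HOG descriptor (parameters are illustrative)."""
    return hog(mhi, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def bsif_histogram(image, filters, blocks=(4, 4)):
    """Block-wise BSIF histogram: convolve, binarize into a code map, pool."""
    codes = np.zeros(image.shape, dtype=np.int64)
    for bit, f in enumerate(filters):
        resp = convolve2d(image.astype(np.float32), f, mode="same")
        codes |= (resp > 0).astype(np.int64) << bit       # pack binary responses
    n_codes = 1 << len(filters)
    bh, bw = image.shape[0] // blocks[0], image.shape[1] // blocks[1]
    hists = [np.bincount(codes[i*bh:(i+1)*bh, j*bw:(j+1)*bw].ravel(),
                         minlength=n_codes)
             for i in range(blocks[0]) for j in range(blocks[1])]
    return np.concatenate(hists)                          # global histogram
```

The X-Profile and Y-Profile templates used by VBSIF-X and VBSIF-Y would be produced analogously, by projecting the frame sequence along one axis before feeding the resulting template to `bsif_histogram`.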
Two types of performance analyses are conducted: (1) classification performance analysis and (2) robustness analysis against forgery attacks. The well-known multiclass Support Vector Machine (SVM) is adopted for the classification analysis through a One-versus-One (OVO) approach. The genuine dataset is randomly divided into a training set and a testing set with a ratio of m:n, where m is larger than n. The training set is further partitioned into a training subset and a validation subset with a ratio of mp:nq: the training subset trains the SVM model, while the validation subset is used to find the optimal model parameters that minimize the validation error. The model is then evaluated on the testing set. The robustness analysis measures the security level against impersonation attempts and simulates two attacks: random forgery and skilled forgery. In the former, a testing sample belonging to a subject i is compared with all the remaining samples of other subjects in the genuine dataset. In the latter, a forged sample of a subject j (from the skilled forgery dataset) is matched against a claimed identity’s sample (i.e., genuine subject i’s sample) from the genuine dataset.
This analysis is implemented using the multi-class classification functionality of the LIBSVM library in MATLAB.19 The samples of the genuine dataset are randomly partitioned into training, validation, and testing subsets (see Table 3).
Table 3. Partitioning of the genuine dataset.

| Subset | Samples |
| ---|--- |
| Training | 1000 |
| Validation | 400 |
| Testing | 600 |
| Total | 2000 |
A polynomial-kernel SVM classifier is used as our machine learning model. The samples were randomly partitioned into training, validation, and testing subsets to evaluate the model’s performance, and for cross-validation purposes this random partitioning was repeated five times with five different subsets. The hyperparameters of the polynomial kernel are set as follows: gamma (γ) = 20, polynomial degree (d) = 2, and cost (C) = 1. These hyperparameters were determined through empirical testing, keeping the settings that yielded optimal and stable performance across our experiments. The averaged classification measurements, including precision, recall, specificity, and F1-score, together with the standard deviations, are reported in Table 4. The accuracies of the features are illustrated in Figure 3.
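The paper implements this with LIBSVM in MATLAB;19 as a hedged sketch, an equivalent setup in Python's scikit-learn (our substitution, not the authors' code) with the stated hyperparameters might look like the following. The validation split used for hyperparameter tuning is omitted for brevity:

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X: one row per iHGS sample (e.g. a V_BSIF-XY vector); y: subject IDs 1..100
def evaluate_once(X, y, seed):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=600, stratify=y, random_state=seed)
    # Polynomial kernel with the reported hyperparameters:
    # gamma = 20, degree = 2, cost C = 1; OVO is SVC's native multiclass scheme.
    clf = SVC(kernel="poly", gamma=20, degree=2, C=1,
              decision_function_shape="ovo")
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))

# Five random partitions, mirroring the five trials in the paper:
# accuracies = [evaluate_once(X, y, seed) for seed in range(5)]
```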
The classification results show that the two BSIF features VBSIF-XY and VBSIF-MHIXY achieve the best accuracies of 97.43% and 93.57%, respectively, followed by the HOG feature VHOG with an accuracy of 91.63%. The summation features VX and VY are only vaguely classified, with accuracies of 61.43% and 61.20%; however, concatenating them (VXY) boosts the classification accuracy to 86.63%.
The results indicate that certain vector-based features, such as VBSIF-XY and VBSIF-MHIXY, carry highly discriminative information for classifying in-air hand gesture signatures. Compared with methods that involve complex preprocessing, the proposed vector-based features are extracted directly from the raw data without sophisticated techniques and can be used directly to train a classification model such as the SVM, making them convenient for real-world applications. Furthermore, the small standard deviations associated with these features suggest a high degree of stability in predicting hand gesture signatures. This is important in any classification task, as it ensures consistent and reliable results across a range of input data, and is especially valuable in applications where the quality and consistency of the input data may vary. In summary, our findings demonstrate that vector-based features, particularly VBSIF-XY and VBSIF-MHIXY, offer a robust and reliable approach to iHGS classification: they are easy to use and require minimal preprocessing, making them well suited to real-world applications that demand efficient and accurate classification.
This experimental analysis aimed to determine the robustness of the proposed approach against two types of forgery attacks, namely random forgery attacks and skilled forgery attacks.
The experiments were repeated for five trials. Averaged equal error rate (EER) and standard deviations were recorded. Four distance metrics were examined: Euclidean distance (EucD), Cosine distance (CosD), Chi-Square distance (CSqD), and Manhattan distance (MD).
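A minimal sketch of the four distance metrics and the EER computation follows (our own illustration, not the authors' code; the chi-square form assumes non-negative, histogram-type feature vectors):

```python
import numpy as np

def euclidean(a, b):  return np.linalg.norm(a - b)
def manhattan(a, b):  return np.abs(a - b).sum()
def cosine(a, b):     return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
def chi_square(a, b): return (((a - b) ** 2) / (a + b + 1e-12)).sum()  # non-negative inputs

def equal_error_rate(genuine, impostor):
    """Sweep a threshold over all observed distances and return the operating
    point where the false-accept and false-reject rates are (nearly) equal."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)     # genuine attempts rejected
        far = np.mean(impostor <= t)   # forgeries accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```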
Tables 5 and 6 report the system performance under the two forgery attacks. The performance of the four distance metrics varies with the feature vector. For the random forgery attack, VHOG with the cosine distance metric yields the lowest EER (EER-R) of 2.41%, followed by VBSIF-MHIXY with an EER-R of 5.18%. The Manhattan distance performs poorly in this context compared with the other metrics.
Distinguishing skilled forgery attacks from genuine signatures is undeniably more challenging than detecting random forgery attacks, owing to the high similarity between the forged and genuine samples; consequently, the equal error rates (EERs) for skilled forgery attacks are expected to be higher. Our study found that the vector-based features VXY and VHOG, when paired with the cosine distance metric, achieved the best EER-S of 5.07% for skilled forgery attacks. This is a promising result, suggesting that these features can effectively distinguish skilled forgeries from genuine signatures. VBSIF-MHIXY with the Euclidean distance metric obtained an EER-S of 9.45%, which is also relatively good. On the other hand, most BSIF features performed poorly in verifying skilled forged hand gesture signatures, highlighting the importance of carefully selecting the features used for authentication. As with random forgery attacks, the Manhattan distance metric achieved the worst performance, again indicating that the choice of distance metric is crucial for good verification performance. In summary, these findings demonstrate that the verification performance of iHGS is determined not only by the extracted features but also by the choice of distance metric; careful consideration must be given to both.
In this paper, we presented a self-collected iHGS database together with a detailed description of the acquisition protocol used to collect it. Several basic sets of vector-based features were extracted from the samples, and we investigated both classification capability and robustness against forgery attacks. The experimental results of both analyses are promising when appropriate features are extracted from the samples, demonstrating the potential of iHGS for both recognition and verification. However, there is room for future exploration. The current database was collected in a controlled environment; for biometric authentication, external factors such as camera angle, the distance between the user and the acquisition device, and background complexity should also be considered. In particular, the database could be extended with such uncontrolled environmental factors to increase its difficulty.
Figshare: In-air Hand Gesture Signature Database (iHGS Database) https://doi.org/10.6084/m9.figshare.16643314
This project contains the following underlying data:
• Genuine dataset (100 contributors, labelled with IDs from 1 to 100)
• Skilled forgery dataset (98 contributors, labelled with IDs from 1 to 100, where IDs 84 and 88 are not included)
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
The experimental analyses were established, according to the ethical guideline and were approved by the Research Ethics Committee (REC) with the ethical approval number EA1452021. Written informed consent was obtained from individual participants.
W.H. carried out the experiments with support from Y.H. and H.Y., who coordinated the data collection and the establishment of the database. W.H. took the lead in writing the manuscript, while Y.H. and H.Y. provided critical feedback and helped shape the analysis and the manuscript.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Vector based gesture recognition; Mathematical Analysis; Clifford Geometric Algebra
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biometrics, Pattern Recognition.
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: IoT, Education, Technology Acceptance, Data Analytics