Keywords
Body Gesture Recognition, Human-Computer Interaction, Watershed Algorithm, Optimized Probabilistic Neural Network, Crow Search Algorithm.
Body gesture recognition has become a fundamental technique in Human-Computer Interaction (HCI). As human-machine interaction evolves, there is an increasing need for precise and efficient gesture detection systems. However, current methods face limitations such as accuracy constraints, high computational complexity, and limited adaptability. This study addresses these challenges by proposing an innovative approach to enhance the accuracy and efficiency of body gesture recognition systems.
The proposed system integrates advanced algorithms and techniques to improve performance. A Marker-Based Watershed Algorithm is employed for accurate image segmentation, enhancing region detection. Feature extraction uses a Convolutional Neural Network (CNN), while a Wavelet Transform-Based Pre-Processing technique improves input data quality. A unique component of this method is the application of the Crow Search Algorithm to optimize model efficiency. An Optimized Probabilistic Neural Network (PNN) is utilized for gesture classification, aiming to increase precision and computational effectiveness.
The proposed approach achieves a gesture recognition accuracy rate of 99%. Compared to traditional methods such as Decision Trees (DT), Support Vector Machines (SVM), and Improved Neural Networks (INN), the Optimized PNN demonstrates a 2.21% improvement in overall accuracy. The implementation, carried out in Python, showcases the robustness and adaptability of the system across diverse HCI applications.
This work presents a comprehensive solution to the challenges of body gesture recognition by integrating cutting-edge algorithms. Combining the Marker-Based Watershed Algorithm, CNN-based feature extraction, and Crow Search Optimization significantly enhances the system’s accuracy and efficiency. By addressing the shortcomings of existing methods, this approach provides a more responsive, reliable, and flexible gesture recognition system, contributing to the advancement of HCI technologies. The results demonstrate the potential for improved human-computer interaction through more effective and precise gesture detection.
Body gesture recognition is an emerging area of computer vision and artificial intelligence that focuses on interpreting and comprehending human body motions, postures, and emotions.1 It is crucial in human-computer interaction, with applications ranging from improving gaming experiences to transforming healthcare, surveillance, and communication.2 The major goal of body gesture recognition is to enable machines to read the intentions and emotions conveyed by human body language. This technology uses image and video analysis techniques to identify and classify different body movements such as hand gestures, facial expressions, and full-body poses.3 These movements can express a wide range of information, including commands, emotions, and intentions. As a result, body gesture recognition has the potential to improve the intuitiveness, naturalness, and efficiency of human-computer interaction.4 Image or video data gathering, pre-processing, feature extraction, segmentation, and classification are all important components of body gesture recognition systems. Deep learning approaches, particularly CNNs, have considerably increased the accuracy and resilience of gesture recognition systems by automating the extraction of essential characteristics from visual data.5 Many fields, such as sign language translation, virtual reality, robotics, healthcare, and interactive multimedia, use body gesture detection.6 Body gesture recognition’s capabilities are anticipated to improve as technology advances, making it an interesting and promising topic with several practical and inventive applications.7
In the realm of HCI, Body Gesture Recognition (BGR) is a significant development that will fundamentally change how people interact with technology.8 BGR goes beyond conventional input techniques like keyboards and touchscreens, enabling machines to comprehend and react to human body language and gestures, improving HCI’s inclusivity and intuitiveness. This technology meets a variety of demands, including accessibility for those with physical limitations, immersive gaming experiences, and efficient control in sectors ranging from healthcare to smart homes. One of its key benefits is improved accessibility. BGR enables people with limited mobility to interact with computers, tablets, and smartphones, promoting autonomy and inclusiveness. It enables a more natural and expressive style of communication, paving the path for sign language translation and allowing those with speech problems to communicate their thoughts and emotions successfully. BGR enhances gaming experiences by letting players control their games using body movements and gestures, resulting in a greater sense of immersion and engagement.9 BGR contributes to the development of telemedicine and rehabilitation applications in healthcare, allowing for remote patient monitoring and therapy. It simplifies difficult activities such as operating smart home devices with natural hand gestures or facial expressions.10 The significance of BGR in HCI stems from its ability to bridge the gap between human intention and machine action, allowing technology to become more adaptive and receptive to our natural patterns of communication and engagement.
Recent years have seen significant progress in body gesture recognition, providing a wide range of approaches and strategies for precise and instantaneous identification of human body movements.11 To extract features from image and video data and enable reliable and effective recognition, convolutional neural networks (CNNs) have become the preferred method. CNNs are particularly useful for identifying dynamic motions because they automatically learn to record spatial information.12 RNNs and their descendants, such as LSTM networks, play a crucial role in handling temporal characteristics, which makes them well suited to gesture identification in videos.13 For gesture identification, conventional computer vision methods like Haar-like features and the Histogram of Oriented Gradients (HOG) are still useful, particularly for real-time applications.14 They can record texture and edge information, which is important for some kinds of movements. Depth sensors such as time-of-flight cameras and the Microsoft Kinect have become more popular in 3D gesture detection. Because they provide depth information, motions can be recognized without relying solely on colour, even in dimly lit environments.15 Handcrafted features have demonstrated success in gesture detection tasks when paired with machine learning techniques like SVMs and Random Forests.16 A popular trend in obtaining more reliable and precise results is the fusion of multimodal data, which combines information from different sensors or sources. The dynamic nature of body gesture detection is reflected in the wide range of approaches currently available, which may be tailored to suit different use cases and ambient conditions.
Existing approaches for body gesture identification, while constantly improving, have some problems and limits. Many traditional computer vision approaches rely largely on well-defined and controlled surroundings, making them difficult to adapt to real-world scenarios with changing lighting, occlusions, or background clutter.17 They may necessitate significant human feature engineering, making them less scalable. Although deep learning methods are strong, they frequently need considerable computing resources and big datasets for training. When dealing with fine-grained or sophisticated motions, recognition accuracy can be degraded in some circumstances.18 Multimodal fusion can also add complexity and complicate synchronization. To address these constraints, continued research is required to improve robustness, efficiency, and flexibility in a variety of practical scenarios. The suggested work presents a novel approach to Body Gesture Recognition, hence improving Human-Computer Interaction. It uses a Probabilistic Neural Network (PNN) and the Crow Search Algorithm (CSA) to optimize the CNN. The goal of this innovative approach is to solve the problems with the methods that are currently in use. By optimizing the PNN’s parameters, CSA raises the efficiency and accuracy of classification. This adaptive method promises better performance in dynamic surroundings and while handling complex motions. The study aims to improve the robustness and real-time applicability of gesture detection by introducing CSA-PNN, which will be an important addition to the field of HCI and have wider applications in dynamic digital media, gaming, and accessibility. The following are the key contributions of this study:
• Wavelet Transform pre-processing improves the quality of input data, providing a more reliable and accurate basis for the gesture detection procedure.
• By precisely segmenting the input data using the Marker-Based Watershed Algorithm, it is possible to isolate and distinguish individual body motions in an efficient manner, which enhances the accuracy of recognition.
• Convolutional Neural Networks (CNNs) are a strong tool for feature extraction that may be used to collect and represent the prominent elements of body motions, leading to more precise and in-depth detection.
• The novel addition of the Crow Search Algorithm increases the performance of the model as a whole, improving its effectiveness and versatility while lowering computing complexity.
• Using an Optimized Probabilistic Neural Network for gesture classification solves the drawbacks of previous classification techniques and improves system responsiveness by ensuring precise and effective identification.
The paper is structured as follows. Section 2 investigates comparable works in the field, Section 3 outlines the problem statement, and Section 4 describes the approach of the proposed model, encompassing its numerous components. Section 5 summarizes the findings and presents a discussion, while Section 6 concludes the work by summarizing the findings and considering future ramifications.
The investigation of depth cameras is prompted by RGB cameras’ difficulties with variable lighting. León et al. proposed a work that used depth cameras and lightweight convolutional neural networks (CNN) to create a video-based hand gesture identification system.19 A curated dataset was employed to effectively identify and categorize hand movements. The assessment of categorization accuracy with a finite number of frames per gesture in videos was emphasized. The performance of RGB cameras was compared with depth cameras. Based on accuracy and inference time, the model’s performance on edge computing devices was assessed and compared to other models. The present research provides a thorough examination of video-based hand gesture detection and makes the case for lightweight CNN models and depth camera systems to enhance their practicality. The study’s limitation is its restricted investigation of the suggested video-based hand gesture detection system’s real-world application and generalizability. While the use of depth cameras addresses the lighting issue associated with RGB cameras, the study does not thoroughly analyse the possible challenges in various and complicated real-world situations. The evaluation on edge computing devices and comparison to benchmark models is valuable; nevertheless, it lacks a complete investigation of the system’s performance in numerous practical settings, such as crowded backgrounds, multiple users, or complex lighting conditions.
The increasing number of individuals who are deaf or hard of hearing, along with the shifting landscape of vision-based applications and touchless control in ubiquitous gadgets, has led to an increased importance of automatic hand gesture detection in recent years. A dependable system that considers both temporal and spatial aspects is necessary for hand gesture identification, which is crucial to sign language interpretation. Determining distinctive spatiotemporal features for hand motion sequences remains a challenging task. Al-Hammadi, Muhammad, Abdul, Alsulaiman, Bencherif, and Mekhtiche provide an effective technique for hand gesture detection using deep convolutional neural networks, with a focus on utilizing transfer learning to address the limited availability of large labelled hand gesture datasets.20 The complexity and resource-intensiveness of the suggested approach, in particular the utilization of three instances of 3DCNN for feature extraction from various video segments, is a downside of this study. The study also notes that hyperparameter tuning is necessary, implying that the configuration of the suggested model might not be optimized, which could make it more difficult to implement in practical settings.
The gated recurrent unit (GRU) neural network layer finds long-term dependencies in hand gesture temporal sequences, whereas the attention layers identify short-term patterns. Khodabandelou et al. extensively evaluate the suggested model’s effectiveness and compare it to cutting-edge techniques in the area based on several factors.21 The challenge of detecting human hand gestures using deep learning techniques is discussed in this paper for practitioners. By utilizing natural correlations and extracting the most important elements of historical motion sequences, such as their temporal, complex, and nonlinear characteristics, the model can anticipate hand gestures using wearable capacitance sensors. The study looks at how different lengths of historical motion sequences affect prediction accuracy, providing a more efficient alternative to time-consuming data collection, expensive data processing, and high computational needs. The model demonstrates its competitiveness and capacity to replicate big activity trends in key channels by exhibiting good performance on real-world data and comparisons with known classifiers. However, several drawbacks could make the study less useful in real-world scenarios. For example, the suggested model might be sensitive to the historical motion sequence length used, necessitating fine-tuning for best results. Concerns about the comfort and user acceptability of wearable capacitance sensors may affect the model’s adoption in the real world. The results’ generalizability might also be restricted: because the model’s performance is mostly evaluated using data from real-world applications, its applicability in various uncontrolled contexts remains unknown.
Hand gesture recognition is emerging as a viable solution in the world of digital entertainment, due to developments in sensors and machine learning. Nonetheless, the complicated structure of hand gesture identification poses difficulty for many existing models, owing to factors such as backdrop clutter, motion blur, fluctuations in illumination, and occlusions. In this study, Madni, Vijaya et al. introduced a dynamic method for identifying hand gestures to improve the overall performance of this activity.22 Initially, a normalization technique is used to improve the visibility of gesture images supplied from the Indian sign language dataset. Following that, a semi-vectorial multilevel segmentation algorithm is used to precisely identify the gesture regions within the normalized images. Feature selection and classification are then performed using an updated relief algorithm and K-nearest neighbour classifiers. It is important to recognize some limitations, even though the enhanced relief-KNN model shows notable improvements in hand gesture recognition. Firstly, most evaluations of the model’s performance are conducted in the controlled setting of the Indian sign language database, which might not adequately capture the nuances of real-world scenarios with different lighting and backgrounds. When handling dynamic or unpredictable hand movements, the segmentation and preprocessing stages of the model may not be as reliable, which affects its efficacy. Although efficient, reliance on the KNN classifier may not be appropriate for handling bigger or more varied gesture datasets, and its computational requirements may cause difficulties in real-time applications.
Facial expression-based emotion identification is a key area in the field of human-computer interaction. Numerous problems, such as pose fluctuations, uneven illumination, and facial accessories, are encountered in this field. Due to the requirement for simultaneous feature extraction and classifier optimization, traditional techniques for emotion detection have constraints. An increasing amount of attention has been paid to using deep learning techniques to address this. Deep learning techniques are now the most popular for handling classification jobs. Using transfer learning techniques, the inquiry in this paper focuses on emotion recognition. Pre-trained networks such as ResNet50, VGG19, Inception V3, and MobileNet are used in this approach by Chowdary, Nguyen, and Hemanth.23 The fully connected layers of these pre-trained ConvNets are removed and replaced with custom fully connected layers suited to the demands of the particular emotion recognition task at hand. Notwithstanding the encouraging outcomes, it is important to recognize the limitations of this study. A single database, in this case CK+, is predominantly used for the evaluation of the proposed facial expression detection system, which may not accurately reflect the variety of real-world circumstances, emotions, or demographic variances. Examining the model’s performance on a wider variety of datasets with more variability would allow for more in-depth analysis. Although the accuracy of pre-trained convolutional neural networks is outstanding, little research has been done on the computational resources needed, which might be a drawback for real-time or resource-constrained applications.
The fields of hand gesture recognition and facial emotion detection have benefited greatly from recent research in human-computer interaction. To overcome the shortcomings of RGB cameras, hand gesture recognition research focuses on depth cameras and lightweight convolutional neural networks, but it also calls for resource-intensive models. In the meantime, facial expression identification using deep learning algorithms is now state-of-the-art. Nevertheless, the evaluation of this research frequently depends on a single dataset, which may limit its practical relevance. Real-time or resource-constrained applications face difficulties due to the computing demands of deep learning models. Although human-computer interaction is being advanced by these studies, they also highlight the necessity for flexible solutions that can address a variety of real-world issues in these rapidly developing sectors.
The lack of attention paid to adaptability to different and complex situations and real-world applicability is the common fault throughout the above literature studies. The video-based hand gesture detection system with depth cameras provides an answer to the problems associated with RGB cameras’ lighting conditions, but it does not fully handle the problems that could occur in real-world situations, such as crowded backgrounds, multiple users, and changing lighting conditions. Motion sequence-based hand gesture recognition exhibits strong performance in both signer-independent and signer-dependent modes; however, it faces challenges related to the demands on computational resources, sequence length sensitivity, and practical optimization, which may hinder its applicability for real-time applications and edge devices. Few studies have explored the computational resource requirements for broader real-time or resource-constrained applications, and the facial expression recognition models that have been primarily evaluated on a single dataset may not adequately account for the complexities of real-world scenarios, emotions, and demographic variations.23 Altogether, these constraints highlight the need for more all-encompassing and flexible solutions to deal with the complex issues these developing fields present.
The methodology of the proposed model is organized as follows. To improve the quality of the incoming data, Wavelet Transform-based pre-processing is applied first. Then, the Marker-Based Watershed Algorithm is used to segment the data so that different body movements may be recognized. Convolutional Neural Networks (CNNs) are used for feature extraction. This method is novel in that it incorporates the Crow Search Algorithm, which improves the model’s performance to a great extent. An Optimized Probabilistic Neural Network is used to accurately classify body motions in the context of human-computer interaction. This comprehensive approach blends several tried-and-true methods with state-of-the-art algorithms to produce a strong framework that enhances body gesture identification and advances the field of human-computer interaction. Figure 1 describes the block diagram of the proposed method.
A significant issue persists in the area of data collection for synthesizing realistic and human-like conversational gestures, due to the scarcity of datasets, models, and consistent evaluation standards. The Body-Expression-Audio-Text dataset, or BEAT, was created as a solution to this challenge, and it is used in this study to recognize body gestures. BEAT contains 76 hours of high-quality multi-modal data collected from 30 speakers engaged in conversations covering eight different emotions and four different languages. The dataset also includes a thorough annotation of 32 million frame-level emotion and semantic-relevance labels. An in-depth analytical examination of BEAT finds relationships between communicative gestures and numerous aspects such as facial expressions, emotions, semantics, audio, text, and speaker identity.24
The accuracy of data preprocessing, the initial stage after data collection, is critical to the success of gesture recognition systems. This preprocessing procedure is critical to improving the dataset’s quality and usefulness. Wavelet denoising is a potent method for cutting noise from pictures and videos without sacrificing key details. Using a wavelet transform, thresholding the wavelet coefficients to eliminate noise, and finally rebuilding the denoised image are the steps in the process.
A wavelet transform, usually the discrete wavelet transform (DWT) or the continuous wavelet transform (CWT), is used to convert the image into the wavelet domain. The DWT is frequently employed in real-world applications.
The forward transform is given in Eqn. (1):

$$W(x,y) = \sum_{t} f(t)\,\psi_{x,y}(t) \qquad (1)$$

In Eqn. (1), the wavelet coefficient at scale $x$ and location $y$ is represented by $W(x,y)$. The initial image or signal is represented by $f(t)$. The wavelet function at scale $x$ and location $y$ is represented by $\psi_{x,y}(t)$.
To eliminate noise, the coefficients are thresholded in the wavelet domain. Either hard or soft thresholding can be used; soft thresholding is given in Eqn. (2):

$$\hat{W}(x,y) = \operatorname{sign}\!\left(W(x,y)\right)\,\max\!\left(\lvert W(x,y)\rvert - \lambda,\; 0\right) \qquad (2)$$

where $\lambda$ is the threshold value.
The denoised image or video is obtained by transforming the thresholded wavelet coefficients back to the spatial domain, as given in Eqn. (3).25

$$\hat{f}(t) = \sum_{x}\sum_{y} \hat{W}(x,y)\,\psi_{x,y}(t) \qquad (3)$$
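The denoising pipeline of Eqns (1)–(3) maps directly onto a few library calls. Below is a minimal sketch using the PyWavelets package; the wavelet family, decomposition level, and threshold value are illustrative choices, not parameters reported in the paper.

```python
# Wavelet denoising sketch: forward DWT, soft thresholding, inverse DWT.
import numpy as np
import pywt

def wavelet_denoise(image: np.ndarray, wavelet: str = "db4",
                    level: int = 2, threshold: float = 10.0) -> np.ndarray:
    # Eqn (1): forward 2-D discrete wavelet transform.
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Eqn (2): soft-threshold the detail coefficients to suppress noise;
    # the approximation coefficients (coeffs[0]) are left untouched.
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(c, threshold, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    # Eqn (3): inverse transform back to the spatial domain.
    return pywt.waverec2(denoised, wavelet)
```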
Marker-based watershed segmentation is an important image processing approach that is utilized in a variety of applications, including body gesture recognition. This technique is frequently applied after wavelet pre-processing to segment and identify regions of interest in an image. The concept is predicated on viewing an image as a topographic surface, with elevations represented by grayscale values. The algorithm uses “markers”, or seeds, placed at the desired locations of interest to flood this surface, dividing it into regions, or catchment basins. To recognize body gestures, a pre-processed image can be divided into regions representing various body parts, including the head, arms, legs, and torso, using the marker-based watershed technique. Later analysis, such as tracking body motions or identifying certain gestures, can then be performed on these segmented regions.
To identify suitable markers for the watershed technique, the gradient image of the pre-processed image is generated, as shown in Eqn. (4):

$$G(x,y) = \lvert \nabla f(x,y) \rvert \qquad (4)$$
In Eqn. (4), the gradient at pixel $(x,y)$ is represented by $G(x,y)$, and the pre-processed image by $f(x,y)$. Potential markers are represented as local minima in the gradient image; these minima are the starting points for the flooding process. The regional minima can be defined as in Eqn. (5):

$$M = \left\{ (x,y) : G(x,y) \leq G(x',y') \ \ \forall\, (x',y') \in \mathcal{N}(x,y) \right\} \qquad (5)$$

where $\mathcal{N}(x,y)$ denotes the neighbourhood of pixel $(x,y)$.
The gradient image is transformed into catchment basins using the watershed transformation, which can be expressed as in Eqn. (6):

$$S(x,y) = \operatorname{Watershed}\!\left(G(x,y),\, M(x,y)\right) \qquad (6)$$
In Eqn. (6), $S(x,y)$ is the segmented image, $G(x,y)$ is the gradient image, and $M(x,y)$ is the marker image.26 For segmenting body gestures in pre-processed images, the marker-based watershed algorithm can be an effective tool. It uses gradient information and markers to segment an image into discrete sections that can then be used for gesture detection or additional analysis.
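The segmentation stage can be sketched with OpenCV as follows. The marker-selection recipe here (Otsu thresholding plus distance-transform peaks) is one common way to realize Eqns (4)–(6) and is an assumption, not the paper’s exact procedure.

```python
# Marker-based watershed sketch on a denoised grayscale frame.
import cv2
import numpy as np

def watershed_segment(gray: np.ndarray) -> np.ndarray:
    kernel = np.ones((3, 3), np.uint8)
    # Eqn (4): morphological gradient of the pre-processed image.
    grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    # Otsu threshold gives a rough foreground mask.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Sure background (dilated mask) and sure foreground (distance peaks).
    sure_bg = cv2.dilate(binary, kernel, iterations=3)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)
    # Eqn (5): connected components of the sure foreground act as markers.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1          # background label becomes 1
    markers[unknown == 255] = 0    # unknown region is left for flooding
    # Eqn (6): flood the gradient surface from the markers.
    color = cv2.cvtColor(grad, cv2.COLOR_GRAY2BGR)
    return cv2.watershed(color, markers)
```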
Convolutional Neural Networks (CNNs) are used to autonomously identify and extract significant properties from the segmented regions after the segmentation stage. CNNs are very good at recognizing spatial and hierarchical information inside the segmented areas, which makes them ideal for body gesture recognition. The CNN layers, comprising the pooling and convolutional layers, facilitate the network’s ability to acquire and portray discriminative characteristics from every divided area. CNNs identify and extract local patterns and features from the input images by using convolutional layers. A collection of learnable filters, commonly referred to as kernels, are applied to the input image during the convolution process. To create feature maps, each filter computes the dot product after swiping a window across the image.
The convolution operation is given in Eqn. (7):

$$F(i,j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\; k(m,n) \qquad (7)$$

In Eqn. (7), the value at point $(i,j)$ of the feature map is $F(i,j)$. The input image is denoted by $I$, the filter by $k$, and the filter’s indices by $m$ and $n$.
An activation function is applied after the convolution operation to introduce non-linearity. A popular option is the Rectified Linear Unit (ReLU), given in Eqn. (8):

$$f(x) = \max(0,\, x) \qquad (8)$$

In Eqn. (8), the output of the activation function is $f(x)$ and the input value is $x$.
The feature maps are down-sampled by pooling layers, which lowers their spatial dimensions. A common technique is max pooling, in which the maximum value is retained for each region, as given in Eqn. (9):

$$P(i,j) = \max_{(m,n)\,\in\, R_{i,j}} F(m,n) \qquad (9)$$

In Eqn. (9), $P(i,j)$ is the value in the pooled feature map, $F(m,n)$ is a value in the initial feature map, and $R_{i,j}$ is the pooling region.
The fully connected layers receive the flattened feature maps as their input, as shown in Eqn. (10):

$$X = \operatorname{Flatten}(P) \qquad (10)$$
The flattened feature vectors are linked to one or more dense, fully connected layers. Based on the learnt attributes, these layers carry out classification and decision-making. The final fully connected layer usually has one neuron per class or gesture in the dataset. Class probabilities are obtained using a SoftMax function, given in Eqn. (11):

$$P(x \mid X) = \frac{e^{W_x^{T} X + b_x}}{\sum_{c=1}^{C} e^{W_c^{T} X + b_c}} \qquad (11)$$
In Eqn. (11), the probability of class $x$ is given by $P(x \mid X)$, the feature vector is denoted by $X$, the weights and biases for class $x$ by $W_x$ and $b_x$, and the total number of classes by $C$.27
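A compact Keras model illustrating the stack described by Eqns (7)–(11) is sketched below; the layer sizes, input resolution, and the NUM_CLASSES constant are illustrative placeholders, not the paper’s reported architecture.

```python
# Minimal CNN feature extractor/classifier sketch (TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8  # hypothetical number of gesture categories

def build_cnn(input_shape=(64, 64, 1)) -> tf.keras.Model:
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),   # Eqns (7)-(8)
        layers.MaxPooling2D(2),                    # Eqn (9)
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),                          # Eqn (10)
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),  # Eqn (11)
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```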
The Crow Search Algorithm (CSA) is employed in the proposed model to enhance its performance.28 Inspired by the extraordinary cognitive abilities of crows, which are considered to be among the brightest bird species due to their comparatively large brains relative to their body size, the CSA is a novel meta-heuristic algorithm that has gained recognition for its intelligence. Crows are incredibly intelligent animals whose behaviour has been well studied and documented: they show self-awareness in mirror tests, are skilled tool makers, recognize faces, and have an amazing memory, storing food for months at a time. CSA uses the idea that crows are perceptive observers and learners of their environment to improve the suggested model for body gesture detection. Crows use their prior experiences to anticipate and avert prospective problems, much as they keep an eye on other birds to locate hiding spots and take advantage of food opportunities. CSA models a group of crows who collaborate to optimize tactics for the recognition model, remember important sites collectively, and protect against unforeseen obstacles. The similarities between crow behaviour and CSA enable an effective and flexible method for improving the body gesture recognition model. The algorithm of crow search optimization is depicted in Figure 2.
Suppose that, in this iteration, crow j wishes to follow crow k to find out where crow k is hiding its food. In this situation, the following conditions could manifest:
Condition 1: In this scenario, crow k is unaware that crow j is following it, so crow j approaches crow k’s hiding place. This procedure decides crow j’s new position, given in Eqn. (12):

$$x^{j,\,iter+1} = x^{j,\,iter} + r_j \times fl^{j,\,iter} \times \left(m^{k,\,iter} - x^{j,\,iter}\right) \qquad (12)$$
In Eqn. (12), the variable $r_j$ is a random number drawn from a uniform distribution over the interval [0, 1], and $fl^{j,\,iter}$ denotes the flight length of crow $j$ at iteration $iter$.
Condition 2: Crow k knows that crow j is following it. In the first case, crow k may effectively share its knowledge by leading crow j to its cache; in the second, crow k may decide to trick crow j, protecting its cache from potential theft by moving to a random position in the search space. These two states can be summed up in Eqn. (13):

$$x^{j,\,iter+1} = \begin{cases} x^{j,\,iter} + r_j \times fl^{j,\,iter} \times \left(m^{k,\,iter} - x^{j,\,iter}\right), & r_k \geq AP^{k,\,iter} \\ \text{a random position in the search space}, & \text{otherwise} \end{cases} \qquad (13)$$

where $r_k$ is a uniform random number in [0, 1] and $AP^{k,\,iter}$ is the awareness probability of crow $k$ at iteration $iter$.
The following briefly describes the CSA implementation:
Step 1. Identify the adjustable CSA parameters: flock size ($N$), maximum number of iterations ($iter_{max}$), flight length ($fl$), and awareness probability ($AP$).
Step 2. In a $d$-dimensional search space, distribute $N$ crows at random; each crow represents a feasible solution described by a set of decision variables, as in Eqn. (14):

$$x^{j,\,iter} = \left[x_1^{j,\,iter},\; x_2^{j,\,iter},\; \ldots,\; x_d^{j,\,iter}\right], \quad j = 1, \ldots, N \qquad (14)$$
Each crow’s memory is then initialized. Since the crows are inexperienced at the first iteration, they are assumed to have hidden their food at their initial positions, as given in Eqn. (15):

$$m^{j,\,1} = x^{j,\,1}, \quad j = 1, \ldots, N \qquad (15)$$
Step 3. Based on the values of its decision variables, the quality of each crow’s position is determined by evaluating its fitness with the objective function.
Step 4. Crows in the search space generate new positions: each crow randomly chooses another crow (say, crow k) and attempts to follow it to the location of its memory ($m^{k}$). The new position of every crow is determined using Eqn. (13).
Step 5. Check the feasibility of each crow’s new position. A crow updates its position only if the new position is feasible; otherwise, it stays where it is.
Step 6. Determine the fitness value of every new crow position.
Step 7. Each crow’s memory is updated using Eqn. (16):

$$m^{j,\,iter+1} = \begin{cases} x^{j,\,iter+1}, & \text{if } f\!\left(x^{j,\,iter+1}\right) \text{ is better than } f\!\left(m^{j,\,iter}\right) \\ m^{j,\,iter}, & \text{otherwise} \end{cases} \qquad (16)$$
Step 8. Repeat steps 4–7 until $iter_{max}$ is reached. Once the termination criterion is satisfied, the best memory position with respect to the objective function value is reported as the solution of the optimization problem.29
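The eight steps above translate into a short optimization loop. The following sketch implements Eqns (12)–(16) for a generic fitness function to be minimized; the flock size, iteration budget, flight length, and awareness probability are illustrative defaults, not the paper’s tuned values.

```python
# Crow Search Algorithm sketch (Eqns (12)-(16)).
import numpy as np

def crow_search(fitness, dim, bounds, n_crows=20,
                iter_max=100, fl=2.0, ap=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step 2: random initial positions; memory starts at the positions
    # (Eqns (14)-(15)).
    x = rng.uniform(lo, hi, size=(n_crows, dim))
    mem = x.copy()
    mem_fit = np.array([fitness(m) for m in mem])
    for _ in range(iter_max):
        for j in range(n_crows):
            k = rng.integers(n_crows)        # Step 4: follow a random crow k
            if rng.random() >= ap:
                # Eqn (12): crow k is unaware; move toward its memory.
                new = x[j] + rng.random() * fl * (mem[k] - x[j])
            else:
                # Eqn (13): crow k is aware; crow j moves randomly.
                new = rng.uniform(lo, hi, size=dim)
            if np.all((new >= lo) & (new <= hi)):  # Step 5: feasibility check
                x[j] = new
                f = fitness(x[j])                   # Step 6
                if f < mem_fit[j]:                  # Eqn (16): memory update
                    mem[j], mem_fit[j] = x[j].copy(), f
    return mem[np.argmin(mem_fit)], mem_fit.min()
```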
The input layer, pattern layer, summation layer, and output layer make up the four layers of the hierarchical architecture of the Probabilistic Neural Network (PNN), a potent classifier. PNN’s supervised learning method, which takes its cues from Bayesian networks, makes it very good at pattern recognition tasks. Though PNN already performs admirably in many applications, optimization strategies can further increase its accuracy and efficiency. One improvement is the fine-tuning of the PNN model through the application of the CSA. The crow’s food-caching behaviour served as the inspiration for CSA, a nature-inspired optimization algorithm that is well known for its ability to quickly find good solutions in challenging problem spaces. By applying the CSA to PNN, it is possible to enhance the network’s parameters and hyperparameters, resulting in a classifier that is more reliable and accurate. This optimization procedure aids in determining the PNN’s ideal design, improving its capacity to handle challenging classification tasks. Combining PNN with the Crow Search Algorithm produces a probabilistic neural network that is optimized for classification. Along with maintaining the intrinsic strengths of PNN, namely its capacity to deal well with nonlinear issues, this improved model also gains from the CSA’s optimization powers. Figure 3 describes the architecture diagram of PNN.
The normalized characteristic vector $B$ of gesture category $z$ for the test samples is accepted by the input layer. Eqn. (17) defines the connection weighting $W$ as the transpose of $B$:

$$W = B^{T} \qquad (17)$$

This weighting is multiplied by the input vector, and the resultant value $I$ is transferred to the pattern layer, as indicated by Eqn. (18):

$$I = W \times B \qquad (18)$$
The number of neurons in the input layer matches the dimensionality of the characteristic vector, and the number of neurons in the pattern layer matches the total number of training samples. During training, 16 calculated characteristics are used as target parameters in the PNN model, and 100 neurons are set aside in the pattern layer for every gesture category. For each element in the training and test samples, these neurons compute a Gaussian function of the Euclidean distance. Eqn. (19) specifies the output for pattern nodes of category $z$, represented as $P_{z,s}$:

$$P_{z,s} = \exp\!\left(-\frac{\lVert B - c_{z,s} \rVert^{2}}{2\sigma^{2}}\right) \qquad (19)$$

where $\sigma$ is the smoothing factor.
Within gesture category $z$, where $s$ varies from 1 to 100, $c_{z,s}$ denotes the centre of training sample $s$. The distance between two vectors is calculated using the function indicated by the notation $\lVert\cdot\rVert$. Within the summation layer, the number of summation nodes matches the total number of gesture categories. Each neuron in this layer sums the outputs of the pattern nodes associated with the same category and divides the result by the number of corresponding neurons in the pattern layer, as expressed in Eqn. (20):

$$f_{z} = \frac{1}{100} \sum_{s=1}^{100} P_{z,s} \qquad (20)$$
In the output layer, the estimated density function of neuron $z$ is represented by $f_{z}$, and the output result $O$ is given in Eqn. (21):

$$O = \arg\max_{z} f_{z} \qquad (21)$$
When the neuron in the summation layer yields the highest predicted probability value, the similarity or equivalency between the expected and actual outcomes in the PNN model is assessed by the cross-entropy cost function $C$, which serves as the fitness function and is defined in Eqn. (22):

$$C = -\sum_{z} y_{z} \log f_{z} \qquad (22)$$

where $y_{z}$ is the actual (target) label for category $z$.
With the smoothing factor $\sigma$ adjusted, a decreasing cross-entropy cost function $C$ indicates a more accurate prediction. The objective of these experiments is to improve classification precision by using the Crow Search Algorithm as the optimizer to find the ideal smoothing factor $\sigma$.
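A minimal NumPy sketch of the optimized PNN is given below: the pattern layer applies the Gaussian kernel of Eqn (19), the summation layer averages per category as in Eqn (20), and the output layer takes the arg-max of Eqn (21). The cost routine, which the crow_search sketch above could minimize over the smoothing factor, uses validation error as a simple stand-in for the cross-entropy of Eqn (22); all function names here are illustrative.

```python
# PNN classification sketch with a CSA-tunable smoothing factor sigma.
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma):
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        # Pattern layer (Eqn (19)): Gaussian of the Euclidean distance.
        d2 = np.sum((X_train - x) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer (Eqn (20)): mean kernel response per category.
        density = [k[y_train == c].mean() for c in classes]
        # Output layer (Eqn (21)): category with the highest density.
        preds.append(classes[int(np.argmax(density))])
    return np.array(preds)

def pnn_cost(sigma_vec, X_tr, y_tr, X_val, y_val):
    # Validation error as the CSA fitness (stand-in for Eqn (22)).
    pred = pnn_predict(X_tr, y_tr, X_val, sigma_vec[0])
    return float(np.mean(pred != y_val))

# Hypothetical usage with the crow_search sketch above:
# best_sigma, _ = crow_search(
#     lambda s: pnn_cost(s, X_tr, y_tr, X_val, y_val),
#     dim=1, bounds=(0.01, 5.0))
```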
An extensive analysis of the suggested model and its parts is provided in the study’s results section. Python is the implementation tool for this study. It has been shown that using the Wavelet Transform for pre-processing can improve gesture identification by lowering noise and improving data quality. Accurate feature extraction has been made possible by the segmentation process brought about by the Marker-Based Watershed Algorithm. As a result, the model’s capacity to recognize complex movements is improved. CNN-Based Feature Extraction has demonstrated its ability to identify subtle patterns and nuances in body gestures. The model’s accuracy and performance in gesture identification have improved with the addition of the Crow Search Algorithm. The model’s robustness and dependability are demonstrated by the high accuracy and decreased classification errors obtained from using the Optimized Probabilistic Neural Network for classification. Together, these findings demonstrate the effectiveness and promise of the suggested strategy in improving human-computer interaction by accurately and effectively recognizing body gestures.
Precision, sensitivity, F1-score, and accuracy are among the evaluation metrics used to analyse the performance of the body gesture recognition model. These parameters are presented below and serve as measures for model evaluation:
Accuracy: It calculates the percentage of correct predictions, including both true positives (TP) and true negatives (TN), over all instances examined, as given in Eqn. (23):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (23)$$
Precision: The ratio of correctly predicted positive instances to the total predicted positive instances is denoted as precision. Precision is calculated using Eqn. (24):

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (24)$$
Sensitivity: Eqn. (25) shows how to calculate recall, where $FP$ denotes false positives, $FN$ false negatives, $TP$ true positives, and $TN$ true negatives:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (25)$$
F1 score: High recall and high precision are desirable in the context of body gesture identification, but they frequently come at a cost. The F1-score, the harmonic mean of recall and precision, takes both factors into consideration, as demonstrated by Eqn. (26):

$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (26)$$
Specificity: It measures the proportion of true negatives correctly identified, as depicted in Eqn. (27):

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (27)$$
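For reference, the five metrics of Eqns (23)–(27) can be computed from the four confusion-matrix counts in a few lines; the function below is a straightforward transcription, and the example counts are illustrative placeholders, not results from the paper.

```python
# Evaluation metrics computed from confusion-matrix counts.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)                                   # Eqn (24)
    sensitivity = tp / (tp + fn)                                 # Eqn (25)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),             # Eqn (23)
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),  # Eqn (26)
        "specificity": tn / (tn + fp),                           # Eqn (27)
    }

# Illustrative counts only:
print(classification_metrics(tp=55, tn=930, fp=10, fn=5))
```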
A statistical assessment technique called accuracy analysis is employed to evaluate a model’s overall prediction accuracy in a binary or multiclass classification problem. To determine the overall performance of the model, it computes the ratio of accurate predictions (including true positives and true negatives) to the total number of examples analysed.
An evaluation of the proposed body gesture recognition method (PNN) against the existing models, DT, SVM, and INN, is shown in Figure 4. The models’ various learning percentages are shown on the x-axis, and the accuracy values that correspond to them are shown on the y-axis. The suggested model’s improved accuracy in body gesture recognition over the alternatives is illustrated by the graph, which offers a clear visual depiction of how it performs better than or matches existing models across various learning percentages.
A statistical metric called sensitivity, which is often referred to as true positive rate or recall, counts the percentage of real positive cases that a model or test accurately recognized. In binary classification and medical diagnostics, it is an essential indicator that shows how well the model can identify and categorize real positive cases while reducing false negatives. Calculating sensitivity involves dividing the total number of true positives by the total number of false negatives.
The proposed body gesture recognition model (PNN)’s sensitivity and recall are compared to those of other models, such as DT, SVM and INN, in the Figure 5 graph. The models’ different learning percentages are displayed on the x-axis, while the appropriate sensitivity values are displayed on the y-axis. This graph shows how well the suggested model and the current models identify and accurately categorize genuine positive events when it comes to body gesture recognition over various learning percentages. It highlights the model’s sensitivity performance by showcasing its capacity to reduce false negatives and successfully identify real positive cases.
The proposed body gesture recognition model (PNN) is evaluated in comparison to other models, including DT, SVM and INN, in Figure 6. Specificity is a measure of the true negative rate. The y-axis shows the corresponding specificity values, while the x-axis shows the various learning percentages applied to these models. This graph presents a visual representation of each model’s performance over different learning percentages in terms of how well it detects non-target situations or true negatives. It offers important information on the specificity of the models for body gesture recognition by demonstrating their capacity to reduce false positives and correctly identify cases that do not belong to the target class.
A statistical measure known as precision assesses how well a model predicts the positive outcomes of a binary or multiclass classification task. It provides information on the model’s capacity to reduce false positive mistakes and generate accurate positive predictions. It is computed as the ratio of true positive cases to the total instances projected as positive.
The suggested body gesture recognition model (PNN) is compared to other models, such as DT, SVM, and INN, in Figure 7. Precision is a metric that quantifies the accuracy of positive predictions. The precision values are displayed on the y-axis, while the x-axis shows the different learning percentages that were applied to these models. This graphic illustrates how well each model performs in terms of reducing false positive mistakes and producing precise positive predictions, especially when it comes to body gesture recognition at various learning percentages. The graph highlights the models’ success in minimizing false positive predictions while maximizing correct ones, giving insights into their capacity to generate accurate positive outcomes.
A statistical indicator used to assess a model’s performance, especially in binary classification problems, is the FPR, often called the False Alarm Rate. The percentage of negative cases (true negatives) that the model mistakenly classifies as positive (false positives) is what it measures. FPR is a useful supplementary measure to True Negative Rate (TNR), which measures the model’s accuracy in classifying negative instances. It offers information on the model’s ability to differentiate between true negative cases and false positive predictions. A crucial indicator for evaluating a model’s specificity and capacity to reduce false positive or false alarm errors is false positive rate (FPR).
Figure 8 presents a comparative study of the False Positive Rate (FPR) for several models. The models that have been trained at different learning percentages include the proposed body gesture recognition model (PNN), DT, INN, and SVM. The x-axis shows these learning percentages, while the y-axis shows the corresponding FPR values. This graph provides a visual depiction of each model’s performance in terms of FPR when it comes to body gesture recognition, showing how well it can differentiate between real negative instances and false positive predictions. It offers important insights into the models’ capacity to reduce false positive errors while preserving overall body gesture recognition accuracy.
A statistical parameter called the False Negative Rate (FNR), sometimes referred to as the Miss Rate, is used to assess a model’s performance, especially in binary classification issues. It calculates the percentage of positive cases (true positives) that the model mistakenly labels as negative (false negatives). FNR measures the rate at which positive examples are overlooked or mistakenly classified as negative, and it offers insights on the model’s capacity to identify and accurately categorize genuine positive cases. FNR is frequently used to evaluate a model’s sensitivity and capacity to reduce false negative rates.
Figure 9 compares the False Negative Rate (FNR) of various models, including the DT, Support Vector Machine (SVM), INN, and the proposed body gesture recognition model (PNN). On the x-axis, these models are assessed across various learning percentages, while the corresponding FNR values are shown on the y-axis. This graph shows how well each model finds and classifies true positive instances while quantifying the rate at which positive examples are ignored or wrongly labelled as negative. It provides useful insights into the models’ sensitivity and performance in reducing the rate of false negatives in the context of body gesture recognition while preserving overall accuracy.
A statistical measure called Negative Predictive Value (NPV) is used to evaluate a model’s accuracy, especially in binary classification issues. It measures the ratio of accurately predicted negative instances, or true negative instances, to all instances that were anticipated to be negative. NPV gives information on how well the model can remove cases that do not fall into the positive class and minimize false negative errors by accurately identifying true negatives.
Figure 10 compares the Negative Predictive Value (NPV) of an INN, DT, Support Vector Machine (SVM), and the suggested body gesture recognition model (PNN). The NPV values are displayed on the y-axis, while the x-axis shows the different learning percentages applied to these models. Specifically in the context of body gesture recognition at various learning percentages, this graph shows how well each model detects real negative instances and minimizes false negative errors. It sheds light on the models’ overall accuracy in negative predictions as well as their dependability in rejecting examples that do not fall into the positive class.
In the context of multiple hypothesis testing, the FDR is a statistical indicator that is employed to evaluate the precision of positive predictions. The ratio of false positive cases to all cases anticipated as positive is its definition. Put otherwise, the false discovery rate (FDR) quantifies the frequency at which a model’s positive predictions are shown to be falsified or inaccurate. It finds frequent application in fields like scientific research or genetics where limiting the rate of false discoveries is essential.
The False Discovery Rate (FDR) for various models, including the DT, Support Vector Machine (SVM), INN and the proposed body gesture recognition model (PNN), is compared in Figure 11. This graph shows how well each model manages and minimizes the rate of false discoveries over different learning percentages. It provides insight into the models’ capacity to detect positive instances and restrict the frequency of false positive mistakes while preserving overall accuracy.
A statistical indicator called the F1 Score is used to evaluate a model’s classification accuracy, especially in binary or multiclass problems. It is a balanced assessment of a model’s performance that takes into account both precision and recall. When there is an unequal distribution of classes, the F1 Score is especially helpful since it strikes a balance between reducing false positives and false negatives. A higher F1 Score indicates better overall model performance, with 1 being the ideal score.
Figure 12 presents a comparative evaluation of the F1 Score for several models, such as the suggested body gesture recognition model (PNN), DT, Support Vector Machine (SVM), and INN. The y-axis shows the corresponding F1 scores, while the x-axis shows different learning percentages applied to these models. This graph provides a unified assessment of each model’s overall performance in identifying body motions across various learning percentages by providing a visual depiction of how well it balances recall and precision.
An analytical tool for evaluating the quality of binary classification models is the Matthews Correlation Coefficient (MCC), which is especially useful when class distributions are unbalanced. In order to quantify the degree of correlation between the true class labels and the model’s predictions, true positives, true negatives, false positives, and false negatives are examined. MCC provides a fair assessment of a model’s performance by determining the relationship’s strength and direction. The scale goes from −1 (perfect inverse prediction) through 0 (no association) to +1 (perfect prediction). MCC is frequently used in many domains, such as machine learning, epidemiology, and bioinformatics, to assess the performance of classification models. It is especially helpful when the dataset includes unequal class proportions.
A comparison of the Matthews Correlation Coefficient (MCC) for several models, including the suggested body gesture recognition model (PNN), DT, SVM, and INN, is provided in Figure 13. Taking into account true positives, true negatives, false positives, and false negatives, this graph shows how well each model captures the relationship between its predictions and the actual class labels. MCC is an important indicator since it provides a fair assessment of the model’s performance, especially when it comes to body gesture recognition at varying learning percentages. It provides a trustworthy indicator of the models’ classification quality and enables a thorough understanding of how well they can manage unbalanced class distributions.
Table 1 and Figure 14 show a comparison of four distinct machine learning techniques: DT, Support Vector Machines (SVM), an INN, and the Proposed Optimized Probabilistic Neural Network (PNN). The DT approach achieved the highest accuracy among the baselines at 98%, but its precision and recall were comparatively low at 53.3% and 47.3%, respectively, with an F1-score of 54%. SVM obtained an accuracy of 96.6%, precision and recall values of 51% and 33.8%, and an F1-score of 51%. The INN model, with a 97.3% accuracy rate, precision and recall rates of 53.5% and 43%, and an F1-score of 53.5%, outperformed SVM by a small margin.
| Methods | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| DT | 98 | 53.3 | 47.3 | 54 |
| SVM | 96.6 | 51 | 33.8 | 51 |
| INN | 97.3 | 53.5 | 43 | 53.5 |
| Proposed Optimized PNN | 99 | 84.3 | 55 | 78 |
With a remarkable accuracy of 99%, a far higher precision rate of 84.3%, a recall of 55%, and an F1-score of 78%, the Proposed Optimized PNN outperformed the other approaches, indicating that it is a good option for the task at hand.
The results section includes a thorough analysis of the proposed model’s performance in comparison to other models. The analysis covers numerous metrics, including accuracy, sensitivity, specificity, precision, FPR, FNR, NPV, FDR, F1-score, and the Matthews correlation coefficient. This comprehensive analysis makes it possible to understand the model’s body gesture recognition abilities and its usefulness in the context of human-computer interaction. By taking these many criteria into account, the discussion offers insightful information on the advantages and disadvantages of the proposed approach, as well as how it might improve gesture recognition technology’s ability to facilitate human-computer interaction.
The field of gesture recognition technology has advanced significantly as a result of this work. Using a multimodal strategy that includes Wavelet Transform-based pre-processing, the Marker-Based Watershed Algorithm for segmentation, CNN-based feature extraction, the Crow Search Algorithm for model enhancement, and an Optimized Probabilistic Neural Network for classification, this study has shown promise for improving gesture recognition in human-computer interaction. Promising results have been obtained by integrating the Crow Search Algorithm and the Probabilistic Neural Network optimization; these demonstrate the model’s high degree of efficiency and precision in recognizing and classifying body motions. The model’s resilience has been increased, and its accuracy further improved, by applying the Marker-Based Watershed Algorithm for segmentation. These accomplishments have the potential to transform how we engage with technology, improving its accessibility and intuitiveness. The suggested paradigm creates opportunities in a variety of industries, including industrial automation, assistive technology, gaming, healthcare, and more, by precisely recognizing and reacting to human movements. This research offers a significant addition to the constantly changing field of human-computer interaction and has the potential to revolutionize how humans interact with technology and communicate. This work represents a major advancement in the use of novel algorithms and creative approaches to harness the power of body gesture detection for more effective, realistic, and engaging interactions with computers and other digital devices. The study also offers intriguing directions for future research. First, to improve the model’s flexibility and inclusivity, future research might concentrate on growing the dataset to include a greater variety of motions and user demographics. Second, there is a chance for practical implementation by looking at real-time applications and incorporating the system into other interfaces, including virtual reality, healthcare, and smart homes. Further research into how well the model performs in various environmental settings, and the development of robustness against noise and occlusions, would also be beneficial.
The third-party dataset, consisting of computer-generated images of humans, is available at https://pantomatrix.github.io/BEAT/.