Keywords
Classification; Land Cover; Deep Learning; CNN
This article is included in the Computational Modelling and Numerical Aspects in Engineering collection.
Human activity recognition poses a complex challenge in predicting individuals’ movements from raw sensor data using machine learning models. This paper explores the application of six prominent machine learning techniques – decision tree, random forest, logistic regression, naïve Bayes, k-nearest neighbour, and neural networks – to enhance the accuracy of human activity detection for e-health systems. Despite previous research efforts employing data mining and machine learning, there remains room for improvement in performance. The study focuses on predicting activities such as walking, standing, laying, sitting, walking upstairs, and walking downstairs.
The research employs six machine learning algorithms to recognize human activities: decision tree, random forest, logistic regression, naïve Bayes, k-nearest neighbour, and neural networks.
Evaluation on the human activity recognition dataset reveals that the random forest classifier and the neural network yield promising results, achieving high accuracy. However, naïve Bayes falls short of satisfactory outcomes.
The study successfully classifies activities like SITTING, STANDING, LAYING, WALKING, WALKING_DOWNSTAIRS, and WALKING_UPSTAIRS with a remarkable accuracy of 98%. The contribution lies in the thorough exploration of machine learning techniques, with neural networks emerging as the most effective in enhancing human activity recognition. The findings showcase the potential for advanced applications in e-health systems and beyond.
In response to the need for a more detailed description of the dataset, we have thoroughly elucidated its characteristics, drawing inspiration from works such as N. A. Choudhury and B. Soni's study on human activity recognition in uncontrolled environments. We have also clarified how we incorporated raw sensor data from the UCI-HAR dataset, addressing concerns about the initial ambiguity in our approach. The overall writing quality, grammar, and scientific presentation of the paper have been improved with the help of software verification tools, and the literature review has been strengthened with recent, high-quality papers from reputable journals and conferences to ensure a comprehensive survey of the field. We have further provided a clearer explanation of our data cleaning and preprocessing methods, integrated a feature fusion pipeline to enhance activity recognition performance, and rewritten the abstract and conclusion to better highlight the paper's contributions and novelties. Finally, the model explanation has been revised to prominently feature hyperparameters, and the achieved results are now thoroughly compared with benchmark models. These revisions significantly strengthen the clarity, rigor, and overall quality of our paper.
The intricate process of identifying the ongoing physical activities performed by one or more individuals within a given set of circumstances is commonly referred to as human activity recognition. This task involves scrutinizing a series of observations meticulously captured while users engage in various actions within a predetermined environment. The surge in the prevalence of ubiquitous, wearable, and pervasive computing technologies has ushered in a myriad of novel applications. Human activity recognition, consequently, has burgeoned into a transformative technology that is reshaping the fabric of individuals’ daily routines.
The concept of human activity recognition has garnered substantial attention in recent times, giving rise to a plethora of applications that extend beyond the immediate purview. These applications span a diverse spectrum, including assistive technology, health and fitness tracking, elder care, and automated surveillance. Notably, the swift advancement in activity recognition research has propelled its application in unforeseen domains, such as the monitoring of elderly individuals or incarcerated individuals, showcasing the versatile utility of this technology.
Integral to the evolution of smart devices is the pivotal role played by mobile phone sensors, enhancing the overall utility and environmental awareness of smartphones. These sensors, integrated seamlessly into most smartphones, contribute significantly to rendering these devices cognizant of their surroundings, encompassing factors like spatial orientation, the presence of lighting, and the existence of debris, among others. The assimilation of a diverse array of information pertaining to an individual’s daily life and activities becomes feasible through the utilization of these sensors.
Prominent among these sensor devices are the accelerometer and gyroscope sensors, which, when judiciously employed, enable the capture of precise motion data. Leveraging machine learning algorithms, one can extrapolate and predict human actions based on the wealth of sensor data constantly being monitored. This continuous surveillance capability of human activity recognition is adept at discerning a myriad of actions, ranging from rudimentary to intricate, thereby showcasing its robustness and adaptability in comprehending the nuances of human behavior.
This research significantly advances the field of human activity recognition by providing a comprehensive exploration of machine learning techniques applied to a dataset derived from triaxial accelerometer and gyroscope sensors. The study focuses on six distinct classes of activities, namely Sitting, Standing, Laying, Walking, Walking_Downstairs, and Walking_Upstairs. The primary contributions can be summarized as follows:
Methodological Diversity: The study employs six prominent machine learning algorithms—naïve Bayes, decision tree, random forest, K-nearest neighbours, logistic regression, and neural network. This diverse approach allows for a thorough investigation into the strengths and limitations of each algorithm in predicting human activities.
Performance Analysis: Through rigorous experimentation, the research provides a detailed performance analysis of each machine learning model. Notably, the neural network emerges as the standout performer, achieving an impressive accuracy rate of 98.93%, surpassing other models such as naïve Bayes, decision tree, random forest, K-nearest neighbours, and logistic regression.
High Accuracy Achievements: The study successfully classifies activities like Sitting, Standing, Laying, Walking, Walking_Downstairs, and Walking_Upstairs with a remarkable accuracy of 98%. This achievement underscores the potential for accurate human activity recognition, a crucial aspect in applications such as e-health systems.
Insights into Algorithmic Strengths and Weaknesses: By presenting varied accuracies achieved by each model, the research sheds light on the strengths and weaknesses of different machine learning algorithms. This information is valuable for researchers and practitioners seeking to optimize activity recognition models for specific contexts.
Prominence of Neural Networks: The findings highlight the efficacy of neural networks in enhancing human activity recognition, positioning them as a promising approach for future endeavors in this domain. This insight can guide future research and development efforts towards leveraging neural networks for improved accuracy and reliability in activity prediction.
In conclusion, this research significantly contributes to the field of human activity recognition by offering a thorough examination of diverse machine learning methodologies. The emphasis on neural networks and their exceptional performance opens new avenues for advanced applications in e-health systems and beyond. The insights gained from this study not only deepen our understanding of machine learning applications in activity recognition but also pave the way for more effective methodologies in predicting and classifying human activities.
This section encompasses selected literature papers pertaining to human activity recognition, with specific details outlined in Table 1.
Authors | Description | Future perception
---|---|---
Schüldt et al.1 | Support vector machine (SVM) is used to recognize activities. | Neural network techniques must be utilized to increase the recognition rate. |
Laptev et al.2 | For activity recognition, an SVM with a multichannel Gaussian kernel is utilised. | We need to enhance the recognition rate here as well. |
Yamato et al.3 | With the feature vector, HMM is employed for activity recognition. | An increase in the rate of identification is required. |
Oliver et al.4 | CHMM is a tool for recognising activities. | We need to enhance the recognition rate here as well. |
Natarajan et al.5 | Human action is detected in this article utilising Conditional Random Fields. | The activity recognition needs to be improved. |
Ning et al.6 | Human activity is detected using a typical way in this article. | We need to enhance activity identification here as well. |
Vali M. et al.7 | The authors combined HMM and CRF in their study. | It is necessary to boost the rate of recognition. |
Madharshahian R. et al.8 | The authors utilized multinomial LR in this study. | We need to enhance activity identification here as well. |
Kiros R. et al.9 | The authors employed consecutive RNNs in their study. | To improve the rate of identification, more layers must be added. |
Grushin A. et al.10 | The authors utilized consecutive RNNs in this study as well. | More layers are required to improve the identification rate in this case as well. |
Veeriah et al.11 | The authors employed differential RNN in this study. | More layers are required to boost the rate of activity detection in this case as well. |
Du Y., Wang W. and Wang L.12 | The authors employed hierarchical RNNs in their study. | To improve the rate of activity recognition, DNN, i.e. LSTM, is necessary. |
Anna Ferrari et al.13 | The authors created automated methods to identify activities of daily living (ADLs) using sensor signals and deep learning techniques. | Additional layers are needed to enhance activity detection in the deep learning model.
Li et al.14 | Authors proposed HAR_WCNN, a daily behavior recognition algorithm utilizing wide time-domain convolutional neural network and multienvironment sensor data. | It is imperative for us to enhance the process of identifying activities in this context as well. |
Sarkar A.15 | The authors employed a Convolutional Neural Network (CNN) augmented with Spatial Attention to accurately identify human activities. | We must enhance the efficiency of activity recognition in this context as well. |
Choudhury N.16 | The authors employed an adaptive batch size-based CNN-LSTM model to discern various human activities within an uncontrolled environment. | Increasing the number of layers in the network could potentially boost the recognition rate here too. |
Gholamiangonabadi D. and Grolinger K.17 | The authors employed a personalized Human Activity Recognition (HAR) model utilizing CNN and signal decomposition. | For better activity recognition rates, DNN, specifically LSTM, is essential. |
Dua N.18 | Authors propose "ICGNet" for HAR, a hybrid of Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU). | Adding more layers to the network may enhance recognition rates. |
Wu, Hao19 | Authors proposed the Differential Spatio-Temporal LSTM (DST-LSTM) method for Human Activity Recognition (HAR). | Expanding the network's layers may enhance recognition rates in this case as well. |
Liciotti, Daniele20 | The authors utilized Long Short-Term Memory (LSTM) to model spatio-temporal sequences obtained from smart home sensors and subsequently classify human activities. | Improving the recognition rate is necessary in this context too. |
Priyadarshini I.21 | The authors employed machine learning techniques, including Random Forest (RF), Decision Trees (DT), K-Nearest Neighbors (k-NN), and deep learning algorithms such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU) for Human Activity Recognition (HAR). | Increasing the layers of the network could improve recognition rates in this instance too. |
The human activity recognition methodology involves four key phases, namely input, data cleaning, data splitting, and classification and validation. Figure 1 illustrates the structure of the human activity recognition process.
Input: In our research endeavor, we drew upon the extensive resources offered by Kaggle, specifically selecting a dataset characterized by its wealth of information pertaining to triaxial acceleration derived from an accelerometer and triaxial angular velocity obtained from a gyroscope. This dataset, comprising a total of 10,299 entries, is distinguished by each entry being defined by a set of 561 features. This curated dataset plays a pivotal role in the domain of machine learning, providing a comprehensive array of features essential for thorough analysis and effective model training.
The foundation of our investigation lies in the Human Activity Recognition database, meticulously constructed from recordings capturing the daily activities of 30 participants. These individuals engaged in various activities of daily living (ADL), all while carrying a waist-mounted smartphone equipped with embedded inertial sensors. The primary objective was to classify these activities into one of six predefined categories: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING.
The experimental phase involved a cohort of 30 volunteers aged between 19 and 48 years. Each participant undertook the performance of six distinct activities while wearing a Samsung Galaxy S II smartphone on their waist. The smartphone, equipped with an accelerometer and gyroscope, captured 3-axial linear acceleration and 3-axial angular velocity at a consistent rate of 50 Hz. To ensure the accuracy of the recorded data, the experiments were video-recorded, enabling manual labeling of the dataset. Subsequently, the dataset was randomly partitioned into two sets: 70% of the volunteers were allocated to generate the training data, and the remaining 30% constituted the test data.
The raw sensor signals, originating from both the accelerometer and gyroscope, underwent a pre-processing phase that included noise filtering. The processed signals were then sampled in fixed-width sliding windows, each lasting 2.56 seconds with a 50% overlap (128 readings per window). Further refinement involved the separation of the sensor acceleration signal, which encompasses both gravitational and body motion components. This separation was achieved using a Butterworth low-pass filter, isolating body acceleration and gravity. The gravitational force, primarily composed of low-frequency components, was effectively filtered using a cutoff frequency of 0.3 Hz.
Each window in the dataset resulted in a feature vector, totaling 561 dimensions, comprising variables derived from both the time and frequency domains. The attributes associated with each record in the dataset include triaxial acceleration from the accelerometer (representing total acceleration) and estimated body acceleration, triaxial angular velocity from the gyroscope, the 561-feature vector, the corresponding activity label, and an identifier specifying the subject who conducted the experiment.
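The fixed-width sliding-window scheme described above can be sketched in a few lines. This is an illustrative reimplementation, not the dataset authors' original preprocessing code, and it operates on a single 1-D signal rather than the full triaxial streams:

```python
# Sketch: segment a raw 1-D sensor signal into fixed-width sliding
# windows of 128 readings with 50% overlap, as described above.
def sliding_windows(signal, width=128, overlap=0.5):
    step = int(width * (1 - overlap))          # 64 samples at 50% overlap
    return [signal[i:i + width]
            for i in range(0, len(signal) - width + 1, step)]

# A 256-sample signal yields three 128-sample windows (starts 0, 64, 128).
windows = sliding_windows(list(range(256)))
print(len(windows), len(windows[0]))           # 3 128
```

At 50 Hz, a 128-reading window corresponds to the 2.56-second duration stated above.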
Data cleaning: Data cleaning is a crucial process in refining a dataset, involving the removal of redundant columns and the correction or replacement of inaccurate records. As part of this process, an encoding technique is often employed to transform categorical data into numerical format; however, in cases where all feature values are already numerical, such encoding may be unnecessary. The primary objective of data cleaning is to mitigate the risk of errors recurring in the dataset.
The initial step in data cleaning entails the elimination of unwanted observations, which encompass both irrelevant and duplicate data. Duplicate entries commonly emerge when data is amalgamated from various sources, while irrelevant data pertains to information that does not contribute to addressing the underlying problem. This meticulous process is imperative for enhancing the dataset’s integrity.
Following the removal of undesirable observations, the subsequent step involves addressing missing data, which can be approached through two methods: either removing the entire record containing missing values or filling in the gaps based on other observations. However, outright removal or manual filling may not be optimal, as it could result in information loss, such as replacing a range with an arbitrary value. To address missing numerical data effectively, a dual-pronged approach is employed:
▪ Flagging the observation with an indicator value denoting missingness.
▪ Temporarily filling the observation with a placeholder value, such as 0, to ensure that no gaps exist.
This technique of flagging and filling not only aids in preserving the overall dataset but also allows algorithms to estimate missing values more optimally, mitigating potential information loss. In essence, data cleaning is a meticulous process that goes beyond mere elimination, involving strategic measures to enhance the accuracy and completeness of the dataset.
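The flag-and-fill technique just described can be sketched as follows; the column values and the choice of 0 as placeholder are illustrative:

```python
# Sketch of the flag-and-fill strategy described above: each missing
# value is replaced by a placeholder (0) and a companion indicator
# list records where the gaps were, so no information is silently lost.
def flag_and_fill(column, placeholder=0.0):
    filled = [placeholder if v is None else v for v in column]
    missing_flag = [1 if v is None else 0 for v in column]
    return filled, missing_flag

values, flags = flag_and_fill([0.12, None, -0.37, None])
print(values)   # [0.12, 0.0, -0.37, 0.0]
print(flags)    # [0, 1, 0, 1]
```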
Feature fusion pipeline: In the realm of human activity recognition using sensor data, the feature fusion pipeline plays a pivotal role in amalgamating diverse sets of features extracted from various sensors to provide a comprehensive and nuanced understanding of human movements. The feature fusion pipeline is essentially a systematic process that integrates information from different sensors, such as accelerometers, gyroscopes, and magnetometers, to create a unified representation of human activities. This integration of features not only enhances the accuracy of activity recognition but also ensures a more robust and holistic analysis of complex human behaviors. For instance, by fusing data from accelerometers measuring linear motion with gyroscopes capturing rotational movements, the feature fusion pipeline enables a more nuanced differentiation between activities like walking, running, or even specific gestures, contributing to the refinement of human activity recognition systems.
The feature fusion pipeline involves multiple stages, beginning with the extraction of relevant features from individual sensors. These features may include acceleration patterns, orientation changes, or temporal characteristics. Subsequently, a fusion mechanism is applied to combine these diverse features into a cohesive representation, often using techniques such as concatenation, averaging, or weighted summation. The resultant fused feature set serves as input to machine learning models, allowing for more sophisticated and context-aware human activity recognition. By seamlessly integrating information from various sensors, the feature fusion pipeline significantly enhances the capability of systems to discern intricate human activities, making it a cornerstone in advancing the accuracy and applicability of human activity recognition using sensor data.
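Of the fusion mechanisms mentioned above, concatenation is the simplest; a minimal sketch, assuming per-window summary features have already been extracted from each sensor (the feature values below are made up for illustration):

```python
# Minimal sketch of concatenation-based feature fusion: per-window
# feature vectors from the accelerometer and gyroscope are joined
# into one fused vector fed to the classifier.
def fuse_features(accel_features, gyro_features):
    return accel_features + gyro_features      # simple concatenation

accel = [0.21, -0.05, 0.97]                    # e.g. mean acceleration per axis
gyro  = [0.02, 0.11, -0.08]                    # e.g. mean angular velocity per axis
fused = fuse_features(accel, gyro)
print(len(fused))                              # 6
```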
Data splitting: During the pivotal phase of data splitting, the dataset undergoes a discerning segmentation, effectively yielding two distinctive components with unique roles and significance:
Training Data: At the core of this phase lies the training data, a crucial subset meticulously employed to educate and refine the model. Specifically, we judiciously allocated 60% of the entire dataset to serve as the bedrock for training our model. This subset plays a pivotal role in shaping the model’s understanding, fostering its ability to generalize patterns, and ultimately enhancing its predictive capabilities.
Testing Data: The counterpart to the training data, the testing data serves a distinct purpose in evaluating the model’s proficiency and generalization beyond the training set. Carefully designated as the dataset reserved for assessing the model’s performance, we conscientiously earmarked 20% of the dataset for this purpose. This subset represents a critical measure of the model’s ability to extrapolate its learned knowledge to unseen instances, thereby validating its robustness and reliability.
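The split described above might be sketched as follows. The random seed is arbitrary, and the disposition of the remaining 20% of records (plausibly a validation set) is an assumption, as the paper does not detail it:

```python
import random

# Sketch of the splitting step: shuffle record indices and carve out
# 60% for training and 20% for testing, as stated above.
def split_indices(n_records, train_frac=0.6, test_frac=0.2, seed=42):
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    n_train = int(n_records * train_frac)
    n_test = int(n_records * test_frac)
    return idx[:n_train], idx[n_train:n_train + n_test]

train_idx, test_idx = split_indices(10299)
print(len(train_idx), len(test_idx))           # 6179 2059
```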
Classification and validation: The intricate process of classification and validation unfolded in two distinct phases, namely:
Classification: Within the realm of data analysis, classification stands as the pivotal process dedicated to predicting the target class of novel observations, a task fundamentally grounded in the patterns discerned from the training data. The dataset meticulously chosen for our analysis already encompasses the essential target class information. To harness this data effectively, we executed the model by employing a suite of classification algorithms: naïve Bayes, decision tree, random forest, K-nearest neighbours, logistic regression, and neural network. Each of these algorithms brought its unique strengths and nuances to the forefront, enhancing our model’s predictive capabilities.
Decision Tree: The decision tree classification utilizes a tree representation to address a problem, with each internal node representing features, branches denoting feature outcomes, and leaf nodes indicating target class labels. The algorithm for constructing the decision tree involves several stages with a critical consideration of hyperparameters.
The algorithm proceeds as follows:
▪ Identify the best attribute to serve as the root of the tree.
▪ Subdivide the training set into sections based on the selected attribute.
▪ Repeat the above steps until all branches of the tree have leaf nodes.
Hyperparameter: Attribute Selection Measures
Information Gain is based on entropy, mathematically defined as Equation 1:

Entropy(S) = − Σ_i P_i log2(P_i)

where P_i is the probability of class i.

Information Gain (Equation 2) is then calculated as:

Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A takes value v.

Gini Index, another measure, quantifies the likelihood of a randomly chosen element being incorrectly classified (Equation 3):

Gini = 1 − Σ_i (P_i)^2
Considering the unbalanced nature of the dataset with multiple classes, the decision tree primarily relies on Information Gain to select the most appropriate attribute.
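The three attribute-selection measures can be computed directly from class-probability lists; a small sketch:

```python
import math

# Entropy, Gini index, and information gain for a candidate split,
# computed from class-probability lists as in the equations above.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    return 1 - sum(p * p for p in probs)

def information_gain(parent_probs, splits):
    # splits: list of (weight, child_probs) after splitting on an attribute
    child_entropy = sum(w * entropy(p) for w, p in splits)
    return entropy(parent_probs) - child_entropy

print(round(entropy([0.5, 0.5]), 3))   # 1.0  (maximal impurity, 1 bit)
print(round(gini([0.5, 0.5]), 3))      # 0.5
# A perfect split of a 50/50 parent recovers the full 1-bit gain:
print(information_gain([0.5, 0.5], [(0.5, [1.0]), (0.5, [1.0])]))  # 1.0
```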
Logistic regression: Logistic regression is a statistical method used for classification and falls under supervised learning. It makes predictions based on probability concepts: it analyses a dataset, takes independent input values, and produces a single output.
Logistic function: The logistic function is also known as the sigmoid function; it takes a real-valued number as input and maps it to a value between 0 and 1, without ever reaching those limits exactly. It maps predictions to probabilities. In its general form it is written as

f(x) = L / (1 + e^(−k(x − x0)))

where L is the curve’s maximum value, k its steepness, and x0 its midpoint. By using the sigmoid function, if we give an input value on the x axis, then it predicts the target value on the y axis.

Here k = 1, x0 = 0, L = 1, which reduces the function to the standard sigmoid σ(x) = 1 / (1 + e^(−x)).
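A small sketch of the general logistic function; with the stated constants k = 1, x0 = 0, L = 1 it reduces to the standard sigmoid:

```python
import math

# General logistic function f(x) = L / (1 + e^(-k(x - x0))); the
# defaults k = 1, x0 = 0, L = 1 give the standard sigmoid used to
# map scores to probabilities.
def logistic(x, L=1.0, k=1.0, x0=0.0):
    return L / (1 + math.exp(-k * (x - x0)))

print(logistic(0))            # 0.5
print(round(logistic(4), 3))  # 0.982
```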
Naïve Bayes classifier: Naïve Bayes is a probabilistic classification algorithm based on Bayes’ theorem. It assumes that the features of an instance are conditionally independent of one another given the class.

The conditional probability that an object with feature vector x1, x2, …, xn belongs to a particular class Ci is calculated with Equation 7:

P(Ci | x1, …, xn) ∝ P(Ci) · Π_j P(xj | Ci)
Gaussian naïve Bayes: As the values of each feature of our dataset are continuous, we use the Gaussian distribution, which is also called the normal distribution:

P(xj | Ci) = (1 / √(2πσ²)) · exp(−(xj − μ)² / (2σ²))

where μ is the mean of all the values of a feature and σ is the standard deviation.
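A minimal Gaussian naïve Bayes sketch under the assumptions above. The class priors, means, and standard deviations are illustrative, not values estimated from the dataset:

```python
import math

# Score each class as prior * product of per-feature Gaussian
# likelihoods, and return the highest-scoring class.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(x_vec, classes):
    # classes: {label: (prior, [(mu, sigma) per feature])}
    def score(label):
        prior, params = classes[label]
        likelihood = 1.0
        for x, (mu, sigma) in zip(x_vec, params):
            likelihood *= gaussian_pdf(x, mu, sigma)
        return prior * likelihood
    return max(classes, key=score)

classes = {"SITTING": (0.5, [(0.0, 0.1), (0.2, 0.1)]),
           "WALKING": (0.5, [(0.8, 0.1), (0.6, 0.1)])}
print(classify([0.75, 0.55], classes))   # WALKING
```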
K-nearest neighbours is one of the essential classification algorithms and falls under the category of supervised learning. It assumes that similar data points lie close to each other; the data is classified into groups based on an attribute. It is used in applications like pattern recognition, intrusion detection, and data mining.
Let N be the number of training samples and U the unknown point. The dataset is stored in an array, with each element represented as a tuple (x, y), where x is the feature vector and y the class label.
For i = 0 to N − 1
Begin
    Calculate the Euclidean distance from the i-th point to U.
End
Let S be the set of the m smallest distances (all points corresponding to these distances must already be classified).
Return Mode(the m labels in S)
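The pseudocode above translates into a short runnable function; the training points and labels are illustrative:

```python
import math
from collections import Counter

# k-NN as described above: compute the Euclidean distance from the
# unknown point U to every training sample, take the m nearest, and
# return the mode of their labels.
def knn_predict(training, u, m=3):
    # training: list of (feature_vector, label) tuples
    distances = sorted((math.dist(x, u), y) for x, y in training)
    nearest_labels = [y for _, y in distances[:m]]
    return Counter(nearest_labels).most_common(1)[0][0]

training = [([0.0, 0.0], "STANDING"), ([0.1, 0.1], "STANDING"),
            ([1.0, 1.0], "WALKING"),  ([0.9, 1.1], "WALKING"),
            ([0.2, 0.0], "STANDING")]
print(knn_predict(training, [0.05, 0.05]))   # STANDING
```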
Random forest: Random forests are ensembles, i.e. combinations of two or more models that together yield better results. A random forest builds a forest of decision trees on samples of the data. Each decision tree outputs a target class, and the final target class is identified by aggregating (typically by majority vote) the outputs of every decision tree for the corresponding input record.
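The aggregation step can be sketched as a majority vote; the "trees" below are stand-in functions rather than trained decision trees:

```python
from collections import Counter

# Each tree in the forest votes for a class for the given record;
# the majority vote is the forest's final prediction.
def forest_predict(trees, record):
    votes = [tree(record) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

trees = [lambda r: "WALKING", lambda r: "WALKING", lambda r: "SITTING"]
print(forest_predict(trees, record=None))   # WALKING
```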
Artificial neural network (ANN): Artificial Neural Network (ANN), an intricate computational model, comprises interconnected nodes that collaboratively process and store property values. These nodes serve as pivotal entities where input values, derived from the distinctive characteristics of training examples, are meticulously aggregated. The interconnection between nodes involves weighted values (W) and biases (b), which are then systematically forwarded to the subsequent layer of nodes. A crucial aspect of this progression includes subjecting the weighted total to a non-linear activation function (σ), with the resultant value efficiently transmitted to the subsequent layer. This iterative process persists until the network culminates in generating the final output, as illustrated in the intricate structure depicted in Figure 2.
Figure 2 exemplifies the complex architecture of a neural network model, visually portraying the interplay of nodes, weighted values, and biases that characterize its underlying structure.
In the quest to enhance the accuracy of output predictions, the manipulation of weights (W) and biases (b) within the ANN plays a central role. These parameters are optimized through the employment of various optimization algorithms, each tailored to address specific challenges in training neural networks and characterized by its own hyperparameters, such as the learning rate. Commonly used optimizers include Gradient Descent (GD), Stochastic Gradient Descent (SGD), and Mini-Batch Gradient Descent.
Gradient Descent (GD): Update Rule

θ_(t+1) = θ_t − α · ∇J(θ_t)

where θ_t represents the parameters (weights and biases) at iteration t, J is the cost function computed over the entire dataset, and α is the learning rate, controlling the size of the step taken during each iteration.

Stochastic Gradient Descent (SGD): Update Rule

θ_(t+1) = θ_t − α · ∇J(θ_t; x_i, y_i)

Similar to GD, but instead of using the gradient of the entire dataset, it uses the gradient of the cost function for a single data point (x_i, y_i) at each iteration.

Mini-Batch Gradient Descent: Update Rule

θ_(t+1) = θ_t − α · ∇J(θ_t; B)

An intermediate approach between GD and SGD, where updates are computed using a small random subset (mini-batch) B of the entire dataset.
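The GD update rule can be illustrated on a toy one-parameter cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3); SGD and mini-batch GD would substitute per-sample or per-batch gradients of the same form:

```python
# Repeatedly apply theta_{t+1} = theta_t - alpha * grad J(theta_t)
# on J(theta) = (theta - 3)^2, which is minimized at theta = 3.
def gradient_descent(theta, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (theta - 3)     # dJ/dtheta
        theta = theta - lr * grad  # the GD update rule
    return theta

print(round(gradient_descent(theta=0.0), 4))   # 3.0
```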
The neural network architecture is defined by the number of layers (L) and the number of neurons in each layer (Nl). The weights (W) and biases (b) are initialized and updated through the training process, adjusting their values to minimize the cost function (J). The activation function (σ) introduces non-linearity to the model, enabling it to learn complex relationships within the data.
Number of Layers and Neurons: The neural network architecture is defined by the number of layers (L) and the number of neurons in each layer (Nl).
Initialization of Weights and Biases: The weights (W) and biases (b) are initialized before the training process begins. This initialization is typically done randomly or using specific strategies to break symmetry and aid convergence.
Forward Propagation: The forward propagation computes the predicted output (a_l) for each layer using the following equations:

z_l = W_l · a_(l−1) + b_l
a_l = σ(z_l)

where a_0 is the input vector.
Cost Function: The cost function (J) measures the difference between the predicted output and the actual output. For regression problems, Mean Squared Error (MSE) is commonly used:

J = (1/n) Σ_i (y_i − ŷ_i)²

where y_i is the actual output and ŷ_i the predicted output for sample i.
Backward Propagation: The backward propagation computes the gradients of the cost function with respect to the weights and biases, enabling updates in the direction that minimizes the cost. The gradients are calculated using the chain rule:

∂J/∂W_l = (∂J/∂a_l) · (∂a_l/∂z_l) · (∂z_l/∂W_l)

with an analogous expression for ∂J/∂b_l.
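A single-neuron sketch of one forward and one backward pass, applying the chain rule to J = (a − y)² with a sigmoid activation; the learning rate, step count, and training pair are arbitrary choices for illustration:

```python
import math

# Forward: z = w*x + b, a = sigma(z).
# Backward: dJ/dw = dJ/da * da/dz * dz/dw, with J = (a - y)^2.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_step(w, b, x, y, lr=0.5):
    z = w * x + b                  # forward: weighted input
    a = sigmoid(z)                 # forward: activation
    dJ_da = 2 * (a - y)            # chain rule, outermost factor
    da_dz = a * (1 - a)            # sigmoid derivative
    w -= lr * dJ_da * da_dz * x    # dz/dw = x
    b -= lr * dJ_da * da_dz        # dz/db = 1
    return w, b

w, b = 0.0, 0.0
for _ in range(500):
    w, b = train_step(w, b, x=1.0, y=1.0)
print(sigmoid(w * 1.0 + b) > 0.9)   # True: prediction has moved toward y = 1
```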
The iterative nature of the learning process in ANN, combined with the dynamic adjustments introduced by these parameters and hyperparameters, showcases the adaptability and efficiency in navigating the complex parameter space to converge towards optimal solutions.
Validation: Validation is a crucial step in assessing the performance of a machine learning model, where evaluation metrics are employed to gauge the model’s compatibility with the underlying dataset. The process involves employing various metrics designed to provide insights into different aspects of the model’s performance. Among these metrics, commonly utilized for assessing classification models, are accuracy, precision, recall, and the F1 score.
Precision: Precision is also known as positive predicted value which is a measure of accuracy. It is mathematically defined as true positives divided by the sum of true positives and false positives.
Recall: Recall is also referred to as sensitivity, which is a measure of accuracy. It is mathematically defined as true positives divided by the sum of true positives and false negatives.
F1 score: The F1 score, the harmonic mean of precision and recall, is calculated with Equation 11:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Accuracy: Accuracy defines how well the model predicts the classes overall. It is mathematically defined as the number of correctly predicted instances divided by the total number of instances.
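The four metrics can be computed from raw confusion counts for one class; the counts below are illustrative, not taken from the paper's confusion matrices:

```python
# Precision, recall, F1, and accuracy from true-positive, false-positive,
# false-negative, and true-negative counts, per the definitions above.
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = metrics(tp=90, fp=10, fn=30, tn=70)
print(p, r)          # 0.9 0.75
print(round(f1, 3))  # 0.818
print(acc)           # 0.8
```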
The Results section comprises two parts:
Exploratory data analysis means gaining insights from the data before applying any model to it. It can be done using mathematical functions and visualizations. The total number of rows and columns in a dataset is identified by using the ‘shape’ attribute, as shown in Figure 3.
Figure 3 reveals that the dataset comprises 10,299 instances and 562 attributes. Among these attributes, 561 are independent, while one serves as the dependent variable. Our exploration of the data involved the utilization of various graphs, including a bar graph, box plot, and heatmap. Specifically, the bar graph facilitated the comparison of different target classes. Figure 4 displays the bar graph, illustrating the distribution of classifications among individuals. The observations extracted from Figure 4 are as follows: LAYING—1944 instances, STANDING—1906 instances, SITTING—1777 instances, WALKING—1722 instances, WALKING_UPSTAIRS—1544 instances, and WALKING_DOWNSTAIRS—1406 instances.
Box plot: In Figure 5, a box plot illustrates the data distribution for the ‘Activity’ target class and the ‘tBodyAcc-max()-X’ feature. This visualization incorporates five key measures: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
Analyzing the information presented in Figure 5, the following noteworthy observations were discerned:
▪ When the value of tBodyAcc-max()-X falls below -0.75, the corresponding activities are predominantly identified as Standing, Sitting, or Laying.
▪ In instances where tBodyAcc-max()-X exceeds -0.50, the categorization shifts towards activities characterized as Walking, Walking_Downstairs, or Walking_Upstairs.
▪ Notably, when tBodyAcc-max()-X surpasses 0.00, the specific activity discerned is attributed to Walking_Downstairs.
The confusion matrix for the naïve Bayes classifier is shown in Figure 6. From Figure 6, the following observations were made:
▪ The actual number of Laying was 389, but the model correctly predicted 361 as Laying and incorrectly predicted 22 as Sitting and nine as Walking_Upstairs.
▪ The actual number of Sitting was 372, but the model correctly predicted 336 as Sitting and incorrectly predicted two as Laying, 31 as Standing, and three as Walking_Upstairs.
▪ The actual number of Standing was 375, but the model correctly predicted 157 as Standing and incorrectly predicted 214 as Sitting and four as Walking_Upstairs.
▪ The actual number of Walking was 345, but the model correctly predicted 256 as Walking and incorrectly predicted 31 as Walking_Downstairs and 58 as Walking_Upstairs.
▪ The actual number of Walking_Downstairs was 282, but the model correctly predicted 199 of them as Walking_Downstairs and incorrectly predicted 25 as Walking and 58 as Walking_Upstairs.
▪ The actual number of Walking_Upstairs was 297, but the model correctly predicted 275 of them as Walking_Upstairs and incorrectly predicted 17 as Walking_Downstairs and five as Walking.
The precision, recall, and F1 score values of the naïve Bayes classifier are depicted in the bar plot shown in Figure 7. From Figure 7, we can conclude that the class “Laying” was predicted most accurately, while “Standing” had the lowest number of correctly predicted instances.
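The per-class recall values behind Figure 7 can be recomputed from the confusion-matrix counts quoted above, confirming that Laying is recognized best and Standing worst:

```python
# Row totals and correct predictions read off the naive Bayes confusion
# matrix described above (Figure 6).
actual = {"Laying": 389, "Sitting": 372, "Standing": 375,
          "Walking": 345, "Walking_Downstairs": 282, "Walking_Upstairs": 297}
correct = {"Laying": 361, "Sitting": 336, "Standing": 157,
           "Walking": 256, "Walking_Downstairs": 199, "Walking_Upstairs": 275}

# Recall = correctly predicted instances / actual instances of the class.
recalls = {cls: correct[cls] / actual[cls] for cls in actual}
for cls, recall in sorted(recalls.items(), key=lambda kv: -kv[1]):
    print(f"{cls:20s} recall = {recall:.2f}")
```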
The confusion matrix for the decision tree classifier is depicted in Figure 8. Based on Figure 8, the following observations were made:
▪ The actual number of instances classified as “Laying” was 389, and the model correctly predicted all 389 as “Laying.”
▪ The actual number of instances classified as “Sitting” was 372. The model correctly predicted 346 of them as “Sitting” and incorrectly predicted 26 as “Standing.”
▪ The actual number of instances classified as “Standing” was 375. The model correctly predicted 347 of them as “Standing” and incorrectly predicted 28 as “Sitting.”
▪ The actual number of instances classified as “Walking” was 345. The model correctly predicted 324 as “Walking” and incorrectly predicted six as “Walking_Downstairs” and 15 as “Walking_Upstairs.”
▪ The actual number of instances classified as “Walking_Downstairs” was 282. The model correctly predicted 257 of them as “Walking_Downstairs” and incorrectly predicted eight as “Walking” and 15 as “Walking_Upstairs.”
▪ The actual number of instances classified as “Walking_Upstairs” was 297. The model correctly predicted 266 as “Walking_Upstairs” and incorrectly predicted 13 as “Walking” and 18 as “Walking_Downstairs.”
Precision, recall and F1 score values of the decision tree classifier are depicted in the bar plot shown in Figure 9. From Figure 9 we can see that the class Laying was predicted correctly without any misclassified instances.
The confusion matrix for the K-nearest neighbors classifier is shown in Figure 10. From Figure 10, the following observations were made:
▪ The actual number of “Laying” was 389, and the model correctly predicted all 389 as “Laying.”
▪ The actual number of “Sitting” was 372, but the model correctly predicted 332 of them as “Sitting” and incorrectly predicted 39 as “Standing” and one as “Walking_Upstairs.”
▪ The actual number of “Standing” was 375, but the model correctly predicted 354 as “Standing” and incorrectly predicted 21 as “Sitting.”
▪ The actual number of “Walking” was 345, but the model correctly predicted 344 of them as “Walking” and incorrectly predicted one as “Walking_Upstairs.”
▪ The actual number of “Walking_Downstairs” was 282, but the model correctly predicted 280 of them as “Walking_Downstairs” and incorrectly predicted two as “Walking.”
▪ The actual number of “Walking_Upstairs” was 297, and the model correctly predicted all 297 as “Walking_Upstairs.”
Precision, recall and F1 score values of the K-nearest neighbour classifier are depicted in the bar plot shown in Figure 11. From Figure 11, we can see that the activity Laying was predicted correctly without any misclassified instances; hence, all metrics reached 100%.
The confusion matrix for the random forest classifier is depicted in Figure 12. From Figure 12, the following observations were made:
▪ The actual number of instances labeled as “Laying” was 389, and the model correctly predicted all 389 as “Laying.”
▪ The actual number of instances labeled as “Sitting” was 372. However, the model correctly predicted 355 as “Sitting” and incorrectly predicted 17 as “Standing.”
▪ The actual number of instances labeled as “Standing” was 375. The model correctly predicted 355 of them as “Standing” and incorrectly predicted 20 as “Sitting.”
▪ The actual number of instances labeled as “Walking” was 345. The model correctly predicted 336 as “Walking” and incorrectly predicted two as “Walking_Downstairs” and seven as “Walking_Upstairs.”
▪ The actual number of instances labeled as “Walking_Downstairs” was 282. The model correctly predicted 265 of them as “Walking_Downstairs” and incorrectly predicted 12 as “Walking” and five as “Walking_Upstairs.”
▪ The actual number of instances labeled as “Walking_Upstairs” was 297. The model correctly predicted 286 as “Walking_Upstairs” and incorrectly predicted seven as “Walking” and four as “Walking_Downstairs.”
Precision, recall and F1 score values of the random forest classifier are depicted in the bar plot shown in Figure 13. From Figure 13 we can see that all instances with Laying as the target class were correctly predicted.
The confusion matrix for logistic regression is shown in Figure 14. From Figure 14, the following observations were made:
▪ The actual number of Laying was 389, and the model correctly predicted all 389 instances as Laying.
▪ The actual number of Sitting was 372, but the model correctly predicted 351 of them as Sitting and incorrectly predicted 21 as Standing.
▪ The actual number of Standing was 375, and the model correctly predicted 360 of them as Standing while incorrectly predicting 15 as Sitting.
▪ The actual number of Walking was 345, and the model correctly predicted 344 of them as Walking while incorrectly predicting one as Walking_Upstairs.
▪ The actual number of Walking_Downstairs was 282, and the model correctly predicted 281 as Walking_Downstairs while incorrectly predicting one as Walking_Upstairs.
▪ The actual number of Walking_Upstairs was 297, and the model correctly predicted 295 of them as Walking_Upstairs while incorrectly predicting one as Walking and one as Walking_Downstairs.
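Overall accuracy follows directly from these counts: summing the diagonal of the confusion matrix and dividing by the total number of test instances reproduces the accuracy reported for logistic regression (small rounding differences aside):

```python
# Diagonal (correct) counts and per-class totals from the logistic
# regression confusion matrix described above (Figure 14), ordered
# Laying, Sitting, Standing, Walking, Walking_Downstairs, Walking_Upstairs.
correct = [389, 351, 360, 344, 281, 295]
totals = [389, 372, 375, 345, 282, 297]

accuracy = sum(correct) / sum(totals)
print(f"{accuracy:.4f}")  # 0.9806, in line with the reported 98.05%
```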
Figure 15 presents the precision, recall, and F1 score values for logistic regression. The model performs particularly well on the Laying, Walking, and Walking_Downstairs classes, whose bars in the plot are at or near the maximum, indicating that these activities are identified almost without error.
Neural network: Figure 16 plots the training loss against the epoch number for the human activity dataset, offering insight into the model’s learning process over successive epochs. The figure shows that the loss decreases steadily as training progresses, accompanied by a corresponding increase in accuracy, indicating that the model becomes progressively better at predicting activities.
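A decreasing epoch-vs-loss curve of the kind shown in Figure 16 can be illustrated with a toy gradient-descent loop; the data, single weight, and learning rate below are illustrative choices, not the paper's actual network:

```python
# Gradient descent on a 1-D least-squares problem, recording the mean
# squared error once per epoch (mirroring an epoch-vs-loss plot).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]          # roughly y = x
w, lr = 0.0, 0.05                  # single weight, learning rate

losses = []
for epoch in range(20):
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= lr * grad
    losses.append(loss)           # loss shrinks every epoch on this problem

print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
```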
Comparison of different algorithms: To compare the algorithms used to train the model, each algorithm is treated as the independent variable x and its accuracy as the dependent variable y. Figure 17 depicts this relationship as a bar graph, providing a visual comparison of algorithmic performance. The proposed neural network model is also compared against two benchmark models, CNN (Convolutional Neural Network) and GRN (Generic Reference Network). The accuracy results for all models are summarized in Table 2.
Figure 17 and the accuracy comparison in Table 2 show that the neural network outperformed all other algorithms, achieving an accuracy of 98.93%. This establishes the proposed neural network model as a robust and reliable choice for this task, ahead of both the classical machine learning models and the benchmark deep learning models.
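The comparison can be tabulated from the accuracies this study reports, with a simple `max` picking out the best-performing model:

```python
# Accuracies reported in this study (Table 2 / Figure 17), in percent.
accuracies = {
    "Naive Bayes": 76.89,
    "Decision tree": 93.39,
    "Random forest": 96.89,
    "K-nearest neighbours": 96.40,
    "Logistic regression": 98.05,
    "Neural network": 98.93,
}

best = max(accuracies, key=accuracies.get)
print(f"best model: {best} ({accuracies[best]}%)")
for name, acc in sorted(accuracies.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {acc:5.2f}")
```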
In conclusion, this research addresses the vital task of human activity recognition, utilizing a dataset derived from triaxial accelerometer and gyroscope sensors. With 561 features and 10,299 records spanning six distinct classes, namely Sitting, Standing, Laying, Walking, Walking_Downstairs, and Walking_Upstairs, our study employed six machine learning algorithms—naïve Bayes, decision tree, random forest, K-nearest neighbours, logistic regression, and neural network.
Notably, our exploration contributes valuable insights to the field by presenting a comprehensive analysis of the performance of these algorithms in predicting human activities. The experimental results reveal the varied accuracies achieved by each model, with the naïve Bayes classifier demonstrating 76.89%, the decision tree classifier achieving 93.39%, the random forest classifier attaining 96.89%, K-nearest neighbours reaching 96.40%, and logistic regression classifier yielding 98.05%. However, the standout performer among these models is the neural network, boasting an impressive accuracy of 98.93%.
This study’s significant contributions lie in its thorough investigation of diverse machine learning methodologies for human activity recognition, shedding light on the strengths and limitations of each algorithm. Furthermore, our work underscores the efficacy of neural networks in achieving a remarkable accuracy rate, positioning it as a promising approach for future endeavors in human activity recognition. These findings not only enhance our understanding of machine learning applications in this domain but also pave the way for more nuanced and effective methodologies in predicting and classifying human activities.
Kaggle: Human Activity Recognition with Smartphones, https://www.kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
Analysis code
Source code available from: https://github.com/someshchinta/Human_Actiity_recognition
Archived source code at time of publication: https://doi.org/10.5281/zenodo.7108706
License: Apache-2.0
Is the rationale for developing the new method (or application) clearly explained?
Partly
Is the description of the method technically sound?
No
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
References
1. Ronald M, Poulose A, Han D: iSPLInception: An Inception-ResNet Deep Learning Architecture for Human Activity Recognition. IEEE Access. 2021; 9: 68985–69001.
Competing Interests: No competing interests were disclosed.
Is the rationale for developing the new method (or application) clearly explained?
Partly
Is the description of the method technically sound?
Partly
Are sufficient details provided to allow replication of the method development and its use by others?
Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Human Activity Recognition, wearable devices, digital health
Is the rationale for developing the new method (or application) clearly explained?
No
Is the description of the method technically sound?
Partly
Are sufficient details provided to allow replication of the method development and its use by others?
Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Computer vision and human action recognition in sports
Is the rationale for developing the new method (or application) clearly explained?
No
Is the description of the method technically sound?
No
Are sufficient details provided to allow replication of the method development and its use by others?
Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
No
References
1. Choudhury N, Soni B: An Adaptive Batch Size-Based-CNN-LSTM Framework for Human Activity Recognition in Uncontrolled Environment. IEEE Transactions on Industrial Informatics. 2023; 19 (10): 10379–10387.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Human Activity Recognition, AI-ML, Feature Engineering, Domain Adaptation, eHealth Applications, Neural Networks.
Alongside their report, reviewers assign a status to the article:

| Version | Reviewer 1 | Reviewer 2 | Reviewer 3 | Reviewer 4 |
|---|---|---|---|---|
| Version 4 (revision), 05 Nov 24 | read | read | | |
| Version 3 (revision), 30 Sep 24 | | | | |
| Version 2 (revision), 06 Feb 24 | read | read | read | |
| Version 1, 06 Mar 23 | read | | | |