Research Article

Anomaly Detection in building energy system using machine learning technique

[version 1; peer review: awaiting peer review]
PUBLISHED 12 Jun 2026

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background

Energy consumption in buildings has steadily increased over time due to growing urbanization and the demand for more comfortable living and working environments. The growing inefficiencies in energy consumption within buildings are caused by reliance on static monitoring systems, which often fail to detect subtle or evolving anomalies. Their inability to adapt to dynamic consumption patterns results in waste, higher costs, and reduced sustainability. The objective of this study was to design a model capable of learning from data, identifying abnormal energy usage, and dynamically adjusting to new conditions.

Methods

Support Vector Machine (SVM) was applied using electricity consumption data sourced from smart meters. The dataset underwent cleaning, feature extraction, and scaling before model training and testing. The developed model achieved reliable anomaly detection, identifying irregular energy consumption with improved accuracy compared to traditional rule-based methods.

Results

The model achieved an accuracy of 0.97, an F1 score of 0.73, a precision of 0.79, and a recall of 0.68. The results showed reduced false alarms and enhanced adaptability to changes in energy use patterns.

Conclusion

The study contributed to energy efficiency, cost reduction, and sustainable building management, while also providing a replicable framework for intelligent energy monitoring. The benefits extended to facility managers and researchers by offering an adaptable system for smarter energy decision-making.

Keywords

Anomaly detection, Building energy, Data, Energy consumption, Support vector machine

1. Introduction

The global transition toward sustainable development has intensified the need for intelligent energy management within the built environment.1,2 Buildings account for a substantial proportion of total energy consumption and carbon emissions worldwide, making them critical targets for optimization and efficiency improvement.3 With the rapid advancement of smart technologies, modern buildings are increasingly equipped with interconnected sensors, smart meters, and automated control systems that continuously generate large volumes of energy-related data. This evolution has given rise to the concept of smart buildings, where data-driven decision-making plays a central role in enhancing operational efficiency, reducing energy waste, and ensuring environmental sustainability.4–7 Despite these advancements, a major challenge persists in effectively monitoring and managing building energy consumption due to the dynamic and complex nature of usage patterns. Energy demand in buildings is influenced by multiple factors, including occupancy behavior, environmental conditions, operational schedules, and equipment performance.8–12 Traditional monitoring approaches, which rely heavily on static thresholds and rule-based systems, often fail to capture subtle deviations or evolving consumption patterns.13 As a result, these conventional methods are prone to high false alarm rates or missed detections, limiting their effectiveness in real-world applications. Anomaly detection has emerged as a critical technique for addressing these challenges by identifying unusual patterns or deviations in energy consumption data that may indicate inefficiencies, system faults, or unauthorized usage.14–18 Early detection of such anomalies is essential for enabling predictive maintenance, reducing operational costs, and improving overall system reliability. However, the inherent variability and non-linearity of building energy data make anomaly detection a non-trivial task, requiring more advanced and adaptive analytical approaches.19–22

In recent years, machine learning (ML) techniques have demonstrated significant potential in overcoming the limitations of traditional methods. By learning complex patterns directly from data, ML-based models can automatically adapt to changing conditions and uncover hidden relationships within high-dimensional datasets.23–25 Various approaches, including supervised, unsupervised, and hybrid learning models, have been explored for anomaly detection in building energy systems.26 These methods have shown promising results in tasks such as fault detection, energy forecasting, and consumption optimization. Nevertheless, challenges remain in achieving a balance between detection accuracy, computational efficiency, interpretability, and adaptability to real-time environments. Among the available machine learning techniques, Support Vector Machines (SVM) have gained considerable attention due to their robustness in handling high-dimensional data and their ability to establish clear decision boundaries between normal and anomalous patterns. SVM models are particularly effective in classification tasks where the distinction between classes is subtle, as is often the case in energy consumption datasets.27–30 Furthermore, their capability to generalize well with limited training samples makes them suitable for applications where labeled anomaly data may be scarce.

Motivated by these considerations, this study proposes a dynamic anomaly detection framework for building energy systems using machine learning techniques, with a particular focus on Support Vector Machines. Unlike traditional static approaches, the proposed model is designed to adapt to evolving energy consumption patterns and provide accurate identification of abnormal behavior.31 The framework leverages smart meter data, incorporating preprocessing and feature extraction techniques to enhance data quality and model performance. By analyzing time-series consumption data alongside contextual variables, the system aims to detect anomalies more reliably while minimizing false positives. The significance of this research lies in its contribution to sustainable building management through intelligent energy monitoring.32 By enabling early detection of inefficiencies and abnormal consumption patterns, the proposed approach supports energy conservation, cost reduction, and improved operational decision-making. Additionally, the study provides a scalable and replicable framework that can be extended to various smart building environments and integrated with existing energy management systems.33

In this paper, we propose a dynamic anomaly detection model using machine learning techniques that can adapt to changing patterns in building energy use. By focusing on a system that evolves and improves over time, the goal is to enhance energy efficiency, reduce waste, and contribute to the broader push for smarter and more sustainable buildings. Many of the energy monitoring systems currently used in buildings still rely on fixed thresholds or simple rule-based logic to spot issues. While these traditional methods have served us for a while, they often fall short when it comes to catching subtle changes or new patterns in how energy is used. Because these systems are rigid, they can miss important anomalies or, on the flip side, generate too many false alarms that can frustrate building managers.

To address these challenges, this paper makes the following contributions:

  • Develop an approach that detects abnormal energy consumption and thereby helps reduce waste.

  • Examine occupancy patterns and energy consumption signals gathered by various sensors placed throughout buildings in relation to the standby consumption level, maximum operating time, and active consumption rate of each device. Time-series consumption observations go through a pre-processing procedure that involves data cleaning and resampling before being classified as normal or abnormal.

  • Use the support vector machine algorithm to automatically identify the abnormal consumption classes.

  • Train and evaluate the proposed solution on an anomaly detection dataset derived from smart meter data, obtaining promising performance.

  • Test and validate the proposed anomaly detection method on the smart meter dataset, using metrics such as accuracy and F1 score to evaluate the performance of the system.

The rest of the paper is arranged as follows. The “Related works” section summarizes previous work and outlines its limitations. The “Methods” section describes the approach for detecting abnormal energy consumption. The “Results and discussion” section presents the design and operation of the system. The “Conclusion” section presents the conclusions drawn from this work and discusses future directions.

2. Related works

Several studies have applied machine learning techniques to anomaly detection in building energy systems. One study demonstrated how a hybrid method combining unsupervised (SR-CNN) and supervised (boosted decision trees) learning can detect anomalies in smart meter electricity data.34 The authors achieved 90% accuracy and reduced false positives, helping utility companies detect fraud more reliably and cost-effectively. Another study used semi-supervised learning, combining SAX-CART and SAX-MLP models, to detect anomalies in electricity consumption. Their framework improved both interpretability and accuracy, identifying abnormal patterns in real-world data and offering insights into potential causes.35 Anomaly detection in building energy consumption is one of the best strategies for maximizing energy efficiency in buildings.36 An ensemble learning framework that combines multiple classifiers to detect anomalies in building energy consumption showed higher sensitivity and fewer false positives compared to individual methods, supporting smart.37 An Asymmetric Hybrid Encoder-Decoder (AHED) anomaly detection architecture has also been introduced, designed to precisely forecast and identify point anomalies and collective anomalies in building energy usage.38 This architecture synthesizes both supervised and unsupervised learning approaches and utilizes an advanced encoder-decoder configuration for accurate prediction of energy consumption. The AHED framework also applies sliding window techniques and cross-correlation analysis to convert multivariate temporal data into feature matrices, in order to detect anomalous patterns that manifest collectively within specified time intervals. The results demonstrate that the AHED model outperforms traditional anomaly detection techniques, achieving higher accuracy and improved generalization across diverse building environments, which affirms the efficacy and superiority of the asymmetric model in anomaly detection for building energy consumption.39,40

One research work41 proposed a two-stage learning system for smart homes using the LOF, COF, and CBLOF outlier detection approaches, followed by regression and ensemble models to forecast energy demand. Their findings show that removing anomalies increases prediction accuracy. Wang and Ahn42 proposed a framework for detecting residential electrical load anomalies that incorporated a hybrid one-step-ahead load predictor and a rule-engine-based load anomaly detector to increase load forecast accuracy. They employed the KNN method and a support vector machine to improve anomaly detection. Xu and Chen43 demonstrated a hybrid data mining-based method for tracking irregular building energy demand and detecting anomalies. They employed a recurrent neural network to identify the inaccurate prediction interval, and the quantile regression range was used to assess atypical building energy consumption results. This approach was used to examine energy use data from three distinct homes. Sater and Hamza44 investigated an anomaly detection system that uses a federated learning strategy to train a long short-term memory model while simultaneously solving numerous tasks. In another study,45 a spectral dual convolutional neural network recognized anomalies in a time-series data stream by defining a threshold to identify anomalous values across the whole energy consumption data set. Zainab et al.46 used machine learning to detect spam in smart home device readings based on time series data. Their method calculates the spamicity score of each IoT device and assesses the dependability of devices in the home network using feature significance and RMSE machine learning techniques. Bouabdallaoui et al.47 suggested a machine learning technique for building maintenance. Data were collected from various sensors around a building, and a fault detection model was created utilizing an autoencoder, recurrent neural network, and long short-term memory model. A real-world case study was used to forecast the maintenance of heating, ventilation, and air conditioning systems in sports facilities. Gaur et al.48 proposed a method for generating a ground truth based on existing data to detect abnormalities in short- and long-term energy usage using the z-score and LR models.

3. Methods

The outline of the method used to build the model is presented in Figure 1. The input data were obtained from Kaggle. The first step is to preprocess the data and extract the features used to build the model. The final step is to apply the SVM algorithm to classify the readings as normal or anomalous energy consumption behavior.

Figure 1. Flowchart diagram of the anomaly detection model setup.

A smart meter dataset from Kaggle formed the input for model training. After collection, the data were preprocessed and features were extracted before the model was trained to detect anomalies in the energy system.
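The flow in Figure 1 (preprocess, extract features, classify with an SVM) might be sketched as below. This is a minimal illustration rather than the authors' code. It assumes scikit-learn and NumPy are available and uses synthetic readings instead of the Kaggle file. The RBF kernel, `C=1.0`, the 70/30 split, and the helper names `make_synthetic_readings` and `train_svm` are all assumptions, since the source does not state its hyperparameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_readings(n=1000, anomaly_rate=0.1, seed=0):
    """Hypothetical stand-in for smart meter data: two features per reading
    (current consumption, average past consumption) and a 0/1 label."""
    rng = np.random.default_rng(seed)
    avg_past = rng.normal(5.0, 1.0, size=n)
    consumed = avg_past + rng.normal(0.0, 0.3, size=n)
    labels = (rng.random(n) < anomaly_rate).astype(int)
    # Anomalous readings are shifted well away from their usual level.
    consumed = consumed + labels * rng.choice([-4.0, 4.0], size=n)
    features = np.column_stack([consumed, avg_past])
    return features, labels


def train_svm(features, labels, seed=0):
    """Scale the features, fit an SVM classifier, return model and test split."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    # Kernel and C are assumed values, not taken from the paper.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(x_train, y_train)
    return model, x_test, y_test


features, labels = make_synthetic_readings()
model, x_test, y_test = train_svm(features, labels)
predictions = model.predict(x_test)  # 0 = normal, 1 = anomaly
```

Placing the scaler inside the pipeline means the scaling statistics are learned from the training split only, which avoids leaking information from the held-out readings.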

SVM was chosen for this research because it handles high-dimensional data well, and building energy usage is influenced by many features such as time of day and day of the week. SVM is designed to handle datasets with many input features, even when the number of examples is limited. In addition, SVM tends to produce a clear separation between normal and abnormal classes: it works by finding the best possible boundary (also called a hyperplane) that separates the different classes in the data. If the electricity usage data contains labeled examples (e.g., normal vs. abnormal), SVM tries to maximize the margin between them, which supports confident decisions.19
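For reference, the margin-maximization idea described above is commonly written as the soft-margin SVM optimization problem. The formulation below is the standard textbook form and is not given in the source. The symbols $w$, $b$, $\xi_i$, and $C$ are introduced here for illustration, with $x_i$ a feature vector and $y_i \in \{-1, +1\}$ its label (normal or anomalous).

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

The separating hyperplane is $w^\top x + b = 0$ and the margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ widens the margin, while $C$ controls how heavily margin violations are penalized.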

The detailed steps are presented below.

3.1. Data acquisition

The data used in this study were obtained from a simulated smart meter dataset on Kaggle (https://www.kaggle.com/datasets/ziya07/smart-meter-electricity-consumption-dataset) that records energy consumption at half-hour intervals. Figure 2 shows the smart meter dataset. The dataset contains 5,000 entries and includes environmental and contextual factors that influence electricity usage, such as temperature, humidity, and wind speed, most of which were not used after data preprocessing. Each record also includes a binary label indicating whether the reading is normal or an anomaly, which supports the supervised learning approach. Of the dataset, 70% was used for training the model, 15% for validation, and the remaining 15% for testing. The dataset was stored in a CSV file named “smart_meter_data main.csv” and was loaded using Python’s pandas library for further processing and model development.
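The 70/15/15 partition described above could be produced along the following lines. This is a dependency-free sketch: the source does not say whether the split was random, chronological, or stratified, so the seeded random shuffle, the seed value, and the function name `split_indices` are assumptions.

```python
import random


def split_indices(n_rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle row indices and cut them into train/validation/test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(5000)
print(len(train_idx), len(val_idx), len(test_idx))  # 3500 750 750
```

For the 5,000-entry dataset this yields 3,500 training rows and 750 rows each for validation and testing.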

Figure 2. Smart meter dataset (https://www.kaggle.com/datasets/ziya07/smart-meter-electricity-consumption-dataset).

The smart meter dataset contains about 5,000 entries, including environmental and contextual factors that influence energy consumption.

3.2. Data preprocessing

Before training the model, the raw data was cleaned and transformed to ensure accuracy and consistency. The following preprocessing steps were applied:

  • Timestamp Conversion: The Timestamp column was converted from string format to a standard datetime format using pandas.to_datetime(), enabling time-based operations such as feature extraction.

  • Handling Missing Values: The dataset was checked for missing values, and none were found. If any had existed, they would have been handled by forward-filling or interpolation.

  • Label Encoding: The Anomaly_Label column, which contains categorical values (“Normal”, “Anomaly”), was encoded into binary format: 0 for “Normal” and 1 for “Anomaly”

This preprocessing ensured that the dataset was clean, structured, and suitable for the anomaly detection model.
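The three preprocessing steps listed above could be expressed in pandas roughly as follows. The column names `Timestamp`, `Electricity_Consumed`, and `Anomaly_Label` come from the text. The sample rows, the timestamp format, and the helper name `preprocess` are hypothetical, and a missing value is planted in the sample only to show the forward-fill step.

```python
import pandas as pd

# Tiny hypothetical sample mimicking the columns named in the text.
raw = pd.DataFrame(
    {
        "Timestamp": ["2024-01-01 00:00:00", "2024-01-01 00:30:00", "2024-01-01 01:00:00"],
        "Electricity_Consumed": [0.45, None, 0.92],
        "Anomaly_Label": ["Normal", "Normal", "Anomaly"],
    }
)


def preprocess(df):
    """Apply the three preprocessing steps to a copy of the input frame."""
    out = df.copy()
    # 1. Timestamp conversion: string -> datetime64.
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    out = out.sort_values("Timestamp")
    # 2. Missing values: forward-fill (the original dataset had none).
    out["Electricity_Consumed"] = out["Electricity_Consumed"].ffill()
    # 3. Label encoding: "Normal" -> 0, "Anomaly" -> 1.
    out["Anomaly_Label"] = out["Anomaly_Label"].map({"Normal": 0, "Anomaly": 1})
    return out


clean = preprocess(raw)
```

Sorting by timestamp before forward-filling ensures each gap is filled from the reading that actually preceded it.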

3.3. Feature extraction

After preprocessing, specific features were selected to train the Support Vector Machine model. These features were chosen based on their relevance to identifying unusual patterns in electricity usage.20

Electricity_Consumed: This is the primary variable of interest, indicating how much electricity was used at each interval.

Avg_Past_Consumption: Provides context by indicating the average usage in the past, which helps detect spikes or drops in consumption relative to usual behavior.

Timestamp: Another variable of interest, indicating the time of each reading; timestamps are recorded at half-hour (30-minute) intervals.

Anomaly_Label: The target column, which labels each reading as “Normal” or “Anomaly”. It is essential because the model uses supervised machine learning, which requires labelled data.

Humidity: Included to check for correlation between humidity and anomalies.

Temperature: Included to check for correlation between temperature and anomalies.
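Two of the ideas above, deriving a time-of-day value from the timestamp and checking how a weather variable correlates with the anomaly label, can be illustrated with the standard library alone. The readings below are hypothetical and not taken from the dataset. The timestamp format and the helper names `hour_of_day` and `pearson` are assumptions made for this sketch.

```python
from datetime import datetime
from math import sqrt


def hour_of_day(timestamp):
    """Fractional hour (0.0 to 23.5 for half-hourly data) from a timestamp string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.hour + parsed.minute / 60


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical readings, not taken from the dataset.
timestamps = ["2024-01-01 00:00:00", "2024-01-01 00:30:00", "2024-01-01 13:30:00"]
humidity = [40.0, 55.0, 70.0, 85.0]
labels = [0, 0, 1, 1]  # 0 = normal, 1 = anomaly

hours = [hour_of_day(t) for t in timestamps]  # [0.0, 0.5, 13.5]
humidity_corr = pearson(humidity, labels)     # about 0.894 for this toy sample
```

With a 0/1 label, the Pearson coefficient is the point-biserial correlation, so a value near zero on the real data would suggest the variable carries little linear signal about anomalies.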

4. Results and discussion

After the Support Vector Machine (SVM) model was trained and evaluated for anomaly detection in electricity consumption, the system was tested on both normal and anomalous data points extracted from smart meter records. The model successfully identified deviations in energy usage patterns that aligned with abnormal operational events such as sudden spikes, unusually low consumption, or irregular load behavior during non-operational hours. The visualizations in Figures 3 to 5 not only support the detection mechanism but also highlight the interpretability of the results, which is essential for practical adoption by facility managers.

An energy usage time series is shown in Figure 3. With distinct peaks and troughs that match normal household or system demand cycles, the figure illustrates the temporal dynamics of electricity usage. Anomalies in this series appear as erratic variations that depart from typical consumption patterns, for example abrupt spikes or declines that are out of sync with the overall rhythm of energy usage. Because anomalies are embedded within long sequences of typical consumption, they are visually scarce and often inconspicuous, which highlights the difficulty of anomaly detection. The figure thus provides important context for why anomaly detection algorithms need to be sensitive to both local and global variations in energy usage.

Figure 3. Time series visualization of energy consumption.

The graph shows deviations in energy consumption patterns, such as sudden spikes and irregular load behavior.

A bar chart comparing the actual and predicted counts of events (normal and anomalous) is shown in Figure 4, giving a clear visual evaluation of the model’s performance. The predicted counts for normal events are close to the actual counts, but a notable disparity appears for anomalies. In particular, the model’s bias toward the majority class shows in its underprediction of anomalies. This mismatch visually explains the lower recall and F1 scores obtained for anomalies. Figure 4 therefore functions as a performance diagnostic, showing that the classifier captures typical behavior well while under-detecting anomalies.

Figure 4. Bar chart showing Actual and Predicted Count.

The bar chart plots count against class, comparing the actual and predicted outcomes for normal and anomalous energy consumption events.

The distribution of normal versus anomaly counts across the sample is depicted in a bar chart in Figure 5. The data display a strong imbalance, with many more normal instances than anomalies, consistent with the actual class counts in Figure 4. This skew directly affects the classifier’s behavior, making it favor accurate classification of normal samples over anomaly detection. The figure highlights the underlying reason for the weaker anomaly performance metrics: the rarity of anomalies in the dataset constrains the model’s capacity to learn their distinctive patterns.

Figure 5. Bar chart showing Normal and Anomaly Count.

The bar chart shows the distribution of normal versus anomaly counts throughout the sample.
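The effect of class imbalance on accuracy can be made concrete with a trivial baseline. The sketch below uses a hypothetical sample with 5% anomalies (the actual class ratio is not stated in the text) and a "classifier" that never flags anything. The function names are illustrative.

```python
def accuracy(actual, predicted):
    """Share of labels predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive=1):
    """Share of true positives that were flagged."""
    flagged = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    total = sum(1 for a in actual if a == positive)
    return flagged / total


# Hypothetical imbalanced sample: 95 normal readings (0), 5 anomalies (1).
actual = [0] * 95 + [1] * 5
always_normal = [0] * 100  # a baseline that never flags an anomaly

baseline_accuracy = accuracy(actual, always_normal)  # 0.95
baseline_recall = recall(actual, always_normal)      # 0.0
```

The baseline scores 95% accuracy while detecting no anomalies at all, which is why accuracy on its own says little about anomaly detection quality on skewed data.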

Specifically, the system achieved:

  • Accuracy (0.97): A measure of the overall correctness of predictions. While accuracy was high, it was not considered the most reliable indicator due to class imbalance (fewer anomalies than normal cases).

  • Precision (0.79): The fraction of correctly identified anomalies out of all flagged anomalies. The precision value demonstrated the model’s ability to limit false alarms, which is critical for ensuring operators’ trust in the system.

  • Recall (0.68): The fraction of true anomalies detected. The recall score indicated that the system captured about two-thirds of the anomalous behavior, including cases that might otherwise have been missed by manual rule-based monitoring. It was the weakest of the four metrics, consistent with the class imbalance.

  • F1 Score (0.73): The harmonic mean of precision and recall. This balanced metric reflected the trade-off between missing anomalies (false negatives) and over-flagging normal points (false positives).
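The metric definitions above can be written out directly. The sketch below first checks that the precision (0.79) and recall (0.68) reported in the abstract are consistent with the reported F1 score (0.73), then computes all three metrics from confusion-matrix counts. The counts are hypothetical, chosen only for illustration, and the function names are not from the source.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # flagged anomalies that were real
    recall = tp / (tp + fn)     # real anomalies that were flagged
    return precision, recall, f1_from(precision, recall)


# Consistency check on the reported precision and recall.
reported_f1 = f1_from(0.79, 0.68)
print(round(reported_f1, 2))  # 0.73

# Hypothetical counts for illustration only (not from the paper).
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1 score sits closer to the recall than a simple average would.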

5. Conclusion

This research focused on the development of a dynamic anomaly detection system for electricity consumption using Support Vector Machines (SVM). The system was developed with the aim of providing a reliable and intelligent approach to identify irregular consumption patterns that could signal equipment faults, unauthorized usage, or inefficiencies. The study concludes that machine learning-based approaches, particularly Support Vector Machines, offer a superior method of anomaly detection in electricity consumption compared to manual or rule-based methods. The research demonstrated that AI systems can learn complex, non-linear consumption patterns and can provide early warning signs of anomalies that may otherwise go unnoticed. One of the most significant insights gained is the importance of data preprocessing and feature engineering. Clean, structured, and well-prepared data is the foundation upon which effective models are built. The study also emphasized that accuracy alone is not a reliable performance metric in imbalanced classification tasks; instead, a combination of precision, recall, and F1-score gives a more comprehensive evaluation.

Authors’ information

Chidi Ukamaka Betrand: 0000-0003-0452-375X

Chinwe Gilean Onukwugha: 0000-0001-6462-4662

Oluchukwu Uzoamaka Ekwealor: 0000-0001-8950-4544

Douglas Allswell Kelechi: 0009-0007-5627-9345

Mercy Eberechi Benson Emenike: 0000-0003-1771-5806

Nneka Martina Oragba: 0000-0001-6680-6347

Donatus Onyedikachi Njoku: 0000-0001-6309-7493

Christopher Ifeanyi Ofoegbu: 0000-0001-6462-4662

Toochi Chima Ewunonu: 0000-0003-3152-6788

Chidimma Lilian Okpalla: 000-0002-0560-1871

how to cite this article
Betrand C, Onukwugha C, Ekwealor O et al. Anomaly Detection in building energy system using machine learning technique [version 1; peer review: awaiting peer review]. F1000Research 2026, 15:917 (https://doi.org/10.12688/f1000research.182125.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.