Research Article

Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms

[version 1; peer review: 2 approved with reservations, 1 not approved]
PUBLISHED 10 Sep 2019

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Mathematical, Physical, and Computational Sciences collection.

Abstract

In this paper, we consider the problem of thermostatically controlled load (TCL) control through dynamic electricity prices, under partial observability of the environment and uncertainty of the control response. The problem is formulated as a Markov decision process where an agent must find a near-optimal pricing scheme using partial observations of the state and action. We propose a long short-term memory (LSTM) network to learn the individual behaviors of TCL units. We use the aggregated information to predict the response of the TCL cluster to a pricing policy. We use this prediction model in a genetic algorithm to find the best prices in terms of profit maximization in an energy arbitrage operation. The simulation results show that the proposed method offers a profit equal to 96% of the theoretical optimal solution.

Keywords

Artificial intelligence, Artificial neural networks, Customer behavior learning, Demand response programs, Energy arbitrage, LSTM, Partial observability, Price elasticity of demand, Profit maximization, Smart grid, thermostatically controlled loads.

Abbreviations

DR           Demand Response

GA           Genetic Algorithm

LSTM       Long Short-Term Memory

MDP         Markov Decision Process

TCL         Thermostatically controlled load

Indices

n               Index for TCL units, n = 1, 2, …, N

t                Index for time step, t = 1, 2, …, 24

Parameters

f                Transition function

g               State approximation function

H              Control horizon

Lmax         Maximum load capacity

Lt,max       Load threshold at time t

N              Number of TCL units to control

PN            Population size

Rmax        Revenue cap

U              Action space

W              Random process space

X              State space

Variables

C            Candidate state vector in LSTM network

Ct           Cost function at time t

ΔTt         Gap between the outdoor and indoor temperatures [°C]

h             Control policy

ht            Hidden state vector of LSTM network

In,t          Input matrix of LSTM network

Pt           Selling electricity price at time t [€ cent/kWh]

pt           Wholesale electricity price at time t [€ cent/kWh]

Pt,max     Maximum selling price at time t [€ cent/kWh]

Pt,min      Minimum selling price at time t [€ cent/kWh]

Pw          Probability distribution

p             Control action reward

Tt           Temperature at time t [°C]

ut           Control action at time t

xt           State at time t

Introduction

In a power network relying on distributed and renewable energy resources, the exploration of new sources of flexibility is a key factor for stability. Given the intermittent nature of renewable energy resources, it is challenging to maintain the power balance under normal operating conditions in a grid with deep penetration of these resources. Therefore, greater integration of renewable resources increases the need for ancillary services such as regulation reserve and load following requirements1. However, using traditional fossil fuel generators to provide these reserves would decrease the net carbon benefit from renewables, weaken generation efficiency and be economically untenable. Alternatively, demand-side resources can play a key role in supplying the regulation service needed for deep renewable integration with zero-emission operation. Demand-side resources such as thermostatically controlled loads (TCLs), electric vehicles and strategic storage can contribute to ancillary services by acting as a source of flexibility for the grid. Unlike traditional demand-side management programs, such as peak load shaving and emergency load management, the exploitation of the higher flexibility of the above-mentioned loads has great potential to offer more lucrative and faster ancillary services. The potential of these sources of flexibility is reflected in the energy market: electricity prices fluctuate according to the availability of and demand for energy, which opens considerable opportunities for energy arbitrage2.

A significant potential for the provision of flexibility resides in TCLs such as air conditioners (ACs), heat pumps, water heaters, and refrigerators. TCLs represent a high percentage of total electricity consumption3,4. The nature of TCLs permits them to act as thermal storage, which makes it possible to adjust their electricity consumption while maintaining the temperature requirements and comfort level of the end user. The idea of TCL flexibility relies on the principle that the temperature constraints specified by the users can be fulfilled by different power trajectories. Finding the optimal trajectory that provides the required flexibility and a highly lucrative ancillary service is the subject of several studies5–7. However, this problem requires real-time information about the state of the TCLs, their envelope temperature and their behavior in response to temperature dynamics. In most cases, this information is only partially available and requires qualitative or quantitative models to estimate it. It is also possible to use model-free approaches to handle the uncertainty and find near-optimal power trajectories2.

The optimal power trajectory for a cluster of TCLs is then translated into individual or aggregated control signals using a variety of control methods. Control methods can be categorized into intrusive forms, including direct and indirect control, and a non-intrusive form using price proxies. Direct intrusive control consists of directly switching the on/off states of the TCLs; indirect intrusive control consists of adjusting the parameters of the TCLs, such as the temperature set points and the switch cycles; and non-intrusive control uses dynamic prices to steer the consumption of TCLs, relying on price-based demand response programs. The intrusive form requires an aggregator to contract with each TCL unit holder to take control of their TCLs, with the condition that their temperature constraints will be respected throughout the control period. The non-intrusive approach relies on the end user’s involvement and response to a given control signal in return for a certain incentive or special pricing. The users’ response to these signals can also be an automatic response to electricity prices throughout the day using home energy management systems or embedded TCL controllers8.

Intrusive control of TCLs has great potential to offer a wide range of flexibility and market opportunities for aggregators. It offers a faster response to control signals and permits the design of a more reliable energy arbitrage strategy compared to non-intrusive control through price proxies. However, implementing the technological requirements for intrusive control on a large scale can be challenging due to the high financial requirements. Additionally, the question of whether consumers are ready to give up control of their TCLs to an external party can also be a barrier to the implementation of these programs. According to 9, the integration of end users in demand response (DR) programs is a key factor for their success. Several smart grid projects were analyzed from this perspective, and the conclusions suggest that more attention should be given to the domestication of these technologies and their adaptation to the users’ experience, considering social dimensions such as individual behavior, education, and income level9–11. It is therefore necessary to include all these factors in the design of a DR program. Non-intrusive control, on the other hand, has fewer constraints regarding users’ comfort and data privacy. It makes end users feel included in the decision making of the grid and involved in energy management. This discussion can serve as a benchmark when choosing the control strategy and implementing a large-scale DR program.

In our paper, we choose to implement non-intrusive control using dynamic electricity prices. We first formulate the problem as a Markov decision process (MDP)12, where the policy consists of a sequence of electricity prices. The agent is assumed to have no prior knowledge or data about the state of the TCL units except their real-time power consumption. The idea is to use data-driven models that can learn the consumption patterns of each individual TCL unit and their response to temperatures and prices. We use a long short-term memory (LSTM) neural network architecture to learn individual TCL units’ behaviors, as in 13. This method can overcome the problem of uncertainty and the diversity of power consumption preferences in response to varying prices. The aggregator uses these models to simulate the aggregate response of the TCLs to different pricing schemes during a certain control horizon. An optimization algorithm is then applied to find the best pricing strategy given an objective function. When controlling a cluster of TCLs, different objective functions are considered in the literature, such as tracking a balancing signal7 or energy arbitrage5. In this work we adopt an energy arbitrage objective function, where we maximize the profit of an aggregator that buys electricity from the wholesale market and sells it in the retail market to end users with TCL units. A genetic algorithm is implemented to find the best pricing solution for the aggregate TCLs.

Related work and contributions

The literature contains extensive research concerning TCL control and their flexibility potential.

TCL control approaches

Most early studies, as well as current work, focus on direct intrusive control methods and frameworks. Early work on aggregated modeling of TCLs can be found in 14 and 15. The solution computation and controller design of these approaches is considerably difficult, which represents a drawback. These issues were mitigated in more recent works5,7,16 using a different class of linear population-bin transition models based on Markov chains. Other approaches have proposed time-varying battery models with dissipation, such as 17, or without dissipation, as in 18. These approaches were used to compute near-optimal control trajectories at a reduced computational cost. Although optimal pricing for demand-side management has been thoroughly studied in the literature19–21, the price-based control of TCLs remains only briefly addressed. In 22, the operating reserve capacity of aggregated heterogeneous TCLs was evaluated using a TCL model that takes consumer behavior into consideration. The price-based approach was also addressed from the consumer perspective in 23. The objective of the proposed method was mainly to find the optimal set point change in response to electricity prices in order to minimize the increase in the electricity bill due to dynamic pricing. The power gain from this control scheme was then used for load following supply. Another approach was proposed to find the equilibrium between electricity prices and users’ comfort: using a Stackelberg game approach, the authors in 24 presented a unique Stackelberg equilibrium that maximizes the utility function and minimizes the dissatisfaction cost of TCL users. A similar approach was proposed in 25 and 26 using a mean-field game to find the best pricing scheme, considering TCLs as price-responsive rational agents.

Deep learning-based models for TCL control with partial observability

Deep learning and other machine learning methods are widely applied in DR programs27. The implementation of a TCL cluster control program faces the problem of uncertainty and heterogeneity of the TCL units’ behaviors in response to control prices. Consequently, many researchers have been interested in machine learning models that can learn the aggregate or individual behavior of TCL units under partial observability. A model-free reinforcement learning method was proposed early on in 28 for TCL control, giving results similar to model predictive approaches. Reinforcement learning approaches were also used in 29 to control domestic water buffers according to local photovoltaic production for the maximization of self-consumption. More recently, the success of deep reinforcement learning has inspired more researchers to tackle the problem of direct TCL control. The authors of 30–33 used different deep neural architectures for automatic estimation of the TCLs’ state features in a batch reinforcement learning model. The same authors later provided a comparison of the different architectures in 33,34, where the LSTM architecture outperformed the other deep neural network architectures. These works focused only on deep Q-learning, which is based on the estimation of a quality function for every potential action before performing the optimization. In 35, a deep policy gradient method was explored along with deep Q-learning for online energy optimization of buildings.

Contributions

Following the above-mentioned literature and the success of LSTM networks in mitigating the problem of partial state information and solving the long-term dependency problem13,33,34, we propose a two-step pricing optimization method for the exploitation of TCL flexibility in energy arbitrage. This paper addresses the need for new non-intrusive TCL control methods via electricity price proxies, so far lacking in the scientific literature. The proposed method relies on LSTM networks to learn individual TCL unit behavior and predict individual responses to electricity prices. The individual predictions are aggregated to form an overall prediction model. This model is used in a genetic algorithm (GA)-based optimization algorithm to maximize a retailer’s profit considering grid and energy cost constraints. To the best of the authors’ knowledge, this is the first work that uses LSTM networks in a non-intrusive TCL control problem based on electricity prices within a DR program. The main contributions of this paper are the following:

  • An MDP formulation of the price control problem where the policy is the set of electricity prices during a control horizon.

  • An LSTM network for learning the individual behavior of TCL units in response to electricity prices and temperatures.

  • An aggregation of individual TCL units’ behaviors, in response to prices, to derive a global estimation of the potential response of the TCL units cluster.

  • A genetic algorithm that uses the aggregated information from the LSTM networks to optimize the lucrative benefits from an energy arbitrage operation.

Problem formulation

We consider a cluster of residential households powered by electricity from the same retailer or utility company. The households are equipped with smart meters and TCLs that can react to electricity prices and indoor temperatures. The retailer implements a price-based DR program that announces electricity prices for a certain time horizon in such a way as to maximize an objective function. The optimization is based on estimated information about the responsiveness to electricity prices and temperatures. Before discussing the pricing optimization approach, we formulate the problem as an MDP12. An MDP is defined by its state space X, its action space U, and its transition function f, which defines the dynamics between the current state xt ∈ X and the next state xt+1 under a control action ut ∈ U and subject to a random process wt ∈ W with a probability distribution pw(·, xt). The transition equation is defined as follows:

xt+1 = f(xt, ut, wt)        (1)

The objective of this process is to find a policy h: X → U that minimizes or maximizes a cost or reward function throughout the control horizon, starting from a state x1, denoted by:

Rh(x1) = E( Σt=1,…,H ρ(xt, ht, wt) )        (2)

where ρ is the reward or cost at each time step t given an action ht. Unlike classic Q-iteration methods, the policy is characterized directly by the sum of rewards over a time horizon H. The optimization is performed on the set of actions during the time horizon H, and the fitness function is the cost function Rh of the policy h. For each policy h, a corresponding sequence of states is estimated implicitly by the forecasting model.

State and control action description

The agent is only able to measure a partial observation of the true state, i.e., it has no information about the indoor temperatures, resulting in a partially observable Markov decision problem. The observable state space X consists of two variables: the outside temperature and the electric load:

xt = (Lt, Tt)        (3)

Since the observable state space only includes part of the true state, future state transitions cannot be modeled directly. However, following the results of 13, the next-step electric load Lt+1 can be predicted from the outdoor temperature Tt, the electric load Lt and the electricity price Pt+1. The state is therefore extended with sequences of past observations of states and actions, which results in a non-Markovian state.

For each TCL, the electric load is approximated by:

Lt+1 ≈ g(Lt, Tt, Pt+1)        (4)

We assume that forecasts of the outside temperature are available for every future timestep in the control horizon.

The control action ut consists of the electricity price that the retailer announces for each time step of the control horizon. As mentioned earlier, even though the retailer does not control the TCLs directly, we assume that the TCLs react directly to electricity prices. Therefore, the electricity price controls the state by influencing the amount of energy consumed during a timestep t. The next state is then defined by:

xt+1 = f(xt, Pt+1, wt) ≈ (g(Lt, Tt, Pt+1), Tt+1)        (5)
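
To make the transition in (5) concrete, here is a minimal sketch in which `g_stub` stands in for the learned load model; the real g in the paper is an aggregate of per-TCL LSTM predictions, and the coefficients below are invented for illustration only.

```python
# Sketch of the price-driven state transition in Eq. (5).
# `g_stub` is a placeholder for the learned load model g.

def g_stub(load, temperature, next_price):
    """Toy load response: consumption falls as the price rises and
    rises as the outdoor temperature drops (heating case)."""
    return max(0.0, load + 0.1 * (10.0 - temperature) - 0.5 * next_price)

def next_state(state, next_price, next_temperature, g=g_stub):
    """x_{t+1} = (g(L_t, T_t, P_{t+1}), T_{t+1}), as in Eq. (5)."""
    load, temperature = state
    return (g(load, temperature, next_price), next_temperature)

x0 = (5.0, 2.0)   # (L_t [kW], T_t [deg C]), illustrative values
x1 = next_state(x0, next_price=3.0, next_temperature=1.0)
```

Rolling this one-step map forward over the horizon yields the implicit state sequence associated with a pricing policy.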

Objective function

According to the existing literature, the control of TCL clusters can be performed considering different objective functions; for instance, tracking a balancing signal or energy arbitrage. In this work we consider an energy arbitrage problem where a retailer tries to maximize its profit. However, the framework and methods presented here can equally be applied to other objective functions. We consider the profit as the difference between the revenue and the cost function. We assume that the cost function Ct(Lt) is convex and increasing in Lt for each timestep, as formulated in 36:

Ct(Lt) = q·Lt² + pt·Lt + c        (6)

where q > 0 is a constant, pt > 0 is the electricity price in the wholesale market and c > 0 is a fixed cost.

In order to avoid overload during peak times, we introduce a maximum load capacity of the power network, denoted Lt,max, at each timestep. Therefore, we have the following constraint:

Lt = Σn Ln,t ≤ Lt,max,    ∀t ∈ H        (7)

The revenue is the bill that customers pay for the energy used during the time window H:

R = Σt=0,…,H (Lt · Pt)        (8)

Usually, there exists a total revenue cap, denoted Rmax, for the retailer. We therefore add a revenue constraint to improve the acceptability of the retailer’s pricing strategy: without such a constraint, retail prices would keep rising to a level that violates energy regulations and is financially unacceptable to customers. As a result, we have the following constraint:

R < Rmax        (9)

Moreover, for each timestep t ∈ H, we define the minimum and maximum prices, Pt,min and Pt,max, that the retailer (utility company) can offer:

Pt,min ≤ Pt ≤ Pt,max,    ∀t ∈ H        (10)

Pt,min and Pt,max are usually designed based on historical prices, market competition, customers’ acceptability, and the wholesale price. It is reasonable to assume that the price the retailer can offer is greater than the wholesale price for each hour, and that there exists a price cap on retail prices due to retail market competition.

Finally, the control problem, defined as the optimization of the price vector P over the time horizon H, can be modeled as follows:

maxP { R − Σt=0,…,H Ct(Lt) }        (11)

subject to constraints:

R < Rmax        (12)

Lt ≤ Lt,max,    ∀t ∈ H        (13)

Pt,min ≤ Pt ≤ Pt,max,    ∀t ∈ H        (14)
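
The objective (11) and its constraints (12)–(14) can be evaluated for a candidate pricing policy as in the following sketch; all numerical values and the default wholesale prices are illustrative, not the paper’s.

```python
# Sketch of the arbitrage objective (11) with constraints (12)-(14).

def profit(prices, loads, q=0.01, wholesale=None, c=1.0):
    """R - sum_t C_t(L_t): retail revenue (Eq. 8) minus the convex
    wholesale cost C_t(L_t) = q*L_t^2 + p_t*L_t + c (Eq. 6)."""
    if wholesale is None:
        wholesale = [2.0] * len(prices)          # flat p_t, illustrative
    revenue = sum(L * P for L, P in zip(loads, prices))
    cost = sum(q * L**2 + p * L + c for L, p in zip(loads, wholesale))
    return revenue - cost

def feasible(prices, loads, r_max, l_max, p_min, p_max):
    """Check constraints (12)-(14) for a candidate pricing policy."""
    revenue = sum(L * P for L, P in zip(loads, prices))
    return (revenue < r_max                          # (12)
            and all(L <= l_max for L in loads)       # (13)
            and all(p_min <= P <= p_max for P in prices))  # (14)

prices = [3.0, 4.0, 3.5]   # candidate hourly prices [cents/kWh]
loads = [10.0, 8.0, 9.0]   # predicted aggregate loads [kWh]
```

In the full method, `loads` would come from the aggregated LSTM predictions, and `profit` would serve as the GA fitness after the feasibility check.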

Methods and implementation

Given the partial observability of this problem, the methods proposed in this paper are nondeterministic. An LSTM network is used to estimate the next states given an initial state and a pricing policy. The method consists of learning the individual behavior of each TCL agent n using an LSTM network, as illustrated in 13. The N estimation models predict the reaction Ln,t+1 of each TCL to a state xt and a pricing action Pt+1. The overall estimated load Lt is the sum of all the load predictions, as in (7). Given this estimation model, we apply a genetic algorithm to find the best pricing policy.

LSTM networks for state estimation

LSTM networks are recurrent neural networks that consist of memory blocks. These memory blocks replace the summation units in the hidden layers of a standard recurrent neural network. The input vector and the hidden state vector are passed through the forget gate to determine the keeping rate of the cell state components. The same vectors are passed through the input gate to determine how much of the new state candidate C can pass to the new cell state. Finally, the output gate decides how much of the transformed cell state vector is passed to the next hidden state vector ht. Following 13, the proposed LSTM network consists of multiple layers of LSTM cells followed by a fully connected layer, as illustrated in Figure 1. In the case of our model, the input In,t is a 2 × 3 matrix that consists of the electric loads, the temperatures and the electricity prices, as follows:


Figure 1. LSTM Network for TCLs load prediction.

The model uses the information about temperatures, loads and prices in the previous timesteps to predict the load Lt. Since this is a regression problem, the fully connected layer uses a linear activation function.

In,t = ( Ln,t−1   Tt−1   Pt
         Ln,t      Tt      Pt+1 )        (15)

The LSTM network recurrently uses the historical information of loads, temperatures and prices to predict the electric load of an individual TCL n in the next timestep. The aggregation of these predictions gives an approximation of the function g mentioned in the previous section.
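
The gate mechanics described above can be written out explicitly. The following is a single LSTM step in NumPy under assumed dimensions (3 input features, 4 hidden units) with random illustrative weights; the actual model in the paper is a multi-layer network trained as in 13, not this hand-rolled cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step: forget, input and output gates plus the
    candidate cell state. W, U, b stack the four transforms."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])            # forget gate: keep rate of c_prev
    i = sigmoid(z[H:2*H])          # input gate: how much candidate enters
    c_tilde = np.tanh(z[2*H:3*H])  # candidate cell state
    o = sigmoid(z[3*H:4*H])        # output gate
    c = f * c_prev + i * c_tilde   # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                 # 3 features: load, temperature, price
W = rng.normal(size=(4 * n_hid, n_in)) * 0.1
U = rng.normal(size=(4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
x = np.array([5.0, 2.0, 3.0])      # one input row (L_t, T_t, P_{t+1})
h, c = lstm_cell(x, h, c, W, U, b)
```

Feeding the two rows of In,t through this step, then passing the final hidden state to a linear layer, reproduces the prediction path sketched in Figure 1.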

Initially, for each TCL agent n = 1, …, N we train an LSTM network on the historical reactions of these TCLs to prices and temperatures. We assume that a DR program has been implemented over a period long enough to collect a sufficient amount of data on the reactions of TCL agents to prices and temperatures.

Genetic algorithms for price optimization

Due to the discontinuous nature of the objective function and the complicated dependency between the electric load L and the electricity prices P, conventional nonlinear optimization methods are unsuitable for this problem; GA-based optimization algorithms are better suited37. The proposed GA uses rank selection and value encoding38. Each chromosome represents a pricing policy P and consists of a vector of size H. We use uniform crossover39 and non-uniform mutation40. The constraints are handled by the approach proposed in 41.

The proposed GA-based optimization algorithms for TCL pricing control are given in Algorithm 1 and Algorithm 2.

Algorithm 1. GA-based optimization algorithm for TCL pricing control.

1:       Population Initialization, i.e., generating a population of PN chromosomes randomly; each chromosome denotes a pricing policy for the next time horizon H.

2:       for i=1 to PN do

3:           Concatenate the price vector to the temperature forecasts of the next time horizon.

4:           for each TCL agent n in N do:

5:                Use LSTM network iteratively to predict (Ln,t)t∈H using Algorithm 2.

6:           end for

7:           Calculate Lt, Ct(Lt) ∀tH, and R

8:           Check the feasibility of policy P regarding the constraints. Handle the invalid individuals by the approach proposed in []. Then calculate the fitness value of policy P.

9:       end for

10:      Create a new generation of chromosomes by using the selection, crossover, and mutation operations of the GA.

11:      Repeat steps 2–11 until the stopping condition is reached.

12:      Announce the best price vector via the two-way communication infrastructure at the beginning of the control horizon.

Algorithm 2. Individual TCL load prediction using LSTM network.

1:      Build the initial input matrix In,0 using the initial values of prices, loads and temperatures.

2:      for t=0 to H do

3:          Use the input matrix In,t to predict Ln,t+1

4:          Concatenate L, T and P with the last line of the input matrix In,t to build the next input matrix:

In,t+1 = ( Ln,t      Tt      Pt+1
           Ln,t+1   Tt+1   Pt+2 )

5:       end for

6:       return (Ln,t)t∈H
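
Algorithm 2’s rolling window can be sketched as follows, with a stand-in one-step predictor in place of the trained LSTM network (the stub’s coefficients are invented for illustration):

```python
# Sketch of Algorithm 2: iterative load prediction over the horizon.

def predict_stub(input_matrix):
    """Placeholder for the LSTM one-step prediction of L_{n,t+1}."""
    (l_prev, t_prev, p), (l_now, t_now, p_next) = input_matrix
    return max(0.0, l_now + 0.1 * (10.0 - t_now) - 0.5 * p_next)

def rollout(l0, temps, prices, horizon, predict=predict_stub):
    """Iteratively predict (L_t) for t in H, sliding the 2x3 input
    window (Eq. 15) one step forward each time."""
    # initial input I_{n,0} built from the initial load, temperature
    # and the first two prices of the candidate policy
    window = [(l0, temps[0], prices[0]), (l0, temps[0], prices[1])]
    loads = []
    for t in range(horizon):
        l_next = predict(window)
        loads.append(l_next)
        # drop the oldest row, append the newly predicted one
        window = [window[1], (l_next, temps[t + 1], prices[t + 2])]
    return loads

temps = [2.0] * 7     # temperature forecast for the horizon
prices = [3.0] * 8    # candidate pricing policy
loads = rollout(5.0, temps, prices, horizon=6)
```

Note that each prediction is fed back as input for the next step, which is why prediction accuracy degrades over long horizons and motivates the 6-hour control horizon used later.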

In Algorithm 1, we initialize a population of PN pricing policies at step 1. For each policy P we perform steps 2–9 to evaluate its fitness and feasibility. The evaluation of policies is performed using the LSTM sequence prediction presented in Algorithm 2. The best policies are selected, and a new generation is created using the crossover and mutation operations in step 10. This process is repeated until a stopping condition or maximum number of iterations is reached. At the end of the optimization process, the best pricing policy is selected, and prices are announced to TCL agents via two-way communication technology. After each control episode, the LSTM learning models are updated with the new data collected from the actual response to the implemented electricity prices.

Results

In this section we evaluate the functionality of the proposed pricing control methods. A set of numerical experiments was performed on a simulation scenario comprising a population of 30 TCLs exposed to dynamic electricity prices during a period where the outdoor temperatures change significantly. The thermal inertia of each TCL allows the electric demand to be shifted towards lower-price moments. The TCL agents determine the amount of electricity to be consumed at each timestep according to the indoor temperature and the electricity prices. The objective of the TCL agents is to maintain a reasonable comfort level while minimizing the electricity bill. Therefore, different TCL agents react differently to a given set of prices and temperatures, depending on the individual user’s preferences and the building’s characteristics. We define a control timestep of 1 hour and a control horizon of 6 hours. The choice of the control horizon is justified by the limited ability of the LSTM to predict long sequences of future electric loads. The control horizon is chosen in a way that minimizes the number of times the retailer runs the control algorithms and announces the prices, while keeping a good accuracy of the LSTM predictions.

Simulation data

Following 13, the simulation data is generated using two fuzzy logic systems with the following assumptions:

  • The TCL agents are reacting to indoor temperatures and electricity prices.

  • The difference between the outdoor and indoor temperature ∆T depends on the building characteristics and the amount of energy spent in heating/cooling in previous timesteps.

TCL agents operate during the day to maintain a comfortable temperature while taking into consideration the electricity price in a given hour. Fuzzy logic is used in this problem because it can model qualitative concepts like “hot temperature” or “low price”. The combination of the two fuzzy logic systems delivers the load Ln,t+1 using the outdoor temperature Tt and the electricity price Pt+1. The simulation is performed with different parameters to generate diverse data for 30 TCL agents. The temperature and price data used for the simulation are taken, respectively, from the Kaisaniemi observation station in Helsinki, available online42, and Elspot day-ahead electricity prices in Finland43, for the period between 1 January 2017 and 7 September 2018. The generated dataset consists of 14,734 data points for each TCL agent.
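
As a toy illustration of the kind of rule such fuzzy systems encode (the actual membership functions and rule base follow 13 and are not reproduced here; everything below is invented):

```python
# Toy fuzzy rule of the kind used to simulate TCL behavior.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def load_response(temperature, price):
    """Rule: IF temperature is low AND price is low THEN load is high.
    Returns a normalized load level in [0.2, 1.0]."""
    cold = tri(temperature, -30.0, -10.0, 10.0)   # "low temperature"
    cheap = tri(price, 0.0, 2.0, 6.0)             # "low price"
    fire = min(cold, cheap)                       # fuzzy AND via minimum
    return 0.2 + 0.8 * fire                       # defuzzified load level
```

A population of agents with differently shaped membership functions yields the heterogeneous price and temperature responses the LSTM networks must learn.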

LSTM networks results

The data generated from the above-mentioned simulations is used to train the LSTM networks to learn the behavior of each individual TCL agent. The hyperparameters and structure of the LSTM networks are chosen according to the results of 13 and summarized in Table 1.

Table 1. Results of LSTM model hyperparameter optimization.

Sequence length           2
LSTM cell size            30
LSTM cells                2
Dropout                   0.2
Activation                ‘tanh’
Recurrent activation      ‘selu’
Optimizer                 ‘rmsprop’

The results are evaluated using validation data generated from the same simulations. Figure 2a illustrates the learning results for three TCL agents during different time periods with different temperatures and prices. Figure 2b compares the real and predicted average power consumption of the cluster of 30 TCL agents. The power curves show that the TCL agents’ responses to prices and temperatures differ slightly. In general, the power consumption is high when the temperatures and electricity prices are low, and vice versa. The comparison between the true and predicted load curves shows a very small prediction error per hour in most cases. The true and predicted load curves have similar shapes and significant resemblance. The peaks and valleys are also predicted accurately in most cases, which gives valuable insight for demand-side management.
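
One simple way to quantify the per-hour prediction error described above is a mean absolute error over the horizon (the metric choice here is ours; the paper reports the error qualitatively, and the sample values below are invented):

```python
# Mean absolute error between true and predicted hourly loads.

def mae(actual, predicted):
    """Average per-hour absolute prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [5.0, 4.2, 3.9, 4.4]      # illustrative true loads [kWh]
predicted = [4.8, 4.3, 4.0, 4.5]   # illustrative LSTM predictions [kWh]
```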


Figure 2. LSTM Learning results.

(a) Power consumption of different TCL agents in response to electricity prices and outdoor temperatures. (b) Average real and predicted power consumption of the cluster surrounded by an envelope containing 9% of the power consumption profiles for different days.

GA Optimization results

We run the GA optimization algorithm on a population of size 100 for 100 iterations. The parameters used for the optimization are summarized in Table 2. The optimization process is presented graphically in Figure 3, where the learning process is measured by the fitness of the best individual in the population at each iteration. Figure 4 illustrates the results of the best pricing solution for one day. Figure 4a illustrates the electricity price fluctuations during the 24 hours. Figure 4b shows the revenue and profit that the retailer would make under original and optimized prices. Figure 4c compares the total power consumption of the cluster under original prices and under optimized prices. Figure 4d presents the daily bill of each user of the cluster under original and optimized prices.

Table 2. Optimization parameters.

PN          100
Lmax        75.0 kWh
q           0.01 €cents/(kWh)²
c           1.0 €cents
Pt,min      pt
Pt,max      2·pt
Rmax        N·H·5.5 €cents

Figure 3. Learning process of a population of size 100.


Figure 4. Results’ comparison of original and optimized pricing policy.

(a) Optimized prices solution for 24 hours. (b) Revenue and profit under original and optimized prices for 24 hours. (c) Total electricity consumption under original and optimized prices. (d) Daily electricity bills under original and optimized prices.

The results show a general increase in prices throughout the day. However, this increase did not result in higher daily electricity bills: most customers would pay a slightly lower amount per day. This is a consequence of the upper limit on the revenue described in (12). The overall electricity consumption decreased compared to the original pricing scheme, which gives a good indication of the potential energy savings that an optimal pricing strategy can offer.

Comparison with a theoretical benchmark

In order to validate the performance of the proposed algorithm, we consider a case where we have full access to the TCL units’ behavior, i.e., the exact electricity consumption of each TCL unit given the temperatures and prices at each timestep. The optimization is performed with direct access to the simulation model described above, which provides full observability and perfect information about the TCLs. This theoretical setup serves as a benchmark for our method: it can be seen as an upper limit on the profit the aggregator could make without violating the constraints.

The results, illustrated in Figure 5a–d, show that the proposed method performs very similarly to the benchmark. The hourly prices in Figure 5a are only slightly shifted from the benchmark prices during most of the day; the difference is significant at only two or three points. The same observation can be made for the revenues and profits in Figure 5b and the electricity consumption in Figure 5c. The comparison of daily bills under optimized and benchmark prices in Figure 5d shows slightly higher bills under the benchmark model for most customers, which can be explained by the slightly higher prices illustrated in Figure 5a.


Figure 5. Comparison of results under the optimized and benchmark pricing policies.

(a) Comparison between benchmark and optimized prices. (b) Hourly revenues and profits under optimized prices and benchmark prices. (c) Hourly total electricity consumption under optimized prices and benchmark prices. (d) Daily electricity bills under optimized and benchmark prices.

The daily revenues and profits under the original, optimized, and benchmark prices are compared in Figure 6. The revenues are very similar in the three cases; the optimized prices yield a slightly smaller revenue than the original and benchmark prices. However, the profit under the original prices is considerably smaller than the profit under the optimized prices, which in turn is only slightly smaller than the benchmark profit. Numerically, the profit from the proposed method is 95.97% of the optimal benchmark profit. This shows that profit can be increased without increasing revenue when the prices are optimized correctly.


Figure 6. Daily revenues and profits under original, optimized and benchmark prices.

Discussion and conclusion

In this paper, we demonstrated the effectiveness of a new TCL control scheme that uses electricity prices as a control proxy. The control policy consists of a sequence of prices influencing the electricity consumption of the TCLs. The problem was formulated as a Markov decision process with a non-Markovian state to handle the sparse observations of the TCL cluster’s state. We extended the observable state with sequences of past observations to approximate the transition function using an LSTM architecture. The LSTM network captures the individual behavior of TCLs under price-based DR, and the individual models are aggregated to approximate the next state of the cluster. This approximation is used iteratively in a genetic algorithm to evaluate the potential profit from an energy arbitrage operation and find the optimal pricing policy for a given control horizon. The LSTM models are updated every 24 hours to capture changes in the TCL units’ behavior.
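As a sketch of how the observable state can be extended with sequences of past observations before being fed to an LSTM, the following windowing helper builds (sample, timestep, feature) tensors. The feature set (price, outdoor temperature, consumption) and window length are illustrative assumptions, not necessarily the paper's exact preprocessing.

```python
import numpy as np

def build_sequences(obs, seq_len):
    """obs: array of shape (T, F), one row per timestep.
    Returns X of shape (T - seq_len, seq_len, F) -- each sample is the
    previous seq_len observations -- and y, the next-step value of the
    last feature (assumed here to be the measured consumption)."""
    X = np.stack([obs[i:i + seq_len] for i in range(len(obs) - seq_len)])
    y = obs[seq_len:, -1]
    return X, y

# Example: 48 hourly observations with 3 features (price, temperature,
# consumption -- hypothetical ordering), using a 6-step history window.
obs = np.random.default_rng(0).random((48, 3))
X, y = build_sequences(obs, seq_len=6)
```

The resulting `X` has exactly the (batch, timesteps, features) layout that recurrent layers such as an LSTM expect, so each prediction is conditioned on the full window of past prices, temperatures, and consumptions rather than on a single Markovian state.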

The experiment consists of a retailer agent buying electricity from the wholesale market and selling it to a group of residential TCLs. The agent can only measure the electricity consumption of each TCL and the outside temperature. The agent also has access to a significant amount of historical data from an already implemented DR program, which allows it to train the LSTM models for each TCL unit and optimize the electricity prices.

We first evaluated the performance of the LSTM network by comparing the real and predicted loads from 30 TCL units during different days. The predicted load profiles closely match the real load profiles at both the individual and aggregate levels. The optimization relies on a genetic algorithm with a profit-maximization objective. The optimization results show that the proposed method offers a much higher daily profit than the original prices, reaching 95.97% of the optimal profit from a model that has full observation of the state.

The flexibility offered by TCLs has high potential for the ancillary services required for a deep integration of renewable energy sources into the grid. An energy arbitrage operation can offer a service to the grid by exploiting this flexibility through direct or indirect control. The partially observable state and the uncertainty of the TCL response to prices were tackled in this paper with an LSTM network using past observations and actions. The LSTM network performed well by extracting relevant features of the hidden state with its internal memory cell, allowing it to process sequences of sparse observations and learn the hidden patterns of power consumption.

Data availability

Underlying data

Figshare: LSTM+GA data, https://doi.org/10.6084/m9.figshare.9746786.v1.

This project contains the following underlying data:

  • Data used by the fuzzy logic simulation model such as temp_prices and temperatures.

  • Data generated by the fuzzy simulator such as fuzzy_outxx.csv and used to train the LSTM models.

  • Data related to the optimization process such as results, GA_pricing, and optimized_prices_loads.

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

Code for analysis available from: https://github.com/tahanakabi/Optimal-Price-Based-control-of-heterogeneous-thermostatically-controlled-loads-under-uncertainty-usi

Archived code as at time of publication: http://doi.org/10.5281/zenodo.3383615

License: MIT

How to cite this article
Nakabi TA and Toivanen P. Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2019, 8:1619 (https://doi.org/10.12688/f1000research.20421.1)

Open Peer Review

Reviewer Report 01 Sep 2020
Shanti Swarup, Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India 
Not Approved
  1. The idea behind TCL is very good and how it effects the price.
     
  2. However, the paper is not properly presented.
     
  3. What is LSTM? long-short-term memory is contradicting
…
Swarup S. Reviewer Report For: Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2019, 8:1619 (https://doi.org/10.5256/f1000research.22445.r68826)
Reviewer Report 01 Sep 2020
Chiou-Jye Huang, Department of Electrical Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi, China 
Approved with Reservations
The authors proposed a long-short-term memory (LSTM) network to learn the individual behaviors of TCL units. The authors use the aggregated information to predict the response of the TCL cluster to the pricing policy. The authors use this prediction model …
Huang CJ. Reviewer Report For: Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2019, 8:1619 (https://doi.org/10.5256/f1000research.22445.r68823)
Reviewer Report 25 Aug 2020
Matsukawa Shun, Smart Grid Power Control Engineering Joint Laboratory, Gifu University, Gifu, Japan 
Approved with Reservations
  1. The sequence length of the LSTM model is so short that it does not show superior predictive ability compared to other models.
     
  2. I couldn't understand the significance of Figure 2 because of
…
Shun M. Reviewer Report For: Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2019, 8:1619 (https://doi.org/10.5256/f1000research.22445.r68829)
