<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.20421.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Optimal price-based control of heterogeneous thermostatically controlled loads under uncertainty using LSTM networks and genetic algorithms</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations, 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Nakabi</surname>
                        <given-names>Taha Abdelhalim</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7103-7036</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Toivanen</surname>
                        <given-names>Pekka</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>School of Computing, University of Eastern Finland, Kuopio, 70211, Finland</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:tahanak@uef.fi">tahanak@uef.fi</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>9</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>1619</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>3</day>
                    <month>9</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Nakabi TA and Toivanen P</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-1619/pdf"/>
            <abstract>
                <p>In this paper, we consider the problem of thermostatically controlled load (TCL) control through dynamic electricity prices, under partial observability of the environment and uncertainty of the control response. The problem is formulated as a Markov decision process where an agent must find a near-optimal pricing scheme using partial observations of the state and action. We propose a long short-term memory (LSTM) network to learn the individual behaviors of TCL units. The aggregated individual predictions are used to predict the response of the TCL cluster to a pricing policy. We use this prediction model in a genetic algorithm to find the prices that maximize profit in an energy arbitrage operation. The simulation results show that the proposed method achieves a profit equal to 96% of the theoretical optimal solution.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Artificial intelligence</kwd>
                <kwd>Artificial neural networks</kwd>
                <kwd>Customer behavior learning</kwd>
                <kwd>Demand response programs</kwd>
                <kwd>Energy arbitrage</kwd>
                <kwd>LSTM</kwd>
                <kwd>Partial observability</kwd>
                <kwd>Price elasticity of demand</kwd>
                <kwd>Profit maximization</kwd>
                <kwd>Smart grid</kwd>
                <kwd>Thermostatically controlled loads</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100004022">
                    <funding-source>Jenny ja Antti Wihurin Rahasto</funding-source>
                </award-group>
                <funding-statement>Jenny and Antti Wihuri Foundation.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec>
            <title>Abbreviations</title>
            <p>DR&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Demand Response</p>
            <p>GA&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Genetic Algorithm</p>
            <p>LSTM&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Long Short-Term Memory</p>
            <p>MDP&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Markov Decision Process</p>
            <p>TCL&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Thermostatically Controlled Load</p>
        </sec>
        <sec>
            <title>Indices</title>
            <p>n&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Index for TCL units, n = 1, 2, &#x2026;, 30</p>
            <p>t&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Index for time step, t = 1, 2, &#x2026;, 24</p>
        </sec>
        <sec>
            <title>Parameters</title>
            <p>
                <italic toggle="yes">f</italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Transition function</p>
            <p>g&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;State approximation function</p>
            <p>H&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Control horizon</p>
            <p>
                <italic toggle="yes">L
                    <sub>max</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Maximum load capacity
                <sup>
                    <xref ref-type="other" rid="FN1">1</xref>
                </sup>
            </p>
            <p>
                <italic toggle="yes">L
                    <sub>max</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Load threshold</p>
            <p>N&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Number of TCL units to control</p>
            <p>PN&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Population size</p>
            <p>Rmax&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Revenue cap</p>
            <p>U&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Action space</p>
            <p>
                <italic toggle="yes">W</italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Random process space</p>
            <p>
                <italic toggle="yes">X</italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;State space</p>
        </sec>
        <sec>
            <title>Variables</title>
            <p>
                <underline>
                    <italic toggle="yes">C</italic>
                </underline>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Candidate state vector in LSTM network</p>
            <p>
                <italic toggle="yes">C
                    <sub>t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Cost function at time t</p>
            <p>&#x0394;T
                <sub>t</sub>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Gap between the outdoor and indoor temperatures [&#x00b0;C]</p>
            <p>
                <italic toggle="yes">h</italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Control policy</p>
            <p>
                <underline>
                    <italic toggle="yes">h
                        <sub>t</sub>
                    </italic>
                </underline>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Hidden state vector of LSTM network</p>
            <p>
                <italic toggle="yes">I
                    <sub>n,t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Input matrix of LSTM network</p>
            <p>
                <italic toggle="yes">P
                    <sub>t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Selling electricity price at time 
                <italic toggle="yes">t</italic> [&#x20ac; cent/kWh]</p>
            <p>
                <italic toggle="yes">P
                    <sub>t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Wholesale electricity price at time 
                <italic toggle="yes">t</italic> [&#x20ac; cent/kWh]</p>
            <p>
                <italic toggle="yes">P
                    <sub>t,max</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Maximum selling price at time 
                <italic toggle="yes">t</italic> [&#x20ac; cent/kWh]</p>
            <p>
                <italic toggle="yes">P
                    <sub>t,min</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Minimum selling price at time 
                <italic toggle="yes">t</italic> [&#x20ac; cent/kWh]</p>
            <p>
                <italic toggle="yes">P
                    <sub>w</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Probability distribution</p>
            <p>
                <italic toggle="yes">p</italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Control action reward</p>
            <p>T
                <sub>t</sub>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Temperature at time 
                <italic toggle="yes">t</italic> [&#x00b0;C]</p>
            <p>
                <italic toggle="yes">u
                    <sub>t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Control action at time 
                <italic toggle="yes">t</italic>
            </p>
            <p>
                <italic toggle="yes">x
                    <sub>t</sub>
                </italic>&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;State at time 
                <italic toggle="yes">t</italic>
            </p>
        </sec>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>In a power network that relies on distributed and renewable energy resources, exploring new sources of flexibility is key to maintaining stability. Given the intermittent nature of renewable energy resources, it is challenging to maintain the power balance under normal operating conditions in a grid with deep penetration of these resources. Greater integration of renewable resources therefore increases the need for ancillary services such as regulation reserves and load following
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. However, using traditional fossil fuel generators to provide these reserves would reduce the net carbon benefit of renewables, lower generation efficiency, and be economically untenable. Alternatively, demand-side resources can play a key role in supplying the regulation service needed for deep renewable integration with zero-emission operation. Demand-side resources such as thermostatically controlled loads (TCLs), electric vehicles and strategic storage can contribute to ancillary services by acting as sources of flexibility for the grid. Unlike traditional demand-side management programs, such as peak load shaving and emergency load management, exploiting the higher flexibility of these loads has great potential to offer more lucrative and faster ancillary services. The potential of these sources of flexibility is reflected in the energy market: electricity prices fluctuate with the availability of and demand for energy, which can open considerable opportunities for energy arbitrage
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>.</p>
            <p>A significant potential for the provision of flexibility resides in TCLs such as air conditioners (ACs), heat pumps, water heaters, and refrigerators. TCLs account for a high percentage of total electricity consumption
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>,
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. The nature of TCLs allows them to act as thermal storage, which makes it possible to adjust their electricity consumption while maintaining the temperature requirements and the comfort level of the end user. The idea of TCL flexibility relies on the principle that the temperature constraints specified by the users can be fulfilled by different power trajectories. Finding the optimal trajectory that provides the required flexibility and highly lucrative ancillary services is the subject of several studies
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. However, this problem requires real-time information about the state of TCLs, their envelope temperature and their behavior in response to temperature dynamics. In most cases, this information is only partially available and must be estimated with qualitative or quantitative models. Model-free approaches can also be used to deal with this uncertainty and find near-optimal power trajectories
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>.</p>
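            <p>The thermal-storage argument above can be illustrated with a common first-order (RC) thermal model of a heated dwelling controlled by a hysteresis thermostat. The sketch below is a minimal illustration under stated assumptions, not the TCL model used in this paper: the resistance R, capacitance C, heating power, outdoor temperature and comfort limits are hypothetical values. Two different deadbands inside the same comfort range yield two different on/off power trajectories, and both keep the indoor temperature within the user&#x2019;s limits.</p>

```python
import math

# Hypothetical parameters of one electrically heated dwelling (illustration only).
R = 2.0                 # thermal resistance [degC/kW]
C = 2.0                 # thermal capacitance [kWh/degC]
Q_HEAT = 12.0           # heat delivered while the unit is ON [kW]
DT = 1.0 / 60           # time step [h], i.e. one minute
T_OUT = 5.0             # constant outdoor temperature [degC]
COMFORT = (19.5, 23.5)  # user comfort limits [degC]


def simulate(lower, upper, t0, steps=24 * 60):
    """First-order RC model driven by a hysteresis (deadband) thermostat."""
    a = math.exp(-DT / (R * C))  # exact per-step decay for C dT/dt = (T_OUT - T)/R + q
    temp, on = t0, False
    temps, powers = [], []
    for _ in range(steps):
        # Hysteresis: switch off above the band, on below it, otherwise keep the state.
        if temp > upper:
            on = False
        elif lower > temp:
            on = True
        q = Q_HEAT if on else 0.0
        temp = a * temp + (1 - a) * (T_OUT + R * q)
        temps.append(temp)
        powers.append(q)
    return temps, powers


# Two deadbands inside the same comfort range give two different power trajectories.
low_temps, low_power = simulate(20.0, 22.0, t0=21.0)
high_temps, high_power = simulate(21.0, 23.0, t0=22.0)

for name, temps, power in [("20-22 degC band", low_temps, low_power),
                           ("21-23 degC band", high_temps, high_power)]:
    energy = sum(power) * DT  # heat delivered over 24 h [kWh]
    ok = min(temps) >= COMFORT[0] and COMFORT[1] >= max(temps)
    print(f"{name}: T in [{min(temps):.2f}, {max(temps):.2f}] degC, "
          f"heat {energy:.1f} kWh, within comfort limits: {ok}")
```

            <p>Shifting the deadband changes when and how much the unit consumes without violating the comfort limits, which is the kind of flexibility a control signal can try to exploit.</p>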
            <p>The optimal power trajectory for a cluster of TCLs is then translated into individual or aggregated control signals using a variety of control methods. Control methods can be categorized into intrusive forms, including direct and indirect control, and a non-intrusive form using price proxies. Direct intrusive control switches the on/off states of the TCLs directly. Indirect intrusive control adjusts TCL parameters, such as the temperature set points and the switching cycles. Non-intrusive control uses dynamic prices to steer the consumption of TCLs through price-based demand response programs. The intrusive form requires an aggregator to contract with each TCL unit holder to take control of their TCLs, on the condition that their temperature constraints are respected throughout the control period. The non-intrusive approach relies on the end user&#x2019;s involvement and response to a given control signal in return for an incentive or special pricing. The users&#x2019; response to these signals can also be automatic, with home energy management systems or embedded TCL controllers reacting to electricity prices throughout the day
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>.</p>
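            <p>The automatic response to prices mentioned above can be pictured as a simple rule inside an embedded TCL controller. The sketch below is hypothetical: the linear setpoint shift, the reference price, the sensitivity and the comfort limits are assumptions chosen for illustration, not a model of actual users. Real households would each differ in these parameters, which is one reason an aggregator that observes only power consumption has to learn their responses from data.</p>

```python
def price_responsive_setpoint(price, base_setpoint=21.0, ref_price=10.0,
                              sensitivity=0.2, comfort=(19.5, 23.5)):
    """Hypothetical heating-mode rule for an embedded TCL controller.

    The setpoint drops by `sensitivity` degC per euro cent/kWh above the reference
    price (and rises below it), then is clipped to the user's comfort limits.
    """
    setpoint = base_setpoint - sensitivity * (price - ref_price)
    return max(comfort[0], min(comfort[1], setpoint))


# Made-up day-ahead price profile announced by a retailer [euro cent/kWh].
prices = [6, 6, 5, 5, 6, 8, 12, 15, 14, 11, 10, 9,
          9, 9, 10, 11, 14, 18, 20, 16, 12, 9, 7, 6]
setpoints = [price_responsive_setpoint(p) for p in prices]
for hour, (p, sp) in enumerate(zip(prices, setpoints)):
    print(f"{hour:02d}:00  price {p:4.1f}  setpoint {sp:5.2f} degC")
```
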
            <p>Intrusive control of TCLs has great potential to offer a wide range of flexibility and market opportunities to aggregators. It offers a faster response to control signals and permits the design of a more reliable energy arbitrage strategy than non-intrusive control through price proxies. However, deploying the technology required for intrusive control on a large scale can be challenging because of its high cost. Additionally, whether consumers are willing to give up control of their TCLs to an external party can be a barrier to the implementation of these programs. According to 
                <xref ref-type="bibr" rid="ref-9">9</xref>, the integration of end users into demand response (DR) programs is a key factor in their success. Several smart grid projects were analyzed from this perspective, and the conclusions suggest that more attention should be given to the domestication of these technologies and their adaptation to the users&#x2019; experience, considering social dimensions such as individual behavior, education, and income level
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>,
                    <xref ref-type="bibr" rid="ref-10">10</xref>,
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>. It is therefore necessary to include all these factors in the design of a DR program. Non-intrusive control, on the other hand, imposes fewer constraints regarding the users&#x2019; comfort and data privacy. It makes end users feel included in the decision making of the grid and involved in energy management. These considerations can guide the choice of control strategy and the implementation of a large-scale DR program.</p>
            <p>In this paper, we choose to implement non-intrusive control using dynamic electricity prices. We first formulate the problem as a Markov decision process (MDP)
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>, where the policy consists of a sequence of electricity prices. The agent is assumed to have no prior knowledge or data about the state of TCL units except their real-time power consumption. The idea is to use data-driven models that can learn the consumption patterns of each individual TCL unit and its response to temperatures and prices. We use a long short-term memory (LSTM) neural network architecture to learn the behaviors of individual TCL units, as in 
                <xref ref-type="bibr" rid="ref-13">13</xref>. This approach can overcome the uncertainty and the diversity of power consumption preferences in response to varying prices. The aggregator uses these models to simulate the aggregate response of the TCLs to different pricing schemes over a given control horizon. An optimization algorithm is then applied to find the best pricing strategy for a given objective function. When controlling a cluster of TCLs, different objective functions have been considered in the literature, such as tracking a balancing signal
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup> or energy arbitrage
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>. In this work, we adopt an energy arbitrage objective function that maximizes the profit of an aggregator that buys electricity from the wholesale market and sells it in the retail market to end users with TCL units. A genetic algorithm is implemented to find the best prices for the aggregated TCLs.</p>
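            <p>The per-unit prediction step described above can be sketched with a standard LSTM cell, using the candidate state vector and hidden state vector listed in the nomenclature. This NumPy sketch only illustrates the data flow under stated assumptions: the three input features, the hidden size, the linear read-out to a load value, one network per unit and the random, untrained weights are hypothetical choices, and the authors&#x2019; architecture and training procedure are not reproduced. Each unit&#x2019;s network maps a 24-hour input sequence to a 24-hour load prediction, and the individual predictions are summed into the aggregate response of the cluster.</p>

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])               # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    c_cand = np.tanh(z[3 * H:])      # candidate state vector
    c_t = f * c_prev + i * c_cand    # cell state: keep part of the memory, add new content
    h_t = o * np.tanh(c_t)           # hidden state vector
    return h_t, c_t


def predict_unit_load(inputs, params):
    """Run one unit's LSTM over a (T, D) sequence; a linear read-out gives the load per step."""
    W, U, b, w_out, b_out = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    loads = []
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        loads.append(float(w_out @ h + b_out))
    return np.array(loads)


def random_params(rng, D, H):
    """Untrained random weights (hypothetical), for illustration only."""
    return (rng.normal(scale=0.1, size=(4 * H, D)),
            rng.normal(scale=0.1, size=(4 * H, H)),
            np.zeros(4 * H),
            rng.normal(scale=0.1, size=H),
            0.0)


rng = np.random.default_rng(0)
D, H, T, N = 3, 8, 24, 30  # input features, hidden units, hours, TCL units
# Hypothetical hourly features per unit, e.g. [price, outdoor temperature, last measured load].
inputs = rng.normal(size=(N, T, D))
unit_params = [random_params(rng, D, H) for _ in range(N)]

# One model per unit; the cluster response is the sum of the individual predictions.
aggregate = sum(predict_unit_load(inputs[n], unit_params[n]) for n in range(N))
print(aggregate.shape)
```
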
        </sec>
        <sec>
            <title>Related work and contributions</title>
            <p>The literature contains extensive research concerning TCL control and their flexibility potential.</p>
            <sec>
                <title>TCL control approaches</title>
                <p>Most early studies, as well as current work, focus on direct intrusive control methods and frameworks. Early work that tackled aggregated modeling of TCLs can be found in 
                    <xref ref-type="bibr" rid="ref-14">14</xref> and 
                    <xref ref-type="bibr" rid="ref-15">15</xref>. Computing solutions and designing controllers for these approaches is considerably difficult, which is a drawback. These issues were mitigated in more recent works
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>,
                        <xref ref-type="bibr" rid="ref-7">7</xref>,
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup> using a different class of models, namely linear population-bin transition models based on Markov chains. Other approaches have proposed time-varying battery models with dissipation, as in 
                    <xref ref-type="bibr" rid="ref-17">17</xref> or without dissipation as in 
                    <xref ref-type="bibr" rid="ref-18">18</xref>. These approaches were used to compute near-optimal control trajectories at a reduced computational cost. Although optimal pricing for demand-side management has been thoroughly studied in the literature
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>, price-based control of TCLs has so far been addressed only briefly in the literature. In 
                    <xref ref-type="bibr" rid="ref-22">22</xref>, the operating reserve capacity of aggregated heterogeneous TCLs was evaluated using a TCL model that takes into consideration consumer behavior. The price-based approach was also addressed from the consumer perspective in 
                    <xref ref-type="bibr" rid="ref-23">23</xref>. The main objective of the proposed method was to find the optimal set point change in response to electricity prices in order to minimize the increase in the electricity bill due to dynamic pricing. The power gained from this control scheme was then used for load following. Another approach was proposed to find the equilibrium between electricity prices and users&#x2019; comfort. Using a Stackelberg game approach, the authors of 
                    <xref ref-type="bibr" rid="ref-24">24</xref> presented a unique Stackelberg equilibrium that maximizes the utility function and minimizes the dissatisfaction cost of TCL users. A similar approach was proposed in 
                    <xref ref-type="bibr" rid="ref-25">25</xref> and 
                    <xref ref-type="bibr" rid="ref-26">26</xref> using a mean-field game approach to find the best pricing scheme considering TCLs as price-responsive rational agents.</p>
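                <p>The idea behind the linear population-bin transition models mentioned above can be shown with a toy example in which the share of a TCL cluster in each on/off and temperature bin evolves through a column-stochastic Markov matrix. The four bins, the transition probabilities and the rated power below are hypothetical; practical models use many more temperature bins. Because the aggregate power is a linear function of the bin shares, the whole cluster is described by a linear state update.</p>

```python
import numpy as np

# Toy population-bin model of a cluster of heating TCLs (all values hypothetical).
# Bins: 0 = OFF/high temperature, 1 = OFF/low temperature,
#       2 = ON/low temperature,   3 = ON/high temperature.
# Entry A[i, j] is the probability of moving from bin j to bin i in one time step,
# so each column sums to 1 (column-stochastic matrix).
A = np.array([
    [0.6, 0.0, 0.0, 0.4],  # into OFF/high: stay, or switched off at the upper limit
    [0.4, 0.6, 0.0, 0.0],  # into OFF/low: cooled down, or stay
    [0.0, 0.4, 0.6, 0.0],  # into ON/low: switched on at the lower limit, or stay
    [0.0, 0.0, 0.4, 0.6],  # into ON/high: heated up, or stay
])
ON_BINS = [2, 3]
N_UNITS = 30     # cluster size, as in the nomenclature (n = 1, ..., 30)
RATED_KW = 4.0   # hypothetical electrical power of one unit when ON

x = np.array([1.0, 0.0, 0.0, 0.0])  # every unit starts OFF at a high temperature
aggregate_kw = []
for t in range(24):
    x = A @ x  # linear population update x_{t+1} = A x_t
    aggregate_kw.append(N_UNITS * RATED_KW * x[ON_BINS].sum())

print(np.round(aggregate_kw, 1))
```
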
            </sec>
            <sec>
                <title>Deep learning-based models for TCL control with partial observability</title>
                <p>Deep learning and other machine learning methods are widely applied in DR programs
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>. The implementation of a TCL cluster control program faces the problem of uncertainty and heterogeneity of the TCL units&#x2019; behaviors in response to control prices. Consequently, many researchers have turned to machine learning models that can learn the aggregate or individual behavior of TCL units under partial observability. A model-free reinforcement learning approach was proposed early on in 
                    <xref ref-type="bibr" rid="ref-28">28</xref> for TCL control, giving results similar to those of model predictive approaches. Reinforcement learning approaches were also used in
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup> to control domestic water buffers according to local photovoltaic production in order to maximize self-consumption. More recently, the success of deep reinforcement learning approaches has inspired more researchers to tackle the problem of direct TCL control using deep reinforcement learning. The authors of 
                    <xref ref-type="bibr" rid="ref-30">30</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-33">33</xref> used different deep neural architectures to automatically extract features of the TCLs&#x2019; state in a batch reinforcement learning model. The same authors later compared the different architectures in 
                    <xref ref-type="bibr" rid="ref-33">33</xref>,
                    <xref ref-type="bibr" rid="ref-34">34</xref>. The LSTM architecture outperformed the other deep neural network architectures. These works focused only on deep Q-learning, which estimates a quality (Q) function for every possible action and then selects the action that maximizes it. In 
                    <xref ref-type="bibr" rid="ref-35">35</xref>, a deep policy gradient method was explored along with deep Q-learning for online energy optimization of buildings.</p>
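                <p>The Q-learning principle described above can be illustrated with its tabular, non-deep form: a value is estimated for every state-action pair, and the control decision is then a maximization over the actions. Deep Q-learning replaces the table with a neural network. The states, actions, rewards, learning rate and discount factor below are hypothetical and serve only to show the update rule.</p>

```python
import numpy as np


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Tabular Q-learning: move Q[s, a] towards r + gamma * max over a' of Q[s_next, a']."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def greedy_action(Q, s):
    """Once every action has a Q-value, the control step is a maximization over actions."""
    return int(np.argmax(Q[s]))


# Toy problem (hypothetical): 3 states, 2 actions (0 = keep the TCL off, 1 = switch it on).
Q = np.zeros((3, 2))
# Observed transitions (state, action, reward, next state), replayed as a small batch.
batch = [(0, 1, 1.0, 1), (1, 0, 0.5, 2), (2, 1, 2.0, 0), (0, 0, 0.0, 0)]
for _ in range(50):
    for s, a, r, s_next in batch:
        q_update(Q, s, a, r, s_next)

print(Q.round(2))
print([greedy_action(Q, s) for s in range(3)])
```
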
            </sec>
            <sec>
                <title>Contributions</title>
                <p>Following the above-mentioned literature and the success of LSTM networks in mitigating the problem of partial state information and in solving long-term dependency problems
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>,
                        <xref ref-type="bibr" rid="ref-33">33</xref>,
                        <xref ref-type="bibr" rid="ref-34">34</xref>
                    </sup>, we propose a two-step pricing optimization method for exploiting TCL flexibility in energy arbitrage. This paper addresses the need for new non-intrusive TCL control methods via electricity price proxies, which have so far been lacking in the scientific literature. The proposed method relies on LSTM networks that learn the behavior of individual TCL units and predict their individual responses to electricity prices. The individual predictions are aggregated to form an overall prediction model. This model is used in a genetic algorithm (GA)-based optimization to maximize a retailer&#x2019;s profit considering grid and energy cost constraints. To the best of the authors&#x2019; knowledge, this is the first work that uses LSTM networks in a non-intrusive TCL control problem based on electricity prices within a DR program. The main contributions of this paper are the following:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>An MDP formulation of the price control problem where the policy is the set of electricity prices during a control horizon.</p>
                    </list-item>
                    <list-item>
                        <p>An LSTM network for learning the individual behavior of TCL units in response to electricity prices and temperatures.</p>
                    </list-item>
                    <list-item>
                        <p>An aggregation of individual TCL units&#x2019; behaviors, in response to prices, to derive a global estimate of the potential response of the TCL cluster.</p>
                    </list-item>
                    <list-item>
                        <p>A genetic algorithm that uses the aggregated information from the LSTM networks to maximize the profit from an energy arbitrage operation.</p>
                    </list-item>
                </list>
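                <p>The optimization step can be sketched as a genetic algorithm that searches over 24-hour price vectors bounded by the minimum and maximum selling prices. This is a minimal sketch under stated assumptions, not the authors&#x2019; implementation: a hypothetical linear price response stands in for the aggregated LSTM predictions, the wholesale price profile, load cap and penalty weight are made up, and the operators (binary tournament selection, uniform crossover, Gaussian mutation with clipping, elitism) are common generic choices. Because the stand-in response is linear, the per-hour profit-maximizing price has a closed form, which gives a reference point for checking the GA.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
T = 24  # hourly control horizon

# Hypothetical market data: wholesale price and selling-price bounds [euro cent/kWh].
wholesale = 4.0 + 2.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, T))
p_min, p_max = np.full(T, 5.0), np.full(T, 20.0)

# Hypothetical linear stand-in for the aggregated LSTM response model.
BASE_LOAD = 60.0   # aggregate load at the reference price [kW]
REF_PRICE = 10.0   # reference price [euro cent/kWh]
SLOPE = 3.0        # load reduction per euro cent/kWh above the reference [kW]
L_MAX = 80.0       # hypothetical aggregate load cap [kW]
PENALTY = 10.0     # penalty weight per kW above the cap


def aggregate_response(prices):
    """Predicted aggregate load [kW] for each hour of a price vector."""
    return np.maximum(0.0, BASE_LOAD - SLOPE * (prices - REF_PRICE))


def fitness(prices):
    """Arbitrage profit [euro] over the horizon, minus a penalty for exceeding L_MAX."""
    load = aggregate_response(prices)
    profit = np.sum((prices - wholesale) * load) / 100.0  # cent/kWh * kW * 1 h -> euro
    overload = np.sum(np.maximum(0.0, load - L_MAX))
    return profit - PENALTY * overload


def tournament(pop, scores):
    """Binary tournament selection: return the fitter of two random individuals."""
    a, b = rng.choice(len(pop), size=2, replace=False)
    return pop[a] if scores[a] > scores[b] else pop[b]


def genetic_search(pop_size=40, generations=150, mutation_scale=0.5, n_elite=2):
    pop = rng.uniform(p_min, p_max, size=(pop_size, T))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        history.append(scores[order[0]])
        children = [pop[i].copy() for i in order[:n_elite]]  # elitism
        while pop_size > len(children):
            parent1, parent2 = tournament(pop, scores), tournament(pop, scores)
            mask = rng.random(T) > 0.5                          # uniform crossover
            child = np.where(mask, parent1, parent2)
            child = child + rng.normal(0.0, mutation_scale, T)  # Gaussian mutation
            children.append(np.clip(child, p_min, p_max))      # keep prices in bounds
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmax(scores))
    return pop[best], scores[best], history


best_prices, best_profit, history = genetic_search()

# With a linear response the per-hour optimum has a closed form, useful as a check.
p_star = np.clip((BASE_LOAD + SLOPE * REF_PRICE + SLOPE * wholesale) / (2 * SLOPE), p_min, p_max)
print(f"GA profit: {best_profit:.2f} euro, closed-form optimum: {fitness(p_star):.2f} euro")
print(np.round(best_prices, 2))
```

                <p>With a learned, non-linear response model such as an aggregate of LSTM predictions, no closed form is available; only the fitness function would need to call the prediction model, while the genetic operators stay unchanged.</p>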
            </sec>
        </sec>
        <sec>
            <title>Problem formulation</title>
            <p>We consider a cluster of residential households powered by electricity from the same retailer or utility company. The households are equipped with smart meters and TCLs that can react to electricity prices and indoor temperatures. The retailer implements a price-based DR program that announces electricity prices for a given time horizon, chosen to maximize an objective function. The optimization is based on estimated information about the households&#x2019; responsiveness to electricity prices and temperatures. Before discussing the pricing optimization approach, we formulate the problem as an MDP
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>. An MDP is defined by its state space 
                <italic toggle="yes">X</italic>, its action space 
                <italic toggle="yes">U</italic>, and its transition function 
                <italic toggle="yes">f</italic>, which describes how the current state 
                <italic toggle="yes">x
                    <sub>t</sub>
                </italic> &#x2208; 
                <italic toggle="yes">X</italic> evolves into the next state 
                <italic toggle="yes">x
                    <sub>t+1</sub>
                </italic> under a control action 
                <italic toggle="yes">u
                    <sub>t</sub>
                </italic> &#x2208; 
                <italic toggle="yes">U</italic>, subject to a random process 
                <italic toggle="yes">w</italic> &#x2208; 
                <italic toggle="yes">W</italic> with probability distribution 
                <italic toggle="yes">p
                    <sub>w</sub>
                </italic>(&#x00b7;|
                <italic toggle="yes">x
                    <sub>t</sub>
                </italic>). The transition equation is:</p>
            <p>
                <disp-formula id="e1">
                    <mml:math display="block" id="math1">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mrow>
                                    <mml:mi>t</mml:mi>
                                    <mml:mo>+</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mi>f</mml:mi>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mi>t</mml:mi>
                            </mml:msub>
                            <mml:mo>,</mml:mo>
                            <mml:msub>
                                <mml:mi>u</mml:mi>
                                <mml:mi>t</mml:mi>
                            </mml:msub>
                            <mml:mo>,</mml:mo>
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>t</mml:mi>
                            </mml:msub>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                        <mml:mspace width="12.5em"/>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mn>1</mml:mn>
                        <mml:mo stretchy="false">)</mml:mo>
                    </mml:math>
                </disp-formula>
            </p>
            <p>The objective is to find a policy 
                <italic toggle="yes">h: X</italic>&#x2192;
                <italic toggle="yes">U</italic> that minimizes a cumulative cost, or maximizes a cumulative reward, over the control horizon, starting from a state 
                <italic toggle="yes">x</italic>
                <sub>1</sub>:</p>
            <p>
                <disp-formula id="e2">
                    <mml:math display="block" id="math2">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mi>R</mml:mi>
                                <mml:mi>h</mml:mi>
                            </mml:msub>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mn>1</mml:mn>
                            </mml:msub>
                            <mml:mo stretchy="false">)</mml:mo>
                            <mml:mo>=</mml:mo>
                            <mml:mi>E</mml:mi>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mstyle displaystyle="true">
                                <mml:munder>
                                    <mml:mo>&#x2211;</mml:mo>
                                    <mml:mi>t</mml:mi>
                                </mml:munder>
                                <mml:mrow>
                                    <mml:mi>&#x03c1;</mml:mi>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:msub>
                                        <mml:mi>x</mml:mi>
                                        <mml:mi>t</mml:mi>
                                    </mml:msub>
                                    <mml:mo>,</mml:mo>
                                    <mml:msub>
                                        <mml:mi>h</mml:mi>
                                        <mml:mi>t</mml:mi>
                                    </mml:msub>
                                    <mml:mo>,</mml:mo>
                                    <mml:msub>
                                        <mml:mi>w</mml:mi>
                                        <mml:mi>t</mml:mi>
                                    </mml:msub>
                                    <mml:mo stretchy="false">)</mml:mo>
                                    <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                            </mml:mstyle>
                        </mml:mrow>
                        <mml:mspace width="8.6em"/>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mn>2</mml:mn>
                        <mml:mo stretchy="false">)</mml:mo>
                    </mml:math>
                </disp-formula>
            </p>
            <p>where 
                <italic toggle="yes">&#x03c1;</italic> is the reward (or cost) of time step 
                <italic toggle="yes">t</italic> given the action 
                <italic toggle="yes">h
                    <sub>t</sub>
                </italic>. Unlike classic Q-iteration methods, which estimate a state-action value function, the policy here is evaluated directly through the sum of rewards over a time horizon 
                <italic toggle="yes">H</italic>. The optimization is performed over the set of actions within the horizon 
                <italic toggle="yes">H</italic>, and the fitness function is the objective 
                <italic toggle="yes">R
                    <sub>h</sub>
                </italic> of the policy 
                <italic toggle="yes">h</italic>. For each policy 
                <italic toggle="yes">h</italic>, the corresponding sequence of states is estimated implicitly by the forecasting model.</p>
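            <p>When the policy is an open-loop price sequence, as here, the expectation defining 
                <italic toggle="yes">R
                    <sub>h</sub>
                </italic>(
                <italic toggle="yes">x</italic>
                <sub>1</sub>) can be approximated by Monte Carlo simulation: simulate many trajectories under random disturbances and average their summed rewards. The sketch below assumes a hypothetical scalar transition function and per-step reward. Neither is the model used in the paper, and all coefficients are arbitrary.</p>
```python
import random


def transition(load, temp, price, noise):
    """Hypothetical f(x_t, u_t, w_t): load relaxes toward a temperature- and
    price-dependent level, perturbed by the random disturbance w_t."""
    target = max(0.0, 0.3 * (temp - 18.0) + 2.0 - 5.0 * price)
    return max(0.0, 0.5 * load + 0.5 * target + noise)


def reward(load, price, wholesale=0.10):
    """Hypothetical per-step reward rho: margin on the energy sold."""
    return load * (price - wholesale)


def estimate_return(prices, temps, load0, n_rollouts=2000, sigma=0.2, seed=0):
    """Monte Carlo estimate of R_h(x_1) for an open-loop price sequence h."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        load, ret = load0, 0.0
        for price, temp in zip(prices, temps):
            load = transition(load, temp, price, rng.gauss(0.0, sigma))
            ret += reward(load, price)
        total += ret
    return total / n_rollouts


temps = [24.0, 26.0, 28.0, 27.0]
print(estimate_return([0.20, 0.25, 0.30, 0.25], temps, load0=2.0))
```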
            <sec>
                <title>State and control action description</title>
                <p>The agent can only observe part of the true state. In particular, the indoor temperatures are not measured, which makes the problem a partially observable Markov decision process. The observable state space 
                    <italic toggle="yes">X</italic> consists of two variables, the electric load and the outside temperature:</p>
                <p>
                    <disp-formula id="e3">
                        <mml:math display="block" id="math3">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>x</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>=</mml:mo>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>,</mml:mo>
                                <mml:msub>
                                    <mml:mi>T</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                            <mml:mspace width="15.5em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>3</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>Since the observable state includes only part of the true state, future state transitions cannot be modeled directly. Nevertheless, following the results of earlier work
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>, the next-step electric load 
                    <italic toggle="yes">L
                        <sub>t+1</sub>
                    </italic> can be predicted from the outdoor temperature 
                    <italic toggle="yes">T
                        <sub>t</sub>
                    </italic>, the electric load 
                    <italic toggle="yes">L
                        <sub>t</sub>
                    </italic> and the electricity price 
                    <italic toggle="yes">P
                        <sub>t+1</sub>
                    </italic>. To compensate for the unobserved indoor temperatures, the state is extended with sequences of past observations of states and actions, since the current observation alone does not satisfy the Markov property.</p>
                <p>For each TCL, the electric load is approximated by:</p>
                <p>
                    <disp-formula id="e4">
                        <mml:math display="block" id="math4">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>+</mml:mo>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>&#x223c;</mml:mo>
                                <mml:mi>g</mml:mi>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>,</mml:mo>
                                <mml:msub>
                                    <mml:mi>T</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>,</mml:mo>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>+</mml:mo>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                            <mml:mspace width="11.5em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>4</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
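                <p>In this work, the approximator 
                    <italic toggle="yes">g</italic> is implemented with LSTM networks. To make the recurrence concrete, the sketch below writes out the forward pass of a single-layer LSTM with a linear read-out in plain Python. The layer sizes and the random, untrained weights are illustrative assumptions. In practice, the weights would be fitted to the historical data of each TCL, typically with a deep learning library.</p>
```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single-layer LSTM with a linear read-out (untrained, illustrative)."""

    def __init__(self, n_in=3, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        cols = n_in + n_hidden + 1  # current input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                         for _ in range(n_hidden)]
                  for gate in "ifog"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]

    def predict(self, sequence):
        """Map a sequence of (L_k, T_k, P_{k+1}) triples to an estimate of L_{t+1}."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            z = list(x) + h + [1.0]
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            # Cell update c = f*c + i*g, hidden state h = o*tanh(c).
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return sum(w * v for w, v in zip(self.w_out, h + [1.0]))


model = TinyLSTM()
print(model.predict([(1.0, 20.0, 0.1), (1.2, 21.0, 0.2)]))
```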
                <p>We assume that outside temperature forecasts are available for every future timestep of the control horizon.</p>
                <p>The control action 
                    <italic toggle="yes">u
                        <sub>t</sub>
                    </italic> is the electricity price that the retailer announces for each time step of the control horizon. As mentioned earlier, although the retailer does not control the TCLs directly, we assume that the TCLs respond directly to electricity prices. The electricity price therefore steers the state by influencing the amount of energy consumed during timestep 
                    <italic toggle="yes">t</italic>. The next state is then given by:</p>
                <p>
                    <disp-formula id="e5">
                        <mml:math display="block" id="math5">
                            <mml:mtable columnalign="left">
                                <mml:mtr>
                                    <mml:mtd>
                                        <mml:msub>
                                            <mml:mi>x</mml:mi>
                                            <mml:mrow>
                                                <mml:mi>t</mml:mi>
                                                <mml:mo>+</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                        </mml:msub>
                                    </mml:mtd>
                                </mml:mtr>
                                <mml:mtr>
                                    <mml:mtd>
                                        <mml:mo>=</mml:mo>
                                        <mml:mi>f</mml:mi>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:msub>
                                            <mml:mi>x</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo>,</mml:mo>
                                        <mml:msub>
                                            <mml:mi>P</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo>,</mml:mo>
                                        <mml:msub>
                                            <mml:mi>w</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo stretchy="false">)</mml:mo>
                                        <mml:mo>&#x223c;</mml:mo>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>g</mml:mi>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:msub>
                                            <mml:mi>L</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo>,</mml:mo>
                                        <mml:msub>
                                            <mml:mi>T</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo>,</mml:mo>
                                        <mml:msub>
                                            <mml:mi>P</mml:mi>
                                            <mml:mrow>
                                                <mml:mi>t</mml:mi>
                                                <mml:mo>+</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                        </mml:msub>
                                        <mml:mo stretchy="false">)</mml:mo>
                                        <mml:mo>,</mml:mo>
                                        <mml:msub>
                                            <mml:mi>T</mml:mi>
                                            <mml:mrow>
                                                <mml:mi>t</mml:mi>
                                                <mml:mo>+</mml:mo>
                                                <mml:mn>1</mml:mn>
                                            </mml:mrow>
                                        </mml:msub>
                                        <mml:mo stretchy="false">)</mml:mo>
                                        <mml:mo>.</mml:mo>
                                    </mml:mtd>
                                </mml:mtr>
                            </mml:mtable>
                            <mml:mspace width="3.4em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>5</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
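                <p>Applied recursively, the next-state equation yields a multi-step forecast of the observable state over the control horizon. At each step, the predicted load is fed back into the predictor together with the assumed temperature forecast. In the minimal sketch below, 
                    <italic toggle="yes">g</italic> is any callable with the signature of the load approximator; the linear 
                    <monospace>toy_g</monospace> is a hypothetical stand-in for a trained LSTM.</p>
```python
def rollout(g, load0, temp0, prices, temp_forecast):
    """Roll the observable state x_t = (L_t, T_t) forward over the horizon.

    prices[k] is the price announced for step k+1, and temp_forecast[k] is the
    forecast outdoor temperature for step k+1.
    """
    states = [(load0, temp0)]
    load, temp = load0, temp0
    for price_next, temp_next in zip(prices, temp_forecast):
        load = g(load, temp, price_next)  # L_{t+1} ~ g(L_t, T_t, P_{t+1})
        temp = temp_next                  # T_{t+1} comes from the forecast
        states.append((load, temp))
    return states


def toy_g(load, temp, price):
    # Hypothetical stand-in for a trained per-TCL predictor.
    return 0.5 * load + 0.1 * temp - price


print(rollout(toy_g, 2.0, 20.0, [1.0, 0.0], [22.0, 24.0]))
```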
            </sec>
            <sec>
                <title>Objective function</title>
                <p>In the existing literature, TCL clusters have been controlled under different objective functions, for instance tracking a balancing signal or performing energy arbitrage. In this work, we consider an energy arbitrage problem in which a retailer seeks to maximize its profit, although the framework and methods presented here could equally be applied to other objective functions. The profit is the difference between the revenue and the cost. As formulated in earlier work
                    <sup>
                        <xref ref-type="bibr" rid="ref-36">36</xref>
                    </sup>, we assume that the cost function 
                    <italic toggle="yes">C
                        <sub>t</sub>
                    </italic>(
                    <italic toggle="yes">L
                        <sub>t</sub>
                    </italic>) is convex and increasing in 
                    <italic toggle="yes">L
                        <sub>t</sub>
                    </italic> at each timestep:</p>
                <p>
                    <disp-formula id="e6">
                        <mml:math display="block" id="math6">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>C</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>=</mml:mo>
                                <mml:mi>q</mml:mi>
                                <mml:msubsup>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                    <mml:mn>2</mml:mn>
                                </mml:msubsup>
                                <mml:mo>+</mml:mo>
                                <mml:msub>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>+</mml:mo>
                                <mml:mi>c</mml:mi>
                            </mml:mrow>
                            <mml:mspace width="10.8em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>6</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
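                <p>where 
                    <italic toggle="yes">q</italic> &gt; 0 is a constant, 
                    <italic toggle="yes">p
                        <sub>t</sub>
                    </italic> &gt; 0 is the wholesale electricity price at timestep 
                    <italic toggle="yes">t</italic> (not to be confused with the retail price 
                    <italic toggle="yes">P
                        <sub>t</sub>
                    </italic>), and 
                    <italic toggle="yes">c</italic> &gt; 0 is a fixed cost. For non-negative loads, these conditions make 
                    <italic toggle="yes">C
                        <sub>t</sub>
                    </italic> convex and increasing.</p>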
                <p>To avoid overloading the network during peak times, we introduce a maximum load capacity of the power network at each timestep, denoted 
                    <italic toggle="yes">L
                        <sub>t,max</sub>
                    </italic>. This yields the following constraint on the aggregate load:</p>
                <p>
                    <disp-formula id="e7">
                        <mml:math display="block" id="math7">
                            <mml:mtable columnalign="left">
                                <mml:mtr>
                                    <mml:mtd>
                                        <mml:msub>
                                            <mml:mi>L</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:mstyle displaystyle="true">
                                            <mml:munder>
                                                <mml:mo>&#x2211;</mml:mo>
                                                <mml:mi>n</mml:mi>
                                            </mml:munder>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>L</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>n</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>t</mml:mi>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>&#x2264;</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>L</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>t</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>m</mml:mi>
                                                        <mml:mi>a</mml:mi>
                                                        <mml:mi>x</mml:mi>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mspace width="0.2em"/>
                                                <mml:mo>,</mml:mo>
                                            </mml:mrow>
                                        </mml:mstyle>
                                    </mml:mtd>
                                </mml:mtr>
                                <mml:mtr>
                                    <mml:mtd>
                                        <mml:mspace width="6em"/>
                                        <mml:mo>&#x2200;</mml:mo>
                                        <mml:mi>t</mml:mi>
                                        <mml:mspace width="0.2em"/>
                                        <mml:mo>&#x2208;</mml:mo>
                                        <mml:mtext>&#x200a;</mml:mtext>
                                        <mml:mspace width="0.4em"/>
                                        <mml:mi>H</mml:mi>
                                    </mml:mtd>
                                </mml:mtr>
                            </mml:mtable>
                            <mml:mspace width="11em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>7</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
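                <p>The capacity constraint can be checked on predicted individual loads as sketched below. The list-of-lists layout and the function names are hypothetical. In the proposed method, the individual loads would come from the per-TCL predictors.</p>
```python
def aggregate_load(unit_loads):
    """L_t = sum over n of L_{n,t}, where unit_loads[n][t] is the load of TCL n."""
    return [sum(step) for step in zip(*unit_loads)]


def capacity_violations(total_load, max_load):
    """Timesteps t at which L_t exceeds the network capacity L_{t,max}."""
    return [t for t, (load, cap) in enumerate(zip(total_load, max_load))
            if load > cap]


unit_loads = [[1.0, 2.0, 3.0],   # TCL 1
              [0.5, 0.5, 4.0]]   # TCL 2
total = aggregate_load(unit_loads)
print(total, capacity_violations(total, [2.0, 3.0, 5.0]))
```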
                <p>The revenue is the total bill that customers pay for the energy they consume during the time window <italic toggle="yes">H</italic>:</p>
                <p>
                    <disp-formula id="e8">
                        <mml:math display="block" id="math8">
                            <mml:mrow>
                                <mml:mi>R</mml:mi>
                                <mml:mo>=</mml:mo>
                                <mml:mstyle displaystyle="true">
                                    <mml:munderover>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>t</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>0</mml:mn>
                                        </mml:mrow>
                                        <mml:mi>H</mml:mi>
                                    </mml:munderover>
                                    <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:msub>
                                            <mml:mi>L</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mspace width="0.1em"/>
                                        <mml:mo>&#x22c5;</mml:mo>
                                        <mml:mspace width="0.1em"/>
                                        <mml:msub>
                                            <mml:mi>P</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:mstyle>
                            </mml:mrow>
                            <mml:mspace width="13.8em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>8</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>The retailer is usually subject to a total revenue cap, denoted 
                    <italic toggle="yes">R
                        <sub>max</sub>
                    </italic>. This revenue constraint keeps the retailer&#x2019;s pricing strategies acceptable. Without it, the optimization would keep raising retail prices to levels that violate energy regulations and are financially unacceptable to customers. We therefore impose the following constraint:</p>
                <p>
                    <disp-formula id="e9">
                        <mml:math display="block" id="math9">
                            <mml:mrow>
                                <mml:mi>R</mml:mi>
                                <mml:mo>&lt;</mml:mo>
                                <mml:msub>
                                    <mml:mi>R</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>a</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                            </mml:mrow>
                            <mml:mspace width="16.8em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>9</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
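                <p>The revenue and the revenue cap translate directly into code. The sketch below uses hypothetical function names and illustrative numbers.</p>
```python
def revenue(loads, prices):
    """R = sum over t of L_t * P_t for the announced retail prices."""
    if len(loads) != len(prices):
        raise ValueError("loads and prices must cover the same timesteps")
    return sum(load * price for load, price in zip(loads, prices))


def within_revenue_cap(loads, prices, r_max):
    """Strict revenue cap R < R_max."""
    return r_max > revenue(loads, prices)


loads = [2.0, 3.0]
prices = [0.5, 1.0]
print(revenue(loads, prices), within_revenue_cap(loads, prices, 5.0))
```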
                <p>Moreover, for each timestep 
                    <italic toggle="yes">t</italic> &#x2208; 
                    <italic toggle="yes">H</italic>, we denote by 
                    <italic toggle="yes">P
                        <sub>t,min</sub>
                    </italic> and 
                    <italic toggle="yes">P
                        <sub>t,max</sub>
                    </italic> the minimum and maximum prices that the retailer (utility company) can offer, so that:</p>
                <p>
                    <disp-formula id="e10">
                        <mml:math display="block" id="math10">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>i</mml:mi>
                                        <mml:mi>n</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>&#x2264;</mml:mo>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>&#x2264;</mml:mo>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>a</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mtext>,</mml:mtext>
                                <mml:mspace width="1em"/>
                                <mml:mo>&#x2200;</mml:mo>
                                <mml:mi>t</mml:mi>
                                <mml:mspace width="0.2em"/>
                                <mml:mo>&#x2208;</mml:mo>
                                <mml:mspace width="0.2em"/>
                                <mml:mi>H</mml:mi>
                            </mml:mrow>
                            <mml:mspace width="7.8em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>10</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>
                    <italic toggle="yes">P
                        <sub>t,min</sub>
                    </italic> and 
                    <italic toggle="yes">P
                        <sub>t,max</sub>
                    </italic> are usually set based on historical prices, market competition, customer acceptability, and the wholesale price. It is reasonable to assume that the retail price offered in each hour exceeds the wholesale price, and that competition in the retail market imposes a cap on retail prices.</p>
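                <p>One way to enforce these price bounds inside a search procedure is to project each candidate price vector onto the interval between 
                    <italic toggle="yes">P
                        <sub>t,min</sub>
                    </italic> and 
                    <italic toggle="yes">P
                        <sub>t,max</sub>
                    </italic> at every timestep. The sketch below hypothetically sets the floor at a fixed markup above the wholesale price and uses a flat competitive cap. These rules and their values are illustrative and not taken from the paper.</p>
```python
def price_bounds(wholesale, markup=0.05, cap=0.50):
    """Hypothetical bounds: floor a fixed markup above wholesale, flat cap."""
    lower = [w * (1.0 + markup) for w in wholesale]
    upper = [max(cap, lo) for lo in lower]  # never let the cap fall below the floor
    return lower, upper


def clip_prices(prices, lower, upper):
    """Project a candidate price vector onto [P_t,min, P_t,max] at each step."""
    return [min(hi, max(lo, p)) for p, lo, hi in zip(prices, lower, upper)]


lower, upper = price_bounds([0.10, 0.20, 0.30])
print(clip_prices([0.00, 0.90, 0.35], lower, upper))
```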
                <p>Finally, the control problem, namely the optimization of the price vector <italic toggle="yes">P</italic> over the time horizon <italic toggle="yes">H</italic>, can be modeled as follows:</p>
                <p>
                    <disp-formula id="e11">
                        <mml:math display="block" id="math11">
                            <mml:mrow>
                                <mml:munder>
                                    <mml:mo>max</mml:mo>
                                    <mml:mi>P</mml:mi>
                                </mml:munder>
                                <mml:mspace width="1em"/>
                                <mml:mo stretchy="false">{</mml:mo>
                                <mml:mi>R</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                                <mml:mstyle displaystyle="true">
                                    <mml:munderover>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>t</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>0</mml:mn>
                                        </mml:mrow>
                                        <mml:mi>H</mml:mi>
                                    </mml:munderover>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>C</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:msub>
                                            <mml:mi>L</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:msub>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:mstyle>
                                <mml:mo stretchy="false">}</mml:mo>
                            </mml:mrow>
                            <mml:mspace width="11.3em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>11</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>subject to the constraints:</p>
                <p>
                    <disp-formula id="e12">
                        <mml:math display="block" id="math12">
                            <mml:mrow>
                                <mml:mi>R</mml:mi>
                                <mml:mo>&lt;</mml:mo>
                                <mml:msub>
                                    <mml:mi>R</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>a</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                            </mml:mrow>
                            <mml:mspace width="17.4em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>12</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>
                    <disp-formula id="e13">
                        <mml:math display="block" id="math13">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>&#x2264;</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>a</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                            </mml:mrow>
                            <mml:mspace width="17.3em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>13</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>
                    <disp-formula id="e14">
                        <mml:math display="block" id="math14">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>i</mml:mi>
                                        <mml:mi>n</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>&#x2264;</mml:mo>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>&#x2264;</mml:mo>
                                <mml:msub>
                                    <mml:mi>P</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>a</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mtext>,</mml:mtext>
                                <mml:mspace width="1.5em"/>
                                <mml:mo>&#x2200;</mml:mo>
                                <mml:mi>t</mml:mi>
                                <mml:mspace width="0.2em"/>
                                <mml:mo>&#x2208;</mml:mo>
                                <mml:mspace width="0.2em"/>
                                <mml:mi>H</mml:mi>
                            </mml:mrow>
                            <mml:mspace width="7.2em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>14</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
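                <p>As an illustration only, the objective (11) and the constraints (12)&#x2013;(14) can be evaluated for a single candidate policy as in the following Python sketch. It assumes, for illustration, that the revenue is R = &#x2211;<sub>t</sub> P<sub>t</sub>L<sub>t</sub> and that each cost C<sub>t</sub> is supplied as a function of the load; all function names are hypothetical.</p>

```python
from typing import Callable, Sequence


def revenue(prices: Sequence[float], loads: Sequence[float]) -> float:
    """Assumed revenue model: R = sum_t P_t * L_t (illustrative assumption)."""
    return sum(p * l for p, l in zip(prices, loads))


def objective(prices: Sequence[float], loads: Sequence[float],
              cost_fns: Sequence[Callable[[float], float]]) -> float:
    """Objective (11): R - sum_t C_t(L_t) for one pricing policy."""
    return revenue(prices, loads) - sum(c(l) for c, l in zip(cost_fns, loads))


def is_feasible(prices: Sequence[float], loads: Sequence[float],
                p_min: Sequence[float], p_max: Sequence[float],
                l_max: float, r_max: float) -> bool:
    """Check constraints (12)-(14) for one pricing policy."""
    if revenue(prices, loads) >= r_max:   # (12) requires R strictly below R_max
        return False
    if any(l > l_max for l in loads):     # (13) caps the aggregate load L_t
        return False
    # (14): every price P_t must lie in [P_t,min, P_t,max]
    return all(hi >= p >= lo for p, lo, hi in zip(prices, p_min, p_max))
```

                <p>In the proposed method the loads passed to these functions would be the aggregated LSTM predictions rather than observed values.</p>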
            </sec>
        </sec>
        <sec>
            <title>Methods and implementation</title>
            <p>Given the partial observability of this problem, the proposed methods are nondeterministic. An LSTM network is used to estimate the next states given an initial state and a pricing policy. The method learns the individual behavior of each TCL agent 
                <italic toggle="yes">n</italic> with a separate LSTM model, as in 
                <xref ref-type="bibr" rid="ref-13">13</xref>. The N estimation models predict the reaction 
                <italic toggle="yes">L
                    <sub>n,t+1</sub>
                </italic> of each TCL to a state 
                <italic toggle="yes">x</italic> and a pricing action 
                <italic toggle="yes">P
                    <sub>t</sub>
                </italic>. The overall estimated load 
                <italic toggle="yes">L
                    <sub>t</sub>
                </italic> is the sum of the individual load predictions, as in (
                <xref ref-type="other" rid="e7">7</xref>). Given this estimation model, we apply a genetic algorithm to find the best pricing policy.</p>
            <sec>
                <title>LSTM networks for state estimation</title>
                <p>LSTM networks are recurrent neural networks built from memory blocks, which replace the summation units in the hidden layers of a standard recurrent neural network. The input vector and the previous hidden state vector are passed through the forget gate, which determines how much of each cell state component is kept. The same vectors are passed through the input gate, which determines how much of the new candidate state 
                    <bold>
                        <italic toggle="yes">
                            <underline>C</underline>
                        </italic>
                    </bold> is written to the new cell state. Finally, the output gate decides how much of the transformed cell state vector is passed on to the next hidden state vector 
                    <bold>
                        <italic toggle="yes">
                            <underline>h
                                <sub>t</sub>
                            </underline>
                        </italic>
                    </bold>. Following 
                    <xref ref-type="bibr" rid="ref-13">13</xref>, the proposed LSTM network consists of multiple layers of LSTM cells followed by a fully connected layer, as illustrated in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref>. In our model, the input 
                    <italic toggle="yes">I
                        <sub>n,t</sub>
                    </italic> is a 2 &#x00d7; 3 matrix containing the electric loads, temperatures and electricity prices as follows:</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>LSTM Network for TCLs load prediction.</title>
                        <p>The model uses the temperatures, loads and prices of the previous timesteps to predict the load L(t). Since this is a regression problem, the fully connected layer uses a linear activation function.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure1.gif"/>
                </fig>
                <p>
                    <disp-formula id="e15">
                        <mml:math display="block" id="math15">
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mi>I</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>n</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>t</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>=</mml:mo>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mtable>
                                    <mml:mtr>
                                        <mml:mtd>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>L</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>n</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>t</mml:mi>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>,</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>T</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>t</mml:mi>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>,</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>P</mml:mi>
                                                    <mml:mi>t</mml:mi>
                                                </mml:msub>
                                            </mml:mrow>
                                        </mml:mtd>
                                    </mml:mtr>
                                    <mml:mtr>
                                        <mml:mtd>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mi>L</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>n</mml:mi>
                                                        <mml:mo>,</mml:mo>
                                                        <mml:mi>t</mml:mi>
                                                    </mml:mrow>
                                                </mml:msub>
                                                <mml:mo>,</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>T</mml:mi>
                                                    <mml:mi>t</mml:mi>
                                                </mml:msub>
                                                <mml:mo>,</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>P</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mi>t</mml:mi>
                                                        <mml:mo>+</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                </mml:msub>
                                            </mml:mrow>
                                        </mml:mtd>
                                    </mml:mtr>
                                </mml:mtable>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                            <mml:mspace width="11.5em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>15</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>The LSTM network recurrently uses the historical loads, temperatures and prices to predict the electric load of an individual TCL 
                    <italic toggle="yes">n</italic> in the next timestep. Aggregating these predictions over all TCLs gives an approximation of the function 
                    <italic toggle="yes">g</italic> introduced in the previous section.</p>
                <p>Initially, for each TCL agent 
                    <italic toggle="yes">n</italic> &#x2208; 
                    <italic toggle="yes">N</italic>, we train an LSTM network on the historical reactions of that TCL to prices and temperatures. We assume that a DR program has been running long enough to collect a sufficient amount of data on the reactions of TCL agents to prices and temperatures.</p>
            </sec>
            <sec>
                <title>Genetic algorithms for price optimization</title>
                <p>Because the objective function is discontinuous and the dependency of the electric load 
                    <italic toggle="yes">L</italic> on the electricity prices 
                    <italic toggle="yes">P</italic> is complex, conventional nonlinear optimization methods are not suitable for this problem, and GA-based optimization is better suited
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>. The proposed GA uses rank selection and value encoding
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>. Each chromosome represents a pricing policy 
                    <italic toggle="yes">P</italic> and is a vector of 
                    <italic toggle="yes">H</italic> prices, one per timestep. We use uniform crossover
                    <sup>
                        <xref ref-type="bibr" rid="ref-39">39</xref>
                    </sup> and non-uniform mutation
                    <sup>
                        <xref ref-type="bibr" rid="ref-40">40</xref>
                    </sup>. The constraints are handled by the approach proposed in 
                    <xref ref-type="bibr" rid="ref-41">41</xref>.</p>
                <p>The proposed GA-based optimization algorithms for TCL pricing control are given in 
                    <xref ref-type="other" rid="A1">Algorithm 1</xref> and 
                    <xref ref-type="other" rid="A2">Algorithm 2</xref>.</p>
                <boxed-text id="A1" orientation="portrait" position="float">
                    <label/>
                    <caption>
                        <title>Algorithm 1. GA-based optimization algorithm for TCL pricing control.</title>
                    </caption>
                    <p>1:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Population Initialization, i.e., generating a population of 
                        <italic toggle="yes">PN</italic> chromosomes randomly; each chromosome denotes a pricing policy for the next time horizon H.</p>
                    <p>2:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>for</bold> i=1 to 
                        <italic toggle="yes">PN</italic> 
                        <bold>do</bold>
                    </p>
                    <p>3:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Concatenate the price vector of chromosome i with the temperature forecasts for the next time horizon.</p>
                    <p>4:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>for</bold> each TCL agent n in N 
                        <bold>do</bold></p>
                    <p>5:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Use the LSTM network of agent n iteratively to predict (
                        <italic toggle="yes">L
                            <sub>n,t</sub>
                        </italic>)
                        <italic toggle="yes">
                            <sub>t&#x2208;H</sub>
                        </italic> with 
                        <xref ref-type="other" rid="A2">Algorithm 2</xref>.</p>
                    <p>6:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>end for</bold>
                    </p>
                    <p>7:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Calculate 
                        <italic toggle="yes">L
                            <sub>t</sub>
                        </italic>, 
                        <italic toggle="yes">C
                            <sub>t</sub>
                        </italic>(
                        <italic toggle="yes">L
                            <sub>t</sub>
                        </italic>) &#x2200;
                        <italic toggle="yes">t</italic> &#x2208; 
                        <italic toggle="yes">H</italic>, and 
                        <italic toggle="yes">R</italic>
                    </p>
                    <p>8:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Check the feasibility of policy 
                        <italic toggle="yes">P</italic> with respect to the constraints. Handle invalid individuals with the approach proposed in 
                        <xref ref-type="bibr" rid="ref-41">41</xref>. Then calculate the fitness value of policy 
                        <italic toggle="yes">P</italic>.
                    </p>
                    <p>9:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>end for</bold>
                    </p>
                    <p>10:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Create a new generation of chromosomes by using the selection, crossover, and mutation operations of the GA.</p>
                    <p>11:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Repeat steps 2&#x2013;10 until the stopping condition is reached.</p>
                    <p>12:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Announce the best price vector via the two-way communication infrastructure at the beginning of the control horizon.</p>
                </boxed-text>
                <boxed-text id="A2" orientation="portrait" position="float">
                    <label/>
                    <caption>
                        <title>Algorithm 2. Individual TCL load prediction using LSTM network.</title>
                    </caption>
                    <p>1:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Build the initial input matrix 
                        <italic toggle="yes">I
                            <sub>n,0</sub>
                        </italic> using the initial values of prices, loads and temperatures.</p>
                    <p>2:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>for</bold> t=0 to H 
                        <bold>do</bold>
                    </p>
                    <p>3:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Use the input matrix 
                        <italic toggle="yes">I
                            <sub>n,t</sub>
                        </italic> to predict 
                        <italic toggle="yes">L
                            <sub>n,t+1</sub>
                        </italic>
                    </p>
                    <p>4:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;Stack the last row of the input matrix 
                        <italic toggle="yes">I
                            <sub>n,t</sub>
                        </italic> over a new row containing the predicted load L, the forecast temperature T and the next price P to build the next input matrix:</p>
                    <p>
                        <disp-formula id="e16">
                            <mml:math display="block" id="math">
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>I</mml:mi>
                                        <mml:mrow>
                                            <mml:mi>n</mml:mi>
                                            <mml:mo>,</mml:mo>
                                            <mml:mi>t</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                    </mml:msub>
                                    <mml:mo>=</mml:mo>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:mtable>
                                        <mml:mtr>
                                            <mml:mtd>
                                                <mml:mrow>
                                                    <mml:msub>
                                                        <mml:mi>L</mml:mi>
                                                        <mml:mrow>
                                                            <mml:mi>n</mml:mi>
                                                            <mml:mo>,</mml:mo>
                                                            <mml:mi>t</mml:mi>
                                                        </mml:mrow>
                                                    </mml:msub>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>T</mml:mi>
                                                        <mml:mi>t</mml:mi>
                                                    </mml:msub>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>P</mml:mi>
                                                        <mml:mrow>
                                                            <mml:mi>t</mml:mi>
                                                            <mml:mo>+</mml:mo>
                                                            <mml:mn>1</mml:mn>
                                                        </mml:mrow>
                                                    </mml:msub>
                                                </mml:mrow>
                                            </mml:mtd>
                                        </mml:mtr>
                                        <mml:mtr>
                                            <mml:mtd>
                                                <mml:mrow>
                                                    <mml:msub>
                                                        <mml:mi>L</mml:mi>
                                                        <mml:mrow>
                                                            <mml:mi>n</mml:mi>
                                                            <mml:mo>,</mml:mo>
                                                            <mml:mi>t</mml:mi>
                                                            <mml:mo>+</mml:mo>
                                                            <mml:mn>1</mml:mn>
                                                        </mml:mrow>
                                                    </mml:msub>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>T</mml:mi>
                                                        <mml:mrow>
                                                            <mml:mi>t</mml:mi>
                                                            <mml:mo>+</mml:mo>
                                                            <mml:mn>1</mml:mn>
                                                        </mml:mrow>
                                                    </mml:msub>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:msub>
                                                        <mml:mi>P</mml:mi>
                                                        <mml:mrow>
                                                            <mml:mi>t</mml:mi>
                                                            <mml:mo>+</mml:mo>
                                                            <mml:mn>2</mml:mn>
                                                        </mml:mrow>
                                                    </mml:msub>
                                                </mml:mrow>
                                            </mml:mtd>
                                        </mml:mtr>
                                    </mml:mtable>
                                    <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                            </mml:math>
                        </disp-formula>
                    </p>
                    <p>5:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>end for</bold>
                    </p>
                    <p>6:&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;
                        <bold>return</bold> (
                        <italic toggle="yes">L
                            <sub>n,t</sub>
                        </italic>)
                        <italic toggle="yes">
                            <sub>t&#x2208;H</sub>
                        </italic>
                    </p>
                </boxed-text>
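                <p>A minimal Python sketch of Algorithm 2 follows. The callable <monospace>predict</monospace> is a hypothetical stand-in for the trained LSTM of agent n; it maps a 2 &#x00d7; 3 input matrix to the next load. The sketch produces H predictions and, following (16), drops the oldest row of the input matrix and appends the newest one at each step.</p>

```python
from typing import Callable, List, Sequence

import numpy as np


def rollout(predict: Callable[[np.ndarray], float], I0, temps: Sequence[float],
            prices: Sequence[float], horizon: int) -> List[float]:
    """Predict L_{n,1}, ..., L_{n,H} for one TCL by iterating a one-step model.

    I0 is the matrix I_{n,0} of (15): rows (L_{n,-1}, T_{-1}, P_0) and
    (L_{n,0}, T_0, P_1). temps must hold T_0..T_{H-1} (forecasts) and
    prices must hold P_0..P_H, both indexed by timestep.
    """
    I = np.asarray(I0, dtype=float)
    loads = []
    for t in range(horizon):
        l_next = float(predict(I))   # L_{n,t+1}
        loads.append(l_next)
        if t + 1 == horizon:
            break                    # no further input matrix is needed
        # (16): keep the last row of I_{n,t} and append (L_{n,t+1}, T_{t+1}, P_{t+2})
        I = np.vstack([I[-1], [l_next, temps[t + 1], prices[t + 2]]])
    return loads
```

                <p>For instance, with a toy predictor that adds the current price to the current load, <monospace>rollout(lambda I: I[1, 0] + I[1, 2], [[0.0, 0.0, 0.0], [1.0, 5.0, 0.5]], [5.0, 6.0, 7.0], [0.0, 0.5, 0.25, 0.125], 3)</monospace> shows how each prediction is fed back into the next input matrix.</p>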
                <p>In 
                    <xref ref-type="other" rid="A1">Algorithm 1</xref>, we initialize a population of 
                    <italic toggle="yes">PN</italic> pricing policies at step 1. For each policy 
                    <italic toggle="yes">P</italic>, steps 2&#x2013;9 evaluate its fitness and feasibility, using the LSTM sequence prediction presented in 
                    <xref ref-type="other" rid="A2">Algorithm 2</xref>. The best policies are selected, and a new generation is created using crossover and mutation operations in step 10. This process is repeated until a stopping condition or the maximum number of iterations is reached. At the end of the optimization process, the best pricing policy is selected, and its prices are announced to the TCL agents via the two-way communication infrastructure. After each control episode, the LSTM models are updated with the new data collected from the actual responses to the implemented electricity prices.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>In this section, we evaluate the proposed pricing control methods. A set of numerical experiments was performed on a simulation scenario comprising a population of 30 TCLs exposed to dynamic electricity prices during a period in which outdoor temperatures change significantly. The thermal inertia of each TCL allows electric demand to be shifted towards lower-price periods. The TCL agents determine the amount of electricity to consume at each timestep according to the indoor temperature and the electricity prices. The objective of the TCL agents is to maintain a reasonable comfort level while minimizing the electricity bill. Because user preferences and building characteristics differ, different TCL agents react differently to the same prices and temperatures. We define a control timestep of 1 hour and a control horizon of 6 hours. The control horizon is limited by the ability of the LSTM to predict long sequences of future electric loads: it is chosen to minimize the number of times the retailer runs the control algorithm and announces prices, while keeping the LSTM predictions accurate.</p>
            <sec>
                <title>Simulation data</title>
                <p>Following 
                    <xref ref-type="bibr" rid="ref-13">13</xref>, the simulation data are generated using two fuzzy logic systems under the following assumptions:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>The TCL agents react to indoor temperatures and electricity prices.</p>
                    </list-item>
                    <list-item>
                        <p>The difference between the outdoor and indoor temperatures, &#x2206;<italic toggle="yes">T</italic>, depends on the building characteristics and on the amount of energy spent on heating/cooling in previous timesteps.</p>
                    </list-item>
                </list>
                <p>TCL agents operate during the day to maintain a comfortable indoor temperature while taking into account the electricity price in each hour. Fuzzy logic is used in this problem because it can model qualitative, linguistic concepts such as &#x201c;hot temperature&#x201d; or &#x201c;low price&#x201d;. The combination of the two fuzzy logic systems delivers the load 
                    <italic toggle="yes">L</italic><sub><italic toggle="yes">n</italic>,<italic toggle="yes">t</italic>+1</sub> using the outdoor temperature 
                    <italic toggle="yes">T</italic><sub><italic toggle="yes">t</italic></sub> and the electricity price 
                    <italic toggle="yes">P</italic><sub><italic toggle="yes">t</italic>+1</sub>. The simulation is run with different parameters to generate diverse data for 30 TCL agents. The temperature data used for the simulation come from the 
                    <italic toggle="yes">Kaisaniemi</italic> observation station in Helsinki, available online
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>. The price data are the 
                    <italic toggle="yes">Elspot DA</italic> electricity prices for Finland
                    <sup>
                        <xref ref-type="bibr" rid="ref-43">43</xref>
                    </sup>. Both cover the period from 1 January 2017 to 7 September 2018. The generated dataset consists of 14,734 data points for each TCL agent.</p>
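                <p>The following minimal sketch illustrates how a fuzzy rule base can map an outdoor temperature and an electricity price to a load. It uses a single zero-order Sugeno-type system and is not the authors&#x2019; pair of fuzzy systems, which also model &#x2206;<italic toggle="yes">T</italic>. The membership breakpoints, the four rules, their output levels and the per-agent <monospace>rated_kw</monospace> parameter are hypothetical values chosen for illustration. Varying such parameters across agents is one way to obtain heterogeneous responses like those of the 30 simulated TCLs.</p>
                <preformat preformat-type="code">
```python
def falling(x: float, lo: float, hi: float) -> float:
    """Membership equal to 1 below `lo`, 0 above `hi`, linear in between."""
    return min(1.0, max(0.0, (hi - x) / (hi - lo)))


def rising(x: float, lo: float, hi: float) -> float:
    """Complement of `falling`: 0 below `lo`, 1 above `hi`."""
    return 1.0 - falling(x, lo, hi)


def fuzzy_load(temp_c: float, price: float, rated_kw: float = 3.0) -> float:
    """Hypothetical heating load (kW) from outdoor temperature and price."""
    # Fuzzify the inputs (breakpoints are illustrative assumptions).
    cold, mild = falling(temp_c, -10.0, 15.0), rising(temp_c, -10.0, 15.0)
    cheap, dear = falling(price, 2.0, 8.0), rising(price, 2.0, 8.0)
    # Rules: (firing strength using min as AND, output as a fraction of rated power).
    rules = [
        (min(cold, cheap), 1.0),   # cold and low price  -> high load
        (min(cold, dear), 0.5),    # cold and high price -> medium load
        (min(mild, cheap), 0.35),  # mild and low price  -> moderate load
        (min(mild, dear), 0.1),    # mild and high price -> low load
    ]
    # Weighted-average defuzzification (zero-order Sugeno). The weights never
    # sum to zero because each pair of complementary sets covers every input.
    total_weight = sum(w for w, _ in rules)
    return rated_kw * sum(w * out for w, out in rules) / total_weight


print(round(fuzzy_load(-10.0, 2.0), 2))  # cold and cheap
print(round(fuzzy_load(20.0, 10.0), 2))  # mild and expensive
```
                </preformat>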
            </sec>
            <sec>
                <title>LSTM networks results</title>
                <p>The data generated from the above-mentioned simulations is used to train the LSTM networks to learn the behavior of each individual TCL agent. The hyperparameters and structure of the LSTM networks are chosen according to the results of 
                    <xref ref-type="bibr" rid="ref-13">13</xref> and summarized in 
                    <xref ref-type="table" rid="T1">Table 1</xref>.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Results of LSTM model hyperparameter optimization.</title>
                    </caption>
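                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Hyperparameter</th>
                                <th align="left" colspan="1" rowspan="1">Value</th>
                            </tr>
                        </thead>
                        <tbody>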
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Sequence length</td>
                                <td align="left" colspan="1" rowspan="1">2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">LSTM cell size</td>
                                <td align="left" colspan="1" rowspan="1">30</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">LSTM cells</td>
                                <td align="left" colspan="1" rowspan="1">2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Dropout</td>
                                <td align="left" colspan="1" rowspan="1">0.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Activation</td>
                                <td align="left" colspan="1" rowspan="1">&#x2018;tanh&#x2019;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Recurrent activation</td>
                                <td align="left" colspan="1" rowspan="1">&#x2018;selu&#x2019;</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Optimizer</td>
                                <td align="left" colspan="1" rowspan="1">&#x2018;rmsprop&#x2019;</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
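                <p>With a sequence length of 2, each training sample pairs two consecutive hours of inputs with the load of the following hour. The sketch below builds such samples with NumPy in the (samples, timesteps, features) layout that Keras LSTM layers expect. The choice of variables in each hourly feature vector (for example outdoor temperature, price and past load) is an assumption, and the helper name <monospace>make_sequences</monospace> is hypothetical.</p>
                <preformat preformat-type="code">
```python
import numpy as np


def make_sequences(features: np.ndarray, loads: np.ndarray, seq_len: int = 2):
    """Build next-step training pairs for one TCL agent.

    features: array of shape (n_hours, n_features), one row per hour.
    loads:    array of shape (n_hours,), the agent's load in each hour.
    Returns X of shape (n_hours - seq_len, seq_len, n_features), where X[i]
    holds hours i .. i + seq_len - 1, and y of shape (n_hours - seq_len,),
    where y[i] is the load in hour i + seq_len.
    """
    n_samples = len(loads) - seq_len
    X = np.stack([features[i : i + seq_len] for i in range(n_samples)])
    y = np.asarray(loads[seq_len:])
    return X, y


# Toy usage: 5 hours of 3 hypothetical features (temperature, price, load).
feats = np.arange(15, dtype=float).reshape(5, 3)
X, y = make_sequences(feats, feats[:, 2])
print(X.shape, y.shape)
```
                </preformat>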
                <p>The results are evaluated using validation data generated from the same simulations. 
                    <xref ref-type="fig" rid="f2">Figure 2a</xref> illustrates the learning results for three TCL agents during different time periods with different temperatures and prices. 
                    <xref ref-type="fig" rid="f2">Figure 2b</xref> compares the real and predicted average power consumption of the cluster of 30 TCL agents. The power curves show that the TCL agents respond slightly differently to prices and temperatures. In general, power consumption is high when temperatures and electricity prices are low, and vice versa. The true and predicted load curves have similar shapes, and the hourly prediction error is small in most cases. The peaks and valleys are also predicted accurately in most cases, which is valuable for demand-side management.</p>
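                <p>The prediction error above is described qualitatively. One way to quantify the hourly error of the cluster average shown in Figure 2b is sketched below. Using the mean absolute error (MAE) is an assumption, and the function name is hypothetical. The per-agent predictions are averaged across agents, mirroring how the individual LSTM models are aggregated into a cluster-level prediction.</p>
                <preformat preformat-type="code">
```python
import numpy as np


def cluster_hourly_error(true_loads: np.ndarray, pred_loads: np.ndarray):
    """Compare real and predicted cluster-average load hour by hour.

    true_loads, pred_loads: arrays of shape (n_agents, n_hours) with the
    simulated and LSTM-predicted load of every TCL agent.
    Returns the per-hour absolute error of the cluster average and its mean (MAE).
    """
    true_avg = true_loads.mean(axis=0)  # average over agents, per hour
    pred_avg = pred_loads.mean(axis=0)
    abs_err = np.abs(pred_avg - true_avg)
    return abs_err, float(abs_err.mean())


true = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.0, 3.0], [3.0, 5.0]])
per_hour, mae = cluster_hourly_error(true, pred)
print(per_hour.tolist(), mae)
```
                </preformat>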
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>LSTM learning results.</title>
                        <p>(
                            <bold>a</bold>) Power consumption of different TCL agents in response to electricity prices and outdoor temperatures. (
                            <bold>b</bold>) Average real and predicted power consumption of the cluster surrounded by an envelope containing 9% of the power consumption profiles for different days.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>GA Optimization results</title>
                <p>We run the GA optimization algorithm on a population of size 100 for 100 iterations. The parameters used for the optimization are summarized in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. The optimization process is shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>, where progress is measured by the fitness of the best individual in the population at each iteration. 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> illustrates the best pricing solution for one day. 
                    <xref ref-type="fig" rid="f4">Figure 4a</xref> shows the hourly electricity prices over the 24 hours. 
                    <xref ref-type="fig" rid="f4">Figure 4b</xref> presents the revenue and profit that the retailer would make under the original and optimized prices. 
                    <xref ref-type="fig" rid="f4">Figure 4c</xref> compares the power consumption of the whole cluster under the original and optimized prices. 
                    <xref ref-type="fig" rid="f4">Figure 4d</xref> presents the daily bill of each user in the cluster under the original and optimized prices.</p>
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Optimization parameters.</title>
                    </caption>
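                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Parameter</th>
                                <th align="left" colspan="1" rowspan="1">Value</th>
                            </tr>
                        </thead>
                        <tbody>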
                            <tr>
                                <td align="left" colspan="1" rowspan="1">PN</td>
                                <td align="left" colspan="1" rowspan="1">100</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <italic toggle="yes">L
                                        <sub>max</sub>
                                    </italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1">75.0 kWh</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">q</td>
                                <td align="left" colspan="1" rowspan="1">0.01 &#x20ac;
                                    <italic toggle="yes">cents</italic>/[
                                    <italic toggle="yes">kWh</italic>]
                                    <sup>2</sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">c</td>
                                <td align="left" colspan="1" rowspan="1">1.0 &#x20ac;cents</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <italic toggle="yes">P
                                        <sub>t,min</sub>
                                    </italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1">
                                    <italic toggle="yes">p
                                        <sub>t</sub>
                                    </italic>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <italic toggle="yes">P
                                        <sub>t,max</sub>
                                    </italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1">2<italic toggle="yes">p<sub>t</sub></italic></td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <italic toggle="yes">R
                                        <sub>max</sub>
                                    </italic>
                                </td>
                                <td align="left" colspan="1" rowspan="1">N*H*5.5 &#x20ac;cents</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
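                <p>A feasibility check of a candidate policy against the limits in Table 2 can be sketched as follows. The interpretation rests on several assumptions. <italic toggle="yes">L</italic><sub>max</sub> is treated as a cap on the predicted cluster load in every hour, <italic toggle="yes">N</italic> as the number of TCLs and <italic toggle="yes">H</italic> as the number of hours in the control horizon. Revenue is taken as the sum of price times predicted load over the horizon, with prices in &#x20ac;cents/kWh and loads in kWh. The cost coefficients <italic toggle="yes">q</italic> and <italic toggle="yes">c</italic> are not used here, and the function name is hypothetical.</p>
                <preformat preformat-type="code">
```python
from typing import Sequence

L_MAX_KWH = 75.0           # Table 2: L_max, assumed to cap the cluster load in each hour
R_MAX_PER_TCL_HOUR = 5.5   # Table 2: R_max = N * H * 5.5 euro cents


def is_feasible(prices: Sequence[float], base_prices: Sequence[float],
                loads: Sequence[float], n_tcls: int) -> bool:
    """Check one price policy against the price, load and revenue limits.

    prices:      candidate prices P_t (euro cents/kWh) for each hour of the horizon.
    base_prices: original prices p_t, giving the bounds p_t and 2 * p_t.
    loads:       predicted cluster load (kWh) in each hour under `prices`.
    """
    horizon = len(prices)
    within_bounds = all(2 * p >= x >= p for x, p in zip(prices, base_prices))
    under_load_cap = L_MAX_KWH >= max(loads)
    revenue = sum(x * load for x, load in zip(prices, loads))  # euro cents
    r_max = n_tcls * horizon * R_MAX_PER_TCL_HOUR
    return within_bounds and under_load_cap and r_max >= revenue


base = [4.0] * 6
print(is_feasible([5.0] * 6, base, [30.0] * 6, n_tcls=30))  # revenue 900 vs cap 990
print(is_feasible([5.0] * 6, base, [40.0] * 6, n_tcls=30))  # revenue 1200 exceeds cap
```
                </preformat>
                <p>In a GA such as the one described above, a check of this kind would typically be combined with the predicted profit, for example by assigning infeasible policies a very low fitness.</p>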
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Learning process of the GA with a population of size 100 (fitness of the best individual at each iteration).</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure3.gif"/>
                </fig>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Comparison of results under the original and optimized pricing policies.</title>
                        <p>(
                            <bold>a</bold>) Optimized prices solution for 24 hours. (
                            <bold>b</bold>) Revenue and profit under original and optimized prices for 24 hours. (
                            <bold>c</bold>) Total electricity consumption under original and optimized prices. (
                            <bold>d</bold>) Daily electricity bills under original and optimized prices.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure4.gif"/>
                </fig>
                <p>The results show a general increase in prices throughout the day. However, this increase did not result in higher daily electricity bills, and most customers pay a slightly lower amount per day. This is a consequence of the reduced consumption at higher prices together with the upper-limit constraint on revenue described in (
                    <xref ref-type="other" rid="e12">12</xref>). The overall electricity consumption decreased compared with the original pricing scheme, which indicates the potential energy savings that an optimal pricing strategy can offer.</p>
            </sec>
            <sec>
                <title>Comparison with a theoretical benchmark</title>
                <p>To validate the performance of the proposed algorithm, we consider a case with full access to the TCL units&#x2019; behavior, i.e. the exact electricity consumption of each TCL unit given the temperatures and prices at each timestep. The optimization is performed with direct access to the simulation model described above, which provides full observability and perfect information about the TCLs. This theoretical setup serves as a benchmark for our method. It can be seen as an upper bound on the profit the aggregator can make without violating the constraints.</p>
                <p>The results illustrated in 
                    <xref ref-type="fig" rid="f5">Figure 5a&#x2013;d</xref> show that the proposed method performs very similarly to the benchmark. The hourly prices in 
                    <xref ref-type="fig" rid="f5">Figure 5a</xref> are only slightly shifted from the benchmark prices during most of the day, and the difference is significant at only two or three hours. The same observation holds for the revenues and profits in 
                    <xref ref-type="fig" rid="f5">Figure 5b</xref> and the electricity consumption in 
                    <xref ref-type="fig" rid="f5">Figure 5c</xref>. The comparison of daily bills under optimized and benchmark prices in 
                    <xref ref-type="fig" rid="f5">Figure 5d</xref> shows slightly higher electricity bills under the benchmark for most customers. This is explained by the slightly higher benchmark prices in 
                    <xref ref-type="fig" rid="f5">Figure 5a</xref>.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Comparison of results under the optimized and benchmark pricing policies.</title>
                        <p> (
                            <bold>a</bold>) Comparison between benchmark and optimized prices. (
                            <bold>b</bold>) Hourly revenues and profits under optimized prices and benchmark prices. (
                            <bold>c</bold>) Hourly total electricity consumption under optimized prices and benchmark prices. (
                            <bold>d</bold>) Daily electricity bills under optimized and benchmark prices.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure5.gif"/>
                </fig>
                <p>The daily revenues and profits under the original, optimized and benchmark prices are compared in 
                    <xref ref-type="fig" rid="f6">Figure 6</xref>. Revenue is similar in all three cases, with the optimized prices yielding a slightly smaller revenue than the original and benchmark prices. However, the profit under the original prices is considerably smaller than under the optimized prices. The profit under the optimized prices is in turn only slightly smaller than the benchmark&#x2019;s profit: the proposed method achieves 95.97% of the optimal benchmark profit. This shows that profit can be increased without an increase in revenue when the prices are optimized correctly.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Daily revenues and profits under original, optimized and benchmark prices.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/22445/60ec456e-0949-4e28-9b53-bef3cc7b5199_figure6.gif"/>
                </fig>
            </sec>
        </sec>
        <sec sec-type="discussion | conclusions">
            <title>Discussion and conclusion</title>
            <p>In this paper, we demonstrated the effectiveness of a new TCL control method that uses electricity prices as a proxy control signal. The control policy consists of a sequence of prices that influences the electricity consumption of the TCLs. The problem was formulated as a Markov decision process with a non-Markovian state to handle the sparse observations of the TCL cluster&#x2019;s state. We extend the observable state with sequences of past observations and approximate the transition function using an LSTM architecture. The LSTM network captures the individual behavior of each TCL under price-based DR, and the individual models are aggregated to approximate the next state of the cluster. This approximation is used iteratively in a genetic algorithm to evaluate the potential profit from an energy arbitrage operation and to find the optimal pricing policy for a given control horizon. The LSTM models are updated every 24 hours to capture changes in the TCL units&#x2019; behavior.</p>
            <p>The experiment consists of a retailer agent buying electricity from the wholesale market and selling it to a group of residential TCLs. The agent can only measure the electricity consumption of each TCL and the outdoor temperature. It has access to a significant amount of historical data from an already implemented DR program, which allows it to train an LSTM model for each TCL unit and to optimize the electricity prices.</p>
            <p>We first evaluated the performance of the LSTM network by comparing the real and predicted loads of 30 TCL units on different days. The predicted load profiles are close to the real load profiles at both the individual and the aggregate level. The optimization relies on a genetic algorithm with a profit-maximization objective. The results show that the proposed method yields a much higher daily profit than the original prices. It also achieves 95.97% of the optimal profit obtained by a model with full observation of the state.</p>
            <p>The flexibility offered by TCLs has high potential for providing the ancillary services required for a deep integration of renewable energy sources into the grid. An energy arbitrage operation can offer a service to the grid by exploiting this flexibility through direct or indirect control. In this paper, the partially observable state and the uncertainty of the TCL response to prices were tackled with an LSTM network using past observations and actions. The LSTM network performed well by extracting relevant features of the hidden state with its internal memory cell. This allows it to process sequences of sparse observations and learn the hidden patterns of power consumption.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Underlying data</title>
                <p>Figshare: LSTM+GA data, 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.9746786.v1">https://doi.org/10.6084/m9.figshare.9746786.v1</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-44">44</xref>
                    </sup>.</p>
                <p>This project contains the following underlying data:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>Data used by the fuzzy logic simulation model such as temp_prices and temperatures.</p>
                    </list-item>
                    <list-item>
                        <p>Data generated by the fuzzy simulator such as fuzzy_outxx.csv and used to train the LSTM models.</p>
                    </list-item>
                    <list-item>
                        <p>Data related to the optimization process such as results, GA_pricing and optimized_prices_loads.</p>
                    </list-item>
                </list>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Code for analysis available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/tahanakabi/Optimal-Price-Based-control-of-heterogeneous-thermostatically-controlled-loads-under-uncertainty-usi">https://github.com/tahanakabi/Optimal-Price-Based-control-of-heterogeneous-thermostatically-controlled-loads-under-uncertainty-usi</ext-link>
            </p>
            <p>Archived code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.3383615">http://doi.org/10.5281/zenodo.3383615</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-45">45</xref>
                </sup>
            </p>
            <p>License: MIT</p>
        </sec>
    </body>
    <back>
        <fn-group>
            <fn id="FN1">
                <label/>
                <p>
                    <sup>1</sup>This work was supported by The Jenny and Antti Wihuri Foundation, FINLAND.</p>
            </fn>
        </fn-group>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Halamay</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brekken</surname>
                            <given-names>TKA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simmons</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reserve requirement impacts of large-scale integration of wind, solar, and ocean wave power generation</article-title>.<year>2010</year>.
                    <pub-id pub-id-type="doi">10.1109/PES.2010.5590203</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mathieu</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kamgarpour</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lygeros</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Arbitraging Intraday Wholesale Energy Market Prices With Aggregations of Thermostatic Loads.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Power Systems.</italic>
</source>
                    <year>2015</year>;<volume>30</volume>(<issue>2</issue>):<fpage>763</fpage>&#x2013;<lpage>772</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TPWRS.2014.2335158</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <collab>D&amp;R International, Ltd.: </collab>
                    <article-title>2011 Buildings Energy Data Book</article-title>.<year>2012</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://ieer.org/wp/wp-content/uploads/2012/03/DOE-2011-Buildings-Energy-DataBook-BEDB.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <collab>U.S. Energy Information Administration: </collab>
                    <article-title>U.S. Energy Information Administration</article-title>.<year>2010</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.eia.gov/totalenergy/data/annual/archive/038410.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mathieu</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kamgarpour</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lygeros</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Energy arbitrage with thermostatically controlled loads</article-title>.<year>2013</year>. [Accessed 11 6 2019].
                    <pub-id pub-id-type="doi">10.23919/ECC.2013.6669582</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Maasoumy</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Razmara</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shahbakhti</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Selecting building predictive control based on model uncertainty</article-title>.<year>2014</year>.  [Accessed 11 6 2019].
                    <pub-id pub-id-type="doi">10.1109/ACC.2014.6858875</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Koch</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mathieu</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Callaway</surname>
                            <given-names>DS</given-names>
                        </name>
</person-group>:
                    <article-title>Modeling and Control of Aggregated Heterogeneous Thermostatically Controlled Loads for Ancillary Services</article-title>.<year>2011</year>. [Accessed 11 6 2019].
                    <ext-link ext-link-type="uri" xlink:href="https://pdfs.semanticscholar.org/7cc2/03c7d959b2e182ec473ecc49fedb2f3bc39b.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Saha</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kuzlu</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pipattanasomporn</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enabling Residential Demand Response Applications with a ZigBee-Based Load Controller System</article-title>.<year>2016</year>;<volume>2</volume>(<issue>4</issue>):<fpage>303</fpage>&#x2013;<lpage>318</lpage>. [Accessed 11 6 2019].
                    <pub-id pub-id-type="doi">10.1007/s40903-016-0059-4</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Verbong</surname>
                            <given-names>GPJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beemsterboer</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sengers</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Smart grids or smart users? Involving users in developing a low carbon electricity economy.</article-title>
                    <source>

                        <italic toggle="yes">Energy Policy.</italic>
</source>
                    <year>2013</year>;<volume>52</volume>:<fpage>117</fpage>&#x2013;<lpage>125</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.enpol.2012.05.003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yan</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ozturk</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A review on price-driven residential demand response.</article-title>
                    <source>

                        <italic toggle="yes">Renew Sust Energ Rev.</italic>
</source>
                    <year>2018</year>;<volume>96</volume>:<fpage>411</fpage>&#x2013;<lpage>419</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.rser.2018.08.003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hansen</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Borup</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Smart grids and households: how are household consumers represented in experimental projects?</article-title>
                    <source>

                        <italic toggle="yes">Tech Anal Strat Manag.</italic>
</source>
                    <year>2018</year>;<volume>30</volume>(<issue>3</issue>):<fpage>255</fpage>&#x2013;<lpage>267</lpage>.
                    <pub-id pub-id-type="doi">10.1080/09537325.2017.1307955</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Littman</surname>
                            <given-names>ML</given-names>
                        </name>
</person-group>:
                    <article-title>Markov Decision Processes</article-title>.<year>2001</year>.<fpage>9240</fpage>&#x2013;<lpage>9242</lpage>. [Accessed 12 6 2019].
                    <pub-id pub-id-type="doi">10.1016/B0-08-043076-7/00614-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nakabi</surname>
                            <given-names>TA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Toivanen</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>An ANN-based model for learning individual customer behavior in response to electricity prices.</article-title>
                    <source>

                        <italic toggle="yes">Sustainable Energy, Grids and Networks.</italic>
</source>
                    <year>2019</year>;<volume>18</volume>.
                    <pub-id pub-id-type="doi">10.1016/j.segan.2019.100212</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ihara</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schweppe</surname>
                            <given-names>FC</given-names>
                        </name>
</person-group>:
                    <article-title>Physically based modeling of cold load pickup.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Power Apparatus and Systems.</italic>
</source>
                    <year>1981</year>;<volume>100</volume>(<issue>9</issue>):<fpage>4142</fpage>&#x2013;<lpage>4150</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TPAS.1981.316965</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Malhame</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chong</surname>
                            <given-names>CY</given-names>
                        </name>
</person-group>:
                    <article-title>Electric load model synthesis by diffusion approximation of a high-order hybrid state stochastic system.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Automatic Control.</italic>
</source>
                    <year>1985</year>;<volume>30</volume>(<issue>9</issue>):<fpage>854</fpage>&#x2013;<lpage>860</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TAC.1985.1104071</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mathieu</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Koch</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Callaway</surname>
                            <given-names>DS</given-names>
                        </name>
</person-group>:
                    <article-title>State estimation and control of electric loads to manage real-time energy imbalance.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans Power Syst.</italic>
</source>
                    <year>2013</year>;<volume>28</volume>(<issue>1</issue>):<fpage>430</fpage>&#x2013;<lpage>440</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TPWRS.2012.2204074</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hao</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sanandaji</surname>
                            <given-names>BM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Poolla</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Aggregate Flexibility of Thermostatically Controlled Loads.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Power Systems.</italic>
</source>
                    <year>2015</year>;<volume>30</volume>(<issue>1</issue>):<fpage>189</fpage>&#x2013;<lpage>198</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TPWRS.2014.2328865</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kamgarpour</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ellen</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soudjani</surname>
                            <given-names>SEZ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Modeling options for demand side participation of thermostatically controlled loads</article-title>. In
                    <italic toggle="yes">2013 IREP Symposium Bulk Power System Dynamics and Control-IX Optimization, Security and Control of the Emerging Power Grid</italic>. Rethymno, Greece,<year>2013</year>.
                    <pub-id pub-id-type="doi">10.1109/IREP.2013.6629396</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Meng</surname>
                            <given-names>FL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zeng</surname>
                            <given-names>XJ</given-names>
                        </name>
</person-group>:
                    <article-title>A Profit Maximization Approach to Demand Response Management with Customers Behavior Learning in Smart Grid</article-title>.
                    <italic toggle="yes">IEEE Trans Smart Grid</italic>.<year>2016</year>;<volume>7</volume>(<issue>3</issue>):<fpage>1516</fpage>&#x2013;<lpage>1529</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2015.2462083</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jia</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tong</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Retail pricing for stochastic demand with unknown parameters: An online machine learning approach</article-title>. In
                    <italic toggle="yes">2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)</italic>. Monticello, IL, USA,<year>2013</year>.
                    <pub-id pub-id-type="doi">10.1109/Allerton.2013.6736684</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dehghanpour</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nehrir</surname>
                            <given-names>HM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sheppard</surname>
                            <given-names>JW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Agent-Based Modeling of Retail Electrical Energy Markets With Demand Response.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>(<issue>4</issue>):<fpage>3465</fpage>&#x2013;<lpage>3475</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2016.2631453</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xie</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hui</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ding</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Operating reserve capacity evaluation of aggregated heterogeneous TCLs with price signals.</article-title>
                    <source>

                        <italic toggle="yes">Applied Energy.</italic>
</source>
                    <year>2018</year>;<volume>216</volume>:<fpage>338</fpage>&#x2013;<lpage>347</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.apenergy.2018.02.010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jay</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Swarup</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>Price Based Demand Response of Aggregated Thermostatically Controlled Loads For Load Frequency Control</article-title>. In
                    <italic toggle="yes">17th National Power Systems Conference</italic>.<year>2012</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.iitk.ac.in/npsc/Papers/NPSC2012/papers/12136.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Stackelberg Game Approach for Price Response Coordination of Thermostatically Controlled Loads.</article-title>
                    <source>

                        <italic toggle="yes">Applied Sciences.</italic>
</source>
                    <year>2018</year>;<volume>8</volume>(<issue>8</issue>):<fpage>1370</fpage>.
                    <pub-id pub-id-type="doi">10.3390/app8081370</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>De Paola</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trovato</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Angeli</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Mean Field Game Approach for Distributed Control of Thermostatic Loads Acting in Simultaneous Energy-Frequency Response Markets.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source> Early Access, <year>2019</year>;<fpage>1</fpage>&#x2013;<lpage>1</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2019.2895247</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grammatico</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gentile</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Parise</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>A Mean Field control approach for demand side management of large populations of Thermostatically Controlled Loads</article-title>. In
                    <italic toggle="yes">2015 European Control Conference (ECC)</italic>. Linz, Austria,<year>2015</year>.
                    <pub-id pub-id-type="doi">10.1109/ECC.2015.7331083</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nakabi</surname>
                            <given-names>TA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haataja</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Toivanen</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Computational Intelligence for Demand Side Management and Demand Response Programs in Smart Grids</article-title>. In
                    <italic toggle="yes">8th International Conference on Bioinspired Optimization Methods and their Applications</italic>. Paris,<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.researchgate.net/publication/331382569_Computational_Intelligence_for_Demand_Side_Management_and_Demand_Response_Programs_in_Smart_Grids">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Can Kara</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Berges</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Krogh</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Using smart devices for system-level management and control in the smart grid: A reinforcement learning framework</article-title>. In
                    <italic toggle="yes">2012 IEEE Third International Conference on Smart Grid Communications (SmartGridComm)</italic>. Tainan, Taiwan,<year>2012</year>.
                    <pub-id pub-id-type="doi">10.1109/SmartGridComm.2012.6485964</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>De Somer</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soares</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kuijpers</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Using Reinforcement Learning for Demand Response of Domestic Hot Water Buffers: a Real-Life Demonstration</article-title>. arXiv preprint arXiv:1703.05486,<year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/pdf/1703.05486.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ruelens</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Claessens</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Quaiyum</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reinforcement Learning Applied to an Electric Water Heater: From Theory to Practice.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>(<issue>4</issue>):<fpage>3792</fpage>&#x2013;<lpage>3800</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2016.2640184</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Claessens</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vrancx</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ruelens</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Convolutional Neural Networks for Automatic State-Time Feature Extraction in Reinforcement Learning Applied to Residential Load Control.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>(<issue>4</issue>):<fpage>3259</fpage>&#x2013;<lpage>3269</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2016.2629450</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ruelens</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Claessens</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vandael</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Residential Demand Response of Thermostatically Controlled Loads Using Batch Reinforcement Learning.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>(<issue>5</issue>):<fpage>2149</fpage>&#x2013;<lpage>2159</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2016.2517211</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ruelens</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Claessens</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vrancx</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Direct Load Control of Thermostatically Controlled Loads Based on Sparse Observations Using Deep Reinforcement Learning</article-title>. arXiv preprint arXiv:1707.08553.<year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/1707.08553v1">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patyn</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ruelens</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Deconinck</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Comparing neural architectures for demand response through model-free reinforcement learning for heat pump control.</article-title> In
                    <italic toggle="yes">2018 IEEE International Energy Conference (ENERGYCON)</italic>. Limassol, Cyprus,<year>2018</year>.
                    <pub-id pub-id-type="doi">10.1109/ENERGYCON.2018.8398836</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mocanu</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mocanu</surname>
                            <given-names>DC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>PH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>On-line Building Energy Optimization using Deep Reinforcement Learning.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2018</year>;<volume>10</volume>(<issue>4</issue>):<fpage>3698</fpage>&#x2013;<lpage>3708</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2018.2834219</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mohsenian-Rad</surname>
                            <given-names>AH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>VW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jatskevich</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Autonomous Demand-Side Management Based on Game-Theoretic Energy Consumption Scheduling for the Future Smart Grid.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Transactions on Smart Grid.</italic>
</source>
                    <year>2010</year>;<volume>1</volume>(<issue>3</issue>):<fpage>320</fpage>&#x2013;<lpage>331</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TSG.2010.2089069</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Holland</surname>
                            <given-names>JH</given-names>
                        </name>
</person-group>:
                    <article-title>Genetic Algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Scientific American.</italic>
</source>
                    <year>1992</year>;<volume>267</volume>(<issue>1</issue>):<fpage>66</fpage>&#x2013;<lpage>72</lpage>.
                    <pub-id pub-id-type="doi">10.1038/scientificamerican0792-66</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Blickle</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thiele</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>A comparison of selection schemes used in evolutionary algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Evol Comput.</italic>
</source>
                    <year>1996</year>;<volume>4</volume>(<issue>4</issue>):<fpage>361</fpage>&#x2013;<lpage>394</lpage>.
                    <pub-id pub-id-type="doi">10.1162/evco.1996.4.4.361</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Syswerda</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Uniform crossover in genetic algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 3rd International Conference on Genetic Algorithms.</italic>
</source>San Francisco, CA USA,<year>1989</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://dl.acm.org/citation.cfm?id=657265">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Neubauer</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Adaptive non-uniform mutation for genetic algorithms</article-title>. In:
                    <italic toggle="yes">Computational Intelligence Theory and Applications</italic>. Berlin, Heidelberg,<year>1997</year>;<fpage>24</fpage>&#x2013;<lpage>34</lpage>.
                    <pub-id pub-id-type="doi">10.1007/3-540-62868-1_94</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Deb</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>An efficient constraint handling method for genetic algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Comput Methods Appl Mech Eng.</italic>
</source>
                    <year>2000</year>;<volume>186</volume>(<issue>2&#x2013;4</issue>):<fpage>311</fpage>&#x2013;<lpage>338</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S0045-7825(99)00389-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <article-title>Weather observations, Kaisaniemi observation station Helsinki</article-title>. Finnish Meteorological Institute. [Accessed 8 September 2019].
                    <ext-link ext-link-type="uri" xlink:href="https://en.ilmatieteenlaitos.fi/download-observations">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <article-title>Nord Pool, Elspot Day-ahead, Prices</article-title>. [Accessed 8 September 2018].
                    <ext-link ext-link-type="uri" xlink:href="https://www.nordpoolgroup.com/Market-data1/Dayahead/Area-Prices/ALL1/Hourly/?view=table">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-44">
                <label>44</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nakabi</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>LSTM+GA data</article-title>. figshare. Dataset.<year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.9746786.v1">http://dx.doi.org/10.6084/m9.figshare.9746786.v1</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-45">
                <label>45</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nakabi</surname>
                            <given-names>TA</given-names>
                        </name>
</person-group>:
                    <article-title>tahanakabi/Deep-Reinforcenment-learning-for-TCL-control: First release (Version V1.0.0)</article-title>.
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.3383615">http://dx.doi.org/10.5281/zenodo.3383615</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report68826">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22445.r68826</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Swarup</surname>
                        <given-names>Shanti</given-names>
                    </name>
                    <xref ref-type="aff" rid="r68826a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4883-7649</uri>
                </contrib>
                <aff id="r68826a1">
                    <label>1</label>Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>9</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Swarup S</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport68826" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20421.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <list list-type="order">
                    <list-item>
                        <p>The idea behind TCL control, and how it affects the price, is very good.</p>
                    </list-item>
                    <list-item>
                        <p>However, the paper is not properly presented.</p>
                    </list-item>
                    <list-item>
                        <p>What is LSTM? The term long short-term memory seems contradictory: what is "long-short"? It should be either long-term memory (LTM) or short-term memory (STM).</p>
                    </list-item>
                    <list-item>
                        <p>Discussion and conclusions should be separate; as written, the conclusions are difficult to identify. The section should be Results and Discussion.</p>
                    </list-item>
                    <list-item>
                        <p>Three different tools are employed, as listed below. Why is there a need to use all of these tools (DL, LSTM, GA)?</p>
                        <p>Deep learning is used for control of TCL loads.</p>
                        <p>LSTM networks are used for state estimation.</p>
                        <p>Genetic algorithms are used for price optimization.</p>
                        <p>Only one tool could be used.</p>
                        <p>In fact, prediction of the TCL load is missing.</p>
                    </list-item>
                    <list-item>
                        <p>Why is there a need for price optimization?</p>
                    </list-item>
                    <list-item>
                        <p>The price (LMP) depends on the intersection of generation and demand, and it varies continuously.</p>
                    </list-item>
                    <list-item>
                        <p>The social benefit, or social welfare, needs to be optimized, not the price. Eqn 6 is not correct.</p>
                    </list-item>
                    <list-item>
                        <p>Figs 2a and 2b do not provide sufficient information to infer the contribution. The need for so many plots is questionable.</p>
                    </list-item>
                    <list-item>
                        <p>Only important results should be provided.</p>
                    </list-item>
                    <list-item>
                        <p>In spite of the good idea and motivation, the approach used does not seem appropriate.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Demand Response and Management</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report68823">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22445.r68823</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Huang</surname>
                        <given-names>Chiou-Jye</given-names>
                    </name>
                    <xref ref-type="aff" rid="r68823a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6262-9275</uri>
                </contrib>
                <aff id="r68823a1">
                    <label>1</label>Department of Electrical Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>9</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Huang CJ</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport68823" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20421.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors propose a long short-term memory (LSTM) network to learn the individual behaviors of TCL units, and use the aggregated information to predict the response of the TCL cluster to the pricing policy. They then use this prediction model in a genetic algorithm to find the prices that maximize profit in an energy arbitrage operation. The simulation results show that the proposed method yields a profit equal to 96% of the theoretically optimal solution. I recommend minor revisions, listed below; some questions also need to be addressed: 
                <list list-type="order">
                    <list-item>
                        <p>The English should be carefully checked, and the paper should be proofread for typos.</p>
                    </list-item>
                    <list-item>
                        <p>Some figures are not needed.</p>
                    </list-item>
                    <list-item>
                        <p>All the figures are unclear and hard to read; please replace them with clear versions.</p>
                    </list-item>
                    <list-item>
                        <p>The authors must provide a detailed flowchart of the methodology of the paper.</p>
                    </list-item>
                    <list-item>
                        <p>The conclusion section lacks a perspective on future research work.</p>
                    </list-item>
                    <list-item>
                        <p>There are too few references, and they should be updated with recent work. I suggest the authors add related references.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Big data analysis, machine learning and deep learning applications on the Internet of Energy (IoE) and environmental science, especially in renewable energy, as well as electricity load demand, electricity prices, solar radiance, photovoltaic power, and PM2.5 forecasting, and photovoltaic power plants planning design, and operation maintenance management.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-68823-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Multiple-Input Deep Convolutional Neural Network Model for Short-Term Photovoltaic Power Forecasting</article-title>.
                        <source>
                            <italic>IEEE Access</italic>
                        </source>.<year>2019</year>;<volume>7</volume>:<fpage>74822</fpage>&#x2013;<lpage>74834</lpage>.
                        <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2921238</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-68823-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>An Electricity Price Forecasting Model by Hybrid Structured Deep Neural Networks</article-title>.
                        <source>
                            <italic>Sustainability</italic>
                        </source>.<year>2018</year>;<volume>10</volume>(<issue>4</issue>):<elocation-id>1280</elocation-id>.
                        <pub-id pub-id-type="doi">10.3390/su10041280</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report68829">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.22445.r68829</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Shun</surname>
                        <given-names>Matsukawa</given-names>
                    </name>
                    <xref ref-type="aff" rid="r68829a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r68829a1">
                    <label>1</label>Smart Grid Power Control Engineering Joint Laboratory, Gifu University, Gifu, Japan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>8</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Shun M</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport68829" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.20421.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <list list-type="order">
                    <list-item>
                        <p>The sequence length of the LSTM model is so short that it does not show superior predictive ability compared to other models.</p>
                    </list-item>
                    <list-item>
                        <p>I could not understand the significance of Figure 2 because of its low resolution. This issue needs to be resolved.</p>
                    </list-item>
                    <list-item>
                        <p>In my own interpretation of Figure 2, the orange broken line showing the LSTM prediction results failed to learn the response of the TCL agents, because it depended heavily on the load at the previous step.</p>
                    </list-item>
                    <list-item>
                        <p>In Figures 4 and 5, the optimized loads change suddenly at times 14&#x2013;15. I felt it is necessary to investigate the cause.</p>
                        <p>Comment: I felt that the real key to this study was not the optimization but the accuracy of the TCL response prediction. Therefore, a more detailed analysis of the LSTM model would further enhance the value of this paper.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>smart grid, baseline load estimation, machine learning, neural networks, time-series, LSTM</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
</article>
