<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.130936.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 3 approved, 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Xu</surname>
                        <given-names>Xiaopeng</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2414-7851</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Zhou</surname>
                        <given-names>Juexiao</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Zhu</surname>
                        <given-names>Chen</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Zhan</surname>
                        <given-names>Qing</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Li</surname>
                        <given-names>Zhongxiao</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2480-0750</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Zhang</surname>
                        <given-names>Ruochi</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6541-4050</uri>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wang</surname>
                        <given-names>Yu</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Liao</surname>
                        <given-names>Xingyu</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Gao</surname>
                        <given-names>Xin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia</aff>
                <aff id="a2">
                    <label>2</label>Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia</aff>
                <aff id="a3">
                    <label>3</label>KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia</aff>
                <aff id="a4">
                    <label>4</label>Syneron Technology, Guangzhou, China</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:xin.gao@kaust.edu.sa">xin.gao@kaust.edu.sa</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>2</month>
                <year>2024</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2023</year>
            </pub-date>
            <volume>12</volume>
            <elocation-id>757</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>15</day>
                    <month>2</month>
                    <year>2024</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Xu X et al.</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/12-757/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>The key challenge in drug discovery is to discover novel compounds with desirable properties. Among these properties, binding affinity to a target is one of the prerequisites and is usually evaluated by molecular docking or quantitative structure-activity relationship (QSAR) models.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>In this study, we developed SGPT-RL, which uses a generative pre-trained transformer (GPT) as the policy network of the reinforcement learning (RL) agent to optimize the binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. A QSAR model and molecular docking, respectively, were used as the optimization goals in the two tasks. The popular Reinvent method was used as the baseline for comparison.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>The results on the Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent were able to generate valid molecules with improved target scores. The SGPT-RL method achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis showed that SGPT-RL learned conserved scaffold patterns during exploration.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>The superior performance of SGPT-RL in the ACE2 task indicates that it can be applied to the virtual screening process, where molecular docking is widely used as the criterion. In addition, the scaffold patterns learned by SGPT-RL during the exploration process can help chemists design and discover novel lead candidates.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Drug design</kwd>
                <kwd>transformers</kwd>
                <kwd>reinforcement learning</kwd>
                <kwd>molecular docking</kwd>
                <kwd>hit discovery</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA)</funding-source>
                    <award-id>FCC/1/1976-44-01</award-id>
                    <award-id>FCC/1/1976-45-01</award-id>
                    <award-id>URF/1/4663-01-01</award-id>
                    <award-id>REI/1/5202-01-01</award-id>
                    <award-id>REI/1/4940-01-01</award-id>
                    <award-id>RGC/3/4816-01-01</award-id>
                </award-group>
                <funding-statement>This work was supported by the grants assigned to Prof. Xin Gao from the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4663-01-01, REI/1/5202-01-01, REI/1/4940-01-01, and RGC/3/4816-01-01.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>Changes made from version 1 to version 2: 
                    <list list-type="order">
                        <list-item>
                            <p>The repetitive explanations of abbreviations in abstract, figure legends, and table legends were removed as mentioned by the reviewers.</p>
                        </list-item>
                        <list-item>
                            <p>Incorporated the property distribution changes from the Supplementary Figures into Figures 3-4 to make the presentation clearer, as mentioned by the reviewers.</p>
                        </list-item>
                        <list-item>
                            <p>Updated the Supplementary Figures accordingly to support the changes in 2.</p>
                        </list-item>
                        <list-item>
                            <p>Updated the source data reference to follow the update in 3.</p>
                        </list-item>
                        <list-item>
                            <p>Corrected several typos and removed unnecessary sentences to make the context more fluent to read, as mentioned by the reviewers.</p>
                        </list-item>
                        <list-item>
                            <p>Added descriptions to clarify the QSAR processing, as mentioned by a reviewer.</p>
                        </list-item>
                        <list-item>
                            <p>Added a citation as suggested by a reviewer.</p>
                        </list-item>
                        <list-item>
                            <p>Added descriptions of the formulation of the optimization as an RL problem.</p>
                        </list-item>
                        <list-item>
                            <p>Added explanations of abbreviations in the figure and table captions to make them easier to read.</p>
                        </list-item>
                        <list-item>
                            <p>Renamed subsection references to use names instead of numbers.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
        </notes>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>The key challenge in drug discovery is to discover new molecules with desirable properties.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> In traditional drug discovery campaigns, high-throughput virtual screening, biochemical assays, physicochemical assays, and 
                <italic toggle="yes">in vitro</italic> profiling of absorption, distribution, metabolism, and excretion (ADME) properties of chemicals are usually conducted.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> However, the chemical space of possible molecules is enormous, with 10
                <sup>23</sup> to 10
                <sup>60</sup> potential drug-like molecules, while the number of synthesized molecules is on the order of 10
                <sup>8</sup>.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> It is infeasible to screen all of these molecules to select the desirable ones. Many machine learning tools that predict molecular properties, including binding affinity, drug-likeness, synthetic accessibility, and ADME properties, have been integrated into screening pipelines as key components,
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> as they are much faster than traditional computational methods and yield rapid and accurate property predictions.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> Employing these tools has improved the efficiency of virtually screening chemical libraries, which are generated from available chemical reagents.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> However, the search is still limited to molecules in the chemical libraries.</p>
            <p>In recent years, <italic toggle="yes">de novo</italic> molecular design, especially with deep generative models, has progressed rapidly. These models can efficiently explore the chemical space and steer molecular generation towards desired properties.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref43">10</xref>
                </sup> A pioneering work published in 2018 employed a variational autoencoder (VAE) to learn a continuous representation of the chemical space and used gradient-based optimization to search for functional molecules.
                <sup>
                    <xref ref-type="bibr" rid="ref10">11</xref>
                </sup> Many methods have since been developed; the most representative classes include recurrent neural networks, autoencoders, generative adversarial networks, and reinforcement learning (RL).
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Among them, RL methods were shown to be able to optimize the generation of molecules towards desirable properties, including target activity, drug-likeness, molecular weight, synthetic accessibility (SA), and similarity to given molecules.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref11">12</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref12">13</xref>
                </sup>
            </p>
            <p>The transformer
                <sup>
                    <xref ref-type="bibr" rid="ref13">14</xref>
                </sup> is a prominent deep learning architecture that was first proposed for natural language translation and has had a tremendous impact in many fields, such as language modeling, speech processing, and computer vision.
                <sup>
                    <xref ref-type="bibr" rid="ref14">15</xref>
                </sup> A decoder-only variant of the transformer, the Generative Pre-trained Transformer (GPT), stands out among the many transformer variants. It was trained on a large corpus of unlabeled text and is able to generate news articles that human evaluators find difficult to distinguish from human-written ones.
                <sup>
                    <xref ref-type="bibr" rid="ref15">16</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref16">17</xref>
                </sup> In addition, a GPT model fine-tuned with reinforcement learning produced better generative results, with fewer toxic outputs and improved truthfulness.
                <sup>
                    <xref ref-type="bibr" rid="ref17">18</xref>
                </sup>
            </p>
            <p>Several transformer-based methods have been proposed for molecular generation tasks.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref18">19</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref20">21</xref>
                </sup> One study formulated protein-specific molecular generation as a machine translation problem, using amino acid sequences as inputs and the simplified molecular input line entry system (SMILES) representations of molecules as outputs.
                <sup>
                    <xref ref-type="bibr" rid="ref18">19</xref>
                </sup> The model was pretrained on amino acid sequences of targets and the corresponding SMILES of the binding molecules, and was able to generate valid molecules with structural novelty and plausible drug-likeness. Another work also formulated molecular generation as a translation problem, but its goal was to optimize the generation of molecules towards desirable properties.
                <sup>
                    <xref ref-type="bibr" rid="ref20">21</xref>
                </sup> To train the model, the authors used a desirable property together with the starting molecule as the input and a modified molecule fulfilling that property as the output. Their results showed that transformers can generate molecules with desirable properties through modifications that are intuitive to chemists. A decoder-only transformer model, MolGPT, was also proposed for molecular generation.
                <sup>
                    <xref ref-type="bibr" rid="ref19">20</xref>
                </sup> It was trained on molecules with property conditions and was able to generate novel molecules fulfilling the corresponding properties. Another work also used a decoder-only transformer model but targeted multiple properties.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> After a transformer model was pretrained, it was distilled into a gated recurrent unit (GRU) model, which was used to initialize an RL agent. This agent was then trained to optimize multiple properties through the Reinvent approach.
                <sup>
                    <xref ref-type="bibr" rid="ref12">13</xref>
                </sup> The agent can generate novel molecules satisfying multiple property constraints. In summary, these studies showed the advantages of transformers in molecular generation, especially for constrained generation tasks.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref11">12</xref>
                </sup>
            </p>
            <p>The activity of a compound, which arises from its binding affinity to a target, is the primary consideration in drug discovery. Three approaches are used to estimate binding affinity: bioassays, quantitative structure-activity relationship (QSAR) models, and molecular docking.
                <sup>
                    <xref ref-type="bibr" rid="ref21">22</xref>
                </sup> 
                <italic toggle="yes">In vitro</italic> bioassays are reliable but often scarce, so QSAR models and molecular docking are usually used in the <italic toggle="yes">in silico</italic> screening process.
                <sup>
                    <xref ref-type="bibr" rid="ref21">22</xref>
                </sup> Because transformers excel at sequence generation and RL is well suited to optimization tasks, an intuitive idea is to combine transformers and RL to optimize binding affinity. However, to the best of our knowledge, no such study has been conducted. Two main obstacles may have deterred researchers. First, high-end GPUs with large memory are required: during the RL process, a transformer decoder has to generate a batch of molecules, and such generation is very memory-intensive. Second, such studies require interdisciplinary knowledge spanning computational chemistry and machine learning. For example, molecular docking is commonly used for virtual screening but is not easy for machine learning experts to perform and interpret, whereas transformers and RL are widely used in the deep learning community but are hard for computational chemists to grasp and implement.</p>
            <p>In this study, we propose what is, to our knowledge, the first method that combines GPT and RL for molecular generation. We developed a tool named SGPT-RL, which uses a transformer decoder as the policy network of the RL agent. The workflow is shown in 
                <xref ref-type="fig" rid="f1">Figure 1</xref>. First, a GPT model was trained on lead-like molecules to obtain a prior model that learns the chemical space. This prior model was used to initialize the agent, which uses the same decoder architecture as its policy network. Then, the agent was trained in an RL fashion to optimize the generation of molecules towards desirable properties, as shown in 
                <xref ref-type="fig" rid="f1">Figure 1c</xref>. In each step, the agent generated a batch of molecules; scoring functions assigned target scores to the molecules; the scores were combined with the prior likelihoods to calculate the losses; and the losses, which carry both target-score and prior-likelihood information, served as the feedback to the agent. During training, the agent's likelihood of generating molecules with good target scores increases, while its likelihood of generating molecules with poor scores decreases. We evaluated SGPT-RL on the Moses distribution learning benchmark and two goal-directed generation tasks. Results on the Moses benchmark showed that the SGPT-RL prior model was able to learn good property distributions and generate molecules with high novelty. The two goal-directed generation tasks are a Dopamine Receptor D2 (DRD2) task, with QSAR model-based activity as the scoring function, and an Angiotensin-Converting Enzyme 2 (ACE2) task, with molecular docking affinity as the target score. In both tasks, the SGPT-RL agents were able to generate valid molecules with high target activities. In the DRD2 task, the SGPT-RL agent explored more scaffolds than the popular Reinvent method; in the ACE2 task, the SGPT-RL agent generated molecules with significantly better docking scores than Reinvent. Moreover, we found that the Reinvent agents could not learn effectively after around 100 steps, whereas the SGPT-RL agents continued to learn and generated molecules with more ring structures. Finally, we found that the SGPT-RL agents were able to learn some generative patterns, whereas the Reinvent agents explored with strong randomness and showed no clear patterns.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>The workflow of SGPT-RL.</title>
                    <p>a) The main workflow. Simplified molecular input line entry system (SMILES) strings from the Moses benchmark were used to train a prior model. An agent model was then initialized from the prior and trained in a reinforcement learning (RL) fashion to generate molecules with desirable properties. b) The architecture of the prior model. The agent shares the same architecture. c) The pipeline of the RL approach. The prior model was used to initialize the agent model. During each RL step, the agent model was used to generate a batch of SMILES sequences. The generated sequences were evaluated by the prior model and a scoring function to calculate augmented likelihoods, which serve as the feedback to update the agent. In the Dopamine Receptor D2 (DRD2) task, a quantitative structure-activity relationship (QSAR) model was used as the scoring function; in the Angiotensin-Converting Enzyme 2 (ACE2) task, the ACE2 docking score was used as the scoring function.</p>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure1.gif"/>
            </fig>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <sec id="sec3">
                <title>Datasets</title>
                <p>The dataset to train the prior models was obtained from the 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/molecularsets/moses">Moses benchmark</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">23</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> This dataset contains 1.9 million lead-like molecules from the Zinc database.
                    <sup>
                        <xref ref-type="bibr" rid="ref23">24</xref>
                    </sup> The training and test sets of the Moses benchmark, which contain 1,584,664 and 176,075 molecules respectively, were used for training and testing.</p>
                <p>Known active molecules that bind with DRD2 or ACE2 were obtained from 
                    <ext-link ext-link-type="uri" xlink:href="https://solr.ideaconsult.net/search/excape/">ExCAPE-DB</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">25</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> We retrieved 8,036 unique molecules known to be active against DRD2 and 56 unique molecules known to be active against ACE2. None of the molecules in these two sets were found in the Moses training dataset.</p>
            </sec>
            <sec id="sec4">
                <title>Model architecture</title>
                <p>A brief overview of the framework is illustrated in 
                    <xref ref-type="fig" rid="f1">Figure 1a</xref>. A transformer decoder prior model was trained on the Moses dataset. This pretrained prior model was used to initiate the agent. During the RL process, the agent model was used to generate molecules, which were scored by the prior network and a scoring function to provide feedback to update the agent. The agent model trained after the final step was used to generate molecules for property distribution analysis.</p>
                <p>
                    <italic toggle="yes">The prior network</italic>
                </p>
                <p>In SGPT-RL, a generative pre-trained transformer (GPT)
                    <sup>
                        <xref ref-type="bibr" rid="ref25">26</xref>
                    </sup> was used as the prior model to learn the chemical space. Tokenized SMILES sequences were used to train the model on a next token prediction task.</p>
                <p>The GPT model we used is a simplified version of GPT-2, with only &#x223c;6M parameters. The architecture of the model is illustrated in 
                    <xref ref-type="fig" rid="f1">Figure 1b</xref>. The model is composed of eight decoder blocks, input and positional embedding before the blocks, a linear layer after the blocks, and a softmax layer before output. Each of the blocks contains a masked multi-head self-attention layer and a fully connected feedforward layer, with residual connections in each of the layers. Layer normalization is conducted in the two layers to normalize the inputs. An embedding size of 256 was used in all layers.</p>
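                <p>As a rough sanity check on the stated model size, the parameter count of the eight decoder blocks can be estimated from the embedding size alone. The sketch below is an illustration rather than the authors' code: it assumes a feedforward hidden size of four times the embedding size (the usual GPT-2 convention, not stated in the text), bias terms in every linear layer, and two layer normalizations per block. Embedding tables and the output layer are ignored.</p>
                <code language="python"><![CDATA[
```python
def block_params(d_model, ffn_mult=4):
    """Estimate trainable parameters in one decoder block.

    ffn_mult (feedforward hidden size / embedding size) is an assumption.
    """
    # Query, key, value and output projections, each d_model x d_model plus bias
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers of the feedforward sub-layer, each with bias
    d_ff = ffn_mult * d_model
    feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer normalizations, each with a scale and a shift vector
    layer_norms = 2 * (2 * d_model)
    return attention + feedforward + layer_norms


N_BLOCKS = 8
D_MODEL = 256
total_block_params = N_BLOCKS * block_params(D_MODEL)
print(total_block_params)  # 6318080
```
]]></code>
                <p>Under these assumptions the eight blocks account for about 6.3 million parameters, which is consistent with the reported size of roughly 6M.</p>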
                <p>The core of the GPT model is the masked multi-head self-attention layer. In this layer, eight scaled dot-product attention functions help the model capture key information in a sequence. In the attention function, a query vector 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>Q</mml:mi>
                        </mml:math>
                    </inline-formula> is used to calculate a dot product with the key vector 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>K</mml:mi>
                        </mml:math>
                    </inline-formula>, and the result is divided by the square root of the key vector dimension 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>d</mml:mi>
                                <mml:mi>k</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>. The scaled value is passed through a softmax function to obtain the attention weights, which are multiplied with a value vector 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>V</mml:mi>
                        </mml:math>
                    </inline-formula> to get the final attention. The formula is shown in 
                    <xref ref-type="disp-formula" rid="e1">Equation 1</xref>.
                    <sup>
                        <xref ref-type="bibr" rid="ref13">14</xref>
                    </sup>
                    <disp-formula id="e1">
                        <mml:math display="block">
                            <mml:mtext mathvariant="italic">Attention</mml:mtext>
                            <mml:mfenced close=")" open="(" separators=",,">
                                <mml:mi>Q</mml:mi>
                                <mml:mi>K</mml:mi>
                                <mml:mi>V</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mtext mathvariant="italic">softmax</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:mi>Q</mml:mi>
                                        <mml:msup>
                                            <mml:mi>K</mml:mi>
                                            <mml:mi>T</mml:mi>
                                        </mml:msup>
                                    </mml:mrow>
                                    <mml:msqrt>
                                        <mml:msub>
                                            <mml:mi>d</mml:mi>
                                            <mml:mi>k</mml:mi>
                                        </mml:msub>
                                    </mml:msqrt>
                                </mml:mfrac>
                            </mml:mfenced>
                            <mml:mi>V</mml:mi>
                        </mml:math>
                        <label>(1)</label>
                    </disp-formula>
                </p>
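                <p>Equation 1, combined with the causal mask used in a decoder, can be sketched for a single attention head in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, matrices are represented as lists of row vectors, and the mask is applied by letting each position attend only to itself and to earlier positions.</p>
                <code language="python"><![CDATA[
```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q and K are lists of n vectors of length d_k; V is a list of n vectors.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    n_cols = len(V[0])
    outputs = []
    for i, q in enumerate(Q):
        # Scaled dot products with keys 0..i only; later positions are masked
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / scale
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors
        outputs.append([
            sum(w * V[j][c] for j, w in enumerate(weights))
            for c in range(n_cols)
        ])
    return outputs


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = causal_attention(Q, K, V)
```
]]></code>
                <p>In this toy example the first position can only attend to itself, so its output equals the first value vector; the second position mixes both value vectors, with more weight on the key that matches its query.</p>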
                <p>The prior model was trained for ten epochs on the training dataset and evaluated on the testing dataset after each epoch. Cross-entropy loss was used with the AdamW optimizer
                    <sup>
                        <xref ref-type="bibr" rid="ref41">27</xref>
                    </sup> to update the model, with a learning rate of 0.001. A batch size of 1,024 was used to train the model. To generate the SMILES string of a molecule, a start token was fed to the model to predict the next token. Each generated token was concatenated with the previous tokens to predict the following one, until an end token was predicted or a maximum sequence length of 140 was reached.</p>
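                <p>The generation procedure described above can be sketched as a simple loop. The sketch makes several assumptions that are not taken from the text: the special token strings are hypothetical, the trained model is replaced by a stand-in function that returns a token-to-probability mapping, tokens are drawn by multinomial sampling, and the limit of 140 is counted over generated tokens excluding the start token.</p>
                <code language="python"><![CDATA[
```python
import random

START, END = "<bos>", "<eos>"  # hypothetical special tokens
MAX_LEN = 140


def sample_sequence(next_token_probs, rng, max_len=MAX_LEN):
    """Sample tokens one at a time until END is drawn or max_len is reached.

    next_token_probs(tokens) is a stand-in for the trained model: it maps
    the tokens generated so far to a dict of {token: probability}.
    """
    tokens = [START]
    generated = []
    while len(generated) < max_len:
        probs = next_token_probs(tokens)
        choices = list(probs)
        weights = [probs[t] for t in choices]
        token = rng.choices(choices, weights=weights, k=1)[0]
        if token == END:
            break
        generated.append(token)
        tokens.append(token)
    return "".join(generated)


def toy_model(tokens):
    """Toy stand-in: emit 'C' three times, then END with certainty."""
    if len(tokens) < 4:
        return {"C": 1.0, END: 0.0}
    return {"C": 0.0, END: 1.0}


smiles = sample_sequence(toy_model, random.Random(0))
print(smiles)  # CCC
```
]]></code>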
                <p>
                    <italic toggle="yes">Training the agent</italic>
                </p>
                <p>The process to generate molecules with desirable properties was framed as an RL problem, and the 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/MarcusOlivecrona/REINVENT">Reinvent</ext-link> approach was utilized, with the process described below.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> In the RL formulation, the state is the current sequence generated, the action is the next token to add, and the reward is an augmented likelihood calculated from the prior likelihood and property scores. The GPT model described in the previous subsection was used for the prior and the agent, and customized scoring functions for the target properties were used in each of the two tasks.</p>
                <p>The loss function to update the agent model is defined as in 
                    <xref ref-type="disp-formula" rid="e2">Equations 2</xref>&#x2013;
                    <xref ref-type="disp-formula" rid="e3">3</xref>. First, a SMILES sequence 
                    <italic toggle="yes">A</italic> was sampled from the agent model with its log-likelihood 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>log</mml:mo>
                            <mml:mi>p</mml:mi>
                            <mml:msub>
                                <mml:mfenced close=")" open="(">
                                    <mml:mi>A</mml:mi>
                                </mml:mfenced>
                                <mml:mtext mathvariant="italic">agent</mml:mtext>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>. Then the SMILES sequence was passed to the prior model to calculate a prior log-likelihood 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>log</mml:mo>
                            <mml:mi>p</mml:mi>
                            <mml:msub>
                                <mml:mfenced close=")" open="(">
                                    <mml:mi>A</mml:mi>
                                </mml:mfenced>
                                <mml:mtext mathvariant="italic">prior</mml:mtext>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>, and evaluated with scoring functions of desirable properties to get a score 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>S</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>A</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>. The score was added to the prior log-likelihood with a coefficient 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>&#x03c3;</mml:mi>
                        </mml:math>
                    </inline-formula> to get an augmented log-likelihood 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>log</mml:mo>
                            <mml:mi>p</mml:mi>
                            <mml:msub>
                                <mml:mfenced close=")" open="(">
                                    <mml:mi>A</mml:mi>
                                </mml:mfenced>
                                <mml:mi mathvariant="italic">aug</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>, as shown in 
                    <xref ref-type="disp-formula" rid="e2">Equation 2</xref>. The idea behind this equation is that the prior log-likelihood preserves the rules learnt from the SMILES sequences of molecules, while the score of desirable properties biases the model towards generating SMILES of molecules with those properties.
                    <disp-formula id="e2">
                        <mml:math display="block">
                            <mml:mo>log</mml:mo>
                            <mml:mi>p</mml:mi>
                            <mml:msub>
                                <mml:mfenced close=")" open="(">
                                    <mml:mi>A</mml:mi>
                                </mml:mfenced>
                                <mml:mi mathvariant="italic">aug</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mo>log</mml:mo>
                            <mml:mi>p</mml:mi>
                            <mml:msub>
                                <mml:mfenced close=")" open="(">
                                    <mml:mi>A</mml:mi>
                                </mml:mfenced>
                                <mml:mtext mathvariant="italic">prior</mml:mtext>
                            </mml:msub>
                            <mml:mo>+</mml:mo>
                            <mml:mi mathvariant="normal">&#x03c3;</mml:mi>
                            <mml:mi>S</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>A</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                        <label>(2)</label>
                    </disp-formula>
                </p>
                <p>Finally, the squared error between the augmented log-likelihood and agent log-likelihood was used as the loss to update the agent model, as shown in 
                    <xref ref-type="disp-formula" rid="e3">Equation 3</xref>.
                    <disp-formula id="e3">
                        <mml:math display="block">
                            <mml:mtext mathvariant="italic">Loss</mml:mtext>
                            <mml:mo>=</mml:mo>
                            <mml:msup>
                                <mml:mfenced close="]" open="[">
                                    <mml:mrow>
                                        <mml:mo>log</mml:mo>
                                        <mml:mi>p</mml:mi>
                                        <mml:msub>
                                            <mml:mfenced close=")" open="(">
                                                <mml:mi>A</mml:mi>
                                            </mml:mfenced>
                                            <mml:mi mathvariant="italic">aug</mml:mi>
                                        </mml:msub>
                                        <mml:mo>&#x2212;</mml:mo>
                                        <mml:mo>log</mml:mo>
                                        <mml:mi>p</mml:mi>
                                        <mml:msub>
                                            <mml:mfenced close=")" open="(">
                                                <mml:mi>A</mml:mi>
                                            </mml:mfenced>
                                            <mml:mtext mathvariant="italic">agent</mml:mtext>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mfenced>
                                <mml:mn>2</mml:mn>
                            </mml:msup>
                        </mml:math>
                        <label>(3)</label>
                    </disp-formula>
                </p>
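                <p>Equations 2 and 3 can be illustrated numerically for a single sampled sequence. The sketch below uses hypothetical function names and made-up log-likelihood values; the coefficient of 60 is used purely as an example value and is not claimed to be the setting used for the SGPT-RL agent.</p>
                <code language="python"><![CDATA[
```python
def augmented_log_likelihood(prior_ll, score, sigma):
    """Equation 2: prior log-likelihood plus the weighted property score."""
    return prior_ll + sigma * score


def agent_loss(agent_ll, prior_ll, score, sigma):
    """Equation 3: squared error between augmented and agent log-likelihoods."""
    aug_ll = augmented_log_likelihood(prior_ll, score, sigma)
    return (aug_ll - agent_ll) ** 2


# Illustrative values only: agent and prior agree on the sequence likelihood
prior_ll = -30.0
agent_ll = -30.0
sigma = 60.0

loss_good = agent_loss(agent_ll, prior_ll, score=0.5, sigma=sigma)
loss_zero = agent_loss(agent_ll, prior_ll, score=0.0, sigma=sigma)
print(loss_good, loss_zero)  # 900.0 0.0
```
]]></code>
                <p>When the score is zero and the agent agrees with the prior, the loss vanishes and the agent is not updated. A positive score raises the augmented log-likelihood above the agent log-likelihood, so minimizing the loss pushes the agent to assign a higher likelihood to that sequence.</p>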
            </sec>
            <sec id="sec5">
                <title>Evaluation metrics</title>
                <p>Five metrics from the Moses benchmark were used to evaluate the models, including validity, uniqueness, novelty, similarity to a nearest neighbor (SNN) and internal diversity (intDiv). The definitions of the metrics are described below. The generated SMILES sequences to be evaluated are denoted by 
                    <italic toggle="yes">G</italic>, the training dataset is denoted by 
                    <italic toggle="yes">T</italic>, and 
                    <italic toggle="yes">n</italic> is the total number of the generated sequences.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Validity: the fraction of the valid sequences among 10,000 generated sequences.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Uniqueness: the fraction of the unique sequences among 10,000 valid generated sequences.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Novelty: the fraction of the unique sequences in G that are not in T.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Similarity to a nearest neighbor (SNN): evaluates the similarity of the generated molecules to the training molecules. It is the Tanimoto similarity 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mi>T</mml:mi>
                                        <mml:mfenced close=")" open="(" separators=",">
                                            <mml:msub>
                                                <mml:mi>m</mml:mi>
                                                <mml:mi>G</mml:mi>
                                            </mml:msub>
                                            <mml:msub>
                                                <mml:mi>m</mml:mi>
                                                <mml:mi>T</mml:mi>
                                            </mml:msub>
                                        </mml:mfenced>
                                    </mml:math>
                                </inline-formula> between fingerprints of a molecule 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:msub>
                                            <mml:mi>m</mml:mi>
                                            <mml:mi>G</mml:mi>
                                        </mml:msub>
                                    </mml:math>
                                </inline-formula> from the generated set 
                                <italic toggle="yes">G</italic> and its nearest neighbor molecule 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:msub>
                                            <mml:mi>m</mml:mi>
                                            <mml:mi>T</mml:mi>
                                        </mml:msub>
                                    </mml:math>
                                </inline-formula> in the training dataset.</p>
                        </list-item>
                    </list>
                    <disp-formula id="e4">
                        <mml:math display="block">
                            <mml:mi>SNN</mml:mi>
                            <mml:mfenced close=")" open="(" separators=",">
                                <mml:mi>G</mml:mi>
                                <mml:mi>T</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mn>1</mml:mn>
                                <mml:mi>n</mml:mi>
                            </mml:mfrac>
                            <mml:msub>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>G</mml:mi>
                                    </mml:msub>
                                    <mml:mo>&#x2208;</mml:mo>
                                    <mml:mi>G</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:munder>
                                <mml:mo>max</mml:mo>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>m</mml:mi>
                                        <mml:mi>T</mml:mi>
                                    </mml:msub>
                                    <mml:mo>&#x2208;</mml:mo>
                                    <mml:mi>T</mml:mi>
                                </mml:mrow>
                            </mml:munder>
                            <mml:mi>T</mml:mi>
                            <mml:mfenced close=")" open="(" separators=",">
                                <mml:msub>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>G</mml:mi>
                                </mml:msub>
                                <mml:msub>
                                    <mml:mi>m</mml:mi>
                                    <mml:mi>T</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                        <label>(4)</label>
                    </disp-formula>
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Internal diversity (intDiv): assesses the diversity within 
                                <italic toggle="yes">G.</italic> It is defined as one minus the average Tanimoto similarity over all pairs of molecules 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:msub>
                                            <mml:mi>m</mml:mi>
                                            <mml:mn>1</mml:mn>
                                        </mml:msub>
                                    </mml:math>
                                </inline-formula>
                                <italic toggle="yes">,</italic> 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:msub>
                                            <mml:mi>m</mml:mi>
                                            <mml:mn>2</mml:mn>
                                        </mml:msub>
                                    </mml:math>
                                </inline-formula> in the generated sequences 
                                <italic toggle="yes">G.</italic>
                            </p>
                        </list-item>
                    </list>
                    <disp-formula id="e5">
                        <mml:math display="block">
                            <mml:mtext mathvariant="italic">IntDiv</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>G</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:mfrac>
                                <mml:mn>1</mml:mn>
                                <mml:msup>
                                    <mml:mi>n</mml:mi>
                                    <mml:mn>2</mml:mn>
                                </mml:msup>
                            </mml:mfrac>
                            <mml:msub>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>m</mml:mi>
                                        <mml:mn>1</mml:mn>
                                    </mml:msub>
                                    <mml:mo>,</mml:mo>
                                    <mml:msub>
                                        <mml:mi>m</mml:mi>
                                        <mml:mn>2</mml:mn>
                                    </mml:msub>
                                    <mml:mo>&#x2208;</mml:mo>
                                    <mml:mi>G</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mi>T</mml:mi>
                            <mml:mfenced close=")" open="(" separators=",">
                                <mml:msub>
                                    <mml:mi>m</mml:mi>
                                    <mml:mn>1</mml:mn>
                                </mml:msub>
                                <mml:msub>
                                    <mml:mi>m</mml:mi>
                                    <mml:mn>2</mml:mn>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                        <label>(5)</label>
                    </disp-formula>
                </p>
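                <p>The uniqueness, novelty, SNN and intDiv definitions above can be sketched with toy data. This is an illustration under stated assumptions, not the Moses implementation: fingerprints are represented as Python sets of on-bit indices instead of real molecular fingerprints, SMILES strings are assumed to be already validated and canonical, and all function names are hypothetical. Following Equation 5, the intDiv sum runs over all ordered pairs, including each molecule paired with itself.</p>
                <code language="python"><![CDATA[
```python
from itertools import product


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def uniqueness(valid_smiles):
    """Fraction of unique sequences among the valid generated sequences."""
    return len(set(valid_smiles)) / len(valid_smiles)


def novelty(valid_smiles, training_smiles):
    """Fraction of unique generated sequences absent from the training set."""
    unique = set(valid_smiles)
    return len(unique - set(training_smiles)) / len(unique)


def snn(gen_fps, train_fps):
    """Equation 4: mean similarity of each generated molecule to its nearest training neighbor."""
    nearest = [max(tanimoto(g, t) for t in train_fps) for g in gen_fps]
    return sum(nearest) / len(gen_fps)


def int_div(gen_fps):
    """Equation 5: one minus the mean pairwise similarity within the generated set."""
    n = len(gen_fps)
    total = sum(tanimoto(a, b) for a, b in product(gen_fps, repeat=2))
    return 1.0 - total / n ** 2


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO"]
gen_fps = [{1, 2}, {2, 3}]   # toy fingerprints
train_fps = [{1, 2}, {4}]    # toy fingerprints

print(uniqueness(generated))         # 0.75
print(novelty(generated, training))  # 2 of 3 unique sequences are novel
print(snn(gen_fps, train_fps))       # (1 + 1/3) / 2
print(int_div(gen_fps))              # 1 - (8/3) / 4
```
]]></code>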
            </sec>
            <sec id="sec6">
                <title>Evaluated molecular properties</title>
                <p>In our experiments, seven molecular properties were calculated. All of them were used to compare the property distributions of molecules. Two of them, DRD2 activity and ACE2 docking score, also served as the optimization goals, that is, as the scoring functions of the DRD2 and ACE2 tasks, respectively.</p>
                <p>DRD2 activity was evaluated with a QSAR model.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> This model is a support vector machine (SVM) classifier with a Gaussian kernel trained on active and inactive molecules. In the modeling, each SMILES string is converted into a molecule from which Morgan fingerprints are computed using RDKit 2017.09.1.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> The fingerprints were used as the features to build the SVM classifier. The classifier outputs a probability score between zero and one, with values closer to one indicating higher DRD2 activity.</p>
                <p>ACE2 affinity was calculated using molecular docking, as described in the subsection &#x201c;<italic toggle="yes">Task 2: structure-based generation with ACE2 as the target</italic>&#x201d;.
                </p>
                <p>The quantitative estimate of drug-likeness (QED) quantifies the drug-likeness of a molecule using molecular properties as inputs.
                    <sup>
                        <xref ref-type="bibr" rid="ref26">29</xref>
                    </sup> It was calculated by 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link> (2017.09.1)
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup> and ranges from zero to one, with values closer to one being more favorable.</p>
                <p>The synthetic accessibility score (SAscore) measures the difficulty of synthesizing a molecule.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> A predictive model built by Blaschke 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref12">13</xref>
                    </sup> was used, in which molecular weight was combined with the raw SAscore,
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> which ranges from one to ten, as features to predict the probability of synthetic accessibility. The model outputs a probability score between zero and one, with values closer to one indicating better synthetic accessibility.</p>
                <p>Molecular weight and the log of partition coefficient (LogP) were calculated using 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup> Length of the SMILES string was also calculated for the molecules.</p>
            </sec>
            <sec id="sec7">
                <title>Evaluation settings</title>
                <p>The SGPT-RL model was evaluated on a distribution learning benchmark and two goal-directed generation tasks. The Moses benchmark was used for distribution evaluation. DRD2 activity and ACE2 affinity were used as the scoring functions in the two goal-directed generation tasks, respectively.</p>
                <p>
                    <italic toggle="yes">Benchmarking on distribution learning</italic>
                </p>
                <p>To evaluate on the Moses distribution learning benchmark, the SGPT-RL prior model was trained on the Moses training dataset. The model after the final epoch was used to generate 10,000 molecules for evaluation. Five metrics were used for comparison: validity, uniqueness, novelty, SNN and intDiv. The baseline models from the Moses benchmark were run with default parameters for comparison. 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jkwang93/MCMG">MCMG</ext-link> (multi-constraints molecular generation) and 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/devalab/molgpt">MolGPT</ext-link> were also run with default parameters to generate 10,000 molecules for comparison.</p>
                <p>
                    <italic toggle="yes">Task 1: goal-directed generation with DRD2 as the target</italic>
                </p>
                <p>In the DRD2 task, we aimed to generate molecules that are active against DRD2. The DRD2 activity predicted by a QSAR model
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> was used as the target. The prior model trained on the Moses dataset was used to initiate the agent on this task. The agent was trained for 2,000 steps, and the model after the final step was used to sample 10,000 molecules for property distribution analysis.</p>
                <p>The Reinvent model
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> was used as the baseline for comparison. In Reinvent, a three-layer GRU serves as the policy model. The default hyper-parameters of Reinvent were used. The prior model was trained for five epochs with a batch size of 128, using the Adam optimizer with a learning rate of 0.001. For a fair comparison, the Reinvent agent was trained with the same scoring function as the SGPT-RL agent. It was trained for 3,000 steps with a batch size of 64, a learning rate of 0.0005, and a sigma of 60.</p>
                <p>
                    <italic toggle="yes">Task 2: structure-based generation with ACE2 as the target</italic>
                </p>
                <p>In the ACE2 task, we trained the SGPT-RL agent with ACE2 affinity as the desirable property. ACE2 affinity was evaluated by ligand-receptor docking experiments. The 3D structure of the human ACE2 receptor (PDB ID 
                    <ext-link ext-link-type="uri" xlink:href="https://www.rcsb.org/structure/1R4L">1R4L</ext-link>) was downloaded from the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.rcsb.org/">Protein Data Bank</ext-link>. It was processed with 
                    <ext-link ext-link-type="uri" xlink:href="https://pymol.org/2/">PyMol</ext-link> (2.5.4)
                    <sup>
                        <xref ref-type="bibr" rid="ref29">31</xref>
                    </sup> to remove water molecules and the original ligands. An open-source version of PyMol is available 
                    <ext-link ext-link-type="uri" xlink:href="https://pymol.org/2/">here</ext-link>. The structure was also processed with 
                    <ext-link ext-link-type="uri" xlink:href="https://ccsb.scripps.edu/mgltools/">MGLTools</ext-link> (1.5.7)
                    <sup>
                        <xref ref-type="bibr" rid="ref30">32</xref>
                    </sup> to add polar hydrogens and obtain the docking grid. Generated molecules were docked into the pocket where XX5 is located. The 3D ligand structures were generated from the SMILES strings of the generated molecules using 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link> (2017.09.1).
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup> The generated 3D ligand structures were processed with 
                    <ext-link ext-link-type="uri" xlink:href="https://openbabel.org/wiki/Main_Page">OpenBabel</ext-link> (3.0.0)
                    <sup>
                        <xref ref-type="bibr" rid="ref31">33</xref>
                    </sup> to assign Gasteiger partial charges and convert them to PDBQT format. The final docking was performed using 
                    <ext-link ext-link-type="uri" xlink:href="https://vina.scripps.edu/">AutoDock Vina</ext-link> (1.1.2)
                    <sup>
                        <xref ref-type="bibr" rid="ref32">34</xref>
                    </sup> with eight poses for each ligand. The lowest (most favorable) docking score among the eight poses was used as the docking score of the ligand.</p>
                <p>Calculating the augmented log-likelihoods during agent training requires an affinity score in the range of zero to one. The docking score was therefore transformed into this range using the reverse sigmoid function shown in 
                    <xref ref-type="disp-formula" rid="e6">Equation 6</xref>, where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>l</mml:mi>
                        </mml:math>
                    </inline-formula>, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>h</mml:mi>
                        </mml:math>
                    </inline-formula>, and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>k</mml:mi>
                        </mml:math>
                    </inline-formula> were constants set to -12, -8, and 0.25, respectively.
                    <disp-formula id="e6">
                        <mml:math display="block">
                            <mml:mtext mathvariant="italic">Rsigmoid</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>x</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mn>1</mml:mn>
                                <mml:mrow>
                                    <mml:mn>1</mml:mn>
                                    <mml:mo>+</mml:mo>
                                    <mml:msup>
                                        <mml:mn>10</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>k</mml:mi>
                                            <mml:mo>&#x2217;</mml:mo>
                                            <mml:mfrac>
                                                <mml:mrow>
                                                    <mml:mi>x</mml:mi>
                                                    <mml:mo>&#x2212;</mml:mo>
                                                    <mml:mfrac>
                                                        <mml:mrow>
                                                            <mml:mi>h</mml:mi>
                                                            <mml:mo>+</mml:mo>
                                                            <mml:mi>l</mml:mi>
                                                        </mml:mrow>
                                                        <mml:mn>2</mml:mn>
                                                    </mml:mfrac>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mi>h</mml:mi>
                                                    <mml:mo>&#x2212;</mml:mo>
                                                    <mml:mi>l</mml:mi>
                                                </mml:mrow>
                                            </mml:mfrac>
                                        </mml:mrow>
                                    </mml:msup>
                                    <mml:mspace width="0.25em"/>
                                </mml:mrow>
                            </mml:mfrac>
                        </mml:math>
                        <label>(6)</label>
                    </disp-formula>
                </p>
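                <p>As a minimal sketch of this scoring step (not the authors' implementation), the transformation in Equation 6 can be written in Python using the constants given above. The function and argument names are illustrative assumptions, and the selection of the lowest pose score mirrors the docking description.</p>
                <code language="python">
```python
from typing import Sequence


def reverse_sigmoid(x: float, low: float = -12.0, high: float = -8.0,
                    k: float = 0.25) -> float:
    """Reverse sigmoid exactly as written in Equation 6.

    low, high and k correspond to the constants l, h and k in the text.
    """
    midpoint = (high + low) / 2.0
    exponent = k * (x - midpoint) / (high - low)
    return 1.0 / (1.0 + 10.0 ** exponent)


def ligand_reward(pose_scores: Sequence[float]) -> float:
    """Take the lowest (most favorable) pose score and map it into (0, 1)."""
    best_score = min(pose_scores)
    return reverse_sigmoid(best_score)
```
                </code>
                <p>With these constants, a docking score of -10 (the midpoint of l and h) maps to 0.5, and lower docking scores map to higher rewards.</p>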
                <p>The prior model pretrained on the Moses dataset was also used to initialize the agent for this task. The agent was trained for 1,000 steps, and 64 molecules were sampled and scored during each step. A total of 10,000 molecules were sampled from the agent model after the final step for property distribution analysis.</p>
                <p>The Reinvent model
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> was also used as the baseline for this task. The default hyper-parameters of Reinvent were used, and the same scoring function as for the SGPT-RL agent was used for comparison. This model was trained for 1,000 steps with 64 molecules generated during each step.</p>
            </sec>
            <sec id="sec8">
                <title>Scaffold analysis</title>
                <p>To analyze the scaffold overlaps of the prior models, we clustered the scaffolds of the generated molecules and the training reference using the Butina method in 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref33">35</xref>
                    </sup> Molecules from the different sources were merged, and invalid and duplicate molecules were removed. Murcko scaffolds were obtained using 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link> and clustered using Morgan fingerprints as inputs. A distance threshold of 0.2 was used during clustering. A Venn diagram was used to visualize the numbers of overlapping and unique clusters. Examples of molecules were visualized using 
                    <ext-link ext-link-type="uri" xlink:href="https://perkinelmerinformatics.com/products/research/chemdraw">ChemDraw 20.1</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref34">36</xref>
                    </sup> Some open-source alternatives to ChemDraw are available 
                    <ext-link ext-link-type="uri" xlink:href="https://alternativeto.net/software/chemdraw/">here</ext-link>.</p>
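                <p>The clustering step can be illustrated with a simplified pure-Python sketch of Butina-style clustering. It is not the RDKit implementation used in this study: fingerprints are represented as sets of on-bit indices (an assumption made for illustration), the Tanimoto distance is computed directly, and all function names are hypothetical.</p>
                <code language="python">
```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - Tanimoto similarity between two sets of on-bit indices."""
    union = len(a.union(b))
    if union == 0:
        return 0.0
    return 1.0 - len(a.intersection(b)) / union


def butina_clusters(fps: Sequence[FrozenSet[int]],
                    cutoff: float = 0.2) -> List[Tuple[int, ...]]:
    """Greedy Butina-style clustering; the first index of each cluster is its centroid."""
    n = len(fps)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Two items are neighbors when their distance is within the cutoff.
            if cutoff >= tanimoto_distance(fps[i], fps[j]):
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Visit candidate centroids from most to fewest neighbors.
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = [False] * n
    clusters: List[Tuple[int, ...]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        members = [centroid] + [j for j in neighbors[centroid] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


def cluster_sources(clusters: Sequence[Tuple[int, ...]],
                    labels: Sequence[str]) -> List[Set[str]]:
    """For each cluster, collect the sources (e.g. 'TR', 'SP', 'RP') of its members."""
    return [{labels[i] for i in members} for members in clusters]
```
                </code>
                <p>Clusters whose source set contains more than one label correspond to the overlapping regions of a Venn diagram, and clusters with a single label correspond to the unique regions.</p>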
                <p>To analyze the average number of rings and the number of explored scaffolds in 
                    <xref ref-type="fig" rid="f3">Figures 3</xref> and 
                    <xref ref-type="fig" rid="f4">4</xref>, 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rdkit/rdkit">RDKit</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup> was used to obtain the Murcko scaffold and calculate the number of rings for each generated molecule. Duplicate scaffolds were removed before the scaffolds were counted.</p>
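                <p>The accumulated scaffold count can be sketched as a running set union over the RL steps. The example assumes that the Murcko scaffolds have already been extracted as canonical SMILES strings (for instance with RDKit); the function name is hypothetical.</p>
                <code language="python">
```python
from typing import Iterable, List, Set


def accumulated_unique_scaffolds(scaffolds_per_step: Iterable[Iterable[str]]) -> List[int]:
    """Return the number of distinct scaffolds seen up to and including each step."""
    seen: Set[str] = set()
    counts: List[int] = []
    for step_scaffolds in scaffolds_per_step:
        seen.update(step_scaffolds)  # duplicates within and across steps are ignored
        counts.append(len(seen))
    return counts
```
                </code>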
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Scaffold overlaps of the prior models.</title>
                        <p>a) The scaffold overlaps between the training reference and molecules generated by the SGPT-RL and Reinvent prior models. Both SGPT-RL and Reinvent were able to generate molecules with novel scaffolds that did not appear in the training reference. b) Representative molecules with unique scaffolds from the three sources. The three rows correspond to training reference only (TR), SGPT-RL prior only (SP), and Reinvent prior only (RP) molecules, respectively.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure2.gif"/>
                </fig>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Comparison of SGPT-RL and Reinvent on the DRD2 task.</title>
                        <p>a-b) Improvements of validity and DRD2 activity during the RL process. SGPT-RL was slower than Reinvent to generate molecules with good validity and DRD2 activity. c) Average number of rings in the generated molecules over the RL steps. SGPT-RL gradually increased the number of rings in the generated molecules during the RL process. It generated molecules with fewer rings than Reinvent in the beginning, but with more rings in the end. d) Accumulated number of unique scaffolds in the generated molecules during the RL process. SGPT-RL explored more scaffolds than Reinvent. e) The distribution of predicted DRD2 activities. Both SGPT-RL and Reinvent agents were able to generate molecules with high DRD2 activities. f) The distribution of synthetic accessibility scores (SAscore). A total of 10,000 molecules were sampled from the training dataset as the reference (Training ref.).</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure3.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec9" sec-type="results">
            <title>Results</title>
            <sec id="sec10">
                <title>Learning the chemical space with a GPT prior model</title>
                <p>The first step of our workflow is to train a prior model to learn the chemical space. To do that, the dataset from the Moses benchmark
                    <sup>
                        <xref ref-type="bibr" rid="ref22">23</xref>
                    </sup> was used to train the prior model. We used the Moses dataset because its molecules are lead-like and have good chemical properties. A GPT model with &#x223c;6M parameters was used as the prior model, details of which are described in Subsection &#x201c;The prior network&#x201d;. The Reinvent prior model
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> (GRU) was trained on the same dataset for comparison. 10,000 molecules were randomly sampled from the training dataset to be used as the training reference.</p>
                <p>A comparison of different models on the Moses distribution learning benchmark
                    <sup>
                        <xref ref-type="bibr" rid="ref22">23</xref>
                    </sup> is shown in Supplementary Table 1 in 
                    <italic toggle="yes">Extended data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> Five Moses metrics, namely validity, uniqueness, similarity to the nearest neighbor (SNN), internal diversity (IntDiv), and novelty, were selected for comparison. The SGPT-RL prior model achieved relatively good validity (0.936), uniqueness (0.997), and novelty (0.946). Although the Reinvent prior model achieved better validity (0.986) and uniqueness (1.000), its novelty was lower (0.783). The other two transformer-based methods, MCMG and MolGPT, also achieved good novelty (0.983 and 0.931, respectively).</p>
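                <p>For orientation, three of these metrics can be approximated with simple set operations. The sketch below is a simplified illustration rather than the Moses implementation: it assumes that the generated molecules have already been parsed and canonicalized, with invalid SMILES represented as None, and it uses hypothetical function and key names.</p>
                <code language="python">
```python
from typing import Dict, Iterable, Optional, Sequence


def distribution_metrics(generated: Sequence[Optional[str]],
                         training: Iterable[str]) -> Dict[str, float]:
    """Simplified validity, uniqueness and novelty for canonical SMILES strings.

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid molecules
    """
    valid = [smiles for smiles in generated if smiles is not None]
    unique = set(valid)
    novel = unique.difference(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
```
                </code>
                <p>Under these definitions, a model can score high on validity and uniqueness while scoring low on novelty if many of its distinct molecules also appear in the training set.</p>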
                <p>The property distributions of the training reference and molecules sampled from the SGPT-RL and Reinvent prior models were visualized as shown in Supplementary Figure 1 in 
                    <italic toggle="yes">Extended data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> Six selected properties, namely DRD2 activity, ACE2 docking score, QED, synthetic accessibility score (SAscore), length of SMILES strings, and molecular weight, were used for comparison. Details on the calculation of these properties are described in Subsection &#x201c;Evaluated molecular properties&#x201d;. This figure shows that both prior models learned property distributions similar to those of the training reference. For molecular weight, the distribution curve of the SGPT-RL prior is closer to the training reference than that of the Reinvent prior.</p>
                <p>To compare the generative preferences of the SGPT-RL and the Reinvent prior models, we analyzed the scaffolds of the generated molecules. The overlapping scaffolds and unique scaffolds from each source were visualized using a Venn diagram as shown in 
                    <xref ref-type="fig" rid="f2">Figure 2a</xref>. From this diagram, we found that both the SGPT-RL and the Reinvent prior models were able to recall scaffolds from the training reference and generate many molecules with novel scaffolds. Several examples of the generated molecules and training samples are shown in 
                    <xref ref-type="fig" rid="f2">Figure 2b</xref>.</p>
            </sec>
            <sec id="sec11">
                <title>Optimizing the scores of a QSAR model through RL</title>
                <p>In our experiments, we evaluated SGPT-RL for goal-directed generation on two tasks: a DRD2 task, which used a quantitative structure-activity relationship (QSAR) model
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> as the scoring function, and an ACE2 task, which used a docking score calculated from AutoDock Vina
                    <sup>
                        <xref ref-type="bibr" rid="ref32">34</xref>
                    </sup> as the scoring function.</p>
                <p>DRD2 is one of the most well-studied drug targets, with many active compounds reported.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">25</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref35">37</xref>
                    </sup> A QSAR model was proposed for DRD2 activity prediction.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> In this task, the SGPT-RL prior model pretrained on the Moses dataset was used to initialize the agent, and the agent was trained via RL to steer generation toward molecules with high DRD2 activity. The Reinvent model was trained with default hyper-parameters for comparison.
                    <sup>
                        <xref ref-type="bibr" rid="ref11">12</xref>
                    </sup> Details on the training of the agents are given in Subsection &#x201c;Training the agent&#x201d;. The hyper-parameter of SGPT-RL was tuned as shown in Supplementary Results in 
                    <italic toggle="yes">Extended data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> A sigma value of 60 was chosen for this agent.</p>
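                <p>The role of sigma can be illustrated with the augmented likelihood used in Reinvent-style RL, in which the prior log-likelihood of a sequence is shifted by sigma times its score and the agent is trained to match this target. The sketch below assumes this formulation and scalar per-molecule log-likelihoods; the names are illustrative, and the actual objective is the one described in Subsection &#x201c;Training the agent&#x201d;.</p>
                <code language="python">
```python
def augmented_log_likelihood(prior_ll: float, score: float, sigma: float = 60.0) -> float:
    """Prior log-likelihood shifted by sigma * score, with score assumed to lie in [0, 1]."""
    return prior_ll + sigma * score


def rl_loss(agent_ll: float, prior_ll: float, score: float, sigma: float = 60.0) -> float:
    """Squared difference between the augmented target and the agent log-likelihood."""
    target = augmented_log_likelihood(prior_ll, score, sigma)
    return (target - agent_ll) ** 2
```
                </code>
                <p>In this formulation, a larger sigma gives the score more weight relative to the prior, so molecules with high scores pull the agent further from the prior distribution.</p>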
                <p>The learning curves of the agent models on the DRD2 task are shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>. From 
                    <xref ref-type="fig" rid="f3">Figures 3a-b</xref>, we see that both agents reached good validity and DRD2 activity after 200 steps. The Reinvent agent took fewer steps to obtain good DRD2 activity than the SGPT-RL agent. 
                    <xref ref-type="fig" rid="f3">Figures 3c-d</xref> show that the SGPT-RL agent gradually increased the number of rings during generation and explored more scaffolds within the first 200 steps. The main difference in scaffold exploration between the two agents occurs between steps 100 and 200. The Reinvent agent showed little further improvement toward the goal after around 100 steps, whereas the SGPT-RL agent continued to learn and improve.</p>
                <p>The agent models trained after the final step were also evaluated on the Moses benchmark, as shown in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The Moses metrics of MCMG were also obtained from the original paper for comparison.
                    <sup>
                        <xref ref-type="bibr" rid="ref4">4</xref>
                    </sup> We found that the SGPT-RL agent achieved better validity and novelty, while the Reinvent model obtained a better internal diversity.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Moses metrics of the agent models on the DRD2 task.</title>
                        <p>SGPT-RL generated molecules with good validity and novelty. SNN, similarity to a nearest neighbor; IntDiv, internal diversity; MCMG, multi-constraints molecular generation.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Validity</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Uniqueness</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">SNN</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">IntDiv</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Novelty</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Reinvent</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.997</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.880</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.508</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.709</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.992</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>MCMG</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">-</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.972</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.541</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.709</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.992</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>SGPT-RL</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.998</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.933</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.515</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.683</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.995</bold>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The property distributions of the training reference and molecules sampled from the final SGPT-RL and Reinvent agents were also compared in this task, as shown in 
                    <xref ref-type="fig" rid="f3">Figure 3e</xref>.
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> The properties analyzed include DRD2 activity, QED, SAscore, LogP, length of SMILES strings, and molecular weight. Both SGPT-RL and Reinvent generated molecules with good DRD2 activities after the final step, whereas the molecules in the training reference have poor DRD2 activities. The property distributions of the molecules generated by the SGPT-RL and Reinvent agents are similar. 
                    <xref ref-type="fig" rid="f3">Figure 3f</xref> shows that both agents shifted the SAscore distributions to the left, which means generating molecules that are relatively harder to synthesize than the molecules in the training reference.</p>
            </sec>
            <sec id="sec12">
                <title>Generating molecules to optimize docking scores</title>
                <p>In this task, we aimed to generate novel molecules targeting ACE2, a receptor protein which SARS-CoV and SARS-CoV-2 bind to enter a cell.
                    <sup>
                        <xref ref-type="bibr" rid="ref36">38</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref37">39</xref>
                    </sup> Only 56 unique molecules were reported to be active against ACE2 in ExCAPE-DB.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">25</xref>
                    </sup> For such targets where few known active molecules are available, it is not possible to build a reliable QSAR model to predict activity. To find binding molecules against targets like ACE2, structure-based docking methods are widely used to evaluate the affinities. In this study, the ACE2 affinity of a molecule was evaluated as the minimum binding free energy calculated by AutoDock Vina.
                    <sup>
                        <xref ref-type="bibr" rid="ref32">34</xref>
                    </sup> Details on the calculation of ACE2 affinity can be found in Subsection &#x201c;Evaluated molecular properties&#x201d;. The pocket where XX5 is located in the 3D structure of the human ACE2 receptor (PDB ID 1R4L
                    <sup>
                        <xref ref-type="bibr" rid="ref38">40</xref>
                    </sup>) was used for ligand docking. The prior model trained on the Moses dataset
                    <sup>
                        <xref ref-type="bibr" rid="ref22">23</xref>
                    </sup> was also used to initialize this agent, and the agent was trained for 1,000 steps. The Reinvent model was also trained on this task for a fair comparison.</p>
                <p>The learning curves of the agent models are shown in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>. The SGPT-RL agent was able to generate valid molecules with good ACE2 docking scores after 200 steps. As in the DRD2 task, the Reinvent model learned little after around 100 steps in the ACE2 task, and the docking scores of its generated molecules did not clearly improve thereafter. We also observed that SGPT-RL gradually increased the number of rings during exploration, as shown in 
                    <xref ref-type="fig" rid="f4">Figure 4c</xref>. Examples of molecules generated by SGPT-RL during the initial exploration steps are shown in 
                    <xref ref-type="fig" rid="f5">Figure 5</xref>. The SGPT-RL agent generated molecules with few rings in the first step and then gradually increased the number of rings. The Reinvent agent appeared to explore molecules randomly, with no clear pattern observed, as shown in Supplementary Figure 7 in 
                    <italic toggle="yes">Extended data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup>
                </p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Comparison of SGPT-RL and Reinvent on the ACE2 task.</title>
                        <p>a-b) Improvements of validity and ACE2 docking scores during the RL process. SGPT-RL generated molecules with better validity and ACE2 docking scores than Reinvent after 200 steps. c) Average number of rings in the generated molecules over the RL steps. SGPT-RL gradually increased the number of rings in the molecules. The curve difference in c is highly correlated with the curve difference in b (Pearson&#x2019;s r = 0.87). d) Accumulated number of unique scaffolds in the generated molecules during the RL process. Both SGPT-RL and Reinvent generated new scaffolds with increasing steps. e) The distribution of ACE2 docking scores. SGPT-RL shifted the distribution towards better docking scores. f) The distribution of SAscore.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure4.gif"/>
                </fig>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Examples of scaffolds explored by SGPT-RL in the initial steps of the ACE2 task.</title>
                        <p>The SGPT-RL agent generated molecules with few rings in the beginning, and gradually increased the number of rings. DS, docking score.</p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure5.gif"/>
                </fig>
                <p>The final agents were evaluated on the Moses metrics, as shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. The SGPT-RL agent achieved good validity (0.990) and novelty (1.000), while Reinvent was better on SNN and internal diversity. The property distributions were plotted for the two agents. Six selected properties, including ACE2 docking score, QED, SAscore, LogP, length of SMILES string, and molecular weight, were analyzed, as shown in Supplementary Figure 8 in 
                    <italic toggle="yes">Extended data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> Calculations of these properties are described in Subsection &#x201c;Evaluated molecular properties&#x201d;. From 
                    <xref ref-type="fig" rid="f4">Figure 4e</xref>,
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> we see that the SGPT-RL agent was able to generate molecules with good docking scores and clearly shifted the distribution curve to the left. The ACE2 docking scores of the SGPT-RL-generated molecules were better than those of the training reference and the Reinvent-generated molecules. Supplementary Figure 9 in 
                    <italic toggle="yes">Extended data</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref40">42</xref>
                    </sup> shows some examples of molecules generated by the agents in the last step. The SGPT-RL-generated molecules are more similar to each other than the Reinvent-generated molecules are. These molecules suggest that SGPT-RL develops certain structural preferences, such as a naphthalene structure at one end in this task.</p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Moses metrics of the agents on the ACE2 task.</title>
                        <p>SNN, similarity to a nearest neighbor; IntDiv, internal diversity.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Validity</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Uniqueness</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">SNN</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">IntDiv</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Novelty</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Reinvent</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.875</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.987</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.560</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.816</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.976</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>SGPT-RL</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>0.990</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.986</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.466</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.797</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>1.000</bold>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The top six molecules with the highest docking scores generated by the agents are shown in 
                    <xref ref-type="fig" rid="f6">Figure 6</xref>. The SGPT-RL agent generated more molecules with high docking affinities than the Reinvent agent. In addition, five of the top six molecules generated by SGPT-RL contain a naphthalene structure at one end. Because the same pattern appears in the molecules generated by SGPT-RL in the last step, we suspect that the agent learned this pattern during exploration. In contrast, the top-scoring molecules generated by the Reinvent agent appear largely random, with no clear scaffold pattern.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Top scoring molecules generated in the ACE2 task.</title>
                        <p>The SGPT-RL-generated molecules are more similar to each other than the Reinvent-generated molecules. DS, docking score.</p>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/162639/2e93d1c3-2cb6-43b4-a2cc-3e4cf0a95707_figure6.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec13" sec-type="discussion">
            <title>Discussion</title>
            <p>In this study, we developed SGPT-RL, a tool for de novo molecular generation that uses a transformer decoder as the policy network of the reinforcement learning (RL) agent. A workstation with two A100 GPUs was used for our experiments. Docking scores were used as a scoring function in addition to a QSAR-based scoring function, which enabled us to explore not only a target with many known active molecules but also a new target with few known actives.</p>
            <p>We evaluated SGPT-RL on two goal-directed generation tasks, a DRD2 task and an ACE2 task. Because many known DRD2 actives are available, a reliable QSAR model could be built and used as the scoring function in the DRD2 task. In contrast, few actives have been reported for ACE2, so Vina docking scores were used as the optimization goal in the ACE2 task. Our experiments showed that both SGPT-RL (which uses GPT as the policy network) and Reinvent (which uses GRU as the policy network) were able to propose molecules with improved scores on the two tasks. However, the SGPT-RL-generated molecules showed significantly better scores on the ACE2 task than the Reinvent-generated ones (p-value: 0.0). As molecular docking is widely used for virtual screening, we believe that the superior performance of SGPT-RL on the ACE2 task suggests broad applicability in practical molecular design.</p>
            <p>We also found three differences in generative behavior between the SGPT-RL and Reinvent agents during the exploration steps. First, Reinvent explored with strong randomness in both tasks, whereas SGPT-RL explored scaffolds gradually during generation. In the initial steps, SGPT-RL generated molecules with few rings and gradually increased the number of rings during exploration; in the late steps, it generated molecules with conserved scaffold patterns, such as double-ring structures in the ACE2 task. Second, Reinvent did not clearly improve the goal after around 100 steps, while SGPT-RL continued to optimize the scores even after 400 steps. We believe that this difference is mainly caused by the difference in policy networks: it is not easy for GRU to learn ring patterns, which are represented in SMILES as paired ring-closure digits that may be far apart in the string; GPT, however, can learn long-range dependencies and so remember the ring patterns that had improved scores in previous steps. Third, the SGPT-RL agent could generate molecules with more rings than the Reinvent agent in the ACE2 task (shown in 
                <xref ref-type="fig" rid="f4">Figure 4c</xref>). A wider range of ring counts indicates a greater variety of scaffold structures. Considering the importance of appropriate scaffolds in lead identification,
                <sup>
                    <xref ref-type="bibr" rid="ref39">41</xref>
                </sup> we believe that using GPT as the policy network in RL agents might be useful for discovering lead candidates with novel scaffolds.</p>
            <p>While the results of our work are noteworthy, there are two limitations to consider. First, the dataset used to train the prior models limits the generated results. All the prior models were pretrained on the Moses dataset.
                <sup>
                    <xref ref-type="bibr" rid="ref22">23</xref>
                </sup> As the Moses dataset was collected from the Zinc database,
                <sup>
                    <xref ref-type="bibr" rid="ref23">24</xref>
                </sup> which mainly consists of lead-like molecules, the prior distribution cannot represent the entire chemical space. The prior models were used to guide the agents in the two optimization tasks, and the bias in the prior models might contribute to bias in the agent models. Such bias might be beneficial, because it helps to generate molecules with lead-like properties, such as good synthetic accessibility and drug-likeness; however, it might also be undesirable, as it limits the chemical space that the agents explore. In tasks that aim to explore beyond the space of lead-like molecules, other training data should be used to train the prior models. Second, the settings of the docking experiments are also a limitation. We used only ACE2 as the docking target; docking experiments on additional targets would further confirm the observations in our study.</p>
            <p>Because molecular docking is widely used for virtual screening, generative models combined with molecular docking provide another solution for the virtual screening process. The superior performance of SGPT-RL on the ACE2 task indicates that it can be applied to this practical molecular design process to propose novel molecules with good target-binding capabilities. In addition, SGPT-RL explored the chemical space with certain scaffold patterns. The patterns learned by SGPT-RL can give chemists intuitions to explore and thus aid molecular design.</p>
        </sec>
    </body>
    <back>
        <sec id="sec17" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec18">
                <title>Underlying data</title>
                <p>Protein Data Bank: 3D structure of the human ACE2 receptor. Accession number 1R4L; 
                    <ext-link ext-link-type="uri" xlink:href="https://www.rcsb.org/structure/1R4L">https://www.rcsb.org/structure/1R4L</ext-link>.</p>
                <p>The dataset used to train the prior models was obtained from the Moses benchmark.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">23</xref>
                    </sup> This dataset contains 1.9 million lead-like molecules from the Zinc database and is available here: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/molecularsets/moses">https://github.com/molecularsets/moses</ext-link>. The training and test sets of the 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/molecularsets/moses">Moses benchmark</ext-link>, used here for training and testing, contain 1,584,664 and 176,075 molecules, respectively. Moses is licensed under the 
                    <ext-link ext-link-type="uri" xlink:href="https://opensource.org/license/mit/">MIT</ext-link> license (redistribution permitted).</p>
                <p>The 8,036 unique molecules known to be active against DRD2 and the 56 unique molecules known to be active against ACE2 were downloaded from 
                    <ext-link ext-link-type="uri" xlink:href="https://solr.ideaconsult.net/search/excape/">ExCAPE-DB</ext-link>,
                    <sup>
                        <xref ref-type="bibr" rid="ref24">25</xref>
                    </sup> and are licensed under the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link> (redistribution permitted).</p>
                <p>The specific underlying data used in this study have been uploaded by the authors to Zenodo (see below).</p>
                <p>Zenodo: Optimization of binding affinities in chemical space with transformer and deep reinforcement learning -- source data. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.10654313">https://doi.org/10.5281/zenodo.10654313</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref40">42</xref>
</sup>
                </p>
                <p>This project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>-</label>
                            <p>Data.zip (the Moses dataset, the DRD2 and ACE2 active molecules, the pretrained models, and the source data underlying 
                                <xref ref-type="fig" rid="f3">
Figures 3</xref>&#x2013;
                                <xref ref-type="fig" rid="f4">4</xref>).
</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
            <sec id="sec19">
                <title>Extended data</title>
                <p>Zenodo: Optimization of binding affinities in chemical space with transformer and deep reinforcement learning -- source data. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.10654313">https://doi.org/10.5281/zenodo.10654313</ext-link>.
                    <sup>

                        <xref ref-type="bibr" rid="ref40">42</xref>
</sup>
                </p>
                <p>This project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>-</label>
                            <p>SGPT_SI.pdf (supplementary results, tables, and figures).</p>
                        </list-item>
                        <list-item>
                            <label>-</label>
                            <p>Sgpt-rl.png (the workflow of SGPT-RL).
</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
        </sec>
        <sec id="sec14">
            <title>Software availability</title>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/charlesxu90/sgpt">https://github.com/charlesxu90/sgpt</ext-link>
            </p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7612354">https://doi.org/10.5281/zenodo.7612354</ext-link>.
                <sup>

                    <xref ref-type="bibr" rid="ref42">43</xref>
</sup>
            </p>
            <p>License: 
                <ext-link ext-link-type="uri" xlink:href="https://opensource.org/license/mit/">MIT</ext-link>
            </p>
        </sec>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nicolaou</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>Multi-objective optimization methods in drug design.</article-title>
                    <source>

                        <italic toggle="yes">Drug Discov. Today Technol.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>(<issue>3</issue>):<fpage>e427</fpage>&#x2013;<lpage>e435</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.ddtec.2013.02.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hughes</surname>
                            <given-names>JP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rees</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kalindjian</surname>
                            <given-names>SB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Principles of early drug discovery.</article-title>
                    <source>

                        <italic toggle="yes">Br. J. Pharmacol.</italic>
</source>
                    <year>2011</year>;<volume>162</volume>(<issue>6</issue>):<fpage>1239</fpage>&#x2013;<lpage>1249</lpage>.
                    <pub-id pub-id-type="pmid">21091654</pub-id>
                    <pub-id pub-id-type="doi">10.1111/j.1476-5381.2010.01127.x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3058157</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Elton</surname>
                            <given-names>DC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Boukouvalas</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fuge</surname>
                            <given-names>MD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Deep learning for molecular design&#x2014;a review of the state of the art.</article-title>
                    <source>

                        <italic toggle="yes">Molecular Systems Design &amp; Engineering.</italic>
</source>
                    <year>2019</year>;<volume>4</volume>(<issue>4</issue>):<fpage>828</fpage>&#x2013;<lpage>849</lpage>.
                    <pub-id pub-id-type="doi">10.1039/C9ME00039A</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hsieh</surname>
                            <given-names>C-Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Mach. Intell.</italic>
</source>
                    <year>2021</year>;<volume>3</volume>(<issue>10</issue>):<fpage>914</fpage>&#x2013;<lpage>922</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s42256-021-00403-1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Butler</surname>
                            <given-names>KT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davies</surname>
                            <given-names>DW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cartwright</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Machine learning for molecular and materials science.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2018</year>;<volume>559</volume>(<issue>7715</issue>):<fpage>547</fpage>&#x2013;<lpage>555</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41586-018-0337-2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>St&#x00e5;hl</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Falkman</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Karlsson</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Deep reinforcement learning for multiparameter optimization in de novo drug design.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Model.</italic>
</source>
                    <year>2019</year>;<volume>59</volume>(<issue>7</issue>):<fpage>3166</fpage>&#x2013;<lpage>3176</lpage>.
                    <pub-id pub-id-type="doi">10.1021/acs.jcim.9b00325</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoffmann</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gastreich</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>The next level in chemical space navigation: going far beyond enumerable compound libraries.</article-title>
                    <source>

                        <italic toggle="yes">Drug Discov. Today.</italic>
</source>
                    <year>2019</year>;<volume>24</volume>(<issue>5</issue>):<fpage>1148</fpage>&#x2013;<lpage>1156</lpage>.
                    <pub-id pub-id-type="pmid">30851414</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.drudis.2019.02.013</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xia</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Graph-based generative models for de novo drug design.</article-title>
                    <source>

                        <italic toggle="yes">Drug Discov. Today Technol.</italic>
</source>
                    <year>2019</year>;<volume>32</volume>:<fpage>45</fpage>&#x2013;<lpage>53</lpage>.</mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vanhaelen</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>Y-C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhavoronkov</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>The advent of generative chemistry.</article-title>
                    <source>

                        <italic toggle="yes">ACS Med. Chem. Lett.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>(<issue>8</issue>):<fpage>1496</fpage>&#x2013;<lpage>1505</lpage>.
                    <pub-id pub-id-type="pmid">32832015</pub-id>
                    <pub-id pub-id-type="doi">10.1021/acsmedchemlett.0c00088</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7429972</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bai</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tian</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Application advances of deep learning methods for de novo drug design and molecular dynamics simulation.</article-title>
                    <source>

                        <italic toggle="yes">Wiley Interdiscip. Rev. Comput. Mol. Sci.</italic>
</source>
                    <year>2022</year>;<volume>12</volume>(<issue>3</issue>): e1581.
                    <pub-id pub-id-type="doi">10.1002/wcms.1581</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>G&#x00f3;mez-Bombarelli</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wei</surname>
                            <given-names>JN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duvenaud</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatic chemical design using a data-driven continuous representation of molecules.</article-title>
                    <source>

                        <italic toggle="yes">ACS Cent. Sci.</italic>
</source>
                    <year>2018</year>;<volume>4</volume>(<issue>2</issue>):<fpage>268</fpage>&#x2013;<lpage>276</lpage>.
                    <pub-id pub-id-type="pmid">29532027</pub-id>
                    <pub-id pub-id-type="doi">10.1021/acscentsci.7b00572</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5833007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Olivecrona</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Blaschke</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Engkvist</surname>
                            <given-names>O</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular de-novo design through deep reinforcement learning.</article-title>
                    <source>

                        <italic toggle="yes">J. Cheminform.</italic>
</source>
                    <year>2017</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>.
                    <pub-id pub-id-type="doi">10.1186/s13321-017-0235-x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Blaschke</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ar&#x00fa;s-Pous</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>REINVENT 2.0: an AI tool for de novo drug design.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Model.</italic>
</source>
                    <year>2020</year>;<volume>60</volume>(<issue>12</issue>):<fpage>5918</fpage>&#x2013;<lpage>5922</lpage>.
                    <pub-id pub-id-type="doi">10.1021/acs.jcim.0c00915</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaswani</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shazeer</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Parmar</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Attention is all you need.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Neural Inf. Process. Syst.</italic>
</source>
                    <year>2017</year>;<volume>30</volume>.</mixed-citation>
            </ref>
            <ref id="ref14">
                <label>15</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A survey of transformers.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2106.04554.</italic>
</source>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref15">
                <label>16</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Radford</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Narasimhan</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salimans</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improving language understanding by generative pre-training.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint.</italic>
</source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref16">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mann</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ryder</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Language models are few-shot learners.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Neural Inf. Process. Syst.</italic>
</source>
                    <year>2020</year>;<volume>33</volume>:<fpage>1877</fpage>&#x2013;<lpage>1901</lpage>.</mixed-citation>
            </ref>
            <ref id="ref17">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ouyang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jiang</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Training language models to follow instructions with human feedback.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2203.02155.</italic>
</source>
                    <year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref18">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grechishnikova</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Transformer neural network for protein-specific de novo drug generation as a machine translation problem.</article-title>
                    <source>

                        <italic toggle="yes">Sci. Rep.</italic>
</source>
                    <year>2021</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="doi">10.1038/s41598-020-79682-4</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bagal</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aggarwal</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vinod</surname>
                            <given-names>PK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MolGPT: Molecular generation using a transformer-decoder model.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Model.</italic>
</source>
                    <year>2021</year>;<volume>62</volume>(<issue>9</issue>):<fpage>2064</fpage>&#x2013;<lpage>2076</lpage>.
                    <pub-id pub-id-type="pmid">34694798</pub-id>
                    <pub-id pub-id-type="doi">10.1021/acs.jcim.1c00600</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>He</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>You</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sandstr&#x00f6;m</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular optimization by capturing chemist&#x2019;s intuition using deep neural networks.</article-title>
                    <source>

                        <italic toggle="yes">J. Cheminform.</italic>
</source>
                    <year>2021</year>;<volume>13</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>17</lpage>.
                    <pub-id pub-id-type="doi">10.1186/s13321-021-00497-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boitreaud</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mallet</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oliver</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>OptiMol: optimization of binding affinities in chemical space for drug discovery.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Model.</italic>
</source>
                    <year>2020</year>;<volume>60</volume>(<issue>12</issue>):<fpage>5658</fpage>&#x2013;<lpage>5666</lpage>.
                    <pub-id pub-id-type="pmid">32986426</pub-id>
                    <pub-id pub-id-type="doi">10.1021/acs.jcim.0c00833</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Polykovskiy</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhebrak</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sanchez-Lengeling</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular sets (MOSES): a benchmarking platform for molecular generation models.</article-title>
                    <source>

                        <italic toggle="yes">Front. Pharmacol.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>:<fpage>1931</fpage>.</mixed-citation>
            </ref>
            <ref id="ref23">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Irwin</surname>
                            <given-names>JJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shoichet</surname>
                            <given-names>BK</given-names>
                        </name>
</person-group>:
                    <article-title>ZINC - a free database of commercially available compounds for virtual screening.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Model.</italic>
</source>
                    <year>2005</year>;<volume>45</volume>(<issue>1</issue>):<fpage>177</fpage>&#x2013;<lpage>182</lpage>.
                    <pub-id pub-id-type="pmid">15667143</pub-id>
                    <pub-id pub-id-type="doi">10.1021/ci049714+</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1360656</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jeliazkova</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chupakhin</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics.</article-title>
                    <source>

                        <italic toggle="yes">J. Cheminform.</italic>
</source>
                    <year>2017</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation>
            </ref>
            <ref id="ref25">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Radford</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Child</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Language models are unsupervised multitask learners.</article-title>
                    <source>

                        <italic toggle="yes">OpenAI blog.</italic>
</source>
                    <year>2019</year>;<volume>1</volume>(<issue>8</issue>):<fpage>9</fpage>.</mixed-citation>
            </ref>
            <ref id="ref41">
                <label>27</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Loshchilov</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hutter</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Decoupled Weight Decay Regularization.</article-title>
                    <source>

                        <italic toggle="yes">International Conference on Learning Representations.</italic>
</source>
                    <year>2019</year>.</mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ertl</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schuffenhauer</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions.</article-title>
                    <source>

                        <italic toggle="yes">J. Cheminform.</italic>
</source>
                    <year>2009</year>;<volume>1</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>11</lpage>.
                    <pub-id pub-id-type="doi">10.1186/1758-2946-1-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bickerton</surname>
                            <given-names>GR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paolini</surname>
                            <given-names>GV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Besnard</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Quantifying the chemical beauty of drugs.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Chem.</italic>
</source>
                    <year>2012</year>;<volume>4</volume>(<issue>2</issue>):<fpage>90</fpage>&#x2013;<lpage>98</lpage>.
                    <pub-id pub-id-type="pmid">22270643</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nchem.1243</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3524573</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>30</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Landrum</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling.</article-title>
                    <year>2013</year>.</mixed-citation>
            </ref>
            <ref id="ref29">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>DeLano</surname>
                            <given-names>WL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PyMOL: An open-source molecular graphics tool.</article-title>
                    <source>

                        <italic toggle="yes">CCP4 Newsl. Protein Crystallogr.</italic>
</source>
                    <year>2002</year>;<volume>40</volume>(<issue>1</issue>):<fpage>82</fpage>&#x2013;<lpage>92</lpage>.</mixed-citation>
            </ref>
            <ref id="ref30">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Morris</surname>
                            <given-names>GM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huey</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lindstrom</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility.</article-title>
                    <source>

                        <italic toggle="yes">J. Comput. Chem.</italic>
</source>
                    <year>2009</year>;<volume>30</volume>(<issue>16</issue>):<fpage>2785</fpage>&#x2013;<lpage>2791</lpage>.
                    <pub-id pub-id-type="pmid">19399780</pub-id>
                    <pub-id pub-id-type="doi">10.1002/jcc.21256</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2760638</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>O&#x2019;Boyle</surname>
                            <given-names>NM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Banck</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>James</surname>
                            <given-names>CA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Open Babel: An open chemical toolbox.</article-title>
                    <source>

                        <italic toggle="yes">J. Cheminform.</italic>
</source>
                    <year>2011</year>;<volume>3</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation>
            </ref>
            <ref id="ref32">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Trott</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Olson</surname>
                            <given-names>AJ</given-names>
                        </name>
</person-group>:
                    <article-title>AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.</article-title>
                    <source>

                        <italic toggle="yes">J. Comput. Chem.</italic>
</source>
                    <year>2010</year>;<volume>31</volume>(<issue>2</issue>):<fpage>455</fpage>&#x2013;<lpage>461</lpage>.</mixed-citation>
            </ref>
            <ref id="ref33">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Butina</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Unsupervised data base clustering based on Daylight&#x2019;s fingerprint and Tanimoto similarity: A fast and automated way to cluster small and large data sets.</article-title>
                    <source>

                        <italic toggle="yes">J. Chem. Inf. Comput. Sci.</italic>
</source>
                    <year>1999</year>;<volume>39</volume>(<issue>4</issue>):<fpage>747</fpage>&#x2013;<lpage>750</lpage>.
                    <pub-id pub-id-type="doi">10.1021/ci9803381</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>36</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mills</surname>
                            <given-names>N</given-names>
                        </name>
</person-group>:
                    <article-title>ChemDraw Ultra 10.0 CambridgeSoft, 100 CambridgePark Drive, Cambridge, MA 02140.</article-title>
                    <year>2006</year>. Commercial price: $1910 for download, $2150 for CD-ROM; academic price: $710 for download, $800 for CD-ROM.
                    <ext-link ext-link-type="uri" xlink:href="http://www.cambridgesoft.com">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>37</label>
                <mixed-citation publication-type="other">
                    <collab>GeneCards</collab>:
                    <article-title>DRD2 Gene - Dopamine Receptor D2.</article-title>
                    <year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref36">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>X-L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>X-G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A pneumonia outbreak associated with a new coronavirus of probable bat origin.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2020</year>;<volume>579</volume>(<issue>7798</issue>):<fpage>270</fpage>&#x2013;<lpage>273</lpage>.
                    <pub-id pub-id-type="pmid">32015507</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-020-2012-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7095418</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Napolitano</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gao</surname>
                            <given-names>X</given-names>
                        </name>
</person-group>:
                    <article-title>Impact of computational approaches in the fight against COVID-19: an AI guided review of 17 000 studies.</article-title>
                    <source>

                        <italic toggle="yes">Brief. Bioinform.</italic>
</source>
                    <year>2022</year>;<volume>23</volume>(<issue>1</issue>):<fpage>bbab456</fpage>.
                    <pub-id pub-id-type="pmid">34788381</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbab456</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8689952</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Towler</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Staker</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Prasad</surname>
                            <given-names>SG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ACE2 X-ray structures reveal a large hinge-bending motion important for inhibitor binding and catalysis.</article-title>
                    <source>

                        <italic toggle="yes">J. Biol. Chem.</italic>
</source>
                    <year>2004</year>;<volume>279</volume>(<issue>17</issue>):<fpage>17996</fpage>&#x2013;<lpage>18007</lpage>.
                    <pub-id pub-id-type="pmid">14754895</pub-id>
                    <pub-id pub-id-type="doi">10.1074/jbc.M311191200</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7980034</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Scaffold selection and scaffold hopping in lead generation: a medicinal chemistry perspective.</article-title>
                    <source>

                        <italic toggle="yes">Drug Discov. Today.</italic>
</source>
                    <year>2007</year>;<volume>12</volume>(<issue>3-4</issue>):<fpage>149</fpage>&#x2013;<lpage>155</lpage>.
                    <pub-id pub-id-type="pmid">17275735</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.drudis.2006.12.003</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <label>42</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning -- source data (v1.2.4).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.10654313</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>43</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning -- source code (v1.2.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7612354</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report248667">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.162639.r248667</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Wang</surname>
                        <given-names>Jianmin</given-names>
                    </name>
                    <xref ref-type="aff" rid="r248667a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8910-0929</uri>
                </contrib>
                <aff id="r248667a1">
                    <label>1</label>Yonsei University, Seodaemun-gu, Seoul, South Korea</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>2</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Wang J</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport248667" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>No further comment. Thank you for your kind responses.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>drug design, deep learning</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report248666">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.162639.r248666</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bai</surname>
                        <given-names>Qifeng</given-names>
                    </name>
                    <xref ref-type="aff" rid="r248666a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7296-6187</uri>
                </contrib>
                <aff id="r248666a1">
                    <label>1</label>Lanzhou University, Lanzhou, Gansu, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>2</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Bai Q</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport248666" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Good work. Please accept it.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>deep learning, binding affinity and drug design</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report248664">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.162639.r248664</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Wong</surname>
                        <given-names>Ka-Chun</given-names>
                    </name>
                    <xref ref-type="aff" rid="r248664a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6062-733X</uri>
                </contrib>
                <aff id="r248664a1">
                    <label>1</label>Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>2</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Wong KC</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport248664" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have responded.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report188009">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.143734.r188009</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Tanr&#x0131;verdi</surname>
                        <given-names>Asl&#x0131;han Aycan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r188009a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5811-8253</uri>
                </contrib>
                <aff id="r188009a1">
                    <label>1</label>Kafkas University, Kars Merkez, Turkey</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>12</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Tanr&#x0131;verdi AA</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport188009" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors published the paper entitled "Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning [version 1; peer review: 1 approved with reservations]." The work is very comprehensive. An innovative article worth publishing. I want to congratulate the authors. There's just one point where I'm stuck.</p>
            <p> </p>
            <p> ***QSAR processing methodology should be given step by step in the methods section.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Polymer Synthesis and Characterization; Monomer Synthesis and Ch.; Quantum Chemistry; Molecular Modelling; Molecular Dynamic; Drug Design; Density Functional Theory; Atom in Molecules Analysis; Film Formation; Gel Formation</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment11067-188009">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Xu</surname>
                            <given-names>Xiaopeng</given-names>
                        </name>
                        <aff>King Abdullah University of Science and Technology, Saudi Arabia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>The authors declare no conflicts of interest.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>14</day>
                    <month>2</month>
                    <year>2024</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We thank the Reviewer for sharing our aims and appreciating our efforts. Our response to the issue raised is given below.</bold>
                </p>
                <p> </p>
                <p> ***QSAR processing methodology should be given step by step in the methods section.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We added a description of the QSAR processing step, as presented below.</bold>
                </p>
                <p> 
                    <bold>"In the modeling, each SMILES string is converted into a molecule using RDKit to obtain its Morgan fingerprint. The fingerprints are used as features to build the SVM classifier, which predicts a probability score ranging from zero to one; the closer the score is to one, the higher the predicted DRD2 activity. A molecule for which no valid fingerprint can be obtained is assigned a score of zero."</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report214522">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.143734.r214522</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bai</surname>
                        <given-names>Qifeng</given-names>
                    </name>
                    <xref ref-type="aff" rid="r214522a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7296-6187</uri>
                </contrib>
                <aff id="r214522a1">
                    <label>1</label>Lanzhou University, Lanzhou, Gansu, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>12</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Bai Q</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport214522" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In this study, the authors use a generative pre-trained transformer and deep reinforcement learning to optimize binding affinities in chemical space. I have some comments as follows:</p>
            <p> </p>
            <p> 1. I have checked the source code 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/charlesxu90/sgpt">https://github.com/charlesxu90/sgpt</ext-link>. The authors give a nice description of their models. I have an installation question. Why do the authors install &#x201c;openbabel&#x201d; again with the command &#x201c;sudo apt-get install -y openbabel&#x201d; even though Conda can install openbabel?</p>
            <p> </p>
            <p> 2. Please check equation 1. There are several variants of the attention formula. Do the authors describe the correct attention formula for the pre-trained model they used?</p>
            <p> </p>
            <p> 3. To enrich the introduction on binding affinity, the authors could add more references on deep learning methods for binding affinities, such as &#x201c;Bai, Q, Liu, S, Tian, Y, Xu, T, Banegas-Luna, AJ, P&#x00e9;rez-S&#x00e1;nchez, H, et al. Application advances of deep learning methods for de novo drug design and molecular dynamics simulation. WIREs Comput Mol Sci. 2022; 12:e1581. 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/wcms.1581">https://doi.org/10.1002/wcms.1581</ext-link>&#x201d;</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>deep learning, binding affinity and drug design</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment11066-214522">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Xu</surname>
                            <given-names>Xiaopeng</given-names>
                        </name>
                        <aff>King Abdullah University of Science and Technology, Saudi Arabia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>The authors declare no conflicts of interest.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>14</day>
                    <month>2</month>
                    <year>2024</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We thank the Reviewer for the summary and the helpful comments. Point-by-point responses to the issues are as follows.</bold>
                </p>
                <p> </p>
                <p> 1. I have checked the source code https://github.com/charlesxu90/sgpt. The authors give a nice description of their models. I have an installation question. Why do the authors install &#x201c;openbabel&#x201d; again with the command &#x201c;sudo apt-get install -y openbabel&#x201d; even though Conda can install openbabel?</p>
                <p> 
                    <bold>We thank the Reviewer for looking into the code and pointing out this issue. We also wanted to use the openbabel package from Conda; however, in our experiments, we found that the default openbabel in Conda did not provide the required functionality, whereas the default openbabel installed in our system worked well. We believe this was due to the version of openbabel distributed through Conda at the time of our experiments.</bold>
                </p>
                <p> </p>
                <p> 2. Please check equation 1. There are several variants of the attention formula. Do the authors describe the correct attention formula for the pre-trained model they used?</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We trained a generative pre-trained transformer (GPT) from scratch to learn the prior knowledge of molecular distributions. Our model uses a GPT-2 architecture with the multi-head self-attention mechanism. Equation 1 describes the attention mechanism, which is the core element of this architecture.</bold>
                </p>
                <p> </p>
                <p> 3. To enrich the introduction on binding affinity, the authors could add more references on deep learning methods for binding affinities, such as &#x201c;Bai, Q, Liu, S, Tian, Y, Xu, T, Banegas-Luna, AJ, P&#x00e9;rez-S&#x00e1;nchez, H, et al. Application advances of deep learning methods for de novo drug design and molecular dynamics simulation. WIREs Comput Mol Sci. 2022; 12:e1581. https://doi.org/10.1002/wcms.1581&#x201d;</p>
                <p> 
                    <bold>We added this citation as suggested by the Reviewer.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report214547">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.143734.r214547</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Wang</surname>
                        <given-names>Jianmin</given-names>
                    </name>
                    <xref ref-type="aff" rid="r214547a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8910-0929</uri>
                </contrib>
                <aff id="r214547a1">
                    <label>1</label>Yonsei University, Seodaemun-gu, Seoul, South Korea</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>11</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Wang J</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport214547" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This paper introduces a method called SGPT-RL, which utilizes GPT as the policy network in the Reinvent approach to improve the optimization of binding affinities, such as DRD2 QSAR score and ACE2 docking score. The findings of the study indicate that GPT effectively learns about the chemical space and generates compounds that are both novel and valid, which is consistent with previous research. Furthermore, GPT proves to be proficient in learning ring patterns and successfully explores various scaffolds during the exploration process in both optimization tasks. Particularly in the ACE2 task, SGPT-RL outperforms by achieving superior docking scores and identifying specific patterns, like the presence of double-ring structures.</p>
            <p> </p>
            <p> The study shows promise overall, with GPT being a robust generative model and drug design being an important area of application for generative AI. The manuscript is well composed, but certain improvements are necessary to address a few issues.</p>
            <p> </p>
            <p> Major issues: 
                <list list-type="order">
                    <list-item>
                        <p>In this study, the authors compared MCMG in the DRD2 task but chose not to include it in the ACE2 task. It seems more logical to compare MCMG in both tasks. However, what might be the rationale behind excluding it from the ACE2 task comparison?</p>
                    </list-item>
                    <list-item>
                        <p>The clarity of the presented results is insufficient. In my opinion, Supplementary Figure 8 effectively demonstrates the distributions and should be included in the main content for clear comprehension. Figure 5, on the other hand, would be more suitable to be relocated to the supplementary material.</p>
                    </list-item>
                </list> Minor issues: 
                <list list-type="order">
                    <list-item>
                        <p>The manuscript is burdened with too many explanations for common abbreviations, making it a tedious read. For example, the abbreviation "SGPT-RL" is explained repeatedly in each figure, and abbreviations like "RL," "DRD2," and "ACE2" are needlessly reiterated in the captions. It would be more efficient to provide explanations for these abbreviations only when they first appear in the captions, thus avoiding unnecessary repetition.</p>
                    </list-item>
                    <list-item>
                        <p>The authors should carefully review the paper to avoid any typos and grammatical errors. Specifically, 'the' should be included before 'Moses benchmark'; 'Similarity to the nearest neighbor (SNN)' in the Subsection 'Evaluation metrics' needs to be in lower case; &#x201c;range in
                            <sup>1, 10</sup>&#x201d; should be &#x201c;range in [1, 10]&#x201d;.</p>
                    </list-item>
                    <list-item>
                        <p>Several passages do not read fluently. For example, the first sentence in the &#x201c;Model architecture&#x201d; Subsection does not fit the context and should be revised. &#x201c;see also Underlying data&#x201d; does not fit the context either.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>drug design, deep learning</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment11065-214547">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Xu</surname>
                            <given-names>Xiaopeng</given-names>
                        </name>
                        <aff>King Abdullah University of Science and Technology, Saudi Arabia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>The authors declare no conflicts of interest</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>14</day>
                    <month>2</month>
                    <year>2024</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We thank the Reviewer for sharing our aims and appreciating our efforts. Point-by-point responses to the issues are as follows.</bold>
                </p>
                <p> </p>
                <p> 
                    <bold>Major issues:</bold>
                </p>
                <p> 1. In this study, the authors compared MCMG in the DRD2 task but chose not to include it in the ACE2 task. It seems more logical to compare MCMG in both tasks. However, what might be the rationale behind excluding it from the ACE2 task comparison?</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. Initially, we also wanted to compare MCMG in both tasks. However, after careful investigation, we found this was not feasible. MCMG relies on a Transformer decoder, trained on known binding molecules, whose knowledge is distilled into a GRU. In the ACE2 task, however, we addressed a setting in which sufficient known binding molecules do not exist. MCMG was not designed for such tasks and cannot be applied to this problem.</bold>
                </p>
                <p> 
                    <bold>&#x00a0;</bold>
                </p>
                <p> 2. The clarity of the presented results is insufficient. In my opinion, Supplementary Figure 8 effectively demonstrates the distributions and should be included in the main content for clear comprehension. Figure 5, on the other hand, would be more suitable to be relocated to the supplementary material.</p>
                <p> 
                    <bold>We thank the Reviewer for the kind advice. We have included the main subfigures from Supplementary Figure 8 in the main text to showcase the improvement of properties during the optimization process. Figure 5 illustrates the increasing number of rings in the molecules generated in the first several steps. We consider it one of the most important findings in the results, so we would like to keep it in the main text.</bold>
                </p>
                <p> </p>
                <p> 
                    <bold>Minor issues:</bold>
                </p>
                <p> 1. The manuscript is burdened with too many explanations for common abbreviations, making it a tedious read. For example, the abbreviation "SGPT-RL" is explained repeatedly in each figure, and abbreviations like "RL," "DRD2," and "ACE2" are needlessly reiterated in the captions. It would be more efficient to provide explanations for these abbreviations only when they first appear in the captions, thus avoiding unnecessary repetition.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We updated the paragraphs and captions and removed the repeated explanations so that the text reads more fluently.</bold>
                </p>
                <p> </p>
                <p> 2. The authors should carefully review the paper to avoid any typos and grammatical errors. Specifically, 'the' should be included before 'Moses benchmark'; 'Similarity to the nearest neighbor (SNN)' in the Subsection 'Evaluation metrics' needs to be in lower case; &#x201c;range in1, 10&#x201d; should be &#x201c;range in [1, 10]&#x201d;.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out these typos and errors. We carefully reviewed the article again and fixed the errors and typos that were pointed out.</bold>
                </p>
                <p> </p>
                <p> 3. Several passages do not read fluently. For example, the first sentence in the &#x201c;Model architecture&#x201d; Subsection does not fit the context and should be revised. &#x201c;see also Underlying data&#x201d; does not fit the context either.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out these passages. We updated the sentences so that they read more fluently. Specifically, we removed the sentence &#x201c;Please note that all code associated with this article is available in the Software availability section&#x201d; and the phrase &#x201c;see also Underlying data&#x201d; from the paragraphs.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report188001">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.143734.r188001</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Wang</surname>
                        <given-names>Guohua</given-names>
                    </name>
                    <xref ref-type="aff" rid="r188001a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r188001a1">
                    <label>1</label>Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>1</day>
                <month>11</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Wang G</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport188001" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In this paper, the authors proposed SGPT-RL, a method that utilizes GPT as the policy network within the Reinvent approach to enhance the optimization of binding affinities, including DRD2 QSAR score and ACE2 docking score. The results of their study demonstrate that GPT effectively learns the chemical space, generating compounds with high novelty and validity, consistent with previous research. Notably, in both optimization tasks, GPT exhibits proficiency in learning ring patterns and successfully explores a wide range of scaffolds during the exploration process. Importantly, SGPT-RL outperforms in the ACE2 task by obtaining superior docking scores and identifying specific patterns, such as the presence of double ring structures.</p>
            <p> </p>
            <p> Overall, this study is interesting, as GPT is the current hotspot in AI research and de novo drug design is one of the most successful cases in AI for science. The manuscript is also well written and easy to understand. But there are several issues which should be improved.</p>
            <p> </p>
            <p> Firstly, there are an excessive number of explanations for common abbreviations in this manuscript, which makes it tedious to read. For instance, the abbreviation "SGPT-RL" is repeatedly explained in each of the figures. Similarly, abbreviations like "RL," "DRD2," and "ACE2" are unnecessarily reiterated many times in the captions. I believe it would be more effective to provide explanations for these abbreviations only during their initial occurrence in the captions, thereby avoiding repetitive explanations.</p>
            <p> </p>
            <p> Secondly, in this study, the authors compared MCMG in the DRD2 task, but not in the ACE2 task. Wouldn't it be more natural to compare it in both tasks? What is the reason for excluding it from the comparison in the ACE2 task?</p>
            <p> </p>
            <p> Thirdly, while going through the supplementary information, I came across Supplementary Figure 8, which serves as a clear illustration. I believe the author should incorporate it into the main content as it provides a clear explanation of the resulting distributions. Figure 5 should be relocated to the supplementary material instead.</p>
            <p> </p>
            <p> Furthermore, the author should thoroughly proofread the paper for any typos and formatting errors. For instance, 'the' should be added before 'Moses benchmark'. In the first paragraph of Subsection 'Evaluation metrics', 'Similarity to a nearest neighbor (SNN)' should be corrected to 'similarity to a nearest neighbor (SNN)'.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Artificial intelligence in bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment11064-188001">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Xu</surname>
                            <given-names>Xiaopeng</given-names>
                        </name>
                        <aff>King Abdullah University of Science and Technology, Saudi Arabia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>The authors declare no conflicts of interest</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>14</day>
                    <month>2</month>
                    <year>2024</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We thank the Reviewer for the summary, the acknowledgement of our novelty, and the helpful comments. Point-by-point responses to the issues are as follows.</bold>
                </p>
                <p> </p>
                <p> Firstly, there are an excessive number of explanations for common abbreviations in this manuscript, which makes it tedious to read. For instance, the abbreviation "SGPT-RL" is repeatedly explained in each of the figures. Similarly, abbreviations like "RL," "DRD2," and "ACE2" are unnecessarily reiterated many times in the captions. I believe it would be more effective to provide explanations for these abbreviations only during their initial occurrence in the captions, thereby avoiding repetitive explanations.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We updated the paragraphs and captions and removed the duplicated explanations so that the text reads more fluently.</bold>
                </p>
                <p> </p>
                <p> Secondly, in this study, the authors compared MCMG in the DRD2 task, but not in the ACE2 task. Wouldn't it be more natural to compare it in both tasks? What is the reason for excluding it from the comparison in the ACE2 task?</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. Initially, we also wanted to compare MCMG in both tasks. However, after careful investigation, we found this was not feasible. MCMG relies on a Transformer decoder, trained on known binding molecules, whose knowledge is distilled into a GRU. In the ACE2 task, however, we addressed a setting in which sufficient known binding molecules do not exist. MCMG was not designed for such tasks and cannot be applied to this problem.</bold>
                </p>
                <p> </p>
                <p> Thirdly, while going through the supplementary information, I came across Supplementary Figure 8, which serves as a clear illustration. I believe the author should incorporate it into the main content as it provides a clear explanation of the resulting distributions. Figure 5 should be relocated to the supplementary material instead.</p>
                <p> 
                    <bold>We thank the Reviewer for the kind advice. We have included the main subfigures from Supplementary Figure 8 in the main text to showcase the improvement of properties during the optimization process. Figure 5 illustrates the increasing number of rings in the molecules generated in the first several steps. We consider it one of the most important findings in the results, so we would like to keep it in the main text.</bold>
                </p>
                <p> </p>
                <p> Furthermore, the author should thoroughly proofread the paper for any typos and formatting errors. For instance, 'the' should be added before 'Moses benchmark'. In the first paragraph of Subsection 'Evaluation metrics', 'Similarity to a nearest neighbor (SNN)' should be corrected to 'similarity to a nearest neighbor (SNN)'.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out these typos and errors. We carefully reviewed the article again and fixed the errors and typos that were pointed out.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report188006">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.143734.r188006</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Wong</surname>
                        <given-names>Ka-Chun</given-names>
                    </name>
                    <xref ref-type="aff" rid="r188006a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6062-733X</uri>
                </contrib>
                <aff id="r188006a1">
                    <label>1</label>Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>7</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Wong KC</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport188006" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.130936.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors proposed a method, SGPT-RL, to optimize SMILES sequences for improved binding affinities by incorporating GPT into a reinforcement learning (RL) framework. The authors trained a GPT model as a prior model to learn the chemical space by pretraining on Moses SMILES, and then trained two RL models, one for DRD2 QSAR scores and the other for ACE2 docking scores, to generate SMILES with good binding affinities. The results show that the GPT prior model learned a good distribution of the chemical space. The RL models were able to generate SMILES sequences with good binding affinities. In addition, SGPT-RL generated sequences with better docking scores than Reinvent and was able to learn certain patterns during the RL process. There are a few considerations that could be addressed:</p>
            <p> </p>
            <p> Major issues:</p>
            <p> </p>
            <p> 1. The manuscript includes repetitive explanations of abbreviations, such as SGPT-RL, DRD2, ACE2, and SMILES, throughout the passages and captions. This not only hinders the flow of reading but also makes it tedious to navigate through. To enhance readability, I suggest minimizing the frequency of explanations and providing them only when necessary, particularly upon their initial mention.</p>
            <p> </p>
            <p> 2. GPT is mainly learning distributions, and RL is introducing inductive biases to steer the distributions towards desirable properties. Therefore, I think it is crucial to include a figure that demonstrates how the distribution of these properties evolves. Supplementary Figure 8 addresses this aspect effectively, and I recommend incorporating it into the main context to provide a clear illustration.</p>
            <p> </p>
            <p> 3. The supplementary information should have the same name as the main article, i.e. &#x2018;transformer&#x2019; should be &#x2018;generative pre-trained transformer&#x2019;.</p>
            <p> </p>
            <p> Minor issues:</p>
            <p> </p>
            <p> 1. In the Subsection that explains &#x201c;SAscore&#x201d;, the sentence &#x201c;which ranges in1, 10&#x201d; should be &#x201c;which ranges in [1, 10]&#x201d;.</p>
            <p> </p>
            <p> 2. The authors stated that SGPT-RL outperformed Reinvent on the ACE2 task with a significant p-value. However, the p-value is reported as 0.0, which appears as a numerical zero. To accurately represent this score, it would be preferable to present it in 2-digit scientific notation.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment11062-188006">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Xu</surname>
                            <given-names>Xiaopeng</given-names>
                        </name>
                        <aff>King Abdullah University of Science and Technology, Saudi Arabia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>The authors declare no conflicts of interest</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>14</day>
                    <month>2</month>
                    <year>2024</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We thank the Reviewer for sharing our aims and appreciating our efforts. Point-by-point responses to the issues are as follows.</bold>
                </p>
                <p>
                    <bold> </bold>
                </p>
                <p>
                    <bold> Major issues:</bold>
                </p>
                <p> 1. The manuscript includes repetitive explanations of abbreviations, such as SGPT-RL, DRD2, ACE2, and SMILES, throughout the passages and captions. This not only hinders the flow of reading but also makes it tedious to navigate through. To enhance readability, I suggest minimizing the frequency of explanations and providing them only when necessary, particularly upon their initial mention.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We updated the paragraphs and captions and removed the duplicated explanations so that the text reads more fluently.</bold>
                </p>
                <p> </p>
                <p> 2. GPT is mainly learning distributions, and RL is introducing inductive biases to steer the distributions towards desirable properties. Therefore, I think it is crucial to include a figure that demonstrates how the distribution of these properties evolves. Supplementary Figure 8 addresses this aspect effectively, and I recommend incorporating it into the main context to provide a clear illustration.</p>
                <p> 
                    <bold>We thank the Reviewer for the kind advice. We have included the main results from Supplementary Figure 8 in the main text to showcase the improvement of the core properties during the optimization process.</bold>
                </p>
                <p> </p>
                <p> 3. The supplementary information should have the same name as the main article, i.e. &#x2018;transformer&#x2019; should be &#x2018;generative pre-trained transformer&#x2019;.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. We updated the title of the supplementary information to fix this issue.</bold>
                </p>
                <p> </p>
                <p> 
                    <bold>Minor issues:</bold>
                </p>
                <p> 1. In the Subsection that explains &#x201c;SAscore&#x201d;, the sentence &#x201c;which ranges in1, 10&#x201d; should be &#x201c;which ranges in [1, 10]&#x201d;.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this typesetting issue. The sentence has been updated to fix this issue, as shown below.</bold>
                </p>
                <p> 
                    <bold>"A predictive model built by Blaschke et al. was used, where molecular weight was combined with raw score, which ranges in from one to 10, as features to predict the probability of synthetic accessibility."</bold>
                </p>
                <p> </p>
                <p> 2. The authors stated that SGPT-RL outperformed Reinvent on the ACE2 task with a significant p-value. However, the p-value is reported as 0.0, which appears as a numerical zero. To accurately represent this score, it would be preferable to present it in 2-digit scientific notation.</p>
                <p> 
                    <bold>We thank the Reviewer for pointing out this issue. Ideally, we would also report the p-value in 2-digit scientific notation; however, the computed result is exactly zero, so no significant digits are available. We think this is because the two distributions are clearly distinct. We use &#x201c;(p-value &lt;0.01)&#x201d; as a replacement.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
