<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="systematic-review" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.51117.3</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Systematic Review</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Data extraction methods for systematic review (semi)automation: Update of a living systematic review</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 3; peer review: 3 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Schmidt</surname>
                        <given-names>Lena</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0709-8226</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Finnerty Mutlu</surname>
                        <given-names>Ailbhe N.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Elmore</surname>
                        <given-names>Rebecca</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1161-2064</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Olorisade</surname>
                        <given-names>Babatunde K.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                    <xref ref-type="aff" rid="a6">6</xref>
                    <xref ref-type="aff" rid="a7">7</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Thomas</surname>
                        <given-names>James</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4805-4190</uri>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Higgins</surname>
                        <given-names>Julian P. T.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8323-2514</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK</aff>
                <aff id="a2">
                    <label>2</label>Sciome LLC, Research Triangle Park, North Carolina, 27713, USA</aff>
                <aff id="a3">
                    <label>3</label>Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK</aff>
                <aff id="a4">
                    <label>4</label>UCL Social Research Institute, University College London, London, WC1H 0AL, UK</aff>
                <aff id="a5">
                    <label>5</label>Evaluate Ltd, London, SE1 2RE, UK</aff>
                <aff id="a6">
                    <label>6</label>Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK</aff>
                <aff id="a7">
                    <label>7</label>EdgeStride (Timeless Dynamics Academy), AACSL 1st Floor, North Westgate House, Harlow, Essex, CM20 1YS, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:lena.schmidt@io.nihr.ac.uk">lena.schmidt@io.nihr.ac.uk</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>4</month>
                <year>2025</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>401</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>11</day>
                    <month>3</month>
                    <year>2025</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Schmidt L et al.</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-401/pdf"/>
            <abstract>
                <sec id="sec6">
                    <title>Background</title>
                    <p>The reliable and usable (semi) automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies.</p>
                </sec>
                <sec id="sec7" sec-type="methods">
                    <title>Methods</title>
                    <p>We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the 
                        <italic toggle="yes">dblp computer science bibliography</italic> databases. Full text screening and data extraction are conducted using a mix of open-source and commercial tools. This living review update includes publications up to August 2024 and OpenAlex content up to September 2024.</p>
                </sec>
                <sec id="sec8" sec-type="results">
                    <title>Results</title>
                    <p>117 publications are included in this review. Of these, 30 (26%) used full texts while the rest used titles and abstracts. A total of 112 (96%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 53 (45%), and code from 49 (42%) publications. Nine (8%) implemented publicly available tools.</p>
                </sec>
                <sec id="sec9" sec-type="conclusions">
                    <title>Conclusions</title>
                    <p>This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting data from other study types. Between review updates, large language models emerged as a new tool for data extraction. While they facilitate access to automated extraction, publications using them showed a trend towards lower quality of results reporting, especially for quantitative results such as recall, and lower reproducibility of results. Compared with the previous update, trends such as the transition to relation extraction and the sharing of code and datasets stayed similar.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Data Extraction</kwd>
                <kwd>Natural Language Processing</kwd>
                <kwd>Reproducibility</kwd>
                <kwd>Systematic Reviews</kwd>
                <kwd>Text Mining</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>National Institute for Health Research</funding-source>
                    <award-id>DRF-2018-11-ST2-048</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/501100000272">
                    <funding-source>National Institute for Health Research</funding-source>
                    <award-id>RM-SR-2017-09-028</award-id>
                </award-group>
                <funding-statement>LS was supported by National Institute for Health and Care Research (NIHR) through a NIHR Systematic Reviews Fellowship [RM-SR-2017-09-028] and later through [HSRIC-2016-10009/Innovation Observatory]. The views expressed in this article are those of the authors and do not necessarily represent those of the NHS, the NIHR, MRC, or the Department of Health and Social Care.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Updated</label>
                <title>Changes from Version 2</title>
                <p>This version of the LSR includes 41 new papers. The article text was updated to reflect changes and new research trends such as large language models (LLMs) being used to extract data, as well as continuing trends in increased availability of datasets, source code, relation extraction and summarisation. We updated existing figures and tables and, due to the increasing amount of evidence, we additionally provide interactive HTML maps to explore the dataset. These can be accessed via the appendix (3.2) or the living review website. We also provide Table A1 with an overview of all 117 included records in the appendix. Changes to data extraction items: For update 2 we added data extraction items specific to LLM automation, including prompt development, reproducibility of LLM output, strategies of applying LLMs, and a question about whether the paper describes a study within a review.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <label>1.</label>
            <title>Introduction</title>
            <p>In a systematic review, data extraction is the process of capturing key characteristics of studies in structured and standardised form based on information in journal articles and reports. It is a necessary precursor to assessing the risk of bias in individual studies and synthesising their findings. Interventional, diagnostic, or prognostic systematic reviews routinely extract information from a specific set of fields that can be predefined.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> The most common fields for extraction in interventional reviews are defined in the PICO framework (population, intervention, comparison, outcome) and similar frameworks are available for other review types. The data extraction task can be time-consuming and repetitive when done by hand. This creates opportunities for support through intelligent software, which identifies and extracts information automatically. When applied to the field of health research, this (semi) automation sits at the interface between evidence-based medicine (EBM) and data science, and as described in the following section, interest in its development has grown in parallel with interest in AI in other areas of computer science.</p>
            <sec id="sec1.1">
                <label>1.1</label>
                <title>Related systematic reviews and overviews</title>
                <p>This review is, to the best of our knowledge, the only living systematic review (LSR) of data extraction methods in clinical trial text. A living review of automated data extraction for social science studies was published recently, adapting part of our methodology.
                    <xref ref-type="bibr" rid="ref155">
                        <sup>155</sup>
                    </xref> We identified four previous reviews of tools and methods in the first iteration of this living review (called base-review hereafter),
                    <xref ref-type="bibr" rid="ref2">
                        <sup>2</sup>
                    </xref>
                    <sup>&#x2013;</sup>
                    <xref ref-type="bibr" rid="ref5">
                        <sup>5</sup>
                    </xref> and two documents providing overviews and guidelines relevant to our topic.
                    <xref ref-type="bibr" rid="ref3">
                        <sup>3</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref6">
                        <sup>6</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref7">
                        <sup>7</sup>
                    </xref> Between the base-review and the 2023 update, we identified six more related (systematic) literature reviews.
                    <xref ref-type="bibr" rid="ref8">
                        <sup>8</sup>
                    </xref>
                    <sup>&#x2013;</sup>
                    <xref ref-type="bibr" rid="ref13">
                        <sup>13</sup>
                    </xref> For the most recent 2024 update, 13 reviews and seven editorials or opinion pieces were identified.</p>
                <p>

                    <bold>Related reviews before 2018:</bold> The systematic reviews from 2014 to 2015 present an overview of classical machine learning and natural language processing (NLP) methods applied to tasks such as data mining in the field of evidence-based medicine. At the time of publication of these documents, methods such as topic modelling (Latent Dirichlet Allocation) and support vector machines (SVM) were considered state-of-the-art for language models.</p>
                <p>In 2014, Tsafnat 
                    <italic toggle="yes">et al.</italic> provided a broad overview on automation technologies for different stages of authoring a systematic review.
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> O&#x2019;Mara-Eves 
                    <italic toggle="yes">et al.</italic> published a systematic review focusing on text-mining approaches in 2015.
                    <sup>
                        <xref ref-type="bibr" rid="ref4">4</xref>
                    </sup> It includes a summary of methods for the evaluation of systems, such as recall, accuracy, and F1 score (the harmonic mean of recall and precision, a metric frequently used in machine-learning). The reviewers focused on tasks related to PICO classification and supporting the screening process. In the same year, Jonnalagadda, Goyal and Huffman
                    <sup>
                        <xref ref-type="bibr" rid="ref3">3</xref>
                    </sup> described methods for data extraction, focusing on PICOs and related fields. The age of these publications means that the latest static or contextual embedding-based and neural methods are not included. These newer methods,
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> however, are used in contemporary systematic review automation software which will be reviewed in the scope of this living review.</p>
                <p>

                    <bold>Related reviews up to 2020:</bold> Reviews up to 2020 focus on discussions around tool development and integration in practice, and mark the starting date of the inclusion of automation methods based on neural networks. Beller 
                    <italic toggle="yes">et al.</italic> describe principles for development and integration of tools for systematic review automation.
                    <sup>
                        <xref ref-type="bibr" rid="ref6">6</xref>
                    </sup> Marshall and Wallace
                    <sup>
                        <xref ref-type="bibr" rid="ref7">7</xref>
                    </sup> present a guide to automation technology, with a focus on availability of tools and adoption into practice. They conclude that tools facilitating screening are widely accessible and usable, while data extraction tools are still at piloting stages or require a higher amount of human input.</p>
                <p>A systematic review of machine-learning for systematic review automation, published in Portuguese in 2020, included 35 publications. The authors examined journals in which publications about systematic review automation are published, and conducted a term-frequency and citation analysis. They categorised papers by systematic review task, and provided a brief overview of data extraction methods.
                    <sup>
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup>
                </p>
                <p>

                    <bold>Related reviews up to 2023 update:</bold> These six reviews include and discuss end-user tools and cover different tasks across the SR workflow, including data extraction. Compared with this LSR, these reviews are broader in scope but include fewer references on the automation of data extraction. Ruiz and Duffy
                    <xref ref-type="bibr" rid="ref10">
                        <sup>10</sup>
                    </xref> conducted a literature and trend analysis showing that the number of published references about SR automation is steadily increasing. Sundaram and Berleant
                    <xref ref-type="bibr" rid="ref11">
                        <sup>11</sup>
                    </xref> analyse 29 references applying text mining to different parts of the SR process and note that 24 references describe automation in study selection while research gaps are most prominent for data extraction, monitoring, quality assessment, and synthesis.
                    <xref ref-type="bibr" rid="ref11">
                        <sup>11</sup>
                    </xref> Khalil et al.
                    <xref ref-type="bibr" rid="ref9">
                        <sup>9</sup>
                    </xref> include 47 tools and descriptions of validation studies in a scoping review, of which 8 are available end-user tools that mostly focus on screening, but also cover data extraction and risk of bias assessments. They discuss limitations of tools such as lack of generalisability, integration, funding, and limited performance or access.
                    <xref ref-type="bibr" rid="ref9">
                        <sup>9</sup>
                    </xref> Cierco Jimenez et al.
                    <xref ref-type="bibr" rid="ref8">
                        <sup>8</sup>
                    </xref> included 63 references in a mapping review of machine-learning to assist SRs during different workflow steps, of which 41 were available end-user tools for use by researchers without an informatics background. In accordance with other reviews, they describe screening as the most frequently automated step, while automated data extraction tools are lacking due to the complexity of the task. Zhang et al.
                    <xref ref-type="bibr" rid="ref12">
                        <sup>12</sup>
                    </xref> included 49 references on automation of data extraction fields such as diseases, outcomes, or metadata. They focussed on extraction from traditional Chinese medicine texts such as published clinical trial texts, health records, or ancient literature.
                    <xref ref-type="bibr" rid="ref12">
                        <sup>12</sup>
                    </xref> Schmidt et al.
                    <xref ref-type="bibr" rid="ref13">
                        <sup>13</sup>
                    </xref> published a narrative review of tools with a focus on living systematic review automation. They discuss tools that automate or support the constant literature retrieval that is the hallmark of LSRs. Well-integrated (semi) automation of data extraction and automatic dissemination or visualisation of results between official review updates are supported by some tools, but are less common.</p>
                <p>

                    <bold>Related reviews since 2023 update:</bold> We identified a further 13 reviews on the topic of literature review automation, and seven opinion pieces or editorials. All references are listed in Appendix 3.1. We mention here only selected papers due to the large increase in related literature. Aletaha et al. (2023)
                    <xref ref-type="bibr" rid="ref129">
                        <sup>129</sup>
                    </xref> published a closely related scoping review of automated data extraction methods, including 26 references up to 2022. Their conclusions mirror those of our previous 2023 LSR update, namely low availability of software and trends towards Transformer models.
                    <xref ref-type="bibr" rid="ref129">
                        <sup>129</sup>
                    </xref>
                </p>
                <p>T&#x00f3;th et al.
                    <xref ref-type="bibr" rid="ref170">
                        <sup>170</sup>
                    </xref> and Ofori-Boateng et al.
                    <xref ref-type="bibr" rid="ref160">
                        <sup>160</sup>
                    </xref> discussed data extraction among automation methods for other SR tasks. T&#x00f3;th et al. included 13 data extraction methods and described 15 automated SRs; of these SRs, only one automated data extraction, while the others employed search and screening methods.
                    <xref ref-type="bibr" rid="ref170">
                        <sup>170</sup>
                    </xref> Ofori-Boateng et al. included 52 papers, with six addressing data extraction.
                    <xref ref-type="bibr" rid="ref160">
                        <sup>160</sup>
                    </xref>
                </p>
                <p>Hammer et al.
                    <xref ref-type="bibr" rid="ref142">
                        <sup>142</sup>
                    </xref> reviewed deduplication tools and evaluation methods. In the field of large language models (LLMs) Wang et al.
                    <xref ref-type="bibr" rid="ref172">
                        <sup>172</sup>
                    </xref> reviewed prompt engineering methods in the medical field, for example in data extraction or evidence inference. Tam et al.
                    <xref ref-type="bibr" rid="ref168">
                        <sup>168</sup>
                    </xref> reviewed 142 papers on the human evaluation of LLM applications in healthcare in general and found that generalizability, applicability, and reliability are lacking in current evaluation practices, a finding supported by our current review update.</p>
            </sec>
            <sec id="sec1.2">
                <label>1.2</label>
                <title>Aim</title>
                <p>We aim to review published methods and tools for automating or (semi) automating the process of data extraction in the context of a systematic review of medical research studies. We do this in the form of a living systematic review, keeping information up to date and relevant to the challenges faced by systematic reviewers at any time.</p>
                <p>Our objectives in reviewing this literature are two-fold. First, we want to examine the methods and tools from the data science perspective, seeking to reduce duplicate efforts, summarise current knowledge, and encourage comparability of published methods. Second, we seek to highlight the added value of the methods and tools from the perspective of systematic reviewers who wish to use (semi) automation for data extraction, i.e., what is the extent of automation? Is it reliable? We address these issues by summarising important caveats discussed in the literature, as well as factors that facilitate the adoption of tools in practice.</p>
            </sec>
        </sec>
        <sec id="sec2" sec-type="methods">
            <label>2.</label>
            <title>Methods</title>
            <sec id="sec2.1">
                <label>2.1</label>
                <title>Registration/protocol</title>
                <p>This review was conducted following a preregistered and published protocol.
                    <xref ref-type="bibr" rid="ref15">
                        <sup>15</sup>
                    </xref> Any deviations from the protocol are described below.</p>
            </sec>
            <sec id="sec2.2">
                <label>2.2</label>
                <title>Living review methodology</title>
                <p>We are conducting a living review because the field of systematic review (semi) automation is evolving rapidly along with advances in language processing, machine-learning and deep-learning.</p>
                <p>The process of updating started as described in the protocol
                    <xref ref-type="bibr" rid="ref15">
                        <sup>15</sup>
                    </xref> and was adapted between review updates. The living review application used for daily reference updates in the past is no longer used, in part because its code and package dependencies aged and became unreliable. For example, we observed discrepancies between results retrieved by our automated PubMed search and results obtained when running the search manually via PubMed. The arXiv, ACL, and dblp search updates are still executed and deduplicated using the previous methods, but are then fed manually into SWIFT-ActiveScreener for screening once a year, in order to use priority screening with early stopping as described elsewhere in detail.
                    <xref ref-type="bibr" rid="ref144">
                        <sup>144</sup>
                    </xref>
                </p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1.</label>
                    <caption>
                        <title>Continuous updating of the living review.</title>
                        <p>This image is reproduced under the terms of a 
                            <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license (CC-BY 4.0)</ext-link> from Schmidt et al.
                            <sup>
                                <xref ref-type="bibr" rid="ref15">15</xref>
                            </sup>
                        </p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure1.gif"/>
                </fig>
                <p>The decision for full review updates is made every six months based on the number of new publications added to the review. For more details about this, please refer to the protocol or to the 
                    <ext-link ext-link-type="uri" xlink:href="https://community.cochrane.org/sites/default/files/uploads/inline-files/Transform/201912_LSR_Revised_Guidance.pdf">Cochrane living systematic review guidance</ext-link>. Between updates, the screening process and current state of the data extraction are visible via the 
                    <ext-link ext-link-type="uri" xlink:href="https://l-ena.github.io/living_review_data_extraction/">living review website</ext-link>.</p>
            </sec>
            <sec id="sec2.3">
                <label>2.3</label>
                <title>Eligibility criteria</title>
                <p>

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>We included full text publications that describe an original NLP approach for extracting data related to systematic reviewing tasks. Data fields of interest (referred to here as entities or as sentences) were adapted from the Cochrane Handbook for Systematic Reviews of Interventions,
                                <sup>
                                    <xref ref-type="bibr" rid="ref1">1</xref>
                                </sup> and are defined in the protocol.
                                <sup>
                                    <xref ref-type="bibr" rid="ref16">15</xref>
                                </sup> We included the full range of NLP methods (e.g., regular expressions, rule-based systems, machine learning, and deep neural networks).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Publications must describe a full cycle of the implementation and evaluation of a method. For example, they must report training and at least one measure of evaluating the performance of a data extraction algorithm.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>We included reports published from 2005 until the present day, similar to previous work.
                                <sup>
                                    <xref ref-type="bibr" rid="ref3">3</xref>
                                </sup> We would have translated non-English reports, had we found any.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>The data that the included publications use for mining must be texts from randomised controlled trials, comparative cohort studies, case control studies or comparative cross-sectional studies (e.g., for diagnostic test accuracy). The scope of data extraction methods can be applied to the full text or to abstracts within each eligible publication&#x2019;s corpus. We included publications that extracted data from other study types, as long as at least one of our study types of interest was contained in the corpus.
</p>
                        </list-item>
                    </list>
                </p>
                <p>We excluded publications reporting:

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Methods and tools related solely to image processing and importing biomedical data from PDF files without any NLP approach, including data extraction from graphs.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Any research that focuses exclusively on protocol preparation, synthesis of already extracted data, write-up, solely the pre-processing of text or its dissemination.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Methods or tools that provided no natural language processing approach and offered only organisational interfaces, document management, databases, or version control.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Any publications related to electronic health records or mining genetic data.
</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec2.4">
                <label>2.4</label>
                <title>Search</title>
                <p>

                    <bold>Base-review:</bold> We searched five electronic databases, using the search methods previously described in our protocol.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">15</xref>
                    </sup> In short, we searched MEDLINE via Ovid, using a search strategy developed with the help of an information specialist, and searched Web of Science Core Collection and IEEE using adaptations of this strategy, which were made by the review authors. Searches on the arXiv (computer science) and dblp were conducted on full database dumps using the search functionality described by McGuinness and Schmidt.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">16</xref>
                    </sup> The full search results and further information about document retrieval are available in 
                    <italic toggle="yes">Underlying data:</italic> Appendix A and B.
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>
                </p>
                <p>
Originally, we planned to include a full literature search from the Web of Science Core Collection. Due to the large number of publications retrieved via this search (n = 7822), we decided to screen publications from all other sources first, to use these screening decisions to train a machine-learning ensemble classifier, and to add only those Web of Science publications that the classifier predicted as relevant for our living review. This reduced the Web of Science Core Collection publications to 547 abstracts, which were added to the studies in the initial screening step. The dataset, code and weights of trained models are available in 
                    <italic toggle="yes">Underlying data:</italic> Appendix C.
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref> This includes plots of each model&#x2019;s evaluation in terms of area under the curve (AUC), accuracy, F1, recall, and variance of cross-validation results for every metric.</p>
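                <p>As an illustration only, the following minimal Python sketch shows how per-fold accuracy, recall, and F1, together with their variance across cross-validation folds, can be derived from confusion-matrix counts. It is not the code deposited in Appendix C: the function names and the example counts are hypothetical, the use of population variance is an assumption, and AUC is omitted because it requires ranked prediction scores rather than counts.</p>
                <code language="python">
```python
from statistics import mean, pvariance


def fold_metrics(tp, fp, fn, tn):
    """Return accuracy, recall and F1 for one cross-validation fold."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def summarise(folds):
    """Return (mean, population variance) of each metric across folds."""
    per_fold = [fold_metrics(*counts) for counts in folds]
    return {
        name: (mean([m[name] for m in per_fold]),
               pvariance([m[name] for m in per_fold]))
        for name in per_fold[0]
    }


# Hypothetical confusion counts (tp, fp, fn, tn) for three folds.
example_folds = [(40, 10, 10, 140), (45, 5, 5, 145), (35, 15, 15, 135)]
summary = summarise(example_folds)
for metric, (average, variance) in summary.items():
    print(f"{metric}: mean={average:.3f}, variance={variance:.5f}")
```
                </code>
                <p>A low variance across folds suggests that a metric is stable with respect to the choice of training data. When a classifier is used to decide which records are screened at all, recall is the metric of most interest.</p>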
                <p>

                    <bold>Update 1 (2023):</bold> As planned, we changed to the PubMed API for searching MEDLINE. This decision was made to facilitate continuous reference retrieval. We searched only for pre-print or published literature and therefore did not search sources such as GitHub or other source code repositories. We also searched arXiv (computer science), ACL-Anthology, and dblp, and used EPPI-Reviewer to collect citations from Microsoft Academic, which was later replaced by OpenAlex. In EPPI-Reviewer we used the &#x2018;Bi-Citation AND Recommendations&#x2019; method.</p>
                <p>

                    <bold>Update 2 (2024):</bold> We noticed discrepancies between the results of the automated reference retrieval from PubMed and those of manual PubMed searches, and decided to 1) adjust the search strategy to include LLM-related terms and 2) re-run the search from July 2021 onwards to retrieve articles that were potentially missed. For the EPPI-Reviewer/OpenAlex search, we used the same retrieval method but applied it only once in September 2024, after supplying it with new included references from this current review update so that it could retrieve the latest related publications.</p>
            </sec>
            <sec id="sec2.5">
                <label>2.5</label>
                <title>Data collection and analysis</title>
                <p>

                    <bold>

                        <italic toggle="yes">2.5.1 Selection of studies</italic>
</bold>
                </p>
                <p>

                    <bold>Base review:</bold> Initial screening and data extraction were conducted as stated in the protocol. In short, for the base-review we screened all retrieved publications using the Abstrackr tool. All abstracts were screened by two independent reviewers. Conflicting judgements were resolved by the authors who made the initial screening decisions. Full-text screening was conducted in a similar manner to abstract screening but used our web application for LSRs described in the following section.</p>
                <p>

                    <bold>Update 1 (2023):</bold> For the updated review we used our living review web application to retrieve all publications, with the exception of the items retrieved by EPPI-Reviewer (these were added to the dataset separately). We further used our application to de-duplicate, screen, and data-extract all publications.</p>
                <p>A methodological update to the screening process included a change to single-screening to assess eligibility on both abstract and full-text level, reducing dual-screening to 10% of the publications.</p>
                <p>

                    <bold>Update 2 (2024):</bold> All references from database searches were imported to SWIFT-ActiveScreener and screened to an estimated recall of 95%.
                    <xref ref-type="bibr" rid="ref144">
                        <sup>144</sup>
                    </xref> We used included references from the previous LSR update as seeds for the tool&#x2019;s reference prioritisation algorithm. References retrieved by EPPI-Reviewer from OpenAlex were later screened in full.</p>
                <p>

                    <bold>

                        <italic toggle="yes">2.5.2 Data extraction, assessment, and management</italic>
</bold>
                </p>
                <p>

                    <bold>Base Review and update 1 (2023):</bold> We previously developed a web application to automate reference retrieval for living review updates (see 
                    <italic toggle="yes">Software availability</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup>), to support both abstract and full text screening for review updates, and to manage the data extraction process throughout.
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup> For future updates of this living review we will use the web application, and not Abstrackr, for screening references. This web application is already in use by another living review.
                    <sup>
                        <xref ref-type="bibr" rid="ref19">18</xref>
                    </sup> It automates daily reference retrieval from the included sources and has a screening and data extraction interface. All extracted data are stored in a database. Figures and tables can be exported on a daily basis and the progress in between review updates is shared on our living review website. The full spreadsheet of items extracted from each included reference is available in the 
                    <italic toggle="yes">Underlying data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup> As previously described in the protocol, quality of reporting and reproducibility was initially assessed based on a previously published checklist for reproducibility in text mining, but some of the items were removed from the scope of this review update.
                    <sup>
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup>
                </p>
                <p>

                    <bold>Update 2 (2024):</bold> All data extraction was carried out in SWIFT-ActiveScreener.</p>
                <p>As planned in the protocol, a single reviewer conducted data extraction, and a random 10% of the included publications were checked by a second reviewer.</p>
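                <p>A reproducible way to draw such a random 10% check sample can be sketched as follows. This is a minimal illustration and not the procedure implemented in our tools: the function name, the fixed seed, the decision to round the sample size up, and the placeholder identifiers are all assumptions.</p>
                <code language="python">
```python
import math
import random


def sample_for_double_check(publication_ids, fraction=0.10, seed=2024):
    """Pick a reproducible random subset for checking by a second reviewer."""
    ids = sorted(publication_ids)  # fixed order, so the seed fully determines the draw
    if not ids:
        return []
    # Assumption: round up, so that at least the stated fraction is checked.
    k = max(1, math.ceil(fraction * len(ids)))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


# Hypothetical identifiers standing in for 117 included publications.
included = [f"pub{n:03d}" for n in range(1, 118)]
to_check = sample_for_double_check(included)
print(len(to_check), "publications selected for second-reviewer checking")
```
                </code>
                <p>Sorting the identifiers before sampling and fixing the seed means that the same subset is selected regardless of the order in which references were exported.</p>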
                <p>

                    <bold>

                        <italic toggle="yes">2.5.3 Visualisation</italic>
</bold>
                </p>
                <p>

                    <bold>Base Review and update 1 (2023):</bold> The creation of all figures and interactive plots on the living review website and in this review&#x2019;s &#x2018;Results&#x2019; section was automated based on structured content from our living review database (see Appendices A, D, and E in 
                    <italic toggle="yes">Underlying data</italic>
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>). We automated the export of PDF reports for each included publication. Calculation of percentages, export of extracted text, and creation of figures were also automated.</p>
                <p>

                    <bold>Update 2 (2024)</bold>: We merged data extracted via the different tools with our previous database in order to use the same workflow for visualisation. We additionally created an EPPI-Mapper map [
                    <xref ref-type="fn" rid="fn1">1</xref>] to display results (available on the website and in Appendix 3.2).</p>
                <p>

                    <bold>

                        <italic toggle="yes">2.5.4 Accessibility of data</italic>
</bold>
                </p>
                <p>All data and code are free to access. A detailed list of sources is given in the &#x2018;Data availability&#x2019; and &#x2018;Software availability&#x2019; sections.</p>
            </sec>
            <sec id="sec2.6">
                <label>2.6</label>
                <title>Changes from protocol and between updates</title>
                <p>In the protocol we stated that data would be available via an OSF repository. Instead, the full review data are available via the Harvard Dataverse, as this repository allows us to keep an assigned DOI after updating the repository with new content for each iteration of this living review. We also stated that we would screen all publications from the Web of Science search. Instead, we describe a changed approach in the Methods section, under &#x2018;Search&#x2019;. For review updates, Web of Science was dropped and replaced with OpenAlex searches via EPPI-Reviewer.</p>
                <p>For update 2 we added data extraction items specific to LLM automation, including prompt development, reproducibility, strategies of applying LLMs, and a question about whether the paper describes a study within a review.
                    <xref ref-type="bibr" rid="ref133">
                        <sup>133</sup>
                    </xref>
                </p>
                <p>We added a data extraction item for the type of information which a publication mines (e.g. P, IC, O) into the section of primary items of interest, and we moved the type of input and output format from primary to secondary items of interest. We grouped the secondary item of interest &#x2018;Other reported metrics, such as impacts on systematic review processes (e.g., time saved during data extraction)&#x2019; with the primary item of interest &#x2018;Reported performance metrics used for evaluation&#x2019;.</p>
                <p>The item &#x2018;Persistence: is the dataset likely to be available for future use?&#x2019; was changed to: &#x2018;Can data be retrieved based on the information given in the publication?&#x2019;. We decided not to speculate if a dataset is likely to be available in the future and chose instead to record if the dataset was available at the time when we tried to access it.</p>
                <p>The item &#x2018;Can we obtain a runnable version of the software based on the information in the publication?&#x2019; was changed to &#x2018;Is an app available that does the data mining, e.g. a web-app or desktop version?&#x2019;.</p>
                <p>In the base-review we assessed the included publications based on a list of 17 items in the domains of reproducibility (3.4.1), transparency (3.4.2), description of testing (3.4.3), data availability (3.4.4), and internal and external validity (3.4.5). The list of items was reduced to six items:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.2.2 Is there a description of the dataset used and of its characteristics?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.2.4 Is the source code available?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.3.2 Are basic metrics reported (true/false positives and negatives)?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.4.1 Can we obtain a runnable version of the software based on the information in the publication?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.4.2 Persistence: Can data be retrieved based on the information given in the publication?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.5.1 Does the dataset or assessment measure provide a possibility to compare to other tools in the same domain?
</p>
                        </list-item>
                    </list>
                </p>
                <p>The following items were removed, although the results and discussion from the assessment of these items in the base-review remain within the review text:

                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.1.1 Are the sources for training/testing data reported?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.1.2 If pre-processing techniques were applied to the data, are they described?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.2.1 Is there a description of the algorithms used?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.2.3 Is there a description of the hardware used?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.3.1 Is there a justification/an explanation of the model assessment?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.3.3 Does the assessment include any information about trade-offs between recall or precision (also known as sensitivity and positive predictive value)?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.4.3 Is the use of third-party frameworks reported and are they accessible?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.5.2 Are explanations for the influence of both visible and hidden variables in the dataset given?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.5.3 Is the process of avoiding overfitting or underfitting described?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.5.4 Is the process of splitting training from validation data described?</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.4.5.5 Is the model&#x2019;s adaptability to different formats and/or environments beyond training and testing data described?
</p>
                        </list-item>
                    </list>
                </p>
            </sec>
        </sec>
        <sec id="sec3" sec-type="results">
            <label>3.</label>
            <title>Results</title>
            <sec id="sec3.1">
                <label>3.1</label>
                <title>Results of the search</title>
                <p>Our database searches identified 10,107 publications after duplicates were removed (see 
                    <xref ref-type="fig" rid="f2">
Figure 2</xref>). We identified one more publication manually.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>
Figure 2. </label>
                    <caption>
                        <title>PRISMA2020 flow diagram adapted for living reviews.
                            <sup>
                                <xref ref-type="bibr" rid="ref20">20</xref>
                            </sup>
                            <sup>&#x2013;</sup>
                            <sup>
                                <xref ref-type="bibr" rid="ref22">22</xref>
                            </sup>
                        </title>
                        <p>The base review included 2 updated searches, the first LSR update included 6 searches, and the current 2024 update included 3 update searches until publication cut-off.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure2.gif"/>
                </fig>
                <p>This iteration of the living review includes 117 publications (summarised in Table A1 in 
                    <italic toggle="yes">Underlying data</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>).</p>
                <p>

                    <bold>

                        <italic toggle="yes">3.1.1 Excluded publications</italic>
</bold>
                </p>
                <p>Across the base-review and the updates, 255 publications were excluded at the full text screening stage. The most common reason for exclusion was that a publication did not fit the target entities or target data. In most cases, this was due to the text-types mined in the publications. Electronic health records and non-trial data were common, and we created a list of datasets that would be excluded in this category (see more information in 
                    <italic toggle="yes">Underlying data:</italic> Appendix B
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>). Some publications addressed the right kind of text but were excluded for not mining data of interest to this review. For example, Norman, Leeflang and N&#x00e9;v&#x00e9;ol
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> performed data extraction for diagnostic test accuracy reviews, but focused on extracting the results and data for statistical analyses. Millard, Flach and Higgins
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup> and Marshall, Kuiper and Wallace
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup> looked at risk of bias classification, which is beyond the scope of this review. Boudin, Nie and Dawes
                    <sup>
                        <xref ref-type="bibr" rid="ref26">26</xref>
                    </sup> developed a weighting scheme based on an analysis of PICO element locations, leaving the detection of single PICO elements for future work. Luo 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup> extracted data from clinical trial registrations but focused on parsing inclusion criteria into event or temporal entities to aid participant selection for randomised controlled trials (RCTs).</p>
                <p>The second most common reason for exclusion was that a publication had &#x2018;no original data extraction approach&#x2019;. Rathbone 

                    <italic toggle="yes">et al</italic>.,
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> for example, used hand-crafted Boolean searches specific to a systematic review&#x2019;s PICO criteria to support the screening process of a review within Endnote. We classified this article as not having any original data extraction approach because it does not create any structured outputs specific to P, IC, or O. Malheiros 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref29">29</xref>
                    </sup> performed visual text mining, supporting systematic review authors by document clustering and text highlighting. Similarly, Fabbri 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref30">30</xref>
                    </sup> implemented a tool that supports the whole systematic review workflow, from protocol to data extraction, performing clustering and identification of similar publications. Other systematic reviewing tasks that can benefit from automation but were excluded from this review are listed in 
                    <italic toggle="yes">Underlying data:</italic> Appendix B.
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>
                </p>
            </sec>
            <sec id="sec3.2">
                <label>3.2</label>
                <title>Results from the data extraction: Primary items of interest</title>
                <p>

                    <bold>

                        <italic toggle="yes">3.2.1 Automation approaches used</italic>
</bold>
                </p>
                <p>
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows aspects of the system architectures implemented in the included publications. A short summary of these for each publication is provided in Table A1 in 
                    <italic toggle="yes">Underlying data.</italic>
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref> Where possible, we tried to break down larger system architectures into smaller components. For example, an architecture combining a word embedding + long short-term memory (LSTM) network would have been broken down into the two respective sub-components. We grouped binary classifiers, such as na&#x00ef;ve Bayes and logistic regression. Although the support vector machine (SVM) is also a binary classifier, it was assigned a separate category due to its popularity. The final categories are a mixture of non-machine-learning automation (application programming interface (API) and metadata retrieval, PDF extraction, rule-base), classic machine-learning (na&#x00ef;ve Bayes, decision trees, SVM, or other binary classifiers) and neural or deep-learning approaches (convolutional neural network (CNN), LSTM, transformers, or word embeddings). This figure shows that there is no obvious choice of system architecture for this task. For the LSR update, the strongest trend was the increasing application of large language models (LLMs), which appeared in 17 publications. LLMs, such as GPT-4, were initially intended to generate text but are also being applied to data extraction tasks. Further LLM training and fine-tuning methods, such as LoRA (Low Rank Adaptation), were reported in six publications, but fine-tuning was used less frequently than zero-shot prompting, which was reported in 13 publications; k-shot prompting was reported in two.
                    <xref ref-type="bibr" rid="ref138">
                        <sup>138</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref148">
                        <sup>148</sup>
                    </xref> Previously, BERT (Bidirectional Encoder Representations from Transformers) was the most commonly used architecture, sometimes coupled with CRF or LSTM. BERT was published in 2018 and other architecturally-identical versions of it tailored to using scientific text, such as SciBERT, are summarised under the same category in this review.
                    <xref ref-type="bibr" rid="ref14">
                        <sup>14</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref31">
                        <sup>31</sup>
                    </xref> In the previous update it appeared 21 times, whereas it is now used in 40 included publications. Other transformer-based architectures, such as the bio-pretrained version of ELECTRA, are also still gaining attention,
                    <xref ref-type="bibr" rid="ref32">
                        <sup>32</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref33">
                        <sup>33</sup>
                    </xref> as well as FLAIR-based models.
                    <xref ref-type="bibr" rid="ref34">
                        <sup>34</sup>
                    </xref>
                    <sup>&#x2013;</sup>
                    <xref ref-type="bibr" rid="ref36">
                        <sup>36</sup>
                    </xref>
                </p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>
Figure 3. </label>
                    <caption>
                        <title>System architectures used for automating data extraction in the included publications.</title>
                        <p>Results are divided into different categories of machine-learning and natural language processing approaches and coloured by the year of publication. More than one architecture component per publication is possible. Abbreviations: API, application programming interface; BERT, bidirectional encoder representations from Transformers; CNN, convolutional neural network; CRF, conditional random fields; LLM, Large Language Model; LSTM, long short-term memory; PICO, population, intervention, comparison, outcome; RNN, recurrent neural networks; SVM, support vector machines.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure3.gif"/>
                </fig>
                <p>Rule-bases, including approaches using heuristics, wordlists, and regular expressions, were one of the earliest techniques used for data extraction in EBM literature. Rule-bases are still being used, but most publications use them in combination with other classifiers (data shown in 
                    <italic toggle="yes">Underlying data</italic>
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>). Although rule-bases were used more frequently in the past, 15 publications published between 2017 and now use this approach alongside other architectures such as LLM,
                    <xref ref-type="bibr" rid="ref148">
                        <sup>148</sup>
                    </xref> Transformer,
                    <xref ref-type="bibr" rid="ref33">
                        <sup>33</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref37">
                        <sup>37</sup>
                    </xref>
                    <sup>&#x2013;</sup>
                    <xref ref-type="bibr" rid="ref39">
                        <sup>39</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref148">
                        <sup>148</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref156">
                        <sup>156</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref174">
                        <sup>174</sup>
                    </xref> conditional random fields (CRF),
                    <xref ref-type="bibr" rid="ref40">
                        <sup>40</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref156">
                        <sup>156</sup>
                    </xref> SVM
                    <xref ref-type="bibr" rid="ref41">
                        <sup>41</sup>
                    </xref> or other binary classifiers.
                    <xref ref-type="bibr" rid="ref42">
                        <sup>42</sup>
                    </xref> In practice, these systems use rule-bases in the form of hand-crafted lists to identify candidate phrases for amount entities such as sample size
                    <xref ref-type="bibr" rid="ref42">
                        <sup>42</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref43">
                        <sup>43</sup>
                    </xref> or to refine a result obtained by a machine-learning classifier on the entity level (e.g., instances where a specific intervention or outcome is extracted from a sentence).
                    <xref ref-type="bibr" rid="ref40">
                        <sup>40</sup>
                    </xref>
                </p>
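                <p>As a purely illustrative sketch of the rule-base idea described above (not the method of any included publication), a hand-crafted cue list can be combined with a regular expression to propose candidate sample-size phrases. The cue words, the pattern, and the function name <monospace>sample_size_candidates</monospace> are assumptions made for this example; a real system would likely pass such candidates to a downstream classifier.</p>
                <preformat>
```python
import re

# Hypothetical cue list for illustration only; not taken from any reviewed system.
CUE_WORDS = r"(?:patients|participants|subjects|women|men|children|adults)"

# A number (optionally with thousands separators) directly followed by a cue word.
CANDIDATE = re.compile(r"\b(\d[\d,]*)\s+" + CUE_WORDS + r"\b", re.IGNORECASE)


def sample_size_candidates(sentence):
    """Return integer candidates for sample size found next to a cue word."""
    return [int(m.group(1).replace(",", "")) for m in CANDIDATE.finditer(sentence)]


if __name__ == "__main__":
    text = "A total of 1,204 patients were randomised; 37 participants withdrew."
    print(sample_size_candidates(text))
```
                </preformat>
                <p>The rule deliberately over-generates: it returns every number next to a cue word, including withdrawals, which is why such rules are typically used to identify candidates rather than final answers.</p>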
                <p>Binary classifiers, most notably na&#x00ef;ve Bayes and SVMs, are also common system components in the data extraction literature. They appear frequently in studies published between 2005 and now, but their usage started declining with the advent of neural models.</p>
                <p>Embedding and neural architectures have been increasingly used in the literature over the past seven years. Recurrent neural networks (RNN), CNN, and LSTM networks require larger amounts of training data; by using transformer-based embeddings with pre-training algorithms based on unlabelled data, they have become increasingly attractive in fields such as data extraction for EBM, where high-quality training data are difficult and expensive to obtain.</p>
                <p>In the &#x2018;Other&#x2019; category, tools mentioned were mostly other classifiers such as maximum entropy classifiers (n = 3), kLog, J48, and various position or document-length classification algorithms. We also added novel training approaches to existing neural architectures in this category, as well as ensemble or normalisation models and custom algorithms like a template-filling algorithm.
                    <xref ref-type="bibr" rid="ref175">
                        <sup>175</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref176">
                        <sup>176</sup>
                    </xref>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">3.2.2 Reported performance metrics used for evaluation</italic>
</bold>
                </p>
                <p>Precision (i.e., positive predictive value), recall (i.e., sensitivity), and F1 score (harmonic mean of precision and recall) are the most widely used metrics for evaluating classifiers. This is reflected in 
                    <xref ref-type="fig" rid="f4">
Figure 4</xref>, which shows that at least one of these metrics was used in the majority of the included publications. Accuracy and area under the curve - receiver operator characteristics (AUC-ROC) were less frequently used.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>
Figure 4. </label>
                    <caption>
                        <title>The most common assessment metrics used in the included publications in order to evaluate the performance of a data extraction system.</title>
                        <p>More than one metric per publication is possible, which means that the total number of included publications (n = 117) is lower than the sum of counts of the bars within this figure. AUC-ROC, area under the curve - receiver operator characteristics; F1, harmonic mean of precision and recall.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure4.gif"/>
                </fig>
                <p>The included publications took several approaches to, and gave several justifications for, using macro- or micro-averaged precision, recall, or F1 scores. Micro or macro scores are computed in multi-class cases, and the final scores can differ whenever the classes in a dataset are imbalanced (as is the case in most datasets used for automating data extraction in SR automation).</p>
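                <p>A minimal worked sketch may help to show why the two averaging schemes diverge under class imbalance. The per-class counts below are invented for illustration and do not come from any included publication; macro-F1 is computed here as the unweighted mean of per-class F1 scores (other definitions exist), and micro-F1 is computed from counts pooled across classes.</p>
                <preformat>
```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(counts):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(prf(*c)[2] for c in counts.values()) / len(counts)


def micro_f1(counts):
    """F1 on pooled counts: every instance counts equally."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2]


# Hypothetical (tp, fp, fn) counts: two frequent classes and one rare, poorly predicted class.
counts = {"P": (90, 10, 10), "I": (80, 20, 20), "O": (5, 5, 15)}

if __name__ == "__main__":
    print("macro F1:", round(macro_f1(counts), 3))
    print("micro F1:", round(micro_f1(counts), 3))
```
                </preformat>
                <p>With these invented counts, macro-F1 is roughly 0.68 while micro-F1 is roughly 0.81: the rare class pulls the macro average down but barely affects the pooled micro score.</p>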
                <p>Both micro and macro scores were reported by Singh et al. (2021),
                    <xref ref-type="bibr" rid="ref45">
                        <sup>45</sup>
                    </xref> Kilicoglu et al. (2021),
                    <xref ref-type="bibr" rid="ref38">
                        <sup>38</sup>
                    </xref> Kiritchenko et al. (2010),
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref> Fiszman et al. (2007),
                    <xref ref-type="bibr" rid="ref47">
                        <sup>47</sup>
                    </xref> and Zhang et al. (2024),
                    <xref ref-type="bibr" rid="ref179">
                        <sup>179</sup>
                    </xref> while Karystianis et al. (2014, 2017)
                    <xref ref-type="bibr" rid="ref48">
                        <sup>48</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref49">
                        <sup>49</sup>
                    </xref> reported micro scores across documents and macro scores across classes. Jiang et al. (2024)
                    <xref ref-type="bibr" rid="ref148">
                        <sup>148</sup>
                    </xref> provide an interesting discussion on the influence of class imbalance on micro vs. macro-scoring and how both approaches can be used to evaluate different aspects of their work.</p>
                <p>Macro-scores were previously used in only one publication,
                    <xref ref-type="bibr" rid="ref37">
                        <sup>37</sup>
                    </xref> but in the current review update seven more publications used them exclusively.
                    <xref ref-type="bibr" rid="ref130">
                        <sup>130</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref131">
                        <sup>131</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref132">
                        <sup>132</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref139">
                        <sup>139</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref169">
                        <sup>169</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref180">
                        <sup>180</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref181">
                        <sup>181</sup>
                    </xref>
                </p>
                <p>Micro scores were used by Fiszman et al.
                    <xref ref-type="bibr" rid="ref47">
                        <sup>47</sup>
                    </xref> for class-level results. In one publication, the harmonic mean was used for precision and recall, while micro-scoring was used for F1.
                    <xref ref-type="bibr" rid="ref50">
                        <sup>50</sup>
                    </xref> Micro scores were the most widely used, including by Al-Hussaini et al. (2022),
                    <xref ref-type="bibr" rid="ref32">
                        <sup>32</sup>
                    </xref> Sanchez-Graillet et al. (2022),
                    <xref ref-type="bibr" rid="ref51">
                        <sup>51</sup>
                    </xref> Kim et al. (2011),
                    <xref ref-type="bibr" rid="ref52">
                        <sup>52</sup>
                    </xref> Verbeke et al. (2012),
                    <xref ref-type="bibr" rid="ref53">
                        <sup>53</sup>
                    </xref> and Jin and Szolovits (2020),
                    <xref ref-type="bibr" rid="ref54">
                        <sup>54</sup>
                    </xref> and were also used in the evaluation script of Nye et al. (2018).
                    <xref ref-type="bibr" rid="ref55">
                        <sup>55</sup>
                    </xref> In the review update, five more publications applied micro scores.
                    <xref ref-type="bibr" rid="ref140">
                        <sup>140</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref145">
                        <sup>145</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref151">
                        <sup>151</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref175">
                        <sup>175</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref176">
                        <sup>176</sup>
                    </xref>
                </p>
                <p>In the latest update, four publications used weighted or average scores instead.
                    <xref ref-type="bibr" rid="ref147">
                        <sup>147</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref162">
                        <sup>162</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref167">
                        <sup>167</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref178">
                        <sup>178</sup>
                    </xref>
                </p>
                <p>In the category &#x2018;Other&#x2019; we added several instances where a relaxation of a metric was introduced, e.g., precision using top-n classified sentences
                    <sup>
                        <xref ref-type="bibr" rid="ref45">44</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref57">56</xref>
                    </sup> or mean average precision and the metric &#x2018;precision @rank 10&#x2019; for sentence ranking exercises.
                    <sup>
                        <xref ref-type="bibr" rid="ref58">57</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref59">58</xref>
                    </sup> Another type of relaxation for standard metrics is a distance relaxation when normalising entities into concepts in Medical Subject Headings (MeSH) or the Unified Medical Language System (UMLS), to allow N hops between predicted and target concepts.
                    <sup>
                        <xref ref-type="bibr" rid="ref60">59</xref>
                    </sup>
                </p>
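                <p>The ranking-based relaxations mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the evaluation code of any included publication: relevance labels are binary (1 or 0) and already sorted by classifier score, precision at rank k divides by k even if fewer than k items are available, and the function names are chosen for this example.</p>
                <preformat>
```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked sentences that are relevant (labels are 1 or 0)."""
    return sum(ranked_labels[:k]) / k


def average_precision(ranked_labels):
    """Mean of the precision values measured at the rank of each relevant sentence."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average precision averaged over several ranked lists, e.g. one per article."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical ranked sentences for two articles; 1 marks a relevant sentence.
    article_a = [1, 0, 1, 0, 0]
    article_b = [0, 1]
    print(precision_at_k(article_a, 3))
    print(mean_average_precision([article_a, article_b]))
```
                </preformat>
                <p>Under this relaxation, a system is credited when a relevant sentence appears anywhere among its top-ranked outputs, rather than only when its single best prediction is correct.</p>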
                <p>The LSR update showed an increasing trend of text summarisation and relation extraction algorithms. ROUGE, &#x2206;EI, or Jaccard similarity were used as metrics for summarisation.
                    <sup>
                        <xref ref-type="bibr" rid="ref61">60</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref62">61</xref>
                    </sup> For relation extraction, F1, precision, and recall remained the most common metrics.
                    <sup>
                        <xref ref-type="bibr" rid="ref63">62</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref64">63</xref>
                    </sup>
                </p>
                <p>Other metrics were kappa,
                    <sup>
                        <xref ref-type="bibr" rid="ref59">58</xref>
                    </sup> random shuffling
                    <sup>
                        <xref ref-type="bibr" rid="ref65">64</xref>
                    </sup> or binomial proportion test
                    <sup>
                        <xref ref-type="bibr" rid="ref66">65</xref>
                    </sup> to test statistical significance, given with confidence intervals.
                    <sup>
                        <xref ref-type="bibr" rid="ref42">41</xref>
                    </sup> Further metrics included under &#x2018;Other&#x2019; were odds ratios,
                    <sup>
                        <xref ref-type="bibr" rid="ref67">66</xref>
                    </sup> normalised discounted cumulative gain,
                    <sup>
                        <xref ref-type="bibr" rid="ref45">44</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref68">67</xref>
                    </sup> &#x2018;sentences needed to screen per article&#x2019; in order to find one relevant sentence,
                    <sup>
                        <xref ref-type="bibr" rid="ref69">68</xref>
                    </sup> McNemar test,
                    <sup>
                        <xref ref-type="bibr" rid="ref66">65</xref>
                    </sup> C-statistic (with 95% CI) and Brier score (with 95% CI).
                    <sup>
                        <xref ref-type="bibr" rid="ref70">69</xref>
                    </sup> Barnett (2022)
                    <sup>
                        <xref ref-type="bibr" rid="ref71">70</xref>
                    </sup> extracted sample sizes and reported the mean difference between true and extracted numbers.</p>
                <p>Real-life evaluations, such as the percentage of outputs needing human correction, or time saved per article, were reported by four publications,
                    <xref ref-type="bibr" rid="ref32">
                        <sup>32</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref150">
                        <sup>150</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref158">
                        <sup>158</sup>
                    </xref> and an evaluation as part of a wider screening system was done in another.
                    <xref ref-type="bibr" rid="ref71">
                        <sup>71</sup>
                    </xref> Notably, one of these papers evaluated its method in terms of helpfulness and time taken and compared it directly with the existing Trialstreamer application, giving useful insights into practical aspects of using automation tools.
                    <xref ref-type="bibr" rid="ref150">
                        <sup>150</sup>
                    </xref>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">3.2.3 Type of data</italic>
</bold>
                </p>
                <p>3.2.3.1 Scope and data</p>
                <p>Most data extraction is carried out on abstracts (see Table A1 in 
                    <italic toggle="yes">Underlying data</italic>,
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref> and the supplementary table giving an overview of all included publications). Abstracts are the most practical choice because they can be exported along with literature search results from databases such as MEDLINE. Within the 30 (26%) references that reported usage of full texts, most specifically mentioned that this also included abstracts. Because of vague descriptions and unpublished datasets, it is unclear whether all full texts included abstract text, but we assumed that all full texts included abstracts, and that all datasets including abstracts also included titles. Reported benefits of using full texts for data extraction include access to a more complete dataset, while the benefits of using titles (N=4, 5%) include lower complexity for the data extraction task.
                    <xref ref-type="bibr" rid="ref43">
                        <sup>43</sup>
                    </xref> Xu et al. (2010)
                    <xref ref-type="bibr" rid="ref72">
                        <sup>72</sup>
                    </xref> exclusively used titles, while the three other publications that specifically mentioned titles also used abstracts in their datasets.
                    <xref ref-type="bibr" rid="ref43">
                        <sup>43</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref73">
                        <sup>73</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref74">
                        <sup>74</sup>
                    </xref>
                </p>
                <p>
                    <xref ref-type="fig" rid="f5">Figure 5</xref> shows that RCTs are the most common study design texts used for data extraction in the included publications (see also extended Table A1 in 
                    <italic toggle="yes">Underlying data</italic>).
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref> This is not surprising, because systematic reviews of interventions are the most common type of systematic review, and they usually focus on evidence from RCTs. Therefore, the literature on automation of data extraction focuses on RCTs and their related PICO elements. Systematic reviews of diagnostic test accuracy are less frequent; currently, 5 (4%) publications report using data from diagnostic test papers. Previously only one included publication specifically focused on text and entities related to these studies,
                    <xref ref-type="bibr" rid="ref75">
                        <sup>75</sup>
                    </xref> while two mentioned diagnostic procedures among other fields of interest.
                    <xref ref-type="bibr" rid="ref35">
                        <sup>35</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref76">
                        <sup>76</sup>
                    </xref> During the 2024 update, two more publications were identified that included diagnostic test studies in their corpus among other study types, but only one
                    <xref ref-type="bibr" rid="ref162">
                        <sup>162</sup>
                    </xref> specifically mined entities related to diagnostic tests.
                    <xref ref-type="bibr" rid="ref151">
                        <sup>151</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref162">
                        <sup>162</sup>
                    </xref> Twelve publications focused on extracting data specifically from epidemiology research, non-randomised interventional studies, or included text from cohort studies as well as RCT text. 
                    <xref ref-type="bibr" rid="ref48">
                        <sup>48</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref49">
                        <sup>49</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref61">
                        <sup>61</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref72">
                        <sup>72</sup>
                    </xref>
                    <sup>&#x2013;</sup>
                    <xref ref-type="bibr" rid="ref74">
                        <sup>74</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref76">
                        <sup>76</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref77">
                        <sup>77</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref151">
                        <sup>151</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref162">
                        <sup>162</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref165">
                        <sup>165</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref166">
                        <sup>166</sup>
                    </xref> More publications mining data from surveys, animal RCTs, or case series might have been found if our search and review had concentrated on these types of texts.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>
Figure 5. </label>
                    <caption>
                        <title>The study types from which data were extracted.</title>
                        <p>Commonly, randomized controlled trials (RCT) text was at least one of the target text types used in the included publications.</p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure5.gif"/>
                </fig>
                <p>3.2.3.2 Data extraction targets</p>
                <p>Due to the high numbers of references cited in this section, we removed references for entities that appeared more than 10 times. The publications are still accessible and can be filtered via the map available on the review website (
                    <ext-link ext-link-type="uri" xlink:href="https://l-ena.github.io/living_review_data_extraction/">https://l-ena.github.io/living_review_data_extraction/</ext-link>) and in appendix 3.2. Mining P, IC, and O elements is the most common task performed in the literature of systematic review (semi-)automation (see Table A1 in 
                    <italic toggle="yes">Underlying data</italic>,
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref> and 
                    <xref ref-type="fig" rid="f6">Figure 6</xref>). In the base-review, P was the most common entity. Currently, O (n=85, 72%) has become the most popular, due to the continuing trend of relation-extraction models that focus on the relationship between O and I entities and therefore may omit the automatic extraction of P. Some of the less-frequent data extraction targets in the literature can be categorised as sub-classes of a PICO,
                    <xref ref-type="bibr" rid="ref55">
                        <sup>55</sup>
                    </xref> for example, by hierarchically annotating multiple entity types such as health condition, age, and gender under the P class. The entity type &#x2018;P (Condition and disease)&#x2019; was the most common entity closely related to the P class, appearing in 18 included publications.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>
Figure 6. </label>
                    <caption>
                        <title>The most common entities, as extracted in the included publications.</title>
                        <p>More than one entity type per publication is common, which means that the total number of included publications (n = 76) is lower than the sum of counts within this figure. P, population; I, intervention; C, comparison; O, outcome.</p>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure6.gif"/>
                </fig>
                <p>A notable trend within the latest review update was that 23 publications now annotate or work with datasets that differentiate between intervention and control arms; fourteen of these were published during or after 2022. This trend can be attributed to relation extraction and summarisation tasks requiring this type of data. It is still common for I and C to be merged for straightforward entity or sentence extraction (n=71, 61%). Most data extraction approaches focused on recognising instances of entity or sentence classes, and a small number of publications went one step further to normalise these to actual concepts, including data sources such as UMLS (Unified Medical Language System).
                    <xref ref-type="bibr" rid="ref35">
                        <sup>35</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref39">
                        <sup>39</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref59">
                        <sup>59</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref73">
                        <sup>73</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref85">
                        <sup>85</sup>
                    </xref>
                </p>
                <p>The &#x2018;Other&#x2019; category includes some more detailed drug annotations
                    <xref ref-type="bibr" rid="ref65">
                        <sup>65</sup>
                    </xref> or information such as confounders
                    <xref ref-type="bibr" rid="ref49">
                        <sup>49</sup>
                    </xref> and other entity types (see the full dataset in 
                    <italic toggle="yes">Underlying data</italic> for more information
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>).</p>
            </sec>
            <sec id="sec3.3">
                <label>3.3</label>
                <title>Results from the data extraction: Secondary items of interest</title>
                <p>

                    <bold>

                        <italic toggle="yes">3.3.1 Granularity of data extraction</italic>
</bold>
                </p>
                <p>A total of 86 publications (73%) extracted at least one type of information at the entity level, while 59 publications (50%) used sentence level (see Table A1 extended version in 
                    <italic toggle="yes">Underlying data</italic>
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>). We defined the entity level as any number of words that is shorter than a whole sentence, e.g., noun-phrases or other chunked text. Data types such as P, IC, or O commonly appeared to be extracted on both entity and sentence level, whereas &#x2018;N&#x2019;, the number of people participating in a study, was commonly extracted on entity level only.</p>
                <p>

                    <bold>

                        <italic toggle="yes">3.3.2 Type of input</italic>
</bold>
                </p>
                <p>The majority of publications and benchmark corpora mentioned MEDLINE, via PubMed, as the data source for text. Text files (n = 99), followed by XML (n = 12) and HTML (n = 3), are the most common formats of the data downloaded from these sources. Therefore, most systems described using, or were assumed to use, text files as input data. Ten included publications described using PDF files as input.
                    <xref ref-type="bibr" rid="ref44">
                        <sup>44</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref59">
                        <sup>59</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref68">
                        <sup>68</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref75">
                        <sup>75</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref81">
                        <sup>81</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref86">
                        <sup>86</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref158">
                        <sup>158</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref161">
                        <sup>161</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref173">
                        <sup>173</sup>
                    </xref>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">3.3.3 Type of output</italic>
</bold>
                </p>
                <p>A growing number of publications described structured summaries as the output of their extracted data (n = 20, an increasing trend between LSR updates). Alternatives to exporting structured summaries were JSON (n = 4), XML, and HTML (n = 2 each). Three publications mentioned structured data outputs in the form of an ontology or knowledge graph.
                    <xref ref-type="bibr" rid="ref51">
                        <sup>51</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref88">
                        <sup>88</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref137">
                        <sup>137</sup>
                    </xref> Most publications mentioned only classification scores without specifying an output type. In these cases, we assumed that the output could be saved as text files, for example as entity token/span annotations or lists of sentences (n = 102).</p>
            </sec>
            <sec id="sec3.4">
                <label>3.4</label>
                <title>Assessment of the quality of reporting</title>
                <p>In the base-review we used a list of 17 items to investigate reproducibility, transparency, description of testing, data availability, and internal and external validity of the approaches in each publication. The maximum and minimum number of items that were positively rated were 16 and 1, respectively, with a median of 10 (see Table A1 in 
                    <italic toggle="yes">Underlying data</italic>).
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup> Scores were added up and summarised based on the data provided in Appendices A and D (see 
                    <italic toggle="yes">Underlying data</italic>),
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup> using the sum and median functions integrated in Excel. Publications from recent years up to 2021 showed a trend towards more complete and clear reporting.</p>
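                <p>The tallying described above can also be sketched outside a spreadsheet. The following is a minimal Python illustration, assuming that each item is coded 1 (positively rated) or 0 (not positively rated). The publication names, the ratings, and the helper name <monospace>summarise</monospace> are hypothetical placeholders with only five items each; they are not the data from Appendices A and D.</p>
                <preformat>
```python
from statistics import median

# Hypothetical ratings: 1 = item positively rated, 0 = not positively rated.
# The review used 17 items per publication; five are shown here for brevity.
ratings = {
    "publication_a": [1, 1, 0, 1, 1],
    "publication_b": [1, 0, 0, 0, 0],
    "publication_c": [1, 1, 1, 1, 0],
}


def summarise(ratings):
    """Return per-publication totals plus their maximum, minimum and median."""
    totals = {name: sum(items) for name, items in ratings.items()}
    values = list(totals.values())
    return totals, max(values), min(values), median(values)


totals, highest, lowest, middle = summarise(ratings)
print(totals)
print("max:", highest, "min:", lowest, "median:", middle)
```
                </preformat>
                <p>With real data, each list would hold the 17 item ratings of one publication, and the three summary values would correspond to the maximum, minimum, and median reported above.</p>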
                <p>

                    <bold>

                        <italic toggle="yes">3.4.1 Reproducibility</italic>
</bold>
                </p>
                <p>3.4.1.1 Are the sources for training/testing data reported?</p>
                <p>Of the included publications in the base-review, 50 out of 53 (94%) clearly stated the sources of their data used for training and evaluation. MEDLINE was the most popular source of data, with abstracts usually described as being retrieved via searches on PubMed, or full texts from PubMed Central. A small number of publications described using text from specific journals such as PLoS Clinical Trials, New England Journal of Medicine, The Lancet, or BMJ.
                    <sup>
                        <xref ref-type="bibr" rid="ref57">56</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref84">83</xref>
                    </sup> Texts and metadata from Cochrane, either provided in full or retrieved via PubMed, were used in five publications.
                    <sup>
                        <xref ref-type="bibr" rid="ref58">57</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref60">59</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref69">68</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref76">75</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref87">86</xref>
                    </sup> Corpora such as the ebm-nlp dataset,
                    <sup>
                        <xref ref-type="bibr" rid="ref56">55</xref>
                    </sup> or PubMed-PICO
                    <sup>
                        <xref ref-type="bibr" rid="ref55">54</xref>
                    </sup> are available for direct download. Recent publications increasingly report using these benchmark datasets rather than creating and annotating their own corpora (see <xref ref-type="table" rid="T4">Table 4</xref> for more details).</p>
                <p>3.4.1.2 If pre-processing techniques were applied to the data, are they described?</p>
                <p>Of the included publications in the base-review, 47 out of 53 (89%) reported processing the textual data before applying/training algorithms for data extraction. Different types of pre-processing, with representative examples for usage and implementation, are listed in 
                    <xref ref-type="table" rid="T1">
Table 1</xref> below.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>Pre-processing techniques, a short description and examples from the literature.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Technique</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Details</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Example in literature</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Tokenisation</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Splitting text on sentence and word level</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref57">56</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref84">83</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref89">88</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Normalisation</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Replacing integers, units, and dates; lower-casing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref66">65</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref90">89</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref91">90</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Lemmatisation and stemming</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Reducing words to shorter or more common forms</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref54">53</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref92">91</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref93">92</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Stop-word removal</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Removing common words, such as &#x2018;the&#x2019;, from the text</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref45">44</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref49">48</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref81">80</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Part-of-speech tagging and dependency parsing</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Tagging words with their respective grammatical roles</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref42">41</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref79">78</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref89">88</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Chunking</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Defining sentence parts, such as noun phrases</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref66">65</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref77">76</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref94">93</xref>
                                    </sup>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Concept tagging</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Processing and tagging words with semantic classes or concepts, e.g. using word lists or MetaMap</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref76">75</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref80">79</xref>
                                    </sup>
                                    <sup>,</sup>
                                    <sup>
                                        <xref ref-type="bibr" rid="ref95">94</xref>
                                    </sup>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
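                <p>To make some of the techniques in Table 1 concrete, the following is a minimal Python sketch of tokenisation, normalisation, stop-word removal, and stemming, using only the standard library. It is an illustration under stated assumptions and not the implementation used by any included publication: the stop-word list, the <monospace>NUM</monospace> placeholder, the suffix rules, and all function names are hypothetical. Part-of-speech tagging, chunking, and concept tagging require trained models or external resources and are therefore omitted.</p>
                <preformat>
```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "were", "was"}

# A number with an optional decimal part, e.g. "120" or "54.5".
NUMBER = r"\d+(?:\.\d+)?"


def split_sentences(text):
    """Sentence-level tokenisation: split where '.', '!' or '?' precedes whitespace."""
    parts = re.split(r"[.!?]\s+", text.strip())
    return [part.rstrip(".!?") for part in parts if part]


def tokenise(sentence):
    """Word-level tokenisation: numbers (with decimals) or runs of word characters."""
    return re.findall(NUMBER + r"|\w+", sentence)


def normalise(tokens):
    """Replace numbers with a placeholder and lower-case everything else."""
    return ["NUM" if re.fullmatch(NUMBER, t) else t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; real systems would use an established stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the steps in order and return one token list per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_stop_words(normalise(tokenise(sentence)))
        result.append([naive_stem(t) for t in tokens])
    return result


if __name__ == "__main__":
    example = "The patients were randomised to two groups. Mean age was 54.5 years."
    for tokens in preprocess(example):
        print(tokens)
```
                </preformat>
                <p>The order of the steps matters in this sketch: stop words are removed after lower-casing so that capitalised forms such as &#x2018;The&#x2019; are matched, and stemming is applied last so that it does not interfere with the stop-word lookup.</p>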
                <p>After the publication of the base-review, transformer models such as BERT became dominant in the literature (see 
                    <xref ref-type="fig" rid="f2">Figure 3</xref>). With their word-piece vocabulary, contextual embeddings, and self-supervised pre-training on large unlabelled corpora, these models have essentially removed the need for most pre-processing beyond automatically applied lower-casing.
                    <xref ref-type="bibr" rid="ref14">
                        <sup>14</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref31">
                        <sup>31</sup>
                    </xref> LLM-based methods that do not require pre-processing emerged during the 2024 update. We will therefore not update this table in this or any future iteration of this LSR; we retain it as a reference for publications that may still use these methods.</p>
                <p>

                    <bold>

                        <italic toggle="yes">3.4.2 Transparency of methods</italic>
</bold>
                </p>
                <p>3.4.2.1 Is there a description of the algorithms used?</p>
                <p>
                    <xref ref-type="fig" rid="f7">
Figure 7</xref> shows that 43 out of 53 publications in the base-review (81%) provided descriptions of their data extraction algorithm. In the case of machine learning and neural networks, we looked for a description of hyperparameters and feature generation, and for the details of implementation (e.g. the machine-learning framework). Hyperparameters were rarely described in full, but if the framework (e.g. Scikit-learn, Mallet, or Weka) was given, in addition to a description of implementation and important parameters for each classifier, then we rated the algorithm as fully described. For rule-based methods, we looked for a description of how rules were derived, and for a list of full or representative rules given as examples. Where multiple data extraction approaches were described, we gave a positive rating if the best-performing approach was described.</p>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>
Figure 7. </label>
                    <caption>
                        <title>Bar chart, showing the levels of algorithm description in the included publications.</title>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure7.gif"/>
                </fig>
                <p>3.4.2.2 Is there a description of the dataset used and of its characteristics?</p>
                <p>
Of the included publications in the review updates, 109 out of 117 (93%) provided descriptions of their dataset and its characteristics. The decrease from 97% during the last review update can be attributed to a shared task with an unclear description of the dataset, as well as publications adapting existing benchmark datasets and not providing updated information.</p>
                <p>Most publications provided descriptions of the dataset(s) used for training and evaluation. The size of each dataset, as well as the frequencies of classes within the data, were transparent and described for most included publications. All dataset citations, along with a short description and availability of the data, are shown in 
                    <xref ref-type="table" rid="T4">
Table 4</xref>.</p>
                <p>3.4.2.3 Is there a description of the hardware used?</p>
                <p>Most included publications in the base-review did not report their hardware specifications, though five publications (9%) did. One, for example, applied their system to new, unlabelled data and reported that classifying the whole of PubMed takes around 20 hours using a graphics processing unit (GPU).
                    <xref ref-type="bibr" rid="ref69">
                        <sup>69</sup>
                    </xref> In another example, the authors reported using Google Colab GPUs, along with estimates of computing time for different training settings.
                    <xref ref-type="bibr" rid="ref95">
                        <sup>95</sup>
                    </xref> In the 2024 update, one LLM publication described using the OpenAI Batch API to process 682,000 RCT abstracts from PubMed, at a cost of $3,390 and in less than 3 hours.
                    <xref ref-type="bibr" rid="ref164">
                        <sup>164</sup>
                    </xref>
                </p>
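                <p>For orientation, the reported Batch API figures can be converted into per-abstract terms. The short Python sketch below uses only the numbers stated above; because the duration was reported as an upper bound (less than 3 hours), the derived throughput is a lower bound. The variable names are our own, and the result works out to roughly half a US cent per abstract.</p>
                <preformat>
```python
# Figures as reported in the text; the duration is only an upper bound.
N_ABSTRACTS = 682_000
TOTAL_COST_USD = 3390
MAX_HOURS = 3

cost_per_abstract = TOTAL_COST_USD / N_ABSTRACTS
cost_per_thousand = 1000 * cost_per_abstract
min_abstracts_per_hour = N_ABSTRACTS / MAX_HOURS

print(f"Cost per abstract: ${cost_per_abstract:.4f}")
print(f"Cost per 1,000 abstracts: ${cost_per_thousand:.2f}")
print(f"Throughput: more than {min_abstracts_per_hour:,.0f} abstracts per hour")
```
                </preformat>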
                <p>3.4.2.4 Is the source code available?</p>
                <p>
                    <xref ref-type="fig" rid="f8">Figure 8</xref> shows that most of the included publications did not provide any source code. Currently, 49 (42%) of all included publications link to code, and two additional publications provide model weights or selected parts of the code. There was a very strong trend towards better code availability between the base-review and the first update (n = 19, or 83% of the new publications, provided code). In the current review update the open-source trend continued but weakened, because LLM-based methods such as zero-shot prompting do not require classic programming. We counted LLM-based publications as having provided code if they provided prompts and parameters; however, 7 out of 13 LLM publications that relied on zero-shot prompting did not provide sufficient information. Publications that did provide source code were exclusively published or last updated in the last seven years. GitHub is the most popular platform for making code accessible. Some publications also provided links to notebooks on Google Colab, a cloud-based platform for developing and executing code online. Two publications provided access to only parts of the code, or access was restricted. A full list of code repositories from the included publications is available in 
                    <xref ref-type="table" rid="T2">Table 2</xref>.</p>
                <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                    <label>
Figure 8. </label>
                    <caption>
                        <title>This chart shows the extent to which included publications provided access to their source code.</title>
                    </caption>
                    <graphic id="gr8" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure8.gif"/>
                </fig>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>Repositories containing source code for the included publications.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Publication</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Code</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">LSR</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref82">81</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ijmarshall/robotreviewer">https://github.com/ijmarshall/robotreviewer</ext-link>, older version: 
                                    <ext-link ext-link-type="uri" xlink:href="https://figshare.com/articles/Spa/997707">https://figshare.com/articles/Spa/997707</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref97">96</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jind11/LSTM-PICO-Detection">https://github.com/jind11/LSTM-PICO-Detection</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref56">55</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bepnye/EBM-NLP">https://github.com/bepnye/EBM-NLP</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://colab.research.google.com/drive/1Ir52OmkJ2C_Iy9V_eS-_KFVLircJ4MXp">https://colab.research.google.com/drive/1Ir52OmkJ2C_Iy9V_eS-_KFVLircJ4MXp</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://colab.research.google.com/drive/1YbbQojM147Ybt1nEcyoXTqlvefmwMg-q">https://colab.research.google.com/drive/1YbbQojM147Ybt1nEcyoXTqlvefmwMg-q</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref55">54</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jind11/Deep-PICO-Detection">https://github.com/jind11/Deep-PICO-Detection</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref98">97</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://ii.nlm.nih.gov/DataSets/index.shtml">https://ii.nlm.nih.gov/DataSets/index.shtml</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref86">85</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/Tian312/PICO_Parser">https://github.com/Tian312/PICO_Parser</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref96">95</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/L-ENA/HealthINF2020">https://github.com/L-ENA/HealthINF2020</ext-link>

                                    <break/>

                                    <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/lenaschmidt0493/qa-integrated-biomedical-ner-classifier-for-pico">https://www.kaggle.com/lenaschmidt0493/qa-integrated-biomedical-ner-classifier-for-pico</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref70">69</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ijmarshall/trialstreamer">https://github.com/ijmarshall/trialstreamer</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref48">47</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Unclear if Java code is accessible, pending user access: 
                                    <ext-link ext-link-type="uri" xlink:href="https://semrep.nlm.nih.gov/SemRep.v1.8_Installation.html#Download">https://semrep.nlm.nih.gov/SemRep.v1.8_Installation.html#Download</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref76">75</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Used public Google implementation of transformers + 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1303259#.X4wSoaySk2w">https://zenodo.org/record/1303259#.X4wSoaySk2w</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Base</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref61">60</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/smileslab/Brain_Aneurysm_Research/tree/master/BioMed_Summarizer">https://github.com/smileslab/Brain_Aneurysm_Research/tree/master/BioMed_Summarizer</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref75">74</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/nstylia/pico_entities/">https://github.com/nstylia/pico_entities/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref99">98</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/wds-seu/Aceso">https://github.com/wds-seu/Aceso</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref63">62</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jayded/evidence-inference">https://github.com/jayded/evidence-inference</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref62">61</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/allenai/ms2">https://github.com/allenai/ms2</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref100">99</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/Tian312/MD-Attention">https://github.com/Tian312/MD-Attention</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref39">38</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/kilicogluh/CONSORT-TM">https://github.com/kilicogluh/CONSORT-TM</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref36">35</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/lcampillos/Medical-NER">https://github.com/lcampillos/Medical-NER</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref37">36</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://gitlab.com/tomaye/ecai2020-transformer_based_am">https://gitlab.com/tomaye/ecai2020-transformer_based_am</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref51">50</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jetsunwhitton/RCT-ART">https://github.com/jetsunwhitton/RCT-ART</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref35">34</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/LivNLP/ODP-tagger">https://github.com/LivNLP/ODP-tagger</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref34">33</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://data.mendeley.com/datasets/ccfnn3jb2x/1">https://data.mendeley.com/datasets/ccfnn3jb2x/1</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref83">82</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://osf.io/2dqcg/">https://osf.io/2dqcg/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref52">51</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/6365890">https://zenodo.org/record/6365890</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref46">45</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/gauravsc/pico-tagging">https://github.com/gauravsc/pico-tagging</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref68">67</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/MichealAbaho/Label-Context-Aware-Attention-Model">https://github.com/MichealAbaho/Label-Context-Aware-Attention-Model</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref101">100</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/evidence-surveillance/sent2span">https://github.com/evidence-surveillance/sent2span</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref71">70</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/6647853#.ZBnpLXbP2Uk">https://zenodo.org/record/6647853#.ZBnpLXbP2Uk</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <sup>
                                        <xref ref-type="bibr" rid="ref38">37</xref>
                                    </sup>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/anjani-dhrangadhariya/distant-PICO">https://github.com/anjani-dhrangadhariya/distant-PICO</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref130">
                                        <sup>130</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/allenai/scibert/">https://github.com/allenai/scibert/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref134">
                                        <sup>134</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/anjani-dhrangadhariya/distant-studytype/tree/master">https://github.com/anjani-dhrangadhariya/distant-studytype/tree/master</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref138">
                                        <sup>138</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/Sreyan88/BioAug">https://github.com/Sreyan88/BioAug</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref147">
                                        <sup>147</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/UDICatNCHU/Scientific-Literature-Sentence-Classification-by-BERT-based-Reading-Comprehension">https://github.com/UDICatNCHU/Scientific-Literature-Sentence-Classification-by-BERT-based-Reading-Comprehension</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref148">
                                        <sup>148</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ScienceNLP-Lab/RCT-Transparency">https://github.com/ScienceNLP-Lab/RCT-Transparency</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref156">
                                        <sup>156</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Partly, regular-expression patterns in Python code available for some elements: 
                                    <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2024.1454945/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2024.1454945/full#supplementary-material</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref169">
                                        <sup>169</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Partly, model weights available but no code: 
                                    <ext-link ext-link-type="uri" xlink:href="https://aka.ms/huggingface">https://aka.ms/huggingface</ext-link> PubMedBERT: 
                                    <ext-link ext-link-type="uri" xlink:href="https://aka.ms/pubmedbert">https://aka.ms/pubmedbert</ext-link> PubMedBERT-LARGE: 
                                    <ext-link ext-link-type="uri" xlink:href="https://aka.ms/pubmedbert-large">https://aka.ms/pubmedbert-large</ext-link> PubMedELECTRA: 
                                    <ext-link ext-link-type="uri" xlink:href="https://aka.ms/pubmedelectra">https://aka.ms/pubmedelectra</ext-link> PubMedELECTRA-LARGE: 
                                    <ext-link ext-link-type="uri" xlink:href="https://aka.ms/pubmedelectra-large">https://aka.ms/pubmedelectra-large</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref136">
                                        <sup>136</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/CSU-NLP-Group/Sequential-Sentence-Classification">https://github.com/CSU-NLP-Group/Sequential-Sentence-Classification</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref138">
                                        <sup>138</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/shrimonmuke0202/AlpaPICO.git">https://github.com/shrimonmuke0202/AlpaPICO.git</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref143">
                                        <sup>143</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/kellyhoang0610/RCTMethodologyIE">https://github.com/kellyhoang0610/RCTMethodologyIE</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref145">
                                        <sup>145</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO/tree/main">https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO/tree/main</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref150">
                                        <sup>150</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/WengLab-InformaticsResearch/EvidenceMap_Model">https://github.com/WengLab-InformaticsResearch/EvidenceMap_Model</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref154">
                                        <sup>154</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/applebyboy/SEEtrials">https://github.com/applebyboy/SEEtrials</ext-link> (no prompts given)
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref161">
                                        <sup>161</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/TakedaGME/MedTrialExtractor/">https://github.com/TakedaGME/MedTrialExtractor/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref176">
                                        <sup>176</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/records/10419786">https://zenodo.org/records/10419786</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref135">
                                        <sup>135</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/anjani-dhrangadhariya/distant-cto">https://github.com/anjani-dhrangadhariya/distant-cto</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref179">
                                        <sup>179</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/WengLab-InformaticsResearch/PICOX">https://github.com/WengLab-InformaticsResearch/PICOX</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref149">
                                        <sup>149</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/lilywchen/FactPICO">https://github.com/lilywchen/FactPICO</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref177">
                                        <sup>177</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/hyesunyun/llm-meta-analysis">https://github.com/hyesunyun/llm-meta-analysis</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref165">
                                        <sup>165</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/L-ENA/ES-hackathon-GPT-evaluation">https://github.com/L-ENA/ES-hackathon-GPT-evaluation</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref137">
                                        <sup>137</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/smileslab/EBM_Automated_KG/tree/main">https://github.com/smileslab/EBM_Automated_KG/tree/main</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref138">
                                        <sup>138</sup>
                                    </xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Available under: 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/shrimonmuke0202/EBM-PICO">https://github.com/shrimonmuke0202/EBM-PICO</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Update 2</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>
                    <bold>
                        <italic toggle="yes">3.4.3 Testing</italic>
                    </bold>
                </p>
                <p>3.4.3.1 Is there a justification/an explanation of the model assessment?</p>
                <p>Of the included publications in the base-review, 47 out of 53 (89%) gave a detailed assessment of their data extraction algorithms. We rated this item as negative if only the performance scores were given, i.e., if no error analysis was performed and no explanations or examples were given to illustrate model performance. In most publications a brief error analysis was common, for example discussions on representative examples for false negatives and false positives,
                    <sup>
                        <xref ref-type="bibr" rid="ref48">47</xref>
                    </sup> major error sources
                    <sup>
                        <xref ref-type="bibr" rid="ref91">90</xref>
                    </sup> or highlighting errors with respect to every entity class.
                    <sup>
                        <xref ref-type="bibr" rid="ref77">76</xref>
                    </sup> Both Refs.
                    <xref ref-type="bibr" rid="ref53">52</xref>, 
                    <xref ref-type="bibr" rid="ref54">53</xref> used structured and unstructured abstracts, and therefore discussed the implications of unstructured text data for classification scores.</p>
                <p>A small number of publications performed a real-life assessment, in which the data extraction algorithm was applied to different, unlabelled, and often much larger datasets, tested while conducting actual systematic reviews, or evaluated in other practical scenarios.
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref58">
                        <sup>58</sup>
                    </xref> 
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref63">
                        <sup>63</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref69">
                        <sup>69</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref48">
                        <sup>48</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref95">
                        <sup>95</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref101">
                        <sup>101</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref102">
                        <sup>102</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref150">
                        <sup>150</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref178">
                        <sup>178</sup>
                    </xref>
                </p>
                <p>3.4.3.2 Are basic metrics reported (true/false positives and negatives)?</p>
                <p>
                    <xref ref-type="fig" rid="f9">Figure 9</xref> shows the extent to which raw basic metrics, such as true positives, were reported in the publications included in the LSR update. Most publications (n = 99) did not report these basic metrics, and there is a trend from the base-review to this update towards no longer reporting them. However, basic metrics could still be obtained in many cases, because many newly included publications made source code available and used publicly available datasets. For entity-level data extraction it can be challenging to define the number of true negative entities. This is especially true if entities are labelled and extracted as text chunks, because many combinations of phrases and tokens can constitute an entity.
                    <xref ref-type="bibr" rid="ref47">
                        <sup>47</sup>
                    </xref> More recent publications addressed this problem by conducting a token-based evaluation, which computes scores over every individual token and can therefore give credit for partial matches of multi-word entities.
                    <xref ref-type="bibr" rid="ref55">
                        <sup>55</sup>
                    </xref>
                </p>
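                <p>The difference between entity-level and token-based evaluation described above can be illustrated with a minimal sketch. The example sentence, the label scheme (P, I, and O for tokens outside any entity), and all function names are hypothetical and chosen for illustration only; they are not taken from any included publication.</p>
                <code language="python">
```python
# Minimal sketch: token-based versus exact-match entity-level scoring.
# All data, labels and function names are hypothetical (illustration only).

def token_counts(gold, pred, label):
    """Count token-level true positives, false positives and false negatives."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the basic counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def spans(labels, label):
    """Return (start, end) spans of consecutive tokens that carry `label`."""
    found, start = set(), None
    for i, current in enumerate(list(labels) + [None]):
        if current == label and start is None:
            start = i
        elif current != label and start is not None:
            found.add((start, i))
            start = None
    return found


tokens = ["adults", "with", "type", "2", "diabetes", "received", "metformin"]
gold = ["P", "P", "P", "P", "P", "O", "I"]
pred = ["O", "O", "P", "P", "P", "O", "I"]  # the model misses "adults with"

# Token-based: every token is scored, so the partial match earns credit.
token_scores = precision_recall_f1(*token_counts(gold, pred, "P"))

# Entity-level: a predicted span only counts if it matches a gold span exactly.
gold_spans, pred_spans = spans(gold, "P"), spans(pred, "P")
matched = len(gold_spans.intersection(pred_spans))
entity_scores = precision_recall_f1(
    matched, len(pred_spans) - matched, len(gold_spans) - matched
)

print("token-based :", [round(score, 2) for score in token_scores])
print("entity-level:", [round(score, 2) for score in entity_scores])
```
                </code>
                <p>In this toy example the partially matched population entity receives token-based scores above zero, whereas exact-match entity-level scoring counts it as one false positive and one false negative. Neither calculation requires true negatives.</p>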
                <fig fig-type="figure" id="f9" orientation="portrait" position="float">
                    <label>
Figure 9. </label>
                    <caption>
                        <title>Reporting of basic metrics (true positive, false positive, true negative, and false negative).</title>
                        <p>Counts are shown for each included paper. More than one selection is possible, so the sum of counts in this figure exceeds the total number of included publications (n = 117).</p>
                    </caption>
                    <graphic id="gr9" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure9.gif"/>
                </fig>
                <p>3.4.3.3 Does the assessment include any information about trade-offs between recall and precision (also known as sensitivity and positive predictive value)?</p>
                <p>Of the included publications in the base-review, 17 out of 53 (32%) described trade-offs or provided plots or tables showing how evaluation scores changed when certain parameters were altered or relaxed. Recall (i.e., sensitivity) is often described as the most important metric for systematic review automation tasks, because it is a methodological requirement that systematic reviews do not exclude any eligible data.</p>
                <p>References 
                    <xref ref-type="bibr" rid="ref57">56</xref> and 
                    <xref ref-type="bibr" rid="ref77">76</xref> show how the decision to extract the top two or top N predictions affects evaluation scores such as precision and recall. Reference 
                    <xref ref-type="bibr" rid="ref103">102</xref> shows precision-recall plots for different classification thresholds. Reference 
                    <xref ref-type="bibr" rid="ref73">72</xref> shows four cut-offs, whereas Ref. 
                    <xref ref-type="bibr" rid="ref96">95</xref> shows different probability thresholds for its classifier and describes their impact on precision, recall, and F1 curves.</p>
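                <p>The effect of a classification threshold on precision and recall, as discussed above, can be sketched as follows. The probabilities, gold labels, and thresholds are invented for illustration and do not come from any included publication; the function name is likewise hypothetical.</p>
                <code language="python">
```python
# Minimal sketch of a precision/recall trade-off across probability thresholds.
# The scores, labels and thresholds below are invented (illustration only).

def scores_at_threshold(probabilities, gold, threshold):
    """Precision, recall and F1 when probabilities >= threshold count as positive."""
    predicted = [p >= threshold for p in probabilities]
    pairs = list(zip(gold, predicted))
    tp = sum(1 for y, hit in pairs if y and hit)
    fp = sum(1 for y, hit in pairs if not y and hit)
    fn = sum(1 for y, hit in pairs if y and not hit)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical classifier outputs for six sentences; True marks a relevant one.
probabilities = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
gold = [True, True, False, True, False, False]

results = {t: scores_at_threshold(probabilities, gold, t) for t in (0.7, 0.5, 0.2)}

for threshold, (precision, recall, f1) in results.items():
    print(f"threshold={threshold:.1f} precision={precision:.2f} "
          f"recall={recall:.2f} f1={f1:.2f}")
```
                </code>
                <p>In this toy example, lowering the threshold from 0.7 to 0.2 raises recall from 0.67 to 1.00 while precision falls from 1.00 to 0.60, which is the kind of trade-off a high-recall systematic review setting has to weigh.</p>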
                <p>Some machine-learning architectures need to convert text into features before performing classification. A feature can be, for example, the number of times that a certain word occurs, or the length of an abstract. The number of features used, e.g. for CRF algorithms, was given in multiple publications,
                    <sup>
                        <xref ref-type="bibr" rid="ref93">92</xref>
                    </sup> together with a discussion of classifiers that should be used if high recall is needed.
                    <sup>
                        <xref ref-type="bibr" rid="ref43">42</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref104">103</xref>
                    </sup> ROC curves were also shown to quantify the impact of the amount of training data on the scores.</p>
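                <p>As a minimal sketch of the two example features mentioned above (word counts and abstract length), the following code converts a text into a small feature dictionary. The function name, vocabulary, and example abstract are hypothetical, and whitespace tokenisation is a simplifying assumption; the included publications used a variety of other feature sets.</p>
                <code language="python">
```python
# Minimal sketch: turning text into simple features before classification.
# Function name, vocabulary and example text are hypothetical (illustration only).
from collections import Counter


def extract_features(abstract, vocabulary):
    """Return word-count features for `vocabulary` plus the abstract length."""
    tokens = abstract.lower().split()  # naive whitespace tokenisation (assumption)
    counts = Counter(tokens)
    features = {f"count({word})": counts[word] for word in vocabulary}
    features["abstract_length"] = len(tokens)  # length measured in tokens
    return features


abstract = ("Patients were randomized to metformin or placebo and patients "
            "in both arms were followed for 12 months")
features = extract_features(abstract, ["patients", "randomized", "placebo"])
print(features)
```
                </code>
                <p>Each dictionary produced this way corresponds to one row of the feature matrix that a feature-based classifier would receive as input.</p>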
                <p>

                    <bold>

                        <italic toggle="yes">3.4.4 Availability of the final model or tool</italic>
</bold>
                </p>
                <p>3.4.4.1 Can we obtain a runnable version of the software based on the information in the publication?</p>
                <p>Compiling and testing code from every publication is outside the scope of this review. Instead, in 
                    <xref ref-type="fig" rid="f10">Figure 10</xref> and 
                    <xref ref-type="table" rid="T3">Table 3</xref> we recorded the publications for which a (web) interface or finished application was available. Counting RobotReviewer and Trialstreamer as separate projects, 11 (9%) of the included publications had an associated application, but only seven are usable as web apps. Applications were available as open-source software, as completely free tools, or as free basic versions with optional features that can be purchased or subscribed to.</p>
                <fig fig-type="figure" id="f10" orientation="portrait" position="float">
                    <label>
Figure 10. </label>
                    <caption>
                        <title>Publications that provide applications with user interface.</title>
                    </caption>
                    <graphic id="gr10" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure10.gif"/>
                </fig>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>
Table 3. </label>
                    <caption>
                        <title>Publications that provide user interfaces to their final data extraction system.</title>
                        <p>Some tools are predominantly useful to support search by providing a dataset with pre-extracted data. Others allow users to analyse and mine their own data.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Paper</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Access</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Note</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref43">42</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://ihealth.uemc.es/">https://ihealth.uemc.es/</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref44">43</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.tripdatabase.com/#pico">https://www.tripdatabase.com/#pico</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">For search</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref45">44</xref>,
                                    <xref ref-type="bibr" rid="ref45">81</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.robotreviewer.net/">https://www.robotreviewer.net/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Analysis of own data</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref47">46</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://exact.cluster.gctools.nrc.ca/ExactDemo/">https://exact.cluster.gctools.nrc.ca/ExactDemo/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Analysis of own data</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref48">47</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://semrep.nlm.nih.gov/SemRep.v1.8_Installation.html">https://semrep.nlm.nih.gov/SemRep.v1.8_Installation.html</ext-link>, SemMed is a web-based application published after this publication was released: 
                                    <ext-link ext-link-type="uri" xlink:href="https://skr3.nlm.nih.gov/SemMed/semmed.html">https://skr3.nlm.nih.gov/SemMed/semmed.html</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref70">69</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Database with all extracted data is available online: 
                                    <ext-link ext-link-type="uri" xlink:href="https://trialstreamer.robotreviewer.net/">https://trialstreamer.robotreviewer.net/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">For search</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref59">58</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Pending: article mentions that an app is being implemented.</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref37">36</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="http://ns.inria.fr/acta/">http://ns.inria.fr/acta/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Search and analysis of own data</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref83">82</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">App code for own deployment available here: 
                                    <ext-link ext-link-type="uri" xlink:href="https://osf.io/2dqcg/">https://osf.io/2dqcg/</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref171">171</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="http://ico-relations.ebm-nlp.com/">http://ico-relations.ebm-nlp.com/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">For search</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref174">174</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.scantrials.com/">https://www.scantrials.com/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">For search</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>3.4.4.2 Persistence: Can data be retrieved based on the information given in the publication?</p>
                <p>We observed an increasing trend of dataset availability and of publications re-using benchmark corpora. In the base-review, only seven of the included publications (13%) made their datasets publicly available, out of the 36 unique corpora found at that time. At the previous update we found 55 publications with unique corpora, 23 of which were available online. In the previous version of this LSR, 40 publications reported using one or more of these datasets.</p>
                <p>For the 2024 update we again observed wide adoption of benchmark datasets, but also the use of adapted and re-labelled versions of the benchmarks. In total, we found 76 publications mentioning unique datasets. Of these, 33 publications provide links to access the datasets. These datasets were then mentioned by 63 downstream papers included in this review. These numbers may seem high, but they can be explained by many publications employing more than one dataset for validation. 
                    <xref ref-type="table" rid="T4">
Table 4</xref> shows a summary of the corpora, their size, classes, links to the datasets, and cross-references to known publications re-using each dataset. For the base-review, we collected the corpora, provided a central link to all datasets, and planned to add datasets as they became available during the life span of this living review (see 
                    <italic toggle="yes">Underlying data</italic>
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref128">
                        <sup>128</sup>
                    </xref> below). Due to the increased number of available corpora we stopped downloading the data and provide links instead. When a dataset is made freely available without barriers (i.e., direct downloads of text and labels), any researcher can re-use the data and publish results from different models, which then become comparable to one another. Reference 
                    <xref ref-type="bibr" rid="ref75">75</xref> noted copyright issues surrounding data sharing and therefore shared the gold-standard annotations used as training or evaluation data, together with information on how to obtain the texts.</p>
                <p>3.4.4.3 Is the use of third-party frameworks reported and are they accessible?</p>
                <p>Of the included publications in the base-review, 47 out of 53 (89%) described using at least one third-party framework for their data extraction systems. The following list is likely to be incomplete, due to non-available code and incomplete reporting in the included publications. Most commonly, machine-learning toolkits were described (Mallet, N = 12; Weka, N = 6; TensorFlow, N = 5; scikit-learn, N = 3). Natural language processing toolkits such as Stanford parser/CoreNLP (N = 12) or NLTK (N = 3) were also commonly reported for the pre-processing and dependency parsing steps within publications. The MetaMap tool was used in nine publications, and the GENIA tagger in four. For the complete list of frameworks please see Appendices A and D in 
                    <italic toggle="yes">Underlying data.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">3.4.5 Internal and external validity of the model</italic>
</bold>
                </p>
                <p>3.4.5.1 Does the dataset or assessment measure provide a possibility to compare to other tools in the same domain?</p>
                <p>With this item we aimed to assess whether the evaluation results of a model are comparable with the results of other models. Ideally, a publication would have reported the results of another classification model on the same dataset, either by re-implementing the model themselves
                    <sup>
                        <xref ref-type="bibr" rid="ref97">96</xref>
                    </sup> or by describing results of other models when using benchmark datasets.
                    <sup>
                        <xref ref-type="bibr" rid="ref65">64</xref>
                    </sup> This was rarely the case for the publications in the base-review, as most datasets were curated and used in single publications only. However, the re-use of benchmark corpora increased with the publications in the LSR updates, where we found 63 publications that report results on one of the previously published benchmark datasets (see 
                    <xref ref-type="table" rid="T4">
Table 4</xref>).</p>
                <table-wrap id="T4" orientation="portrait" position="float">
                    <label>
Table 4. </label>
                    <caption>
                        <title>Corpora used in the included publications.</title>
                        <p>RCT, randomized controlled trials; IR, information retrieval; PICO, population, intervention, comparison, outcome; UMLS, unified medical language system.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Publication</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Also used by</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Name</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Description</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Classes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Size/type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Availability</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Note</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref97">96</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref39">39</xref>,
                                    <xref ref-type="bibr" rid="ref54">54</xref>,
                                    <xref ref-type="bibr" rid="ref87">87</xref>,
                                    <xref ref-type="bibr" rid="ref95">95</xref>,
                                    <xref ref-type="bibr" rid="ref98">98</xref>,
                                    <xref ref-type="bibr" rid="ref136">136</xref> Dataset adaptations: 
                                    <xref ref-type="bibr" rid="ref60">60</xref>, 
                                    <xref ref-type="bibr" rid="ref167">167</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PubMedPICO</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Automatically generated sentence labels from structured abstracts up to Aug&#x2019;17</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, Method</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">24,668 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jind11/PubMed-PICO-Detection">https://github.com/jind11/PubMed-PICO-Detection</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref56">55</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref33">32</xref>,
                                    <xref ref-type="bibr" rid="ref33">33</xref>,
                                    <xref ref-type="bibr" rid="ref36">36</xref>,
                                    <xref ref-type="bibr" rid="ref61">61</xref>,
                                    <xref ref-type="bibr" rid="ref74">74</xref>,
                                    <xref ref-type="bibr" rid="ref85">85</xref>,
                                    <xref ref-type="bibr" rid="ref95">95</xref>,
                                    <xref ref-type="bibr" rid="ref98">98</xref>,
                                    <xref ref-type="bibr" rid="ref100">100</xref>,
                                    <xref ref-type="bibr" rid="ref106">106</xref>,
                                    <xref ref-type="bibr" rid="ref130">130</xref>,
                                    <xref ref-type="bibr" rid="ref135">135</xref>,
                                    <xref ref-type="bibr" rid="ref138">138</xref>,
                                    <xref ref-type="bibr" rid="ref140">140</xref>,
                                    <xref ref-type="bibr" rid="ref157">157</xref>,
                                    <xref ref-type="bibr" rid="ref165">165</xref>,
                                    <xref ref-type="bibr" rid="ref178">178</xref>,
                                    <xref ref-type="bibr" rid="ref179">179</xref>, Via BLURB-Benchmark: 
                                    <xref ref-type="bibr" rid="ref132">132</xref>, 
                                    <xref ref-type="bibr" rid="ref169">169</xref> Dataset adaptations: 
                                    <xref ref-type="bibr" rid="ref34">34</xref>,
                                    <xref ref-type="bibr" rid="ref37">37</xref>,
                                    <xref ref-type="bibr" rid="ref50">50</xref>,
                                    <xref ref-type="bibr" rid="ref67">67</xref>,
                                    <xref ref-type="bibr" rid="ref134">134</xref>,
                                    <xref ref-type="bibr" rid="ref139">139</xref>,
                                    <xref ref-type="bibr" rid="ref145">145</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">EBMNLP, EBM-PICO</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O + age, gender, and more entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5,000 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bepnye/EBM-NLP">https://github.com/bepnye/EBM-NLP</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref98">97</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I and dosage-related</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">694 abstracts/full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://ii.nlm.nih.gov/DataSets/index.shtml">https://ii.nlm.nih.gov/DataSets/index.shtml</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain drug-based interventions</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref49">48</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, O, Design, Exposure</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">60 + 30 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="http://gnteam.cs.manchester.ac.uk/old/epidemiology/data.html">http://gnteam.cs.manchester.ac.uk/old/epidemiology/data.html</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain obesity</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref76">75</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentence level 90,000 distant supervision annotations, 1000 manual.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Target condition, index test and reference standard</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">90,000 + 1000 sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes (labels, not text), 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1303259">https://zenodo.org/record/1303259</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain diagnostic tests</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref53">52</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref65">64</xref> (includes classifiers from), 
                                    <xref ref-type="bibr" rid="ref40">40</xref>,
                                    <xref ref-type="bibr" rid="ref53">53</xref>,
                                    <xref ref-type="bibr" rid="ref54">54</xref>,
                                    <xref ref-type="bibr" rid="ref102">102</xref>,
                                    <xref ref-type="bibr" rid="ref107">107</xref>&#x2013;
                                    <xref ref-type="bibr" rid="ref110">110</xref>,
                                    <xref ref-type="bibr" rid="ref147">147</xref>,
                                    <xref ref-type="bibr" rid="ref153">153</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">NICTA-PIBOSO</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Structured and unstructured abstracts, multi-label on sentences.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1000 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://drive.google.com/file/d/1M9QCgrRjERZnD9LM2FeK-3jjvXJbjRTl/view?usp=sharing">https://drive.google.com/file/d/1M9QCgrRjERZnD9LM2FeK-3jjvXJbjRTl/view?usp=sharing</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Multi-label sentences</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref48">47</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Drug intervention and comparative statements for each arm</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">300 (500 in available data) sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://dataverse.harvard.edu/file.xhtml?fileId=4171005&amp;version=1.0">https://dataverse.harvard.edu/file.xhtml?fileId=4171005&amp;version=1.0</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain drug-based interventions</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref99">98</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">5099 sentences from references included in SRs, labelled using active learning</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/wds-seu/Aceso/tree/master/datasets">https://github.com/wds-seu/Aceso/tree/master/datasets</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain heart disease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref63">62</xref> based on 
                                    <xref ref-type="bibr" rid="ref111">111</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref33">32</xref>,
                                    <xref ref-type="bibr" rid="ref61">61</xref>,
                                    <xref ref-type="bibr" rid="ref99">99</xref>,
                                    <xref ref-type="bibr" rid="ref171">171</xref>. Extending/adapting dataset: 
                                    <xref ref-type="bibr" rid="ref177">177</xref>,
                                    <xref ref-type="bibr" rid="ref149">149</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Evidence-inference 2.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, I, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Full text: 12,616 prompts stemming from 3,346 articles; Abstract-only: 6,375 prompts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="http://evidence-inference.ebm-nlp.com/download/">http://evidence-inference.ebm-nlp.com/download/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Triplets for relation extraction</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref177">177</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities and document-level classifications</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC (per arm), O, N (per arm), Other</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">120 abstracts + results sections from existing corpus</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/hyesunyun/llm-meta-analysis/tree/main/evaluation/data">https://github.com/hyesunyun/llm-meta-analysis/tree/main/evaluation/data</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Extending Evidence Inference 2.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref149">149</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">LLM summaries for each entity</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC (per arm), O, Other</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">345 RCT summaries created by 3 LLMs from 115 abstracts in Evidence Inference 2.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://utexas.app.box.com/s/mpe5idxrqrzs1wcakphng7xfi7h4g83j">https://utexas.app.box.com/s/mpe5idxrqrzs1wcakphng7xfi7h4g83j</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Extending Evidence Inference 2.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref62">61</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">MS^2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences, Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">470 studies from 20k reviews, entity labels initially assigned via model trained on EBM-NLP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/allenai/ms2">https://github.com/allenai/ms2</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Relation extraction with direction of effect labels</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref36">35</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, diagnostic test</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">500 abstracts and 700 trial records</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="http://www.lllf.uam.es/ESP/nlpmedterm_en.html">http://www.lllf.uam.es/ESP/nlpmedterm_en.html</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Spanish dataset, UMLS normalisations</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref37">36</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">AbstRCT Argument Mining Dataset</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">660 RCT abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://gitlab.com/tomaye/abstrct">https://gitlab.com/tomaye/abstrct</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Relation extraction, domains neoplasm, glaucoma, hepatitis, diabetes, hypertension</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref113">112</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref51">50</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99 RCT abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jetsunwhitton/RCT-ART">https://github.com/jetsunwhitton/RCT-ART</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Excluded for containing only glaucoma studies</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref35">34</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref68">67</xref>,
                                    <xref ref-type="bibr" rid="ref138">138</xref>,
                                    <xref ref-type="bibr" rid="ref139">139</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">EBM-Comet</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">300 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/LivNLP/ODP-tagger">https://github.com/LivNLP/ODP-tagger</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Own data + adaptation of EBM-NLP with normalisation to 38 domains and 5 outcome areas</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref34">33</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1807 abstracts, labelled automatically by matching intervention strings from clinical trial registration</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://data.mendeley.com/datasets/ccfnn3jb2x/1">https://data.mendeley.com/datasets/ccfnn3jb2x/1</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref61">60</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref137">137</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">42000 sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/smileslab/Brain_Aneurysm_Research/tree/master/BioMed_Summarizer">https://github.com/smileslab/Brain_Aneurysm_Research/tree/master/BioMed_Summarizer</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Own data on brain aneurysm + existing dataset from Jin and Szolovits 
                                    <xref ref-type="bibr" rid="ref96">96</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref75">74</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences, Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">130 abstracts from MEDLINE's PubMed Online PICO interface</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/nstylia/pico_entities/">https://github.com/nstylia/pico_entities/</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref100">99</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref150">150</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I, C, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">10 RCT abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8135980/bin/ocab077_supplementary_data.pdf">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8135980/bin/ocab077_supplementary_data.pdf</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Relation extraction, domain COVID-19</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref39">38</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref143">143</xref>,
                                    <xref ref-type="bibr" rid="ref147">147</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CONSORT-TM</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, N + CONSORT items</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">50 full-text RCTs</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/kilicogluh/CONSORT-TM">https://github.com/kilicogluh/CONSORT-TM</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref83">82</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities, Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I, C, O + animal entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">400 RCT abstracts in first corpus, 10k abstracts in additional corpus from mined data</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://osf.io/2dqcg/">https://osf.io/2dqcg/</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain animal RCTs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref52">51</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref175">175</xref>,
                                    <xref ref-type="bibr" rid="ref176">176</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, I, C, O, other</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">211 RCT abstracts and 20 full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/6365890">https://zenodo.org/record/6365890</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref71">70</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">200 RCT full texts from PMC, annotated N from baseline tables</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/6647853#.ZCa9dXbMJPY">https://zenodo.org/record/6647853#.ZCa9dXbMJPY</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref64">63</xref> based on 
                                    <xref ref-type="bibr" rid="ref111">111</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref171">171</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">I, C, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">First corpus 160 abstracts, second corpus 20</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bepnye/evidence_extraction/blob/master/data/exhaustive_ico_fixed.csv">https://github.com/bepnye/evidence_extraction/blob/master/data/exhaustive_ico_fixed.csv</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Second corpus is domain cancer</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref174">174</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N (per arm), N (total), Other N</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Abstracts: 847 RCTs (train) + 150 RCTs (test)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/windisch-paul/sample_size_extraction/tree/main">https://github.com/windisch-paul/sample_size_extraction/tree/main</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref150">150</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">O, IC (per arm), P</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">80 COVID-19 RCT abstracts + 229 general RCT abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/WengLab-InformaticsResearch/EvidenceMap_Model">https://github.com/WengLab-InformaticsResearch/EvidenceMap_Model</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref145">145</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref179">179</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities, Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, O, IC (per arm), Sections (Aim; Method etc.)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities: 150 COVID-19 RCT abstracts + 150 Alzheimer's disease (AD) RCT abstracts. Sentences: 200 each for COVID-19 and AD</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO/tree/main">https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO/tree/main</ext-link>
</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref143">143</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities, Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Withdrawals or exclusions, Randomisation, Setting, Blinding, N (per arm), N (total), Design, Other</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">45 PMC full texts (title, abstract, methods and results sections)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/kellyhoang0610/RCTMethodologyIE">https://github.com/kellyhoang0610/RCTMethodologyIE</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Possible overlap with CONSORT-TM, earlier version</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref165">165</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, N (total), Country, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">30 abstracts from RCT, animal studies, social science studies</td>
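                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, appendix of paper and 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/L-ENA/ES-hackathon-GPT-evaluation">https://github.com/L-ENA/ES-hackathon-GPT-evaluation</ext-link>
</td>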
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref152">152</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC (dose; duration and others), P (Condition or disease), O, Design, N (total), N (per arm)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ReMedy database (cancer) and own curated leukemia dataset</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Partly: leukemia data not available; ReMedy data: 
                                    <ext-link ext-link-type="uri" xlink:href="https://remedycancer.app.emory.edu/multi-search?">https://remedycancer.app.emory.edu/multi-search?</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cancer</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref156">156</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities, Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, N (total), Age, Randomisation, Blinding, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">10,266 Chinese RCT paragraphs</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/yizhen-buaa/Annotated-dataset-of-TCM-clinical-literature">https://github.com/yizhen-buaa/Annotated-dataset-of-TCM-clinical-literature</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Traditional Chinese Medicine</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref166">166</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC (per arm), O, O (primary or secondary outcome), N (total), Exposure, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100 abstracts of various study types + 1488 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain nutrition, cardiovascular</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref172">172</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC (per arm), P, O, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Full texts of 870 clinical studies included in 25 meta-analyses</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cancer</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref135">135</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">940k distantly supervised, 200 manual gold standard</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain physio/rehabilitation</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref163">163</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N (per arm), N (total), Randomisation, Other, IC (per arm)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4 NMA reviews with 29 RCT full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Prognostic studies</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref161">161</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC, IC (dose; duration and others), Age, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Full texts: cancer 16+70; Fabry 26+150 studies from reviews and PubMed. RCT, prognostic, observational</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cancer, Fabry disease</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref157">157</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N (per arm), N (total)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">300 COVID-19 RCT abstracts + 100 generic RCT abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref154">154</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, IC (Drug name), IC (dose; duration and others), Country, O, N (per arm), N (total), Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">245 multiple myeloma abstracts + 115 abstracts across four other cancers</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cancer</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref164">164</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">682,667 abstracts from PubMed, 350 labelled</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref137">137</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, O, IC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Covid dataset, size unclear</td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain Covid</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref162">162</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, Diagnostic tests, N (total), Design, Eligibility criteria, Funding org</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">400 RCT abstracts + 123 abstracts + included studies from 8 Cochrane reviews</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref182">182</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref131">131</xref>,
                                    <xref ref-type="bibr" rid="ref180">180</xref>,
                                    <xref ref-type="bibr" rid="ref181">181</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CHIP 2023 Task 5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, Design</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">4500 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Chinese</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref39">39</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences, Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">500 labelled abstracts for sentences and 100 for P, O entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref74">73</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1300 abstracts with 3100 outcome statements</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cancer</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref64">63</xref>,
                                    <xref ref-type="bibr" rid="ref11">111</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">EvidenceInference 1.0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top"/>
                                <td align="left" colspan="1" rowspan="1" valign="top"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Yes, but use EvidenceInference 2.0 
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/jayded/evidence-inference">https://github.com/jayded/evidence-inference</ext-link>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Evidence inference; papers not included because they do not report ICO results</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref46">45</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cochrane-provided dataset with 10137 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref62">61</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref114">113</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, N, sections</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3657 structured abstracts with sentence tags, 204 abstracts with N (total) entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref58">57</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Structured, auto-labelled RCT abstracts with sentence tags and 378 documents with entity-level IR query-retrieval tags</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">15,000 abstracts + 378 documents with IR tags</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref85">84</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref84">83</xref> (unclear)</td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC, O, N (total + per arm)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">263 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref77">76</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref54">53</xref>,
                                    <xref ref-type="bibr" rid="ref58">58</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">100 abstracts with P, Condition, IC, possibly on entity level. For O, 633 abstracts are annotated on sentence level.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, Condition, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">633 abstracts for O, 100 for other classes</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref78">77</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Age, Design, Setting (Country), IC, N, study dates and affiliated institutions</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">185 full texts (at least 93 labelled)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref80">79</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, Age, Gender, Design, Condition, Race</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">2000 sentences from abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref94">93</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">200 abstracts, 140 contain sentence and entity labels</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">200 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref115">114</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Auto-labelled structured abstracts, sentence level.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">14200+ abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref95">94</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, age, gender, race</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">50 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref116">115</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences (and entities?)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3000 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref43">42</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">N (total)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">648 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref91">90</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">330 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref67">66</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Indonesian text with sentence annotations</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, I, C, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">200 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref69">68</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences from 69 (heart) + 24 (random) RCTs included in Cochrane reviews</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Inclusion criteria</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">69 + 24 full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Domain cardiology</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref81">80</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, Age, Gender, P (Condition or disease)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">200 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref72">71</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">4,824 sentences from 18 UpToDate documents and 714 sentences from MEDLINE citations for P. For I: CLEF 2013 shared task, and 852 MEDLINE citations</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, P (Condition or disease)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">abstracts, full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">General topic and cardiology domain</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref42">41</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref103">102</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entity annotation as noun phrases</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">O, IC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">100 + 132 sentences from full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Diabetes and endocrinology journals as source</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref93">92</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref104">103</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Auto-labelled structured RCT abstract sentences. 
                                    <xref ref-type="bibr" rid="ref92">92</xref> has 19,854 sentences; assumed to be the same corpus, as authors and technique are the same.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">23,472 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref47">46</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">RCT abstracts and full texts: 132 + 50 articles</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">IC (per arm), IC (drug entities), O (time point), O (primary or secondary outcome), N (total), Eligibility criteria, Enrolment dates, Funding org, Grant number, Early stopping, Trial registration, Metadata</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">132 + 50 abstracts and full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref87">86</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, N (per arm + total)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">48 full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref50">49</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Studies from 5 systematic reviews on environmental health exposure, entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, O, Country, Exposure</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Studies from 5 systematic reviews</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Observational studies on environmental health exposure in humans</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref45">44</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Labelled via supervised distant supervision. Full texts (~12500 per class), 50 + 133 manually annotated for evaluation.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">12700+ full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref90">89</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentence labels, structured &amp; unstructured abstracts. Manually annotated: 344 IC, 341 O, and 144 P and more derived by automatic labelling.</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">344+ abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref89">88</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O, O as "Instruments" or "Study Variables"</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">20 full texts/abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref86">85</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities (Brat, IOB format)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">170 abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref60">59</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Entities assigned to UMLS concepts (probably Cochrane corpus, size unclear). '88 instances, annotated in total with 76, 87, and 139 [P, IC, O respectively]'</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC, O</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Unclear, at least 88 documents</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref44">43</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC (per arm), N (total)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1750 title or abstracts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref117">116</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Excluded paper, no data extraction system. Corpus of Patient, Population, Problem, Exposure, Intervention, Comparison, Outcome, Duration and Results sentences in abstracts.</td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Excluded from review, but describes relevant corpus</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref57">56</xref>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td align="left" colspan="1" rowspan="1" valign="top">Sentences and entities</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">P, IC (per arm), O, multiple more</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">88 full texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">No</td>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Additionally, in the base-review, data were well described in 40 publications (75%), which utilised commonly used entities and common assessment metrics such as precision, recall, and F1-scores. Comparability of results nevertheless remained limited, because those publications used different datasets, which can influence the difficulty of the data extraction task and lead to better results within, for example, structured or topic-specific datasets.</p>
                <p>3.4.5.2 Are explanations for the influence of both visible and hidden variables in the dataset given?</p>
                <p>This item relates only to publications using machine learning or neural networks. Rule-based classification systems (N = 8, 15% reporting rule-base as sole approach) are not applicable to this item, because the rules leading to decisions are intentionally chosen by the creators of the system and are therefore always visible.</p>
                <p>Ten publications in the base-review (19%) discussed hidden variables.
                    <sup>
                        <xref ref-type="bibr" rid="ref84">83</xref>
                    </sup> discussed that the identification of the treatment group entity yielded the best results. However, when neither the word &#x2018;group&#x2019; nor the word &#x2018;arm&#x2019; was present in the text, the system had problems identifying the entity. &#x2018;Trigger tokens&#x2019;
                    <sup>
                        <xref ref-type="bibr" rid="ref105">104</xref>
                    </sup> and the influence of common phrases were also described by Ref. 
                    <xref ref-type="bibr" rid="ref69">68</xref>; the latter showed that their system was able to yield some positive classifications in the absence of common phrases.
                    <sup>
                        <xref ref-type="bibr" rid="ref104">103</xref>
                    </sup> went a step further and provided a table with words that had the most impact on the prediction of each class.
                    <sup>
                        <xref ref-type="bibr" rid="ref58">57</xref>
                    </sup> described removing sentence headings in structured abstracts in order to avoid creating a system biased towards common terms, while Ref. 
                    <xref ref-type="bibr" rid="ref91">90</xref> discussed abbreviations and grammar as factors influencing the results. Length of input text
                    <sup>
                        <xref ref-type="bibr" rid="ref60">59</xref>
                    </sup> and position of a sentence within a paragraph or abstract were shown to influence results in several publications, e.g. with up to 10% lower classification scores for certain sentence combinations in unstructured abstracts.
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref67">66</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref103">102</xref>
                    </sup>
                </p>
                <p>3.4.5.3 Is the process of avoiding overfitting or underfitting described?</p>
                <p>&#x2018;Overfitted&#x2019; is a term used to describe a system that shows particularly good evaluation results on a specific dataset because it has learned to classify noise and other intrinsic variations in the data as part of its model.
                    <sup>
                        <xref ref-type="bibr" rid="ref106">105</xref>
                    </sup>
                </p>
                <p>Of the included publications in the base-review, 33 out of 53 (62%) reported that they used methods to avoid overfitting. Eight (15%) of all publications reported rule-based classification as their only approach and were therefore not susceptible to overfitting by machine learning.</p>
                <p>Furthermore, 28 publications reported cross-validation to avoid overfitting. These were mostly classifiers from the domain of machine learning, e.g. SVMs. Most commonly, 10 folds were used (N = 15), but depending on the size of evaluation corpora, 3, 5, 6 or 15 folds were also described. Two publications
                    <sup>
                        <xref ref-type="bibr" rid="ref56">55</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref86">85</xref>
                    </sup> cautioned that cross-validation with a high number of folds (e.g. 10) causes high variance in evaluation results when using small datasets such as NICTA-PIBOSO. One publication
                    <sup>
                        <xref ref-type="bibr" rid="ref105">104</xref>
                    </sup> stratified folds by class in order to avoid this variance in evaluation results, which is caused by a sparsity of positive instances within a fold.</p>
                <p>Publications in the neural and deep-learning domain described approaches such as early stopping, dropout, L2-regularisation, or weight decay.
                    <sup>
                        <xref ref-type="bibr" rid="ref60">59</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref97">96</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref107">106</xref>
                    </sup> Some publications did not specifically discuss overfitting in the text, but their open-source code indicated that the latter techniques were used.
                    <sup>
                        <xref ref-type="bibr" rid="ref56">55</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref76">75</xref>
                    </sup>
                </p>
                <p>3.4.5.4 Is the process of splitting training from validation data described?</p>
                <p>Random allocation to treatment groups is an important item when assessing bias in RCTs, because selective allocation can lead to baseline differences.
                    <sup>
                        <xref ref-type="bibr" rid="ref1">1</xref>
                    </sup> Similarly, the process of splitting a dataset randomly, or in a stratified manner, into training (or rule-crafting) and test data is important when constructing classifiers and intelligent systems.
                    <sup>
                        <xref ref-type="bibr" rid="ref118">117</xref>
                    </sup>
                </p>
                <p>All included publications in the base-review gave indications of how different train and evaluation datasets were obtained. Most commonly, there was one dataset and a reported splitting ratio, which indicated that splits were random. This information was provided in 36 publications (68%).</p>
                <p>For publications mentioning cross-validation (N = 28, 53%) we assumed that splits were random. The ratio of splitting (e.g. 80:20 for training and test data) was clear in the cross-validation cases and was described in the remainder of publications.</p>
                <p>It was also common for publications to use completely different datasets, or multiple iterations of splitting, training and testing (N = 13, 24%). For example, Ref. 
                    <xref ref-type="bibr" rid="ref57">56</xref> used cross-validation to train and evaluate their model, and then used an additional corpus after the cross-validation process. Similarly, Ref. 
                    <xref ref-type="bibr" rid="ref60">59</xref> used 60:40 train/test splits, but then created an additional corpus of 88 documents to further validate the model&#x2019;s performance on previously unseen data.</p>
                <p>Within publications from the 2024 update, specifically those with LLM-related methods employing zero- or few-shot classification, we observed a reduction in transparency with respect to reporting the use of separate datasets for prompt development and testing. Often it was not clearly described how many and which texts were used for prompt development, how they were selected, and whether predictions on them were included in the evaluation results. As mentioned previously, the availability of code and datasets was also lower within the cohort of papers that employed prompt-based extraction. For papers reporting training or some form of weight adjustment on LLMs, we observed reporting that adhered to good standards of practice.</p>
                <p>3.4.5.5 Is the model&#x2019;s adaptability to different formats and/or environments beyond training and testing data described?</p>
                <p>For this item we aimed to find out how many of the included publications in the base-review tested their data extraction algorithms on different datasets. A limitation often noted in the literature was that gold-standard annotators have varying styles and preferences, and that datasets were small and limited to a specific literature search. Evaluating a model on multiple independent datasets makes it possible to quantify how well data can be extracted across domains and how flexible a model is in real-life application with completely new datasets. Of the included publications in the base-review, 19 (36%) discussed how their model performed on datasets with characteristics that were different to those used for training and testing, and in the latest review update we found that uptake of publicly available datasets increased further. In some instances, however, this evaluation was qualitative, with the models applied to large unlabelled, real-life datasets.
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref58">
                        <sup>58</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref69">
                        <sup>69</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref48">
                        <sup>48</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref95">
                        <sup>95</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref101">
                        <sup>101</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref102">
                        <sup>102</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref164">
                        <sup>164</sup>
                    </xref>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">3.4.6 Other</italic>
</bold>
                </p>
                <p>3.4.6.1 Caveats</p>
                <p>Caveats were extracted as free text. Included publications (N = 64, 86%) reported a variety of caveats. After extraction we structured them into six different domains:
                    <list list-type="bullet">
                        <list-item>
                            <label>1.</label>
                            <p>
Label-quality and inter-annotator disagreements</p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>Variations in text</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>Domain adaptation and comparability</p>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>Computational or system architecture implications</p>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>Missing information in text or knowledge base</p>
                        </list-item>
                        <list-item>
                            <label>6.</label>
                            <p>Practical implications</p>
                        </list-item>
                    </list>
                </p>
                <p>These are further discussed in the &#x2018;Discussion&#x2019; section of this living review.</p>
                <p>3.4.6.2 Sources of funding and conflict of interest</p>
                <p>
                    <xref ref-type="fig" rid="f11">
Figure 11</xref> shows that most of the included publications in the base review did not declare any conflict of interest. This is true for most publications published before 2010, and about 50% of the literature published in more recent years. However, sources of funding were declared more commonly, with 69% of all publications including statements for this item. This reflects a trend of more complete reporting in more recent years.</p>
                <fig fig-type="figure" id="f11" orientation="portrait" position="float">
                    <label>
Figure 11. </label>
                    <caption>
                        <title>Declaration of funding sources and conflict of interest in the included studies.</title>
                    </caption>
                    <graphic id="gr11" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure11.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec4">
            <label>4.</label>
            <title>Discussion</title>
            <sec id="sec4.1">
                <label>4.1</label>
                <title>Summary of key findings</title>
                <p>

                    <bold>

                        <italic toggle="yes">4.1.1 System architectures</italic>
</bold>
                </p>
                <p>Systems described within the included publications are evolving over time. Non-machine-learning data extraction via rule-base and API is one of the earliest and most frequently used approaches. Various classical machine-learning classifiers such as na&#x00ef;ve Bayes and SVMs are very common in the literature published between 2005 and 2018. Up until 2020 there was a trend towards word embeddings and neural networks such as LSTMs. Between 2020 and 2022 we observed a trend towards transformers, especially the BERT, RoBERTa and ELECTRA architectures pre-trained on biomedical or scientific text. From 2023 onwards, the number of included publications rose sharply due to the adoption of LLMs. Zero-shot prompt-based methods created opportunities for anyone (with or without programming skills) to automate data extraction without the need to curate training data. These LLM extractions tend to be generative summaries of the data of interest, rather than exhaustive verbatim extractions of each entity of interest. Seventeen papers investigated LLMs, including 6 for fine-tuning, 13 for zero-shot and 2 for k-shot (references are cited in the results section).</p>
                <p>

                    <bold>

                        <italic toggle="yes">4.1.2 Evaluation</italic>
</bold>
                </p>
                <p>We found that precision, recall, and F1 were used as evaluation metrics in most publications, although sometimes these metrics were adapted or relaxed in order to account for partial or similar matches. Due to the generative nature of LLMs, the evaluation of zero- or k-shot prompting reported in the review update diverges from previous good practice. In LLM publications, evaluators gravitated towards assessing document-level accuracy of predictions, or towards scores such as ROUGE that were initially developed for translation and generative tasks and do not align with automated data extraction. Additionally, the generative nature of LLM output had a negative effect on evaluation dataset size, because evaluating LLM output requires human assessment (an issue that is not present with transformer, machine-learning, or rule-based systems, as these can be evaluated automatically using benchmark datasets).</p>
                <p>

                    <bold>

                        <italic toggle="yes">4.1.3 Scope</italic>
</bold>
                </p>
                <p>Most of the included publications focused on extracting data from titles and abstracts. The reasons for this include the availability of data and ease of access, as well as the high coverage of information and the availability of structured abstracts from which labelled training data can be derived automatically. A much smaller number of the included publications extracted data from full texts. Half of the 30 systems that extract data from full text were published within the last four years. In systematic review practice, manually extracting data from abstracts is quicker and easier than manually extracting data from full texts. Therefore, the potential time-saving and utility of full text data extraction is much higher, because more time can be saved and the automation more closely reflects the work done by systematic reviewers in practice. The data extraction literature on full text is growing but remains a minority, possibly due to a lack of public benchmarking corpora as authors are concerned about copyright. Extraction from abstracts may be of limited value to reviewers in practice because it carries the risk of missing information.</p>
                <p>

                    <bold>

                        <italic toggle="yes">4.1.4 Target texts</italic>
</bold>
                </p>
                <p>Reports of randomised controlled trials were the most common texts used for data extraction. Evidence concerning data extraction from other study types was less common and is discussed further in the following sections.</p>
            </sec>
            <sec id="sec4.2">
                <label>4.2</label>
                <title>Assessment of the quality of reporting</title>
                <p>We only assessed full quality of reporting in the base-review, and assessed selected items during the review updates. The quality of reporting in the included studies in the base-review was found to be improving over time. We assessed the included publications in the base-review based on a list of 17 items in the domains of reproducibility, transparency, description of testing, data availability, and internal and external validity.</p>
                <p>

                    <bold>Base-review:</bold> Reproducibility was high throughout, with information about sources of training and evaluation data reported in 94% of all publications and pre-processing described in 89%. In terms of transparency, 81% of the publications provided a clear description of their algorithm and 94% described the characteristics of their datasets, but only 9% mentioned hardware specifications or the feasibility of using their algorithm on large real-world datasets such as PubMed. Testing of the systems was generally described: 89% gave a detailed assessment of their algorithms. Trade-offs between precision and recall were discussed in 32%. A total of 88% of the publications described using at least one accessible third-party framework for their data extraction system. Internal and external validity of each model was assessed based on its comparability to other tools (75%), assessment of visible and hidden variables in the data (19%), avoiding overfitting (62%, not applicable to non-machine-learning systems), and descriptions of splitting training from validation data (100%).</p>
                <p>

                    <bold>Review updates</bold>: In terms of data availability, source code was often shared in the publications added in the LSR updates. In the base-review (which included publications up to 2021), only 15% of all included publications had made their code available. After the LSR updates, 42% (N=49) have their code available, and all links to code repositories are shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref>.</p>
                <p>For testing, basic metrics were reported in only 15% (N=18) of the included publications, which is a downward trend from 24% in the base-review. However, more complete reporting of source-code and public datasets still leads to increased transparency and comparability.</p>
                <p>Availability of the final models as end-user tools remains poor. Eleven (9%) of the included publications had an application with user-interface associated with it, but only 7 tools are deployed and directly usable via web-apps (see 
                    <xref ref-type="table" rid="T3">Table 3</xref> for links). It is noteworthy, however, that four out of seven available tools are searchable databases with pre-extracted entities that aim to add value to reference searching. Their content is often limited to references in PubMed. These tools are designed for search and not for the actual data extraction process within a literature review. They have two main drawbacks for data extraction practice. Firstly, the user is likely to obtain additional (non-PubMed) references that require data extraction on demand, and these tools do not support on-demand inference. Secondly, depending on the data type that is being extracted, data then need to be added to exportable hierarchical forms or study characteristics tables, which is currently not supported by these tools. It is unclear how many of the other tools described in the literature are used in practice, even if only internally within their authors&#x2019; research groups.</p>
                <p>There was a surprisingly strong trend towards sharing and re-using already published corpora in the LSR updates. In the base-review, labelled training and evaluation data were available from 13% of the publications. After the latest LSR update we identified 76 publications with unique corpora; 33 corpora were available online, and at least 63 other included publications mention using them. 
                    <xref ref-type="table" rid="T4">Table 4</xref> provides the sources of all corpora and publications using them, including adaptations or extensions to datasets. The most commonly used dataset for entity recognition is EBM-NLP, also referred to as EBM PICO. It is reported to be used by 28 downstream publications: 19 usages as-is, 7 making or using an adapted or extended version of it, and two publications using the Microsoft BLURB-Benchmark
                    <xref ref-type="bibr" rid="ref141">
                        <sup>141</sup>
                    </xref> [
                    <xref ref-type="fn" rid="fn2">2</xref>], which includes EBM-NLP. For sentence classification, NICTA-PIBOSO and PubMedPICO lead with 11 and 8 publications respectively re-using their dataset. For relation extraction, EvidenceInference 2.0 is used by at least four other publications, while two additional publications extended the dataset.</p>
                <p>We collected information on whether authors evaluated the adaptability of their algorithms by testing them on additional datasets with different characteristics, e.g. with references from a different disease domain or study type, or on a large unlabelled corpus. It is impossible (although very desirable) to quantify how well data extraction would work on real-world projects and whether it performs better or worse on domain-specific data. Testing on multiple corpora with different characteristics can help with estimating how much the performance could vary when adopted into practice. An example of this is Witte 2024
                    <xref ref-type="bibr" rid="ref176">
                        <sup>176</sup>
                    </xref> who report that their F1-score is higher on glaucoma studies than on type 2 diabetes studies (0.63 vs. 0.54); such differences between domains have also been shown by others.
                    <xref ref-type="bibr" rid="ref179">
                        <sup>179</sup>
                    </xref> There is a positive trend, with authors of included publications increasingly using multiple corpora for evaluation, which is also aided by the availability of multiple benchmarking corpora in entity, sentence, and relation classification (see Table 4). In the base-review, adaptability was reported for 19 publications (36%), while now it is reported by 55 (47%).</p>
                <p>Caveats and limitations noted in the included publications are discussed in the following section.</p>
            </sec>
            <sec id="sec4.3">
                <label>4.3</label>
                <title>Caveats and challenges for systematic review (semi)automation</title>
                <p>In the following section we discuss caveats and challenges highlighted by the authors of the included publications. We found a variety of topics discussed in these publications and summarised them under seven different domains. Because relation extraction is increasingly common, we now summarise any related challenges or caveats within the updated text at the end of each applicable domain, and we added a new section specifically focusing on LLMs.</p>
                <p>

                    <bold>

                        <italic toggle="yes">4.3.1 Label-quality and inter-annotator disagreements</italic>
</bold>
                </p>
                <p>The quality of labels in annotated datasets was identified as a problem by several authors. The length of the entity being annotated, for example O or P entities, often caused disagreements between annotators.
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref49">48</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref59">58</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref70">69</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref96">95</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref102">101</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref103">102</xref>
                    </sup> We created an example in 
                    <xref ref-type="fig" rid="f12">
Figure 12</xref>, which shows two potentially correct, but nevertheless different annotations on the same sentence.</p>
                <fig fig-type="figure" id="f12" orientation="portrait" position="float">
                    <label>
Figure 12. </label>
                    <caption>
                        <title>Example of inter-annotator disagreement.</title>
                        <p>P, population; I, intervention; C, comparison; O, outcome.</p>
                    </caption>
                    <graphic id="gr12" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/179210/f066a662-12e8-4aff-8733-c7f9bda9d794_figure12.gif"/>
                </fig>
                <p>Similar disagreements,
                    <xref ref-type="bibr" rid="ref65">
                        <sup>65</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref85">
                        <sup>85</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref104">
                        <sup>104</sup>
                    </xref> along with missed annotations,
                    <xref ref-type="bibr" rid="ref72">
                        <sup>72</sup>
                    </xref> are time-intensive to reconcile
                    <xref ref-type="bibr" rid="ref97">
                        <sup>97</sup>
                    </xref> and make the scores less reliable.
                    <xref ref-type="bibr" rid="ref95">
                        <sup>95</sup>
                    </xref> As examples of this, two publications observed that their system performed worse on classes with high disagreement
                    <xref ref-type="bibr" rid="ref75">
                        <sup>75</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref104">
                        <sup>104</sup>
                    </xref> and one discussed boundary errors caused by different annotation styles between corpora.
                    <xref ref-type="bibr" rid="ref135">
                        <sup>135</sup>
                    </xref> There are several possible explanations for worse performance in these cases. First, it may be harder for models to learn from labelled data that contain systematic differences. Second, a model may learn predictions based on one annotation style, so that artificial errors are produced when it is evaluated against differently labelled data. Third, the annotation task itself may be inherently harder in cases with high inter-annotator disagreement, in which case lower performance from the models might be explainable. An overview of the included publications discussing this, together with their inter-annotator disagreement scores, is given in 
                    <xref ref-type="table" rid="T5">
Table 5</xref>.</p>
                <table-wrap id="T5" orientation="portrait" position="float">
                    <label>
Table 5. </label>
                    <caption>
                        <title>Examples for reports of inter-annotator disagreements in the included publications.</title>
                        <p>Please see each included publication for further details on corpus quality.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Publication</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Score, or range between worst to best class</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref44">43</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Average accuracy between annotators</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 0.62 to 0.70</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref49">48</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Agreement rate</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">80%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref66">65</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.84 overall, down to 0.59 for worst class</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref105">104</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 0.41 to 0.71</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref76">75</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Inter-annotation recall</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 0.38 to 0.86</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref56">55</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa between experts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 0.5 to 0.59</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref56">55</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Macro-averaged worker vs. aggregation precision, recall, F1 (see publication for full scores)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 0.39 to 0.70</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref117">116</xref> (describes only PECODR corpus creation, excluded from review)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Initial agreement between annotators</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Range: 85-87%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref53">52</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Average and range of agreement</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">62%, Range: 41-71%</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref59">58</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Avg. sentences labelled by expert vs. student per abstract</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1.9 vs. 4.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref59">58</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa expert vs. student</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.42</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref62">61</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Agreement; Cohen&#x2019;s Kappa</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">86%; 0.76</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref39">38</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">MASI measure (Measuring Agreement on Set-Valued Items) for article/selection level; Krippendorff&#x2019;s alpha for class-level</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">MASI 0.6, range 0.5-0.89; Krippendorff&#x2019;s alpha 0.53 for I, 0.57 for O, ranging from 0.06-0.96 between all classes</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref36">35</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">F1 strict vs. relaxed, at beginning and end of annotation phase</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">85.6% vs. 93.9% at the end; relaxed score increasing from 86% at beginning of annotation phase to 93.9% at the end</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref37">36</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Fleiss&#x2019; Kappa on 47 abstracts for outcomes and on 30 for relation-extraction</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Outcomes 0.81; Relations 0.62-0.72</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref64">63</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">B3, MUC, Constrained Entity-Alignment F-Measure (CEAFe) scores</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">B3 0.40; MUC 0.46; and CEAFe 0.42</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref52">51</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Kappa for entities and F1 for complex entities with sub-classes or relations</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Kappa range 0.68-0.74; complex entities 0.81</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref38">37</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa of their EBM-NLP adaptation vs. original dataset</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Between 0.53 for P and 0.69 for O</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref171">171</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Fleiss&#x2019; Kappa for expert annotators, percentage of exact overlaps</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Fleiss&#x2019; Kappa 0.77, exact match 92.4% of the time</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref150">150</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mean inter-rater reliability F1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">For entities mean 0.86, range 0.72-0.92. For dependencies 0.69</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref145">145</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s Kappa before and after annotation guideline and scope redefined for re-annotating EBM-NLP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.3 before vs. 0.74 after</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref179">179</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Inter-rater reliability</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Combined 0.74, range 0.7-0.8</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref143">143</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Document-level Cohen&#x2019;s kappa range, span F1 range, span-level F1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Document level range 0.74-0.83, span-F1 0.92-0.95, span-level F1 0.9-0.94</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref149">149</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Randolph&#x2019;s kappa PICO range on 15 texts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.56 (P entity) &#x2013; 0.8 (I entity), EvidenceInference corpus 0.47</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref162">162</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s kappa, token-level F1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Kappa 0.81, F1 0.88</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="bottom">
                                    <xref ref-type="bibr" rid="ref156">156</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cohen&#x2019;s kappa</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">0.8</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref134">134</xref>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Pairwise F1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">78%</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
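                <p>Many of the scores in Table 5 are Cohen&#x2019;s Kappa values, which correct the raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that calculation and is not taken from any included publication; the function name and the token-level labels are hypothetical and serve only as an illustration.</p>
                <preformat preformat-type="code">
```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical token-level labels for one sentence (P, I, O, or N for none).
annotator_1 = ["P", "P", "P", "N", "I", "I", "N", "O", "O", "N"]
annotator_2 = ["P", "P", "N", "N", "I", "I", "I", "O", "N", "N"]
kappa = cohens_kappa(annotator_1, annotator_2)
```
</preformat>
                <p>For these hypothetical labels the observed agreement is 0.70 and the chance agreement is 0.26, giving a Kappa of about 0.59, although the annotators disagree only on entity boundaries. This illustrates how boundary disagreements of the kind shown in Figure 12 can lower agreement scores.</p>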
                <p>To mitigate these problems, careful training and guides for expert annotators are needed.
                    <sup>
                        <xref ref-type="bibr" rid="ref59">58</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref78">77</xref>
                    </sup> For example, information should be provided on whether multiple basic entities or one longer entity annotation are preferred.
                    <sup>
                        <xref ref-type="bibr" rid="ref86">85</xref>
                    </sup> Crowd-sourced annotations can contain noisy or incorrect information and have low inter-rater reliability. However, they can be aggregated to improve quality.
                    <sup>
                        <xref ref-type="bibr" rid="ref56">55</xref>
                    </sup> In recent publications, partial entity matches (i.e., token-wise evaluation) were generally favoured over exact matches of complete entities in downstream evaluation, which helps to mitigate the impact of this problem on final evaluation scores.
                    <sup>
                        <xref ref-type="bibr" rid="ref56">55</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref84">83</xref>
                    </sup>
                </p>
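                <p>The difference between exact and token-wise evaluation can be sketched as follows. This is an illustrative example under our own assumptions, not the evaluation code of any included publication: the function names are hypothetical, and spans are assumed to be given as (label, start token, exclusive end token).</p>
                <preformat preformat-type="code">
```python
def f1(true_positives, n_predicted, n_gold):
    """F1 from the true-positive count and the predicted and gold counts."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


def strict_f1(gold_spans, pred_spans):
    """Exact-match F1: a predicted span counts only if label and boundaries match."""
    gold, pred = set(gold_spans), set(pred_spans)
    return f1(len(gold.intersection(pred)), len(pred), len(gold))


def token_f1(gold_spans, pred_spans):
    """Token-wise F1: each correctly labelled token earns partial credit."""
    def tokens(spans):
        return {(label, i) for label, start, end in spans for i in range(start, end)}

    gold, pred = tokens(gold_spans), tokens(pred_spans)
    return f1(len(gold.intersection(pred)), len(pred), len(gold))


# Hypothetical example: the predicted P entity is shorter than the gold P entity.
gold = [("P", 0, 5), ("I", 8, 10)]
pred = [("P", 0, 3), ("I", 8, 10)]
strict_score = strict_f1(gold, pred)
token_score = token_f1(gold, pred)
```
</preformat>
                <p>In this hypothetical case the exact-match F1 is 0.50, because the shortened P entity counts as a complete miss, whereas the token-wise F1 is about 0.83, because the three overlapping P tokens still receive credit.</p>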
                <p>For automatically labelled or distantly supervised data, label quality is generally lower. This is primarily caused by incomplete annotation due to missing headings, or by ambiguity in sentence data, which is discussed as part of the next domain.
                    <sup>
                        <xref ref-type="bibr" rid="ref45">44</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref58">57</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref104">103</xref>
                    </sup>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">4.3.2 Ambiguity</italic>
</bold>
                </p>
                <p>The most common source of ambiguity in labels described in the included publications is associated with automatically labelled sentence-level data. Examples of this are sentences that could belong to multiple categories, e.g., those that should have both a &#x2018;P&#x2019; and an &#x2018;I&#x2019; label, or sentences that were assigned to the class &#x2018;other&#x2019; while containing PICO information (Refs. 
                    <xref ref-type="bibr" rid="ref55">54</xref>, 
                    <xref ref-type="bibr" rid="ref96">95</xref>, 
                    <xref ref-type="bibr" rid="ref97">96</xref>, among others). Ambiguity was also discussed with respect to intervention terms
                    <sup>
                        <xref ref-type="bibr" rid="ref77">76</xref>
                    </sup> or when distinguishing between &#x2018;control&#x2019; and &#x2018;intervention&#x2019; arms.
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup> When using or mapping to UMLS concepts, ambiguity was discussed in Refs. 
                    <xref ref-type="bibr" rid="ref42">41</xref>, 
                    <xref ref-type="bibr" rid="ref53">52</xref>, 
                    <xref ref-type="bibr" rid="ref73">72</xref>.</p>
                <p>At the text level, ambiguity around the meaning of specific wordings was discussed as a challenge, e.g., the word &#x2018;concentration&#x2019; can be a quantitative measure or a mental concept.
                    <sup>
                        <xref ref-type="bibr" rid="ref42">41</xref>
                    </sup> Numbers were also described as challenging due to ambiguity, because they can refer to the total number of participants, number per arm of a trial, or can just refer to an outcome-related number.
                    <sup>
                        <xref ref-type="bibr" rid="ref85">84</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref114">113</xref>
                    </sup> When classifying participants, the P entity or sentence is often overloaded because it includes too much information on different, smaller, entities within it, such as age, gender, or diagnosis.
                    <sup>
                        <xref ref-type="bibr" rid="ref90">89</xref>
                    </sup>
                </p>
                <p>Ambiguity in relation-extraction can include cases where interventions and comparators are classified separately in a trial with more than two arms, thus leading to an increased complexity in correctly grouping and extracting data for each separate comparison.</p>
                <p>

                    <bold>

                        <italic toggle="yes">4.3.3 Variations in text</italic>
</bold>
                </p>
                <p>Variations in natural language, wording, or grammar were identified as challenges in many references that looked more closely at the texts within their corpora. Such variation may arise when describing entities or sentences (e.g., Refs. 
                    <xref ref-type="bibr" rid="ref49">48</xref>, 
                    <xref ref-type="bibr" rid="ref80">79</xref>, 
                    <xref ref-type="bibr" rid="ref98">97</xref>) or may reflect idiosyncrasies specific to one data source, e.g., the position of entities in a specific journal.
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup> In particular, different styles or expressions were noted as caveats in rule-based systems.
                    <sup>
                        <xref ref-type="bibr" rid="ref43">42</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref49">48</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref81">80</xref>
                    </sup>
                </p>
                <p>There is considerable variation in how an entity is reported, for example between intervention types (drugs, therapies, routes of application)
                    <xref ref-type="bibr" rid="ref56">
                        <sup>56</sup>
                    </xref> or in outcome measures.
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref> In particular, variations in style between structured and unstructured abstracts
                    <xref ref-type="bibr" rid="ref65">
                        <sup>65</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref78">
                        <sup>78</sup>
                    </xref> and the description lengths and detail
                    <xref ref-type="bibr" rid="ref59">
                        <sup>59</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref79">
                        <sup>79</sup>
                    </xref> can cause inconsistent results in the data extraction, for example by not detecting information correctly or extracting unexpected information. Complex sentence structure was mentioned as a caveat especially for rule-based systems.
                    <xref ref-type="bibr" rid="ref80">
                        <sup>80</sup>
                    </xref> An example of a complex structure is when more than one entity is described (e.g., Refs. 
                    <xref ref-type="bibr" rid="ref93">93</xref>, 
                    <xref ref-type="bibr" rid="ref102">102</xref>) or when entities such as &#x2018;I&#x2019; and &#x2018;O&#x2019; are mentioned close to each other.
                    <xref ref-type="bibr" rid="ref57">
                        <sup>57</sup>
                    </xref> Finally, different names for the same entity within an abstract are a potential source of problems,
                    <xref ref-type="bibr" rid="ref84">
                        <sup>84</sup>
                    </xref> which for example makes the extraction of outcomes challenging.
                    <xref ref-type="bibr" rid="ref164">
                        <sup>164</sup>
                    </xref> When using non-English texts, such as Spanish articles, it was noted that mandatory translation of titles can lead to spelling mistakes and translation errors
                    <xref ref-type="bibr" rid="ref35">
                        <sup>35</sup>
                    </xref> and that it is unknown how current algorithms perform on non-English text.
                    <xref ref-type="bibr" rid="ref164">
                        <sup>164</sup>
                    </xref> For the 2024 update we identified four publications describing automation on Chinese texts: one working with a traditional Chinese medicine corpus
                    <xref ref-type="bibr" rid="ref156">
                        <sup>156</sup>
                    </xref> and three submissions to the CHIP 2023 Shared task 5: Medical Literature PICOS Identification
                    <xref ref-type="bibr" rid="ref182">
                        <sup>182</sup>
                    </xref> but it is unclear how well these corpora represent Chinese literature and how comparable the results are to English PICO extraction.
                    <xref ref-type="bibr" rid="ref156">
                        <sup>156</sup>
                    </xref>
                </p>
                <p>Another common variation in text was implied information. For example, rather than stating each dosage explicitly, a trial text might report dosages of &#x2018;10 or 20 mg&#x2019;, where the &#x2018;mg&#x2019; unit is implied for the number 10, making it a &#x2018;dosage&#x2019; entity.
                    <xref ref-type="bibr" rid="ref46">
                        <sup>46</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref48">
                        <sup>48</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref90">
                        <sup>90</sup>
                    </xref> Implied information also applies when extracting the number of participants at various stages of a trial, when numbers of participants per arm need to be added in order to infer the total N or when participants are lost to follow-up.
                    <xref ref-type="bibr" rid="ref157">
                        <sup>157</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref158">
                        <sup>158</sup>
                    </xref> This issue can cause the number of participants or the number of events to be inflated.
                    <xref ref-type="bibr" rid="ref174">
                        <sup>174</sup>
                    </xref> Hoang 2022 discuss that missing information led annotators to infer information, which resulted in less consistent annotations for their gold standard and may in turn negatively affect the models trained on such data.
                    <xref ref-type="bibr" rid="ref143">
                        <sup>143</sup>
                    </xref>
                </p>
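                <p>As a minimal sketch of how a rule-based system might resolve such an implied unit, the snippet below copies the unit from the second number to the first in phrases of the form &#x2018;10 or 20 mg&#x2019;. The pattern, the short unit list, and the function name <monospace>expand_implied_units</monospace> are illustrative assumptions and are not taken from any of the included publications.</p>
                <code language="python">
```python
import re
from typing import List, Tuple

# Hypothetical pattern: "NUMBER or NUMBER UNIT", with a short, assumed unit list.
DOSE_PAIR = re.compile(r"(\d+(?:\.\d+)?)\s+or\s+(\d+(?:\.\d+)?)\s*(mg|ml|g)\b")


def expand_implied_units(text: str) -> List[Tuple[str, str]]:
    """Return (number, unit) pairs, copying the stated unit to the first number."""
    pairs: List[Tuple[str, str]] = []
    for first, second, unit in DOSE_PAIR.findall(text):
        pairs.append((first, unit))   # unit was only implied for this number
        pairs.append((second, unit))  # unit was stated explicitly
    return pairs


doses = expand_implied_units("Patients received 10 or 20 mg of the drug daily.")
print(doses)
```
</code>
                <p>Such a rule covers only this one surface form, which is consistent with the observation above that variation in wording is a particular caveat for rule-based systems.</p>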
                <p>Implied information was also mentioned as a problem in the field of relation-extraction, with Nye et al. (2021)
                    <sup>
                        <xref ref-type="bibr" rid="ref64">63</xref>
                    </sup> discussing the importance of correctly matching and resolving intervention arm names that only imply which intervention was used. Examples are using &#x2018;Group 1&#x2019; instead of referring to the actual intervention name, or implying effects across a group of outcomes, such as all adverse events.
                    <sup>
                        <xref ref-type="bibr" rid="ref64">63</xref>
                    </sup>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">4.3.4 Domain adaptation and comparability</italic>
</bold>
                </p>
                <p>Because of the wide variation across medical domains, there is no guarantee that a data extraction system developed on one dataset automatically adapts to produce reliable results across different datasets relating to other domains. The hyperparameter configuration or rule-base used to build a system may not yield comparable results in a different medical domain.
                    <xref ref-type="bibr" rid="ref40">
                        <sup>40</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref68">
                        <sup>68</sup>
                    </xref> Therefore, scores might not be similar between different datasets, especially for rule-based classifiers,
                    <xref ref-type="bibr" rid="ref80">
                        <sup>80</sup>
                    </xref> when datasets are small,
                    <xref ref-type="bibr" rid="ref35">
                        <sup>35</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref49">
                        <sup>49</sup>
                    </xref> when the structure and distribution of the class of interest vary,
                    <xref ref-type="bibr" rid="ref40">
                        <sup>40</sup>
                    </xref> or when the annotation guidelines vary.
                    <xref ref-type="bibr" rid="ref85">
                        <sup>85</sup>
                    </xref> A model for outcome detection, for example, might learn to be biased towards outcomes frequently appearing in a certain domain, such as chemotherapy-related outcomes in cancer literature, or it might favour outcomes that are more frequent in older trial texts if the underlying training data are older or outdated.
                    <xref ref-type="bibr" rid="ref73">
                        <sup>73</sup>
                    </xref> A model trained on common RCT texts might fail to detect entities in crossover or factorial trials.
                    <xref ref-type="bibr" rid="ref150">
                        <sup>150</sup>
                    </xref> Another caveat mentioned by Refs. 
                    <xref ref-type="bibr" rid="ref59">59</xref>, 
                    <xref ref-type="bibr" rid="ref85">85</xref> is that the size of the label space must be considered when comparing scores, as models that normalise to specific concepts rather than detecting entities tend to have lower precision, recall, and F1 scores.</p>
                <p>Comparability between models might be further decreased by comparing results between publications that use relaxed vs. strict evaluation approaches for token-based evaluation,
                    <sup>
                        <xref ref-type="bibr" rid="ref35">34</xref>
                    </sup> or publications that use the same dataset but with different random seeds to split training and testing data.
                    <sup>
                        <xref ref-type="bibr" rid="ref34">33</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref119">118</xref>
                    </sup>
                </p>
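                <p>The effect of the matching rule can be illustrated with a minimal sketch that scores the same predicted spans under strict (exact boundaries) and relaxed (any token overlap) matching. The span representation as (start, end) token offsets, the function names, and the toy spans are assumptions made for illustration; individual publications may define relaxed matching differently.</p>
                <code language="python">
```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return min(a[1], b[1]) > max(a[0], b[0])


def precision_recall_f1(
    gold: List[Span], pred: List[Span], relaxed: bool
) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans under one matching rule."""
    match = overlaps if relaxed else (lambda g, p: g == p)
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (assumed): the second prediction is two tokens too long.
gold_spans = [(0, 3), (10, 12)]
pred_spans = [(0, 3), (10, 14)]

strict_scores = precision_recall_f1(gold_spans, pred_spans, relaxed=False)
relaxed_scores = precision_recall_f1(gold_spans, pred_spans, relaxed=True)
print("strict: ", strict_scores)
print("relaxed:", relaxed_scores)
```
</code>
                <p>In this toy case the identical prediction scores 0.5 under strict matching and 1.0 under relaxed matching, which is why scores from publications using different matching rules are not directly comparable.</p>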
                <p>Therefore, several publications discuss that a larger number of benchmarking datasets with standardised splits for train, development, and evaluation datasets and standardised evaluation scripts could increase the comparability between published systems.
                    <sup>
                        <xref ref-type="bibr" rid="ref47">46</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref93">92</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref115">114</xref>
                    </sup>
                </p>
                <p>

                    <bold>

                        <italic toggle="yes">4.3.5 Computational or system architecture implications</italic>
</bold>
                </p>
                <p>Computational cost and scalability were described in two publications.
                    <xref ref-type="bibr" rid="ref53">
                        <sup>53</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref114">
                        <sup>114</sup>
                    </xref> Problems within the system, e.g., encoding
                    <xref ref-type="bibr" rid="ref97">
                        <sup>97</sup>
                    </xref> or PDF extraction errors
                    <xref ref-type="bibr" rid="ref75">
                        <sup>75</sup>
                    </xref> lead to problems downstream and ultimately result in bias, favouring articles from big publishers with better formatted data.
                    <xref ref-type="bibr" rid="ref75">
                        <sup>75</sup>
                    </xref> Similarly, grammar and parsing errors such as part-of-speech and/or chunking errors (Refs. 
                    <xref ref-type="bibr" rid="ref76">76</xref>, 
                    <xref ref-type="bibr" rid="ref80">80</xref>, 
                    <xref ref-type="bibr" rid="ref90">90</xref>, among others) or faulty parse-trees
                    <xref ref-type="bibr" rid="ref78">
                        <sup>78</sup>
                    </xref> can reduce a system&#x2019;s performance if it relies on access to correct grammatical structure. In terms of system evaluation, 10-fold cross-validation causes high variance in results when using small datasets such as NICTA-PIBOSO,
                    <xref ref-type="bibr" rid="ref54">
                        <sup>54</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref85">
                        <sup>85</sup>
                    </xref>
                    <sup>,</sup>
                    <xref ref-type="bibr" rid="ref104">
                        <sup>104</sup>
                    </xref> and it was described that this problem needs to be addressed through stratification of the positive instances of each class within folds. LLMs such as GPT-4 are commonly accessed via third-party APIs because their size and computational power requirements exceed the capacity of most home or office computers. When applying them to large datasets, such as all PubMed RCTs, methods such as employing batch-APIs to reduce time and costs were reported.
                    <xref ref-type="bibr" rid="ref164">
                        <sup>164</sup>
                    </xref>
                </p>
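                <p>Stratification of positive instances within folds can be sketched as follows. This is a simplified standard-library illustration under assumed toy labels; the function name <monospace>stratified_folds</monospace> is hypothetical and the code is not the procedure used in any included publication.</p>
                <code language="python">
```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Assign item indices to k folds so that each class is spread evenly."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that overall fold sizes stay balanced as well.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


# Toy sentence labels (assumed): 10 rare positives among 100 sentences.
labels = ["P"] * 10 + ["other"] * 90
folds = stratified_folds(labels, k=10, seed=42)
positives_per_fold = [sum(labels[i] == "P" for i in fold) for fold in folds]
print(positives_per_fold)
```
</code>
                <p>With 10 positive sentences and 10 folds, every fold receives exactly one positive instance in this sketch; an unstratified random split gives no such guarantee, which contributes to the variance described above.</p>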
                <p>

                    <bold>

                        <italic toggle="yes">4.3.6 Missing information in text or knowledge base</italic>
</bold>
                </p>
                <p>Information in text can be incomplete.
                    <sup>
                        <xref ref-type="bibr" rid="ref115">114</xref>
                    </sup> For example, the number of patients in a study might not be explicitly reported,
                    <sup>
                        <xref ref-type="bibr" rid="ref77">76</xref>
                    </sup> or abstracts lacking information about study design and methods can appear, especially in unstructured abstracts and older trial texts.
                    <sup>
                        <xref ref-type="bibr" rid="ref92">91</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref97">96</xref>
                    </sup> In some cases, abstracts can be missing entirely. These problems can sometimes be solved by considering the use of full texts as input.
                    <sup>
                        <xref ref-type="bibr" rid="ref72">71</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref88">87</xref>
                    </sup>
                </p>
                <p>Where a model relies on features from tools such as MetaMap, missing UMLS coverage causes errors.
                    <sup>
                        <xref ref-type="bibr" rid="ref73">72</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref77">76</xref>
                    </sup> This also applies to models like CNNs that assign specific concepts, where unseen entities are not defined in the output label space.
                    <sup>
                        <xref ref-type="bibr" rid="ref60">59</xref>
                    </sup>
                </p>
                <p>In terms of automatic summarisation and relation extraction it was also cautioned that relying on abstracts will lead to a low sensitivity of retrieved information, as not all information of interest may be reported in sufficient detail to allow comprehensive summaries or statements about relationships between interventions and outcomes to be made.
                    <sup>
                        <xref ref-type="bibr" rid="ref61">60</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref64">63</xref>
                    </sup>
                </p>
                <sec id="sec4.3.7">
                    <label>4.3.7.</label>
                    <title>Caveats and considerations related to LLMs</title>
                    <p>4.3.7.1 Hallucinations</p>
                    <p>Missing information, implied information, numerical data, or complex descriptions in the input texts were reported as leading to hallucinations, where the generative model generates plausible-sounding but fictional content.
                        <xref ref-type="bibr" rid="ref149">
                            <sup>149</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref163">
                            <sup>163</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref173">
                            <sup>173</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref177">
                            <sup>177</sup>
                        </xref> When LLMs were fine-tuned on RCT data, they were reported to hallucinate information that would be expected in an RCT when presented with non-RCTs in a real-world application scenario.
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref> Even in zero-shot scenarios, LLMs made up participant numbers and guessed trial information.
                        <xref ref-type="bibr" rid="ref152">
                            <sup>152</sup>
                        </xref> Hallucinations are a major problem with LLM-based architectures; their generative nature presents challenges when applied to data extraction tasks because the information source within an analysed document cannot be located.
                        <xref ref-type="bibr" rid="ref132">
                            <sup>132</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref152">
                            <sup>152</sup>
                        </xref> Hallucinations are an important issue in the publications up to the 2024 cut-off for this review. According to the LLM provider OpenAI, its GPT-4.5 model, which was released in February 2025, shows &#x201c;reduced hallucinations and more reliability&#x201d;[
                        <xref ref-type="fn" rid="fn3">3</xref>]. Publications included in future updates of this review will show to what extent these newer models affect reliability concerns such as hallucination and reproducibility.</p>
                    <p>4.3.7.2 Fairness of direct comparisons with LLMs</p>
                    <p>In data extraction, automation methods based on discriminative models such as BERT traditionally identified exact matches in text or attempted to normalise information to standardised vocabularies, and predictions were evaluated in terms of precision, recall, and F1 score.
                        <xref ref-type="bibr" rid="ref132">
                            <sup>132</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref159">
                            <sup>159</sup>
                        </xref>
                    </p>
                    <p>Most LLMs operate on a fundamentally different level. They are generative and usually summarise and return a new piece of text that is then evaluated for overall accuracy. Making such a single overall &#x2018;document-level&#x2019; prediction can be considered a much easier task than the previously widely accepted token-based classification for named-entity recognition.</p>
                    <p>With token-based classification, every single relevant word in an abstract has a binary label and algorithms need to exhaustively identify each occurrence of the positive label. There usually exists an imbalance with more negative labels present (e.g., an abstract has a handful of words describing &#x2018;Participants&#x2019; but many more words describing other concepts). Class imbalance also exists in sentence prediction tasks at the abstract level, and it grows when full texts are considered as input. This characteristic led to the wide adoption of precision, recall, and F1 scores for meaningful evaluations, rather than accuracy or specificity. Accuracy and specificity take true negative token or sentence predictions into account and thus produce generally high and not meaningful scores. For the token-based classification, if an algorithm correctly identifies one mention of the &#x2018;Participant&#x2019; class but misses another in the same abstract, its recall is evaluated to be a poor 0.5. At the same time, LLM-based evaluations included in this review have assigned perfect (document-level) accuracy scores to abstracts where an LLM provided only one paraphrased version of a &#x2018;Participant&#x2019; description, while ignoring other mentions. While both evaluations are correct, they cannot be considered a fair comparison because they likely lead to a relative over-estimation of LLM performance.</p>
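                    <p>The contrast between the two evaluation levels can be made concrete with a minimal sketch. The toy abstract, the exact-string matching, and the rule that a document counts as correct when at least one mention is found are simplifying assumptions; in the LLM evaluations discussed here, correctness of paraphrased output was typically judged by humans.</p>
                    <code language="python">
```python
from typing import List, Set


def mention_recall(gold: Set[str], predicted: Set[str]) -> float:
    """Share of gold mentions that were found (exact string match)."""
    if not gold:
        return 0.0
    return len(gold.intersection(predicted)) / len(gold)


def document_accuracy(gold_docs: List[Set[str]], predicted_docs: List[Set[str]]) -> float:
    """Share of documents for which at least one gold mention was returned."""
    correct = sum(
        1
        for gold, predicted in zip(gold_docs, predicted_docs)
        if gold.intersection(predicted)
    )
    return correct / len(gold_docs)


# One toy abstract (assumed) with two 'Participant' mentions; only one is found.
gold = {"adults with type 2 diabetes", "120 outpatients"}
predicted = {"adults with type 2 diabetes"}

recall = mention_recall(gold, predicted)
accuracy = document_accuracy([gold], [predicted])
print("mention-level recall:   ", recall)
print("document-level accuracy:", accuracy)
```
</code>
                    <p>The same output yields a recall of 0.5 at the mention level and an accuracy of 1.0 at the document level, so the two numbers should not be compared directly.</p>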
                    <p>Papers that carried out direct comparisons between transformers and LLMs using the same evaluation metric showed LLMs clearly underperforming for data extraction or classification tasks when compared with discriminative models.
                        <xref ref-type="bibr" rid="ref131">
                            <sup>131</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref132">
                            <sup>132</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref148">
                            <sup>148</sup>
                        </xref> This was true in cases when there was sufficient data to train a discriminative transformer, e.g., for PICO extraction and many of the entities covered in the 76 existing and 33 available datasets to date. LLMs did outperform transformer models in cases where training data was insufficient or when fine-tuned for evidence-inference,
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref> but care needs to be taken when planning and reporting evaluations. A checklist for LLM evaluation and suggestions for reporting of results was compiled and published by Schmidt et al.
                        <xref ref-type="bibr" rid="ref165">
                            <sup>165</sup>
                        </xref>
                    </p>
                    <p>In recent months, the human-LLM comparison in particular has gained attention. LLMs matching human performance was a topic of discussion, for example at the recent &#x2018;2024 Global Evidence Summit&#x2019;, both within special sessions and during presentations. In the related field of automated bias assessment, it was also shown that RobotReviewer performed equally to humans.
                        <xref ref-type="bibr" rid="ref146">
                            <sup>146</sup>
                        </xref> Human judgements are imperfect, and thus humans employ dual-workflows during screening and data extraction when doing systematic reviews. Just by looking at the inter-rater reliability in Table 5 of this publication it becomes clear that human &#x2018;Gold Standards&#x2019; are imperfect. In the light of this evidence from practice, a &#x2018;fair&#x2019; comparison might include for example comparisons of LLMs against both humans and other non-generative automation methods using the same evaluation method.</p>
                    <p>4.3.7.3 LLM evaluation workload and dataset sizes</p>
                    <p>One frequently mentioned issue with fair evaluation is that the LLM&#x2019;s generative output is challenging to evaluate in the automated manner that entity-recognition or sentence classification models employ.
                        <xref ref-type="bibr" rid="ref132">
                            <sup>132</sup>
                        </xref> When doing automated evaluation based on previously labelled data such as EBM-NLP, resources need to be invested only once upfront by the dataset creators. Once the labelled dataset is obtained, predictions of any model can be evaluated quantitatively with no further manual work. LLMs currently require a human to perform the assessment of whether output is accurate, and the validation needs to be repeated after each prompt change and becomes invalid if the LLM itself is updated over time. Some included publications also reported inconsistencies when re-running the same prompts and randomly receiving incorrect results, which leads to an even higher amount of manual work.
                        <xref ref-type="bibr" rid="ref163">
                            <sup>163</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref165">
                            <sup>165</sup>
                        </xref> At that point, the LLM might have saved resources by not requiring upfront training data; but the human workload at the evaluation stage is not insignificant.
                        <xref ref-type="bibr" rid="ref149">
                            <sup>149</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref> For example, Wadhwa et al. (2023)
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref> reported hiring evaluators via Upwork[
                        <xref ref-type="fn" rid="fn4">4</xref>] paying $30 per hour for LLM evaluations that only apply to their specific project. More objective, meaningful, and most importantly automatically applicable evaluation metrics need to be developed.
                        <xref ref-type="bibr" rid="ref149">
                            <sup>149</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref165">
                            <sup>165</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref166">
                            <sup>166</sup>
                        </xref>
                    </p>
                    <p>4.3.7.4 LLM dataset splitting and prompt development</p>
                    <p>
Publications that reported fine-tuning of LLMs generally adhered to good-practice standards of splitting their dataset into separate training and evaluation sets. This helps to avoid over-fitting and over-estimating extraction performance. However, the often smaller zero-shot LLM publications seemed to have lower reporting quality. Out of 13 papers that explored zero-shot data extraction, six provided insufficient information about prompt development. Often, prompt texts were not shared and it was unclear whether authors developed and evaluated prompts on the same dataset. Going forward, we urge authors interested in developing LLM-based data extraction methods to create small and randomly partitioned prompt development datasets and to provide a brief description of this process in their publication. A reporting template that describes necessary steps and information for LLM automation of data extraction is available.
                        <xref ref-type="bibr" rid="ref165">
                            <sup>165</sup>
                        </xref>
                    </p>
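                    <p>A randomly partitioned prompt development set can be created with a few lines of code. The following is a minimal sketch; the function name <monospace>split_for_prompt_development</monospace>, the 20% development fraction, the seed, and the record identifiers are assumptions chosen for illustration and are not recommendations from the included publications.</p>
                    <code language="python">
```python
import random
from typing import List, Sequence, Tuple


def split_for_prompt_development(
    record_ids: Sequence[str], dev_fraction: float = 0.2, seed: int = 2024
) -> Tuple[List[str], List[str]]:
    """Randomly partition records into a prompt-development and a held-out set."""
    shuffled = sorted(record_ids)  # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[:n_dev], shuffled[n_dev:]


# Hypothetical record identifiers for 50 abstracts.
record_ids = [f"abstract-{i:03d}" for i in range(50)]
dev_set, test_set = split_for_prompt_development(record_ids)
print(len(dev_set), "records for prompt development")
print(len(test_set), "records held out for evaluation")
```
</code>
                    <p>Prompts are then iterated only on the development records, and the held-out records are used once for the final evaluation. Reporting the seed and the fraction alongside the prompt text makes the partition reproducible.</p>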
                    <p>4.3.7.5 General considerations</p>
                    <p>Depending on the size and provider of the LLM there may be additional cost factors. Costs can be incurred for example for deploying these computationally intensive models on a server or using proprietary APIs for single or batch-processed calls.
                        <xref ref-type="bibr" rid="ref173">
                            <sup>173</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref177">
                            <sup>177</sup>
                        </xref> As an undesirable side-effect, these deployment and evaluation costs can lead researchers to use smaller and less representative datasets, which may harm the scientific quality of experiments.
                        <xref ref-type="bibr" rid="ref148">
                            <sup>148</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref177">
                            <sup>177</sup>
                        </xref>
                    </p>
                    <p>Besides hallucination, LLMs were described to be overgeneralizing and grouping multiple similar outcomes in error, systematically ignoring negative numbers, confusing similar items, and providing duplicate or incomplete outputs.
                        <xref ref-type="bibr" rid="ref148">
                            <sup>148</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref149">
                            <sup>149</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref177">
                            <sup>177</sup>
                        </xref> When asked to describe results, LLMs were found to be &#x2018;misinterpreting&#x2019; information and using the concept of statistical significance incorrectly.
                        <xref ref-type="bibr" rid="ref163">
                            <sup>163</sup>
                        </xref> Similar to classic entity-extraction, LLMs made errors and showed inconsistency when information was implied, e.g., when extracting numbers of participants at different points in time in a trial, group names, or drug dosage.
                        <xref ref-type="bibr" rid="ref154">
                            <sup>154</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref158">
                            <sup>158</sup>
                        </xref>
                    </p>
                    <p>Interestingly, with respect to outcomes, LLMs were found to be swapping effect directions but simultaneously also adjusting the outcome name and thus making a correct prediction. The example given by Wadhwa et al. (2023)
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref> was that &#x2018;decrease in body weight&#x2019; became &#x2018;increase in body weight reduction&#x2019;.
                        <xref ref-type="bibr" rid="ref171">
                            <sup>171</sup>
                        </xref>
                    </p>
                    <p>Strategies of LLM usage were categorized into three groups: zero-shot (N=13), few-shot (N=2), and fine-tuning (N=6); some included references compared more than one strategy. For few-shot prompting, one paper assessed the optimal number of &#x2018;training&#x2019; examples in a prompt but concluded that there was no definite answer because the number varied across different datasets.
                        <xref ref-type="bibr" rid="ref138">
                            <sup>138</sup>
                        </xref> When prompted incorrectly, LLMs were reported to be too verbose, i.e. to return too much text, to overexplain,
                        <xref ref-type="bibr" rid="ref176">
                            <sup>176</sup>
                        </xref> or, in the most curious report, to add and then answer questions autonomously.
                        <xref ref-type="bibr" rid="ref152">
                            <sup>152</sup>
                        </xref> This is an important practical issue that may arise when prompts are not developed, post-processed and evaluated correctly or when a poorly-performing LLM is selected.</p>
                    <p>The most basic objective of automated data extraction is to create structured outputs based on unstructured text inputs. Even though LLMs have shown impressive capacity for text generation and ease of use in the data extraction space, the publications in this review showed that it is crucial to evaluate their performance fairly. At present, LLMs and other architectures cannot completely automate data extraction due to unreliability,
                        <xref ref-type="bibr" rid="ref149">
                            <sup>149</sup>
                        </xref> but they have potential to accelerate the process or to be used as &#x2018;second reviewers&#x2019;.
                        <xref ref-type="bibr" rid="ref165">
                            <sup>165</sup>
                        </xref>
                        <sup>,</sup>
                        <xref ref-type="bibr" rid="ref176">
                            <sup>176</sup>
                        </xref>
                    </p>
                    <p>

                        <bold>

                            <italic toggle="yes">4.3.8 Practical and other implications</italic>
</bold>
                    </p>
                    <p>In contrast to the problem of missing information, too much information can also have practical implications. For instance, a label often applies to multiple sentences, of which only one is &#x2018;key&#x2019;: descriptions of inclusion and exclusion criteria often span multiple sentences, and it can be challenging for a data extraction system to work out which sentence is the key sentence. The same problem applies to methods that select and rank the top-n sentences for each data extraction target, where a system risks including too many or too few results depending on the number of sentences that are kept.
                        <sup>
                            <xref ref-type="bibr" rid="ref47">46</xref>
                        </sup>
                    </p>
                    <p>Low recall is an important practical implication,
                        <sup>
                            <xref ref-type="bibr" rid="ref54">53</xref>
                        </sup> especially for entities that appear infrequently in the training data and are therefore not well represented in the training process of the classification system.
                        <sup>
                            <xref ref-type="bibr" rid="ref49">48</xref>
                        </sup> In other words, an entity such as &#x2018;Race&#x2019; might not be labelled very often in a training corpus, and may therefore be systematically missed or wrongly classified when the data extraction system is used on new texts. Therefore, human involvement is needed,
                        <sup>
                            <xref ref-type="bibr" rid="ref87">86</xref>
                        </sup> and scores need to be improved.
                        <sup>
                            <xref ref-type="bibr" rid="ref42">41</xref>
                        </sup> It is challenging to find the best set of hyperparameters
                        <sup>
                            <xref ref-type="bibr" rid="ref107">106</xref>
                        </sup> and to adjust precision and recall trade-offs to maximise the utility of a system while being transparent about the number of data points that might be missed when increasing system precision to save work for a human reviewer.
                        <sup>
                            <xref ref-type="bibr" rid="ref70">69</xref>
                        </sup>
                        <sup>,</sup>
                        <sup>
                            <xref ref-type="bibr" rid="ref96">95</xref>
                        </sup>
                        <sup>,</sup>
                        <sup>
                            <xref ref-type="bibr" rid="ref102">101</xref>
                        </sup>
                    </p>
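                    <p>The precision and recall trade-off described above can be illustrated with a minimal Python sketch. The confidence scores, gold labels, thresholds, and the function name <monospace>precision_recall_at</monospace> are hypothetical and are not taken from any included publication; the sketch only shows how raising a decision threshold may increase precision while also increasing the number of data points that are missed.</p>
                    <preformat>
```python
def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall, missed) when every prediction with
    score >= threshold is accepted as a positive extraction."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall, fn


# Hypothetical model confidences and gold-standard labels (assumed values).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]

for t in (0.5, 0.85):
    p, r, missed = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}, missed={missed}")
```
                    </preformat>
                    <p>With these assumed values, a threshold of 0.5 gives a precision of 0.60 and a recall of 0.75 with one missed data point, whereas a threshold of 0.85 gives a precision of 1.00 and a recall of 0.50 with two missed data points. Reporting the number of missed items alongside the scores is one possible way to be transparent about this trade-off.</p>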
                    <p>When using distantly supervised or automatically created corpora, such as corpora deriving sentence labels from headings of structured abstracts, there is a risk of producing evaluation results that underestimate model performance. This was shown by Duan et al.
                        <xref ref-type="bibr" rid="ref136">
                            <sup>136</sup>
                        </xref> who report that errors in the auto-generated gold standard for validation accounted for 56% of the &#x2018;misclassifications&#x2019; of their model.
                        <xref ref-type="bibr" rid="ref136">
                            <sup>136</sup>
                        </xref>
                    </p>
                    <p>For relation extraction or normalisation tasks, error-propagation was noted as a practical issue in joint models.
                        <sup>
                            <xref ref-type="bibr" rid="ref64">63</xref>
                        </sup>
                        <sup>,</sup>
                        <sup>
                            <xref ref-type="bibr" rid="ref68">67</xref>
                        </sup> To extract relations, first a model to identify entities is needed, and then another model to classify relationships is applied in a pipeline. Neither human nor machine can instantly perform perfect data extraction or labelling,
                        <sup>
                            <xref ref-type="bibr" rid="ref38">37</xref>
                        </sup> and thus errors made in earlier classification steps can be carried forward and accumulate.</p>
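                    <p>As a rough illustration of how such errors can accumulate, the following Python sketch multiplies per-stage accuracies to estimate how often every stage of a pipeline is correct. The accuracy values and the function name <monospace>end_to_end_accuracy</monospace> are hypothetical, and the assumption that stage errors are independent is a simplification introduced here, not a finding of the included publications.</p>
                    <preformat>
```python
from math import prod


def end_to_end_accuracy(stage_accuracies):
    """Probability that all pipeline stages are correct, under the
    simplifying (assumed) condition that stage errors are independent."""
    return prod(stage_accuracies)


# Hypothetical values: entity recognition at 0.90, relation classification at 0.80.
stages = [0.90, 0.80]
print(f"end-to-end accuracy: {end_to_end_accuracy(stages):.2f}")
```
                    </preformat>
                    <p>Under these assumptions, two stages that are individually correct 90% and 80% of the time would yield a fully correct relation only about 72% of the time, which is lower than either stage alone.</p>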
                    <p>For relation extraction and summarisation, the importance of qualitative real-world evaluation was discussed. This was due to a lack of clarity about how well summarisation metrics relate to the actual usefulness or completeness of a summary, and because challenges such as contradictions or negations within and between trial texts need to be evaluated within the context of a review and not just a trial itself.
                        <sup>
                            <xref ref-type="bibr" rid="ref62">61</xref>
                        </sup>
                        <sup>,</sup>
                        <sup>
                            <xref ref-type="bibr" rid="ref64">63</xref>
                        </sup>
                    </p>
                    <p>A separate practical caveat with relation-extraction models concerns longer dependencies, i.e. bigger gaps between salient pieces of information in text that lead to a conclusion. These gaps increase the complexity of the task and thus reduce performance.
                        <sup>
                            <xref ref-type="bibr" rid="ref100">99</xref>
                        </sup>
                    </p>
                    <p>In their statement on ethical concerns, DeYoung et al. (2021)
                        <xref ref-type="bibr" rid="ref61">
                            <sup>61</sup>
                        </xref> mention that these complex relation and summarisation models can produce correct-looking but factually incorrect statements and are risky to apply in practice without extra caution, a problem also seen with newer LLM-based models.</p>
                </sec>
            </sec>
            <sec id="sec4.4">
                <label>4.4</label>
                <title>Explainability and interpretability of data extraction systems</title>
                <p>The neural networks or machine-learning models from publications included in this review learn to classify and extract data by adjusting numerical weights and by applying mathematical functions to these sets of weights. The decision-making process behind the classification of a sentence or an entity is therefore comparable to a black box, because it is very hard to comprehend how or why the model made its predictions. A recent comment published in Nature has called for a more in-depth analysis and explanation of the decision-making process within neural networks.
                    <sup>
                        <xref ref-type="bibr" rid="ref118">117</xref>
                    </sup> Ultimately, hidden tendencies in the training data can influence the decision-making processes of a data extraction model in a non-transparent way. Many of the examples discussed in the comment are related to healthcare, but in practice there is a very limited understanding of their inherent biases despite the broad application of machine learning and neural networks.
                    <sup>
                        <xref ref-type="bibr" rid="ref118">117</xref>
                    </sup>
                </p>
                <p>A deeper understanding of what occurs between data entry and the point of prediction can benefit the general performance of a system, because it uncovers shortcomings in the training process. These shortcomings can be related to the composition of training data (e.g. overrepresentation or underrepresentation of groups), the general system architecture, or to other unintended tendencies in a system&#x2019;s prediction.
                    <sup>
                        <xref ref-type="bibr" rid="ref120">119</xref>
                    </sup> A small number of included publications in the base-review (N = 10) discussed issues related to hidden variables as part of an extensive error analysis (see section 3.5.2). The composition of training and testing data was described in most publications, but no publication that specifically addresses the issues of interpretability or explainability was found.</p>
            </sec>
            <sec id="sec4.5">
                <label>4.5</label>
                <title>Availability of corpora, and copyright issues</title>
                <p>There are several corpora described in the literature, many with manual gold-standard labels (see 
                    <xref ref-type="table" rid="T4">
Table 4</xref>). There are still publications with custom, unshared datasets. Possible reasons for this are concerns over copyright, or malfunctioning download links from websites mentioned in older publications. Ideally, data extraction algorithms should be evaluated on different datasets in order to detect over-fitting, to test how the systems react to data from different domains and different annotators, and to enable the comparison of systems in a reliable way. As a supplement to this manuscript, we have collected links to datasets in 
                    <xref ref-type="table" rid="T4">
Table 4</xref> and encourage researchers to share their automatically or manually annotated labels and texts so that other researchers may use them for development and evaluation of new data extraction systems.</p>
            </sec>
            <sec id="sec4.6">
                <label>4.6</label>
                <title>Latest developments and upcoming research</title>
                <p>This LSR has its cut-off in a period of very high publishing activity in the field of automated data extraction &#x2013; mostly due to LLMs facilitating access to automation methods, but also due to continuing interest in transformer models. Before this update, we wrote that the arrival of LLMs &#x2018;may mark the current state of the field at the end of a challenging period of investigation, where the limitations of recent machine learning approaches have been apparent, and the automation of data extraction was quite limited.&#x2019; The performance of LLMs did not disappoint, but their usage for automated data extraction is not yet mature. We expect to see many more publications in the near future that investigate LLM hallucinations and reproducibility issues, practical comparisons with humans, and evaluation of time-savings induced by (semi) automation methods.</p>
                <sec id="sec4.6.1">
                    <label>4.6.1</label>
                    <title>Limitations of this living review</title>
                    <p>This review focused on data extraction from reports of clinical trials and epidemiological research. The evidence mostly comprises data extraction from reports of randomised controlled trials, where interventions and comparators are usually jointly extracted; only a very small fraction of the evidence addresses other important study types (e.g., diagnostic accuracy studies). During screening we excluded all publications related to clinical data (such as electronic health records) and publications extracting disease, population, or intervention data from genetic and biological research. There is a wealth of evidence and potential training and evaluation data in these publications, but it was not feasible to include them in the living review.</p>
                </sec>
            </sec>
        </sec>
        <sec id="sec5">
            <label>5.</label>
            <title>Conclusion</title>
            <p>This LSR presents an overview of the data-extraction literature of interest to different types of systematic review. We included a broad evidence base of publications describing data extraction for interventional systematic reviews (focusing on P, IC, and O classes and RCT data), and a very small number of publications extracting epidemiological and diagnostic accuracy data. Within the LSR update we identified research trends such as the emergence of LLM methods, ongoing popularity of transformer neural networks, or increased code and dataset availability. However, the number of accessible tools that can help systematic reviewers with data extraction is still very low. Currently, only around one in ten publications is linked to a usable tool or describes an ongoing implementation.</p>
            <p>The data extraction algorithms and the characteristics of the data they were trained and evaluated on were well reported. Around three in ten publications made their datasets available to the public, and more than half of all included publications reported training or evaluating on these datasets. Unfortunately, the use of different evaluation scripts, different methods for averaging results, and increasing numbers of custom adaptations to datasets still make it difficult to draw conclusions on which system performs best. Additionally, data extraction is a very hard task. It usually requires conflict resolution between expert systematic reviewers when done manually, and consequently causes problems when creating the gold standards used for training and evaluation of the algorithms in this review.</p>
            <p>We listed many ongoing challenges in the field of data extraction for systematic review (semi)automation, and specifically focused on issues emerging through the usage of LLMs. These issues involve hallucinations, inconsistent predictions, and meaningful and fair comparisons with humans or other automated methods. With this living review we aim to review the literature continuously as it becomes available. Therefore, the most current review version, along with the number of abstracts screened and included after the publication of this review iteration, is available on our website.</p>
        </sec>
        <sec id="sec10">
            <title>Data availability</title>
            <sec id="sec18">
                <title>Underlying data</title>
                <p>Harvard Dataverse: Appendix for base review. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7910/DVN/LNGCOQ">10.7910/DVN/LNGCOQ</ext-link>.
                    <xref ref-type="bibr" rid="ref127">
                        <sup>127</sup>
                    </xref>
                </p>
                <p>This project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Appendix_A.zip (full database with all data extraction and other fields for base review data)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Appendix B.docx (further information about excluded publications)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Appendix_C.zip (code, weights, data, scores of abstract classifiers for Web of Science content)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Appendix_D.zip (full database with all data extraction and other fields for LSR update 1 (2023))</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Appendix_E.zip (full database with all data extraction and other fields for LSR update 2 (2024))</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Supplementary_key_items.docx (overview of items extracted for each included study)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>table 1.csv and table 1_long.csv (Table A1 in csv format, the long version includes extra data)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>table 1_long_updated.xlsx (LSR 2 update 2024 for Table A1 in csv format, the long version includes extra data)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Figures2.zip (LSR 2 updated figures)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.1.xlsx (additional info about related publications from update 2 (2024))</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3.2.zip (EPPI-mapper json file with full data extraction, and maps to filter results)</p>
                        </list-item>
                    </list>
                </p>
                <p>Harvard Dataverse: Available datasets for SR automation. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7910/DVN/0XTV25">10.7910/DVN/0XTV25</ext-link>.
                    <xref ref-type="bibr" rid="ref128">
                        <sup>128</sup>
                    </xref>
                </p>
                <p>This project contains the following underlying data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Datasets shared by authors of the included publications (collected for base review, see table 1_long_updated.xlsx for links to code and data for other includes)</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
            <sec id="sec12">
                <title>Extended data</title>
                <p>Open Science Framework: Data Extraction Methods for Systematic Review (semi)Automation: A Living Review Protocol. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17605/OSF.IO/ECB3T">https://doi.org/10.17605/OSF.IO/ECB3T</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">15</xref>
                    </sup>
                </p>
                <p>This project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Review protocol</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Additional_Fields.docx (overview of data fields of interest for text mining in clinical trials)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Search.docx (additional information about the searches, including full search strategies)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>PRISMA P checklist for &#x2018;Data extraction methods for systematic review (semi)automation: A living review protocol.&#x2019;</p>
                        </list-item>
                    </list>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
            <sec id="sec13">
                <title>Reporting guidelines</title>
                <p>Harvard Dataverse: PRISMA checklist for &#x2018;Data extraction methods for systematic review (semi)automation: A living systematic review&#x2019; 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7910/DVN/LNGCOQ">https://doi.org/10.7910/DVN/LNGCOQ</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref128">127</xref>
                    </sup>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons Zero &#x201c;No rights reserved&#x201d; data waiver</ext-link> (CC0 1.0 Public domain dedication).</p>
            </sec>
            <sec id="sec14">
                <title>Software availability</title>
                <p>The development version of the software for automated searching is available from GitHub: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/mcguinlu/COVID_suicide_living">https://github.com/mcguinlu/COVID_suicide_living</ext-link>.</p>
                <p>Archived source code at time of publication: 
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.3871366">http://doi.org/10.5281/zenodo.3871366</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref18">17</xref>
                    </sup>
                </p>
                <p>License: MIT</p>
            </sec>
        </sec>
        <sec id="sec15">
            <title>Author contributions</title>
            <p>LS: Conceptualization, Investigation, Methodology, Software, Visualization, Writing &#x2013; Original Draft Preparation</p>
            <p>ANFM: Data Curation, Investigation, Writing &#x2013; Review &amp; Editing</p>
            <p>RE: Data Curation, Investigation, Writing &#x2013; Review &amp; Editing</p>
            <p>BKO: Conceptualization, Investigation, Methodology, Software, Writing &#x2013; Review &amp; Editing</p>
            <p>JT: Conceptualization, Investigation, Methodology, Writing &#x2013; Review &amp; Editing</p>
            <p>JPTH: Conceptualization, Funding Acquisition, Investigation, Methodology, Writing &#x2013; Review &amp; Editing</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We thank Luke McGuinness for his contribution to the base-review, specifically the LSR web-app programming, screening, conflict-resolution, and his feedback to the base-review manuscript.</p>
            <p>We thank Patrick O&#x2019;Driscoll for his help with checking data, counts, and wording in the manuscript and the appendix.</p>
            <p>We thank Sarah Dawson for developing and evaluating the search strategy, and for providing advice on databases to search for this review. Many thanks also to Alexandra McAleenan and Vincent Cheng for providing valuable feedback on this review and its protocol.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Higgins</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020).</article-title>
                    <year>2020</year>:
                    <publisher-name>Cochrane</publisher-name>.</mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fukumi Tsunoda</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Concei&#x00e7;&#x00e3;o Moreira</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ribeiro Guimar&#x00e3;es</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Machine learning e revis&#x00e3;o sistem&#x00e1;tica de literatura automatizada: uma revis&#x00e3;o sistem&#x00e1;tica.</article-title>
                    <source>

                        <italic toggle="yes">Revista Tecnologia e Sociedade.</italic>
                    </source>
                    <year>2020</year>;<volume>16</volume>(<issue>45</issue>).</mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jonnalagadda</surname>
                            <given-names>SR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goyal</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huffman</surname>
                            <given-names>MD</given-names>
                        </name>
</person-group>:
                    <article-title>Automating data extraction in systematic reviews: a systematic review.</article-title>
                    <source>

                        <italic toggle="yes">Systematic Reviews.</italic>
                    </source>
                    <year>2015</year>;<volume>4</volume>(<issue>1</issue>):<fpage>78</fpage>.
                    <pub-id pub-id-type="pmid">26073888</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13643-015-0066-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4514954</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>O&#x2019;Mara-Eves</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Using text mining for study identification in systematic reviews: a systematic review of current approaches.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2015</year>;<volume>4</volume>(<issue>1</issue>):<fpage>5</fpage>.
                    <pub-id pub-id-type="pmid">25588314</pub-id>
                    <pub-id pub-id-type="doi">10.1186/2046-4053-4-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4320539</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tsafnat</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Systematic review automation technologies.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2014</year>;<volume>3</volume>(<issue>1</issue>):<fpage>74</fpage>.
                    <pub-id pub-id-type="doi">10.1186/2046-4053-3-74</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Beller</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR).</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2018</year>;<volume>7</volume>(<issue>1</issue>):<fpage>77</fpage>.</mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marshall</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wallace</surname>
                            <given-names>BC</given-names>
                        </name>
</person-group>:
                    <article-title>Toward systematic review automation: a practical guide to using machine learning tools in research synthesis.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2019</year>;<volume>8</volume>(<issue>1</issue>):<fpage>163</fpage>.</mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cierco Jimenez</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rosillo</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Machine learning computational tools to assist the performance of systematic reviews: A mapping review.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Res Methodol.</italic>
                    </source>
                    <year>2022</year>;<volume>22</volume>(<issue>1</issue>):<fpage>322</fpage>.
                    <pub-id pub-id-type="pmid">36522637</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12874-022-01805-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9756658</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Khalil</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ameen</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zarnegar</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Tools to support the automation of systematic reviews: a scoping review.</article-title>
                    <source>

                        <italic toggle="yes">J Clin Epidemiol.</italic>
                    </source>
                    <year>2022</year>;<volume>144</volume>:<fpage>22</fpage>&#x2013;<lpage>42</lpage>.
                    <pub-id pub-id-type="pmid">34896236</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jclinepi.2021.12.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ruiz</surname>
                            <given-names>RL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duffy</surname>
                            <given-names>VG</given-names>
                        </name>
</person-group>:
                    <article-title>Automation in Healthcare Systematic Review.</article-title>In
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Stephanidis</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duffy</surname>
                            <given-names>VG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kr&#x00f6;mker</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>
                    <source>

                        <italic toggle="yes">HCI International 2021 - Late Breaking Papers: HCI Applications in Health, Transport, and Industry.</italic> Cham.</source>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sundaram</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Berleant</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Automating Systematic Literature Reviews with Natural Language Processing and Text Mining: a Systematic Literature Review.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2211.15397.</italic>
                    </source>
                    <year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Information Extraction from the Text Data on Traditional Chinese Medicine: A Review on Tasks, Challenges, and Methods from 2010 to 2021.</article-title>
                    <source>

                        <italic toggle="yes">Evid Based Complement Alternat Med.</italic>
                    </source>
                    <year>2022</year>;<volume>2022</volume>:<fpage>1679589</fpage>.
                    <pub-id pub-id-type="pmid">35600940</pub-id>
                    <pub-id pub-id-type="doi">10.1155/2022/1679589</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9122692</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sinyor</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Webb</surname>
                            <given-names>RT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A narrative review of recent tools and innovations toward automating living systematic reviews and evidence syntheses.</article-title>
                    <source>

                        <italic toggle="yes">Zeitschrift f&#x00fc;r Evidenz, Fortbildung und Qualit&#x00e4;t im Gesundheitswesen.</italic>
                    </source>
                    <year>2023</year>; S1865-9217(23)00140-X.
                    <pub-id pub-id-type="doi">10.1016/j.zefq.2023.06.007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Devlin</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>M-W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:1810.04805.</italic>
                    </source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Data extraction methods for systematic review (semi)automation: A living review protocol.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
                    </source>
                    <year>2020</year>;<volume>9</volume>(<issue>210</issue>).
                    <pub-id pub-id-type="pmid">32724560</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.22781.2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7338918</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McGuinness</surname>
                            <given-names>LA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>medrxivr: Accessing and searching medRxiv and bioRxiv preprint data in R.</article-title>
                    <source>

                        <italic toggle="yes">JOSS.</italic>
                    </source>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.21105/joss.02651</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McGuinness</surname>
                            <given-names>LA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>mcguinlu/COVID_suicide_living: Initial Release (Version v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
                    </source>
                    <year>2020, June 1</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.3871366</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>John</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The impact of the COVID-19 pandemic on self-harm and suicidal behaviour: protocol for a living systematic review [version 1; peer review: 1 approved, 1 approved with reservations].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
                    </source>
                    <year>2020</year>;<volume>9</volume>(<issue>644</issue>).
                    <pub-id pub-id-type="pmid">33604025</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.25522.1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7871358</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Olorisade</surname>
                            <given-names>BK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brereton</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andras</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Reproducibility of studies on text mining for citation screening in systematic reviews: Evaluation and checklist.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2017</year>;<volume>73</volume>:<fpage>1</fpage>&#x2013;<lpage>13</lpage>.
                    <pub-id pub-id-type="pmid">28711679</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2017.07.010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Haddaway</surname>
                            <given-names>NR</given-names>
                        </name>
</person-group>:
                    <article-title>

                        <italic toggle="yes">livingPRISMA_flow: R package and ShinyApp for producing PRISMA-style flow diagrams for living systematic reviews (Version 0.0.1).</italic>
                    </article-title>In Zenodo. <year>2021</year>.
                    <pub-id pub-id-type="pmcid">PMC9828146</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kahale</surname>
                            <given-names>LA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Elkhoury</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>El Mikati</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Tailored PRISMA 2020 flow diagrams for living systematic reviews: a methodological survey and a proposal.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
                    </source>
                    <year>2021</year>;<volume>10</volume>:<fpage>192</fpage>.
                    <pub-id pub-id-type="pmid">35136567</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.51723.3</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8804909</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Page</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McKenzie</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bossuyt</surname>
                            <given-names>PM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The PRISMA 2020 statement: an updated guideline for reporting systematic reviews.</article-title>
                    <source>

                        <italic toggle="yes">BMJ.</italic>
                    </source>
                    <year>2021</year>;<volume>372</volume>:n71.
                    <pub-id pub-id-type="pmid">33782057</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmj.n71</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8005924</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Norman</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Leeflang</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>N&#x00e9;v&#x00e9;ol</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Data Extraction and Synthesis in Systematic Reviews of Diagnostic Test Accuracy: A Corpus for Automating and Evaluating the Process.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2018</year>;<volume>2018</volume>:<fpage>817</fpage>&#x2013;<lpage>826</lpage>.
                    <pub-id pub-id-type="pmid">30815124</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6371350</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Millard</surname>
                            <given-names>LA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Flach</surname>
                            <given-names>PA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Higgins</surname>
                            <given-names>JP</given-names>
                        </name>
</person-group>:
                    <article-title>Machine learning to assist risk-of-bias assessments in systematic reviews.</article-title>
                    <source>

                        <italic toggle="yes">Int J Epidemiol.</italic>
                    </source>
                    <year>2016</year>;<volume>45</volume>(<issue>1</issue>):<fpage>266</fpage>&#x2013;<lpage>277</lpage>.
                    <pub-id pub-id-type="pmid">26659355</pub-id>
                    <pub-id pub-id-type="doi">10.1093/ije/dyv306</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4795562</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marshall</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kuiper</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wallace</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2016</year>;<volume>23</volume>(<issue>1</issue>):<fpage>193</fpage>&#x2013;<lpage>201</lpage>.
                    <pub-id pub-id-type="pmid">26104742</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocv044</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4713900</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boudin</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nie</surname>
                            <given-names>JY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dawes</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Clinical Information Retrieval using Document and PICO Structure.</article-title>
                    <source>

                        <italic toggle="yes">Assoc Comput Linguist.</italic>
                    </source>
                    <year>2010</year>:<fpage>822</fpage>&#x2013;<lpage>830</lpage>.</mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extracting temporal constraints from clinical research eligibility criteria using conditional random fields.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2011</year>;<volume>2011</volume>:<fpage>843</fpage>&#x2013;<lpage>852</lpage>.
                    <pub-id pub-id-type="pmid">22195142</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3243135</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rathbone</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Expediting citation screening using PICo-based title-only screening for identifying studies in scoping searches and rapid reviews.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2017</year>;<volume>6</volume>(<issue>1</issue>):<fpage>233</fpage>.
                    <pub-id pub-id-type="pmid">29178925</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13643-017-0629-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5702220</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Malheiros</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>A Visual Text Mining approach for Systematic Reviews</chapter-title>. In:
                    <source>

                        <italic toggle="yes">First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).</italic>
                    </source>
                    <year>2007</year>.</mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fabbri</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Using Information Visualization and Text Mining to Facilitate the Conduction of Systematic Literature Reviews</chapter-title>. In:
                    <source>

                        <italic toggle="yes">Enterprise Information Systems.</italic>
                    </source>
                    <year>2013</year>.
                    <publisher-loc>Berlin, Heidelberg</publisher-loc>:
                    <publisher-name>Springer Berlin Heidelberg</publisher-name>.</mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Beltagy</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lo</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cohan</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>SciBERT: A pretrained language model for scientific text.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:1903.10676.</italic>
                    <year>2019</year>.</mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Al-Hussaini</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>An</surname>
                            <given-names>DN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CCS Explorer: Relevance Prediction, Extractive Summarization, and Named Entity Recognition from Clinical Cohort Studies.</article-title>
                    <italic toggle="yes">2022 IEEE International Conference on Big Data (Big Data).</italic> 17-20 Dec <year>2022</year>.
                    <pub-id pub-id-type="pmcid">PMC7647812</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tsubota</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bollegala</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improvement of intervention information detection for automated clinical literature screening during systematic review.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2022</year>;<volume>134</volume>:104185.
                    <pub-id pub-id-type="pmid">36038066</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2022.104185</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Abaho</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bollegala</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Williamson</surname>
                            <given-names>PR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Assessment of contextualised representations in detecting outcome phrases in clinical trials.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2203.03547.</italic>
                    <year>2022</year>.</mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Campillos-Llanos</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Valverde-Mateos</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Capllonch-Carri&#x00f3;n</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2021</year>;<volume>21</volume>(<issue>1</issue>):<fpage>69</fpage>.
                    <pub-id pub-id-type="pmid">33618727</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12911-021-01395-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7898014</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mayer</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marro</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cabrio</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enhancing evidence-based medicine with natural language argumentative analysis of clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">Artif Intell Med.</italic>
                    </source>
                    <year>2021</year>;<volume>118</volume>:102098.
                    <pub-id pub-id-type="pmid">34412851</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.artmed.2021.102098</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dhrangadhariya</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>M&#x00fc;ller</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation.</article-title>
                    <source>

                        <italic toggle="yes">JAMIA Open.</italic>
                    </source>
                    <year>2023</year>;<volume>6</volume>(<issue>1</issue>):<fpage>ooac107</fpage>.
                    <pub-id pub-id-type="pmid">36632329</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamiaopen/ooac107</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9828146</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kilicoglu</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rosemblat</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hoang</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Toward assessing clinical trial publications for reporting transparency.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2021</year>;<volume>116</volume>:103717.
                    <pub-id pub-id-type="pmid">33647518</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2021.103717</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8112250</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mei</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Unlocking the power of deep PICO extraction: Step-wise medical NER identification.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2005.06601.</italic>
                    </source>
                    <year>2020</year>.</mixed-citation>
            </ref>
            <ref id="ref40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chabou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Iglewski</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Combination of conditional random field with a rule based method in the extraction of PICO elements.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2018</year>;<volume>18</volume>:<fpage>14</fpage>.
                    <pub-id pub-id-type="pmid">30509272</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12911-018-0699-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6278016</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lucic</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Blake</surname>
                            <given-names>CL</given-names>
                        </name>
</person-group>:
                    <article-title>Improving Endpoint Detection to Support Automated Systematic Reviews.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2016</year>;<volume>2016</volume>:<fpage>1900</fpage>&#x2013;<lpage>1909</lpage>.
                    <pub-id pub-id-type="pmid">28269949</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5333237</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baladron</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Tool for filtering PubMed search results by sample size.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2018</year>;<volume>25</volume>(<issue>7</issue>):<fpage>774</fpage>&#x2013;<lpage>779</lpage>.
                    <pub-id pub-id-type="pmid">29409012</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocx155</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7647020</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brassey</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Price</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Edwards</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Developing a fully automated evidence synthesis tool for identifying, assessing and collating the evidence.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Evid Based Med.</italic>
                    </source>
                    <year>2021</year>;<volume>26</volume>(<issue>1</issue>):<fpage>24</fpage>&#x2013;<lpage>27</lpage>.
                    <pub-id pub-id-type="pmid">31467247</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjebm-2018-111126</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wallace</surname>
                            <given-names>BC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision.</article-title>
                    <source>

                        <italic toggle="yes">J Mach Learn Res.</italic>
                    </source>
                    <year>2016</year>;<volume>17</volume>.
                    <pub-id pub-id-type="pmid">27746703</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5065023</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <label>45</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Singh</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sabet</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shawe-Taylor</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation</chapter-title>. In
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Shaban-Nejad</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Michalowski</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buckeridge</surname>
                            <given-names>DL</given-names>
                        </name>
</person-group>, editors.
                    <source>

                        <italic toggle="yes">Explainable AI in Healthcare and Medicine: Building a Culture of Transparency and Accountability.</italic>
                    </source>
                    <publisher-name>Springer International Publishing</publisher-name>;<year>2021</year>; pp.<fpage>131</fpage>&#x2013;<lpage>145</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-3-030-53352-6_12</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kiritchenko</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ExaCT: automatic extraction of clinical trial characteristics from journal publications.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2010</year>;<volume>10</volume>:<fpage>17</fpage>.</mixed-citation>
            </ref>
            <ref id="ref47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fiszman</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Interpreting comparative constructions in biomedical text.</article-title>
                    <year>2007</year>:<fpage>137</fpage>&#x2013;<lpage>144</lpage>.</mixed-citation>
            </ref>
            <ref id="ref48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Karystianis</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buchan</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nenadic</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Mining characteristics of epidemiological studies from Medline: a case study in obesity.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Semantics.</italic>
                    </source>
                    <year>2014</year>;<volume>5</volume>:<fpage>11</fpage>.
                    <pub-id pub-id-type="pmid">24949194</pub-id>
                    <pub-id pub-id-type="doi">10.1186/2041-1480-5-22</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4062908</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Karystianis</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2017</year>;<volume>70</volume>:<fpage>27</fpage>&#x2013;<lpage>34</lpage>.
                    <pub-id pub-id-type="pmid">28455150</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2017.04.004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Whitton</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2112.05596.</italic>
                    </source>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref51">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sanchez-Graillet</surname>
                            <given-names>O</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Witte</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grimm</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An annotated corpus of clinical trial publications supporting schema-based relational information extraction.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Semantics.</italic>
                    </source>
                    <year>2022</year>;<volume>13</volume>(<issue>1</issue>):<fpage>14</fpage>.
                    <pub-id pub-id-type="pmid">35606797</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13326-022-00271-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9128209</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatic classification of sentences to support Evidence Based Medicine.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
                    </source>
                    <year>2011</year>;<volume>12</volume>(<issue>Suppl 2</issue>):<fpage>S5</fpage>.
                    <pub-id pub-id-type="pmid">21489224</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-12-S2-S5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3073185</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <label>53</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Verbeke</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories.</article-title>
                    <year>2012</year>. p.<fpage>579</fpage>&#x2013;<lpage>589</lpage>.</mixed-citation>
            </ref>
            <ref id="ref54">
                <label>54</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jin</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Szolovits</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Advancing PICO element detection in biomedical text via deep neural networks.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
                    </source>
                    <year>2020</year>;<volume>36</volume>(<issue>12</issue>):<fpage>3856</fpage>&#x2013;<lpage>3862</lpage>.
                    <pub-id pub-id-type="pmid">32311009</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa256</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref55">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.</article-title>
                    <source>

                        <italic toggle="yes">Proc Conf Assoc Comput Linguist Meet.</italic>
                    </source>
                    <year>2018</year>;<volume>2018</volume>:<fpage>197</fpage>&#x2013;<lpage>207</lpage>.
                    <pub-id pub-id-type="pmid">30305770</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6174533</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref56">
                <label>56</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>de Bruijn</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automated information extraction of key trial design elements from clinical trial publications.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2008</year>:<fpage>141</fpage>&#x2013;<lpage>145</lpage>.
                    <pub-id pub-id-type="pmid">18999067</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2655966</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref57">
                <label>57</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boudin</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nie</surname>
                            <given-names>J-Y</given-names>
                        </name>
</person-group>:
                    <article-title>Improving Medical Information Retrieval with PICO Element Detection.</article-title>
                    <year>2010</year>. p.<fpage>50</fpage>&#x2013;<lpage>61</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-3-642-12275-0_8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref58">
                <label>58</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Demner-Fushman</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatically Identifying Health Outcome Information in MEDLINE Records.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2006</year>;<volume>13</volume>(<issue>1</issue>):<fpage>52</fpage>&#x2013;<lpage>60</lpage>.
                    <pub-id pub-id-type="pmid">16221937</pub-id>
                    <pub-id pub-id-type="doi">10.1197/jamia.M1911</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1380197</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref59">
                <label>59</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Singh</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.</article-title>
                    <source>

                        <italic toggle="yes">Proc ACM Int Conf Inf Knowl Manag.</italic>
                    </source>
                    <year>2017</year>;<volume>2017</volume>:<fpage>1519</fpage>&#x2013;<lpage>1528</lpage>.
                    <pub-id pub-id-type="pmid">29308293</pub-id>
                    <pub-id pub-id-type="doi">10.1145/3132847.3132989</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5752318</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref60">
                <label>60</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Afzal</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Alam</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Malik</surname>
                            <given-names>KM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Clinical Context&#x2013;Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation.</article-title>
                    <source>

                        <italic toggle="yes">J Med Internet Res.</italic>
                    </source>
                    <year>2020</year>;<volume>22</volume>(<issue>10</issue>):<elocation-id>e19810</elocation-id>.
                    <pub-id pub-id-type="pmid">33095174</pub-id>
                    <pub-id pub-id-type="doi">10.2196/19810</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7647812</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref61">
                <label>61</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beltagy</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zuylen</surname>
                            <given-names>M</given-names>
                            <prefix>van</prefix>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MS2: Multi-document summarization of medical studies.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2104.06486.</italic>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref62">
                <label>62</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lehman</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evidence inference 2.0: More data, better models.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2005.04177.</italic>
                    <year>2020</year>.</mixed-citation>
            </ref>
            <ref id="ref63">
                <label>63</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>BE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lehman</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Jt Summits Transl Sci Proc.</italic>
                    </source>
                    <year>2021</year>;<volume>2021</volume>:<fpage>485</fpage>&#x2013;<lpage>494</lpage>.
                    <pub-id pub-id-type="pmid">34457164</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref64">
                <label>64</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Amini</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mart&#x00ed;nez</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aliod</surname>
                            <given-names>DM</given-names>
                        </name>
</person-group>:
                    <article-title>Overview of the ALTA 2012 Shared Task.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the Australasian Language Technology Association Workshop 2012.</italic>
                    </source>
                    <year>2012</year>. pp. <fpage>124</fpage>&#x2013;<lpage>129</lpage>.</mixed-citation>
            </ref>
            <ref id="ref65">
                <label>65</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Blake</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guan</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>Evaluating automated entity extraction with respect to drug and non-drug treatment strategies.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2019</year>;<volume>94</volume>:<fpage>103177</fpage>.
                    <pub-id pub-id-type="pmid">30986506</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2019.103177</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref66">
                <label>66</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Suwarningsih</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Purwarianti</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Supriana</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <chapter-title>Indonesian medical question classification with pattern matching</chapter-title>. In:
                    <source>

                        <italic toggle="yes">2015 International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology (ICACOMIT).</italic>
                    </source>
                    <year>2015</year>.</mixed-citation>
            </ref>
            <ref id="ref67">
                <label>67</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Abaho</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bollegala</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Williamson</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Detect and Classify &#x2013; Joint Span Detection and Classification for Health Outcomes</chapter-title>.
                    <source>

                        <italic toggle="yes">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.</italic>
                    </source>
                    <publisher-loc>Online and Punta Cana, Dominican Republic</publisher-loc>; <month>November</month> <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref68">
                <label>68</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Basu</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora.</article-title>
                    <source>

                        <italic toggle="yes">CoRR.</italic>
                    </source>
                    <year>2016</year>.<fpage>abs/1606.06424</fpage>.</mixed-citation>
            </ref>
            <ref id="ref69">
                <label>69</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marshall</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Trialstreamer: A living, automatically updated database of clinical trial reports.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2020</year>;<volume>27</volume>(<issue>12</issue>):<fpage>1903</fpage>&#x2013;<lpage>1912</lpage>.
                    <pub-id pub-id-type="pmid">32940710</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocaa163</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7727361</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref70">
                <label>70</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Barnett</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials.</article-title>
                    <source>

                        <italic toggle="yes">F1000Research.</italic>
                    </source>
                    <year>2022</year>;<volume>11</volume>:<fpage>783</fpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.123002.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref71">
                <label>71</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Raja</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Hybrid Citation Retrieval Algorithm for Evidence-based Clinical Knowledge Summarization: Combining Concept Extraction, Vector Similarity and Query Expansion for High Precision.</article-title>
                    <source>

                        <italic toggle="yes">CoRR.</italic>
                    </source>
                    <year>2016</year>.<fpage>abs/1609.01597</fpage>.</mixed-citation>
            </ref>
            <ref id="ref72">
                <label>72</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mining Biomedical Literature for Terms related to Epidemiologic Exposures.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2010</year>;<volume>2010</volume>:<fpage>897</fpage>&#x2013;<lpage>901</lpage>.
                    <pub-id pub-id-type="pmid">21347108</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3041399</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref73">
                <label>73</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Saiz</surname>
                            <given-names>FS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sanders</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Artificial Intelligence Clinical Evidence Engine for Automatic Identification, Prioritization, and Extraction of Relevant Clinical Oncology Research.</article-title>
                    <source>

                        <italic toggle="yes">JCO Clin Cancer Inform.</italic>
                    </source>
                    <year>2021</year>;<volume>5</volume>:<fpage>102</fpage>&#x2013;<lpage>111</lpage>.
                    <pub-id pub-id-type="pmid">33439724</pub-id>
                    <pub-id pub-id-type="doi">10.1200/cci.20.00087</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8140792</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref74">
                <label>74</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stylianou</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Razis</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goulis</surname>
                            <given-names>DG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>EBM+: Advancing Evidence-Based Medicine via two level automatic identification of Populations, Interventions, Outcomes in medical literature.</article-title>
                    <source>

                        <italic toggle="yes">Artif Intell Med.</italic>
                    </source>
                    <year>2020</year>;<volume>108</volume>:<fpage>101949</fpage>.
                    <pub-id pub-id-type="pmid">32972669</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.artmed.2020.101949</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref75">
                <label>75</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Norman</surname>
                            <given-names>CR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Leeflang</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Spijker</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A distantly supervised dataset for automated data extraction from diagnostic studies.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 18th BioNLP Workshop and Shared Task.</italic>
                    </source> <publisher-loc>Florence, Italy</publisher-loc>; <year>2019</year>. pp. <fpage>105</fpage>&#x2013;<lpage>114</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/W19-5012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref76">
                <label>76</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Demner-Fushman</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Knowledge Extraction for Clinical Question Answering: Preliminary Results.</italic>
                    </source>
                    <year>2005</year>.</mixed-citation>
            </ref>
            <ref id="ref77">
                <label>77</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extracting Formulaic and Free Text Clinical Research Articles Metadata using Conditional Random Fields.</article-title>
                    <year>2010</year>. pp. <fpage>90</fpage>&#x2013;<lpage>95</lpage>.</mixed-citation>
            </ref>
            <ref id="ref78">
                <label>78</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extracting Subject Demographic Information From Abstracts of Randomized Clinical Trial Reports.</article-title>
                    <year>2007</year>. pp. <fpage>550</fpage>&#x2013;<lpage>554</lpage>.
                    <pub-id pub-id-type="pmid">17911777</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref79">
                <label>79</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bysani</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kan</surname>
                            <given-names>M-Y</given-names>
                        </name>
</person-group>:
                    <article-title>Exploiting Classification Correlations for the Extraction of Evidence-based Practice Information.</article-title>
                    <year>2012</year>.
                    <pub-id pub-id-type="pmid">23304383</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3540431</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref80">
                <label>80</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Raja</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards Evidence-based Precision Medicine: Extracting Population Information from Biomedical Text using Binary Classifiers and Syntactic Patterns.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Jt Summits Transl Sci Proc.</italic>
                    </source>
                    <year>2016</year>;<volume>2016</volume>:<fpage>203</fpage>&#x2013;<lpage>212</lpage>.
                    <pub-id pub-id-type="pmid">27570671</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5001749</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref81">
                <label>81</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marshall</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Automating Biomedical Evidence Synthesis: RobotReviewer</chapter-title>. In:
                    <source>

                        <italic toggle="yes">Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.</italic>
                    </source> Eds.
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Bansal</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ji</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>.
                    <publisher-loc>Stroudsburg</publisher-loc>:
                    <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2017</year>. pp. <fpage>7</fpage>&#x2013;<lpage>12</lpage>.</mixed-citation>
            </ref>
            <ref id="ref82">
                <label>82</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liao</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lapata</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PICO entity extraction for preclinical animal literature.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2022</year>;<volume>11</volume>(<issue>1</issue>):<fpage>209</fpage>.
                    <pub-id pub-id-type="pmid">36180888</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13643-022-02074-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9524079</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref83">
                <label>83</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Summerscales</surname>
                            <given-names>RL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>A</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hupert</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Identifying treatments, groups, and outcomes in medical abstracts.</article-title>
                    <year>2009</year>.</mixed-citation>
            </ref>
            <ref id="ref84">
                <label>84</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Summerscales</surname>
                            <given-names>RL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>Automatic Summarization of Results from Clinical Trials</chapter-title>. In:
                    <source>

                        <italic toggle="yes">2011 IEEE International Conference on Bioinformatics and Biomedicine.</italic>
                    </source>
                    <year>2011</year>.</mixed-citation>
            </ref>
            <ref id="ref85">
                <label>85</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Weng</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Pretraining to Recognize PICO Elements from Randomized Controlled Trial Literature.</article-title>
                    <source>

                        <italic toggle="yes">Stud Health Technol Inform.</italic>
                    </source>
                    <year>2019</year>;<volume>264</volume>:<fpage>188</fpage>&#x2013;<lpage>192</lpage>.
                    <pub-id pub-id-type="pmid">31437911</pub-id>
                    <pub-id pub-id-type="doi">10.3233/SHTI190209</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6852618</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref86">
                <label>86</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bui</surname>
                            <given-names>DDA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extractive text summarization system to aid data extraction from full text in systematic review development.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2016</year>;<volume>64</volume>:<fpage>265</fpage>&#x2013;<lpage>272</lpage>.
                    <pub-id pub-id-type="pmid">27989816</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2016.10.014</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5362293</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref87">
                <label>87</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xia</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Extracting PICO elements from RCT abstracts using 1-2gram analysis and multitask classification.</article-title>
                    <source>

                        <italic toggle="yes">CoRR.</italic>
                    </source>
                    <year>2019</year>.<fpage>abs/1901.08351</fpage>.
                    <pub-id pub-id-type="doi">10.1145/3340037.3340043</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref88">
                <label>88</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Valdez</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rueschman</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text</chapter-title>. in:
                    <source>

                        <italic toggle="yes">On the Move to Meaningful Internet Systems: OTM 2016 Conferences.</italic>
                    </source>
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Debruyne</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>, Editors.<year>2016</year>;
                    <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing AG</publisher-name>. pp.<fpage>699</fpage>&#x2013;<lpage>708</lpage>.</mixed-citation>
            </ref>
            <ref id="ref89">
                <label>89</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>GY</given-names>
                        </name>
</person-group>:
                    <article-title>Sentence retrieval for abstracts of randomized controlled trials.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2009</year>;<volume>9</volume>:<fpage>10</fpage>.
                    <pub-id pub-id-type="pmid">19208256</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1472-6947-9-10</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2657779</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref90">
                <label>90</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>GYC</given-names>
                        </name>
</person-group>:
                    <article-title>Towards identifying intervention arms in randomized controlled trials: Extracting coordinating constructions.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2009</year>;<volume>42</volume>(<issue>5</issue>):<fpage>790</fpage>&#x2013;<lpage>800</lpage>.
                    <pub-id pub-id-type="pmid">19166975</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2008.12.011</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref91">
                <label>91</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Coiera</surname>
                            <given-names>EW</given-names>
                        </name>
</person-group>:
                    <article-title>A Study of Structured Clinical Abstracts and the Semantic Classification of Sentences.</article-title>
                    <year>2007</year>. p.<fpage>121</fpage>&#x2013;<lpage>128</lpage>.</mixed-citation>
            </ref>
            <ref id="ref92">
                <label>92</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Classification of PICO elements by text features systematically extracted from PubMed abstracts.</article-title>
                    <source>

                        <italic toggle="yes">2011 IEEE International Conference on Granular Computing.</italic>
                    </source>
                    <year>2011</year>.</mixed-citation>
            </ref>
            <ref id="ref93">
                <label>93</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hara</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Matsumoto</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>Extracting Clinical Trial Design Information from MEDLINE Abstracts.</article-title>
                    <source>

                        <italic toggle="yes">New Gener. Comput.</italic>
                    </source>
                    <year>2007</year>;<volume>25</volume>(<issue>3</issue>):<fpage>263</fpage>&#x2013;<lpage>275</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00354-007-0017-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref94">
                <label>94</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatic extracting of patient-related attributes: disease, age, gender and race.</article-title>
                    <source>

                        <italic toggle="yes">Stud Health Technol Inform.</italic>
                    </source>
                    <year>2012</year>;<volume>180</volume>:<fpage>589</fpage>&#x2013;<lpage>593</lpage>.
                    <pub-id pub-id-type="pmid">22874259</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref95">
                <label>95</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Weeds</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Higgins</surname>
                            <given-names>JPT</given-names>
                        </name>
</person-group>:
                    <article-title>Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks.</article-title>
                    <year>2020</year>. p.<fpage>83</fpage>&#x2013;<lpage>94</lpage>.</mixed-citation>
            </ref>
            <ref id="ref96">
                <label>96</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jin</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Szolovits</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the BioNLP 2018 workshop.</italic>
                    </source>Melbourne, Australia.<year>2018</year>. p.<fpage>67</fpage>&#x2013;<lpage>75</lpage>.
                    <pub-id pub-id-type="doi">10.18653/v1/W18-2308</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref97">
                <label>97</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Demner-Fushman</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Finding medication doses in the literature.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2018</year>;<volume>2018</volume>: p.<fpage>368</fpage>&#x2013;<lpage>376</lpage>.
                    <pub-id pub-id-type="pmid">30815076</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6371291</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref98">
                <label>98</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Geng</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Aceso: PICO-Guided Evidence Summarization on Medical Literature.</article-title>
                    <source>

                        <italic toggle="yes">IEEE J Biomed Health Inform.</italic>
                    </source>
                    <year>2020</year>;<volume>24</volume>(<issue>9</issue>):<fpage>2663</fpage>&#x2013;<lpage>2670</lpage>.
                    <pub-id pub-id-type="pmid">32275627</pub-id>
                    <pub-id pub-id-type="doi">10.1109/JBHI.2020.2984704</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref99">
                <label>99</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Turfah</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A neuro-symbolic method for understanding free-text medical evidence.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2021</year>;<volume>28</volume>(<issue>8</issue>):<fpage>1703</fpage>&#x2013;<lpage>1711</lpage>.
                    <pub-id pub-id-type="pmid">33956981</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocab077</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8135980</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref100">
                <label>100</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sent2Span: span detection for PICO extraction in the biomedical text without span annotations.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2109.02254.</italic>
                    <year>2021</year>.
                </mixed-citation>
            </ref>
            <ref id="ref101">
                <label>101</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>BE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time.</article-title>
                    <source>

                        <italic toggle="yes">CoRR.</italic>
                    </source>
                    <year>2020</year>.<fpage>abs/2005.10865</fpage>.</mixed-citation>
            </ref>
            <ref id="ref102">
                <label>102</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Blake</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lucic</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Automatic endpoint detection to support the systematic review process.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2015</year>;<volume>56</volume>:<fpage>42</fpage>&#x2013;<lpage>56</lpage>.
                    <pub-id pub-id-type="pmid">26003938</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2015.05.004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref103">
                <label>103</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>KC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PICO element detection in medical text without metadata: are first sentences enough?</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2013</year>;<volume>46</volume>(<issue>5</issue>):<fpage>940</fpage>&#x2013;<lpage>946</lpage>.
                    <pub-id pub-id-type="pmid">23899909</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2013.07.009</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref104">
                <label>104</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hassanzadeh</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Groza</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Identifying scientific artefacts in biomedical literature: The Evidence Based Medicine use case.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Inform.</italic>
                    </source>
                    <year>2014</year>;<volume>49</volume>:<fpage>159</fpage>&#x2013;<lpage>170</lpage>.
                    <pub-id pub-id-type="pmid">24530879</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jbi.2014.02.006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref105">
                <label>105</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Burnham</surname>
                            <given-names>KP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anderson</surname>
                            <given-names>DR</given-names>
                        </name>
</person-group>:
                    <article-title>Model Selection and Multimodel Inference (2nd ed.).</article-title>
                    <year>2002</year>;
                    <publisher-name>Springer-Verlag</publisher-name>.</mixed-citation>
            </ref>
            <ref id="ref106">
                <label>106</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brockmeier</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improving reference prioritisation with PICO recognition.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2019</year>;<volume>19</volume>(<issue>1</issue>):<fpage>14</fpage>.
                    <pub-id pub-id-type="pmid">31805934</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12911-019-0992-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6896258</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref107">
                <label>107</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gella</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Long</surname>
                            <given-names>DT</given-names>
                        </name>
</person-group>:
                    <article-title>Automatic sentence classifier using sentence ordering features for Event Based Medicine: Shared task system description.</article-title>
                    <year>2012</year>. p.<fpage>130</fpage>&#x2013;<lpage>133</lpage>.</mixed-citation>
            </ref>
            <ref id="ref108">
                <label>108</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lui</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Feature Stacking for Sentence Classification in Evidence-Based Medicine.</article-title>
                    <year>2012</year>:<fpage>134</fpage>&#x2013;<lpage>138</lpage>.</mixed-citation>
            </ref>
            <ref id="ref109">
                <label>109</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Moll&#x00e1;</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Experiments with Clustering-based Features for Sentence Classification in Medical Publications: Macquarie Test's participation in the ALTA 2012 shared task.</article-title>
                    <year>2012</year>:<fpage>139</fpage>&#x2013;<lpage>142</lpage>.</mixed-citation>
            </ref>
            <ref id="ref110">
                <label>110</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sarker</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">An Approach for automatic multi-label classification of medical sentences.</italic>
                    </source>
                    <publisher-loc>Eveleigh NSW</publisher-loc>:
                    <publisher-name>NICTA</publisher-name>;<year>2013</year>.</mixed-citation>
            </ref>
            <ref id="ref111">
                <label>111</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lehman</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barzilay</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Inferring which medical treatments work from reports of clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:1904.01606.</italic>
                    </source>
                    <year>2019</year>.</mixed-citation>
            </ref>
            <ref id="ref112">
                <label>112</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Trenta</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Riedel</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Extraction of evidence tables from abstracts of randomized clinical trials using a maximum entropy classifier and global constraints.</article-title>
                    <source>

                        <italic toggle="yes">CoRR.</italic>
                    </source>
                    <year>2015</year>.<fpage>abs/1509.05209</fpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1509.05209">http://arxiv.org/abs/1509.05209</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref113">
                <label>113</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hansen</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rasmussen</surname>
                            <given-names>N&#x00d8;</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A method of extracting the number of trial participants from abstracts describing randomized controlled trials.</article-title>
                    <source>

                        <italic toggle="yes">J Telemed Telecare.</italic>
                    </source>
                </mixed-citation>
            </ref>
            <ref id="ref114">
                <label>114</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boudin</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Combining classifiers for robust PICO element detection.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Inform Decis Mak.</italic>
                    </source>
                    <year>2010</year>;<volume>10</volume>:<fpage>29</fpage>.
                    <pub-id pub-id-type="pmid">20470429</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1472-6947-10-29</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2891622</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref115">
                <label>115</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chabou</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Iglewski</surname>
                            <given-names>M</given-names>
                        </name>

</person-group>:
                    <article-title>PICO Extraction by combining the robustness of machine-learning methods with the rule-based methods.</article-title>
                    <source>

                        <italic toggle="yes">2015 World Congress on Information Technology and Computer Applications.</italic>
                    </source>
                    <year>2015</year>.
                    <publisher-loc>New York</publisher-loc>:
                    <publisher-name>IEEE</publisher-name>.</mixed-citation>
            </ref>
            <ref id="ref116">
                <label>116</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dawes</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The identification of clinically important elements within medical journal abstracts: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results (PECODR).</article-title>
                    <source>

                        <italic toggle="yes">Inform Prim Care.</italic>
                    </source>
                    <year>2007</year>;<volume>15</volume>(<issue>1</issue>):<fpage>9</fpage>&#x2013;<lpage>16</lpage>.
                    <pub-id pub-id-type="pmid">17612476</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref117">
                <label>117</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Riley</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Three pitfalls to avoid in machine learning.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
                    </source>
                    <year>2019</year>;<volume>572</volume>(<issue>7767</issue>).
                    <pub-id pub-id-type="pmid">31363197</pub-id>
                    <pub-id pub-id-type="doi">10.1038/d41586-019-02307-y</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref118">
                <label>118</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Amir</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Meent</surname>
                            <given-names>J-W</given-names>
                            <prefix>van de</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Wallace</surname>
                            <given-names>BC</given-names>
                        </name>
</person-group>:
                    <article-title>On the impact of random seeds on the fairness of clinical classifiers.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2104.06338.</italic>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref119">
                <label>119</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mehrabi</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A survey on bias and fairness in machine learning.</article-title>
                    <source>

                        <italic toggle="yes">arXiv.</italic>
                    </source>
                    <year>2019</year>.</mixed-citation>
            </ref>
            <ref id="ref120">
                <label>120</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mann</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ryder</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <italic toggle="yes">Language Models are Few-Shot Learners.</italic>
                    <year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf">https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref121">
                <label>121</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ott</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goyal</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RoBERTa: A robustly optimized BERT pretraining approach.</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:1907.11692.</italic>
                    <year>2019</year>.</mixed-citation>
            </ref>
            <ref id="ref122">
                <label>122</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jin</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tang</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2304.13712.</italic>
                    </source>
                    <year>2023</year>.
                    <pub-id pub-id-type="pmid">37396605</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref123">
                <label>123</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>OpenAI.</surname>
                        </name>
</person-group>:
                    <article-title>GPT-4 Technical Report.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv.</italic>
                    </source>
                    <year>2023</year>; abs/2303.08774.</mixed-citation>
            </ref>
            <ref id="ref124">
                <label>124</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shaib</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>ML</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Joseph</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success).</article-title>
                    <italic toggle="yes">arXiv preprint arXiv:2305.06299.</italic>
                    <year>2023</year>.
                    <pub-id pub-id-type="pmcid">PMC9128209</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref125">
                <label>125</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wadhwa</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2305.03642.</italic>
                    </source>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref126">
                <label>126</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wadhwa</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Amir</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wallace</surname>
                            <given-names>BC</given-names>
                        </name>
</person-group>:
                    <article-title>Revisiting Relation Extraction in the era of Large Language Models.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2305.05003.</italic>
                    </source>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref127">
                <label>127</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Appendix for base review.</article-title>
                    <source>

                        <italic toggle="yes">Harvard Dataverse, V4, UNF:6:0z0ZlKmB1VglRVObRackrw== [fileUNF].</italic>
                    </source>
                    <year>2020</year>.
                    <pub-id pub-id-type="doi">10.7910/DVN/LNGCOQ</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref128">
                <label>128</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Available datasets for SR automation.</article-title>
                    <source>

                        <italic toggle="yes">Harvard Dataverse, V1.</italic>
                    </source>
                    <year>2021</year>.
                    <pub-id pub-id-type="doi">10.7910/DVN/0XTV25</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref129">
                <label>129</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aletaha</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nemati-Anaraki</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Keshtkar</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Scoping Review of Adopted Information Extraction Methods for RCTs.</article-title>
                    <source>

                        <italic toggle="yes">Med J Islam Repub Iran.</italic>
                    </source>
                    <year>2023</year>;<volume>37</volume>:<fpage>95</fpage>.
                    <pub-id pub-id-type="pmid">38021383</pub-id>
                    <pub-id pub-id-type="doi">10.47176/mjiri.37.95</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10657257</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref130">
                <label>130</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Beltagy</surname>
                            <given-names>I</given-names>
                        </name>
                        <name name-style="western">
                            <surname>Lo</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cohan</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>SciBERT: A pretrained language model for scientific text.</article-title>
                    <source>
                        <italic toggle="yes">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing.</italic>
                    </source>
                    <year>2019</year>.
                    <pub-id pub-id-type="doi">10.18653/v1/d19-1371</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref131">
                <label>131</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cao</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>LLM Collaboration PLM Improves Critical Information Extraction Tasks in Medical Articles.</article-title>
                    <source>
                        <italic toggle="yes">China Health Information Processing Conference,</italic>
                    </source>
                    <publisher-loc>China</publisher-loc>.<year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref132">
                <label>132</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An extensive benchmark study on biomedical text generation and mining with ChatGPT.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
                    </source>
                    <year>2023</year>;<volume>39</volume>(<issue>9</issue>).
                    <pub-id pub-id-type="pmid">37682111</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btad557</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10562950</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref133">
                <label>133</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Devane</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Burke</surname>
                            <given-names>NN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Treweek</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Study within a review (SWAR).</article-title>
                    <source>

                        <italic toggle="yes">J Evid Based Med.</italic>
                    </source>
                    <year>2022</year>;<volume>15</volume>(<issue>4</issue>):<fpage>328</fpage>&#x2013;<lpage>332</lpage>.
                    <pub-id pub-id-type="pmid">36513956</pub-id>
                    <pub-id pub-id-type="doi">10.1111/jebm.12505</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10107874</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref134">
                <label>134</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dhrangadhariya</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Manzo</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>M&#x00fc;ller</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <chapter-title>PICO to PICOS: Weak Supervision to Extend Datasets with New Labels</chapter-title>.
                    <source>

                        <italic toggle="yes">Stud Health Technol Inform.</italic>
                    </source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.3233/shti240775</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref135">
                <label>135</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dhrangadhariya</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>M&#x00fc;ller</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>DISTANT-CTO: a zero cost, distantly supervised approach to improve low-resource entity extraction using clinical trials literature.</article-title>
                    <source>

                        <italic toggle="yes">ACL Anthol.</italic>
                    </source>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.18653/v1/2022.bionlp-1.34</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref136">
                <label>136</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Duan</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jiang</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Boundary-aware Dual Biaffine Model for Sequential Sentence Classification in Biomedical Documents.</article-title>
                    <source>

                        <italic toggle="yes">IEEE/ACM Trans Comput Biol Bioinform.</italic>
                    </source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.1109/tcbb.2024.3376566</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref137">
                <label>137</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

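                        <name name-style="western">
                            <surname>Alam</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Giglou</surname>
                            <given-names>HB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Malik</surname>
                            <given-names>KM</given-names>
                        </name>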
</person-group>:
                    <article-title>Automated clinical knowledge graph generation framework for evidence based medicine.</article-title>
                    <source>

                        <italic toggle="yes">Expert Syst Appl.</italic>
                    </source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.1016/j.eswa.2023.120964</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref138">
                <label>138</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mukherjee</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ganguly</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>AlpaPICO: Extraction of PICO frames from clinical trial documents using LLMs.</article-title>
                    <source>

                        <italic toggle="yes">Methods.</italic>
                    </source>
                    <year>2024</year>;<volume>226</volume>:<fpage>78</fpage>&#x2013;<lpage>88</lpage>.
                    <pub-id pub-id-type="pmid">38643910</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ymeth.2024.04.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref139">
                <label>139</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mukherjee</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Santra</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>BLINK<sub>LSTM</sub>: BioLinkBERT and LSTM based approach for extraction of PICO frame from Clinical Trial Text</chapter-title>.
                    <source>Proceedings of the 7th Joint International Conference on Data Science &amp; Management of Data (11th ACM IKDD CODS and 29th COMAD)</source>,
                    <publisher-loc>Bangalore, India</publisher-loc>;<year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref140">
                <label>140</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ghosh</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tyagi</surname>
                            <given-names>U</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>S</given-names>
                        </name>
                        <etal/>
</person-group>:
                    <article-title>BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER.</article-title>
                    <source>
                        <italic toggle="yes">SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval.</italic>
                    </source>
                    <year>2023</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://openalex.org/works/W4384656653">https://openalex.org/works/W4384656653</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref141">
                <label>141</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tinn</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing.</article-title>
                    <source>

                        <italic toggle="yes">ACM Trans Comput Heal.</italic>
                    </source>
                    <year>2020</year>;<volume>3</volume>:<fpage>1</fpage>&#x2013;<lpage>23</lpage>.</mixed-citation>
            </ref>
            <ref id="ref142">
                <label>142</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hammer</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Virgili</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bilotta</surname>
                            <given-names>F</given-names>
                        </name>
</person-group>:
                    <article-title>Evidence-based literature review: De-duplication a cornerstone for quality.</article-title>
                    <source>

                        <italic toggle="yes">World J Methodol.</italic>
                    </source>
                    <year>2023</year>;<volume>13</volume>(<issue>5</issue>):<fpage>390</fpage>&#x2013;<lpage>398</lpage>.
                    <pub-id pub-id-type="pmid">38229943</pub-id>
                    <pub-id pub-id-type="doi">10.5662/wjm.v13.i5.390</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10789108</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref143">
                <label>143</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guan</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kilicoglu</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Methodological information extraction from randomized controlled trial publications: a pilot study.</article-title>
                    <source>

                        <italic toggle="yes">AMIA Annu Symp Proc.</italic>
                    </source>
                    <year>2022</year>;<volume>2022</volume>:<fpage>542</fpage>&#x2013;<lpage>551</lpage>.</mixed-citation>
            </ref>
            <ref id="ref144">
                <label>144</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Howard</surname>
                            <given-names>BE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Phillips</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tandon</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SWIFT-Active Screener: Accelerated document screening through active learning and integrated recall estimation.</article-title>
                    <source>

                        <italic toggle="yes">Environ Int.</italic>
                    </source>
                    <year>2020</year>;<volume>138</volume>:<fpage>105623</fpage>.</mixed-citation>
            </ref>
            <ref id="ref145">
                <label>145</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Keloth</surname>
                            <given-names>VK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Raja</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
                    </source>
                    <year>2023</year>;<volume>39</volume>(<issue>9</issue>).
                    <pub-id pub-id-type="pmid">37669123</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btad542</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10500081</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref146">
                <label>146</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jardim</surname>
                            <given-names>PSJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rose</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ames</surname>
                            <given-names>HM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automating risk of bias assessment in systematic reviews: a real-time mixed methods comparison of human researchers to a machine learning system.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med Res Methodol.</italic>
                    </source>
                    <year>2022</year>;<volume>22</volume>(<issue>1</issue>):<fpage>167</fpage>.
                    <pub-id pub-id-type="pmid">35676632</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12874-022-01649-y</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9174024</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref147">
                <label>147</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jiang</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fan</surname>
                            <given-names>Y-C</given-names>
                        </name>
</person-group>:
                    <article-title>Biomedical Abstract Sentence Classification by BERT-Based Reading Comprehension.</article-title>
                    <source>

                        <italic toggle="yes">SN Comput Sci.</italic>
                    </source>
                    <year>2023</year>;<volume>4</volume>(<issue>4</issue>).
                    <pub-id pub-id-type="doi">10.1007/s42979-023-01830-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref148">
                <label>148</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jiang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lan</surname>
                            <given-names>M-S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Menke</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Text classification models for assessing the completeness of randomized controlled trial publications based on CONSORT reporting guidelines.</article-title>
                    <source>

                        <italic toggle="yes">Sci Rep.</italic>
                    </source>
                    <year>2024</year>;<volume>14</volume>(<issue>1</issue>).
                    <pub-id pub-id-type="pmid">39289403</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41598-024-72130-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11408668</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref149">
                <label>149</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Joseph</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trienes</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence.</article-title>
                    <source>
                        <italic toggle="yes">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand.</italic>
                    </source>
                    <year>2024</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/2024.acl-long.459">https://aclanthology.org/2024.acl-long.459</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref150">
                <label>150</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sun</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>JH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>EvidenceMap: a three-level knowledge representation for medical evidence computation and comprehension.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2023</year>;<volume>30</volume>(<issue>6</issue>):<fpage>1022</fpage>&#x2013;<lpage>1031</lpage>.
                    <pub-id pub-id-type="pmid">36921288</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocad036</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10198523</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref151">
                <label>151</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Karabulut</surname>
                            <given-names>ME</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vijay-Shanker</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>Sectioning of Biomedical Abstracts: A Sequence of Sequence Classification Task.</article-title>
                    <source>
                        <italic toggle="yes">ArXiv Preprint</italic>
                    </source>.<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://export.arxiv.org/abs/2201.07112">https://export.arxiv.org/abs/2201.07112</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref152">
                <label>152</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kartchner</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ramalingam</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Hussaini</surname>
                            <given-names>I</given-names>
                        </name>
                        <etal/>
</person-group>:
                    <chapter-title>Zero-Shot Information Extraction for Clinical Meta-Analysis using Large Language Models</chapter-title>.
                    <source>
                        <italic toggle="yes">The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, Canada.</italic>
                    </source>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref153">
                <label>153</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lam</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pham</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv Preprint</italic>.</source>
                    <year>2024</year>.
                    <comment>
                        <ext-link ext-link-type="uri" xlink:href="https://export.arxiv.org/abs/2401.15854">https://export.arxiv.org/abs/2401.15854</ext-link>
                    </comment>
                </mixed-citation>
            </ref>
            <ref id="ref154">
                <label>154</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paek</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huang</surname>
                            <given-names>LC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SEETrials: Leveraging Large Language Models for Safety and Efficacy Extraction in Oncology Clinical Trials.</article-title>
                    <source>

                        <italic toggle="yes">medRxiv</italic>.</source>
                    <year>2024</year>.
                    <pub-id pub-id-type="pmid">38798420</pub-id>
                    <pub-id pub-id-type="doi">10.1101/2024.01.18.24301502</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11118548</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref155">
                <label>155</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Legate</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nimon</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Noblin</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>(Semi) automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review [version 2; peer review: 2 approved, 1 approved with reservations].</article-title>
                    <source>

                        <italic toggle="yes">F1000Research.</italic>
                    </source>
                    <year>2024</year>;<volume>13</volume>:<fpage>664</fpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.151493.2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref156">
                <label>156</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Luan</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automated information extraction model enhancing traditional Chinese medicine RCT evidence extraction (Evi-BERT): algorithm development and validation.</article-title>
                    <source>

                        <italic toggle="yes">Front Artif Intell.</italic>
                    </source>
                    <year>2024</year>;<volume>7</volume>.
                    <pub-id pub-id-type="pmid">39210937</pub-id>
                    <pub-id pub-id-type="doi">10.3389/frai.2024.1454945</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11358118</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref157">
                <label>157</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Moon</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Sample Size Extractor for RCT Reports.</article-title>
                    <source>

                        <italic toggle="yes">Stud Health Technol Inform.</italic>
                    </source>
                    <year>2022</year>;<volume>290</volume>:<fpage>617</fpage>&#x2013;<lpage>621</lpage>.
                    <pub-id pub-id-type="pmid">35673090</pub-id>
                    <pub-id pub-id-type="doi">10.3233/shti220151</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref158">
                <label>158</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ge</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lai</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>AI-Driven Evidence Synthesis: Data Extraction of Randomized Controlled Trials with Large Language Models.</article-title>
                    <source>
                        <italic toggle="yes">SSRN</italic>.</source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.2139/ssrn.4870368</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref159">
                <label>159</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>JJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Patel</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.</article-title>
                    <source>

                        <italic toggle="yes">Proc Conf Assoc Comput Linguist Meet.</italic>
                    </source>
                    <year>2018</year>;<volume>2018</volume>:<fpage>197</fpage>&#x2013;<lpage>207</lpage>.
                    <pub-id pub-id-type="pmid">30305770</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref160">
                <label>160</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ofori-Boateng</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aceves-Martins</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wiratunga</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards the automation of systematic reviews using natural language processing, machine learning, and deep learning: a comprehensive review.</article-title>
                    <source>

                        <italic toggle="yes">Artif Intell Rev.</italic>
                    </source>
                    <year>2024</year>;<volume>57</volume>(<issue>8</issue>):<fpage>200</fpage>.
                    <pub-id pub-id-type="doi">10.1007/s10462-024-10844-w</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref161">
                <label>161</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Panayi</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Benhadji-Schaff</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of a prototype machine learning tool to semi-automate data extraction for systematic literature reviews.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2023</year>;<volume>12</volume>(<issue>1</issue>):<fpage>187</fpage>.
                    <pub-id pub-id-type="pmid">37803451</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13643-023-02351-w</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10557215</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref162">
                <label>162</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Peeters</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vijverberg</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pouwer</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluation of SURUS: a Named Entity Recognition System to Extract Knowledge from Interventional Study Records.</article-title>
                    <source>
                        <italic toggle="yes">medRxiv (Cold Spring Harbor Laboratory).</italic>
                    </source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.1101/2024.05.31.24308278</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref163">
                <label>163</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Reason</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Benbow</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langham</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.</article-title>
                    <source>

                        <italic toggle="yes">Pharmacoecon Open.</italic>
                    </source>
                    <year>2024</year>;<volume>8</volume>(<issue>2</issue>):<fpage>205</fpage>&#x2013;<lpage>220</lpage>.
                    <pub-id pub-id-type="pmid">38340277</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s41669-024-00476-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10884375</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref164">
                <label>164</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Reason</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langham</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gimblett</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Automated Mass Extraction of Over 680,000 PICOs from Clinical Study Abstracts Using Generative AI: A Proof-of-Concept Study.</article-title>
                    <source>

                        <italic toggle="yes">Pharm Med.</italic>
                    </source>
                    <year>2024</year>;<volume>38</volume>:<fpage>365</fpage>&#x2013;<lpage>372</lpage>.
                    <pub-id pub-id-type="pmid">39327389</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s40290-024-00539-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11473607</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref165">
                <label>165</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hair</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Graziosi</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exploring the use of a Large Language Model for Data Extraction in Systematic Reviews: a Rapid Feasibility Study.</article-title>
                    <source>
                        <italic toggle="yes">Proceedings of the 3rd Workshop on Augmented Intelligence for Technology-Assisted Reviews Systems (ALTARS 2024), Glasgow.</italic>
                    </source>
                    <year>2024</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://ceur-ws.org/Vol-3832/paper2.pdf">https://ceur-ws.org/Vol-3832/paper2.pdf</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref166">
                <label>166</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>EvidenceTriangulator: A Large Language Model Approach to Synthesizing Causal Evidence across Study Designs.</article-title>
                    <source>
                        <italic toggle="yes">medRxiv (Cold Spring Harbor Laboratory)</italic>
                    </source>.<year>2024</year>.
                    <pub-id pub-id-type="doi">10.1101/2024.03.18.24304457</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref167">
                <label>167</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Singh</surname>
                        </name>

                        <name name-style="western">
                            <surname>Sharan</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>PICO Classification Using Domain-Specific Features.</article-title>
                    <source>

                        <italic toggle="yes">Lecture Notes in Electrical Engineering</italic>
                    </source>
                    <year>2024</year>:<fpage>447</fpage>&#x2013;<lpage>457</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-981-99-8646-0_35</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref168">
                <label>168</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tam</surname>
                            <given-names>TYC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sivarajkumar</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kapoor</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A framework for human evaluation of large language models in healthcare derived from literature review.</article-title>
                    <source>

                        <italic toggle="yes">NPJ Digit Med.</italic>
                    </source>
                    <year>2024</year>;<volume>7</volume>(<issue>1</issue>):<fpage>258</fpage>.
                    <pub-id pub-id-type="pmid">39333376</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41746-024-01258-7</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11437138</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref169">
                <label>169</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tinn</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fine-tuning large neural language models for biomedical natural language processing.</article-title>
                    <source>

                        <italic toggle="yes">Patterns.</italic>
                    </source>
                    <year>2023</year>;<volume>4</volume>(<issue>4</issue>):<fpage>100729</fpage>.
                    <pub-id pub-id-type="pmid">37123444</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.patter.2023.100729</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10140607</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref170">
                <label>170</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>T&#x00f3;th</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Berek</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gul&#x00e1;csi</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed.</article-title>
                    <source>

                        <italic toggle="yes">Syst Rev.</italic>
                    </source>
                    <year>2024</year>;<volume>13</volume>(<issue>1</issue>):<fpage>174</fpage>.
                    <pub-id pub-id-type="pmid">38978132</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13643-024-02592-3</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11229257</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref171">
                <label>171</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wadhwa</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>DeYoung</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nye</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv Preprint</italic>.</source>
                    <year>2023</year>.
                    <comment>
                        <ext-link ext-link-type="uri" xlink:href="https://export.arxiv.org/abs/2305.03642">https://export.arxiv.org/abs/2305.03642</ext-link>
                    </comment>
                </mixed-citation>
            </ref>
            <ref id="ref172">
                <label>172</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Prompt engineering for healthcare: Methodologies and applications.</article-title>
                    <source>

                        <italic toggle="yes">arXiv preprint arXiv:2304.14670</italic>.</source>
                    <year>2023</year>.</mixed-citation>
            </ref>
            <ref id="ref173">
                <label>173</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cao</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Danek</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Accelerating clinical evidence synthesis with large language models.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv Preprint</italic>.</source>
                    <year>2024</year>.
                    <comment>
                        <ext-link ext-link-type="uri" xlink:href="https://export.arxiv.org/abs/2406.17755">https://export.arxiv.org/abs/2406.17755</ext-link>
                    </comment>
                </mixed-citation>
            </ref>
            <ref id="ref174">
                <label>174</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Windisch</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dennst&#x00e4;dt</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Koechli</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <source>

                        <italic toggle="yes">Extracting the Sample Size From Randomized Controlled Trials in Explainable Fashion Using Natural Language Processing.</italic>
                    </source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.1101/2024.07.09.24310155</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref175">
                <label>175</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Witte</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cimiano</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction.</article-title>
                    <source>
                        <italic toggle="yes">Proceedings of the 21st Workshop on Biomedical Language Processing</italic>
                    </source>.<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/2022.bionlp-1.18">https://aclanthology.org/2022.bionlp-1.18</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref176">
                <label>176</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Witte</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schmidt</surname>
                            <given-names>DM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cimiano</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Comparing generative and extractive approaches to information extraction from abstracts describing randomized clinical trials.</article-title>
                    <source>

                        <italic toggle="yes">J Biomed Semantics.</italic>
                    </source>
                    <year>2024</year>;<volume>15</volume>(<issue>1</issue>):<fpage>3</fpage>.
                    <pub-id pub-id-type="pmid">38654304</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13326-024-00305-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11036632</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref177">
                <label>177</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yun</surname>
                            <given-names>HS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pogrebitskiy</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marshall</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models.</article-title>
                    <source>

                        <italic toggle="yes">ArXiv Preprint</italic>.</source>
                    <year>2024</year>.
                    <comment>
                        <ext-link ext-link-type="uri" xlink:href="https://export.arxiv.org/abs/2405.01686">https://export.arxiv.org/abs/2405.01686</ext-link>
                    </comment>
                </mixed-citation>
            </ref>
            <ref id="ref178">
                <label>178</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zafar</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wali</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kunwar</surname>
                            <given-names>C-O</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A pipeline for medical literature search and its evaluation.</article-title>
                    <source>

                        <italic toggle="yes">J Inf Sci.</italic>
                    </source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.1177/01655515231161557</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref179">
                <label>179</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A span-based model for extracting overlapping PICO entities from randomized controlled trial publications.</article-title>
                    <source>

                        <italic toggle="yes">J Am Med Inform Assoc.</italic>
                    </source>
                    <year>2024</year>;<volume>31</volume>(<issue>5</issue>):<fpage>1163</fpage>&#x2013;<lpage>1171</lpage>.
                    <pub-id pub-id-type="pmid">38471120</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jamia/ocae065</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref180">
                <label>180</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tian</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zheng</surname>
                            <given-names>Y</given-names>
                        </name>
                        <etal/>
</person-group>:
                    <chapter-title>Enhancing PICOS Information Extraction with UIE and ERNIE-Health</chapter-title>.
                    <source>
                        <italic toggle="yes">China Health Information Processing Conference, China</italic>
                    </source>.<year>2024</year>.</mixed-citation>
            </ref>
            <ref id="ref181">
                <label>181</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Task-Specific Model Allocation Medical Papers PICOS Information Extraction.</article-title>
                    <source>

                        <italic toggle="yes">Commun Comput Inf Sci.</italic>
                    </source>
                    <year>2024</year>:<fpage>166</fpage>&#x2013;<lpage>177</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-981-97-1717-0_15</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref182">
                <label>182</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zong</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yin</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tong</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Overview of CHIP 2023 Shared Task 5: Medical Literature PICOS Identification.</article-title>
                    <source>

                        <italic toggle="yes">Commun Comput Inf Sci.</italic>
                    </source>
                    <year>2024</year>:<fpage>159</fpage>&#x2013;<lpage>165</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-981-97-1717-0_14</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
        <fn-group content-type="footnotes">
            <fn id="fn1">
                <label>
                    <sup>1</sup>
                </label>
                <p>

                    <ext-link ext-link-type="uri" xlink:href="https://eppi.ioe.ac.uk/cms/default.aspx?tabid=3790">https://eppi.ioe.ac.uk/cms/default.aspx?tabid=3790</ext-link> (last accessed 31/10/2024)</p>
            </fn>
            <fn id="fn2">
                <label>
                    <sup>2</sup>
                </label>
                <p>

                    <ext-link ext-link-type="uri" xlink:href="https://microsoft.github.io/BLURB/tasks.html">https://microsoft.github.io/BLURB/tasks.html</ext-link> (last accessed 21/10/2024)</p>
            </fn>
            <fn id="fn3">
                <label>
                    <sup>3</sup>
                </label>
                <p>

                    <ext-link ext-link-type="uri" xlink:href="https://openai.com/index/introducing-gpt-4-5/">https://openai.com/index/introducing-gpt-4-5/</ext-link> (last accessed 07/03/2025)</p>
            </fn>
            <fn id="fn4">
                <label>
                    <sup>4</sup>
                </label>
                <p>

                    <ext-link ext-link-type="uri" xlink:href="https://upwork.com">https://upwork.com</ext-link> (last accessed 17/10/2024)</p>
            </fn>
        </fn-group>
    </back>
    <sub-article article-type="reviewer-report" id="report89347">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.54235.r89347</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Amezcua-Prieto</surname>
                        <given-names>Carmen</given-names>
                    </name>
                    <xref ref-type="aff" rid="r89347a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0957-4057</uri>
                </contrib>
                <aff id="r89347a1">
                    <label>1</label>Department of Preventive Medicine and Public Health, University of Granada, Granada, Spain</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>8</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Amezcua-Prieto C</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport89347" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.51117.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Data extraction in a systematic review is a hard and time-consuming task. The (semi)automation of data extraction in systematic reviews is an advantage for researchers and ultimately for evidence-based clinical practice. This living systematic review examines published approaches for data extraction from reports of clinical studies published up to a cut-off date of 22 April 2020. The authors included more than 50 publications in this version of their review that addressed extraction of data from abstracts, while fewer (26%) used full texts. They identified more publications describing data extraction for interventional reviews; publications extracting epidemiological or diagnostic accuracy data were limited.</p>
            <p> The main issues have been addressed in the systematic review: 
                <list list-type="bullet">
                    <list-item>
                        <p>The living systematic review approach is justified: the field of systematic review (semi)automation is evolving rapidly along with advances in language processing, machine learning, and deep learning.</p>
                    </list-item>
                    <list-item>
                        <p>The search and update schedules are clearly defined, as shown in Figure 1.</p>
                    </list-item>
                    <list-item>
                        <p>Sufficient details of the methods and analysis are provided to allow replication.</p>
                    </list-item>
                    <list-item>
                        <p>The conclusions drawn are adequately supported by the results presented in the review.</p>
                    </list-item>
                </list> </p>
            <p> One minor correction is suggested: 
                <list list-type="bullet">
                    <list-item>
                        <p>There is an incomplete sentence in the Methods: &#x2018;We included reports published from 2005 until the present day, similar to&#x2019;.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the rationale for, and objectives of, the Systematic Review clearly stated?</p>
            <p>Yes</p>
            <p>Is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Have the search and update schedule been clearly defined and justified?</p>
            <p>Yes</p>
            <p>Is the living method justified?</p>
            <p>Yes</p>
            <p>Are sufficient details of the methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results presented in the review?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report89348">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.54235.r89348</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Kaiser</surname>
                        <given-names>Kathryn A.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r89348a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6258-4369</uri>
                </contrib>
                <aff id="r89348a1">
                    <label>1</label>Department of Health Behavior, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>8</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Kaiser KA</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport89348" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.51117.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have undertaken and documented a &#x201c;living systematic review&#x201d; to monitor an area of research methods that is important to many around the world. The specific focus is on automated or semi-automated data extraction around the PICO structure often used in biomedicine, whether to summarize a body of literature narratively or using meta-analysis techniques. A significant irony about the body of papers included in this review is that a large amount of information on the performance of such methods is missing. Those who conduct systematic reviews know well how often the information needed to summarize a group of studies is missing.</p>
            <p> Readers who will be most interested in this ongoing work can keep an eye on the authors&#x2019; progress in identifying activities in this space. It is not clear, however, how long the funding will support this effort or how long the authors will remain engaged in advancing this project. The data presented in this paper do not give readers confidence that the community is approaching acceptable methods that are superior to other, less automated methods (the latter of which are not well discussed).</p>
            <p> Some aspects of the paper would benefit from additional detail (in no particular order of importance): 
                <list list-type="order">
                    <list-item>
                        <p>The end game for tracking this area of literature is not explicitly described in the abstract, nor is it discussed to a great extent at the end of the paper. Many of the results presented do not paint a bright future for this area of research under present conditions. While the aim is laid out well in section 1.2, the large amount of missing performance data (reported to be 87%) makes it impossible to address the&#x00a0;&#x201c;Is it reliable?&#x201d; question. One might suspect that if a project had demonstrated particularly stellar performance, those data would be prominently advertised. Thus, the yet-to-be-done step of contacting authors would be enlightening, whether performance data can be obtained or the authors remain silent on that request. This follow-up task will be a major point of interest for many who will follow updates to this paper. It is likely that the particular research context (e.g.&#x00a0;see Pham&#x00a0;
                            <italic>et al</italic>., 2021
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-89348-1">1</xref>
                            </sup>) will strongly influence whatever performance metrics can be determined.</p>
                    </list-item>
                    <list-item>
                        <p>A description of how the 17 &#x201c;Key items of interest&#x201d; were determined, and of whether there is a plan to put these forth as methodological guidelines or a reporting checklist, would be helpful. Either of these would help to advance the field further.</p>
                    </list-item>
                    <list-item>
                        <p>On Page 5, the listed exclusions include the use of text pre-processing, yet the results discuss many papers that appear to have used it in their methods. Perhaps this is a deviation from the original protocol after the review began (an understandable decision)?</p>
                    </list-item>
                    <list-item>
                        <p>In section 2.4 about searching PubMed, can the authors clarify whether the PubMed 2.0 API or GUI will be used to access candidate literature?</p>
                    </list-item>
                    <list-item>
                        <p>Also relevant to section 2.4 on searching: since GitHub is so popular, might this also be a fruitful place to search routinely?</p>
                    </list-item>
                    <list-item>
                        <p>Clarification of the ability to obtain cited software packages (whether for no cost or at some cost) would be helpful.</p>
                    </list-item>
                    <list-item>
                        <p>The explanation of PICO in Figure 3 contains a typo: &#x201c;PCIO&#x201d;.</p>
                    </list-item>
                    <list-item>
                        <p>Table 5 is shown before Table 1. Please check and correct the flow and the references to table numbers (the current order is 5, 1, 4, 2, 3).</p>
                    </list-item>
                    <list-item>
                        <p>One major limitation to note is the lack of specific data in abstracts about interventions and comparators.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the rationale for, and objectives of, the Systematic Review clearly stated?</p>
            <p>Yes</p>
            <p>Is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Have the search and update schedule been clearly defined and justified?</p>
            <p>Yes</p>
            <p>Is the living method justified?</p>
            <p>Yes</p>
            <p>Are sufficient details of the methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results presented in the review?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Systematic reviews in biomedicine topics, issues with time and effort required to complete reviews with generally available tools.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-89348-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow</article-title>.
                        <source>
                            <italic>Systematic Reviews</italic>
                        </source>.<year>2021</year>;<volume>10</volume>(<issue>1</issue>) :
                        <elocation-id>10.1186/s13643-021-01700-x</elocation-id>
                        <pub-id pub-id-type="doi">10.1186/s13643-021-01700-x</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report85692">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.54235.r85692</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>McFarlane</surname>
                        <given-names>Emma</given-names>
                    </name>
                    <xref ref-type="aff" rid="r85692a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-9276-227X</uri>
                </contrib>
                <aff id="r85692a1">
                    <label>1</label>Centre for Guidelines, National Institute for Health and Care Excellence, London, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>6</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 McFarlane E</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport85692" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.51117.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This is a living systematic review of&#x00a0;published methods and tools aimed at automating or semi-automating the process of data extraction in the context of a systematic review. Automating data extraction is an area of interest in evidence-based medicine.&#x00a0;</p>
            <p>The methods are sufficiently described to be replicated, but further details of the analysis used to determine the items of interest would help link the methods to the results. Additionally, the authors may want to consider commenting on the topic areas covered by the included studies and whether these have an impact on any of the metrics measured.&#x00a0;</p>
            <p>In the discussion section, it is interesting that fewer studies extracted data from the full text. Could the authors comment on the implications of this for using tools in a live review, given that it is not common to manually extract data from an abstract only?</p>
            <p>Are the rationale for, and objectives of, the Systematic Review clearly stated?</p>
            <p>Yes</p>
            <p>Is the statistical analysis and its interpretation appropriate?</p>
            <p>Not applicable</p>
            <p>Have the search and update schedule been clearly defined and justified?</p>
            <p>Yes</p>
            <p>Is the living method justified?</p>
            <p>Yes</p>
            <p>Are sufficient details of the methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results presented in the review?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Evidence-based medicine, systematic reviews, automation techniques.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
