F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.151493.2

Systematic Review

Articles

(Semi)automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review

[version 2; peer review: 2 approved, 2 approved with reservations]

Amanda Legate (a, 1): Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing; https://orcid.org/0000-0001-7763-7630

Kim Nimon (1): Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing; https://orcid.org/0000-0003-2543-8386

Ashlee Noblin (1): Data Curation, Formal Analysis, Investigation, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

1 Human Resource Development, The University of Texas at Tyler, Tyler, Texas, 75799, USA

a alegate@patriots.uttyler.edu

No competing interests were disclosed.

26 9 2024

2024

664

23 9 2024

2024

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

An abundance of rapidly accumulating scientific evidence presents novel opportunities for researchers and practitioners alike, yet such advantages are often overshadowed by resource demands associated with finding and aggregating a continually expanding body of scientific information. Data extraction activities associated with evidence synthesis have been described as time-consuming to the point of critically limiting the usefulness of research. Across social science disciplines, the use of automation technologies for timely and accurate knowledge synthesis can enhance research translation value, better inform key policy development, and expand the current understanding of human interactions, organizations, and systems. Ongoing developments surrounding automation are highly concentrated in research for evidence-based medicine with limited evidence surrounding tools and techniques applied outside of the clinical research community. The goal of the present study is to extend the automation knowledge base by synthesizing current trends in the application of extraction technologies of key data elements of interest for social scientists.

Methods

We report the baseline results of a living systematic review of automated data extraction techniques supporting systematic reviews and meta-analyses in the social sciences. This review follows PRISMA standards for reporting systematic reviews.

Results

The baseline review of social science research yielded 23 relevant studies.

Conclusions

When considering the process of automating systematic review and meta-analysis information extraction, social science research falls short compared to clinical research that focuses on automatic processing of information related to the PICO framework. With a few exceptions, most tools were in the infancy stage and not accessible to applied researchers, were domain specific, or required substantial manual coding of articles before automation could occur. Additionally, few solutions considered extraction of data from tables, which is where the key data elements that social and behavioral scientists analyze reside.

Keywords: Automated data extraction, systematic review, meta-analysis, evidence synthesis, social science research, APA Journal Article Reporting Standards (JARS)

The author(s) declared that no grants were involved in supporting this work.

Revised Amendments from Version 1

In this revised version, several updates have been made to improve clarity, transparency, and align with peer review feedback: (a) Minor grammatical revisions have been made throughout the manuscript, including the use of passive construction where suggested; (b) Acronyms for databases and search platforms have been spelled out to ensure accessibility for readers; (c) Database search dates have been added to enhance transparency and replicability; (d) Additional references to the research protocol and extended data files have been incorporated to provide greater context and transparency regarding the review methodology; and (e) Search strategy limitations, particularly concerning language restrictions, database selection, and the exclusion of gray literature, have been explicitly addressed in the limitations section.

Introduction

Across disciplines, systematic reviews and meta-analyses are integral to exploring and explaining phenomena, discovering causal inferences, and supporting evidence-based decision making. The concept of metascience represents an array of evidence synthesis approaches which support combining existing research results to summarize what is known about a specific topic (Davis et al., 2014; Gough et al., 2020). Researchers use a variety of systematic review methodologies to synthesize evidence within their domains or to integrate extant knowledge bases spanning multiple disciplines and contexts. When engaging in quantitative evidence synthesis, researchers often supplement the systematic review with meta-analysis (a principled statistical process for grouping and summarizing quantitative information reported across studies within a research domain). As technology advances, in addition to greater access to data, researchers are presented with new forms and sources of data to support evidence synthesis (Bosco et al., 2017; Ip et al., 2012; Wagner et al., 2022).

Systematic reviews and meta-analyses are fundamental to supporting reproducibility and generalizability of research surrounding social and cultural aspects of human behavior; however, the process of extracting data from primary research is a labor-intensive effort, fraught with the potential for human error (see Pigott & Polanin, 2020). Comprehensive data extraction activities associated with evidence synthesis have been described as time-consuming to the point of critically limiting the usefulness of existing approaches (Holub et al., 2021). Moreover, research indicates that it can take several years for original studies to be included in a new review due to the rapid pace of new evidence generation (Jonnalagadda et al., 2015).

The need for this review

In the clinical research domain, particularly in Randomized Controlled Trials (RCTs), automation technologies for data extraction are evolving rapidly (see Schmidt et al., 2023). In contrast with the more defined standards that have evolved throughout clinical research domains, within and across social sciences, substantial variation exists in research designs, reporting protocols, and even publication outlet standards (Davis et al., 2014; Short et al., 2018; Wagner et al., 2022). In health intervention research, targeted data elements generally include Population (or Problem), Intervention, Control, and Outcome (i.e., PICO; see Eriksen and Frandsen, 2018; Tsafnat et al., 2014). While experimental designs are considered a gold standard for translational value, many phenomena examined across the social sciences occur within contexts which necessitate research pragmatism in both design and methodological considerations (Davis et al., 2014).

Consider, for example, the field of Human Resource Development (HRD). In HRD, a primary focal hub for research includes outcomes of workplace interventions intended to inform and improve areas such as learning, training, organizational development, and performance improvement (Shirmohammadi et al., 2021). While measuring intervention outcomes is a substantial area of discourse, HRD researchers have predominantly relied on cross-sectional survey data and the most commonly employed quantitative method is Structural Equation Modeling (Park et al., 2021). Thus, meta-analyses are increasingly essential for supporting reproducibility and generalizability of research. In these fields, data elements targeted for extraction would rarely align with the PICO framework, but rather, meta-analytic endeavors would entail extraction of measures such as effect sizes, model fit indices, or instrument psychometric properties (Appelbaum et al., 2018).

Related research

Serving as a model for the present study, Schmidt et al. (2023) are conducting a living systematic review (LSR) of tools and techniques available for (semi)automated extraction of data elements pertinent to synthesizing the effects of healthcare interventions (see Higgins et al., 2022). Exploring a range of data-mining and text classification methods for systematic reviews, the authors found that early, frequently employed approaches (e.g., rule-based extraction) gave way to classical machine-learning techniques (e.g., naïve Bayes and support vector machine classifiers), and, more recently, trends indicate increased application of deep learning architectures such as neural networks and word embeddings (for yearly trends in reported systems architectures, see Schmidt et al., 2021, p. 8).
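To make the rule-based end of this spectrum concrete, the following is a minimal sketch of pattern-based extraction for two data elements of interest to social scientists: correlation coefficients and sample sizes. The regular expressions, the function name `extract_statistics`, and the example sentence are illustrative assumptions and are not taken from any tool covered by this review.

```python
import re

# Hypothetical rules: match reports such as "r = .42" or "r = -0.31",
# and sample sizes such as "N = 212" or "n=1,204".
CORRELATION = re.compile(r"\br\s*=\s*(-?(?:0?\.\d+|1(?:\.0+)?))")
SAMPLE_SIZE = re.compile(r"\b[Nn]\s*=\s*(\d[\d,]*)")


def extract_statistics(sentence: str) -> dict:
    """Return correlation coefficients and sample sizes found in one sentence."""
    correlations = [float(m) for m in CORRELATION.findall(sentence)]
    sample_sizes = [int(m.replace(",", "")) for m in SAMPLE_SIZE.findall(sentence)]
    return {"r": correlations, "n": sample_sizes}


# Invented example sentence, used only to exercise the rules.
example = "Engagement correlated with performance (r = .42, N = 1,204)."
print(extract_statistics(example))
```

Rules of this kind are transparent and easy to audit, but they only capture the reporting formats they were written for, which is one reason the field moved toward learned classifiers.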

In social sciences and related disciplines, several related reviews of tools and techniques for automating tasks associated with systematic reviews and meta-analyses have been conducted. Table 1 provides a summary of related research.

Table 1. Related literature.

Reference	Research discipline (Sample size)	Primary focus
Antons et al. (2020)	Innovation (n=140)	Text mining methods in innovation research
Cairo et al. (2019)	Computer Science (n=17)	ML techniques for secondary studies
Dridi et al. (2021)	Management (n=124)	Scholarly data mining applications
Göpfert et al. (2022)	Multidisciplinary (n=80)	Measurement extraction methods using NLP
Feng et al. (2017)	Software Engineering (n=32)	Text mining techniques and tools to facilitate SLRs
Kohl et al. (2018)	Multidisciplinary (n=22)	Tools for systematic reviews and mapping studies
Roldan-Baluis et al. (2022)	Multidisciplinary (n=46)	NLP and ML for processing unstructured texts in digital format
Sundaram and Berleant (2023)	Multidisciplinary (n=29)	Text mining-based automation of SLRs
Wagner et al. (2022)	Information Systems and related Social Sciences (NR)	Review of AI in literature reviews
Yang et al. (2023)	Education (n=161)	Text mining techniques in educational research

Note. AI = Artificial Intelligence, ML = Machine Learning, SLR = Systematic Literature Review, NLP = Natural Language Processing, NR = Not Reported.

Based on extant reviews analyzing trends in Artificial Intelligence (AI) technologies for automating Systematic Literature Review (SLR) efforts outside of clinical domains, we noted several trends. First, techniques to facilitate abstraction, generalization, and grouping of primary studies represent the majority of (semi)automated approaches. Second, extant reviews highlight a predominant focus on supporting search and study selection stages, with significant gaps in (semi)automating data extraction. Third, evaluation concerns underscore the importance of performance metrics, validation procedures, benchmark datasets and improved transparency and reporting standards to ensure the reliability and effectiveness of AI techniques. Finally, challenges in cross-discipline transferability illuminate the need for domain-specific adaptations and infrastructures.

Existing reviews evidence the widespread application of techniques such as topic modeling, clustering, and classification to support abstraction, generalization, and grouping of primary research studies. Topic modeling, particularly Latent Dirichlet Allocation (LDA), is commonly applied to (semi)automate content analysis, facilitating the distillation of complex information into meaningful insights and identification of overarching trends and patterns across a literature corpus (Antons et al., 2020; Dridi et al., 2021; Roldan-Baluis et al., 2022; Yang et al., 2023). Additionally, classification and clustering techniques are commonly applied for tasks such as mining article metadata and automatically grouping papers by relevance to SLR research questions (Feng et al., 2017; Sundaram & Berleant, 2023; Wagner et al., 2022).

(Semi)automation efforts in social sciences and related disciplines have primarily addressed supporting the search and study selection stages of SLRs (Cairo et al., 2019; Feng et al., 2017), with significant gaps in automation techniques for tasks such as data extraction (Göpfert et al., 2022; Sundaram & Berleant, 2023). Further, available software tools lack functionality to support activities beyond study selection (Kohl et al., 2018). Key findings across these reviews underscore the need for more comprehensive automation solutions, particularly for quantitative data extraction (Göpfert et al., 2022).

Additionally, researchers express transparency concerns regarding AI’s reliance on black box models (Wagner et al., 2022) and limited visibility into underlying processes and algorithms in proprietary software solutions (Antons et al., 2020). Adding to these considerations, Antons et al. (2020) identified substantial reporting gaps, including 35 of 140 articles omitting details about software used. Since metrics alone may not be sufficient to objectively assess AI performance (Dridi et al., 2021), strategies for mitigating bias and ensuring transparency and fairness represent a substantial topic of automation discourse.

Ongoing research of AI tools for clinical studies (Sundaram & Berleant, 2023) and the extraction of PICO data elements from RCTs (Wagner et al., 2022) underscore the success of domain-specific adaptation efforts. While the promise of adopting AI-based techniques and tools in social science domains is evident (Cairo et al., 2019; Feng et al., 2017), extant research reveals challenges in transferring existing technologies across disciplines. Further, many SLR software applications are tailored specifically for health and medical science research (Kohl et al., 2018). Literature suggests that overcoming global obstacles can be facilitated by concentrated efforts to develop domain-specific knowledge representations, such as standardized construct taxonomies and vocabularies (Feng et al., 2017; Göpfert et al., 2022; Wagner et al., 2022).

Objectives

In the present study, we conduct a baseline review of existing and emergent techniques for (semi)automated data extraction that focus on target data entities and elements relevant to evidence synthesis across social sciences research domains. This review covers data extraction tools for a range of data types, both quantitative and qualitative. Per the research protocol, social sciences categories included in this review were based on the branches of science and academic activity boundaries described by Cohen (2021; Chapter 2). Additional description is available in the project repositories (see ‘Data availability’ section). We report findings that supplement the growing body of research dedicated to the automatic extraction of data from clinical and medical research.

Methods

Protocol registration

This LSR was conducted following a pre-registered and published protocol (Legate & Nimon, 2023b). For additional details and project repositories, see ‘Data availability’ section.

Living review

We adopted the LSR methodology for this study primarily due to the pace of emerging evidence, particularly in light of ongoing technological advancements. The ongoing nature of an LSR allows for continuous surveillance, ensuring timely presentation of new information that may influence findings (Elliott et al., 2014, 2017; Khamis et al., 2019). This baseline review was initiated upon peer approval of the associated protocol (Legate & Nimon, 2023b). It remains our intent for the review to be continually updated via living methodological surveys of published research (Khamis et al., 2019) following the workflow schedule as previously published in the protocol (see Figure 1; Legate & Nimon, 2023b). Necessary adjustments to the workflow will be detailed within each subsequent update.

Figure 1. LSR workflow.

This image is reproduced under the terms of a Creative Commons Attribution 4.0 International license (CC-BY 4.0) from Legate and Nimon (2023b).

Note. Arrows represent stages involved in a static systematic review; the dotted line (from “Publish Report” to “Search”) represents the stage at which the review process is repeated from the beginning while the review remains in living status.

Eligibility criteria

As in prior reviews, English-language reports published in 2005 or later were considered for inclusion (Jonnalagadda et al., 2015; O’Mara-Eves et al., 2015; Schmidt et al., 2020). Eligible studies utilized, presented, and/or evaluated semi-automated approaches to support evidence-synthesis research methods (e.g., systematic reviews, psychometric meta-analyses, meta-analyses of effect sizes, etc.). Studies may have reported on any automated technique for data extraction, given that at least one entity was extracted semi-automatically from the abstracts or full text of a literature corpus and sufficient detail was reported for:

a) entity(ies) or data element(s) extracted;

b) location of the extracted entities (e.g., abstract, methods, results sections); and

c) the automation tool and/or technique used to support extraction.

Editorials, briefs, and opinion pieces, as well as reports that engaged in narrative discussion without applying automation tools or technologies to extract data from research literature, were not considered eligible. Per the protocol, studies were also considered ineligible if they applied tools or techniques to:

a) extract data exclusively from medical, biomedical, clinical (e.g., RCTs), or natural science research;

b) extract metadata only (i.e., citation details) from research articles; or

c) extract data from alternative (i.e., non-research literature) sources (e.g., web scraping, electronic communications, transcripts, etc.).

Search sources

The search strategy for this review was developed by adapting the search strategy from a related LSR of clinical research (Schmidt et al., 2020).

We initially intended to conduct searches in the same databases used by Schmidt et al. (2020, 2021), excluding medical research sources. Because IEEE content is indexed in Web of Science (Young, 2023), we did not include IEEE Xplore as a separate source. We added two additional databases (ACL and arXiv) and conducted a search for data extraction tools in the Systematic Review Toolbox (Marshall et al., 2022) to capture associated articles. Searches were conducted in the Association for Computational Linguistics (ACL) Anthology, arXiv Research-Sharing Platform (arXiv), and DBLP Computer Science Bibliography (DBLP) on June 15, 2023; in the Web of Science Core Collection (WOS) on June 8, 2023; and in the Systematic Review Toolbox on October 2, 2023.

The Web of Science search and deduplication followed procedures stated in the protocol (Legate & Nimon, 2023b). We adapted source code developed by Schmidt et al. (2021) for automating search, retrieval, and deduplication functions on full database dumps for the ACL, arXiv, and DBLP platforms. Complete details, including citation indices and specific settings applied, search syntax, and adapted source code, are available in the project repository (see ‘Data availability’ section).
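As a rough illustration of what a deduplication step across sources involves, the sketch below keys each record on a normalized title plus publication year and keeps the first occurrence. This is not the adapted source code referenced above; the function names, record fields, and example records are hypothetical, and real pipelines typically also compare identifiers such as DOIs.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    alphanumeric = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return alphanumeric.strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (normalized title, year) key."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented records, standing in for exports from two different sources.
records = [
    {"title": "An example study of table extraction", "year": 2020, "source": "A"},
    {"title": "An Example Study of Table Extraction.", "year": 2020, "source": "B"},
    {"title": "A different example study", "year": 2021, "source": "A"},
]
unique_records = deduplicate(records)
print(len(unique_records))
```

Keeping the first occurrence means the order in which sources are concatenated determines which copy of a duplicated record survives.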

Study selection

Title, abstract, and full-text screening were conducted using Rayyan (Ouzzani et al., 2016; free and subscription accounts available at https://www.rayyan.ai/). Three researchers screened all titles and abstracts at a pace of 1000 abstracts per week. Researchers met weekly to review, resolve conflicts, and further develop the codebook for this LSR. All conflicts that arose during the title and abstract screening (n=103/N=10,644) were resolved on a weekly basis. Disagreements related to methods for abstractive text summarization and to the transferability of methods applied to clinical research studies (i.e., RCTs). In cases where level of abstraction and potential for transferability could not be determined from the abstract alone, full-text articles were reviewed and discussed by all three researchers until consensus was reached.

For the data extraction stage, a Google form was developed following items of interest as described in the protocol. All data extraction tasks were performed independently in triplicate. Researchers met weekly to review and reach a consensus on coding of extracted items of interest. The extraction form was updated over the course of data extraction to better fit project goals and promote reliability of future updates.

We originally intended to conduct Inter-Rater Reliability (IRR) assessments to provide reliability estimates following each stage of the baseline review (Belur et al., 2018; Zhao et al., 2022). Given the nascency of our research and scope of our items of interest, coding forms allowed for input of “other” responses (e.g., APA data elements) that were not included in extant reviews that focus on medical and clinical data extraction (e.g., PICO elements). Further, data extraction presented opportunities to develop reporting structure for methods and items of interest that were not reported in prior literature (e.g., NER, open-source tools). A weekly review meeting was used to continually develop the project codebook to promote continuity, structure, and develop an IRR framework for future iterations of this review.

Results

Search results

Search results are presented in the PRISMA flowchart (see Figure 2). A total of 11,336 records were identified through all search sources, including databases and publications available through the Systematic Review Toolbox (Marshall et al., 2022). After deduplication, 10,644 articles were included in the title and abstract screening stage. We retrieved 46 articles for full-text screening. One duplicate print was detected during full-text screening and was removed. This iteration of the LSR includes 23 articles. Detailed descriptions of deduplication and preliminary screening procedures are available in the OSF project repository (see ‘Data availability’ section).

Figure 2. PRISMA diagram.

Note. ACL=ACL Anthology (https://aclanthology.org/), arXiv=arXiv Research-Sharing Platform (https://arxiv.org/), DBLP=DBLP Computer Science Bibliography (https://dblp.org/), WOS=Web of Science Core Collection, SRTool=Systematic Review Toolbox (http://systematicreviewtools.com/).

The following sections describe the rationale for exclusions, followed by a brief overview of studies included in the baseline review. These results are presented in Figures 3 and 4, respectively. An overview of included studies is presented in Table 2.

Figure 3. Excluded publications.

Note. Domain = exclusively medical, biomedical, clinical, or natural science (n=2); Target entities = Lack of detail in reporting extracted entities (n=7); Application = no application, testing, or extraction conducted manually (n=6); Lack of Detail in Reporting Corpus or Wrong Corpus (n=7).

Figure 4. Included publications.

Note. Presented Tool = Described/demonstrated a software tool, system, or application for data extraction (n=12); Developed Method = Developed techniques and/or methods for automated data extraction (n=9); Evaluated Techniques = Tested or evaluated the performance of existing tools, techniques, or methods (n=2); Applied Tool = Applied automation tools to conduct secondary research (n=0).

Table 2. Included studies.

Title	Reference	Summary description
A model for the identification of the functional structures of unstructured abstracts in the social sciences	Shen et al. (2022)	Proposed a high-performance model for identifying functional structures of unstructured abstracts in the social sciences.
A Semi-automatic Data Extraction System for Heterogeneous Data Sources: a Case Study from Cotton Industry	Nayak et al. (2021)	Proposed a novel data extraction system based on text mining approaches to discover relevant and focused information from diverse unstructured data sources.
An interactive query-based approach for summarizing scientific documents	Bayatmakou et al. (2022)	Proposed an interactive multi-document text summarization approach that allows users to specify composition of a summary and refine initial query by user-selected keywords and sentences extracted from retrieved documents.
Automatic results identification in software engineering papers. Is it possible?	Torres et al. (2012)	Analyzed existing methods for sentence classification in scientific papers and evaluated their feasibility in unstructured papers in the Software Engineering area.
Contextual information retrieval in research articles: Semantic publishing tools for the research community	Angrosh et al. (2014)	Introduced conceptual framework (and linked data application) for modeling contexts associated with sentences and converting information extracted from research articles into machine-understandable data.
CORWA: A Citation-Oriented Related Work Annotation Dataset	Li et al. (2022)	Presented new approach to related work generation in academic research papers and introduced annotation dataset for labeling different types of citation text fragments from various information sources.
DASyR (IR) - Document Analysis System for Systematic Reviews (in Information Retrieval)	Piroi et al. (2015)	Introduced a semi-automatic document analysis system/framework for annotating published papers for ontology population, particularly in domains lacking adequate dictionaries.
Detecting In-line Mathematical Expressions in Scientific Documents	Iwatsuki et al. (2017)	Reported preliminary results applying a method for identifying in-line mathematical expressions in PDF documents utilizing both layout and linguistic features.
Extracting the characteristics of life cycle assessments via data mining	Diaz-Elsayed and Zhang (2020)	Proposed a method for automatically extracting key characteristics of life cycle assessments (LCAs) from journal articles.
Machine Reading of Hypotheses for Organizational Research Reviews and Pre-trained Models via R Shiny App for Non-Programmers	Chen et al. (2021)	Introduced NLP models for accelerating the discovery, extraction, and organization of theoretical developments from social science publications.
MetaSeer.STEM: Towards Automating Meta-Analyses	Neppalli et al. (2016)	Proposed a machine learning-based system developed to support automated extraction of data pertinent to STEM education meta-analyses.
Mining Social Science Publications for Survey Variables	Zielinski and Mutschke (2017)	Described a work-in-progress development of new techniques or methods for identifying variables used in social science research.
Ontology-based and User-focused Automatic Text Summarization (OATS): Using COVID-19 Risk Factors as an Example	Chen et al. (2020)	Proposed an ontology-based system which users could access and utilize for automatically generating text summarization from unstructured text.
Ontology-Driven Information Extraction from Research Publications	Pertsas and Constantopoulos (2018)	Introduced a system designed to extract information from research articles, associate it with other sources, and infer new knowledge.
Research Method Classification with Deep Transfer Learning for Semi-Automatic Meta-Analysis of Information Systems Papers	Anisienia et al. (2021)	Presented an artifact that uses deep transfer learning for multi-label classification of research methods for an Information Systems corpus.
Scaling Systematic Literature Reviews with Machine Learning Pipelines	Goldfarb-Tarrant et al. (2020)	Described a pipeline that automates three stages of a systematic review: searching for documents, selecting relevant documents, and extracting data.
Searching for tables in digital documents	Liu et al. (2007)	Introduced an automatic table extraction and search engine system.
Section-wise indexing and retrieval of research articles	Shahid and Afzal (2018)	Described development and evaluation of a technique for tagging paper's content with logical sections appearing in scientific documents.
Sysrev: A FAIR Platform for Data Curation and Systematic Evidence Review	Bozada et al. (2021)	Introduced a platform for aiding in systematic reviews and data extraction by providing access to digital documents and facilitating collaboration in research projects.
Team EP at TAC 2018: Automating data extraction in systematic reviews of environmental agents	Nowak and Kunstman (2019)	Presented a solution for automating data extraction in systematic reviews of environmental agents.
The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles	Aliyu et al. (2018)	Developed a canonical model of structure approach that identifies sections from documents and extracts the headings and subheadings from the sections.
Towards a Semi-Automated Approach for Systematic Literature Reviews Completed Research	Denzler et al. (2021)	Presented a flexible and modifiable artifact to support systematic literature review processes holistically.
UniHD@CL-SciSumm 2020: Citation Extraction as Search	Aumiller et al. (2020)	Presented method to identify references from citation text spans and classify citation spans by discourse facets.

Excluded publications

Most studies were excluded due to lack of detail in extracted data entities (n=7) and wrong corpus or data source (n=7). Carrión-Toro et al. (2022), for example, developed a method and software tool supporting researchers with selection of relevant key criteria in a field of study based on term frequencies. While text summarization has proven valuable for evidence synthesis tasks, the primary focus of this LSR involves efforts to extract specific data points from primary research (O’Connor et al., 2019). We also excluded extraction techniques that were not applied to abstracts or full text of research articles. Ochoa-Hernández et al. (2018), for instance, presented a method to automatically extract concepts from web blog articles.

The second most common exclusion category was articles that presented techniques or systems utilizing pre-extracted data (n=4). Ali and Gravino (2018), for example, proposed an ontology-based SLR system with semantic web technologies; however, the data (derived from a prior review conducted by the authors) were added to the ontology system after the manual extraction stage. Finally, articles were excluded due to exclusive application in medical/clinical research (n=2), or because the proposed tool had not yet been implemented (n=2). Goswami et al. (2019), for example, described and evaluated a supervised ML framework to identify and extract anxiety outcome measures from clinical trial articles. Zhitomirsky-Geffet et al. (2020) presented a conceptual description of a network-based data model capable of mining quantitative results from social sciences articles, but the system had not been implemented at the time of publication.
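The counts reported in the search results and in the exclusion categories above can be reconciled with a short tally. The sketch below uses only numbers stated in the text; treating the difference between identified and screened records as the number of duplicates removed is an assumption, since that figure is not stated directly.

```python
# Counts as reported in the text of this review.
records_identified = 11_336
records_after_deduplication = 10_644
full_text_retrieved = 46
duplicates_found_at_full_text = 1

# Full-text exclusion categories and their reported sizes.
exclusions = {
    "lack of detail in extracted entities": 7,
    "wrong corpus or data source": 7,
    "pre-extracted data": 4,
    "exclusively medical/clinical": 2,
    "tool not yet implemented": 2,
}

# Assumption: deduplication is the only step between identification and screening.
duplicates_removed = records_identified - records_after_deduplication
excluded_at_full_text = sum(exclusions.values())
included = full_text_retrieved - duplicates_found_at_full_text - excluded_at_full_text

print(duplicates_removed)     # 692
print(excluded_at_full_text)  # 22
print(included)               # 23
```

The result matches the 23 studies included in this iteration of the LSR.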

Included publications

The majority of included studies (n=12) presented or described a software tool, system, or application to support researchers extracting data from research literature. The second most common inclusion category focused on the development of specialized techniques or methods for automating data extraction tasks (n=9). We identified two studies that evaluated or tested the performance of existing tools or methods for (semi)automated data extraction. Unlike related reviews of data extraction methods for healthcare interventions (see Schmidt et al., 2023), we did not identify social science studies applying existing automated data extraction tools to conduct secondary research.

Automated approaches

To report approaches identified, we organized the extracted data under four overarching categories, including: (1) data preprocessing and feature engineering, (2) model architectures and components, (3) rule-bases, and (4) evaluation metrics. See ‘Data Availability’ section for labeling and additional descriptions of techniques. We opted to extract and report rule-based techniques separately because the approaches we identified intertwined with various aspects of the data processing and extraction pipeline, spanning data preprocessing to the model architecture itself. This distinction allows for more discussion about the prevalence, scope and utility of these techniques.

Data preprocessing and feature engineering

The data preprocessing category encompasses methods and techniques used to preprocess raw text and data before it is fed into ML/NLP models. This includes tasks such as tokenization, stemming, lemmatization, stop word removal, and other steps necessary to clean and prepare the text data for analysis. Figure 5 plots the aggregate results of preprocessing techniques identified.

Figure 5. Data preprocessing.

Nearly all studies applied tokenization and/or segmentation (83%, n=19) for breaking down text into manageable units. Similarly, PDF parsing/extraction techniques were applied in 65% (n=15) of studies; the remaining studies applied extraction to other document formats (e.g., journal articles available online in HTML format; see Diaz-Elsayed & Zhang, 2020). Similar methods that additionally take syntactic structure into account, including chunking and dependency parsing, were less frequently applied (Angrosh et al., 2014; Li et al., 2022; Nayak et al., 2021; Pertsas & Constantopoulos, 2018). Tagging methods, including PoS tagging (assigning grammatical categories, e.g., noun, verb), followed by concept tagging (e.g., semantic annotation), or sequence tagging, where labels were assigned based on order of appearance, were used in 43% (n=10) of studies. Nine studies used manual annotation for training and/or evaluation.

Among noise reduction approaches, stop-word removal was the most common; stemming, normalization, and lemmatization were also applied, though less frequently. For stemming approaches, the Porter stemmer (Porter, 1980), including its extensions (e.g., Porter2, S-stemmer, Snowball stemmer), was as commonly reported as traditional stemmers (see Aumiller et al., 2020; Bayatmakou et al., 2022; Shahid & Afzal, 2018). Optical Character Recognition (OCR) appeared in three studies; however, Iwatsuki et al. (2017) used OCR only as a benchmark for evaluating their CRF method for detecting math expressions.
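
The preprocessing steps named above (tokenization, stop-word removal, stemming) can be illustrated with a minimal Python sketch. It is not drawn from any of the reviewed studies: the stop-word list is a tiny placeholder, and the hypothetical `s_stem` helper is only a rough plural stripper in the spirit of an S-stemmer, not the Porter algorithm.

```python
import re

# Tiny illustrative stop-word list (an assumption; real pipelines use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "was", "were"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def s_stem(token: str) -> str:
    """Rough plural stripping in the spirit of an S-stemmer (not Porter)."""
    if len(token) > 3 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("es") and not token.endswith(("aes", "ees", "oes")):
        return token[:-1]
    if len(token) > 2 and token.endswith("s") and not token.endswith(("us", "ss")):
        return token[:-1]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, remove stop words, then stem each remaining token."""
    return [s_stem(t) for t in remove_stop_words(tokenize(text))]
```

Under these assumptions, `preprocess("The participants were sampled for studies of responses")` yields `["participant", "sampled", "study", "response"]`. Established libraries such as NLTK provide tested tokenizers and stemmers and would normally be preferred over hand-written rules.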

Feature engineering (e.g., ranking functions, representation learning and feature extraction techniques) covers a range of methods essential for transforming raw text data into structured, machine-readable representations to facilitate downstream ML/NLP tasks ( Kowsari et al., 2019). See Figure 6.

Figure 6. Feature engineering.

Word embeddings were the most frequently used techniques. We grouped ELMo (word embeddings from language models) with traditional word embeddings such as Word2Vec and GloVe (Kowsari et al., 2019; Young et al., 2018, Chapter 6). Of these, GloVe was used in four studies (Chen et al., 2021; Goldfarb-Tarrant et al., 2020; Nowak & Kunstman, 2019; Anisienia et al., 2021) and ELMo in two (Nowak & Kunstman, 2019; Anisienia et al., 2021). The most common frequency-based feature representation approaches were Bag-of-Words (BoW, n=5) and Term Frequency-Inverse Document Frequency (TF-IDF, n=4). Although less frequently applied in the corpus, methods for representing words or documents as vectors based on semantic properties such as Vector Space Models (VSM) and sentence embeddings were used as early as 2007. Other less commonly reported methods included synonym aggregation/expansion, best match ranking (BM25), shingling, and subject-verb pairings.
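
To make the frequency-based representations concrete, the following Python sketch computes Bag-of-Words counts and one common TF-IDF variant. The weighting scheme (length-normalized term frequency times an unsmoothed natural-log inverse document frequency) is an assumption chosen for simplicity; the reviewed studies may have used other variants, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens: list[str]) -> Counter:
    """Represent a document as raw term counts, ignoring word order."""
    return Counter(tokens)


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its in-document frequency and its rarity across documents."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document receives a weight of zero, which is why TF-IDF tends to emphasize terms that distinguish one document from the rest of the corpus.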

Model architectures and components

The model architecture category focuses on the architectures and components of ML/NLP models used for data extraction. Results are shown in Figure 7. Some approaches overlapped across applications (e.g., semantic web or semantic indexing structures and ontology-pipeline approaches), so we grouped these techniques into categories to facilitate reporting. Likewise, all transformer-based approaches were grouped into a single category; however, specific architectures and components are discussed in the sections below, and detailed coding of extracted data is available in the supplemental data files (see ‘Underlying Data’ section). Where rule-based approaches were considered a part of the system architecture, they are reported under the ‘Rule-bases’ section.

Figure 7. Model architectures and components.

Overall, approaches ranged from straightforward implementations to complex layered architectures. Examples of more straightforward approaches included architectures based entirely on rule-bases (e.g., Diaz-Elsayed & Zhang, 2020), applications based on one classification method (e.g., naïve Bayes; Neppalli et al., 2016), or those utilizing a single type of probabilistic model (Angrosh et al., 2014; Iwatsuki et al., 2017). At the other end of the complexity continuum, Nowak and Kunstman (2019) presented an end-to-end deep learning model based on a Bi-LSTM-CRF architecture with interleaved alternating LSTM layers and highway connections. In the following sections, we further elaborate on various approaches identified.

Ontology-based and Semantic Web. These pipelines involve leveraging ontologies and semantic web technologies for semantic annotation or knowledge representation. Among included studies, ontology and semantic web capabilities were explored as early as 2014, but the preliminary results from this baseline review suggest an upward trend in recent years. Angrosh et al. (2014), for example, developed a Sentence Context Ontology (SENTCON) for modeling the contexts of information extracted from research documents. Piroi et al. (2015) developed and presented an annotation system for populating ontologies in domains lacking adequate dictionaries. Some work focused on automatically mapping structures of research documents. For example, using an open source lexical database to develop a canonical model of structure, Aliyu et al. (2018) were able to automatically identify and extract target paper sections from research documents. Shahid and Afzal (2018) utilized specialized ontologies to automatically tag content in research papers by logical sections. Chen et al. (2020) presented a novel framework for text summarization, including ontology-based topic identification and user-focused summarization modules.

Transformer-based Approaches. Our results suggested that transformer-based approaches have experienced rapid growth since 2020. Bidirectional Encoder Representations from Transformers (BERT) and other BERT-based language models made up the majority of transformer-based approaches. Specifically, BERT (Aumiller et al., 2020; Shen et al., 2022) and SciBERT (Goldfarb-Tarrant et al., 2020; Li et al., 2022) were the most utilized for tasks relevant to extracting data from research in social sciences. Other language models included BioBERT (Chen et al., 2020) and DistilBERT (Goldfarb-Tarrant et al., 2020). We identified a recent application of the Hugging Face LED model (Li et al., 2022), a pre-trained Longformer model developed to address length limitations associated with other transformer-based approaches (see Beltagy et al., 2020).

Named Entity Recognition (NER). Six of the included studies applied Named Entity Recognition (NER) techniques. Increasing availability of tools to support the entire SLR pipeline, including data extraction efforts, may be partially to credit for upward trends in NER applications. Based on applications we identified, NER would best be described as versatile. Some studies incorporated NER as an integral component embedded throughout a larger ML/NLP pipeline (e.g., Goldfarb-Tarrant et al., 2020), others included NER subcomponents leveraged primarily for preprocessing and feature representation tasks (e.g., Pertsas & Constantopoulos, 2018), and in one study, authors took advantage of open source NER tools that could be easily integrated into a highly modifiable artifact serving as a platform for future development of holistic approaches to scaling SLR tasks (Denzler et al., 2021).

Extractive Question-Answering Models. Extractive question-answering models involve tasks where a model generates answers to questions based on a given context. Question-answering models appeared in our dataset as early as 2007 (Liu et al., 2007), with the remaining applications published in 2020 or later. Question-answering techniques have a range of applications that most readers are likely familiar with, like chatbots and intelligent assistants (e.g., Alexa, Google Assistant, Siri). However, state-of-the-art approaches for question-answering over knowledge bases are also being put to use in the data extraction arena. The study by Bayatmakou et al. (2022), for example, introduced new methods for interactive multi-document text summarization that allow users to specify summary compositions and interactively refine queries after reviewing complete sentences automatically extracted from documents.

Classifiers. For classification approaches, we followed Schmidt et al. (2021) in reporting instances of Support Vector Machines (SVM) separately from other binary classifiers and likewise found a high prevalence of SVM usage, accounting for 50% of all binary classifiers identified ( Goldfarb-Tarrant et al., 2020; Shahid & Afzal, 2018; Shen et al., 2022; Zielinski & Mutschke, 2017). Among classifiers that use a linear combination of inputs ( Jurafsky & Martin, 2024), naïve Bayes was the most frequent ( Neppalli et al., 2016; Shahid & Afzal, 2018; Torres et al., 2012; Zielinski & Mutschke, 2017). One study used a Perceptron classifier; however, it was extended (i.e., OvR) to handle multiclass problems ( Aumiller et al., 2020). Multi-class classifiers were less common with one instance each of k-Nearest Neighbors (aka KNN/kLog; Zielinski & Mutschke, 2017) and the J48 classifier (C4.5 Decision Trees; Piroi et al., 2015).
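
As an illustration of one classifier family named above, the following Python sketch implements a multinomial naïve Bayes classifier with add-one (Laplace) smoothing for labeling tokenized sentences. It is a generic textbook formulation rather than the implementation used in any reviewed study, and the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over tokenized documents with add-one smoothing."""

    def fit(self, documents: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.term_counts[label].update(tokens)
        self.vocabulary = {t for tokens in documents for t in tokens}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            total_terms = sum(self.term_counts[label].values())
            # Start from the log prior, then add one log likelihood per known token.
            score = math.log(count / n_docs)
            for t in tokens:
                if t in self.vocabulary:  # terms unseen in training are skipped
                    score += math.log((self.term_counts[label][t] + 1) / (total_terms + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)
```

A model trained on a handful of labeled sentences (e.g., "objective" versus "results") can then be called with `model.predict(tokens)`. In practice, far more training data than this toy setting implies would be needed for reliable predictions.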

Neural Networks. Overall, there were a variety of neural network applications across the included studies. Most used Long Short-term Memory (LSTM), more specifically, Bidirectional LSTM (BiLSTM). We also identified one application of a Bidirectional Gated Recurrent Unit (BiGRU; Shen et al., 2022). Convolutional Neural Network (CNN) architectures (Goldfarb-Tarrant et al., 2020; Nowak & Kunstman, 2019; Anisienia et al., 2021) were also present. Several studies evaluated state-of-the-art deep learning methods. For example, Shen et al. (2022) compared the performance of deep learning models (TextCNN and BERT) for sentence classification in social sciences abstracts. In another comparative study, Anisienia et al. (2021) compared methods for pretraining deep contextualized word representations for cutting-edge transfer learning techniques based on CNN and LSTM architectures in addition to classifier models (e.g., SVM).

Probabilistic Models. Among probabilistic models, Conditional Random Field (CRF) applications were predominant in our dataset. CRF was often applied for sequence labeling tasks, such as named entity recognition (e.g., Nayak et al., 2021), or for classification tasks (e.g., Angrosh et al., 2014). Overall, included studies provided evidence that CRF can form a powerful architecture when combined with RNNs (e.g., bi-GRU-CRF, bi-LSTM-CRF; see Nowak & Kunstman, 2019; Shen et al., 2022). We found a single application of the Maximum Entropy Markov Model (MEMM); however, based on experimental results, the authors ultimately selected CRF for identifying sentence context for extraction from research publications (Angrosh et al., 2014).

Rule-bases

Rule-based techniques involve the application of predefined rules or patterns to extract specific features from the text. Versatile and widely applicable, they offer a robust framework for automating data extraction or for capturing relevant information from large volumes of text. See Figure 8 for rule-based approaches reported across included studies.

Figure 8. Rule-bases.

Overall, 70% (n=16) of included studies utilized rule- or heuristic-based approaches to support a variety of tasks for data extraction. Of these, nearly half (n=7) reported using Regular Expressions (RegEx). For example, based on rules developed from manual inspection, RegEx was used by Torres et al. (2012) to construct patterns for identifying specific types of sentences (e.g., objective, results, conclusions) and by Goldfarb-Tarrant et al. (2020) for splitting papers into specific sections (e.g., abstract, introduction, methods). Alternatively, Pertsas and Constantopoulos (2018) used RegEx to exploit lexico-syntactic patterns derived from an ontology knowledge base (Activities, Goals, and Propositions). Other RegEx uses included modifying datasets to incorporate patterns related to citation mentions (Anisienia et al., 2021) or application of rule-based chunking and processing to identify and extract relevant chunks from text (Nayak et al., 2021). The remaining nine studies described custom rule-based algorithms or other heuristic approaches. Li et al. (2022), for example, applied rule-based algorithms PrefixSpan and Gap-Bide for the extraction of frequent discourse sequences. RAKE (Rapid Automatic Keyword Extraction) was applied by Bayatmakou et al. (2022) to extract keywords which served as representations of a document’s content. Finally, Aliyu et al. (2018) described a rule-based algorithm developed for processing full-text documents to identify and extract section headings.
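
The use of regular expressions for splitting a paper into sections can be sketched in Python as follows. This is a minimal illustration and not the rule set of any cited study: the heading vocabulary and the `split_sections` helper are assumptions, and real papers use far more varied headings.

```python
import re

# Hypothetical heading vocabulary; an optional leading section number is allowed.
SECTION_HEADING = re.compile(
    r"^(?:\d+\.?[ \t]+)?(abstract|introduction|background|methods?|results|discussion|conclusions?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> dict[str, str]:
    """Map each matched heading (lowercased) to the text that follows it."""
    sections = {}
    matches = list(SECTION_HEADING.finditer(text))
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).lower()] = text[match.end():end].strip()
    return sections
```

Because the pattern only matches a heading word that stands alone on its line, a body sentence such as "Results were mixed." is not mistaken for a heading. The trade-off is that any heading outside the assumed vocabulary is silently merged into the preceding section.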

Evaluation metrics

Evaluation metrics are presented in Figure 9. Precision, recall, F-scores, and accuracy were predominantly reported across studies, including the earliest published articles. For assessment of model performance, six studies used cross-validation (CV), a process of “averaging several hold-out estimators of the risk corresponding to different data splits” (Arlot & Celisse, 2010, p. 53). K-fold CV (5 or 10 folds) was predominantly applied (Angrosh et al., 2014; Iwatsuki et al., 2017; Neppalli et al., 2016; Shen et al., 2022), with one application of leave-one-out CV, or LOOCV (Piroi et al., 2015), and one application of document-level CV used as a supplemental technique to k-fold (Neppalli et al., 2016). Five studies provided a description of user feedback and other ratings. User feedback (among other metrics) was reported by Li et al. (2022), who conducted expert human comparative assessment to assess fluency, relevance, coherence, and overall quality of model citation span/sentence generation outputs. This category also included evaluation metrics not listed in the sources we adapted when developing our protocol (see O’Mara-Eves et al., 2015, p. 3, Table 1; Schmidt et al., 2021, pp. 8-9). For example, in assessing their system on values returned for queries of interest, Nayak et al. (2021) reported suitability, adaptability, relevance scores, and data-dependencies. As another example, Denzler et al. (2021, p. 5) evaluated their artifact based on design science aspects (i.e., validity, efficacy, and utility).
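
For readers less familiar with these metrics, the Python sketch below computes precision, recall, and F1 for a binary labeling task and generates index splits for k-fold cross-validation. The function names are hypothetical, labels are assumed to be coded 0/1, and the folds are contiguous and unshuffled for simplicity; library implementations (e.g., in scikit-learn) typically offer shuffling and stratification.

```python
def precision_recall_f1(gold: list[int], predicted: list[int]) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels coded as 0/1."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n_items: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Return (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

Each item appears in exactly one test fold, so averaging a metric over the k folds gives the cross-validated estimate described above. Setting `k` equal to `n_items` corresponds to leave-one-out CV.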

Figure 9. Evaluation metrics.

Given the rapid growth of domain-specific ontologies and pre-trained language models, it is not surprising to find Kappa statistics reported for tasks such as evaluating agreement between human annotators when creating gold standard datasets for training and evaluation (Cohen’s Kappa, see Pertsas & Constantopoulos, 2018; Mezzich’s Kappa or Gwet’s AC1, see Anisienia et al., 2021). Semantic similarity scores, which can be used to compare model generated responses against ground truth responses in query-based or question-answering based applications, were reported in two studies (Jaccard Index, Bayatmakou et al., 2022; DKPro Similarity, Zielinski & Mutschke, 2017).
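
Two of the statistics mentioned above can be sketched briefly in Python. The functions below are generic textbook formulations of Cohen's Kappa for two annotators and of the Jaccard Index over token sets; they are not taken from the cited studies, and the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters assigned labels independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def jaccard_index(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Overlap between two token sets: size of intersection over size of union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

For example, two annotators who agree on three of four items, with label frequencies that imply 50% agreement by chance, obtain a Kappa of 0.5. This shows why Kappa is preferred over raw percent agreement when building gold standard datasets.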

Availability, accessibility and transferability

While only one study we reviewed presented an existing tool that was accessible to users through an online application (sysrev.com; Bozada et al., 2021) at the time of conducting this baseline review, two other tools were either being prepared or were available through other means. These included the Holistic Modifiable Literature Review tool (Denzler et al., 2021), which was listed as “currently being prepared” (available at https://holimolirev.github.io/HoliMoLiRev/), and HypothesisReader (Chen et al., 2021), which was available to users through an R Shiny application. SysRev (Bozada et al., 2021) was also the only tool cataloged in the SR Toolbox (Marshall et al., 2022). Six of the twenty-three studies (26%) made source code openly available (Chen et al., 2021; Denzler et al., 2021; Diaz-Elsayed & Zhang, 2020; Goldfarb-Tarrant et al., 2020; Iwatsuki et al., 2017; Li et al., 2022). Article references and corresponding repositories are detailed in Table 3. GitHub stood out as the most popular repository for code and data sharing, and one study made source code available online through an open access publisher.

Table 3. Code repositories.

Reference	Code repository
Chen et al. (2021)	devtools::install_github("canfielder/HypothesisReader")
Denzler et al. (2021)	GitHub Repository: https://github.com/HoliMoLiRev/HoliMoLiRev
Diaz-Elsayed and Zhang (2020)	https://methods-x.com/article/S2215-0161(20)30224-7/fulltext#supplementaryMaterial
Goldfarb-Tarrant et al. (2020)	https://github.com/seraphinatarrant/systematic_reviews
Iwatsuki et al. (2017)	https://github.com/Alab-NII/inlinemath
Li et al. (2022)	https://github.com/jacklxc/CORWA

Transferability

In the evolving landscape of systematic reviews and meta-analyses, the adaptability of tools and technologies to new research domains emerged as a critical factor for enhancing research efficiency and scope. The insights provided by many of the authors working towards automation of data extraction illuminate the transferability of various tools and technologies for research targeting the extraction of data elements beyond PICO.

Several authors of reviewed studies specifically addressed transferability in describing the development of their tools, and further subjected these tools to rigorous testing aimed at validating transferable capabilities ( Chen et al., 2020; Goldfarb-Tarrant et al., 2020; Neppalli et al., 2016). For instance, Neppalli et al. (2016) created MetaSeer.STEM with a focus on extraction of data across a range of research domains, including education, management, and health informatics. Chen et al. (2020) highlighted the adaptability of OATS, showcasing its broader application potential to fields beyond the authors’ COVID-19 specific demonstration. Finally, Goldfarb-Tarrant et al. (2020) affirmed the domain-independent nature of their framework, suggesting its suitability for various systematic reviews.

Additionally, other studies highlighted the need for transferability and discussed the potential for their research tools and technologies to be extended and adapted across varying domains, stressing the importance of flexible design principles in the development of these tools (Angrosh et al., 2014; Diaz-Elsayed & Zhang, 2020). Angrosh et al. (2014) explained how SENTCON’s preliminary design was applied to a specific set of articles in computer science but emphasized that the tool was flexible enough to be applied to other domains through the use of the Web Ontology Language (OWL). Diaz-Elsayed and Zhang (2020) presented methods that were initially applied to wastewater-based resource recovery, but likewise emphasized that the tool was capable of evaluating other engineered systems and retrieving different types of data than those initially extracted.

As noted by Chen et al. (2021), while efforts are being made to assist the process of conducting systematic reviews, there is often limited generalizability of domain-specific pre-trained language models. Many studies included in our review dedicated discussion points toward addressing the critical issue of generalizability and transferability of tools developed to support the broader research community in (semi)automated data extraction tasks. Collectively, these studies suggest a positive trend toward the development of adaptable, transferable research tools and technologies. However, they also underscore the need for ongoing effort across and between diverse domains to make continued progress toward broader research applications.

Open source tools

An outcome we did not anticipate was the substantial number of open source tools, toolkits, and frameworks utilized by our relatively small corpus of articles. Because we were unsure what to expect, we made every effort to capture evidence that might prove useful to social science researchers. We identified 50 different open source technologies including platforms, software, software suites, packages/libraries, algorithms, pre-trained models, controlled vocabularies/thesauri, lexical databases, knowledge representations, and more. Open source tools identified are reported in Figure 10. Of the open source resources available to researchers, the overwhelming majority were Python tools ( n=16; see Python Package Index, https://pypi.org/) and 8 of 23 (35%) studies used the Python Natural Language Toolkit (NLTK). The full list of open-source tools and license details are available in the OSF repository (see ‘Underlying Data’ section).

Figure 10. Open source tools.

APA data elements

This section discusses the potential for extraction of key data elements of interest, as well as locations (i.e., paper sections), structures, and review tasks addressed by the studies reviewed. We limited this section to reporting tools that users could theoretically access and use to support their own research projects. There were 12 studies that presented systems or artifacts designed to facilitate various tasks associated with identifying and extracting data from published literature. To avoid speculating as to the future availability of these tools, we included all studies which presented tools or systems where authors incorporated user interfaces (UIs), regardless of availability at the time of conducting this baseline review.

Table 4 provides an overview of data elements targeted as outlined by JARS ( Appelbaum et al., 2018, p. 6). Each tool was assessed for potential to extract specific data elements by manuscript section (i.e., methods and results reporting elements pertinent to meta-analytic research; see Legate & Nimon, 2023b). Where the authors did not state a tool name, we used the description of the tool as presented in the paper (e.g., Bayatmakou et al., 2022; Nayak et al., 2021).

Table 4. APA data elements.

			CORWA	CIRRA	DASyR (IR)	Holistic Modifiable Literature Reviewer	Hypothesis Reader	Interactive Text Summarization System	MetaSeer.STEM	OATS	Research Spotlight	Semi-automatic Data Extraction System	SysRev	TableSeer
Manuscript Section	Item	Example Reporting Elements	Li et al. (2022)	Angrosh et al. (2014)	Piroi et al. (2015)	Denzler et al. (2021)	Chen et al. (2021)	Bayatmakou et al. (2022)	Neppalli et al. (2016)	Chen et al. (2020)	Pertsas & Constantopoulos (2018)	Nayak et al. (2021)	Bozada et al. (2021)	Liu et al. (2007)
Methods	Criteria, Data Collection & Participants	Participant selection [setting(s), location(s), date(s), % approached vs. participated], Major/topic-specific demographics [age, sex, ethnicity, achievement level(s), tenure]		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Sample Size, Power & Precision	Intended vs. actual sample size, Sample size determination [power analysis, parameter estimates]	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Measures & Instrumentation	Measures [primary, secondary, covariates], Psychometric properties [reliability coefficients, internal consistency reliability, discriminant/convergent validity, test-retest coefficients, time lag intervals]	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Conditions & Design	Experimental/manipulated [randomized, nonrandomized], Nonexperimental [observational, single- or multi-group], Other [longitudinal, N-of-1, replication]		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Data Diagnostics	Post-collection inclusion/exclusion criteria [criteria to infer missing data, processing of outliers, data distributions, data transformations]		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Analytic Strategy & Hypotheses	Strategy for inferential statistics, Protection against error, Hypothesis (es) [primary, secondary, exploratory]		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
Results	Participants & Recruitment	Participants (by group/stage), Dates [recruitment, repeated measures]		✓	✓	✓		✓	✓	✓	✓	✓	✓
	Statistics & Data Analysis	Statistical/data-analytic methods, Missing data (frequency or %), Missing data methods [MCAR, MAR, MNAR], Validity issues [assumptions, distributions]		✓	✓	✓		✓	✓	✓	✓	✓	✓
	Complex Analyses	Analytic approach [SEM, HLM, factor analysis], Model details [fit indices], Software, Estimation Technique, Estimation Issues [e.g., convergence]	✓	✓	✓	✓		✓	✓	✓	✓	✓	✓	✓
	Descriptive & Inferential Statistics	Descriptive [total sample size, sample size subgroups/cases, means, standard deviations, correlation matrices], Inferential [ p-values, degrees of freedom, mean square effects/errors, effect size estimates, confidence intervals]			✓	✓		✓	✓	✓	✓	✓	✓	✓

Unlike ongoing research that focuses on data extraction from clinical literature (e.g., PICO elements/RCTs; see Schmidt et al., 2023), specific reporting guidelines were not a primary focus of the studies we identified. However, authors described target entities and/or research methods of interest with high levels of specificity. Examples include extracting descriptive statistics, sample size, and Likert scale points (Neppalli et al., 2016) and extracting research hypotheses from published literature in organizational sciences (Chen et al., 2021). Despite the lack of discourse surrounding specific reporting guidelines, many of the tools reviewed incorporated some form of user-prompted, annotation- or query-based approach to (semi)automated data extraction. Thus, the collective body of work lends optimism surrounding customizable state-of-the-art methods that can support extraction for a wide range of disciplines, research designs, and entities or data elements of interest to social science researchers.

One example of a highly flexible approach is extractive question-answering based on pre-trained Transformer models. Extractive question-answering models are able to generate direct answers from a knowledge base in response to natural language questions posed by users (Kwiatkowski et al., 2019). These tools typically offer enhanced flexibility through user-defined prompts and mechanisms for interactive query refinement. Example tools that incorporated question-answering techniques included CIRRA (Angrosh et al., 2014), the Interactive Text Summarization System for Scientific Documents (Bayatmakou et al., 2022), and OATS (Chen et al., 2020).

Other types of flexible systems allow users to view excerpts related to specific keywords or queries, supporting expedited identification and labeling of target data elements. For example, several tools supported user labeling of data, followed by predictive classification based on user annotations. Although these tools do not automatically extract data for users, they do augment human effort by (semi)automating time-consuming tasks associated with data annotation and extraction. For instance, SysRev (Bozada et al., 2021) supports researchers in labeling and extracting data by leveraging active learning models developed to replicate user decisions across various review tasks. Likewise, MetaSeer.STEM (Neppalli et al., 2016) used ML techniques to identify and extract numbers from documents, which were then presented to users for manual annotation. Unlike question-answering models, human-computer interactions in these examples are not based on natural language queries; however, human expertise can be used to ‘train’ ML models to predict future annotation decisions. Similarly, to overcome the time constraints of open-ended annotation in fields that lack domain-specific dictionaries, DASyR (Piroi et al., 2015) utilized a combination of user annotations, classification models, and contextual information for populating ontologies. The authors reported a substantial reduction in annotation time, stating that through the DASyR UI “five experts added approximately 30,000 annotations at a speed of 4s/annotation” (p. 595).

Lastly, we note the utility of NER for the advancement of (semi)automated extraction of APA-defined data elements. NER methodologies can be leveraged alongside classification models (Nayak et al., 2021), linked to domain-specific ontologies or other knowledge bases (Piroi et al., 2015), or incorporated as stand-alone modules integrated into larger modifiable frameworks (Denzler et al., 2021). In addition to pre-trained NER models for identification and extraction of named entities, Research Spotlight (Pertsas & Constantopoulos, 2018) also exploited lexico-syntactic patterns in the scholarly ontology to identify and extract non-named entities. The Semi-automatic Data Extraction System for Heterogeneous Data Sources (Nayak et al., 2021) combined features of NER and rule-based chunking to identify and extract phrases based on regular expressions as well as named entities contained in the documents. Further, NER can be implemented through open source tools, as demonstrated by Denzler et al. (2021) and Nayak et al. (2021).

Structure, location, and review tasks

Table 5 provides an overview of the structure and location of extracted data elements, followed by review tasks supported by the tools identified. The majority of studies developed approaches for (semi)automating extraction of data from any section of full-text research articles. Two studies tested the proposed techniques on specific article sections, including titles and abstracts (Bayatmakou et al., 2022) and introduction and background sections (Li et al., 2022). Regarding the structure from which data were extracted, all tools except one extracted from unstructured text; two extracted from both tabular structures (i.e., tables) and text (Nayak et al., 2021; Pertsas & Constantopoulos, 2018); and one was designed specifically to extract elements from tables (TableSeer; Liu et al., 2007).

Table 5. Structure, location, review tasks.

Category	Item	CORWA	CIRRA	DASyR (IR)	Holistic Modifiable Literature Reviewer	Hypothesis Reader	Interactive Text Summarization System	MetaSeer.STEM	OATS	Research Spotlight	Semi-automatic Data Extraction System	SysRev	TableSeer
Category	Item	Li et al. (2022)	Angrosh et al. (2014)	Piroi et al. (2015)	Denzler et al. (2021)	Chen et al. (2021)	Bayatmakou et al. (2022)	Neppalli et al. (2016)	Chen et al. (2020)	Pertsas & Constantopoulos (2018)	Nayak et al. (2021)	Bozada et al. (2021)	Liu et al. (2007)
Structure	Extract from Text	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
Structure	Extract from Tables									✓	✓		✓
Location	Title & Abstract		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Introduction & Background	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Methods		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Results		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	Discussion		✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
Review Tasks	1 Formulate Review Question
	2 Find Previous Reviews				✓							✓
	3 Write Protocol
	4 Devise Search Strategy
	5 Search				✓		✓		✓	✓		✓	✓
	6 De-duplicate				✓
	7 Screen Abstracts				✓		✓					✓
	8 Obtain Full Text									✓
	9 Screen Full Text											✓
	10 Snowball				✓
	11 Extract Data	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
	12 Synthesize Data	✓			✓				✓		✓	✓
	13 Re-check literature
	14 Meta-Analyze
	15 Write up review

All tools focused heavily on tasks related to data extraction (e.g., identification, labeling/annotation, ontology population), which was anticipated based on our search strategy and inclusion criteria. However, several studies advanced solutions for supporting other SLR tasks or stages (see Tsafnat et al., 2014). The most common task (excluding data extraction) was literature search ( Bayatmakou et al., 2022; Bozada et al., 2021; Chen et al., 2020; Denzler et al., 2021; Liu et al., 2007; Pertsas & Constantopoulos, 2018). Many tasks listed are likely supported by a range of computational tools and techniques (e.g., synthesize and meta-analyze results); readers interested in (semi)automating other SLR stages are referred to the Systematic Review Toolbox for an extensive catalogue of tools and methods ( Marshall et al., 2022).

Challenges

A number of challenges were reflected within the body of evidence included in this baseline review. These challenges included difficulties in identifying functional structures within unstructured texts ( Shen et al., 2022), extracting data from PDF file sources ( Nayak et al., 2021; Goldfarb-Tarrant et al., 2020; Iwatsuki et al., 2017), and accurately detecting in-line mathematical expressions ( Iwatsuki et al., 2017). Computational complexity created another significant obstacle for researchers, with issues arising from text vectorization methods, optimization problems, and the computational resources required by neural network frameworks ( Bayatmakou et al., 2022; Anisienia et al., 2021). Furthermore, challenges associated with annotation, particularly biases introduced through the automated processes and limitations of available datasets, were a topic of discourse ( Li et al., 2022; Nowak & Kunstman, 2019; Torres et al., 2012).

In contrast to the medical field, domain-specific challenges in the social sciences and related fields necessitated tailored approaches, which can become time-consuming because researchers often lack sufficient training data to develop robust classifiers (Chen et al., 2021; Aumiller et al., 2020; Zielinski & Mutschke, 2017). Additionally, meta-analytic methods often face hurdles related to variability in data representation, which significantly limits the use of data extraction tools, and to class imbalance in the development of classification tasks (Aumiller et al., 2020; Neppalli et al., 2016; Goldfarb-Tarrant et al., 2020).
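One common way to mitigate the class imbalance mentioned above is to reweight classes during training. The following is a minimal sketch for illustration only; it is not a method reported by the cited studies, and the label names and the inverse-frequency weighting scheme are assumptions.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so each class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical sentence labels: only a few sentences report an effect size.
labels = ["other"] * 8 + ["effect_size"] * 2
weights = inverse_frequency_weights(labels)
print(weights)
```

In this toy example the rare class is weighted four times more heavily than the frequent one, which is the usual intent of such a scheme.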

Conclusions

The findings of the baseline review indicate that when considering the process of automating systematic review and meta-analysis information extraction, social science research falls short as compared to clinical research that focuses on automatic processing of information related to the PICO framework (i.e., Population, Intervention, Control, and Outcome; Eriksen & Frandsen, 2018; Tsafnat et al., 2014). For example, while an LSR focusing on clinical research that is based on the PICO framework yielded 76 studies that included original data extraction (Schmidt et al., 2023), the present review of social science research yielded only 23 relevant studies. This is not necessarily surprising when considering the breadth of social science research and the lack of unifying frameworks and domain-specific ontologies (Göpfert et al., 2022; Wagner et al., 2022).

With a few exceptions, most tools we identified were in the infancy stage and not accessible to applied researchers, were domain-specific, or required substantial manual coding of articles before automation could occur. Additionally, few solutions considered extraction of data from tables, which is where many of the elements that social and behavioral scientists analyze (e.g., effect sizes) reside. Further, development appears to have ceased for many of the tools identified.

We found no evidence indicating hesitation on the part of social science researchers to adopt data extraction tools; on the contrary, abstractive text summarization approaches continue to gain traction across social science domains (Cairo et al., 2019; Feng et al., 2017). While these methods aid researchers in distilling complex information into meaningful insights, there remains a gap in technologies developed to augment human capabilities in the extraction of key data entities of interest for secondary data collection from quantitative empirical reports.

The impact of time-intensive research activities on translational value is not a new concern for the SLR research community. In many social sciences, emphasis is often placed on practical application and translational value, underscoring the importance of efficient research methodologies (Githens, 2015). Further development of the identified systems and techniques could mitigate time delays that often result in outdated information, as researchers cannot feasibly include all new evidence that may be released throughout the lifetime of a given project (Marshall & Wallace, 2019).

Limitations

As with any method that involves subjectivity, results can be influenced by a variety of factors (e.g., study design, publication bias, researcher judgment). We worked diligently to conduct this review and document our procedures in a systematic and transparent manner; however, efforts to replicate our search strategy or screening processes may not result in the same corpus or reach the same conclusions (Belur et al., 2018). This baseline review presented an opportunity to better develop our search and screening strategy, methodological procedures, and research goals. Moving forward, we have developed a codebook and assessment procedures to increase the transparency and reliability of our research.

A second limitation of this study was the omission of snowballing as a search strategy. Though we did not uncover applied secondary research articles utilizing automation tools, several potentially useable tools and systems were discovered in the course of this review. For future iterations of this LSR, we plan to incorporate forward snowballing from relevant articles in previous searches as part of our formalized search strategy (see Wohlin et al., 2022). Additionally, our search strategy has limitations related to its focus on English-language publications, the non-exhaustive list of databases and sources consulted, and the exclusion of grey literature. Addressing these aspects in future updates could enhance the comprehensiveness of findings and provide a broader perspective on the current state of automation tools in secondary research.
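Forward snowballing, as planned above, amounts to finding later publications that cite a set of already-included articles. The following minimal sketch assumes citation data are available as a mapping from each paper to the papers it cites; the identifiers and the function name are hypothetical and do not reflect the authors' tooling.

```python
def forward_snowball(seeds, references):
    """Return papers that cite at least one seed paper.

    `references` maps each paper ID to the IDs it cites; forward
    snowballing inverts that relation to find work citing the seeds.
    """
    seed_set = set(seeds)
    return sorted(
        paper
        for paper, cited in references.items()
        if paper not in seed_set and seed_set.intersection(cited)
    )

# Hypothetical citation data; identifiers are placeholders.
references = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P2"],
    "P4": ["P1", "P3"],
}
candidates = forward_snowball(["P1"], references)
print(candidates)
```

Candidates returned this way would still need to pass the same screening criteria as records found through database searches.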

Finally, in this baseline review, we did not capture techniques used for optimization, training, or fine-tuning on specific datasets or tasks. Several techniques surfaced while conducting this review, such as class modifiers (e.g., OvR; Aumiller et al., 2020), genetic algorithms (Bayatmakou et al., 2022; Torres et al., 2012), Adam optimizer (Nowak & Kunstman, 2019; Shen et al., 2022), cross entropy loss (Chen et al., 2020; Li et al., 2022), Universal Language Model Fine-tuning (e.g., ULMFiT; Anisienia et al., 2021), and back-propagation optimizers (Chen et al., 2020; Anisienia et al., 2021). With increasing applications of pre-trained language models that can be fine-tuned for specific applications (Jurafsky & Martin, 2024), inclusion of training and optimization approaches would provide a more comprehensive framework for reporting findings on ML/NLP approaches to data extraction. We plan to supplement future iterations of this review by capturing various optimization and training methods.
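Two of the techniques named above, one-vs-rest (OvR) decomposition and cross entropy loss, can be illustrated in a few lines. This is a generic sketch under assumed toy labels and probabilities; it does not reproduce the implementation of any cited study.

```python
import math

def one_vs_rest_targets(labels):
    """Turn multi-class labels into one binary target list per class (OvR)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def cross_entropy(probabilities, true_index):
    """Cross entropy loss for one example: -log(p) of the true class."""
    return -math.log(probabilities[true_index])

# Hypothetical sentence classes for illustration.
labels = ["method", "result", "method", "hypothesis"]
targets = one_vs_rest_targets(labels)

# Loss when a model assigns probability 0.7 to the true class.
loss = cross_entropy([0.7, 0.2, 0.1], true_index=0)
print(targets["method"], round(loss, 4))
```

Under OvR, each binary target list would train a separate classifier, and the loss shrinks toward zero as the probability assigned to the true class approaches 1.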

Ethics and consent

Ethical approval and consent were not required.

Data availability

Underlying data

OSF: (Semi)Automated Approaches to Data Extraction for Systematic Reviews and Meta-Analyses in Social Sciences: A Living Review. Open Science Framework: https://doi.org/10.17605/OSF.IO/C7NSA (Legate & Nimon, 2024).

This project contains the following underlying data:

• Baseline Review Underlying Data
  ○ Baseline Review Results.xlsx
  ○ Baseline Search Results Folder (a folder containing results by each search source)
  ○ Open Source Tools.xlsx

• Baseline Review Extended Data
  ○ Baseline Review Deduplication and Screening.docx
  ○ Baseline Review Search Strategy.docx
  ○ Baseline Review PRISMA Checklist.docx
  ○ LSR Codebook.docx
  ○ Regex to Boolean Sytnax.xlsx

• Baseline Review Code
  ○ Adapted code files and results for automated search and screening for ACL, ArXIV, and DBLP full database dumps.

Data are available under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0).

Extended data

Open Science Framework: https://doi.org/10.17605/OSF.IO/C7NSA (Legate & Nimon, 2023a).

This project contains the following extended data:

• Extraction Techniques Revised.docx – categories and descriptions of data extraction techniques, architecture components, and evaluation metrics of interest

• Review Classifications.docx – review tasks and stages of interest

• Target Data Elements.docx – key elements of interest for targeted data elements

• Comprehensive List of Eligible Data Elements.xlsx – comprehensive list of elements with extraction potential per APA JARS

• Search Strategy.docx – search syntax for preliminary search in Web of Science

• APA & Cochrane Data Elements.xlsx – tabled data elements for Cochrane reviews, APA Module C (clinical trials), and APA (all study designs)

Data are available under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0).

Reporting guidelines

This study follows PRISMA reporting guidelines (Page et al., 2021).

Open Science Framework: PRISMA checklist for ‘Open Science Framework: (Semi)Automated Approaches to Data Extraction for Systematic Reviews and Meta-Analyses in Social Sciences: A Living Review’. https://doi.org/10.17605/OSF.IO/C7NSA (Legate & Nimon, 2024).

Data are available under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0).

Software availability

• Source code available from: https://github.com/mcguinlu/COVID_suicide_living

• Archived source code: http://doi.org/10.5281/zenodo.3871366 (McGuinness & Schmidt, 2020).

• The adapted version of the source code for automated searching: https://doi.org/10.17605/OSF.IO/C7NSA

• Archived source code: https://doi.org/10.17605/OSF.IO/C7NSA (Legate & Nimon, 2024).

• License: MIT

References

Aliyu, Iqbal, James: The canonical model of structure for data extraction in systematic reviews of scientific research articles. 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE;2018, October; pp.264–271. DOI: 10.1109/SNAMS.2018.8554896

Angrosh, Cranefield, Stanger: Contextual information retrieval in research articles: Semantic publishing tools for the research community. Semantic Web. 2014;5(4):261–293. DOI: 10.3233/SW-130097

Anisienia, Mueller, Kupfer: Research method classification with deep transfer learning for semi-automatic meta-analysis of information systems papers. Proceedings of the 54th Hawaii International Conference on System Sciences. 2021; pp.6099–6108.

Antons, Grünwald, Cichy: The application of text mining methods in innovation research: Current state, evolution patterns, and development priorities. R&D Manag. 2020;50(3):329–351. DOI: 10.1111/radm.12408

Ali, Gravino: An ontology-based approach to semi-automate systematic literature reviews. 2018 12th International Conference on Open Source Systems and Technologies (ICOSST). IEEE;2018, December; pp.09–16. DOI: 10.1109/ICOSST.2018.8632205

Arlot, Celisse: A survey of cross-validation procedures for model selection. Stat. Surv. 2010;4:40–79. DOI: 10.1214/09-SS054

Aumiller, Almasian, Hausner: UniHD@CL-SciSumm 2020: Citation extraction as search. Proceedings of the First Workshop on Scholarly Document Processing. 2020, November; pp.261–269. DOI: 10.18653/v1/2020.sdp-1.29

Appelbaum, Cooper, Kline: Journal Article Reporting Standards for Quantitative Research in Psychology: The APA Publications and Communications Board Task Force report. Am. Psychol. 2018;73(1):3–25. PMID: 29345484. DOI: 10.1037/amp0000191

Bayatmakou, Mohebi, Ahmadi: An interactive query-based approach for summarizing scientific documents. Inf. Discov. Deliv. 2022;50(2):176–191. DOI: 10.1108/IDD-10-2020-0124

Belur, Tompson, Thornton: Interrater reliability in systematic review methodology: Exploring variation in coder decision-making. Sociol. Methods Res. 2018;50(2):837–865. DOI: 10.1177/0049124118799372

Beltagy, Peters, Cohan: Longformer: The long-document transformer. arXiv, abs/2004.05150. 2020. DOI: 10.48550/arXiv.2004.05150

Bosco, Uggerslev, Steel: MetaBUS as a vehicle for facilitating meta-analysis. Hum. Resour. Manag. Rev. 2017;27(1):237–254. DOI: 10.1016/j.hrmr.2016.09.013

Bozada Jr, Borden, Workman: Sysrev: A FAIR platform for data curation and systematic evidence review. Front. Artif. Intell. 2021;4:1–18. Article 685298. PMID: 34423285. DOI: 10.3389/frai.2021.685298. PMCID: PMC8374944

Cairo, de Figueiredo Carneiro, da Silva: Adoption of machine learning techniques to perform secondary studies: A systematic mapping study for the computer science field. ICEIS. 2019;2:351–356. DOI: 10.5220/0007780603510356

Carrión-Toro, Aguilar, Santórum: iKeyCriteria: A qualitative and quantitative analysis method to infer key criteria since a systematic literature review for the computing domain. Data. 2022;7(6):70. DOI: 10.3390/data7060070

Chen PHA, Leibrand, Vasko: Ontology-based and user-focused automatic text summarization (OATS): Using COVID-19 risk factors as an example. arXiv preprint arXiv:2012.02028. 2020. DOI: 10.48550/arXiv.2012.02028

Chen, Montano-Campos, Zadrozny: Machine reading of hypotheses for organizational research reviews and pre-trained models via R Shiny app for non-programmers. 2021. DOI: 10.48550/arXiv.2106.16102

Cohen: The boundary lens: theorising academic activity. In The university and its boundaries: Thriving or surviving in the 21st Century. 1st ed. Routledge;2021; pp.14–41. DOI: 10.4324/9781003102953

Davis, Mengersen, Bennett: Viewing systematic reviews and meta-analysis in social research through different lenses. Springerplus. 2014;3(1):1–9. PMID: 25279303. DOI: 10.1186/2193-1801-3-511. PMCID: PMC4167883

Denzler, Enders, Akello: Towards a semi-automated approach for systematic literature reviews. Twenty-Seventh Americas Conference on Information Systems (AMCIS). 2021; Vol.4: pp.1–10.

Diaz-Elsayed, Zhang: Extracting the characteristics of Life Cycle Assessments via data mining. MethodsX. 2020;7(101004):1–6. DOI: 10.1016/j.mex.2020.101004

Dridi, Gaber, Azad RMA: Scholarly data mining: A systematic review of its applications. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 2021;11(2):1–23. DOI: 10.1002/widm.1395

Elliott, Turner, Clavisi: Living systematic reviews: An emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014;11(2):E1001603. PMID: 24558353. DOI: 10.1371/journal.pmed.1001603. PMCID: PMC3928029

Elliott, Synnot, Turner: Living systematic review: 1. Introduction—the why, what, when, and how. J. Clin. Epidemiol. 2017;91:23–30. PMID: 28912002. DOI: 10.1016/j.jclinepi.2017.08.010

Eriksen, Frandsen: The impact of patient, intervention, comparison, outcome (PICO) as a search strategy tool on literature search quality: a systematic review. J. Med. Libr. Assoc. 2018;106(4):420–431. PMID: 30271283. DOI: 10.5195/jmla.2018.345. PMCID: PMC6148624

Feng, Chiam: Text-mining techniques and tools for systematic literature reviews: A systematic literature review. 2017 24th Asia-Pacific Software Engineering Conference (APSEC). 2017, December; pp.41–50. DOI: 10.1109/APSEC.2017.10

Githens: Critical action research in human resource development. Hum. Resour. Dev. Rev. 2015;14(2):185–204. DOI: 10.1177/1534484315581934

Goldfarb-Tarrant, Robertson, Lazic: Scaling systematic literature reviews with machine learning pipelines. arXiv preprint arXiv:2010.04665. 2020. DOI: 10.48550/arXiv.2010.04665

Göpfert, Kuckertz, Weinand: Measurement extraction with natural language processing: A review. Findings of the Association for Computational Linguistics: EMNLP 2022. 2022;2191–2215. DOI: 10.18653/v1/2022.findings-emnlp.161

Goswami, Pal, Goldsworthy: An effective machine learning framework for data elements extraction from the literature of anxiety outcome measures to build systematic review. Abramowicz, Corchuelo, editors. Business Information Systems. BIS 2019. Lecture Notes in Business Information Processing. Vol.353. Cham: Springer;2019; pp.265–277. DOI: 10.1007/978-3-030-20485-3_19

Gough, Davies, Jamtvedt: Evidence Synthesis International (ESI): Position statement. Syst. Rev. 2020;9(1):155. PMID: 32650823. DOI: 10.1186/s13643-020-01415-5. PMCID: PMC7353688

Higgins JPT, Thomas, Chandler: Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane;2022.

Holub, Hardy, Kallmes: Toward automated data extraction according to tabular data structure: Cross-sectional pilot survey of the comparative clinical literature. JMIR Form. Res. 2021;5(11):E33124. PMID: 34821562. DOI: 10.2196/33124. PMCID: PMC8663462

Ip, Hadar, Keefe: A web-based archive of systematic review data. Syst. Rev. 2012;1(1):15. PMID: 22588052. DOI: 10.1186/2046-4053-1-15. PMCID: PMC3351737

Iwatsuki, Sagara, Hara: Detecting in-line mathematical expressions in scientific documents. Proceedings of the 2017 ACM Symposium on Document Engineering. 2017, August; pp.141–144. DOI: 10.1145/3103010.3121041

Jonnalagadda, Goyal, Huffman: Automating data extraction in systematic reviews: A systematic review. Syst. Rev. 2015;4(1):78. PMID: 26073888. DOI: 10.1186/s13643-015-0066-7. PMCID: PMC4514954

Jurafsky, Martin: Speech and language processing [Feb 2024 release]. 2024.

Khamis, Kahale, Pardo-Hernandez: Methods of conduct and reporting of living systematic reviews: A protocol for a living methodological survey [version 1; peer review: 2 approved]. F1000 Res. 2019;8:221. DOI: 10.12688/f1000research.18005.2

Kohl, McIntosh, Unger: Online tools supporting the conduct and reporting of systematic reviews and systematic maps: A case study on CADIMA and review of existing tools. Environ. Evid. 2018;7(1):1–17. DOI: 10.1186/s13750-018-0115-5

Kowsari, Meimandi, Heidarysafa: Text classification algorithms: A survey. arXiv, abs/1904.08067. 2019. DOI: 10.3390/info10040150

Kwiatkowski, Palomaki, Redfield: Natural questions: A benchmark for question answering research. Trans. Assoc. Comput. Linguist. 2019;7:453–466. DOI: 10.1162/tacl_a_00276

Legate, Nimon: Updated supplemental files: (Semi)automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review protocol. 2023a, January 12. DOI: 10.17605/OSF.IO/EWFKP

Legate, Nimon: (Semi)automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review protocol [version 2; peer review: 2 approved, 1 approved with reservations]. F1000Res. 2023b;11:1036. DOI: 10.12688/f1000research.125198.2

Legate, Nimon: (Semi) Automated Approaches to Data Extraction for Systematic Reviews and Meta-Analyses in Social Sciences: A Living Review. [Dataset]. OSF. 2024, May 5. DOI: 10.17605/OSF.IO/C7NSA

Li, Mandal, Ouyang: CORWA: A citation-oriented related work annotation dataset. arXiv preprint arXiv:2205.03512. 2022. DOI: 10.48550/arXiv.2205.03512

Liu, Bai, Mitra: Searching for tables in digital documents. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). IEEE;2007, September; Vol.2: pp.934–938. DOI: 10.1109/ICDAR.2007.4377052

Marshall, Sutton, O’Keefe, editors. The Systematic Review Toolbox. 2022.

Marshall, Wallace: Toward systematic review automation: A practical guide to using machine learning tools in research synthesis. Syst. Rev. 2019;8(1):110–163. PMID: 31296265. DOI: 10.1186/s13643-019-1074-9. PMCID: PMC6621996

McGuinness, Schmidt: mcguinlu/COVID_suicide_living: Initial Release (v1.0.0). [Data set]. Zenodo. 2020. DOI: 10.5281/zenodo.3871366

Nayak, Balasubramaniam, Kutty: A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry. Data Mining: 19th Australasian Conference on Data Mining, AusDM 2021. Singapore: Springer;2021, December; pp.209–222. DOI: 10.1007/978-981-16-8531-6_15

Neppalli, Caragea, Mayes: MetaSeer.STEM: Towards automating meta-analyses. Proc. AAAI Conf. Artif. Intell. 2016, February;30(2):4035–4040. DOI: 10.1609/aaai.v30i2.19081

Nowak, Kunstman: Team EP at TAC 2018: Automating data extraction in systematic reviews of environmental agents. arXiv preprint arXiv:1901.02081. 2019. DOI: 10.48550/arXiv.1901.02081

Ochoa-Hernández, Barcelo-Valenzuela, Sanchez-Smitz: Concept identification from single-documents. Valencia-García, Alcaraz-Mármol, Del Cioppo-Morstadt, editors. Technologies and Innovation. CITI 2018. Communications in Computer and Information Science. Vol.883. Cham: Springer;2018; pp.141–152. DOI: 10.1007/978-3-030-00940-3_12

O’Connor, Tsafnat, Thomas: A question of trust: Can we build an evidence base to gain trust in systematic review automation technologies? Syst. Rev. 2019;8(1):143. PMID: 31215463. DOI: 10.1186/s13643-019-1062-0. PMCID: PMC6582554

O’Mara-Eves, Thomas, McNaught: Using text mining for study identification in systematic reviews: A systematic review of current approaches. Syst. Rev. 2015;4(1):5. PMID: 25588314. DOI: 10.1186/2046-4053-4-5. PMCID: PMC4320539

Ouzzani, Hammady, Fedorowicz: Rayyan-a web and mobile app for systematic reviews. Syst. Rev. 2016;5(1):210. PMID: 27919275. DOI: 10.1186/s13643-016-0384-4. PMCID: PMC5139140

Page, McKenzie, Bossuyt: The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. J. Clin. Epidemiol. 2021;88:105189–105906. PMID: 33789826. DOI: 10.1016/j.ijsu.2021.105906

Park, Kim, Han: Research method trends in the field of human resource development [Refereed Extended Abstract]. 2021 AHRD Virtual Conference. 2021, February 17-19.

Pertsas, Constantopoulos: Ontology-driven information extraction from research publications. Aalberg, Papatheodorou, Dobreva, editors. Digital Libraries for Open Knowledge: 22nd International Conference on Theory and Practice of Digital Libraries, (TPDL 2018). Springer International Publishing;2018; pp.241–253. DOI: 10.1007/978-3-030-00066-0_21

Pigott, Polanin: Methodological guidance paper: High-quality meta-analysis in a systematic review. Rev. Educ. Res. 2020;90(1):24–46. DOI: 10.3102/0034654319877153

Piroi, Lipani, Lupu: DASyR (IR)-document analysis system for systematic reviews (in Information Retrieval). 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE;2015, August; pp.591–595. DOI: 10.1109/ICDAR.2015.7333830

Porter: An algorithm for suffix stripping. Program: Electronic Library and Information Systems. 1980;14(3):130–137. DOI: 10.1108/eb046814

Roldan-Baluis, Zapata, Vasquez MSM: The effect of natural language processing on the analysis of unstructured text: A systematic review. Int. J. Adv. Comput. Sci. Appl. 2022;13(5):43–51. DOI: 10.14569/IJACSA.2022.0130507

Schmidt, Olorisade, McGuinness: Data extraction methods for systematic review (semi)automation: A living systematic review [version 1; peer review: 3 approved]. F1000Res. 2021;10:401. DOI: 10.12688/f1000research.51117.1

Schmidt, Olorisade, McGuinness: Data extraction methods for systematic review (semi)automation: A living review protocol (Version 2; peer review: 2 approved). F1000Res. 2020;9:210. PMID: 32724560. DOI: 10.12688/f1000research.22781.2. PMCID: PMC7338918

Schmidt, Finnerty Mutlu, Elmore: Data extraction methods for systematic review (semi)automation: Update of a living systematic review (Version 2; peer review: 3 approved). F1000Res. 2023;10:401. PMID: 34408850. DOI: 10.12688/f1000research.51117.2. PMCID: PMC8361807

Shahid, Afzal: Section-wise indexing and retrieval of research articles. Clust. Comput. 2018;21(1):481–492. DOI: 10.1007/s10586-017-0914-4

Shen, Jiang: A model for the identification of the functional structures of unstructured abstracts in the social sciences. Electron. Libr. 2022;40(6):680–697. DOI: 10.1108/EL-10-2021-0190

Shirmohammadi, Mehdiabadi, Beigi: Mapping human resource development: Visualizing the past, bridging the gaps, and moving toward the future. Hum. Resour. Dev. Q. 2021;32(2):197–224. DOI: 10.1002/hrdq.21415

Sundaram, Berleant: Automating systematic literature reviews with natural language processing and text mining: A systematic literature review. Eighth International Congress on Information and Communication Technology (ICICT). Singapore: Springer;2023; pp.73–92. DOI: 10.1007/978-981-99-3243-6_7

Short, McKenny, Reid: More than words? Computer-aided text analysis in organizational behavior and psychology research. Annu. Rev. Organ. Psych. Organ. Behav. 2018;5(1):415–435. DOI: 10.1146/annurev-orgpsych-032117-104622

Torres JAS, Cruzes, do Nascimento Salvador: Automatic results identification in software engineering papers. Is it possible? 2012 12th International Conference on Computational Science and Its Applications. IEEE;2012, June; pp.108–112. DOI: 10.1109/ICCSA.2012.27

Tsafnat, Glasziou, Choong: Systematic review automation technologies. Syst. Rev. 2014;3(1):74. PMID: 25005128. DOI: 10.1186/2046-4053-3-74. PMCID: PMC4100748

Wagner, Lukyanenko, Paré: Artificial intelligence and the conduct of literature reviews. J. Inf. Technol. 2022;37(2):209–226. DOI: 10.1177/02683962211048201

Wohlin, Kalinowski, Felizardo: Successful combination of database search and snowballing for identification of primary studies in systematic literature studies. Inf. Softw. Technol. 2022;147:106908. DOI: 10.1016/j.infsof.2022.106908

Yang, Kinshuk, An: A survey of the literature: how scholars use text mining in Educational Studies? Educ. Inf. Technol. 2023;28(2):2071–2090. DOI: 10.1007/s10639-022-11193-3

Young: Abstract and Index and Web Discovery Services IEEE Partners. IEEE Xplore;2023, January.

Young, Hazarika, Poria: Recent trends in deep learning based natural language processing. 2018.

Zhao, Feng: Interrater reliability estimators tested against true interrater reliabilities. BMC Med. Res. Methodol. 2022;22:232. PMID: 36038846. DOI: 10.1186/s12874-022-01707-5. PMCID: PMC9426226

Zhitomirsky-Geffet, Bergman, Hilel: Towards a wider perspective in the social sciences using a network of variables based on thousands of results. Scientometrics. 2020;123:1385–1406. DOI: 10.1007/s11192-020-03446-0

Zielinski, Mutschke: Mining social science publications for survey variables. Proceedings of the Second Workshop on NLP and Computational Social Science. 2017, August; pp.47–52. DOI: 10.18653/v1/W17-2907

10.5256/f1000research.172337.r349101

Reviewer response for version 2

Joshua Polanin, Referee, https://orcid.org/0000-0001-5100-0164

American Institutes for Research, Arlington, Virginia, USA

Competing interests: No competing interests were disclosed.

3 January 2025

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recommendation: approve with reservations

Thank you for the opportunity to submit this peer-review. I found the manuscript engaging and timely. I have several issues the authors can address ahead of final submission.

1. I truly appreciated the lit review of related efforts. However, the organization of the section made it difficult for me to follow what had been done and what was ongoing, and importantly, where the other work potentially overlaps with this manuscript's work. I'm not sure exactly how the authors should restructure this section, but I would strongly urge the authors to consider it.

2. Please make more clear that this manuscript is only looking at evaluations of tools/models/applications, and not a roundup of all the available AI tools. I agree that the manuscript's framing is useful, but it took me a while to understand that the authors were only interested in that aspect.

3. I thought the methods section is pretty good and clear.

4. The results are useful and well organized. Some of the figures are difficult to read and could use something beyond the base ggplot design. Shading or color or plots go a long way.

5. I think the authors should re-examine their Conclusions section and really try and outline main findings in a really clear way. Right now it's tough to tell what they are. Relatedly, I think there's more limitations here than what is listed. This again goes back to the scope, but I think readers who zoom over the lit review will miss that this manuscript is only interested in evaluations of current tools. This means that many applications making claims about their usefulness (i.e., Elicit and other products) have not been included. Especially given the emphasis on qualitative summaries, the authors need to make clear that those tools have *not* been evaluated in the types of ways that the tools mentioned in this article have.

6. I appreciated the transparency and reporting done. Nice work!

Overall this is a great article and will make a strong contribution. But a bit more could be done to clarify. I wish the authors good luck in finalizing!

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Yes

Is the statistical analysis and its interpretation appropriate?

Yes

If this is a Living Systematic Review, is the ‘living’ method appropriate and is the search schedule clearly defined and justified? (‘Living Systematic Review’ or a variation of this term should be included in the title.)

Yes

Are sufficient details of the methods and analysis provided to allow replication by others?

Partly

Are the conclusions drawn adequately supported by the results presented in the review?

Partly

Reviewer Expertise:

I'm an expert in research synthesis methods, designing applications for conducting syntheses. I've worked on both AI-based and non-AI-based architectures.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

10.5256/f1000research.166140.r298395

Reviewer response for version 1

Lena Schmidt, Referee, https://orcid.org/0000-0003-0709-8226

National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, England, UK

Competing interests: No competing interests were disclosed.

30 August 2024

Recommendation: approve

Thank you for providing this interesting review about automated data extraction from social science studies and model architectures, evaluation, data processing and more. Especially the information in tables, such as tables 3, 4, and 5 are adding a lot of value. I am not an expert in social science studies but am familiar with the general field of automated data extraction. Therefore I have just some very minor comments and questions for clarification below:

The reference from Yu et al. below is mostly concerning screening automation and not data extraction if I am not missing a major point in the paper. If that is correct then there may exist better works to reference in this context? “the process of extracting data from primary research is a labor-intensive effort, fraught with the potential for human error (see Pigott & Polanin, 2020; Yu et al., 2018).”

I am not an expert in social science research, but a few included references in Table 2 caught my eye. For example Iwatsuki et al. (2017) about detecting in-line mathematical expressions or Torres et al. (2012) about software engineering or later Nayak et al. (2021) about cotton industry?

Regarding this sentence in the conclusions, it might be more up-to-date to reference the review update from 2023 with 76 included papers: “For example, while an LSR focusing on clinical research that is based on the PICO framework yielded 53 studies that included original data extraction (Schmidt et al., 2021)”

One of the challenges with living updates is to adapt the search whenever there are new developments in a field of research. You may have already considered adapting the search strategy to make sure that new methods relying on large language models (LLM) like GPT or T5 are picked up? There may be relevant articles coming through soon, for example https://arxiv.org/abs/2405.14445 may be of interest for a future review update as it looks at social science study data extraction and if it is, then it would be good to make sure that the search can pick up the terminology correctly.

In the methodology section, could you please state the dates when the search relevant to the baseline review cutoff was conducted (for each data source if different)?

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Yes

Is the statistical analysis and its interpretation appropriate?

Yes

Are sufficient details of the methods and analysis provided to allow replication by others?

Yes

Are the conclusions drawn adequately supported by the results presented in the review?

Yes

Reviewer Expertise:

Systematic review automation, automated data extraction (clinical trials), natural language processing, living reviews

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Amanda Legate, University of Texas Tyler

Competing interests: No competing interests were disclosed.

22 September 2024

We are honored that you agreed to review our research and sincerely appreciate your thoughtful review and feedback. Please find responses to each comment below.

Comment 1

The reference from Yu et al. below is mostly concerning screening automation and not data extraction if I am not missing a major point in the paper. If that is correct then there may exist better works to reference in this context? “the process of extracting data from primary research is a labor-intensive effort, fraught with the potential for human error (see Pigott & Polanin, 2020; Yu et al., 2018).”

Comment 1: Response

Thank you for catching this oversight. You are absolutely correct; Yu et al. (2018) primarily focused on screening automation for primary study selection rather than data extraction stages of SLRs. We have removed this reference from the sentence to align with the context.

Comment 2

I am not an expert in social science research, but a few included references in Table 2 caught my eye. For example Iwatsuki et al. (2017) about detecting in-line mathematical expressions or Torres et al. (2012) about software engineering or later Nayak et al. (2021) about cotton industry?

Comment 2: Response

Thank you for your insights on the relevance of references in Table 2. Our search strategy was intentionally broad to include studies utilizing (semi)automated data extraction methods across various domains, provided they were not solely focused on clinical research. The goal was to ensure comprehensiveness; however, we understand your concern regarding the ambiguity of some references' relevance to social sciences. As our study is a "living" review, we see this as an excellent opportunity to consider refining our inclusion criteria in future updates. We will explore more targeted approaches that can help streamline the search strategy, potentially focusing on research that more directly applies to social sciences or explicitly demonstrates transferable methodologies that align with the needs of social science researchers. Additionally, we are discussing options for collaborating with experts who specialize in bibliometric analysis or search strategy optimization to ensure that our review remains focused, relevant, and complementary to your work.

Comment 3

Regarding this sentence in the conclusions, it might be more up-to-date to reference the review update from 2023 with 76 included papers: “For example, while an LSR focusing on clinical research that is based on the PICO framework yielded 53 studies that included original data extraction (Schmidt et al., 2021)”.

Comment 3: Response

Thank you for pointing out this important update; we appreciate your diligence in ensuring that references are current and reflective of the most recent findings. The manuscript has been revised to reference the 2023 update of your LSR to accurately reflect the most up-to-date results.

Comment 4

One of the challenges with living updates is to adapt the search whenever there are new developments in a field of research. You may have already considered adapting the search strategy to make sure that new methods relying on large language models (LLM) like GPT or T5 are picked up? There may be relevant articles coming through soon, for example https://arxiv.org/abs/2405.14445 may be of interest for a future review update as it looks at social science study data extraction and if it is, then it would be good to make sure that the search can pick up the terminology correctly.

Comment 4: Response

Thank you for highlighting the importance of adapting the search strategy to capture emerging developments in automation technologies, particularly those involving large language models (LLMs) like GPT or T5. We completely agree that a key aspect of maintaining the relevance and rigor of a living systematic review is to continuously update the search strategy to reflect the current state-of-the-art in the field. We will incorporate this valuable feedback into future iterations by updating our search terms and strategies to include LLM-related methodologies and terminologies, ensuring the inclusion of new and relevant articles. The paper you referenced (https://arxiv.org/abs/2405.14445) serves as an excellent example, and we will use it to refine our search criteria. This approach will help us stay current with advances in data extraction techniques. Thank you for providing specific references to guide this adaptation.

Comment 5

In the methodology section, could you please state the dates when the search relevant to the baseline review cutoff was conducted (for each data source if different)?

Comment 5: Response

Thank you for this thoughtful suggestion. While we reported the search dates in the extended data files housed in OSF, we agree with you that including them directly in the Methods section would add clarity and value for readers. We have updated the section to specify the dates when searches were conducted for each data source, ensuring this information is clear and accessible to readers.

10.5256/f1000research.166140.r298402

Reviewer response for version 1

Macura

Biljana

1 Referee https://orcid.org/0000-0002-4253-1390 1Stockholm Environment Institute, Stockholm, Sweden

Competing interests: No competing interests were disclosed.

19 8 2024

2024

recommendation

approve-with-reservations

This manuscript represents an important contribution to the evidence synthesis methodology. Given the rise of AI technology, a living evidence base on approaches to data extraction will be very useful. However, the manuscript could benefit from improved clarity. Below are my comments:

Title:

Clarify the type of data being extracted (qualitative, quantitative, or mixed).

Since this review does not include any qualitative or quantitative synthesis per se, but rather provides an overview of the field (methods for semi-automated data extraction), I suggest removing "living systematic review" and adding "living systematic map."

Abstract:

The summary of methods could include more detailed information on searches, screening, critical appraisal, and synthesis. Please specify which standards for review conduct were followed.

The summary of results could provide more information (briefly) about the included studies.

Keywords:

Avoid repeating terms already present in the title

Introduction:

The focus of this review—extraction tools for quantitative data—should be more explicitly stated. This emphasis needs to be clearer in the introduction and reflected in the title, as mentioned earlier. Specifically, the first paragraph of the Introduction should be revised to concentrate on the review topic—quantitative data extraction and existing tools—rather than a general introduction to meta-science or related areas.

Additional details are needed on how this review contributes to and complements existing reviews on the topic. This information should be included in the "Related Research" section.

Objectives:

It would be helpful to define what is included under “social science research domains”.

Methods:

Authors should be transparent and explicit about the guidelines and standards for both conduct and reporting that were used. Please clarify this at the beginning of the Methods section.

The methods section should begin by addressing any deviations from the protocol. If there were no deviations, this should be clearly stated as well.

Did you use any automation technologies to screen or select studies for this review? If yes, please clarify.

Methods/Eligibility criteria:

The eligibility criteria should be explicit about the field within which methods for (semi)automated data extraction are applied.

A definition of “(semi)automated” is needed. The eligibility criteria currently state that semi-automated approaches will be eligible but then refer to “any automated approach to data extraction” in the next sentence. This needs to be clarified—are the focus and criteria on semi-automated or automated approaches? Be more explicit and precise in the description of the eligibility criteria, and ensure alignment with the protocol.

Instead of “We excluded studies labeled as editorials, briefs..” you may write “Editorials, briefs, …were not considered eligible” (and similar changes may be applied to the following sentence)

Methods/ Searches

Be explicit about the citation indices included in your Web of Science subscription and note which library was used to access WoS. This will increase transparency and replicability of your searches.

Clarify why following Schmidt et al.'s search strategy was important, given the different scope of this review. Consider including more social science databases to ensure comprehensive coverage. Did you include the Social Science Citation Index (within WoS)?

Provide explanations for all abbreviations (IEEE, ACL, etc.) in the text.

Methods/Study selection

Clarify if three researchers simultaneously screened titles and abstracts (TA), and whether inter-rater reliability (IRR) was calculated for TA screening. How did you train reviewers to apply eligibility criteria?

The sentence, “In cases where level of abstraction and potential for transferability could not be determined from the abstract alone, full text articles were reviewed and discussed by all three researchers until consensus was reached”, should more clearly state that there was NO full-text screening of all records (if this is correct), only of a sub-sample where abstracts did not clearly describe AI technology, etc.

Relatedly, Figure 1 should be adjusted to avoid giving the false impression that all records were screened in full text.

This review seems to involve META-data extraction rather than DATA extraction. Please adjust the text and figures accordingly.

It is not clear if IRR assessments were conducted for meta-data extraction. Please clarify/be explicit. If IRR was not done, describe how researchers were trained to use the extraction form.

The sentence, “coding forms allowed for input of “other” responses (e.g., APA data elements) that were not included in extant reviews that focus on medical and clinical data extraction (e.g., PICO elements)” is unclear. Consider removing or clarifying and linking it better with the rest of the text.

Describe the procedure for screening and meta-data extraction of studies authored by the review team.

Methods/Critical appraisal and Synthesis

These sections are missing. Please state clearly if a critical appraisal of included studies was conducted and if so, how was it performed. Also, describe how synthesis was conducted.

Results/Challenges

Clarify that the described challenges reflect issues within the body of evidence included in this (baseline) review (otherwise this section can be mixed up with review limitations).

Conclusions/Limitations

Organize limitations into those related to the methodology used and those related to the evidence base.

Discuss limitations related to the focus on publications in English, the inexhaustive list of search sources, and the lack of grey literature.

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Partly

Is the statistical analysis and its interpretation appropriate?

Not applicable

Partly

Are sufficient details of the methods and analysis provided to allow replication by others?

Partly

Are the conclusions drawn adequately supported by the results presented in the review?

Yes

Reviewer Expertise:

Systematic Evidence Synthesis Methodology

Legate

Amanda

University of Texas Tyler

Competing interests: No competing interests were disclosed.

22 9 2024

Dear Dr. Macura,

Thank you for your thoughtful and detailed feedback on our manuscript. We appreciate the time and effort you have invested in providing suggestions to enhance our work. We also value rigorous research methods and reporting transparency and would like to clarify several points regarding the reporting guidelines we adhered to and the journal's policies and requirements.

Our manuscript follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. As noted in the F1000Research “Article Standards of Reporting” (https://f1000research.com/about/policies#stofrep), systematic reviews published in this journal must adhere to PRISMA guidelines. We have ensured that our reporting aligns with PRISMA's emphasis on transparency, replicability, and comprehensiveness.

We would like to express our genuine appreciation for the important work you and your colleagues have done in developing the ROSES (Reporting standards for Systematic Evidence Syntheses) guidelines for systematic evidence synthesis in environmental science. Improving transparency and standardization in research reporting is a goal we fully support. While we acknowledge the value of the ROSES guidelines, they were not the reporting standard required or appropriate for our systematic review. We noticed that many of your comments seem to assess our manuscript against the ROSES guidelines (Haddaway et al., 2017a; 2017b; 2018; Haddaway & Macura, 2018). For example, the suggestion to emphasize "meta-data extraction" aligns more with ROSES, whereas PRISMA does not require such differentiation and focuses on clarity in describing the data collection process, whether it involves meta-data or primary data points.

We believe it is essential to assess our work based on the scope and framework provided by PRISMA rather than extend it beyond its current focus to fit an alternative reporting framework. We are committed to making revisions that enhance the clarity and rigor of our research while remaining consistent with the standards required by the journal.

Thank you again for your constructive feedback and for considering our clarifications.

References

Haddaway, N. R., & Macura, B. (2018). The role of reporting standards in producing robust literature reviews. Nature Climate Change, 8(6), 444–447. https://doi.org/10.1038/s41558-018-0180-3

Haddaway, N. R., Macura, B., Whaley, P., & Pullin, A. S. (2017a). ROSES for systematic map reports (Version 1.0) [Data file]. https://doi.org/10.6084/m9.figshare.5897299

Haddaway, N. R., Macura, B., Whaley, P., & Pullin, A. S. (2017b). ROSES for systematic review reports (Version 1.0) [Data file]. https://doi.org/10.6084/m9.figshare.5897272

Haddaway, N. R., Macura, B., Whaley, P., & Pullin, A. S. (2018). ROSES reporting standards for systematic evidence syntheses: Pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environmental Evidence, 7(7). https://doi.org/10.1186/s13750-018-0121-7

Comment 1

[Title] Clarify the type of data being extracted (qualitative, quantitative, or mixed).

Comment 2

[Title] Since this review does not include any qualitative or quantitative synthesis per se, but rather provides an overview of the field (methods for semi-automated data extraction), I suggest removing "living systematic review" and adding "living systematic map."

Comment 1 & 2: Response

Thank you for these valuable suggestions regarding the title. We have considered (1) specifying the type of data being extracted in the title and (2) changing the title from "living systematic review" to "living systematic map." However, we have retained the original title to ensure consistency with our pre-registered protocol, adhere to PRISMA reporting standards, and comply with F1000Research guidelines.

Comment 3

[Abstract] The summary of methods could include more detailed information on searches, screening, critical appraisal, and synthesis. Please specify which standards for review conduct were followed.

Comment 3: Response

Thank you for the suggestion to provide more detailed information on searches, screening, critical appraisal, and synthesis in the abstract to better align with ROSES reporting recommendations. To ensure compliance with the journal's requirements, we followed the PRISMA guidelines for a structured summary, which emphasize conciseness in presenting objectives, eligibility criteria, methods, results, and conclusions. While we understand the desire for additional details, we believe the current abstract aligns with these guidelines but will review it again to ensure optimal clarity.

Comment 4

[Abstract] The summary of results could provide more information (briefly) about the included studies.

Comment 4: Response

Thank you for the recommendation to provide more information about the included studies within the abstract. We will incorporate a brief summary of the included studies' key characteristics and findings in future updates to this review to enhance clarity and completeness.

Comment 5

[Keywords] Avoid repeating terms already present in the title

Comment 5: Response

Thank you for highlighting ROSES guidance indicating that keywords do not repeat the title but rather provide additional context. Where appropriate, we will revise keywords to avoid redundancy and enhance discoverability.

Comment 6

[Introduction] The focus of this review—extraction tools for quantitative data—should be more explicitly stated. This emphasis needs to be clearer in the introduction and reflected in the title, as mentioned earlier. Specifically, the first paragraph of the Introduction should be revised to concentrate on the review topic—quantitative data extraction and existing tools—rather than a general introduction to meta-science or related areas.

Comment 6: Response

Thank you for your feedback on clarifying the focus of our review. Our study does not exclusively focus on extraction tools for quantitative data; it encompasses approaches to data extraction for both quantitative and qualitative data elements relevant to evidence synthesis in systematic reviews and meta-analyses within social sciences. To better reflect this broader focus, we have revised the objective section to explicitly state that the review covers data extraction tools for a range of data types. We hope this adjustment will provide clearer insight into the comprehensive scope of our review.

Comment 7

[Introduction] Additional details are needed on how this review contributes to and complements existing reviews on the topic. This information should be included in the "Related Research" section.

Comment 7: Response

Thank you for this insightful comment. We agree on the importance of clearly situating our review within the existing literature to highlight its unique contributions. Although we did not adhere to ROSES guidelines for explaining the review's relevance to existing literature, we followed PRISMA guidelines in the "Related Literature" section to identify relevant prior reviews and synthesize their focus, findings, and limitations. We will consider ways to enhance this section to better emphasize our review's distinct contributions moving forward.

Comment 8

[Objectives] It would be helpful to define what is included under “social science research domains”.

Comment 8: Response

Thank you for this suggestion. Our pre-registered research protocol and the "Baseline Review Search Strategy" document (available in the project's OSF repository) provide a comprehensive list of over 100 subject categories included under social science research domains, ranging from sociology and political science to interdisciplinary areas such as "Social Sciences Mathematical Methods." To enhance clarity, we have updated the objectives section to include more details and a reference to the extended data file.

Comment 9

[Methods] Authors should be transparent and explicit about the guidelines and standards for both conduct and reporting that were used. Please clarify this at the beginning of the Methods section.

Comment 9: Response

We acknowledge that the ROSES guidelines recommend transparency in reporting the guidelines and standards for both conduct and reporting at the beginning of the Methods section. However, in accordance with the journal's article guidelines for living systematic reviews (available from: https://f1000research.com/for-authors/article-guidelines/living-systematic-reviews), this information is provided in the "Reporting Guidelines" section.

Comment 10

[Methods] The methods section should begin by addressing any deviations from the protocol. If there were no deviations, this should be clearly stated as well.

Comment 10: Response

Thank you for highlighting this important aspect. While we did not adopt the ROSES reporting standards for this research, we recognize their guidance on stating any deviations from the protocol at the beginning of the methods section. We have addressed any deviations in the appropriate sections of the paper, and additional descriptions are provided in the extended data files to ensure transparency and replicability.

Comment 11

[Methods] Did you use any automation technologies to screen or select studies for this review? If yes, please clarify.

Comment 11: Response

Thank you for your question. The use of automation technologies is detailed in the "Search Sources" and "Study Selection" subsections of the Methods section. Additionally, to ensure transparency and replicability, further details are provided in the "Software Availability" section, as per F1000Research guidelines.

Comment 12

[Methods/Eligibility criteria] The eligibility criteria should be explicit about the field within which methods for (semi)automated data extraction are applied.

Comment 12: Response

Thank you for this comment. To ensure clarity, we have referenced the extended data files in the text, which provide comprehensive details and a full list of over 100 research fields. These details are openly available in the project repository, as specified in the protocol (please see response to Comment 8).

Comment 13

[Methods/Eligibility criteria] A definition of “(semi)automated” is needed. The eligibility criteria currently state that semi-automated approaches will be eligible but then refer to “any automated approach to data extraction” in the next sentence. This needs to be clarified—are the focus and criteria on semi-automated or automated approaches? Be more explicit and precise in the description of the eligibility criteria and ensure alignment with the protocol.

Comment 13: Response

Thank you for your suggestion to clarify the phrasing regarding the eligibility criteria. We have revised the description to specify that the focus is on any "technique" applied for extracting data from literature in a semi-automated manner. This adjustment aligns with the study protocol.

Comment 14

[Methods/Eligibility criteria] Instead of “We excluded studies labeled as editorials, briefs..” you may write “Editorials, briefs, …were not considered eligible” (and similar changes may be applied to the following sentence).

Comment 14: Response

Thank you for the suggestion. We have revised the text to use passive construction, as recommended. We have also applied similar changes to the following sentence for consistency.

Comment 15

[Methods/ Searches] Be explicit about the citation indices included in your Web of Science subscription and note which library was used to access WoS. This will increase transparency and replicability of your searches.

Comment 15: Response

Thank you for this suggestion. To avoid redundancy in the manuscript, we have added a statement directing readers to the extended data files, which provide additional detail related to WoS indices and search settings.

Comment 16

[Methods/ Searches] Clarify why following Schmidt et al.'s search strategy was important, given the different scope of this review.

Comment 16: Response

Thank you for this comment. To clarify, we followed Schmidt et al.'s search strategy to ensure comprehensive coverage of relevant databases and consistency in methodological rigor, which is important even with a different scope. To avoid redundancy, we have added a reference in the manuscript directing readers to the extended data file and research protocol in our open-access repository, where this rationale is explained in detail.

Comment 17

[Methods/ Searches] Consider including more social science databases to ensure comprehensive coverage.

Comment 17: Response

Thank you for this valuable suggestion. We appreciate the importance of comprehensive coverage and will consider including additional social science databases in future updates to further enhance the scope of our review.

Comment 18

[Methods/ Searches] Did you include the Social Science Citation Index (within WoS)?

Comment 18: Response

Yes, the Social Science Citation Index within Web of Science was included. We have updated the text to clarify that all editions, settings, and search syntax used are detailed in the extended data files available in the open-access repository.

Comment 19

[Methods/ Searches] Provide explanations for all abbreviations (IEEE, ACL, etc.) in the text.

Comment 19: Response

Thank you for the suggestion. We have added explanations for all source abbreviations (e.g., IEEE, ACL) in the text to improve clarity for readers.

Comment 20

[Methods/ Study selection] Clarify if three researchers simultaneously screened titles and abstracts (TA), and whether inter-rater reliability (IRR) was calculated for TA screening. How did you train reviewers to apply eligibility criteria?

Comment 20: Response

Thank you for your question. The "Study Selection" section of the paper details independent screening procedures, training process for reviewers on applying eligibility criteria, and inter-rater reliability (IRR) considerations.

Comment 21

[Methods/ Study selection] The sentence, “In cases where level of abstraction and potential for transferability could not be determined from the abstract alone, full text articles were reviewed and discussed by all three researchers until consensus was reached”, should more clearly state that there was NO full-text screening of all records (if this is correct), only of a sub-sample where abstracts did not clearly describe AI technology, etc.

Comment 21: Response

Thank you for this observation. While ROSES guidelines provide alternative flowchart formatting and descriptions, we adhered to PRISMA guidelines. According to the PRISMA flowchart (Figure 1), a total of 11,336 records were identified, and after deduplication, 10,644 articles underwent title and abstract screening. As indicated in the flowchart, only 46 articles proceeded to the full-text screening stage, which occurred separately.
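As an illustrative check, the record counts reported in this response can be tallied in a short Python sketch. The three input values are the ones stated above; the derived figures are not quoted from the flowchart and assume that deduplication was the only step between identification and title and abstract screening.

```python
# Counts as stated in the response above (PRISMA flowchart, Figure 1).
records_identified = 11_336
screened_title_abstract = 10_644
full_text_screened = 46

# Derived values. Assumption: deduplication was the only removal step
# before title and abstract screening.
duplicates_removed = records_identified - screened_title_abstract
not_advanced = screened_title_abstract - full_text_screened
full_text_share = full_text_screened / screened_title_abstract

print(f"Duplicates removed: {duplicates_removed}")
print(f"Records not advanced to full text: {not_advanced}")
print(f"Share of screened records read in full text: {full_text_share:.2%}")
```

Under these assumptions, fewer than one in a hundred screened records reached the full-text stage.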

Comment 22

[Methods/ Study selection] Relatedly, Figure 1 should be adjusted to avoid giving the false impression that all records were screened in full text.

Comment 22: Response

Thank you for highlighting this concern. The current figure indicates that 46 articles were included in the full-text screening stage, making it clear that not all records were screened in full text. However, we will consider expanding the flowchart in future updates to provide additional details that could further enhance transparency and clarity.

Comment 23

[Methods/ Study selection] This review seems to involve META-data extraction rather than DATA extraction. Please adjust the text and figures accordingly.

Comment 23: Response

Thank you for your observation. While ROSES emphasizes distinguishing between meta-data extraction and data extraction, PRISMA does not make this distinction as explicitly. Our paper follows PRISMA guidelines, focusing on transparency and completeness in documenting the tools, databases, and criteria used for extraction.

Comment 24

[Methods/ Study selection] It is not clear if IRR assessments were conducted for meta-data extraction. Please clarify/be explicit. If IRR was not done, describe how researchers were trained to use the extraction form.

Comment 24: Response

Thank you for raising this point. The "Study Selection" section of the paper details this information and discusses inter-rater reliability (IRR) assessments.

Comment 25

[Methods/ Study selection] The sentence, “coding forms allowed for input of “other” responses (e.g., APA data elements) that were not included in extant reviews that focus on medical and clinical data extraction (e.g., PICO elements)” is unclear. Consider removing or clarifying and linking it better with the rest of the text.

Comment 25: Response

Thank you for this suggestion. We have refined the statement to improve clarity and ensure it is better linked with the surrounding text.

Comment 26

[Methods/ Study selection] Describe the procedure for screening and meta-data extraction of studies authored by the review team.

Comment 26: Response

Thank you for highlighting ROSES guidance surrounding procedures for handling studies authored by the review team. However, no alternative procedures were implemented; therefore, there are no additional procedures to report.

Comment 27

[Methods/ Critical appraisal and Synthesis] These sections are missing. Please state clearly if a critical appraisal of included studies was conducted and if so, how was it performed. Also, describe how synthesis was conducted.

Comment 27: Response

Thank you for your comment. These sections are specific to the ROSES guidelines and are not required by PRISMA or the journal's reporting standards.

Comment 28

[Results/Challenges] Clarify that the described challenges reflect issues within the body of evidence included in this (baseline) review (otherwise this section can be mixed up with review limitations).

Comment 28: Response

To avoid confusion with review limitations, we have revised the first sentence of this section to clarify that the challenges discussed specifically reflect issues within the body of evidence included in this baseline review.

Comment 29

[Conclusions/Limitations] Organize limitations into those related to the methodology used and those related to the evidence base.

Comment 29: Response

While our research and protocol were developed following PRISMA guidelines rather than ROSES, which requires a structured discussion of limitations, we appreciate the value of differentiating between methodological constraints and evidence base gaps. We will consider this distinction in future updates to enhance clarity.

Comment 30

[Conclusions/Limitations] Discuss limitations related to the focus on publications in English, the inexhaustive list of search sources, and the lack of grey literature.

Comment 30: Response

Thank you for the suggestion. We have updated the limitations section to address the focus on publications in English, the inexhaustive list of search sources, and the lack of grey literature.

10.5256/f1000research.166140.r298396

Reviewer response for version 1

Oswald

Fred

1 Referee https://orcid.org/0000-0002-7275-5408 1Rice University, Houston, Texas, USA

Competing interests: No competing interests were disclosed.

16 7 2024

2024

recommendation

approve

Overall, this paper is an excellent review of automated data-extraction methods for the purposes of synthetic reviews and meta-analysis. To my knowledge, there is no such review in the literature, and yet given the rise in AI-based technologies, there is a rising need for researchers to have a single resource identifying these extraction methods. This review nicely summarizes the types of tools that are out there, but it might further tie the tools more closely to a checklist that reflects what must typically be accomplished when conducting meta-analysis (e.g., identifying literature, extracting sample sizes and effect sizes, converting effect sizes when necessary, coding effect sizes into variables, associated moderators, associated reliability coefficients, dealing with missing data). This would give the reader a better sense of what gets automated and serves their purposes (e.g., you can take the ‘model architectures and components’ section and populate the checklist/framework with these AI tools/functions). Also, though it is certainly useful to document when and how humans are compared to automated systems, the level of accuracy reported (e.g., errors of commission and omission by automated systems) would be useful as well (i.e., are these automated systems any good? when systems agree with humans, when are they agreeing in an accurate way vs. a biased way?).

Thank you for the opportunity to review – again, this will be a valuable paper to readers.

Are the rationale for, and objectives of, the Systematic Review clearly stated?

Yes

Is the statistical analysis and its interpretation appropriate?

Yes

Are sufficient details of the methods and analysis provided to allow replication by others?

Yes

Are the conclusions drawn adequately supported by the results presented in the review?

Yes

Reviewer Expertise:

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Legate

Amanda

University of Texas Tyler

Competing interests: No competing interests were disclosed.

22 9 2024

Comment 1

Also, though it is certainly useful to document when and how humans are compared to automated systems, the level of accuracy reported (e.g., errors of commission and omission by automated systems) would be useful as well (i.e., are these automated systems any good? when systems agree with humans, when are they agreeing in an accurate way vs. a biased way?)

Thank you for the opportunity to review – again, this will be a valuable paper to readers.

Comment 1: Response

Thank you for this insightful suggestion. We agree that evaluating the accuracy of automated systems compared to human assessments, particularly regarding errors of commission and omission, would provide valuable insights into their effectiveness and potential biases. Understanding when automated systems align with human judgments accurately is indeed crucial for advancing the field.

Given the "living" nature of our review, we see this as an important focus for future updates. Although additional technical expertise may be required to conduct a comprehensive comparative assessment of these accuracy measures, we hope to expand our team to include experts in areas of data science and AI evaluation. This addition will enhance the rigor of our review and address critical questions surrounding the reliability of automated tools.
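To make the terms in this exchange concrete, the following Python sketch shows one way errors of commission and omission could be tallied for a single study. It is an illustration only, not a method used in this review: the function name `extraction_errors` and the example data elements are hypothetical, and human coding is assumed to be the reference standard.

```python
def extraction_errors(human: set[str], automated: set[str]) -> dict[str, float]:
    """Compare automated extraction with human coding for one study.

    Assumption: the human-coded set is treated as the reference standard.
    """
    correct = human & automated      # elements found by both
    commission = automated - human   # extracted by the system, not coded by humans
    omission = human - automated     # coded by humans, missed by the system
    precision = len(correct) / len(automated) if automated else 0.0
    recall = len(correct) / len(human) if human else 0.0
    return {
        "commission": len(commission),
        "omission": len(omission),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical data elements for a single primary study.
human_coded = {"sample_size", "effect_size", "reliability", "moderator"}
system_output = {"sample_size", "effect_size", "p_value"}

result = extraction_errors(human_coded, system_output)
print(result)
```

Because the human coding is taken as correct here, a tally of this kind cannot by itself show whether agreement between humans and a system is accurate or reflects a shared bias; that would require an independent reference.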

We appreciate your valuable feedback and are committed to integrating these considerations in future versions of our living review.