(Semi)automated approaches to data extraction for systematic reviews and meta-analyses in social sciences: A living review protocol [version 1; peer review: awaiting peer review]

Background: An abundance of rapidly accumulating scientific evidence presents novel opportunities for researchers and practitioners alike, yet such advantages are often overshadowed by the resource demands associated with finding and aggregating a continually expanding body of scientific information. Across social science disciplines, the use of automation technologies for timely and accurate knowledge synthesis can enhance research translation value, better inform key policy development, and expand the current understanding of human interactions, organizations, and systems. Ongoing developments surrounding automation are highly concentrated in research for evidence-based medicine, with limited evidence surrounding tools and techniques applied outside of the clinical research community. Our objective is to conduct a living systematic review of automated data extraction techniques supporting systematic reviews and meta-analyses in the social sciences. The aim of this study is to extend the automation knowledge base by synthesizing current trends in the application of extraction technologies for key data elements of interest to social scientists. Methods: The proposed study is a living systematic review employing a partial replication framework based on extant literature surrounding automation of data extraction for systematic reviews and meta-analyses. Protocol development, the base review, and updates follow PRISMA standards for reporting systematic reviews. This protocol was preregistered in OSF as (Semi)Automated Approaches to Data Extraction for Systematic Reviews and Meta-Analyses in Social Sciences: A Living Review Protocol on August 14, 2022.
Conclusions: Anticipated outcomes of this study include: (a) generating insights that support the transfer of existing reliable methods to social science research; (b) providing a foundation for protocol development leading to enhanced comparability and benchmarking across disciplines; and (c) uncovering exigencies that spur continued value-adding innovation and interdisciplinary collaboration for the benefit of the collective systematic review community.


Introduction
Across disciplines, systematic reviews and meta-analyses are integral to exploring and explaining phenomena, discovering causal inferences, and supporting evidence-based decision making. The concept of metascience represents an array of evidence synthesis approaches which support combining existing research results to summarize what is known about a specific topic (Davis et al., 2014; Gough et al., 2020). Researchers use a variety of systematic review methodologies to synthesize evidence within their domains or to integrate extant knowledge bases spanning multiple disciplines and contexts. When engaging in quantitative evidence synthesis, researchers often supplement the systematic review with meta-analysis (a principled statistical process for grouping and summarizing quantitative information reported across studies within a research domain; Shamseer et al., 2015). As technology advances, in addition to greater access to data, researchers are presented with new forms and sources of data to support evidence synthesis (Bosco et al., 2017; Ip et al., 2012; Wagner et al., 2022). An abundance of accumulated scientific evidence presents novel opportunities for translational value, yet advantages are often overshadowed by resource demands associated with locating and aggregating a continually expanding body of information. In the social sciences, the number of published systematic reviews and meta-analyses has experienced continual growth over the past 20 years, with an annual increase approximating 21% based on citation reports from Web of Science (see Figure 1).

Background
Comprehensive data extraction activities associated with evidence synthesis have been described as time-consuming to the point of critically limiting the usefulness of existing approaches (Holub et al., 2021). Moreover, research indicates that it can take several years for original studies to be included in a new review due to the rapid pace of new evidence generation (Jonnalagadda et al., 2015). As such, research communities are increasingly interested in the application of automation technologies to reduce the workload associated with systematic reviews. Tsafnat et al. (2014, p. 2) delineated fifteen tasks associated with systematic reviews and meta-analyses, illuminating the automation potential for each, including the steps involved in repetitive data extraction. Recent studies and conference proceedings have outlined critical factors influencing the development and adoption of automation efforts across social and behavioral sciences, including, but not limited to: (a) an absence of tools developed for use outside of medical science research (Marshall & Wallace, 2019); (b) a lack of universal terminology (Gough et al., 2020); and (c) nonuniformity in presenting and reporting data (Yarkoni et al., 2021). Notwithstanding these contributions, important questions related to how social scientists are addressing known challenges remain unanswered.
The need for this review in social sciences
Social sciences encompass a broad range of research disciplines; however, what social scientists share is an interest in expanding a collective understanding of human behaviors, interactions, systems, and organizations (National Institute of Social Sciences, n.d.). Systematic reviews and meta-analyses are fundamental to supporting reproducibility and generalizability of research surrounding social and cultural aspects of human behavior; however, the process of extracting data from primary research is a labor-intensive effort, fraught with the potential for human error (see Pigott & Polanin, 2020; Yu et al., 2018). In contrast with the more defined standards that have evolved throughout the clinical research domain, within and across social sciences, substantial variation exists in research designs, reporting protocols, and even publication outlet standards (Davis et al., 2014; Short et al., 2018; Wagner et al., 2022). Notwithstanding that application of automation technologies in the social sciences could benefit from greater standardization of reporting protocols and terminology, understanding of the current state of (semi)automated extraction across these disciplines is largely speculative.

Figure 1. Social sciences systematic review and meta-analysis publications by year. Note. Figure was generated using the Web of Science Core Collection database. A title search was conducted in the Social Sciences Citation Index (SSCI) for articles and reviews published between 2000-2022 including variations of the terms "Systematic Review" and "Meta-analysis". Search Syntax: ((TI=("meta-analy*" or "meta analy*" or metaanaly* or "system* review" or "literature review")) AND PY= ) AND DT=(Article OR Review).
In the clinical research community, automation technologies for data extraction are rapidly evolving. Tools applying intelligent technologies for the purpose of data extraction are increasingly common for research involving Randomized Control Trials (RCT; see Schmidt et al., 2021). As data elements targeted for extraction from clinical studies and healthcare interventions often differ from those targeted by social scientists, transferability of technological solutions remains constrained. Figure 2 presents a general overview of methodologies covered by quantitative reporting guidelines applicable to social sciences per the American Psychological Association (APA, 2020). As of 2018, the APA Journal Article Reporting Standards (JARS) were updated to include clinical trial reports (represented in Figure 2 by the block labeled "Clinical Trials Module C"), incorporating elements also identified by the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2022).
To elaborate, in health intervention research, targeted data elements generally include Population (or Problem), Intervention, Control, and Outcome (i.e., PICO; see Eriksen et al., 2018; Tsafnat et al., 2014). In social research, elements targeted for extraction are similarly a function of study design, but targets can take numerous forms based on research questions considered. Researchers in social sciences often rely on APA JARS guidelines, which delineate key elements and respective reporting locations for authors to follow when presenting results of qualitative (JARS-Qual) and quantitative (JARS-Quant) research (APA, 2020; Appelbaum et al., 2018; see also Purdue Online Writing Lab, n.d.). For example, in addition to descriptive statistics (e.g., sample size, mean, standard deviation), meta-analytic efforts typically aim to extract and aggregate inferential elements such as effect sizes and p-values. Where structural equation models are involved, a researcher may be interested in extracting model fit indices; or when conducting a reliability generalization, extraction of instrument psychometric properties would be imperative (Appelbaum et al., 2018; see "Extended Data" for supplementary files containing target data elements).
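To make the notion of targeted extraction concrete, the following minimal Python sketch (our illustration, not a tool drawn from the reviewed literature) uses regular expressions to pull a sample size, an effect size, and a p-value from a single invented sentence of report text. The patterns and the example sentence are hypothetical; the systems this review targets rely on far more robust NLP methods.

```python
import re

# Illustrative patterns for a few JARS-style statistical elements that
# meta-analysts commonly target (sample size, Cohen's d, p-value).
PATTERNS = {
    "sample_size": re.compile(r"\bN\s*=\s*(\d+)"),
    "effect_size_d": re.compile(r"\bd\s*=\s*(-?\d+\.\d+)"),
    "p_value": re.compile(r"\bp\s*([<=>])\s*(\.\d+|\d+\.\d+)"),
}

def extract_elements(text):
    """Return every pattern match found in a passage of report text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found

sentence = ("Participants (N = 120) completed both scales; the intervention "
            "effect was moderate, d = 0.45, p < .001.")
print(extract_elements(sentence))
# → {'sample_size': ['120'], 'effect_size_d': ['0.45'], 'p_value': [('<', '.001')]}
```

A rule-based sketch like this breaks down quickly on the varied reporting formats found across social science outlets, which is precisely why more flexible extraction techniques are of interest.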
To further illustrate the scope of eligible extraction targets for which automation technologies could benefit social science research, we used the wordcloud2 package (v0.2.2; Lang, 2022; R Statistical Software v4.1.2; R Core Team, 2021) to present simple comparative visualizations of reporting items for systematic reviews and meta-analyses (see Figures 3 and 4). In Figure 3, the gray text represents data collection elements identified by both the Cochrane Handbook for Systematic Reviews of Interventions (Li et al., 2022) and data reporting standards for clinical trials recommended by APA JARS (Appelbaum et al., 2018; Table 2, Module C). The orange text represents reporting elements unique to APA JARS. Notwithstanding substantial overlap, the minimal unique terms are likely attributable to variation in disciplinary phrasing. Figure 4 similarly shows target data elements (with orange text likewise representing items unique to APA) but also includes recommended reporting items for all study designs (e.g., replications, structural equation modeling, Bayesian techniques, etc.). If automation techniques experiencing rapid growth in clinical research hold potential for transferability to the range of study designs prevalent throughout social and behavioral sciences, benefits could be far reaching for the development and validation of theoretical models, measurement scales, and much more.
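The shared-versus-unique logic behind Figures 3 and 4 amounts to plain set operations. The Python sketch below uses abbreviated, hypothetical term lists standing in for the full Cochrane and APA JARS checklists; the published figures were generated with the R wordcloud2 package from the complete item sets.

```python
# Hypothetical, abbreviated term lists; the actual figures draw on the
# complete Cochrane Handbook and APA JARS checklists.
cochrane_items = {"sample size", "effect size", "setting", "funding source",
                  "randomization", "allocation concealment"}
apa_jars_items = {"sample size", "effect size", "setting", "funding source",
                  "measurement reliability", "missing data handling"}

shared = cochrane_items & apa_jars_items         # gray text in the figures
unique_to_apa = apa_jars_items - cochrane_items  # orange text in the figures

print(sorted(shared))
# → ['effect size', 'funding source', 'sample size', 'setting']
print(sorted(unique_to_apa))
# → ['measurement reliability', 'missing data handling']
```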
Given that evidence-based medicine is often associated with superior protocol standards and systematic guidelines (i.e., gold standards; Grimmer et al., 2021), the task of transferring even the most reliable automation technologies to social science research presents a substantial challenge. Even within more technical disciplines, such as Information Systems, researchers grapple with automation challenges associated with a lack of uniformity in the description and presentation of constructs and measurement items (Wagner et al., 2022, p. 12). While discourse surrounding the delayed uptake of automation tools in the social sciences is occurring, the question of application transferability to domains outside of clinical research remains underexplored. Despite known barriers, delays in interdisciplinary methodological progress inhibit opportunities for collaborative knowledge synthesis both within and across fields (Gough et al., 2020).

Figure 3 (caption, partial): … Appelbaum et al. (2018, Table 2, Module C) and The Cochrane Handbook checklist of items to consider in data collection (Table 5.3.a). Orange represents elements unique to Appelbaum et al. (2018, Table 2, Module C). Terms appearing in the word cloud are not exhaustive and are limited to frequencies greater than one.

Figure 4 (caption, partial): … Appelbaum et al. (2018, Tables 1-9) and The Cochrane Handbook checklist of items to consider in data collection (Table 5.3.a). Orange represents elements unique to Appelbaum et al. (2018, Tables 1-9). Terms appearing in the word cloud are not exhaustive and are limited to frequencies greater than two.

Objectives of this living review
The purpose of this study is to conduct a living systematic review (LSR) to extend the automation knowledge base by identifying existing and emergent systems for (semi) automated extraction of data used by social science researchers conducting systematic reviews and meta-analyses. Following Schmidt et al. (2020b, 2021), who are conducting a review of data extraction techniques for healthcare interventions (i.e., RCTs, case studies, and cohort studies; see Schmidt et al., 2020a), we apply an adapted version of their methodological strategy for social science disciplines, where observational research is widespread practice. This effort entails targeting extraction of JARS data elements identified by the APA (Appelbaum et al., 2018; see "Extended Data").
Employing a differentiated replication framework, we apply the LSR methodology to iteratively aggregate and report: (a) the extant state of technology-assisted data extraction in social science research; (b) application trends in automation tools/techniques for extraction of data from abstracts and full text documents outside of biomedical and clinical research corpora; (c) evidence synthesis stages and tasks for which automation technologies are predominantly applied across social science disciplines; (d) specific data elements and structures targeted for automated extraction efforts by social science researchers; and (e) applied benchmarking standards for performance evaluation.

Related research
To inform this protocol and assess the extent to which our questions have been addressed in prior literature, we explored existing (semi) automated data extraction reviews. We identified six literature reviews (three static, one living, one scoping, and one cross-sectional pilot survey), two software user surveys, and one conference proceeding report. Table 1 provides a summary of scoped studies. Where some efforts focused on software applications, or "tools" that perform or assist with systematic review tasks (Harrison et al., 2020; Scott et al., 2021), others directed attention to underlying methods or techniques (e.g., machine learning algorithms) or reviewed multiple categorizations (see Blaizot et al., 2022; Schmidt et al., 2021; O'Connor et al., 2019). The extant knowledge base and ongoing developments surrounding systematic review automation are highly concentrated in research for evidence-based medicine (e.g., medical research, clinical trials, healthcare interventions) with limited evidence supporting how automation techniques are applied outside of the medical community (see O'Connor et al., 2019). This is not surprising given the unique relevance of systematic reviews for informing healthcare practice and policy development. However, while technologies to support data extraction from primary literature have advanced rapidly, many existing tools were not developed for application outside of research on the effectiveness of health-related interventions. O'Mara-Eves et al. (2015), for example, reported that text-mining techniques for classifying and prioritizing (i.e., ranking) relevant studies had undergone substantial methodological advancement, yet also highlighted that where assessment methods could be implemented with relatively high confidence in clinical research, much work was needed to determine how systems might perform in other disciplines.
Other researchers similarly noted issues such as heterogeneity in testing and performance metrics (Blaizot et al., 2022; Jonnalagadda et al., 2015; Tsafnat et al., 2014) as well as risk of systemic biases resulting from inconsistent annotations in training corpora (Schmidt et al., 2021). Across projects reviewed, calls resounded for additional assessment of automation methods, including testing methods across different datasets and domains and testing the same datasets across different automation methods. Existing platforms are available to support research teams in a range of time-consuming manual tasks (Blaizot et al., 2022). Even with these expediencies, not all key activities within the overall review process have received equal attention in application and technique development (O'Connor et al., 2019; Scott et al., 2021). Only a few years ago, (semi) automated screening approaches such as text-mining for processing full texts were not commonly available (O'Mara-Eves et al., 2015). As relevant study details were not always included in abstracts and often appeared throughout and across various sections of a given study (including tables and figures), discussion turned toward development of data extraction methods supporting full-text corpora (Tsafnat et al., 2014). Today, researchers supporting evidence-based medicine benefit from more robust data extraction techniques, especially efforts targeting PICO-related elements (Schmidt et al., 2021). Software tools are available for data extraction (e.g., Abstrackr, RobotReviewer, SWIFT-Review; see Blaizot et al., 2022, p. 359); however, they have received mixed reviews related to their respective effectiveness. Notwithstanding substantial methodological strides in recent years, limited multidisciplinary reviews evaluating application effectiveness in non-clinical contexts may offer some explanation for the reported delays in uptake outside of evidence-based medicine.
Further, the nominal extant research comparing techniques applied in both clinical and social contexts suggests that existing tools may not "perform as well on 'messy' social science datasets" (Miwa et al., 2014; as cited in O'Mara-Eves et al., 2015, p. 16). Even within structured tabular reporting contexts (i.e., tables), our understanding of technology applicability and transferability across disciplines is limited (Holub et al., 2021).
Serving as a model for the present study, Schmidt et al. (2021) review tools and techniques available for (semi) automated extraction of data elements pertinent to synthesizing the effects of healthcare interventions (see Higgins et al., 2022). Their noteworthy living review is exploring a range of data-mining and text classification methods for systematic reviews. The authors found that early, commonly employed approaches (e.g., rule-based extraction) gave way to classical machine learning (e.g., naïve Bayes and support vector machine classifiers), and, more recently, trends indicate increased application of deep-learning architectures such as neural networks and word embeddings (for yearly trends in reported systems architectures, see Schmidt et al., 2021, p. 8). Overall, the future of automated data extraction for systematic reviews and meta-analytic research is very bright. As the earlier (i.e., preliminary) stages of the systematic review process have experienced rapid advancement in functionality and capability, development of techniques for all stages is foreseeable in the near future. Just as software tools and data extraction techniques vary in scope, purpose, and financial commitment, so too will research questions, goals, and study designs. Interdisciplinary groups and applied researchers alike call for increased collaboration to spur innovation and further advance the state of computer-assisted evidence synthesis (O'Mara-Eves et al., 2015; O'Connor et al., 2019). Though it can be inferred that not all developments spawned by the medical sciences community are easily transferable to social sciences, necessity in fields inundated with new evidence production has carved a path for other disciplines; a path in which challenges and opportunities are openly displayed to serve as a foundation for the entire systematic review community to build upon.
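As an illustration of the classical machine-learning stage in that progression, the following is a minimal multinomial naïve Bayes sentence classifier in pure Python. The training sentences and labels are invented for demonstration and stand in for the annotated corpora used by the systems under review; production systems use far larger corpora and richer features.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical labels): sentences tagged by the report
# section they resemble.
TRAIN = [
    ("the mean difference was significant", "results"),
    ("effect sizes were pooled across studies", "results"),
    ("prior work has examined this question", "background"),
    ("the literature on this topic is growing", "background"),
]

def train_naive_bayes(examples):
    """Fit per-class word counts, class priors, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the most probable class under Laplace-smoothed log-likelihoods."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAIN)
print(classify("pooled effect sizes were significant", *model))
# → results
```

The same bag-of-words representation feeds the support vector machine classifiers mentioned above; the deep-learning systems that followed replace these sparse counts with learned word embeddings.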
Additional inquiries surrounding approaches applied in social sciences may introduce previously unencountered demands that spur innovation and create valuable contributions for the entire systematic review community.

Protocol
An LSR involves resource demands similar to those of a static review, but is ongoing (i.e., continually reprised). The methodological rationale for selecting an LSR for the proposed study is based predominantly on the pace of emerging evidence (Khamis et al., 2019). Given the uncertainty surrounding existing evidence, and the rapid pace of technological advancement, continual surveillance will allow for faster presentation of new and emergent information that may impact findings and offer value for readers (Elliott et al., 2014, 2017). The following sections present the planned methodological approach of our living review.

Protocol registration and guidelines
This protocol is preregistered in the Open Science Framework (OSF), an openly accessible repository facilitating the management, storage, and sharing of research processes and pertinent data files (Soderberg, 2018). This protocol adheres to the PRISMA-P guidelines (Shamseer et al., 2015). A completed PRISMA-P checklist is available at (Semi) Automated Approaches to Data Extraction for Systematic Reviews and Meta-Analyses in Social Sciences: A Living Review Protocol (https://osf.io/j894w). No human subjects are involved in this study.

Search sources
The search strategy for this review follows existing research, with the protocol strategy adapted to fit goals and key elements of interest in social science domains. The model study initiated an LSR of processes supporting the (semi) automation of data extraction from research studies (e.g., clinical trials, epidemiological research; Schmidt et al., 2021, p. 26). For the base review, this strategy will be replicated (to the extent possible) for remaining databases, with any deviations openly reported in the project repository and subsequent publications.

Workflow
The workflow structure follows existing research and guidance developed by Elliott et al. (2017) for transitioning to living status, frequency of monitoring, and incorporation of new evidence (see Figure 5). Transparent reporting of the base review and updates will follow PRISMA guidelines (Page et al., 2021). We intend to report results from the base review and later searches separately (Kahale et al., 2022). As quantity of new citations is unknown, necessary adjustments to the workflow described in this protocol will be detailed in future versions of the manuscript, noted in corresponding PRISMA reporting framework, and made available via the project repository.

Search and updates
The base review will begin upon publication and peer approval of this protocol. All citations (i.e., titles and abstracts) identified by the search(es) will be independently screened by two researchers. Citation and abstract screening will be coordinated using Rayyan (Ouzzani et al., 2016). Full-text documents retrieved will be reviewed in duplicate and reliability assessment will be based on all documents screened. In the event of questionable inter-rater reliability, additional qualified reviewer(s) will be consulted. All relevant details pertaining to review decisions, including resolutions, will be available in the project repository (see "Data Availability Statement"). The review will be continually updated via living methodological surveys of newly published literature (Khamis et al., 2019). Updates will include search and screening of new evidence quarterly (every three months) with a cross-sectional analysis of relevant full texts at intervals of no less than twice per year (Khamis et al., 2019). Synthesis and publication of new evidence arising from continual surveillance will occur no less than once per year or until the review is no longer in living status.
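The protocol above does not prescribe a specific reliability statistic for duplicate screening. As one common choice, Cohen's kappa for two screeners' include/exclude decisions could be computed as in the following sketch; the decision lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical screening decisions for ten citations.
a = ["include", "exclude", "exclude", "include", "exclude",
     "exclude", "include", "exclude", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "include", "exclude",
     "exclude", "exclude", "exclude", "exclude", "exclude"]
print(round(cohens_kappa(a, b), 2))
# → 0.74
```

By the widely used Landis and Koch benchmarks, a kappa near 0.74 would conventionally be read as substantial agreement; the threshold at which additional reviewers are consulted remains a team decision.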

Eligibility criteria
Papers considered for inclusion are refereed publications and conference proceedings in social sciences and tangent disciplines which describe the application of automation techniques or tools to support tasks associated with the extraction of data elements from primary research studies. As in prior reviews, English-language reports published in 2005 or later will be considered for inclusion (Jonnalagadda et al., 2015; O'Mara-Eves et al., 2015; Schmidt et al., 2020b, 2021). The model article includes secondary goals related to reproducibility, transparency, and assessment of evaluation methods (Schmidt et al., 2021). The present study will also consider and synthesize reported evaluation metrics; however, we will not exclude studies omitting robust performance tests. To refine and test eligibility criteria, the complete list of Web of Science subject categories was reviewed, and inclusion decisions determined jointly by both researchers. Each category was evaluated based on the scientific branches and academic activity boundaries described by Cohen (2021). See the project supplementary data files for category selection procedures and criteria. Subjects deemed appropriate for inclusion in the initial search, title, and abstract screening stages (see Tsafnat et al., 2014) include foundational and applied formal sciences, social sciences, and social science tangent disciplines. Excluded subjects include natural sciences and applied clinical or medicinal science categories. In all cases, over-inclusion is prioritized to maximize search recall. See "Extended Data" for comprehensive search strategy details.
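The top-level eligibility filters described above can be expressed as a simple predicate. In this sketch, subject-category matching is reduced to illustrative allow/deny sets standing in for the full Web of Science category decisions documented in the extended data.

```python
def is_eligible(record):
    """Apply the protocol's top-level screening filters to a citation record.

    The excluded-subject set is an illustrative placeholder; the protocol's
    full category criteria live in the project's supplementary data files.
    """
    excluded_subjects = {"natural sciences", "clinical medicine"}
    return (record["language"] == "English"
            and record["year"] >= 2005
            and record["subject"] not in excluded_subjects)

candidates = [
    {"language": "English", "year": 2019, "subject": "social sciences"},
    {"language": "English", "year": 2003, "subject": "social sciences"},
    {"language": "English", "year": 2020, "subject": "clinical medicine"},
]
print([is_eligible(r) for r in candidates])
# → [True, False, False]
```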

Included papers
Eligible records include those which:
• employ an evidence-synthesis method (e.g., systematic reviews, psychometric assessments, meta-analysis of effect sizes, etc.) and/or present a proof of concept, tool tests, or otherwise review automation technologies.
• apply an existing or proposed tool or technique for the purpose of technology-assisted data extraction from the abstracts or full text of a literature corpus.
• report on any automated approach to data extraction (e.g., NLP, ML, TM), provided that at least one entity is extracted semi-automatically and sufficient detail is reported for:
○ entities (i.e., data elements) targeted for automated extraction (per APA JARS)
○ location of the extracted entities or data elements (e.g., abstract, methods, results)
○ the automation tool and/or technique used to support data extraction

Figure 5. Living review workflow. Note. Arrows represent stages involved in a static systematic review; the dotted line (from "Publish Report" to "Search") represents the stage at which the review process is repeated from the beginning while the review remains in living status.

Excluded papers
Studies considered ineligible for inclusion in this review are those which: • apply tools or techniques to synthesize evidence from medical, biomedical, clinical (e.g., RCTs), or natural science research.
• present guidelines, protocols, or user surveys without applying and/or testing at least one automation technique or tool.
• are labeled as editorials, briefs, or opinion pieces.
• do not apply an existing, proposed, or prototype tool or technique for the purpose of technology-assisted data extraction from the abstracts or full-text of a literature corpus (e.g., extraction of citation data only, narrative discussion that is not accompanied by application or testing).
• do not apply automation for the extraction of data from scientific literature (e.g., web scraping, electronic communications, transcripts, or alternative data sources).

Key items of interest
O'Connor et al. (2019) described data extraction activities as the process of "extracting the relevant content data from a paper's methods and results and the meta-data about the paper" (p. 4); therefore, we primarily target key reporting items for methods and results sections recommended by the APA. However, to support an exhaustive review and accommodate anticipated variation across automation approaches and reporting formats, a secondary area of interest includes identifying all paper sections for which data extraction technologies have been applied (see "Extended Data").
Primary anticipated outcomes include identification of (a) tools/techniques applied to (semi) automate the extraction of data elements from research articles; (b) data elements targeted for extraction based on APA JARS standards; (c) systematic review and meta-analysis stages for which automation technologies are utilized; (d) evaluation metrics reported for applied automation technologies; and (e) where tools or technologies are presented, the potential for transferability to social sciences. Secondary anticipated outcomes include identification of (a) specific sections of research papers from which data is successfully extracted from primary corpora; (b) structure of content extracted using automation technologies; and (c) challenges reported by social science researchers related to the application of (semi) automated data extraction tools or technologies. Primary and secondary outcome items of interest are further described below, and supplementary data files referenced for readers to access additional information.
Primary items of interest
1. Techniques, tools, systems architectures, and/or automation approaches applied for data extraction from research documents (abstracts and full text).
○ A comprehensive list is provided in extended data files; see "Extraction Techniques.doc."
2. Data elements targeted for extraction using automation technologies as outlined by JARS (APA, 2020) and further explicated by Appelbaum et al. (2018, p. 6).

Secondary items of interest
1. Specific sections of research papers from which data is successfully extracted from primary corpora.
2. Structure of content from which data entities were extracted (where named).
3. Challenges identified by social science researchers when applying automation technology to support data extraction efforts.

Dissemination of information
Authors plan to submit the base review results and subsequent update(s) to F1000Research for publication. Following the FAIR principles (i.e., findable, accessible, interoperable, and reusable; Wilkinson et al., 2016), all corresponding data will be available via the OSF project repository.

Study status
To support preparation of this protocol, a preliminary search was conducted in Web of Science. No formal search, screening, or review activities have been initiated. This protocol was preregistered via OSF Registries on August 14, 2022.

Data availability
Underlying data
No data are associated with this article.
This project contains the following extended data:
• Extraction Techniques.docx: categories and descriptions of data extraction techniques, architecture components, and evaluation metrics of interest