<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.164345.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>
                    <italic>&#x2018;dstidyverse&#x2019;: An Implementation of&#x00a0;</italic>
                    <italic>Tidyverse Within the DataSHIELD&#x00a0;</italic>
                    <italic>Ecosystem</italic>
                </article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Cadman</surname>
                        <given-names>Tim</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7682-5645</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Slofstra</surname>
                        <given-names>Mariska</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0400-0468</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Avraam</surname>
                        <given-names>Demetris</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hyde</surname>
                        <given-names>Eleanor</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kikkert</surname>
                        <given-names>Niels</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0009-0000-5122-4328</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van der Geest</surname>
                        <given-names>Marije</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Postma</surname>
                        <given-names>Dick</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Veenstra</surname>
                        <given-names>Ruben</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wheater</surname>
                        <given-names>Stuart</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Zwart</surname>
                        <given-names>Erik</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4552-003X</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Swertz</surname>
                        <given-names>Morris</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Genetics, Genomics Coordination Center, University Medical Centre, Groningen, The Netherlands</aff>
                <aff id="a2">
                    <label>2</label>Barcelona Institute for Global Health (ISGlobal), Barcelona, Spain</aff>
                <aff id="a3">
                    <label>3</label>Department of Public Health, University of Copenhagen, &#x00d8;ster Farimagsgade, Copenhagen, Denmark</aff>
                <aff id="a4">
                    <label>4</label>Newcastle Helix, Urban Science Building, Newcastle upon Tyne, Arjuna Technologies, Newcastle, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:t.j.cadman@umcg.nl">t.j.cadman@umcg.nl</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>6</month>
                <year>2025</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2025</year>
            </pub-date>
            <volume>14</volume>
            <elocation-id>606</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>10</day>
                    <month>6</month>
                    <year>2025</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Cadman T et al.</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/14-606/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>DataSHIELD is a mature, R-based federated learning platform that enables multi-site analysis without sharing individual participant data. While DataSHIELD includes many packages for data analysis, it lacks user-friendly data manipulation tools.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>To address this gap, we developed 
                        <bold>dsTidyverse</bold>, an implementation of selected functions from the popular Tidyverse collection of packages within the DataSHIELD client-server architecture. Disclosure checks were implemented to prevent individual-level data leakage.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>This package provides functionality for selecting, renaming, and creating columns; conditional recoding; combining data frames by rows or columns; filtering and arranging rows; grouping and ungrouping data; and converting data frames to tibbles. Through examples, we demonstrate how 
                        <bold>dsTidyverse</bold> simplifies common data manipulation tasks within DataSHIELD.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>By providing additional data manipulation functionality, 
                        <bold>dsTidyverse</bold> improves the user experience and analytical efficiency within DataSHIELD. The package is open-source and freely available on CRAN and GitHub, and welcomes further development: 
                        <uri xlink:href="https://github.com/molgenis/ds-tidyverse">https://github.com/molgenis/ds-tidyverse</uri>.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>datashield</kwd>
                <kwd>federated analysis</kwd>
                <kwd>R</kwd>
                <kwd>tidyverse</kwd>
                <kwd>data manipulation</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="https://doi.org/10.13039/100018707">
                    <funding-source>HORIZON EUROPE Reforming and enhancing the European Research and Innovation system</funding-source>
                    <award-id>874583</award-id>
                </award-group>
                <funding-statement>This project was funded by the European Union&#x2019;s Horizon Europe programme under grant agreement No. 101137317 (IHEN project). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency (HADEA). Neither the European Union nor the granting authority can be held responsible for them. Funding was also received from the European Union&#x2019;s Horizon 2020 research and innovation programme under grant agreement No. 874583 (ATHLETE).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec5" sec-type="intro">
            <title>1. Introduction</title>
            <p>While the analysis of single data sources is a core part of epidemiological research, incorporating data from multiple sources has a number of advantages. These include increased statistical power to detect rare disease outcomes and the opportunity to replicate studies in different populations (
                <xref ref-type="bibr" rid="ref8">Pinot de Moira et al. (2021)</xref>, 
                <xref ref-type="bibr" rid="ref1">Cadman et al. (2024)</xref>). Historically, the analysis of multiple data sources has been conducted by either (i) data transfer or (ii) each partner conducting analyses separately and sharing summary statistics. Although both approaches are effective in many situations, they have drawbacks. The physical transfer of data can be restricted by data protection legislation and local data management policies, while requiring each partner to conduct parallel analyses can be time inefficient and inflexible (
                <xref ref-type="bibr" rid="ref6">Knoppers et al. (2011)</xref>).</p>
            <p>A promising alternative is federated (remote) analysis, in which individual-level data are not shared. Federated analysis allows a single researcher to conduct all analyses flexibly, while control of the data remains with the data owner (
                <xref ref-type="bibr" rid="ref2">Doiron et al. (2013)</xref>). One mature implementation of federated analysis is the open-source R-based platform DataSHIELD (
                <xref ref-type="bibr" rid="ref4">Gaye et al. (2014)</xref>). DataSHIELD is based on a client-server architecture. In a multi-site setting, individual study participants&#x2019; data are stored on the server of each data source, often protected by a firewall. The data from each site are not directly viewable or accessible to the analyst and cannot be copied or transferred. On the client side, the researcher has access to several DataSHIELD-specific R packages. Using the functions from these packages, the researcher issues analysis commands that are then sent to each server. There are two types of DataSHIELD functions: (i) assign-type functions, which create a new object on the server (e.g., recoding a variable), and (ii) aggregate-type functions, which return summary statistics to the researcher (e.g., means, standard deviations and model parameters). These commands are evaluated on each server, and automated checks are performed to ensure that the operations do not disclose individual-level data.</p>
            <p>DataSHIELD has been successfully used in many large European research projects including LifeCycle (researching the role of novel integrated markers of early-life stressors on health across the lifecycle; 
                <xref ref-type="bibr" rid="ref5">Jaddoe et al. (2020)</xref>, 
                <xref ref-type="bibr" rid="ref8">Pinot de Moira et al. (2021)</xref>) and ATHLETE (understanding and preventing health effects of environmental hazards and their mixtures; 
                <xref ref-type="bibr" rid="ref9">Vrijheid et al. (2021)</xref>). It has an ever-expanding set of packages supporting a wide range of analyses, including omics, exposure, mediation, survival and machine learning (
                <xref ref-type="bibr" rid="ref3">Escriba-Montagut et al. (2024)</xref>).</p>
            <p>However, a key weakness of DataSHIELD is that it currently lacks effective functionality for basic data manipulation, as most development has focused on extending its analysis capabilities. Many researchers have found it cumbersome to perform basic operations in DataSHIELD that would be straightforward in standard R. For example, within DataSHIELD, there are currently limited options to (i) recode variables using if-else style operations, (ii) rename variables, (iii) subset columns by column name, (iv) subset rows by multiple conditions, or (v) group data and perform operations by group.</p>
            <p>Complicated workarounds are possible, but these greatly increase computation time and lead to verbose analysis scripts. Consider the example of transforming the continuous variable &#x2018;mpg&#x2019; (miles per gallon) within the &#x2018;mtcars&#x2019; dataset into a four-level categorical variable (&lt;15, 15 to &lt;20, 20 to &lt;25, and &gt;=25). Using the core DataSHIELD package (dsBaseClient), the user must first create separate indicator vectors recording whether each observation is at or above each threshold, and then add these together to create the final variable:

                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                    <monospace># Create one 0/1 indicator per threshold (at or above 15, 20 and 25).</monospace>

                    <monospace>ds.Boole(V1 = "mtcars$mpg", V2 = 15, Boolean.operator = "&gt;=", newobj = "mpg_cat_1")</monospace>

                    <monospace>ds.Boole(V1 = "mtcars$mpg", V2 = 20, Boolean.operator = "&gt;=", newobj = "mpg_cat_2")</monospace>

                    <monospace>ds.Boole(V1 = "mtcars$mpg", V2 = 25, Boolean.operator = "&gt;=", newobj = "mpg_cat_3")</monospace>

                    <monospace># Sum the indicators on the server to obtain a category code from 0 to 3.</monospace>

                    <monospace>ds.assign(toAssign = "mpg_cat_1 + mpg_cat_2 + mpg_cat_3", newobj = "mpg_category")</monospace>
</preformat>
            </p>
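            <p>The arithmetic behind this workaround can be reproduced locally in plain R. The following is a minimal, non-federated sketch using the built-in &#x2018;mtcars&#x2019; data; it only illustrates why summing the three indicators yields a four-level code and does not call any DataSHIELD function:
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                    <monospace># Local illustration only; nothing is sent to a DataSHIELD server.</monospace>
                    <monospace>mpg &lt;- mtcars$mpg</monospace>
                    <monospace># One 0/1 indicator per threshold, mirroring the three ds.Boole() calls.</monospace>
                    <monospace>mpg_cat_1 &lt;- as.numeric(mpg &gt;= 15)</monospace>
                    <monospace>mpg_cat_2 &lt;- as.numeric(mpg &gt;= 20)</monospace>
                    <monospace>mpg_cat_3 &lt;- as.numeric(mpg &gt;= 25)</monospace>
                    <monospace># Each observation scores one point per threshold it reaches: 0, 1, 2 or 3.</monospace>
                    <monospace>mpg_category &lt;- mpg_cat_1 + mpg_cat_2 + mpg_cat_3</monospace>
                    <monospace>table(mpg_category)</monospace>
</preformat>
            </p>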
            <p>In contrast, within R outside DataSHIELD, there are many options for efficient data manipulation. One widely used set of packages is the &#x201c;Tidyverse,&#x201d; which comprises a set of packages for data science that share a common design philosophy, grammar and data structures (
                <xref ref-type="bibr" rid="ref10">Wickham et al. (2019)</xref>). These include packages for data manipulation (dplyr), advanced data frames (tibble), functional programming (purrr), and many others.</p>
            <p>Whilst the functionality provided by these packages would greatly improve the user experience of DataSHIELD, they cannot be used &#x2018;off-the-shelf.&#x2019; They first need to be translated into a bespoke DataSHIELD package using the client-server architecture described above, and additional checks need to be written to ensure that they do not inadvertently facilitate the leakage of individual participant data. Here, we report the development of dsTidyverse, a DataSHIELD implementation of selected Tidyverse functions, available as free open-source software (LGPLv3) on GitHub and CRAN.</p>
        </sec>
        <sec id="sec6">
            <title>2. Implementation</title>
            <sec id="sec7">
                <title>2.1 Package structure</title>
                <p>As described above, each DataSHIELD package contains two components: a client-side package and a server-side package. The client-side package is installed locally by the researcher and contains the functions called in their analysis scripts. The server-side package is installed on the server that holds the data and contains the functions called by the client-side package. For example, to return the mean of a vector, two functions are required: ds.mean() (client-side, included in the dsBaseClient package) and meanDS() (server-side, included in the dsBase package). When an analyst calls ds.mean(), the following steps occur: (i) arguments are checked for validity on the client side; (ii) a request to call meanDS() is sent via the DataSHIELD Interface (DSI) package, which handles API calls to the server; (iii) the request, method and arguments are checked for validity on the server side; (iv) the server-side function meanDS() calculates the mean and checks that this value is not disclosive; and (v) the mean of the vector is returned to the client. Following this architecture, we implemented two packages: dsTidyverse and dsTidyverseClient. All code was reviewed by co-author SW (an experienced DataSHIELD developer and maintainer of dsBase) to ensure that it met the DataSHIELD disclosure protection standards.</p>
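                <p>The division of labour between the two components can be mimicked in plain R. The following is a simplified sketch under stated assumptions, not the dsBase or DSI implementation: the names &#x2018;server_data&#x2019;, &#x2018;server_mean&#x2019; and &#x2018;client_mean&#x2019; and the minimum count of 5 are hypothetical, and an ordinary function call stands in for the API request:
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <monospace># Hypothetical data that stay on the "server"; the client never reads them directly.</monospace>
                        <monospace>server_data &lt;- list(mpg = mtcars$mpg)</monospace>
                        <monospace># Hypothetical stand-in for a server-side function such as meanDS().</monospace>
                        <monospace>server_mean &lt;- function(var_name, min_n = 5) {</monospace>
                        <monospace>  x &lt;- server_data[[var_name]]</monospace>
                        <monospace>  if (is.null(x)) stop("unknown variable")</monospace>
                        <monospace>  # Assumed disclosure rule: refuse summaries based on too few observations.</monospace>
                        <monospace>  n_valid &lt;- sum(!is.na(x))</monospace>
                        <monospace>  if (n_valid &lt; min_n) stop("disclosure check failed: too few observations")</monospace>
                        <monospace>  mean(x, na.rm = TRUE)</monospace>
                        <monospace>}</monospace>
                        <monospace># Hypothetical stand-in for a client-side function such as ds.mean().</monospace>
                        <monospace>client_mean &lt;- function(var_name) {</monospace>
                        <monospace>  # Validate arguments on the client before anything is sent.</monospace>
                        <monospace>  if (!is.character(var_name) || length(var_name) != 1) stop("var_name must be a single string")</monospace>
                        <monospace>  # A real client would send this request to each server; only the summary comes back.</monospace>
                        <monospace>  server_mean(var_name)</monospace>
                        <monospace>}</monospace>
                        <monospace>result &lt;- client_mean("mpg")</monospace>
</preformat>
                </p>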
            </sec>
            <sec id="sec8">
                <title>2.2 Functionality</title>
                <p>Given that DataSHIELD functions need to be implemented individually, it is not realistic to implement the entire set of Tidyverse functions. Instead, we reviewed the existing functionality in DataSHIELD and chose those Tidyverse functions that we believed would significantly improve data manipulation within DataSHIELD. Currently, these functions are from the packages dplyr and tibble, although we are open to adding further functions on request and welcome GitHub pull requests. The functions implemented at the time of writing are listed in 
                    <xref ref-type="table" rid="T1">Table 1</xref>.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Implemented Tidyverse functions.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Package</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Function</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Description</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">select</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Choose columns from a data frame.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rename</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Rename columns in a data frame.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">mutate</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Create or modify columns.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">if_else</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">A vectorised conditional function.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">case_when</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">A general vectorised conditional function.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">bind_cols</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Combine data frames by columns.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">bind_rows</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Combine data frames by rows.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">filter</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Filter rows based on conditions.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">slice</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Select rows by position.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">arrange</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Arrange rows by values of a column or multiple columns.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">group_by</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Group data by one or more columns.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">ungroup</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Remove grouping from data.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">group_keys</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Retrieve the group keys from a grouped data frame.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">dplyr</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">distinct</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Return unique rows based on certain columns.</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">tibble</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">as_tibble</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Convert data to a tibble.</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>dsTidyverse supports non-standard evaluation (
                    <xref ref-type="bibr" rid="ref7">Mailund and Mailund (2018)</xref>). The name of the server-side data frame is passed in quotes to the &#x2018;df.name&#x2019; argument, whilst the variable names are passed unquoted and are evaluated as columns within the data frame. Various helper functions can also be used within the &#x2018;tidy_expr&#x2019; argument (for example &#x2018;all_of&#x2019; and &#x2018;any_of&#x2019;) to specify multiple variables in filter conditions. See the examples at the end of this section on the use of dsTidyverse, and the package vignette for a more detailed guide.</p>
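                <p>The mechanism of non-standard evaluation described above can be illustrated with a few lines of base R. This is a local sketch of the general technique, not the code used in dsTidyverse; the function &#x2018;filter_sketch&#x2019; is hypothetical, and its argument names are borrowed from the package purely for readability:
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <monospace># Hypothetical helper; not part of dsTidyverse.</monospace>
                        <monospace>filter_sketch &lt;- function(df.name, tidy_expr) {</monospace>
                        <monospace>  # Capture the unquoted expression without evaluating it.</monospace>
                        <monospace>  expr &lt;- substitute(tidy_expr)</monospace>
                        <monospace>  # Look up the data frame from its quoted name.</monospace>
                        <monospace>  df &lt;- get(df.name)</monospace>
                        <monospace>  # Evaluate the expression so that bare names resolve to columns of df.</monospace>
                        <monospace>  keep &lt;- eval(expr, envir = df, enclos = parent.frame())</monospace>
                        <monospace>  df[keep, , drop = FALSE]</monospace>
                        <monospace>}</monospace>
                        <monospace># The data frame name is quoted; the column name mpg is not.</monospace>
                        <monospace>out &lt;- filter_sketch(df.name = "mtcars", tidy_expr = mpg &gt;= 25)</monospace>
</preformat>
                </p>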
            </sec>
            <sec id="sec9">
                <title>2.3 Disclosure checks</title>
                <p>A key feature of DataSHIELD is the set of disclosure checks performed by the server-side package to ensure that neither individual participant data nor any output from which individual participant information could be inferred is returned to the analyst. All but one of the dsTidyverse functions currently implemented are assign-type functions, which carry a lower risk of direct disclosure because they do not return anything to the client. However, they carry a risk of indirect disclosure, especially in the case of subsetting operations. For example, by creating a subset of data with only one row fewer than the original data, the summary statistics of the two data frames can be compared to reveal the values of the omitted row. To mitigate these risks, we implemented the following disclosure checks:
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>We specified a list of permitted functions that can be passed within the &#x2018;tidy_expr&#x2019; argument of assign-type function calls; non-permitted functions are blocked. The currently permitted functions are:</p>
                        </list-item>
                    </list>

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>&#x201c;everything&#x201d;, &#x201c;last_col&#x201d;, &#x201c;group_cols&#x201d;, &#x201c;starts_with&#x201d;, &#x201c;ends_with&#x201d;, &#x201c;contains&#x201d;, &#x201c;matches&#x201d;, &#x201c;num_range&#x201d;, &#x201c;all_of&#x201d;, &#x201c;any_of&#x201d;, &#x201c;where&#x201d;, &#x201c;rename&#x201d;, &#x201c;mutate&#x201d;, &#x201c;if_else&#x201d;, &#x201c;case_when&#x201d;, &#x201c;mean&#x201d;, &#x201c;median&#x201d;, &#x201c;mode&#x201d;, &#x201c;desc&#x201d;, &#x201c;nth&#x201d;, &#x201c;exp&#x201d;, &#x201c;sqrt&#x201d;, &#x201c;scale&#x201d;, &#x201c;round&#x201d;, &#x201c;floor&#x201d;, &#x201c;ceiling&#x201d;, &#x201c;abs&#x201d;, &#x201c;sd&#x201d;, &#x201c;var&#x201d;, &#x201c;sin&#x201d;, &#x201c;cos&#x201d;, &#x201c;tan&#x201d;, &#x201c;asin&#x201d;, &#x201c;acos&#x201d;, &#x201c;atan&#x201d;, &#x201c;c&#x201d;.</monospace>
</preformat>

                    <list list-type="order">
                        <list-item>
                            <label>2.</label>
                            <p>We check that the variable names passed within the &#x2018;tidy_expr&#x2019; argument do not exceed a specified maximum length, to reduce the risk of malicious code being passed.</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>To guard against subsetting attacks (malicious attempts to infer individual-level data by taking subsets of data), we check that no subset is created (e.g. by ds.filter()) in which (i) the number of rows is lower than a specified parameter or (ii) the difference between the number of rows in the original dataset and in the subset is less than a specified parameter.</p>
                        </list-item>
                        <list-item>
                            <label>4.</label>
                            <p>We check that the output from &#x2018;ds.group_keys&#x2019; (the groups in a grouped data frame) does not contain more groups than a specified parameter relative to the number of rows in the data frame. Without this check the output could be highly disclosive: for example, if the number of groups equalled the number of rows, the function would return the entire column of participant data.</p>
                        </list-item>
                        <list-item>
                            <label>5.</label>
                            <p>We integrate this package with the DataSHIELD disclosure control options that can be set by data owners. These allow data owners to permit or block particular collections of functions depending on the level of privacy protection required. For example, ds.filter could be vulnerable to subsetting attacks, so it is blocked in the &#x2018;avocado&#x2019; mode (designed to prevent such attacks) but permitted in other privacy modes.</p>
                        </list-item>
                    </list>
                </p>
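                <p>The allow-list check (1) and the subset-size check (3) can be sketched in a few lines of base R. This is a minimal illustration and not the package&#x2019;s own code: the helper names (called_functions, check_permitted, check_subset_size), the operator list, the shortened allow-list, the threshold arguments and the decision to accept a subset identical to the original are all assumptions made for this example.

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace># Illustrative sketch only; not the dsTidyverse implementation.
# Helper names, the operator list and all thresholds are hypothetical.

# Excerpt of the permitted functions listed above, plus operators
# (the operator list is an assumption of this sketch).
permitted &lt;- c("starts_with", "all_of", "any_of", "if_else", "case_when", "c")
operators &lt;- c("&lt;", "&gt;", "&lt;=", "&gt;=", "==", "!=", "&amp;", "|", "!", "(", "~")

# Recursively collect the name of every function called in an expression.
called_functions &lt;- function(expr) {
  if (!is.call(expr)) return(character(0))
  fn &lt;- if (is.symbol(expr[[1]])) as.character(expr[[1]]) else called_functions(expr[[1]])
  unique(c(fn, unlist(lapply(as.list(expr)[-1], called_functions))))
}

# Check 1: stop if any called function is not on the allow-list.
check_permitted &lt;- function(expr, allowed = c(permitted, operators)) {
  blocked &lt;- setdiff(called_functions(expr), allowed)
  if (length(blocked) &gt; 0) {
    stop("Functions not permitted: ", paste(blocked, collapse = ", "))
  }
  invisible(TRUE)
}

# Check 3: stop if the subset is too small or too close in size to the original.
check_subset_size &lt;- function(n_original, n_subset, min_rows, min_diff) {
  n_removed &lt;- n_original - n_subset
  if (n_subset &lt; min_rows) stop("Subset has fewer rows than the minimum allowed")
  # Assumption: removing zero rows reveals nothing new, so it is accepted.
  if (n_removed &gt; 0 &amp;&amp; n_removed &lt; min_diff) {
    stop("Subset differs from the original by too few rows")
  }
  invisible(TRUE)
}

check_permitted(quote(cyl &gt; 6 &amp; hp &gt; 150))                   # passes silently
try(check_permitted(quote(print(mpg))))                      # error: print is not permitted
check_subset_size(32, 14, min_rows = 3, min_diff = 3)        # passes silently
try(check_subset_size(32, 31, min_rows = 3, min_diff = 3))   # error: only one row removed</monospace>
</preformat>
</p>
                <p>Walking the parsed expression, instead of matching text, means that a namespaced call such as pkg::fun() surfaces as a call to &#x2018;::&#x2019; and is rejected unless that operator is explicitly allowed.</p>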
            </sec>
        </sec>
        <sec id="sec10">
            <title>3. Examples</title>
            <p>To illustrate the improvements brought about by dsTidyverse, we provide three examples using the well-known &#x2018;mtcars&#x2019; dataset. Each example contrasts the approach using dsBaseClient with the streamlined alternative using dsTidyverseClient.</p>
            <sec id="sec11">
                <title>Example 1: Recoding a continuous variable as categorical</title>
                <p>We return to the example provided in the introduction, of recoding the continuous variable mpg (miles per gallon) into four fuel efficiency categories:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>0: &lt;15 (very low)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>1: 15 to &lt;20 (low)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>2: 20 to &lt;25 (moderate)</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>3: &#x2265;25 (high)</p>
                        </list-item>
                    </list>
                </p>
                <p>As shown in the introduction, performing this operation with dsBaseClient is verbose. Using dsTidyverseClient, it is achieved in a single call:

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>ds.case_when(tidy_expr = list(mtcars$mpg &lt; 15 ~ 0, mtcars$mpg &gt;= 15 &amp; mtcars$mpg &lt; 20 ~ 1, mtcars$mpg &gt;= 20 &amp; mtcars$mpg &lt; 25 ~ 2, mtcars$mpg &gt;= 25 ~ 3), newobj = "mpg_category")</monospace>
</preformat>
</p>
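                <p>Since ds.case_when is modelled on dplyr&#x2019;s case_when, the recoding logic can be tried out locally before it is sent to the servers. The sketch below assumes that dplyr is installed and uses the copy of &#x2018;mtcars&#x2019; that ships with R; it runs entirely on the analyst&#x2019;s machine and involves no DataSHIELD call.

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace># Local illustration with dplyr on the built-in mtcars data (no server involved).
library(dplyr)

mpg_category &lt;- case_when(
  mtcars$mpg &lt; 15 ~ 0,
  mtcars$mpg &gt;= 15 &amp; mtcars$mpg &lt; 20 ~ 1,
  mtcars$mpg &gt;= 20 &amp; mtcars$mpg &lt; 25 ~ 2,
  mtcars$mpg &gt;= 25 ~ 3
)

# Tabulate how many cars fall into each fuel efficiency category.
table(mpg_category)</monospace>
</preformat>
</p>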
            </sec>
            <sec id="sec12">
                <title>Example 2: Creating a subset of columns</title>
                <p>We want to retain only the columns &#x2018;mpg&#x2019;, &#x2018;cyl&#x2019;, &#x2018;hp&#x2019;, &#x2018;wt&#x2019;, and &#x2018;gear&#x2019;. Using dsBaseClient requires identifying column indices and creating a subset:

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>ds.colnames("mtcars") # returns "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"</monospace>

                        <monospace>ds.dataFrameSubset(df.name = "mtcars", V1 = "id_var", V2 = "id_var", Boolean.operator = "==", keep.cols = c("1", "2", "4", "6", "10"), newobj = "subset_mtcars")</monospace>
</preformat>
                </p>
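                <p>The positions passed to keep.cols have to be worked out by hand from the ds.colnames output. As a local cross-check, and assuming the server-side table has the same column order as the &#x2018;mtcars&#x2019; data shipped with R, the positions can be derived with base R&#x2019;s match():

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace># Local check on the built-in mtcars data; no DataSHIELD server involved.
wanted &lt;- c("mpg", "cyl", "hp", "wt", "gear")
positions &lt;- match(wanted, colnames(mtcars))
positions
# 1 2 4 6 10</monospace>
</preformat>
</p>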
                <p>Using dsTidyverseClient, this is greatly simplified:

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>ds.select(df.name = "mtcars", tidy_expr = list(mpg, cyl, hp, wt, gear), newobj = "subset_mtcars")</monospace>
</preformat>
</p>
            </sec>
            <sec id="sec13">
                <title>Example 3: Filtering on multiple conditions</title>
                <p>We create a subset where cars have:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>More than 6 cylinders</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Horsepower greater than 150</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Weight (wt) less than 3.5</p>
                        </list-item>
                    </list>
                </p>
                <p>Using dsBaseClient, this requires chaining three calls:

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>ds.dataFrameSubset(df.name = "mtcars", V1 = "cyl", V2 = "6", Boolean.operator = "&gt;", newobj = "step1")</monospace>

                        <monospace>ds.dataFrameSubset(df.name = "step1", V1 = "hp", V2 = "150", Boolean.operator = "&gt;", newobj = "step2")</monospace>

                        <monospace>ds.dataFrameSubset(df.name = "step2", V1 = "wt", V2 = "3.5", Boolean.operator = "&lt;", newobj = "filtered_mtcars")</monospace>
</preformat>
                </p>
                <p>Using dsTidyverseClient, the same logic is expressed in a single call:

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace>ds.filter(df.name = "mtcars", tidy_expr = list(cyl &gt; 6 &amp; hp &gt; 150 &amp; wt &lt; 3.5), newobj = "filtered_mtcars")</monospace>
</preformat>
                </p>
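                <p>The same condition can be checked locally with dplyr before it is run on a server. The sketch below assumes that dplyr is installed and uses the &#x2018;mtcars&#x2019; data shipped with R. On these data the condition keeps very few rows, which is the situation the minimum-row check described in Section 2.3 is designed to catch; whether a server accepts such a subset depends on the thresholds set by the data owner.

                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace># Local illustration with dplyr on the built-in mtcars data (no server involved).
filtered_local &lt;- dplyr::filter(mtcars, cyl &gt; 6 &amp; hp &gt; 150 &amp; wt &lt; 3.5)

# Number of rows that satisfy all three conditions.
nrow(filtered_local)</monospace>
</preformat>
</p>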
                <p>These three examples highlight how dsTidyverseClient reduces both code complexity and time investment for common data manipulation tasks. Further use cases and advanced patterns are provided in the package vignette.</p>
            </sec>
        </sec>
        <sec id="sec14">
            <title>4. Summary</title>
            <p>In this paper we have illustrated the development of dsTidyverseClient, a DataSHIELD implementation of selected tidyverse functions. We hope that this package will provide researchers with more flexible and powerful tools for data manipulation and greatly improve the user experience of DataSHIELD.</p>
        </sec>
        <sec id="sec15">
            <title>5. Operation</title>
            <p>To use dsTidyverse, the analyst must have:
                <list list-type="bullet">
                    <list-item>
                        <label>&#x2022;</label>
                        <p>R version &#x2265;4.4.0 installed locally</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>The dsTidyverseClient package installed from CRAN</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>An active DataSHIELD client-server infrastructure with the dsTidyverse package installed on the server side</p>
                    </list-item>
                    <list-item>
                        <label>&#x2022;</label>
                        <p>An active internet connection and authentication credentials for the federated environment</p>
                    </list-item>
                </list>
            </p>
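            <p>A typical session might be set up as sketched below. This is an assumption-laden outline and not a prescribed workflow: it presumes an Opal server reached through the DSI and DSOpal packages (other server types use a different driver package), and the server label, URL, credentials and table name are placeholders that must be replaced with values supplied by the data owner.

                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                    <monospace># Setup sketch; server label, URL, credentials and table name are placeholders.
install.packages(c("DSI", "DSOpal", "dsTidyverseClient"))

library(DSI)
library(DSOpal)
library(dsTidyverseClient)

builder &lt;- DSI::newDSLoginBuilder()
builder$append(
  server = "study1",                  # hypothetical server label
  url = "https://opal.example.org",   # hypothetical server URL
  user = "analyst",                   # hypothetical credentials
  password = "change-me",
  table = "project.mtcars",           # hypothetical table name
  driver = "OpalDriver"
)

# Log in and assign the table to the server-side symbol "mtcars".
conns &lt;- DSI::datashield.login(logins = builder$build(), assign = TRUE, symbol = "mtcars")

# ... dsTidyverseClient calls go here ...

DSI::datashield.logout(conns)</monospace>
</preformat>
</p>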
            <p>Full details of setting up DataSHIELD are provided in the DataSHIELD wiki (
                <ext-link ext-link-type="uri" xlink:href="https://wiki.datashield.org/en/home">https://wiki.datashield.org/en/home</ext-link>).</p>
        </sec>
    </body>
    <back>
        <sec id="sec18" sec-type="data-availability">
            <title>Data availability</title>
            <p>No data are associated with this article. All examples in this paper use the &#x2018;mtcars&#x2019; dataset, which is distributed with R in the built-in &#x2018;datasets&#x2019; package.</p>
            <sec id="sec19">
                <title>Software availability</title>
                <p>dsTidyverse is maintained as part of the long-running MOLGENIS open-source project for scientific software (
                    <ext-link ext-link-type="uri" xlink:href="https://molgenis.org/">https://molgenis.org/</ext-link>). Requests for the implementation of new functions are welcome, as are contributions from developers.</p>
                <p>The packages are available to install from 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/dsTidyverse/index.html">https://cran.r-project.org/web/packages/dsTidyverse/index.html</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/dsTidyverseClient/index.html">https://cran.r-project.org/web/packages/dsTidyverseClient/index.html</ext-link>
                </p>
                <p>Source code is available from: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/molgenis/dsTidyverse">https://github.com/molgenis/dsTidyverse</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/molgenis/dsTidyverseClient">https://github.com/molgenis/dsTidyverseClient</ext-link>
                </p>
                <p>Archived source code is available at 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.15462381">https://doi.org/10.5281/zenodo.15462381</ext-link>.</p>
                <p>The packages are licensed under LGPLv3.</p>
            </sec>
        </sec>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cadman</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Elhakeem</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vinther</surname>
                            <given-names>JL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Associations of Maternal Educational Level, Proximity to Green Space During Pregnancy, and Gestational Diabetes with Body Mass Index from Infancy to Early Adulthood: A Proof-of-Concept Federated Analysis in 18 Birth Cohorts.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Epidemiol.</italic>
</source>
                    <year>2024</year>;<volume>193</volume>(<issue>5</issue>):<fpage>753</fpage>&#x2013;<lpage>763</lpage>.
                    <pub-id pub-id-type="pmid">37856700</pub-id>
                    <pub-id pub-id-type="doi">10.1093/aje/kwad206</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11367017</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Doiron</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Burton</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marcon</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Data Harmonization and Federated Analysis of Population-Based Studies: The BioSHaRE Project.</article-title>
                    <source>

                        <italic toggle="yes">Emerg. Themes Epidemiol.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>:<fpage>1</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="doi">10.1186/1742-7622-10-12</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Escriba-Montagut</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marcon</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anguita-Ruiz</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Federated Privacy-Protected Meta- and Mega-Omics Data Analysis in Multi-Center Studies with a Fully Open-Source Analytic Platform.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput. Biol.</italic>
</source>
                    <year>2024</year>;<volume>20</volume>(<issue>12</issue>):<fpage>e1012626</fpage>.
                    <pub-id pub-id-type="pmid">39652598</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1012626</pub-id>
                    <pub-id pub-id-type="pmcid">PMC11658699</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gaye</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marcon</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Isaeva</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DataSHIELD: Taking the Analysis to the Data, Not the Data to the Analysis.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Epidemiol.</italic>
</source>
                    <year>2014</year>;<volume>43</volume>(<issue>6</issue>):<fpage>1929</fpage>&#x2013;<lpage>1944</lpage>.
                    <pub-id pub-id-type="pmid">25261970</pub-id>
                    <pub-id pub-id-type="doi">10.1093/ije/dyu188</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4276062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jaddoe</surname>
                            <given-names>VWV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Felix</surname>
                            <given-names>JF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andersen</surname>
                            <given-names>A-MN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The LifeCycle Project-EU Child Cohort Network: A Federated Analysis Infrastructure and Harmonized Data of More Than 250,000 Children and Parents.</article-title>
                    <source>

                        <italic toggle="yes">Eur. J. Epidemiol.</italic>
</source>
                    <year>2020</year>;<volume>35</volume>:<fpage>709</fpage>&#x2013;<lpage>724</lpage>.
                    <pub-id pub-id-type="pmid">32705500</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s10654-020-00662-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7387322</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Knoppers</surname>
                            <given-names>BM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harris</surname>
                            <given-names>JR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tass&#x00e9;</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Towards a Data Sharing Code of Conduct for International Genomic Research.</article-title>
                    <source>

                        <italic toggle="yes">Genome Med.</italic>
</source>
                    <year>2011</year>;<volume>3</volume>:<fpage>44</fpage>&#x2013;<lpage>46</lpage>.
                    <pub-id pub-id-type="doi">10.1186/gm262</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mailund</surname>
                            <given-names>T</given-names>
                        </name>

</person-group>:
                    <chapter-title>Tidy Evaluation.</chapter-title>
                    <source>

                        <italic toggle="yes">Domain-Specific Languages in R: Advanced Statistical Programming.</italic>
</source>
                    <year>2018</year>;<fpage>135</fpage>&#x2013;<lpage>157</lpage>.</mixed-citation>
            </ref>
            <ref id="ref8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pinot de Moira</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haakma</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Strandberg-Larsen</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The EU Child Cohort Network&#x2019;s Core Data: Establishing a Set of Findable, Accessible, Interoperable and Re-Usable (FAIR) Variables.</article-title>
                    <source>

                        <italic toggle="yes">Eur. J. Epidemiol.</italic>
</source>
                    <year>2021</year>;<volume>36</volume>:<fpage>565</fpage>&#x2013;<lpage>580</lpage>.
                    <pub-id pub-id-type="pmid">33884544</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s10654-021-00733-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8159791</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vrijheid</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Basaga&#x00f1;a</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gonzalez</surname>
                            <given-names>JR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Advancing Tools for Human Early Lifecourse Exposome Research and Translation (ATHLETE): Project Overview.</article-title>
                    <source>

                        <italic toggle="yes">Environ. Epidemiol.</italic>
</source>
                    <year>2021</year>;<volume>5</volume>(<issue>5</issue>):<fpage>e166</fpage>.
                    <pub-id pub-id-type="pmid">34934888</pub-id>
                    <pub-id pub-id-type="doi">10.1097/EE9.0000000000000166</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8683140</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wickham</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Averick</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bryan</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Welcome to the Tidyverse.</article-title>
                    <source>

                        <italic toggle="yes">J. Open Source Softw.</italic>
</source>
                    <year>2019</year>;<volume>4</volume>(<issue>43</issue>):<fpage>1686</fpage>.
                    <pub-id pub-id-type="doi">10.21105/joss.01686</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report419324">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.180842.r419324</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Awe</surname>
                        <given-names>Olaitan I</given-names>
                    </name>
                    <xref ref-type="aff" rid="r419324a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4257-3611</uri>
                </contrib>
                <aff id="r419324a1">
                    <label>1</label>African Society for Bioinformatics and Computational Biology, Cape Town, South Africa</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>6</day>
                <month>11</month>
                <year>2025</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Awe OI</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport419324" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.164345.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors of the manuscript entitled &#x2018;dstidyverse&#x2019;: An Implementation of Tidyverse Within the DataSHIELD Ecosystem describe a library or package named dstidyverse, which is based on the R programming language. It would make data manipulation easier for users in the DataSHIELD ecosystem. Tidyverse is a popular data wrangling library within the R ecosystem, and dsTidyverse now implements some of the Tidyverse functions for data manipulation and integrates them into the DataSHIELD architecture. dstidyverse is open source, freely available and installable from CRAN. I was able to install dsTidyverse in my R version 4.5.0 environment. My minor comment is that the authors should provide examples in their R documentation and that the methods should be described in a bit more detail. The package is also available on GitHub (https://github.com/molgenis/ds-tidyverse), thereby making the code findable and reproducible.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report395741">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.180842.r395741</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Puskaric</surname>
                        <given-names>Miroslav</given-names>
                    </name>
                    <xref ref-type="aff" rid="r395741a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2487-8822</uri>
                </contrib>
                <aff id="r395741a1">
                    <label>1</label>University of Stuttgart, Stuttgart, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>8</month>
                <year>2025</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Puskaric M</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport395741" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.164345.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This article reports a data manipulation software package, dstidyverse, which is already available as part of the R DataSHIELD library. It addresses data manipulation tasks such as renaming variables, defining subsets, and conditional formatting, which previously required multiple steps. This was an overhead for data scientists, even more so when analyzing data from multiple sites.</p>
            <p>DataSHIELD is popular in the health data analysis sector, where there are many use cases. The paper demonstrates the software's functionality on a dataset containing vehicle-related information. As a further complement, it would be interesting to explore any related work outside of health data.</p>
            <p>A common use case is conducting pooled (federated) analyses across multiple sites, which can be particularly challenging when the data is not harmonized. Describing how this software package facilitates such scenarios would be valuable.</p>
            <p>Many thanks to the authors for the great work, which will further improve workflows for the non-disclosive analysis of sensitive data.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>privacy enhancing technologies, management of sensitive data</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
