<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.111658.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>GRAPE: genomic relatedness detection pipeline</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Medvedev</surname>
                        <given-names>Alexander</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Lebedev</surname>
                        <given-names>Mikhail</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ponomarev</surname>
                        <given-names>Andrew</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kosaretskiy</surname>
                        <given-names>Mikhail</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2059-9121</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Osipenko</surname>
                        <given-names>Dmitriy</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tischenko</surname>
                        <given-names>Alexander</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kosaretskiy</surname>
                        <given-names>Egor</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wang</surname>
                        <given-names>Hui</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4043-5060</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kolobkov</surname>
                        <given-names>Dmitry</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Chamberlain-Evans</surname>
                        <given-names>Vitalina</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Vakhitov</surname>
                        <given-names>Ruslan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6001-2271</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Nikonorov</surname>
                        <given-names>Pavel</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8471-2069</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>GenX Global Ltd, Hinxton, Cambridgeshire, UK</aff>
                <aff id="a2">
                    <label>2</label>Skolkovo Institute of Science and Technology, Moscow, Russian Federation</aff>
                <aff id="a3">
                    <label>3</label>Atlas Biomed Group Ltd, London, UK</aff>
                <aff id="a4">
                    <label>4</label>Huazhong Agricultural University, Wuhan, China</aff>
                <aff id="a5">
                    <label>5</label>University of Cambridge, Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:gloriouslair@gmail.com">gloriouslair@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>A.M., M.L., A.P., A.T., E.K., H.W., D.K., V.C-E., R.V., and P.N. are employees of GenX Global Ltd. M.K. and D.O. are employees of Atlas Biomed Group Ltd. The authors declare no other competing interests.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>30</day>
                <month>5</month>
                <year>2022</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2022</year>
            </pub-date>
            <volume>11</volume>
            <elocation-id>589</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>19</day>
                    <month>5</month>
                    <year>2022</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Medvedev A et al.</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/11-589/pdf"/>
            <abstract>
                <p>Classifying the degree of relatedness between pairs of individuals has both scientific and commercial applications. For example, genome-wide association studies (GWAS) may suffer from high rates of false positive results due to unrecognized population structure. This problem becomes especially relevant with the recent increase in large-cohort studies. Accurate relationship classification is also required for genetic linkage analysis to identify disease-associated loci. Additionally, DNA relative matching services are one of the leading drivers of the direct-to-consumer genetic testing market. Despite the published research on methods for determining kinship and the accessibility of relevant tools, assembling a pipeline that operates stably on real-world genotypic data requires significant research and development resources. Currently, there is no open-source end-to-end solution for relatedness detection in genomic data that is fast, reliable, and accurate for both close and distant degrees of kinship, combines all the necessary processing steps to work on real data, and is ready for production integration. To address this, we developed GRAPE: Genomic RelAtedness detection PipelinE. It combines data preprocessing, identity-by-descent (IBD) segment detection, and accurate relationship estimation. The project uses software development best practices, as well as Global Alliance for Genomics and Health (GA4GH) standards and tools. Pipeline efficiency is demonstrated on both simulated and real-world datasets. GRAPE is available from: https://github.com/genxnetwork/grape.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>kinship and relationship estimation</kwd>
                <kwd>identity-by-descent</kwd>
                <kwd>snakemake workflow</kwd>
                <kwd>bioinformatics pipeline</kwd>
                <kwd>phasing and imputation</kwd>
                <kwd>sequencing data</kwd>
                <kwd>genetic testing companies</kwd>
                <kwd>genotype</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>GenX Global Limited</funding-source>
                </award-group>
                <funding-statement>This project was funded by GenX Global Limited.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Distant relationship estimation has both scientific and commercial applications. 
                <italic toggle="yes">Scientific</italic> applications may include the identification of monogenic (single gene) Mendelian diseases.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> Relatedness detection can also be used during data quality control for genome-wide association studies (GWAS), since close relatives should be excluded to ensure that no pair of individuals is more closely related than second-degree relatives.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Otherwise, GWAS may suffer from high rates of false positive results. In 
                <italic toggle="yes">commercial</italic> settings, direct-to-consumer genetic testing companies use relationship estimation to find possible distant relatives for their customers.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> The applicability of existing tools and pipelines to both applications is limited. Currently, there is no open-source end-to-end solution for relatedness detection in genomic data that: (a) is reliable and accurate for both close and distant degrees; (b) includes all necessary processing steps to work with real data; (c) is flexible enough to be adapted for various applications; (d) is user-friendly and ready for production integration. Specifically, for commercial usage, the pipeline should deal with data heterogeneity and efficiently process newly added data samples. Samples can be genotyped with different chips and may have passed through different quality controls. Typical databases of genetic testing companies contain a lot of data from previously used chips, which need to be combined during processing. Scientific databases may also contain heterogeneous data. For example, the UK Biobank dataset
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> contains samples which were genotyped using two different chips.</p>
            <p>Driven by the idea of open and user-led innovation, we have developed GRAPE (Genomic RelAtedness detection PipelinE), the first open-source end-to-end solution for relatedness detection that successfully addresses the above-mentioned difficulties. As a preliminary step, we comprehensively studied available approaches and software tools for relatedness estimation from genotype data, such as identity-by-descent (IBD) segment detection tools: GERMLINE,
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> IBIS,
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> RaPID,
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> PhasedIBD
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup>; relationship inference tools: DRUID,
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> ERSA,
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> KING
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup>; and other tools that may be required during data preprocessing, such as algorithms for phasing (Eagle
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup>) and genotype imputation (Minimac
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup>). Then, we selected several promising tools and combined them into the user-friendly GRAPE pipeline.</p>
            <p>GRAPE adopts best practices for software development, including the Snakemake
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> workflow management system, Conda
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup> virtual environments, Docker
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> containerisation, the Funnel task execution service, which implements the GA4GH Task Execution Schema,
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref>
                </sup> and CI/CD with automatic testing. The pipeline requires a single multi-sample VCF file as input and has a separate workflow for downloading reference datasets and checking their consistency. The IBD segment detection workflows of GRAPE can work with both phased and unphased data. As real-world datasets are often heterogeneous and inconsistent, GRAPE incorporates various data preprocessing and quality control (QC) options. GRAPE has a modular architecture that allows switching between tools and adjusting tool parameters for better control of precision and recall levels. The pipeline also contains a simulation workflow with an in-depth evaluation of pipeline accuracy using simulated and reference data.</p>
            <p>GRAPE can run as a standalone installation or from a dedicated Docker container. We recommend using the containerized version of GRAPE, since it has all the dependencies already installed. We published the GRAPE image in both the Docker Hub and Dockstore
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup> repositories to satisfy GA4GH
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup> standards for sharing Docker-based tools. As a result, we obtained a robust, reliable, and easy-to-use tool. Analysis of 10k samples with 600k SNPs takes half an hour, and processing a dataset of 100k samples took about 22 hours. Precision/recall analysis for GRAPE was performed using simulated datasets. We compared GRAPE with TRIBES,
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> another open-source pipeline for relatedness detection, and showed the advantages of our solution in terms of precision/recall metrics.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <p>This section describes the input data, reference datasets, and the main pipeline steps. The scheme of the pipeline is presented in 
                <xref ref-type="fig" rid="f1">Figure 1</xref>. As input, GRAPE uses a single VCF file containing the genotypes of a set of individuals. The pipeline is configured through a config.yaml file or via the parameters of the GRAPE launcher. Reference data must be downloaded in advance and stored on a hard drive. After that, the pipeline performs relationship inference according to one of the three possible workflows (described in the corresponding section below). The main pipeline steps are:
                <list list-type="order">
                    <list-item>
                        <label>1.</label>
                        <p>Downloading of the reference datasets.</p>
                    </list-item>
                    <list-item>
                        <label>2.</label>
                        <p>Quality control and data preprocessing.</p>
                    </list-item>
                    <list-item>
                        <label>3.</label>
                        <p>Application of the relationship inference workflow.</p>
                    </list-item>
                </list>
            </p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Scheme of the GRAPE pipeline.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure1.gif"/>
            </fig>
            <p>GRAPE also includes a simulation workflow to evaluate precision/recall metrics on simulated data. To simulate artificial pedigrees, the Ped-sim tool
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> is used, while unrelated founders for the simulation are taken from the 1000 Genomes Project data.
                <sup>
                    <xref ref-type="bibr" rid="ref25">25</xref>
                </sup>
            </p>
            <p>Two main approaches to relationship estimation are implemented in GRAPE. The first one is based on allele frequency calculation (KING
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup>). This approach is optionally used for the first three degrees of relatedness and for calculating kinship coefficients. The second approach relies on detecting pairwise identical-by-descent (IBD) segments. There are two options for this purpose, based on two different tools: (a) IBIS
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> and (b) GERMLINE.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> IBIS is a fast tool that can operate on unphased data. IBIS performs IBD detection using homozygous single nucleotide polymorphisms (SNPs) only. It breaks the genome into windows of fixed length and counts homozygous SNP mismatches in each genotype window. If the number of mismatches does not exceed a predefined threshold, the window becomes part of an IBD segment. In contrast, GERMLINE is slower and can work only with phased data, but under some circumstances it may produce higher-precision results. Once the pairwise IBD segment search is completed, relationship degrees are estimated using the ERSA tool.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
            </p>
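            <p>The window-based idea described above can be illustrated with a minimal Python sketch. This is not the IBIS implementation: the function name, the 0/1/2 genotype coding (number of alternate alleles), the windows defined by SNP count, and the default thresholds are all assumptions made for illustration only.</p>
            <preformat>
```python
from typing import List, Sequence, Tuple


def candidate_ibd_segments(
    geno_a: Sequence[int],
    geno_b: Sequence[int],
    window_size: int = 4,
    max_mismatches: int = 0,
) -> List[Tuple[int, int]]:
    """Return (start, end) SNP index ranges, end exclusive, of candidate IBD segments.

    Genotypes are assumed to be coded as 0, 1, or 2 alternate alleles.
    A mismatch is a site where the two samples are opposite homozygotes
    (one is 0 and the other is 2); heterozygous sites never count.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None  # start index of the segment currently being extended
    n = min(len(geno_a), len(geno_b))
    for start in range(0, n, window_size):
        end = min(start + window_size, n)
        mismatches = sum(
            1
            for a, b in zip(geno_a[start:end], geno_b[start:end])
            if {a, b} == {0, 2}
        )
        if mismatches > max_mismatches:
            # The window fails: close the open segment, if any.
            if seg_start is not None:
                segments.append((seg_start, start))
                seg_start = None
        elif seg_start is None:
            # The window passes and opens a new segment.
            seg_start = start
    if seg_start is not None:
        segments.append((seg_start, n))
    return segments


sample_a = [0, 0, 2, 2, 0, 2, 0, 2, 0, 0, 0, 0]
sample_b = [0, 1, 2, 2, 2, 2, 0, 2, 0, 1, 0, 0]
print(candidate_ibd_segments(sample_a, sample_b))
```
            </preformat>
            <p>In this toy example the second window contains one opposite-homozygote site, so with a threshold of zero it splits the region into two candidate segments. Raising the hypothetical <monospace>max_mismatches</monospace> parameter to one merges them, which shows how such a threshold trades tolerance to genotyping errors against the risk of false positive segments.</p>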
            <sec id="sec3">
                <title>Downloading of the reference datasets</title>
                <p>GRAPE requires various reference data to perform preprocessing, quality control, phasing, and imputation. Reference data is also required for the simulation workflow. To facilitate the collection of reference data, we created a separate workflow that automates this step. It can be run by passing the <monospace>reference</monospace> command to the GRAPE pipeline launcher. The workflow downloads data, unpacks it, and performs the required post-processing procedures. If phasing and genotype imputation are required, one should also add the flags <monospace>--phase</monospace> and <monospace>--impute</monospace> to the command. These flags affect the amount of downloaded data: when they are specified, the workflow downloads additional reference datasets to make phasing and imputation possible.</p>
                <p>Alternatively, all required reference data can be downloaded as a single file, which we prepared and host in the cloud. To use it, pass the additional flag <monospace>--use-bundle</monospace> to the workflow. This option is faster, since all the post-processing procedures have already been performed. Reference files fall into three main groups.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Files for preprocessing</bold>. These files include genetic recombination maps for converting SNP coordinates from base pairs (bp) to centimorgans (cM); files with SNP information from the 1000 Genomes Project for SNP quality control; the reference genome of the hg37 build; and a liftOver chain file.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Files for phasing and imputation</bold>. Phasing is required as a preliminary step for the GERMLINE tool if the input data is unphased. Once phasing is done, genotype imputation can additionally be applied. These files require substantial disk space and a considerable amount of post-processing time.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>Files for simulation</bold>. These files include phased per-chromosome files from the 1000 Genomes Project; Affymetrix chip data that is used as a source of founders for the simulation; sex-specific recombination maps for better Ped-sim simulation results.
                                <sup>
                                    <xref ref-type="bibr" rid="ref24">24</xref>
                                </sup>
                            </p>
                        </list-item>
                    </list>
                </p>
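                <p>As an illustration of how a recombination map is used to convert SNP coordinates from base pairs to centimorgans, the following minimal Python sketch applies linear interpolation between map points. It is not taken from GRAPE: the function name, the list-based map representation, the toy map values, and the clamping of positions outside the map to its end points are all assumptions.</p>
                <preformat>
```python
from bisect import bisect_right
from typing import Sequence


def bp_to_cm(position: int, map_bp: Sequence[int], map_cm: Sequence[float]) -> float:
    """Interpolate a genetic position (cM) for a physical position (bp).

    map_bp must be sorted in ascending order and map_cm must hold the
    genetic position of each map point. Positions outside the map are
    clamped to the first or last map point (an assumption of this sketch).
    """
    if position >= map_bp[-1]:
        return map_cm[-1]
    if map_bp[0] >= position:
        return map_cm[0]
    # Index of the first map point located strictly after the position.
    i = bisect_right(map_bp, position)
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


# Toy recombination map for one chromosome (hypothetical values).
toy_map_bp = [1000, 2000, 4000]
toy_map_cm = [0.0, 1.0, 2.0]
print(bp_to_cm(1500, toy_map_bp, toy_map_cm))
print(bp_to_cm(3000, toy_map_bp, toy_map_cm))
```
                </preformat>
                <p>Working in centimorgans rather than base pairs matters for IBD analysis because segment lengths are compared in genetic distance, which reflects the local recombination rate.</p>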
            </sec>
            <sec id="sec4">
                <title>Quality control and data preprocessing</title>
                <p>GRAPE has a versatile and configurable preprocessing workflow. One part of the preprocessing is required and must be performed before the relationship inference workflow. It is launched by the <monospace>preprocess</monospace> command of GRAPE. Along with some necessary technical procedures, preprocessing includes the following steps.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Required] SNP quality control by minor allele frequency (MAF) and missingness rate</bold>. We discovered that blocks of rare SNPs with low MAF values in genotype arrays may produce false positive IBD segments. To address this problem, we remove SNPs with a MAF value less than 0.02. Additionally, we remove multiallelic SNPs, insertions/deletions, and SNPs with a high missingness rate, because such variants are incompatible with IBD detection tools.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Required] Per-sample quality control using missingness and heterozygosity</bold>. Extensive testing revealed that samples with an unusually low level of heterozygosity can produce many false relative matches among individuals. GRAPE excludes such samples from the analysis and creates a report file that describes the reason for each exclusion.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Required] Control for strand and SNP ID mismatches</bold>. During this step, GRAPE fixes inconsistencies in strands and reference alleles.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Optional] LiftOver from hg38 to hg37</bold>. Currently, GRAPE uses the hg37 build of the human genome reference. The pipeline supports input in the hg38 and hg37 builds. One should specify the genome build version with the dedicated flag 
                                <monospace>--assembly</monospace> of the pipeline launcher. If the hg38 build is selected (
                                <monospace>--assembly hg38</monospace>), then GRAPE applies the liftOver tool to the input data in order to match the hg37 reference assembly. This parameter should also be specified for the simulation workflow.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Optional] Phasing and imputation</bold>. GRAPE supports phasing and genotype imputation. The GERMLINE IBD detection tool requires phased data. So, if the input data is unphased, one should include phasing (the 
                                <monospace>--phase</monospace> flag) in the preprocessing before running the GERMLINE workflow. If the input data is highly 
                                <italic toggle="yes">heterogeneous</italic> in terms of available SNP positions, we recommend including the imputation procedure as well (the 
                                <monospace>--impute</monospace> flag).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <bold>[Optional] Removal of imputed SNPs</bold>. We found that if the input data is 
                                <italic toggle="yes">homogeneous</italic> in terms of SNP positions, the presence of imputed SNPs does not affect the overall IBD detection accuracy of the IBIS tool, but it significantly slows down the overall performance. For this particular case, when the input data initially contains many imputed SNPs, we recommend removing them by passing the 
                                <monospace>--remove-imputation</monospace> flag to the GRAPE launcher. GRAPE removes all SNPs that are marked with the IMPUTED flag in the input VCF file.</p>
                        </list-item>
                    </list>
                </p>
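                <p>The required SNP-level filter described above can be sketched in a few lines of Python. This is a simplified illustration and not GRAPE's code: the function name, the 0/1/2 genotype coding with <monospace>None</monospace> for missing calls, and the missingness threshold of 0.1 are assumptions. Only the MAF threshold of 0.02 comes from the text.</p>
                <preformat>
```python
from typing import Optional, Sequence


def snp_passes_qc(
    genotypes: Sequence[Optional[int]],
    min_maf: float = 0.02,
    max_missing: float = 0.1,  # hypothetical value, not given in the text
) -> bool:
    """Decide whether one biallelic SNP is kept.

    genotypes holds one value per sample: 0, 1, or 2 alternate alleles,
    or None when the call is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # nothing to estimate the allele frequency from
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf and max_missing >= missing_rate


common_snp = [0, 1, 2, 1, 0, 1, 2, 0, 1, 1]
rare_snp = [0] * 99 + [1]
sparse_snp = [0, 1, None, None, 2, None, 1, 0, None, 1]
for name, snp in [("common", common_snp), ("rare", rare_snp), ("sparse", sparse_snp)]:
    print(name, snp_passes_qc(snp))
```
                </preformat>
                <p>In the toy data, the rare SNP is dropped because its MAF is 0.005, and the sparse SNP is dropped because 40% of its calls are missing. A per-sample filter on heterozygosity could follow the same pattern by iterating over samples instead of SNPs.</p>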
            </sec>
            <sec id="sec5">
                <title>GRAPE workflows</title>
                <p>Three relationship inference workflows are implemented in GRAPE. These workflows are launched by the <monospace>find</monospace> command of the launcher. The workflow is selected with the 
                    <monospace>--flow</monospace> parameter.
                    <list list-type="order">
                        <list-item>
                            <label>1.</label>
                            <p>
                                <bold>IBIS + ERSA</bold>, 
                                <monospace>--flow ibis</monospace>. In this workflow, IBD segment detection is performed by IBIS,
                                <sup>
                                    <xref ref-type="bibr" rid="ref8">8</xref>
                                </sup> and the relationship degree is estimated with the ERSA algorithm.
                                <sup>
                                    <xref ref-type="bibr" rid="ref12">12</xref>
                                </sup> This is the fastest workflow.</p>
                        </list-item>
                        <list-item>
                            <label>2.</label>
                            <p>
                                <bold>IBIS + ERSA &amp; KING</bold>, 
                                <monospace>--flow ibis-king</monospace>. KING
                                <sup>
                                    <xref ref-type="bibr" rid="ref14">14</xref>
                                </sup> is a well-known method for the inference of close relationships. It is fast and can work with unphased data. In this workflow, GRAPE uses the KING tool for the first three degrees of relationship and the IBIS + ERSA approach for higher-order degrees (see 
                                <xref ref-type="fig" rid="f1">Figure 1</xref>). A comparison of the running time of the IBIS + ERSA and IBIS + ERSA &amp; KING workflows is presented in 
                                <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
                        </list-item>
                        <list-item>
                            <label>3.</label>
                            <p>
                                <bold>GERMLINE + ERSA &amp; KING</bold>, 
                                <monospace>--flow germline-king</monospace>. This workflow uses GERMLINE for IBD segment detection. KING is used to identify relationships of the first three degrees, and the ERSA algorithm is used for higher-order degrees. This workflow was added to GRAPE mainly for the case when the input data is already phased and accurately preprocessed.</p>
                        </list-item>
                    </list>
                </p>
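                <p>The division of labour in the IBIS + ERSA &amp; KING and GERMLINE + ERSA &amp; KING workflows can be illustrated with a minimal Python sketch. It assumes that each pair of individuals carries an optional KING degree and an optional ERSA degree, with None meaning that the tool reported nothing for the pair. The function name and this convention are illustrative assumptions, not GRAPE's actual implementation.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from typing import Optional


def combine_degrees(king_degree: Optional[int],
                    ersa_degree: Optional[int]) -> Optional[int]:
    """Hypothetical merge rule: trust KING for degrees 1-3, ERSA otherwise."""
    # KING is only relied upon for close relationships (first three degrees).
    if king_degree is not None and 1 <= king_degree <= 3:
        return king_degree
    # For everything else, fall back to the ERSA estimate (may be None).
    return ersa_degree
```
]]></preformat>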
            </sec>
            <sec id="sec6">
                <title>Pedigree simulation</title>
                <p>We added a simulation workflow to GRAPE to perform a precision/recall analysis of the pipeline. It is accessible via the simulate command of the pipeline launcher and incorporates the following steps: (1) pedigree simulation with unrelated founders, for which we use the Ped-sim simulation package
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup>; (2) estimation of relatedness degrees; (3) comparison between true and estimated degrees. The source dataset for the simulation is taken from the CEU (Northern Europeans from Utah) population data of the 1000 Genomes Project.
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup> As the CEU data consists of trios, we picked no more than one member of each trio as a founder. We also ran GRAPE on the selected individuals to remove all cryptic relationships up to the 6th degree. Then, we randomly assigned a sex to each individual and used sex-specific genetic maps to take into account the differences in recombination rates between men and women.
                    <sup>
                        <xref ref-type="bibr" rid="ref26">26</xref>
                    </sup> Results of our precision/recall analysis for the simulated datasets are presented in the corresponding Results section.</p>
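                <p>Step (3) of the simulation workflow can be sketched in Python as follows. The sketch assumes that true and estimated degrees are given as dictionaries keyed by a pair of sample identifiers, that a pair absent from a dictionary is unrelated or undetected, and that a prediction counts as correct only when the degree matches exactly. These conventions and the function name are our assumptions for illustration; the matching criterion used by GRAPE may differ.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def precision_recall_by_degree(true_degrees, predicted_degrees):
    """Per-degree (precision, recall) for pairs keyed by (id1, id2)."""
    true_positive, predicted, actual = Counter(), Counter(), Counter()

    # Count how many pairs truly have each degree.
    for degree in true_degrees.values():
        actual[degree] += 1

    # Count predictions per degree and those that match the truth exactly.
    for pair, degree in predicted_degrees.items():
        predicted[degree] += 1
        if true_degrees.get(pair) == degree:
            true_positive[degree] += 1

    metrics = {}
    for degree in sorted(set(actual) | set(predicted)):
        precision = true_positive[degree] / predicted[degree] if predicted[degree] else 0.0
        recall = true_positive[degree] / actual[degree] if actual[degree] else 0.0
        metrics[degree] = (precision, recall)
    return metrics
```
]]></preformat>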
            </sec>
            <sec id="sec7">
                <title>IBD segments weighing</title>
                <p>The distribution of IBD segments among unrelated (background) individuals within a population may be quite heterogeneous.
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> There may exist genome regions with extremely high rates of overall matching that are not inherited from recent common ancestors.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup> Instead, these regions more likely reflect other demographic factors of the population. The implication is that IBD segments detected in such regions are expected to be less useful for estimating recent relationships. Moreover, such regions are potentially prone to false-positive IBD segments.</p>
                <p>GRAPE can use two different approaches to address this issue. The first is based on a genome region exclusion mask, wherein some genome regions are completely excluded from consideration. This approach was proposed by the authors of the ERSA algorithm (see Ref. 
                    <xref ref-type="bibr" rid="ref13">13</xref>). The mask was computed based on whole-genome sequencing data for European individuals. It is built into the ERSA 1.0 algorithm and is used by GRAPE by default.</p>
                <p>The second approach is based on so-called IBD segment weighing. The idea resembles the one proposed in the Ancestry DNA Matching White Paper
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup> (see the description of their Timber algorithm there). The key idea is to down-weight an IBD segment, i.e. reduce its length, if the segment crosses regions with a high rate of matching. The approach can be briefly described as follows. First, one should detect IBD segments 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close="}" open="{">
                                <mml:msub>
                                    <mml:mi>B</mml:mi>
                                    <mml:mi>j</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula> for some background population. Next, we split the chromosomes into windows 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> of fixed length and compute the total overlap length 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>c</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> between the detected IBD segments 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close="}" open="{">
                                <mml:msub>
                                    <mml:mi>B</mml:mi>
                                    <mml:mi>j</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula> and each window 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>c</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:msub>
                                <mml:mo>&#x2211;</mml:mo>
                                <mml:mi>j</mml:mi>
                            </mml:msub>
                            <mml:mo>|</mml:mo>
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi mathvariant="italic">ij</mml:mi>
                            </mml:msub>
                            <mml:mo>|</mml:mo>
                        </mml:math>
                    </inline-formula>, where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi mathvariant="italic">ij</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is an overlap between the segment 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>B</mml:mi>
                                <mml:mi>j</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> and the window 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>. The obtained overlap lengths are then transformed into weights 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> which are assigned to the windows, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mi>f</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:msub>
                                    <mml:mi>c</mml:mi>
                                    <mml:mi>i</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                            <mml:mo>&#x2208;</mml:mo>
                            <mml:mfenced close="]" open="[" separators=",">
                                <mml:mn>0</mml:mn>
                                <mml:mn>1</mml:mn>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>. Here 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>f</mml:mi>
                        </mml:math>
                    </inline-formula> is a weighing function that reflects the following heuristic: if the overlap length 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>c</mml:mi>
                                <mml:mi>k</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> for a window 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>k</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> is a relative outlier among all 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close="}" open="{">
                                <mml:msub>
                                    <mml:mi>c</mml:mi>
                                    <mml:mi>i</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>, then the value of 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>f</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:msub>
                                    <mml:mi>c</mml:mi>
                                    <mml:mi>k</mml:mi>
                                </mml:msub>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula> is close to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mn>0</mml:mn>
                        </mml:math>
                    </inline-formula>, otherwise it is close to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula>. Once the weights for all windows are computed, for each IBD segment 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>G</mml:mi>
                        </mml:math>
                    </inline-formula> one can compute its weighted length 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mfenced close="|" open="|">
                                    <mml:mi>G</mml:mi>
                                </mml:mfenced>
                                <mml:mi mathvariant="normal">w</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> using the formula
                    <disp-formula id="e1">
                        <mml:math display="block">
                            <mml:msub>
                                <mml:mfenced close="|" open="|">
                                    <mml:mi>G</mml:mi>
                                </mml:mfenced>
                                <mml:mi mathvariant="normal">w</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:munder>
                                <mml:mo movablelimits="false">&#x2211;</mml:mo>
                                <mml:mi>i</mml:mi>
                            </mml:munder>
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>|</mml:mo>
                            <mml:msub>
                                <mml:mi>G</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>|</mml:mo>
                            <mml:mo>,</mml:mo>
                        </mml:math>
                        <label>(1)</label>
                    </disp-formula>where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>|</mml:mo>
                            <mml:msub>
                                <mml:mi>G</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>|</mml:mo>
                        </mml:math>
                    </inline-formula> denotes the overlap length between IBD segment 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>G</mml:mi>
                        </mml:math>
                    </inline-formula> and a window 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>. The weighted lengths of IBD segments are then used by the ERSA algorithm (with the ERSA mask disabled).</p>
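                <p>As an illustration of Equation (1), the following Python sketch computes the window overlaps and the weighted length of a segment on a single chromosome. Positions are assumed to be in cM, windows are assumed to start at position 0, and the weights are taken as given. All function names are ours and chosen for illustration; this is not GRAPE's actual code.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def overlap(start_a, end_a, start_b, end_b):
    """Length of the intersection of two intervals (0 if they are disjoint)."""
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


def window_overlaps(segments, n_windows, window_len=1.0):
    """c_i: total overlap between background segments B_j and window W_i."""
    return [
        sum(overlap(s, e, i * window_len, (i + 1) * window_len) for s, e in segments)
        for i in range(n_windows)
    ]


def weighted_length(segment, weights, window_len=1.0):
    """|G|_w: sum over windows of w_i times the overlap of G with window W_i."""
    start, end = segment
    return sum(
        w * overlap(start, end, i * window_len, (i + 1) * window_len)
        for i, w in enumerate(weights)
    )
```
]]></preformat>
                <p>For example, with 1 cM windows and weights (1, 0.5, 1), a segment spanning 0.5 to 2.5 cM has a weighted length of 1.5 cM instead of 2 cM.</p>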
                <p>GRAPE can compute the weight mask from a VCF file of presumably unrelated individuals. It splits each chromosome into 1 cM windows to compute overlaps. After that, GRAPE detects outliers among the overlap lengths by means of the minimum covariance determinant (MCD) algorithm,
                    <sup>
                        <xref ref-type="bibr" rid="ref29">29</xref>
                    </sup> and then determines the outlier upper bound 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>h</mml:mi>
                        </mml:math>
                    </inline-formula>. This upper bound is used to compute weights, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mi>h</mml:mi>
                            <mml:mo>/</mml:mo>
                            <mml:msub>
                                <mml:mi>c</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula>, if 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>c</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>&gt;</mml:mo>
                            <mml:mi>h</mml:mi>
                        </mml:math>
                    </inline-formula>; and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula> otherwise. The computed mask can then be passed as a parameter to the relationship inference workflow (see the 
                    <monospace>--weight-mask</monospace> parameter). As an example, 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> depicts the weight mask computed for individuals of East Asian ancestry taken from the 1000 Genomes Project.
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup> To detect IBD segments, the IBIS workflow was used with the parameters 
                    <monospace>--ibis-seg-len 5, --ibis-min-snp 400</monospace>. High matching regions (
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>&#x2248;</mml:mo>
                            <mml:mn>0</mml:mn>
                        </mml:math>
                    </inline-formula>) are marked in blue. For comparison, ERSA 1.0 masked regions are depicted with pink hatching.</p>
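                <p>The thresholding rule described above can be sketched in Python as follows. The bound h is taken as an input here; in GRAPE it is derived from the MCD-based outlier detection, which this sketch does not reproduce. The function name is hypothetical.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def weights_from_overlaps(overlaps, h):
    """Window weights: w_i = h / c_i if c_i exceeds the bound h, else 1."""
    return [h / c if c > h else 1.0 for c in overlaps]
```
]]></preformat>
                <p>With h = 2, for instance, overlap lengths (1, 4, 2) give weights (1, 0.5, 1): only the window whose overlap exceeds the bound is down-weighted.</p>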
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Weight mask computed for individuals of East Asian ancestry taken from the 1000 Genomes Project, compared to ERSA 1.0 masked regions (pink hatching).</title>
                        <p>IBIS workflow parameters: 
                            <monospace>--ibis-seg-len 5, --ibis-min-snp 400</monospace>.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec8">
                <title>Operation</title>
                <p>GRAPE can be run inside a Docker container, which is the recommended way. Alternatively, the pipeline can be run directly on a host with all the dependencies pre-installed. We successfully ran the pipeline on the Ubuntu 18.04 Linux distribution
                    <sup>
                        <xref ref-type="bibr" rid="ref30">30</xref>
                    </sup> with eight CPUs and 32 GB of RAM to evaluate the performance on large datasets with up to 100k samples and millions of SNPs.</p>
            </sec>
            <sec id="sec9">
                <title>Resource allocation</title>
                <p>GRAPE can utilize multiple cores. To do so, specify the number of cores via the <monospace>--cores</monospace> parameter. The default number of cores is equal to the total number of available CPUs minus 1.</p>
            </sec>
            <sec id="sec10">
                <title>Execution by scheduler</title>
                <p>The pipeline can be run using Funnel,
                    <sup>
                        <xref ref-type="bibr" rid="ref31">31</xref>
                    </sup> a lightweight task scheduler that implements the Task Execution Schema
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>
                    </sup> developed by GA4GH.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup> The scheduler can work in various environments, from a regular virtual machine to a Kubernetes cluster with support for resource quotas. We provide several example task specifications for Funnel. Each example is a JSON file with the task description. These files are available in the GRAPE GitHub repository, in the corresponding funnel subfolder.</p>
            </sec>
            <sec id="sec11">
                <title>Performance</title>
                <p>To estimate the performance of the GRAPE pipeline, we used a machine with eight CPU cores (Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30 GHz) and 32 GB of RAM. The fastest relatedness inference workflow, IBIS + ERSA (<monospace>--flow ibis</monospace>), takes about 22 hours to process a dataset of 100k individuals (see 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). The IBIS tool has quadratic time complexity with respect to the total number of individuals. Adding KING (<monospace>--flow ibis-king</monospace>) increases the total running time by roughly 50%. The performance analysis confirmed that IBIS is a simple and efficient tool that allows the pipeline to process hundreds of thousands of individuals in a reasonable amount of time.</p>
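                <p>The quadratic scaling can be turned into a back-of-the-envelope estimate, sketched below in Python. The sketch assumes that the whole IBIS + ERSA running time scales purely quadratically with the number of individuals and uses the 22-hour, 100k-individual run reported above as the reference point. It ignores preprocessing and other overheads, so its output is an extrapolation rather than a measurement, and the function name is hypothetical.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def extrapolate_hours(n_individuals, ref_n=100_000, ref_hours=22.0):
    """Rough running-time estimate under a purely quadratic scaling assumption."""
    return ref_hours * (n_individuals / ref_n) ** 2
```
]]></preformat>
                <p>Under this assumption, halving the number of individuals cuts the estimated time by a factor of four, and doubling it increases the estimate fourfold.</p>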
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Performance comparison between IBIS + ERSA and IBIS + ERSA &amp; KING workflows.</title>
                        <p>GRAPE was evaluated on a machine with eight CPU cores (Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30 GHz) and 32 GB of RAM. Both axes are in logarithmic scale.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure3.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec12">
            <title>Use cases</title>
            <p>This section gives examples of the GRAPE pipeline commands needed to infer relationships or to evaluate precision/recall metrics on a simulated dataset.</p>
            <sec id="sec13">
                <title>Relationship inference with IBIS + ERSA</title>
                <p>As the first step, the reference data must be downloaded. We assume that the reference data is stored in the <monospace>/media/ref</monospace> directory. Here and below we specify the 
                    <monospace>--real-run</monospace> flag. Without this flag, GRAPE performs a dry run.</p>
                <p>Listing 1. Reference downloading for the IBIS + ERSA workflow.</p>
                <p>
                    <inline-graphic id="gr9" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic1.gif"/>
                </p>
                <p>As the second step, preprocessing is performed. We assume that the input file is located at <monospace>/media/input.vcf.gz</monospace> and uses the hg38 build. The input file location is specified by the 
                    <monospace>--vcf-file</monospace> flag. The GRAPE working directory is <monospace>/media/data</monospace>; it is specified by the 
                    <monospace>--directory</monospace> flag.</p>
                <p>Listing 2. Preprocessing for the IBIS + ERSA workflow.</p>
                <p>
                    <inline-graphic id="gr10" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic2.gif"/>
                </p>
                <p>The third step is the relationship inference. It is launched with the find command. We use the IBIS + ERSA workflow, which corresponds to the ibis value of the 
                    <monospace>--flow</monospace> parameter (default).</p>
                <p>Listing 3. Relationship inference with the IBIS + ERSA workflow.</p>
                <p>
                    <inline-graphic id="gr11" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic3.gif"/>
                </p>
                <p>GRAPE allows additional parameters to be passed to the ERSA and IBIS algorithms to control the sensitivity and the false positive rate. These parameters are described below. The default GRAPE parameters are quite conservative: they provide a low false positive rate but low sensitivity for high (9+) degrees.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>[IBIS] 
                                <monospace>--ibis-seg-len</monospace>. Minimum length of an IBD segment to be found by IBIS. Higher values reduce the false positive rate and yield fewer distant matches (default = 7 cM).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>[IBIS] 
                                <monospace>--ibis-min-snp</monospace>. Minimum number of SNPs per IBD segment to be detected (default = 500 SNPs).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>[ERSA] 
                                <monospace>--zero-seg-count</monospace>. Mean number of shared segments for two unrelated individuals in the population. Smaller values tend to give more distant matches and increase the false positive rate (default = 0.5).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>[ERSA] 
                                <monospace>--zero-seg-len</monospace>. Average length of an IBD segment shared by two unrelated individuals in the population. Smaller values tend to give more distant matches and increase the false positive rate (default = 5 cM).</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>[ERSA] 
                                <monospace>--alpha</monospace>. ERSA significance level
                                <sup>
                                    <xref ref-type="bibr" rid="ref12">12</xref>
                                </sup> (default = 0.01).</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec14">
                <title>Relationship inference with IBIS + ERSA &amp; KING</title>
                <p>The first and second steps are the same as in the previous use case. The third step is launched with the ibis-king value of the <monospace>--flow</monospace> parameter. In this case, GRAPE estimates the first three degrees of relationship with KING. Other degrees are estimated with ERSA (see 
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). The resulting report of individual relationships contains additional king_degree and kinship columns. The KING algorithm has no additional parameters.</p>
                <p>Listing 4. Relationship inference with the IBIS + ERSA &amp; KING workflow.</p>
                <p>
                    <inline-graphic id="gr12" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic4.gif"/>
                </p>
            </sec>
            <sec id="sec15">
                <title>Relationship inference with GERMLINE + ERSA &amp; KING</title>
                <p>First, the reference data must be downloaded. The GERMLINE tool works with phased data only, so if the input data is unphased, an additional reference dataset must be downloaded to perform phasing. For that purpose we use a reference panel from the 1000 Genomes Project. This panel takes a considerable amount of disk space (&#x223c;25 GB) and requires significant time to download. To download this panel, one should specify the 
                    <monospace>--phase</monospace> flag while using the reference command.</p>
                <p>Listing 5. Reference downloading.</p>
                <p>
                    <inline-graphic id="gr13" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic5.gif"/>
                </p>
                <p>The second step is the data preprocessing. We use the 
                    <monospace>--phase</monospace> flag to apply the phasing procedure during this stage.</p>
                <p>Listing 6. Preprocessing for the GERMLINE + ERSA &amp; KING workflow.</p>
                <p>
                    <inline-graphic id="gr14" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic6.gif"/>
                </p>
                <p>The third step is the relationship inference. Specify 
                    <monospace>--flow germline</monospace> to use the GERMLINE tool for IBD segment detection. The ERSA parameters are the same as for the IBIS + ERSA workflow. The GERMLINE parameters are not configurable. Currently, we use the following set of GERMLINE parameters: 
                    <monospace>-min_m 2.5, -err_hom 2, -err_het 1</monospace>. See the GERMLINE documentation for a description of these parameters.
                    <sup>
                        <xref ref-type="bibr" rid="ref7">7</xref>
                    </sup>
                </p>
                <p>Listing 7. Relationship inference with the GERMLINE + ERSA &amp; KING workflow.</p>
                <p>
                    <inline-graphic id="gr15" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic7.gif"/>
                </p>
            </sec>
            <sec id="sec16">
                <title>Evaluation of the IBIS + ERSA workflow on a simulated dataset</title>
                <p>To run a simulation, first download the 
                    <italic toggle="yes">full</italic> reference dataset by using the reference command with the 
                    <monospace>--phase</monospace> and 
                    <monospace>--impute</monospace> flags enabled. The second step is the simulation workflow itself, which is run with the simulate command of the GRAPE launcher.</p>
                <p>Listing 8. Evaluation of the IBIS + ERSA workflow on a simulated dataset.</p>
                <p>
                    <inline-graphic id="gr16" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic8.gif"/>
                </p>
                <p>In addition to the parameters for preprocessing and for the ERSA and IBIS tools, the simulation workflow has the two parameters listed below.
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <monospace>--sim-params-file</monospace>. File with simulation parameters for the Ped-sim tool; see the Ped-sim reference for details.
                                <sup>
                                    <xref ref-type="bibr" rid="ref24">24</xref>
                                </sup> We have prepared several simulation parameter files and stored them in the GRAPE repository on GitHub.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>
                                <monospace>--sim-samples-file</monospace>. File with a list of individuals from the 1000 Genomes Project (1KGP) that are used as founders for the simulations. One can choose 
                                <monospace>ceph_unrelated_all.tsv</monospace> (unrelated individuals from the CEU population) or <monospace>all.tsv</monospace> (all individuals available in 1KGP).</p>
                        </list-item>
                    </list>
                </p>
                <p>The output files include a list of kinship matches found in the simulated dataset, precision/recall plots, and a confusion matrix to compare the detected degrees of relationship with the true degrees. Detailed information on the computed metrics is presented in the Results section.</p>
            </sec>
            <sec id="sec17">
                <title>Evaluation of GERMLINE + ERSA &amp; KING workflow on a simulated dataset</title>
                <p>The first step is to download the reference dataset (see the previous section). The second step is to run the simulate command, specifying the germline-king flow and the 
                    <monospace>--phase</monospace> flag.</p>
                <p>Listing 9. Evaluation of the GERMLINE + ERSA &amp; KING workflow on a simulated dataset.</p>
                <p>
                    <inline-graphic id="gr17" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic9.gif"/>
                </p>
            </sec>
            <sec id="sec18">
                <title>Computation of the IBD segments weighing mask</title>
                <p>GRAPE has a dedicated command, compute-weight-mask, to compute the weight mask. It performs IBD segment detection for the input VCF file and then analyzes the distribution of IBD segments to compute the weight mask. The resulting files consist of a weight mask file in JSON format and a visualization of the mask (see 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>).</p>
                <p>Listing 10. Computation of the IBD segments weighing mask.</p>
                <p>
                    <inline-graphic id="gr18" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic10.gif"/>
                </p>
                <p>For the example above, the resulting files are stored in the <monospace>/media/background/weight-mask/</monospace> directory.</p>
            </sec>
            <sec id="sec19">
                <title>Usage of the IBD segments weighing mask</title>
                <p>To apply the weight mask during relatedness detection (find command), specify the mask file with the 
                    <monospace>--weight-mask</monospace> parameter. When a weight mask is used, the ERSA 1.0 exclusion mask is disabled.</p>
                <p>Listing 11. Usage of the IBD segments weighing mask.</p>
                <p>
                    <inline-graphic id="gr19" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_graphic11.gif"/>
                </p>
            </sec>
        </sec>
        <sec id="sec20" sec-type="results">
            <title>Results</title>
            <p>To assess the accuracy and flexibility of the pipeline, we tested it extensively on both real and simulated datasets. As a sanity check, we took the Allen Ancient DNA Resource (AADR) dataset
                <sup>
                    <xref ref-type="bibr" rid="ref32">32</xref>
                </sup> and made sure that GRAPE does not produce any kinship matches between ancient and present-day individuals.</p>
            <p>Next, we ran the Khazar origins dataset
                <sup>
                    <xref ref-type="bibr" rid="ref33">33</xref>
                </sup> through GRAPE. The Khazar dataset contains 1770 samples from 106 Jewish and non-Jewish populations, including a significant amount of data from small homogeneous populations. The KING analysis was previously applied to this dataset by the authors
                <sup>
                    <xref ref-type="bibr" rid="ref33">33</xref>
                </sup> to identify close relatives up to the third degree. We applied GRAPE and found 1715 putative relationships, most of them of degree 4 or higher. This result highlights the fact that the total length of IBD segments in small homogeneous populations is several orders of magnitude higher than in the heterogeneous populations of Europe and Asia.
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup> This is an obvious obstacle for relatedness detection. We are not aware of any method that addresses this issue using genotypic data obtained from SNP arrays. Whole genome sequencing has the potential to solve this problem, since rare mutations should break long IBD segments.</p>
            <p>As for the data from genetic testing companies, GRAPE has been successfully applied to the database of 100k+ customers of Atlas Biomed,
                <sup>
                    <xref ref-type="bibr" rid="ref36">36</xref>
                </sup> a company that provides direct-to-consumer genetic tests. In this test, GRAPE proved able to handle diverse data from different chips and reference alignments, along with other possible data inconsistencies.</p>
            <sec id="sec21">
                <title>Precision/recall analysis on simulated datasets</title>
                <p>Finally, GRAPE was evaluated on simulated datasets. We performed the simulation using unrelated founders from 1KGP Affymetrix genotype chip data. This chip contains approximately 900k SNPs. The simulation was carried out with the Ped-sim package. Using this tool, we produced several pedigree structures with eight generations, and a maximum degree of relationships equal to 14. Then we joined all of the pedigrees into one dataset, and performed the precision/recall analysis of the GRAPE pipeline for the different flows. For each degree 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>i</mml:mi>
                        </mml:math>
                    </inline-formula> of relationships we computed precision and recall metrics:
                    <disp-formula id="e2">
                        <mml:math display="block">
                            <mml:mtext>Precision</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>i</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mi>TP</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mi>TP</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                    <mml:mo>+</mml:mo>
                                    <mml:mi>FP</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>;</mml:mo>
                            <mml:mspace width="1em"/>
                            <mml:mtext>Recall</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>i</mml:mi>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mi>TP</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mi>TP</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                    <mml:mo>+</mml:mo>
                                    <mml:mi>FN</mml:mi>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mi>i</mml:mi>
                                    </mml:mfenced>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>.</mml:mo>
                        </mml:math>
                        <label>(2)</label>
                    </disp-formula>
                </p>
                <p>Here 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>TP</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>i</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>FP</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>i</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>FN</mml:mi>
                            <mml:mfenced close=")" open="(">
                                <mml:mi>i</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula> are the numbers of true positive, false positive, and false negative relationship matches predicted for the degree 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>i</mml:mi>
                        </mml:math>
                    </inline-formula>. In our analysis we used non-exact (fuzzy) interval metrics. For the 1st degree, we require an exact match. For the 2nd, 3rd, and 4th degrees, we allow a degree interval of 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>&#x00b1;</mml:mo>
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula>. For example, for the 2nd true degree we consider a predicted 3rd degree as a true positive match. For the 5th+ degrees, we use the ERSA confidence intervals which are typically 3-4 degrees wide. For 10th+ degrees, these intervals are 6-7 degrees wide. We also plot a confusion matrix for the predicted vs true degrees.</p>
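                <p>The interval (fuzzy) matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the figures: the function names, the input layout, and the bookkeeping convention (a mismatch counts as a false positive for the predicted degree and a false negative for the true degree) are assumptions made for the example.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Optional, Tuple

Interval = Tuple[int, int]


def is_match(true_degree: int, predicted: int, ci: Optional[Interval] = None) -> bool:
    """Fuzzy match rule, keyed on the true degree."""
    if true_degree == 1:
        return predicted == 1                      # exact match required
    if true_degree <= 4:
        return abs(predicted - true_degree) <= 1   # degree interval of +/- 1
    # 5th degree and beyond: the true degree must lie inside the
    # confidence interval reported for the prediction.
    if ci is None:
        return predicted == true_degree
    return ci[0] <= true_degree <= ci[1]


def interval_metrics(pairs):
    """pairs: iterable of (true_degree, predicted_degree, ci).

    None stands for 'no relationship' on either side.
    Returns {degree: (precision, recall)}; a metric is None when undefined.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_degree, predicted, ci in pairs:
        related = true_degree is not None
        detected = predicted is not None
        if related and detected and is_match(true_degree, predicted, ci):
            tp[true_degree] += 1
            continue
        if detected:
            fp[predicted] += 1      # assumed convention, see lead-in
        if related:
            fn[true_degree] += 1
    metrics = {}
    for degree in sorted(set(tp) | set(fp) | set(fn)):
        n_detected = tp[degree] + fp[degree]
        n_actual = tp[degree] + fn[degree]
        precision = tp[degree] / n_detected if n_detected else None
        recall = tp[degree] / n_actual if n_actual else None
        metrics[degree] = (precision, recall)
    return metrics


if __name__ == "__main__":
    example = [
        (1, 1, None),        # exact 1st-degree match
        (2, 3, None),        # accepted: within +/- 1 of the true degree
        (3, 5, None),        # rejected: off by two
        (6, 7, (5, 8)),      # accepted: true degree inside the interval
        (None, 9, (7, 12)),  # unrelated pair reported as related
        (4, None, None),     # related pair that was not detected
    ]
    for degree, (precision, recall) in interval_metrics(example).items():
        print(degree, precision, recall)
```
]]></preformat>
                <p>Keying the tolerance on the true degree follows the example in the text, where a predicted 3rd degree is accepted for a true 2nd degree.</p>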
                <p>Results of the simulation for the IBIS + ERSA &amp; KING workflow are presented in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>. The following set of parameters was used: 
                    <monospace>--ibis-seg-len 7, --ibis-min-snp 500, --zero-seg-count 0.5, --zero-seg-len 5, --alpha 0.01</monospace>. The confusion matrix is shown in the right panel of 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, where -1 stands for no relationship. The pipeline shows recall above 90% for degrees 1 to 5 and detects all relatives of degrees 1 to 4. GRAPE found no false positive matches, i.e. it does not find any relationships among unrelated individuals, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mtext>Recall</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula>. Precision is above 90% for all detected degrees. We set these parameters as the defaults for this GRAPE workflow. They are quite conservative, i.e. they provide high precision at the cost of lower sensitivity for distant relationships.</p>
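                <p>As a small illustration of how the no-relationship label enters the confusion matrix, the sketch below counts (true, predicted) pairs with -1 for unrelated individuals and computes exact recall for a label. The function names and the toy input are hypothetical; the sketch shows only why a recall of 1 for the -1 label is equivalent to the absence of false positive matches among unrelated pairs.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

NO_RELATIONSHIP = -1  # label used for unrelated pairs in the confusion matrix


def confusion_matrix(true_degrees, predicted_degrees):
    """Count how often each (true, predicted) combination occurs."""
    return Counter(zip(true_degrees, predicted_degrees))


def recall_for(matrix, label):
    """Exact recall for one label: diagonal cell divided by the row total."""
    row_total = sum(n for (t, _), n in matrix.items() if t == label)
    return matrix[(label, label)] / row_total if row_total else None


if __name__ == "__main__":
    true_degrees = [-1, -1, -1, 2, 5]
    clean = confusion_matrix(true_degrees, [-1, -1, -1, 2, -1])
    noisy = confusion_matrix(true_degrees, [-1, 8, -1, 2, -1])
    # No unrelated pair is reported as related in the first case only.
    print(recall_for(clean, NO_RELATIONSHIP))
    print(recall_for(noisy, NO_RELATIONSHIP))
```
]]></preformat>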
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Interval (fuzzy) precision/recall (left panel) and confusion matrix (right panel) for the IBIS + ERSA &amp; KING workflow, obtained for a simulated dataset.</title>
                        <p>Parameters of the workflow: 
                            <monospace>--ibis-seg-len 7, --ibis-min-snp 500, --zero-seg-count 0.5, --zero-seg-len 5, --alpha 0.01.</monospace>
                        </p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure4.gif"/>
                </fig>
                <p>Relaxing the GRAPE parameters yields more relatedness matches for high degrees, but the number of false positive matches increases as well. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> shows the simulation results for the IBIS + ERSA &amp; KING workflow for slightly relaxed parameters: 
                    <monospace>--ibis-seg-len 5, --ibis-min-snp 400, --zero-seg-count 0.1, --zero-seg-len 5, --alpha 0.01</monospace>. The number of detected relationships increases significantly for higher (
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>&#x2265;</mml:mo>
                        </mml:math>
                    </inline-formula>8) degrees. But false positive matches arise as well, i.e. 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mtext>Recall</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfenced>
                            <mml:mo>&#x2260;</mml:mo>
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula>.</p>
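                <p>The two parameter sets discussed above can be kept side by side as plain dictionaries, which makes the differences explicit and lets the flags be rendered for a command line. The flag names and values are those quoted in the text; the helper names are hypothetical, and the launcher invocation that would precede these flags is deliberately left out because it depends on the installation.</p>
                <preformat preformat-type="code"><![CDATA[
```python
# Conservative (default) and relaxed parameter sets quoted in the text.
DEFAULT_PARAMS = {
    "--ibis-seg-len": 7,
    "--ibis-min-snp": 500,
    "--zero-seg-count": 0.5,
    "--zero-seg-len": 5,
    "--alpha": 0.01,
}

RELAXED_PARAMS = {
    "--ibis-seg-len": 5,
    "--ibis-min-snp": 400,
    "--zero-seg-count": 0.1,
    "--zero-seg-len": 5,
    "--alpha": 0.01,
}


def to_argv(params):
    """Flatten a parameter dictionary into a list of command-line tokens."""
    argv = []
    for flag, value in params.items():
        argv.extend([flag, str(value)])
    return argv


def changed(base, other):
    """Return {flag: (base_value, other_value)} for flags that differ."""
    return {flag: (base[flag], other[flag]) for flag in base if base[flag] != other[flag]}


if __name__ == "__main__":
    print(" ".join(to_argv(RELAXED_PARAMS)))
    print(changed(DEFAULT_PARAMS, RELAXED_PARAMS))
```
]]></preformat>
                <p>Only the IBIS segment length, the minimal SNP count, and the zero-segment count differ between the two sets; the remaining ERSA parameters are unchanged.</p>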
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Interval (fuzzy) precision/recall (left panel) and confusion matrix (right panel) for the IBIS + ERSA &amp; KING workflow, obtained for a simulated dataset.</title>
                        <p>Parameters of the workflow: 
                            <monospace>--ibis-seg-len 5, --ibis-min-snp 400, --zero-seg-count 0.1, --zero-seg-len 5, --alpha 0.01.</monospace>
                        </p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure5.gif"/>
                </fig>
                <p>Simulation results for the GERMLINE + ERSA &amp; KING workflow are presented in 
                    <xref ref-type="fig" rid="f6">Figure 6</xref>. We used the same ERSA parameters as for 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>: 
                    <monospace>--zero-seg-count 0.5, --zero-seg-len 5, --alpha 0.01</monospace>. In comparison to 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, one can see that GERMLINE slightly decreases recall for 6th and 7th degrees, but improves it for higher (
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mo>&#x2265;</mml:mo>
                        </mml:math>
                    </inline-formula>8) degrees. Precision is above 95% among all detected degrees. No false positive matches were found, i.e. 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mtext>Recall</mml:mtext>
                            <mml:mfenced close=")" open="(">
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfenced>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                        </mml:math>
                    </inline-formula>. Our experiments showed that, in comparison to IBIS, GERMLINE is better suited for a careful analysis of relatively small cohorts with phased data. The IBIS algorithm is less sensitive, but for segments with length above 7 cM, it produces the same results as GERMLINE, while working with unphased data.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Interval (fuzzy) precision/recall (left panel) and confusion matrix (right panel) for the GERMLINE + ERSA &amp; KING workflow, obtained for a simulated dataset.</title>
                        <p>Parameters of the workflow: 
                            <monospace>--zero-seg-count 0.5, --zero-seg-len 5, --alpha 0.01.</monospace>
                        </p>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure6.gif"/>
                </fig>
            </sec>
            <sec id="sec22">
                <title>Performance of the IBD segments weighing</title>
                <p>Our simulation experiments showed that both the weighing and the exclusion approaches reduce the false positive rate and significantly improve the overall performance of the pipeline. Moreover, the weight mask can be better adapted to specific ancestries and, after additional parameter tuning, may slightly outperform the ERSA 1.0 approach. 
                    <xref ref-type="fig" rid="f7">Figure 7</xref> compares the ERSA 1.0 exclusion mask and the GRAPE weight mask on a simulated dataset with founders of East Asian ancestry from 1KGP. The IBIS workflow was used, and the weight mask was taken from 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>. The <monospace>--zero-seg-count</monospace> parameter was varied when using the weight mask to achieve the same level of precision. With approximately the same precision, the weighing approach gives recall that is several percentage points higher.</p>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>Figure 7. </label>
                    <caption>
                        <title>Interval (fuzzy) precision/recall comparison between the ERSA 1.0 exclusion mask and the GRAPE weight mask.</title>
                        <p>Metric differences are marked with colors: green if the metric increased when using weighing, and red if it decreased. Common parameters for both workflows: 
                            <monospace>--ibis-seg-len 5, --ibis-min-snp 400, --zero-seg-len 5, --alpha 0.01</monospace>. The <monospace>--zero-seg-count</monospace> parameter equals 0.1 when using the weight mask and 0.3 when using the original ERSA 1.0 exclusion mask.</p>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure7.gif"/>
                </fig>
            </sec>
            <sec id="sec23">
                <title>Comparison with TRIBES</title>
                <p>Finally, we compared GRAPE with TRIBES.
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup> TRIBES is an earlier open-source pipeline for relatedness detection. It combines the GERMLINE algorithm for IBD segment detection with a calculation, for each pair, of the proportion of the genome in which zero alleles are inferred to be IBD (IBD0). If the data is not phased, TRIBES can phase it with the EAGLE tool. This part of TRIBES is similar to the corresponding GRAPE workflow for IBD segment detection. In contrast to GRAPE, TRIBES estimates degrees of relationship according to expected ranges of the IBD0 proportion. GRAPE uses the ERSA algorithm, which, to our knowledge, is a more advanced approach.</p>
                <p>We ran the TRIBES pipeline on the same simulated datasets. Since the simulated datasets contain unphased data, we also applied the built-in phasing procedure from TRIBES. The result of the analysis is presented in 
                    <xref ref-type="fig" rid="f8">Figure 8</xref>. TRIBES demonstrated high detection power for distant relationships of up to the 12th degree. Given that many relatives of the 13th degree and beyond do not share any IBD segments, this is near a theoretical limit. However, TRIBES produces a 
                    <italic toggle="yes">huge</italic> number of false positive matches; see the confusion matrix in the right panel of 
                    <xref ref-type="fig" rid="f8">Figure 8</xref>. Since TRIBES lacks options that allow users to control the false positive rate by varying pipeline parameters, this is a crucial drawback. It prevents the TRIBES pipeline from being adapted to applications where the desired precision/recall trade-off may vary depending on business or research objectives.</p>
                <fig fig-type="figure" id="f8" orientation="portrait" position="float">
                    <label>Figure 8. </label>
                    <caption>
                        <title>Interval (fuzzy) precision/recall (left panel) and confusion matrix (right panel) for the TRIBES pipeline results, obtained for a simulated dataset.</title>
                    </caption>
                    <graphic id="gr8" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/123378/0d946404-7cd7-4f51-b3d4-a6d6a52e2f53_figure8.gif"/>
                </fig>
            </sec>
        </sec>
        <sec id="sec24" sec-type="conclusions">
            <title>Conclusions</title>
            <p>In this paper, we introduced GRAPE, a genomic relatedness detection pipeline. We carefully selected tools and combined various preprocessing steps, IBD segment detection tools, and relationship estimation algorithms into a single pipeline. One of the possible workflows of the pipeline is based on the IBIS tool and can work with unphased data. It is suitable for analyzing large cohorts in a relatively short time: using this workflow, GRAPE can perform relationship estimation among 100k samples in 22 hours. Another possible workflow is based on the GERMLINE IBD segment detection tool and works with phased data. Our experiments showed that GERMLINE has the most detection power, while the IBIS option is the fastest, is the easiest to use, and has sufficient accuracy for degrees of relationship from 1 to 8. Finally, we compared GRAPE with TRIBES, another relatedness detection pipeline. In contrast to GRAPE, TRIBES produces a huge number of false positive matches, requires phased data, and lacks important preprocessing and evaluation options, which makes it impractical for various applications. GRAPE has proved to be a reliable and accurate tool for the analysis of close and distant degrees of kinship. It provides the ability to control the false positive rate, can work with heterogeneous data obtained from various chips, and is ready for production integration.</p>
        </sec>
        <sec id="sec25">
            <title>Data availability</title>
            <sec id="sec26">
                <title>Source data</title>
                <p>Publicly available datasets were used to test GRAPE. These datasets are available from:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>1000 Genomes Project
                                <sup>
                                    <xref ref-type="bibr" rid="ref25">25</xref>
                                </sup>;</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>URL: 
                                <ext-link ext-link-type="uri" xlink:href="https://www.internationalgenome.org/data-portal/data-collection/phase-3">https://www.internationalgenome.org/data-portal/data-collection/phase-3</ext-link>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Khazar Origin for Ashkenazi Jews
                                <sup>
                                    <xref ref-type="bibr" rid="ref33">33</xref>
                                </sup>;</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>URL: 
                                <ext-link ext-link-type="uri" xlink:href="https://evolbio.ut.ee/khazar">https://evolbio.ut.ee/khazar</ext-link>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Allen Ancient DNA Resource (AADR)
                                <sup>
                                    <xref ref-type="bibr" rid="ref32">32</xref>
                                </sup>;</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>URL: 
                                <ext-link ext-link-type="uri" xlink:href="https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data">https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data</ext-link>.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
            <sec id="sec27">
                <title>Software availability</title>
                <p>The GRAPE pipeline can be accessed from the following publicly available resources:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x2022;</label>
                            <p>GitHub: 
                                <ext-link ext-link-type="uri" xlink:href="https://github.com/genxnetwork/grape">https://github.com/genxnetwork/grape</ext-link>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Docker Hub: 
                                <ext-link ext-link-type="uri" xlink:href="https://hub.docker.com/r/genxnetwork/grape">https://hub.docker.com/r/genxnetwork/grape</ext-link>.</p>
                        </list-item>
                        <list-item>
                            <label>&#x2022;</label>
                            <p>Dockstore: 
                                <ext-link ext-link-type="uri" xlink:href="https://dockstore.org/organizations/GenX/collections/GRAPE">https://dockstore.org/organizations/GenX/collections/GRAPE</ext-link>.</p>
                        </list-item>
                    </list>
                </p>
                <p>Archived source code at time of publication: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.6482561">https://doi.org/10.5281/zenodo.6482561</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref37">37</xref>
                    </sup>
                </p>
                <p>License: 
                    <ext-link ext-link-type="uri" xlink:href="http://www.gnu.org/licenses/gpl-3.0.en.html">GPLv3</ext-link>.</p>
            </sec>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgments</title>
            <p>An earlier version of this article can be found on bioRxiv (doi: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/2022.03.11.483988">https://doi.org/10.1101/2022.03.11.483988</ext-link>).</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Posey</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>O&#x2019;Donnell-Luria</surname>
                            <given-names>AH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chong</surname>
                            <given-names>JX</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Insights into genetics, human biology and disease gleaned from family based genomic studies.</article-title>
                    <source>

                        <italic toggle="yes">Genet. Med.</italic>
</source>
                    <year>Apr 2019</year>;<volume>21</volume>(<issue>4</issue>):<fpage>798</fpage>&#x2013;<lpage>812</lpage>.
                    <pub-id pub-id-type="pmid">30655598</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41436-018-0408-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Posey</surname>
                            <given-names>JE</given-names>
                        </name>
</person-group>:
                    <article-title>Genome sequencing and implications for rare disorders.</article-title>
                    <source>

                        <italic toggle="yes">Orphanet J. Rare Dis.</italic>
</source>
                    <year>Jun 2019</year>;<volume>14</volume>(<issue>1</issue>):<fpage>153</fpage>.
                    <pub-id pub-id-type="pmid">31234920</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13023-019-1127-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Marees</surname>
                            <given-names>AT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kluiver</surname>
                            <given-names>H</given-names>
                            <prefix>de</prefix>
                        </name>

                        <name name-style="western">
                            <surname>Stringer</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A tutorial on conducting genome-wide association studies: Quality control and statistical analysis.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Methods Psychiatr. Res.</italic>
</source>
                    <year>2018</year>;<volume>27</volume>(<issue>2</issue>):<fpage>e1608</fpage>.
                    <pub-id pub-id-type="pmid">29484742</pub-id>
                    <pub-id pub-id-type="doi">10.1002/mpr.1608</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Turner</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Armstrong</surname>
                            <given-names>LL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bradford</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Quality control procedures for genome-wide association studies.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Protoc. Hum. Genet.</italic>
</source>
                    <year>2011</year>;<volume>Chapter 1</volume>(<issue>1</issue>):<fpage>Unit1.19</fpage>&#x2013;<lpage>1.19.18</lpage>.
                    <pub-id pub-id-type="pmid">21234875</pub-id>
                    <pub-id pub-id-type="doi">10.1002/0471142905.hg0119s68</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ramstetter</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dyer</surname>
                            <given-names>TD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lehman</surname>
                            <given-names>DM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Benchmarking relatedness inference methods with genome-wide data from thousands of relatives.</article-title>
                    <source>

                        <italic toggle="yes">Genetics.</italic>
</source>
                    <year>2017</year>;<volume>207</volume>(<issue>1</issue>):<fpage>75</fpage>&#x2013;<lpage>82</lpage>.
                    <pub-id pub-id-type="pmid">28739658</pub-id>
                    <pub-id pub-id-type="doi">10.1534/genetics.117.1122</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sudlow</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gallacher</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allen</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Med.</italic>
</source>
                    <year>Mar 2015</year>;<volume>12</volume>(<issue>3</issue>):<fpage>e1001779</fpage>.
                    <pub-id pub-id-type="pmid">25826379</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pmed.1001779</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gusev</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lowe</surname>
                            <given-names>JK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stoffel</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Whole population, genome-wide mapping of hidden relatedness.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2009</year>;<volume>19</volume>(<issue>2</issue>):<fpage>318</fpage>&#x2013;<lpage>326</lpage>.
                    <pub-id pub-id-type="pmid">18971310</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.081398.108</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Seidman</surname>
                            <given-names>DN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shenoy</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>2020</year>;<volume>106</volume>(<issue>4</issue>):<fpage>453</fpage>&#x2013;<lpage>466</lpage>.
                    <pub-id pub-id-type="pmid">32197076</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2020.02.012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Seidman</surname>
                            <given-names>DN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shenoy</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Rapid, phase-free detection of long identity-by-descent segments enables effective relationship classification.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>2020</year>;<volume>106</volume>(<issue>4</issue>):<fpage>453</fpage>&#x2013;<lpage>466</lpage>.
                    <pub-id pub-id-type="pmid">32197076</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2020.02.012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Freyman</surname>
                            <given-names>WA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McManus</surname>
                            <given-names>KF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shringarpure</surname>
                            <given-names>SS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows-Wheeler Transform.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Biol. Evol.</italic>
</source>
                    <year>2021</year>;<volume>38</volume>(<issue>5</issue>):<fpage>2131</fpage>&#x2013;<lpage>2151</lpage>.
                    <pub-id pub-id-type="pmid">33355662</pub-id>
                    <pub-id pub-id-type="doi">10.1093/molbev/msaa328</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ramstetter</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shenoy</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dyer</surname>
                            <given-names>TD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Inferring identical-by-descent sharing of sample ancestors promotes high-resolution relative detection.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>Jul 2018</year>;<volume>103</volume>(<issue>1</issue>):<fpage>30</fpage>&#x2013;<lpage>44</lpage>.
                    <pub-id pub-id-type="pmid">29937093</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2018.05.008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huff</surname>
                            <given-names>CD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Witherspoon</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simonson</surname>
                            <given-names>TS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Maximum-likelihood estimation of recent shared ancestry (ERSA).</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2011</year>;<volume>21</volume>(<issue>5</issue>):<fpage>768</fpage>&#x2013;<lpage>774</lpage>.
                    <pub-id pub-id-type="pmid">21324875</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.115972.110</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3083094</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Glusman</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Relationship Estimation from Whole-Genome Sequence Data.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2014</year>;<volume>10</volume>(<issue>1</issue>):<fpage>e1004144</fpage>.
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.1004144</pub-id>
                    <pub-id pub-id-type="pmid">24497848</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Manichaikul</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mychaleckyj</surname>
                            <given-names>JC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rich</surname>
                            <given-names>SS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Robust relationship inference in genome-wide association studies.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>Nov 2010</year>;<volume>26</volume>(<issue>22</issue>):<fpage>2867</fpage>&#x2013;<lpage>2873</lpage>.
                    <pub-id pub-id-type="pmid">20926424</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq559</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Loh</surname>
                            <given-names>P-R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Danecek</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Palamara</surname>
                            <given-names>PF</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reference-based phasing using the Haplotype Reference Consortium panel.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>Nov 2016</year>;<volume>48</volume>(<issue>11</issue>):<fpage>1443</fpage>&#x2013;<lpage>1448</lpage>.
                    <pub-id pub-id-type="pmid">27694958</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.3679</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fuchsberger</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Abecasis</surname>
                            <given-names>GR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hinds</surname>
                            <given-names>DA</given-names>
                        </name>
</person-group>:
                    <article-title>Minimac2: Faster genotype imputation.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>5</issue>):<fpage>782</fpage>&#x2013;<lpage>784</lpage>.
                    <pub-id pub-id-type="pmid">25338720</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btu704</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>M&#x00f6;lder</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jablonski</surname>
                            <given-names>KP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Letcher</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Sustainable data analysis with Snakemake [version 2; peer review: 2 approved].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2021</year>;<volume>10</volume>(<issue>33</issue>).
                    <pub-id pub-id-type="pmid">34035898</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.29032.2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <collab>Anaconda software distribution</collab>:<year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://docs.anaconda.com/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Merkel</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Docker: lightweight Linux containers for consistent development and deployment.</article-title>
                    <source>

                        <italic toggle="yes">Linux Journal.</italic>
</source>
                    <year>2014</year>;<volume>2014</volume>(<issue>239</issue>):<fpage>2</fpage>.</mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="other">
                    <collab>Task Execution Service (TES) API</collab>:<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ga4gh/task-execution-schemas">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>O&#x2019;Connor</surname>
                            <given-names>BD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yuen</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; peer review: 2 approved].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2017</year>;<volume>6</volume>(<issue>52</issue>).
                    <pub-id pub-id-type="pmid">28344774</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.10137.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="other">
                    <collab>Global Alliance for Genomics and Health (GA4GH)</collab>:<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.ga4gh.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Twine</surname>
                            <given-names>NA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Szul</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Henden</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>TRIBES: A user-friendly pipeline for relatedness detection and disease gene discovery.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>.
                    <pub-id pub-id-type="doi">10.1101/686253</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Caballero</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Seidman</surname>
                            <given-names>DN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qiao</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Crossover interference and sex-specific genetic maps shape identical by descent sharing in close relatives.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2019</year>;<volume>15</volume>(<issue>12</issue>):<fpage>1</fpage>&#x2013;<lpage>29</lpage>.
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.1007979</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <collab>The 1000 Genomes Project Consortium</collab>:
                    <article-title>A global reference for human genetic variation.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2015</year>;<volume>526</volume>(<issue>7571</issue>):<fpage>68</fpage>&#x2013;<lpage>74</lpage>.
                    <pub-id pub-id-type="pmid">26432245</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature15393</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bherer</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campbell</surname>
                            <given-names>CL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Auton</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>.
                    <pub-id pub-id-type="pmid">28440270</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms14994</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Albrechtsen</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Moltke</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nielsen</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Natural selection and the distribution of identity-by-descent in the human genome.</article-title>
                    <source>

                        <italic toggle="yes">Genetics.</italic>
</source>
                    <year>2010</year>;<volume>186</volume>(<issue>1</issue>):<fpage>295</fpage>&#x2013;<lpage>308</lpage>.
                    <pub-id pub-id-type="pmid">20592267</pub-id>
                    <pub-id pub-id-type="doi">10.1534/genetics.110.113977</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ball</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barber</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Byrnes</surname>
                            <given-names>JK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>AncestryDNA Matching White Paper: Discovering genetic matches across a massive, expanding genetic database.</article-title>
                    <year>2016</year>.</mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rousseeuw</surname>
                            <given-names>PJ</given-names>
                        </name>
</person-group>:
                    <article-title>Least median of squares regression.</article-title>
                    <source>

                        <italic toggle="yes">J. Am. Stat. Assoc.</italic>
</source>
                    <year>1984</year>;<volume>79</volume>(<issue>388</issue>):<fpage>871</fpage>&#x2013;<lpage>880</lpage>.
                    <pub-id pub-id-type="doi">10.1080/01621459.1984.10477105</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sobell</surname>
                            <given-names>MG</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">A practical guide to Ubuntu Linux.</italic>
</source>
                    <publisher-name>Pearson Education</publisher-name>;<year>2015</year>.</mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <collab>Funnel</collab>:<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/ohsu-comp-bio/funnel">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="other">
                    <collab>Allen Ancient DNA Resource (AADR)</collab>:<year>2021</year>. Accessed: 2021-02-27.
                    <ext-link ext-link-type="uri" xlink:href="https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/index_v44.3.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Behar</surname>
                            <given-names>DM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Metspalu</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Baran</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>No Evidence from Genome-wide Data of a Khazar Origin for the Ashkenazi Jews.</article-title>
                    <source>

                        <italic toggle="yes">Hum. Biol.</italic>
</source>
                    <year>2013</year>;<volume>85</volume>(<issue>6</issue>):<fpage>859</fpage>&#x2013;<lpage>900</lpage>.
                    <pub-id pub-id-type="pmid">25079123</pub-id>
                    <pub-id pub-id-type="doi">10.13110/humanbiology.85.6.0859</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhuang</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gusev</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cho</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Detecting Identity by Descent and Homozygosity Mapping in Whole-Exome Sequencing Data.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2012</year>;<volume>7</volume>(<issue>10</issue>):<elocation-id>e47618</elocation-id>.
                    <pub-id pub-id-type="pmid">23071825</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0047618</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Henn</surname>
                            <given-names>BM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hon</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Macpherson</surname>
                            <given-names>JM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2012</year>;<volume>7</volume>(<issue>4</issue>).
                    <pub-id pub-id-type="pmid">22509285</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0034267</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="other">
                    <collab>Atlas Biomed</collab>:<year>2022</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://atlasbiomed.com/uk">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>37</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Medvedev</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lebedev</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ponomarev</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Grape: genomic relatedness detection pipeline.</article-title>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.6482561</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report143937">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.123378.r143937</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Naseri</surname>
                        <given-names>Ardalan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r143937a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2747-2193</uri>
                </contrib>
                <aff id="r143937a1">
                    <label>1</label>School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>8</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Naseri A</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport143937" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.111658.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Medvedev et al. have developed a user-friendly software pipeline for relatedness inference from genetic data. The pipeline includes data preprocessing, identity-by-descent (IBD) segment detection, and relationship inference. It packages several software tools inside a Docker container, which gives users a convenient way to infer relationships from genetic data.</p>
            <p>The authors have documented the accuracy and run time of the pipeline using both simulated and real-world data. A few suggestions could improve this work:
                <list list-type="order">
                    <list-item>
                        <p>The memory usage of the pipeline has not been mentioned in the article. It would be useful to provide more information about the required memory usage.</p>
                    </list-item>
                    <list-item>
                        <p>Current biobank-scale data contain hundreds of thousands of individuals. The run time for 100k individuals was reported as 22 hours using 8 CPU cores (for IBIS + ERSA). Users may want to know whether they can run the pipeline on large panels within a reasonable time. The run time of the pipeline using IBD detection with GERMLINE was also not included. More recent IBD detection tools (e.g. hapIBD, RaPID and iLASH) can handle very large cohorts efficiently. Incorporating other IBD detection tools might also improve the run time and/or memory usage of the pipeline.</p>
                    </list-item>
                    <list-item>
                        <p>Robustness against genotyping errors, different marker densities, and different populations has not been studied well. The authors have, however, pointed out the limitation with a small isolated population.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Computational Biology, Bioinformatics, Population Genetics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment9459-143937">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Medvedev</surname>
                            <given-names>Aleksandr</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>15</day>
                    <month>3</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Medvedev et al. have developed a user-friendly software pipeline for relatedness inference from genetic data. The pipeline includes data preprocessing, identity-by-descent (IBD) segment detection, and relationship inference. It packages several software tools inside a Docker container, which gives users a convenient way to infer relationships from genetic data.</p>
                <p>The authors have documented the accuracy and run time of the pipeline using both simulated and real-world data. A few suggestions could improve this work:
                    <list list-type="order">
                        <list-item>
                            <p>The 
                                <bold>memory usage</bold> of the pipeline has not been mentioned in the article. It would be useful to provide more information about the required memory usage.</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> We implemented several batch preprocessing and postprocessing routines so that memory usage does not exceed 16 GB of RAM for datasets of any size.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Current 
                                <bold>biobank-scale data</bold> contain hundreds of thousands of individuals. The run time for 100k individuals was reported as 22 hours using 8 CPU cores (for IBIS + ERSA). Users may want to know whether they can run the pipeline on 
                                <bold>large panels</bold> within a reasonable time. The run time of the pipeline using IBD detection (with 
                                <bold>GERMLINE</bold>) was also 
                                <bold>not included</bold>. More recent IBD detection tools (e.g. 
                                <bold>hapIBD, RaPID and iLASH</bold>) can handle very large cohorts efficiently. Incorporating other IBD detection tools might also improve the run time and/or memory usage of the pipeline.</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> Based on Figure 3 and the fact that the IBIS and ERSA algorithms have O(n^2) complexity, where n is the number of samples, we estimate that the running time for n~=500k will be ~20-25 days. This is comparable with the time required to phase a dataset of this size. We plan to investigate the accuracy and performance of hapIBD, RaPID and iLASH on phased datasets, but a unique advantage of IBIS-based workflows is that they do not require phasing. We chose GERMLINE as a well-known method with an appropriate license; iLASH does not have an MIT or GPL-like license.</p>
                                    </list-item>
                                    <list-item>
                                        <p>
                                            <bold>Answer:&#x00a0;</bold>In the new version of the article we added beta support for RaPID and compared it with IBIS and GERMLINE. The RaPID-based workflow took 45 min to run on the 100K dataset and 6 hours to run on the 500K dataset, although the input data were already phased.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>
                                <bold>Robustness against genotyping errors, different marker densities, and different populations</bold> has not been studied well. The authors have, however, pointed out the limitation with a small isolated population.</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> IBIS-based workflows detect relatively long IBD segments (more than 5-7 cM). We hope that this will provide some robustness to genotyping errors. We also provide a --remove-imputation option in the preprocessing workflow, because we found that some imputed regions produce many false-positive IBD matches.</p>
                                    </list-item>
                                    <list-item>
                                        <p>
                                            <bold>Answer:&#x00a0;</bold>IBIS and RaPID-based workflows should not detect many false-positive IBD segments in low-density regions, because they constrain both the minimum number of SNPs in an IBD match and its minimum length in cM. However, low-density regions could produce many false negatives.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list>
                </p>
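                <p>As a rough check of the ~20-25 day estimate given in the answer to comment 2, the sketch below scales the 22 hours reported for 100k samples (IBIS + ERSA, 8 CPU cores) by the square of the sample-size ratio. It assumes purely quadratic scaling on identical hardware and ignores constant overheads; the function name is illustrative and is not part of GRAPE.</p>
                <code language="python">
```python
def extrapolate_quadratic_runtime(known_hours: float, known_n: int, target_n: int) -> float:
    """Scale a measured run time to a new sample count, assuming O(n^2) cost."""
    scale = (target_n / known_n) ** 2
    return known_hours * scale


# Measured point quoted in the review: 22 hours for 100k samples.
hours_500k = extrapolate_quadratic_runtime(22.0, 100_000, 500_000)
days_500k = hours_500k / 24.0
print(f"{hours_500k:.0f} hours, about {days_500k:.1f} days")
```
                </code>
                <p>Run as written, the sketch gives 550 hours, or roughly 23 days, which falls within the quoted range.</p>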
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report139363">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.123378.r139363</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zhou</surname>
                        <given-names>Ying</given-names>
                    </name>
                    <xref ref-type="aff" rid="r139363a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8107-3927</uri>
                </contrib>
                <aff id="r139363a1">
                    <label>1</label>Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>7</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Zhou Y</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport139363" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.111658.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Medvedev et al. present a user-friendly pipeline for detecting both distant and close relatedness. The design is straightforward, and the pipeline includes different inference strategies that can handle both phased and unphased data. It can be run inside a Docker container, which makes it very practical for real analyses. The authors have clearly devoted a great deal of time to building this pipeline. I have only three comments:
                <list list-type="order">
                    <list-item>
                        <p>I can see why the KING software is used for close relatedness inference and an IBD-based method for distant relatedness inference: a large IBD cutoff, for example 7 cM, could lead to a loss of IBD segments and bias the kinship estimate. However, when the target population is structured or admixed, KING may not work as well as IBD-based inference.</p>
                    </list-item>
                    <list-item>
                        <p>What should I do if I need to run the analysis on a large sample, for example the UK Biobank (n~=500k)? The largest test dataset in this work has 100k samples, while some other methods claim to handle kinship estimation on much larger sample sizes, see ref
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-139363-1">1</xref>
                            </sup>.</p>
                    </list-item>
                    <list-item>
                        <p>If I want to find all potential relatives of one target sample in a reference database, this pipeline requires me to merge the data first and then detect all relatedness, which is not efficient. An alternative solution is to use query-based IBD detection, see ref
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-139363-2">2</xref>
                            </sup>.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>population genetics, computational genetics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-139363-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>IBDkin: fast estimation of kinship coefficients from identity by descent segments.</article-title>
                        <source>
                            <italic>Bioinformatics.</italic>
                        </source>
                        <year>2020</year>;<volume>36</volume>(<issue>16</issue>):<fpage>4519</fpage>&#x2013;<lpage>4520</lpage>.
                        <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa569</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-139363-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>Efficient haplotype matching between a query and a panel for genealogical search.</article-title>
                        <source>
                            <italic>Bioinformatics.</italic>
                        </source>
                        <year>2019</year>;<volume>35</volume>(<issue>14</issue>):<fpage>i233</fpage>&#x2013;<lpage>i241</lpage>.
                        <pub-id pub-id-type="doi">10.1093/bioinformatics/btz347</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment9458-139363">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Medvedev</surname>
                            <given-names>Aleksandr</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>15</day>
                    <month>3</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <list list-type="order">
                        <list-item>
                            <p>I can see why the KING software is used for close relatedness inference and an IBD-based method for distant relatedness inference: a large IBD cutoff, for example 7 cM, could lead to a loss of IBD segments and bias the kinship estimate. However, 
                                <bold>when the target population is structured or admixed</bold>, the 
                                <bold>KING</bold> software 
                                <bold>may not work</bold> as well as 
                                <bold>IBD-based inference</bold>. [paper updated]</p>
                            <p>
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> For close relatedness inference we use the KING ibdseg option by default, which also uses IBD segments for degree estimation. In addition, we calculate the kinship coefficient using the legacy KING kinship option and report kinship-based degrees in our output file. We have updated the description of how KING is used in our paper.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>What should I do if I need to run the analysis on a large sample, for example the UK Biobank (n~=
                                <bold>500k</bold>)? The largest test dataset in this work has 100k samples, while some other methods claim to handle kinship estimation on much larger sample sizes, see 
                                <ext-link ext-link-type="uri" xlink:href="https://academic.oup.com/bioinformatics/article/36/16/4519/5858978">https://academic.oup.com/bioinformatics/article/36/16/4519/5858978</ext-link>. 
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> Based on Figure 3 and the fact that the IBIS and ERSA algorithms have O(n^2) complexity, where n is the number of samples, we estimate that the running time for n~=500k will be ~20-25 days on the system specified in our article. This is comparable to the time required for phasing a dataset of this size. We looked into IBDKin and we are willing to evaluate it in a future GRAPE version as a KING substitute. However, as far as we understand, it does not infer distant degrees yet.</p>
                                    </list-item>
                                    <list-item>
                                        <p>
                                            <bold>Answer:&#x00a0;</bold>In the new article version we added beta support for RaPID and compared it with IBIS and GERMLINE. The RaPID-based workflow took 45 min to run on the 100K dataset and 6 hours to run on the 500K dataset. Note, however, that the dataset was already phased.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>If I want to find all potential relatives of 
                                <bold>one target sample</bold> 
                                <bold>in a reference database</bold> with this pipeline, I need to merge the data first and then detect relatedness across all samples. This is not efficient. An alternative solution is to use 
                                <bold>query-based IBD detection</bold>, see 
                                <ext-link ext-link-type="uri" xlink:href="https://academic.oup.com/bioinformatics/article/35/14/i233/5529240">https://academic.oup.com/bioinformatics/article/35/14/i233/5529240</ext-link>. 
                                <list list-type="order">
                                    <list-item>
                                        <p>
                                            <bold>Answer:</bold> We plan to implement query-based IBD detection for IBIS in one of the future GRAPE releases. We checked the paper about PBWT queries, and it appears that the method works only with haplotypes (i.e. phased data) and not with genotypes. However, we plan to test RaPID and hapIBD on phased data as GERMLINE replacements and possibly implement query-based IBD detection for them.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
