<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.27214.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>iCAT: diagnostic assessment tool of immunological history using high-throughput T-cell receptor sequencing</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rajeh</surname>
                        <given-names>Ahmad</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8916-6845</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Wolf</surname>
                        <given-names>Kyle</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Schiebout</surname>
                        <given-names>Courtney</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sait</surname>
                        <given-names>Nabeel</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kosfeld</surname>
                        <given-names>Tim</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>DiPaolo</surname>
                        <given-names>Richard J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Ahn</surname>
                        <given-names>Tae-Hyuk</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7281-9459</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Program in Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO, 63103, USA</aff>
                <aff id="a2">
                    <label>2</label>Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO, 63104, USA</aff>
                <aff id="a3">
                    <label>3</label>Computer Science, Saint Louis University, St. Louis, MO, 63103, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:richard.dipaolo@health.slu.edu">richard.dipaolo@health.slu.edu</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:taehyuk.ahn@slu.edu">taehyuk.ahn@slu.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>6</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>65</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>10</day>
                    <month>6</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Rajeh A et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-65/pdf"/>
            <abstract>
                <p>The pathogen exposure history of an individual is recorded in their T-cell repertoire and can be accessed through the study of T-cell receptors (TCRs), provided that tools to identify them are available. In each T-cell, the TCR loci undergo genetic rearrangement that creates a unique DNA sequence. In theory, these unique sequences can be used as biomarkers for tracking T-cell responses and cataloging immunological history. We developed the immune Cell Analysis Tool (iCAT), an R software package that analyzes TCR sequencing data from exposed (positive) and unexposed (negative) samples to identify TCR sequences statistically associated with positive samples. The presence or absence of these associated sequences in samples is then used to train a classifier that diagnoses pathogen-specific exposure. We demonstrate the high accuracy of iCAT on three TCR sequencing datasets. First, iCAT distinguished smallpox-vaccinated from na&#x00ef;ve samples in an independent cohort of mice with 95% accuracy. Second, iCAT classified na&#x00ef;ve and monkeypox-vaccinated mice with 100% accuracy. Finally, we demonstrate the use of iCAT on human samples collected before and after exposure to SARS-CoV-2, the virus behind the COVID-19 global pandemic; all exposed samples were correctly classified. These results show that iCAT capitalizes on the power of TCR sequencing to simplify infection diagnostics. In addition to the standard R interface, iCAT offers an optional graphical, user-friendly interface, making it accessible to a wider audience.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>T-cell receptor sequencing</kwd>
                <kwd>diagnostic classification</kwd>
                <kwd>R-package</kwd>
                <kwd>biomarkers</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Federal Bureau of Investigation</funding-source>
                </award-group>
                <funding-statement>This work was funded by a research contract from the Federal Bureau of Investigation (FBI) to R.D.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>In this version, several sections have been revised in response to the reviewers' recommendations.
                    <list list-type="bullet">
                        <list-item>
                            <p>In the Introduction, as suggested by the reviewer, we noted that the rarity of T cells specific to a given antigen is a further challenge, in addition to the diversity and magnitude of the TCR repertoire.</p>
                        </list-item>
                        <list-item>
                            <p>In the Methods section, we added an uncertainty metric to the Prediction tab, based on the difference between the probability density under the pre-exposure distribution and that under the post-exposure distribution. This feature has been implemented and is now available in the iCAT GitHub repository.</p>
                        </list-item>
                        <list-item>
                            <p>In the Use Cases section, we now report the computing time for each use case.</p>
                        </list-item>
                        <list-item>
                            <p>As suggested by the reviewer, we updated iCAT to accept either a space or a colon as the delimiter for range values.</p>
                        </list-item>
                        <list-item>
                            <p>The manuscript has been updated to clarify some ambiguous words and sentences per the reviewers' comments.</p>
                        </list-item>
                        <list-item>
                            <p>Figure 1 and Figure 4 were updated.</p>
                        </list-item>
                        <list-item>
                            <p>Several minor typos were corrected.</p>
                        </list-item>
                        <list-item>
                            <p>We updated iCAT to avoid installation errors on all major operating systems, including Linux, macOS, and Windows.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
        </notes>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>T- and B-cell responses are responsible for long-lasting immune memory against infectious agents, such as bacteria and viruses. The expansion of pathogen-specific T-cells provides a robust resource for determining whether an individual has been infected with a pathogen. The T-cell receptor (TCR), located on the surface of T-cells, is responsible for recognizing pathogen-specific peptides, leading to an immune response and the development of protective immune memory.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> During T-cell development, the loci that encode the TCR 
                <italic toggle="yes">&#x03b1;</italic> and 
                <italic toggle="yes">&#x03b2;</italic> chains are rearranged by recombination of the variable (TCRV), diversity (TCRD; &#x03b2; chain only), and joining (TCRJ) gene segments, which together encode the complementarity-determining region 3 (CDR3).
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> These genetic rearrangement events result in a high degree of diversity in the CDR3 regions of individual TCR loci.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
            </p>
            <p>During an infection or vaccination, T-cells that carry receptors recognizing pathogen-associated peptides become activated and undergo rapid clonal expansion. The clonally expanded T-cells carry the same unique TCR rearrangement, and a portion of them remain in circulation long after the pathogen has been cleared, providing long-lived immunological memory. The persistence of memory T-cells in circulation makes the rearranged TCR loci a stable biomarker documenting an individual&#x2019;s immunological history. To use the diverse TCR repertoire as a biomarker for exposure to a specific pathogen, pathogen-specific TCR sequences shared by different individuals exposed to the same pathogen must first be identified. This poses significant challenges given the diversity and magnitude of the TCR repertoire. On average, &#x223c; 10
                <sup>7</sup> unique TCR
                <italic toggle="yes">&#x03b2;</italic> chains can be identified from the &#x223c; 10
                <sup>12</sup> circulating T-cells present in a healthy human adult.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> In humans, genetic rearrangement can produce 10
                <sup>18</sup> mathematically possible TCR recombinations.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> The potential diversity of the repertoire, coupled with the limited number of T-cells present in any individual, makes identifying identical TCR sequences among multiple individuals exceptionally challenging. In addition, T-cells responding to a particular antigen can be extremely rare, which makes signals of T-cell memory even harder to identify.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> However, by analyzing the large and diverse TCR repertoire using high-throughput TCR
                <italic toggle="yes">&#x03b2;</italic> sequencing, it is possible to identify pathogen-specific TCRs shared among different individuals exposed to the same infectious agent.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup>
            </p>
            <p>Recently, high-throughput next-generation sequencing (NGS) techniques have been employed to analyze the diverse immune cell repertoire.
                <sup>
                    <xref ref-type="bibr" rid="ref8 ref9 ref10">8&#x2013;10</xref>
                </sup> Recent publications have also described an analytical approach for computationally identifying common (public) TCR sequences.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> However, analyses of the TCR repertoire for diagnostic purposes have remained largely resource- and time-intensive efforts.</p>
            <p>Building on the diagnostic methodology described by Wolf et al. (2018),
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> we have developed an R package with a user-friendly interface, the immune Cell Analysis Tool (
                <monospace>iCAT</monospace>), to identify TCR sequences statistically associated with pathogen exposure and to distinguish infected from non-infected samples. By providing an interactive interface through 
                <monospace>iCAT</monospace>, sample exposure can be assessed and predicted conveniently without requiring command-line skills. 
                <monospace>iCAT</monospace> identifies target-associated receptor sequences (TARSs) and uses them to diagnose exposure with high accuracy.</p>
        </sec>
        <sec id="sec2" sec-type="methods">
            <title>Methods</title>
            <sec id="sec3">
                <title>Implementation</title>
                <p>We developed 
                    <monospace>iCAT</monospace>, an R package utilizing high-throughput TCR sequencing data to analyze TCR sequences and to diagnose infection in a user-friendly format (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). 
                    <monospace>iCAT</monospace> provides both a graphical user interface (GUI) in the form of a web-application utilizing R-Shiny
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> and a command-line R interface for batch processing of large-scale data. The simplest method to install 
                    <monospace>iCAT</monospace> on a system is directly from GitHub using 
                    <monospace>devtools</monospace> and 
                    <monospace>install_github</monospace>: 
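                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <monospace># Install devtools (only needed once)</monospace>
                        <monospace>install.packages("devtools")</monospace>

                        <monospace># Install iCAT and its R package dependencies from GitHub</monospace>
                        <monospace>devtools::install_github("BioHPC/iCAT")</monospace>
                    </preformat>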
                </p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Workflow for TCR repertoire sequencing and diagnostic assessment of prior antigen exposure using iCAT.</title>
                        <p>A) Flow chart depicting the purification of DNA from blood samples and the production of TCR repertoires after TCR-specific amplification and sequencing. B) Visual representation of the iCAT methodology.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/57705/6353c9f6-f2ae-45e2-a660-6586574d19c0_figure1.gif"/>
                </fig>
                <p>In addition to 
                    <monospace>shiny</monospace>, 
                    <monospace>iCAT</monospace> also uses 
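                    <monospace>shinyjs</monospace>, <monospace>data.table</monospace>, <monospace>ggplot2</monospace>, <monospace>DT</monospace>, <monospace>hash</monospace>, and <monospace>magrittr</monospace>. These dependencies are installed automatically during the 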
                    <monospace>install_github</monospace> step. Alternatively, users can clone or download the repository from GitHub and run 
                    <monospace>devtools::install("iCAT/")</monospace>.</p>
                <p>A user can upload multiple TCR sequence repertoires from negative (control) and positive (experimental) cohorts. 
                    <monospace>iCAT</monospace> accepts tab-delimited files of up to 10 gigabytes each and provides multiple options for defining TCR clonotypes within samples. The 
                    <monospace>iCAT</monospace> Shiny app separates its major functions into three tabs: Training, Library, and Prediction. In the &#x201c;Training&#x201d; tab, clicking &#x2018;Train Model&#x2019; starts the pipeline that statistically identifies a subset of TARSs, which serve as the selected features for training a diagnostic classifier that labels samples as either negative or positive. Upon training, 
                    <monospace>iCAT</monospace>&#x2019;s main tab displays a summary table of the data, a figure showing the distribution of TARSs between the positive and negative samples, and a classification matrix of the predicted exposure status of the training samples (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). All figures and tables can be downloaded to the user&#x2019;s machine. A progress bar in the bottom-right corner reports the status of training.</p>
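                <p>The classification matrix summarizes how the trained classifier labels the training samples. As an illustration only, the base R sketch below cross-tabulates known and predicted exposure status and computes the overall accuracy; the helper name <monospace>classification_matrix</monospace> and the &#x201c;negative&#x201d;/&#x201c;positive&#x201d; labels are hypothetical and are not part of the iCAT API.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Hypothetical helper, not part of the iCAT API: cross-tabulate the known
# exposure status of training samples against the classifier's predictions.
classification_matrix = function(truth, predicted) {
  lv = c("negative", "positive")
  tab = table(Actual = factor(truth, levels = lv),
              Predicted = factor(predicted, levels = lv))
  # Proportion of samples on the diagonal, i.e. correctly classified.
  accuracy = sum(diag(tab)) / sum(tab)
  list(matrix = tab, accuracy = accuracy)
}

truth     = c("negative", "negative", "positive", "positive")
predicted = c("negative", "positive", "positive", "positive")
res = classification_matrix(truth, predicted)
print(res$matrix)
res$accuracy  # 0.75
```
                </preformat>
                <p>Fixing the factor levels keeps the matrix 2&#x00d7;2 even when one class is never predicted, so the layout stays comparable across training runs.</p>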
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>iCAT Training tab.</title>
                        <p>After samples are uploaded, clicking &#x201c;Train Model&#x201d; starts training, which selects features for the diagnostic classifier from the negative and positive samples.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/57705/6353c9f6-f2ae-45e2-a660-6586574d19c0_figure2.gif"/>
                </fig>
                <p>A separate tab, &#x201c;Library&#x201d;, is unlocked after training and displays a table in which each row describes a TARS and its presence in the positive and negative samples. Each table and figure has a button for easy download (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
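                <p>To make the contents of such a table concrete, the base R sketch below assembles a library from presence/absence data and filters it with the one-tailed Fisher&#x2019;s exact test and Max p-value cutoff described under Operation. It is a minimal illustration, not iCAT&#x2019;s implementation: it assumes hypothetical logical matrices <monospace>pos</monospace> and <monospace>neg</monospace> (rows are clonotypes in the same order, columns are samples), and the function name <monospace>build_tars_library</monospace> is also hypothetical.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Hypothetical sketch, not iCAT's implementation.
# pos, neg: logical matrices (rows = clonotypes in the same order,
# columns = samples); TRUE means the clonotype was observed in that sample.
build_tars_library = function(pos, neg, max_p = 0.1) {
  n_v = ncol(pos)          # number of positive samples
  n_n = ncol(neg)          # number of negative samples
  x = rowSums(pos)         # positive samples containing each clonotype
  y = rowSums(neg)         # negative samples containing each clonotype
  p = mapply(function(xi, yi) {
    # Rows: positive / negative group; columns: present / absent.
    tab = matrix(c(xi, n_v - xi, yi, n_n - yi), nrow = 2, byrow = TRUE)
    # One-tailed test for enrichment in the positive group.
    fisher.test(tab, alternative = "greater")$p.value
  }, x, y)
  lib = data.frame(clonotype = rownames(pos), in_positive = x,
                   in_negative = y, p_value = p, row.names = NULL)
  # Keep clonotypes whose p-value does not exceed the cutoff.
  lib[max_p >= lib$p_value, , drop = FALSE]
}

ids = c("A", "B", "C")
pos = matrix(c(TRUE,  TRUE,  TRUE,  TRUE,
               TRUE,  TRUE,  FALSE, FALSE,
               FALSE, FALSE, FALSE, FALSE),
             nrow = 3, byrow = TRUE, dimnames = list(ids, paste0("pos", 1:4)))
neg = matrix(c(FALSE, FALSE, FALSE, FALSE,
               TRUE,  TRUE,  FALSE, FALSE,
               TRUE,  TRUE,  TRUE,  FALSE),
             nrow = 3, byrow = TRUE, dimnames = list(ids, paste0("neg", 1:4)))
lib = build_tars_library(pos, neg, max_p = 0.1)
lib  # only clonotype A (4/4 positive, 0/4 negative) passes the cutoff
```
                </preformat>
                <p>With small cohorts, the smallest attainable one-tailed p-value is bounded by the group sizes (1/70 for four positive and four negative samples), so very strict cutoffs can leave the library empty.</p>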
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>iCAT Library tab.</title>
                        <p>The Library tab shows a table of target-associated receptor sequences (TARSs).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/57705/6353c9f6-f2ae-45e2-a660-6586574d19c0_figure3.gif"/>
                </fig>
                <p>The third tab of iCAT, &#x201c;Prediction&#x201d;, also unlocks after training and allows the user to upload one or more independent TCR-sequencing samples for classification (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>).</p>
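                <p>The scoring rule applied to new samples is not detailed in this section; the revision notes only state that an uncertainty metric is based on the difference between probability densities under the pre- and post-exposure distributions. Purely as an assumption-laden sketch, the base R code below counts how many library TARSs a sample contains, fits a normal distribution to the TARS counts of the negative and positive training samples, assigns the class with the higher density, and reports the absolute density difference as a rough uncertainty indicator. The normal model and the names <monospace>count_tars</monospace> and <monospace>predict_exposure</monospace> are hypothetical and may differ from iCAT&#x2019;s actual classifier.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Hypothetical sketch; the normal-density model is an assumption,
# not necessarily the classifier implemented in iCAT.

# Number of library TARSs present in one sample (both character vectors).
count_tars = function(sample_keys, library_keys) {
  sum(unique(library_keys) %in% sample_keys)
}

# Label new samples by comparing densities under normal fits to the
# TARS counts of negative and positive training samples.
predict_exposure = function(new_counts, neg_counts, pos_counts) {
  d_neg = dnorm(new_counts, mean(neg_counts), sd(neg_counts))
  d_pos = dnorm(new_counts, mean(pos_counts), sd(pos_counts))
  data.frame(tars_count = new_counts,
             prediction = ifelse(d_pos > d_neg, "positive", "negative"),
             density_difference = abs(d_pos - d_neg))
}

# Toy example: a three-TARS library and one new sample.
library_keys = c("TRBV1_CASSL_TRBJ1", "TRBV3_CASSR_TRBJ1", "TRBV5_CASSG_TRBJ2")
new_sample   = c("TRBV1_CASSL_TRBJ1", "TRBV5_CASSG_TRBJ2", "TRBV9_CASSP_TRBJ1")
count_tars(new_sample, library_keys)  # 2

res = predict_exposure(c(1, 24), neg_counts = c(2, 3, 4, 3),
                       pos_counts = c(20, 25, 22, 23))
res
```
                </preformat>
                <p>With only a handful of training samples, a fitted standard deviation can be very small, or zero if all counts are equal, so a real implementation would need to guard against such degenerate fits.</p>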
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>iCAT Prediction tab.</title>
                        <p>The prediction tab allows the user to upload one or more independent TCR-sequencing samples for classification.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/57705/6353c9f6-f2ae-45e2-a660-6586574d19c0_figure4.gif"/>
                </fig>
            </sec>
            <sec id="sec4">
                <title>Operation</title>
                <p>Samples, such as blood or lymphoid tissue, are collected and their genetic material is purified. TCR sequences present in the sample are selectively amplified and then sequenced (
                    <xref ref-type="fig" rid="f1">Figure 1A</xref>). The first step of iCAT is the &#x201c;
                    <monospace>Training</monospace>&#x201d; step. The user provides multiple negative training samples (na&#x00ef;ve, unexposed, uninfected, etc.) using the 
                    <monospace>Browse</monospace> button, repeats this step for the positive training samples (exposed, infected, vaccinated, etc.), and then selects the type of training feature. iCAT provides three options: (1) 
                    <monospace>CDR3 Amino Acid Sequence</monospace> (TCRs must share the same CDR3 region to be called &#x201c;identical&#x201d;), (2) 
                    <monospace>TCRV-CDR3-TCRJ</monospace> (TCRs must share the same TCRBV segment, CDR3 region, and TCRJ segment to be called &#x201c;identical&#x201d;), and (3) 
                    <monospace>Nucleic Acid (DNA)</monospace> (TCRs must have exactly the same DNA rearrangement across TCRBV, CDR3, and TCRJ). Selecting 
                    <monospace>TCRV-CDR3-TCRJ</monospace> is recommended as a balance between sensitivity and specificity; this option was used for all use cases in this paper. Users can also customize the acceptable range of copies per clonotype and the minimum public-sequence threshold, that is, the minimum number of samples in which a TCR sequence must be observed to be considered for analysis.</p>
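                <p>The three training-feature options differ in how a clonotype key is built from each sequence record. The base R sketch below shows one way to build such keys, apply a copies-per-clonotype range, and keep only public clonotypes observed in at least a minimum number of samples. The column names (<monospace>cdr3_aa</monospace>, <monospace>v_gene</monospace>, <monospace>j_gene</monospace>, <monospace>nucleotide</monospace>, <monospace>copies</monospace>), the underscore separator, and the function names are hypothetical placeholders, not iCAT&#x2019;s input format.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Hypothetical sketch; column and function names are placeholders.
# Build one key per clonotype for a single sample (data frame),
# keeping only rows whose copy number lies in [min_copies, max_copies].
clonotype_keys = function(df, feature = c("cdr3", "vcdr3j", "dna"),
                          min_copies = 1, max_copies = Inf) {
  feature = match.arg(feature)
  df = df[df$copies >= min_copies, , drop = FALSE]
  df = df[max_copies >= df$copies, , drop = FALSE]
  key = switch(feature,
               cdr3   = df$cdr3_aa,
               vcdr3j = paste(df$v_gene, df$cdr3_aa, df$j_gene, sep = "_"),
               dna    = df$nucleotide)
  unique(key)
}

# Keep clonotypes seen in at least min_samples samples ("public" clonotypes).
public_clonotypes = function(keys_per_sample, min_samples = 2) {
  counts = table(unlist(lapply(keys_per_sample, unique)))
  names(counts)[counts >= min_samples]
}

# Toy data for two samples.
s1 = data.frame(cdr3_aa = c("CASSL", "CASSQ"), v_gene = c("TRBV1", "TRBV2"),
                j_gene = c("TRBJ1", "TRBJ2"), nucleotide = c("TGTGCC", "TGTGCA"),
                copies = c(5, 1))
s2 = data.frame(cdr3_aa = c("CASSL", "CASSR"), v_gene = c("TRBV1", "TRBV3"),
                j_gene = c("TRBJ1", "TRBJ1"), nucleotide = c("TGTGCC", "TGTAGC"),
                copies = c(3, 10))
keys = lapply(list(s1, s2), clonotype_keys, feature = "vcdr3j", min_copies = 2)
public_clonotypes(keys, min_samples = 2)  # "TRBV1_CASSL_TRBJ1"
```
                </preformat>
                <p>Reducing each record to a single string key turns cross-sample comparison into a plain set operation, and switching the training feature only changes how that key is built.</p>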
                <p>An important option in the &#x201c;
                    <monospace>Training</monospace>&#x201d; tab is 
                    <monospace>Max p-value (default: 0.1)</monospace>, which sets the minimum degree of statistical significance that iCAT will accept as potentially &#x201c;associated&#x201d; with the positive group. The statistical methodology of iCAT is based on identifying a subset of TARSs that informs classification.
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> TCR sequences significantly associated with positive rather than negative samples are identified using a one-tailed Fisher&#x2019;s exact test. iCAT then determines the optimal p-value cutoff for generating the TARS library based on a coverage ratio. To determine an optimal p-value threshold for identifying vaccine-associated TCR
                    <italic toggle="yes">&#x03b2;</italic> sequences, we applied a heuristic that selects the p-value threshold based on the &#x201c;coverage&#x201d; provided by the library for both vaccinated (
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">v</italic>
                    </sub>) and na&#x00ef;ve samples (
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">n</italic>
                    </sub>).
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> &#x201c;Coverage&#x201d; is defined as the sum, over all TARSs, of the number of samples containing each TARS, divided by the number of samples in the group. In the equations below, 
                    <italic toggle="yes">x</italic>
                    <sub>
                        <italic toggle="yes">i</italic>
                    </sub> denotes the number of vaccinated samples in which a single TCR
                    <italic toggle="yes">&#x03b2;</italic> sequence is identified, and 
                    <italic toggle="yes">n</italic>
                    <sub>
                        <italic toggle="yes">v</italic>
                    </sub> represents the number of positive samples in the training data. 
                    <italic toggle="yes">y</italic>
                    <sub>
                        <italic toggle="yes">i</italic>
                    </sub> denotes the corresponding number of na&#x00ef;ve samples, and 
                    <italic toggle="yes">n</italic>
                    <sub>
                        <italic toggle="yes">n</italic>
                    </sub> represents the number of na&#x00ef;ve samples.
                    <disp-formula id="e1">
                        <mml:math display="block">
                            <mml:msub>
                                <mml:mrow>
                                    <mml:mi>C</mml:mi>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mi>v</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mo>&#x2211;</mml:mo>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                    </mml:msub>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mi>x</mml:mi>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                        </mml:mrow>
                                    </mml:msub>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mi>n</mml:mi>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>v</mml:mi>
                                        </mml:mrow>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>,</mml:mo>
                            <mml:msub>
                                <mml:mrow>
                                    <mml:mi>C</mml:mi>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mi>n</mml:mi>
                                </mml:mrow>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:msubsup>
                                        <mml:mrow>
                                            <mml:mo>&#x2211;</mml:mo>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:msub>
                                                <mml:mi>n</mml:mi>
                                                <mml:mi>n</mml:mi>
                                            </mml:msub>
                                        </mml:mrow>
                                    </mml:msubsup>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mi>y</mml:mi>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                        </mml:mrow>
                                    </mml:msub>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mi>n</mml:mi>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi>n</mml:mi>
                                        </mml:mrow>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>.</mml:mo>
                        </mml:math>
                        <label>(1)</label>
                    </disp-formula>The ratio of 
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">v</italic>
                    </sub> to 
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">n</italic>
                    </sub> is determined for each candidate p-value cutoff. The cutoff that yields the largest 
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">v</italic>
                    </sub>:
                    <italic toggle="yes">C</italic>
                    <sub>
                        <italic toggle="yes">n</italic>
                    </sub> ratio while still offering sufficient coverage to distinguish vaccinated (
                    <italic toggle="yes">v</italic>) from na&#x00ef;ve (
                    <italic toggle="yes">n</italic>) samples was chosen.</p>
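                <p>The cutoff selection around Equation (1) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not iCAT&#x2019;s own R code: it assumes that x_i and y_i are the numbers of library sequences found in vaccinated and na&#x00ef;ve sample i, that each candidate p-value cutoff yields its own library, and that &#x201c;sufficient coverage&#x201d; can be expressed as a minimum C_v. The function names and the <monospace>min_coverage</monospace> threshold are hypothetical.</p>
                <preformat preformat-type="code">
```python
from __future__ import annotations

from statistics import mean


def coverage(samples: list[set[str]], library: set[str]) -> float:
    """Average number of library sequences per sample (C_v or C_n in Eq. 1).

    Assumes x_i (or y_i) is the count of library sequences found in sample i.
    """
    return mean(len(sample.intersection(library)) for sample in samples)


def choose_cutoff(
    libraries: dict[float, set[str]],
    vaccinated: list[set[str]],
    naive: list[set[str]],
    min_coverage: float = 1.0,
) -> tuple[float, float] | None:
    """Pick the cutoff whose library gives the largest C_v:C_n ratio.

    libraries maps each candidate p-value cutoff to its (hypothetical) TARS set;
    min_coverage is an assumed stand-in for "sufficient coverage" of vaccinated samples.
    """
    best = None
    for cutoff, library in libraries.items():
        c_v = coverage(vaccinated, library)
        c_n = coverage(naive, library)
        if c_v >= min_coverage:  # ignore cutoffs that barely cover vaccinated samples
            ratio = c_v / c_n if c_n else float("inf")
            if best is None or ratio > best[1]:
                best = (cutoff, ratio)
    return best


# Toy example: two candidate cutoffs, two vaccinated and two naive samples.
vaccinated = [{"A", "B", "C"}, {"A", "B"}]
naive = [{"A"}, {"D"}]
libraries = {0.05: {"B", "D"}, 0.1: {"A", "B"}}
print(choose_cutoff(libraries, vaccinated, naive))  # (0.1, 4.0)
```
                </preformat>
                <p>In the toy example the 0.1 cutoff wins because its ratio (2 / 0.5 = 4.0) exceeds that of the 0.05 cutoff (1 / 0.5 = 2.0).</p>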
                <p>For the classification of vaccinated and na&#x00ef;ve samples, iCAT calculates the percentage of TARS (% TARS) present in a sample. The % TARS for each sample in the training data is compared against the normal distribution of % TARS for each group to predict whether the sample is &#x201c;positive&#x201d; or &#x201c;negative&#x201d;, according to the group with which it is more closely associated.
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> The normal distribution is used to measure how far a sample lies from each group mean. Specifically, for each of the na&#x00ef;ve and vaccinated training groups, the normal density is evaluated as a function of the difference between a single sample value (
                    <italic toggle="yes">x</italic>) and the group mean (
                    <italic toggle="yes">&#x03bc;</italic>), scaled by the group standard deviation (
                    <italic toggle="yes">&#x03c3;</italic>), as given in Equation (2). 
                    <disp-formula id="e2">
                        <mml:math display="block">
                            <mml:mi>f</mml:mi>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mi>x</mml:mi>
                            <mml:mo stretchy="false">|</mml:mo>
                            <mml:mi>&#x03bc;</mml:mi>
                            <mml:mo>,</mml:mo>
                            <mml:msup>
                                <mml:mrow>
                                    <mml:mi>&#x03c3;</mml:mi>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mn>2</mml:mn>
                                </mml:mrow>
                            </mml:msup>
                            <mml:mo stretchy="false">)</mml:mo>
                            <mml:mo>=</mml:mo>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:msqrt>
                                        <mml:mrow>
                                            <mml:mn>2</mml:mn>
                                            <mml:mi>&#x03c0;</mml:mi>
                                            <mml:msup>
                                                <mml:mrow>
                                                    <mml:mi>&#x03c3;</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:mrow>
                                            </mml:msup>
                                        </mml:mrow>
                                    </mml:msqrt>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo stretchy="false">)</mml:mo>
                            <mml:msup>
                                <mml:mrow>
                                    <mml:mi>e</mml:mi>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mfrac>
                                        <mml:mrow>
                                            <mml:msup>
                                                <mml:mrow>
                                                    <mml:mo stretchy="false">(</mml:mo>
                                                    <mml:mi>x</mml:mi>
                                                    <mml:mo>&#x2212;</mml:mo>
                                                    <mml:mi>&#x03bc;</mml:mi>
                                                    <mml:mo stretchy="false">)</mml:mo>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:mrow>
                                            </mml:msup>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mn>2</mml:mn>
                                            <mml:msup>
                                                <mml:mrow>
                                                    <mml:mi>&#x03c3;</mml:mi>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:mrow>
                                            </mml:msup>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:mrow>
                            </mml:msup>
                            <mml:mo>.</mml:mo>
                        </mml:math>
                        <label>(2)</label>
                    </disp-formula>The larger this density, the more closely the sample is associated with that training group. By comparing a sample against the normal distributions of the vaccinated and na&#x00ef;ve training groups, we can determine which group the sample is more statistically associated with. A progress bar in the bottom-right corner shows the status of training. When training finishes, the &#x201c;
                    <monospace>Training</monospace>&#x201d; tab shows exploratory tables and a figure describing the training data and the model built, all of which can be downloaded to the user&#x2019;s machine. In addition, the &#x201c;
                    <monospace>Library</monospace>&#x201d; and &#x201c;
                    <monospace>Prediction</monospace>&#x201d; tabs are unlocked.</p>
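                <p>The density comparison described above can be illustrated with a short Python sketch. It is a hypothetical reconstruction rather than iCAT&#x2019;s implementation: it assumes each group is summarized by the mean and sample standard deviation of its training % TARS values, and that ties are labelled negative. The % TARS values in the example are invented for illustration.</p>
                <preformat preformat-type="code">
```python
import math
from statistics import mean, stdev


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x | mu, sigma^2), as in Equation (2)."""
    coef = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def fit_group(pct_tars):
    """Summarize one training group by mean and sample standard deviation (an assumption)."""
    return mean(pct_tars), stdev(pct_tars)


def classify(x, positive_params, negative_params):
    """Label a sample by whichever group's density is larger at its % TARS value."""
    pos = normal_pdf(x, *positive_params)
    neg = normal_pdf(x, *negative_params)
    return "Positive" if pos > neg else "Negative"  # ties fall to "Negative"


# Invented % TARS values for two small training groups.
vaccinated = fit_group([2.0, 2.4, 1.8, 2.2])
naive = fit_group([0.4, 0.6, 0.5, 0.3])
print(classify(1.9, vaccinated, naive))   # Positive
print(classify(0.45, vaccinated, naive))  # Negative
```
                </preformat>
                <p>Comparing densities rather than raw distances means a group with a smaller spread has a taller, narrower peak; this is one plausible reading of &#x201c;more statistically associated&#x201d;, not a confirmed detail of iCAT.</p>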
                <p>The &#x201c;
                    <monospace>Library</monospace>&#x201d; tab displays a table of the TARS, i.e. the sequences determined to be statistically associated with exposure to the target/agent/pathogen (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>). For each sequence, the table lists the numbers of positive and negative training samples in which it is present or absent, and its p-value, which indicates how strongly the sequence is associated with the positive training data. The table can be downloaded to the user&#x2019;s computer for further analysis (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
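                <p>The statistics in the Library table can be illustrated with a hypothetical Python sketch. The text above does not state which test produces the p-value shown here; this sketch assumes a one-sided Fisher&#x2019;s exact test on the 2&#x00d7;2 table of presence/absence counts, computed from the hypergeometric distribution with the standard library. The <monospace>min_pos_samples</monospace> filter and all function names are illustrative assumptions.</p>
                <preformat preformat-type="code">
```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment in positive samples.

    a, b: positive samples with / without the sequence
    c, d: negative samples with / without the sequence
    """
    n_pos, n_neg, n_present = a + b, c + d, a + c
    total = comb(n_pos + n_neg, n_present)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    hits = sum(
        comb(n_pos, k) * comb(n_neg, n_present - k)
        for k in range(a, min(n_pos, n_present) + 1)
    )
    return hits / total


def build_library(pos_samples, neg_samples, p_cutoff=0.1, min_pos_samples=1):
    """Return (sequence, positives present, negatives present, p-value) rows."""
    rows = []
    for seq in sorted(set().union(*pos_samples)):
        a = sum(seq in s for s in pos_samples)
        c = sum(seq in s for s in neg_samples)
        if a >= min_pos_samples:
            p = fisher_greater(a, len(pos_samples) - a, c, len(neg_samples) - c)
            if p_cutoff >= p:  # keep sequences at or below the cutoff
                rows.append((seq, a, c, p))
    return rows


positives = [{"A", "B"}, {"A"}, {"A", "C"}]
negatives = [{"B"}, {"C"}, {"D"}]
print(build_library(positives, negatives))  # [('A', 3, 0, 0.05)]
```
                </preformat>
                <p>Sequence A is present in all three positives and no negatives (p = 1/20 = 0.05), whereas B and C each appear once in both groups (p = 0.8) and are filtered out at a 0.1 cutoff.</p>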
                <p>To allow the diagnostic classifier to test independent cohorts of samples, a third tab, &#x201c;
                    <monospace>Prediction</monospace>&#x201d;, also unlocks after training and allows the user to upload one or more independent TCR-sequencing samples for classification using the parameters generated from the training data (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). The &#x201c;
                    <monospace>Prediction</monospace>&#x201d; tab lets the user classify unknown samples (i.e. samples not included in the training data) as &#x201c;Positive&#x201d; or &#x201c;Negative&#x201d; and, when their true status is known, determine the accuracy of the diagnostic assay. Use the 
                    <monospace>Browse</monospace> button to upload such independent samples; multiple samples can be uploaded simultaneously. Clicking 
                    <monospace>Predict Independent Sample</monospace> analyzes the dataset, and a table appears when the analysis is complete. The table lists the sample names, the prediction, &#x201c;Positive&#x201d; (red) or &#x201c;Negative&#x201d; (blue), and the %TARS, i.e. the percentage of individual sequences from the sample that are included in the TARS library. The prediction results can be downloaded as a table.</p>
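                <p>The %TARS reported in the prediction table can be sketched in Python as below. It assumes %TARS counts unique sequences (the text does not say whether duplicate clonotypes are collapsed) and takes the classifier as a plain function argument; the threshold classifier in the example is a hypothetical stand-in for the density comparison iCAT uses, and the tab-separated layout is likewise an assumption rather than iCAT&#x2019;s actual file format.</p>
                <preformat preformat-type="code">
```python
import csv
import io


def percent_tars(sample_sequences, library):
    """Percentage of a sample's unique sequences that appear in the TARS library."""
    unique = set(sample_sequences)
    if not unique:
        return 0.0
    return 100.0 * len(unique.intersection(library)) / len(unique)


def prediction_table(samples, library, classify):
    """Return a tab-separated table of sample name, %TARS and predicted label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "pct_tars", "prediction"])
    for name, sequences in samples.items():
        pct = percent_tars(sequences, library)
        writer.writerow([name, f"{pct:.2f}", classify(pct)])
    return buffer.getvalue()


def example_classifier(pct):
    """Hypothetical fixed threshold, used here only for illustration."""
    return "Positive" if pct > 10.0 else "Negative"


library = {"seqA", "seqB"}  # placeholder TARS entries
samples = {"sample_01": ["seqA", "seqX"], "sample_02": ["seqY"]}
print(prediction_table(samples, library, example_classifier))
```
                </preformat>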
                <p>iCAT requires R 3.4.0 or later and can run on any operating system with common specifications (1 GB of disk space, 4 GB of memory, and a multicore CPU are recommended).</p>
            </sec>
        </sec>
        <sec id="sec5">
            <title>Use cases</title>
            <p>To evaluate the efficacy of iCAT, we used three publicly available TCR sequencing data sets. The first viral data set consists of 148 training and 20 independent mouse samples. The training set comprises 32 pre-treatment (na&#x00ef;ve) and 116 vaccinated mouse samples; the mice were inoculated intranasally with the ACAM2000 smallpox vaccine. The second viral data set consists of 133 (27 negative and 96 positive) training and 15 (5 negative and 10 positive) independent mouse samples exposed to monkeypox virus. Both mouse data sets are publicly available from 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17632/cf92gt44zf.1">https://doi.org/10.17632/cf92gt44zf.1</ext-link>. The third data set consists of human TCR samples from individuals exposed to the novel SARS-CoV-2 virus, the cause of the ongoing COVID-19 global pandemic.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> The sample size is small (two individuals), but these individuals&#x2019; TCR repertoires had been sequenced for other projects one and two years prior to infection. This negative and positive SARS-CoV-2 human TCR sequencing data set, drawn from the same individuals, is therefore a useful example of the potential of iCAT as a diagnostic assay. These data are available at 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.3835956">https://doi.org/10.5281/zenodo.3835956</ext-link>.</p>
            <sec id="sec6">
                <title>Use case 1: smallpox mouse data</title>
                <p>Thirty-two pre-exposure (na&#x00ef;ve) samples were analyzed, which included 2,049,383 unique TCR sequences (clonotypes). We set up iCAT to analyze sequences by both the CDR3 amino acid sequence and the V-gene and J-gene names; 714,522 amino acid sequence-gene name combinations were found in the na&#x00ef;ve samples. Fifty-eight samples taken 2 and 8 weeks post-vaccination for smallpox were analyzed, which included 1,581,619 clonotypes and 573,612 unique amino acid sequence-gene name combinations. After training, iCAT generated the same virus-associated TCR library (314 TCR sequences) identified by Wolf et al. (2018).
                    <sup>
                        <xref ref-type="bibr" rid="ref5">5</xref>
                    </sup> When applied to the training data as a baseline check, iCAT correctly classified 32 of 32 na&#x00ef;ve samples as &#x201c;unexposed&#x201d; and 58 of 58 vaccinated samples as &#x201c;exposed&#x201d; (100% accuracy). To test the diagnostic accuracy of the iCAT-generated classifier on independent cohorts, we used TCR-sequencing files from 10 mice, taken before and after smallpox vaccination, that were not involved in training the classifier. The classification results are displayed in the &#x201c;Prediction&#x201d; tab and can be downloaded as a .txt file. Of these 20 samples, 90% of pre-vaccination samples (9 of 10) were correctly classified as &#x201c;negative&#x201d; and 100% of post-vaccination samples (10 of 10) were classified as &#x201c;positive&#x201d;. Training time was 2.36 minutes and classification time was 30.6 seconds. Overall, these data show that iCAT computationally identifies target-associated public TCRs and uses them to train a diagnostic classifier that distinguishes between exposed and unexposed samples with high accuracy.</p>
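                <p>The independent-sample results above can be summarized with the usual diagnostic metrics. This short Python calculation uses only the counts reported in this use case (10 of 10 post-vaccination samples called positive, 9 of 10 pre-vaccination samples called negative) and treats vaccinated samples as the positive class; the function name is illustrative.</p>
                <preformat preformat-type="code">
```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and overall accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Independent smallpox cohort: 10 of 10 vaccinated positive, 9 of 10 naive negative.
print(diagnostic_metrics(tp=10, fn=0, tn=9, fp=1))  # (1.0, 0.9, 0.95)
```
                </preformat>
                <p>The overall accuracy on the 20 independent samples is therefore 19 of 20, or 95%.</p>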
            </sec>
            <sec id="sec7">
                <title>Use case 2: monkeypox mouse data</title>
                <p>We tested iCAT on another mouse TCR-sequencing dataset, which included 27 na&#x00ef;ve samples and 48 samples taken 2 and 8 weeks post infection with monkeypox. We chose to analyze by CDR3 amino acid sequence in addition to V-gene and J-gene names. The p-value cutoff was set to 0.1 and the minimum number of public sequences was set to 1, as these parameters produced the best separation experimentally. Na&#x00ef;ve samples included 1,772,085 clonotypes and 630,381 unique amino acid sequence-gene name combinations. Exposed samples included 1,070,615 clonotypes and 382,906 unique amino acid sequence-gene name combinations. iCAT correctly classified this training data with 100% accuracy. When tested on an independent monkeypox data set, set up by excluding 5 samples from the na&#x00ef;ve group and 10 samples from the exposed group before training, iCAT correctly classified the 5 na&#x00ef;ve samples as negative and the 10 exposed samples as positive, a 100% classification accuracy on this monkeypox dataset. Training time was 47.81 seconds and classification time was 6.83 seconds.</p>
            </sec>
            <sec id="sec8">
                <title>Use case 3: human SARS-CoV-2 data</title>
                <p>We further tested iCAT on TCR-sequencing data from two individuals exposed to the novel SARS-CoV-2 virus.
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> The data included 4 na&#x00ef;ve samples from 2018 and 2019, and 4 samples collected 15 and 30 days post-infection. We chose the iCAT option to analyze by CDR3 amino acid sequences only. The p-value cutoff was set to 0.1 and the minimum number of public sequences was set to 1. The na&#x00ef;ve data included 2,935,893 clonotypes and 1,120,606 unique CDR3 amino acid sequences. The exposed data included 1,987,608 clonotypes and 541,111 unique amino acid sequences. iCAT achieved perfect classification accuracy on the training data, correctly assigning all 4 na&#x00ef;ve and 4 exposed samples. Further, iCAT correctly classified 4 independent exposed samples as positive. Training time was 1.50 minutes and classification time was 33.12 seconds. Although the cohort is small, this result indicates that iCAT and the methodology it implements also apply to human data.</p>
            </sec>
        </sec>
        <sec id="sec9" sec-type="conclusions">
            <title>Conclusions</title>
            <p>In this article, we have presented iCAT, a software tool for determining pathogen exposure from TCR sequencing data. It has potential clinical applications in disease diagnosis and surveillance, and in determining vaccine efficacy. Once data interpretation is fully automated, TCR sequencing analysis and other types of NGS will likely become standard tools for the diagnosis and management of disease. Our current datasets are from pre- and post-exposure to viruses, and serve as a proof of principle that TCR sequencing analysis can be used to identify individuals exposed to infectious agents or vaccines with high accuracy, speed, and accessibility. We demonstrated the use of iCAT for accurately detecting exposure to SARS-CoV-2, the virus behind COVID-19. Although this use case was based on a small number of samples, it shows the potential of our software and of TCRs as biomarkers. This type of analysis may also be used to distinguish between two different but highly related infections, such as Zika and Dengue virus; this is a global concern because Zika virus is associated with fetal complications in infected pregnant women, and current laboratory testing cannot distinguish between the two. Parallel efforts in our group using iCAT show promising results in identifying virus-associated TCR sequences uniquely associated with prior Zika versus Dengue virus infection in mice. Further, the iCAT platform may prove useful for diagnosing individuals in the early stages of autoimmunity, by identifying auto-reactive TCRs before symptoms and significant tissue damage occur. Earlier diagnosis may allow for preventative measures, better treatment, and better outcomes. More broadly, our approach could be used to diagnose autoimmune disease and possibly to detect immune responses to cancer before or after immunotherapy.</p>
        </sec>
        <sec id="sec10">
            <title>Data availability</title>
            <p>Mendeley Data: Identifying and Tracking Low Frequency Virus-Specific TCR Clonotypes Using High-Throughput Sequencing, 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17632/cf92gt44zf.1">https://doi.org/10.17632/cf92gt44zf.1</ext-link>.
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup>
            </p>
            <p>This project contains the raw sequencing data from HLA-A2 transgenic mice before and after exposure to either the ACAM2000 smallpox vaccine or the highly related monkeypox virus.</p>
            <p>Zenodo: Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection, 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.3835956">https://doi.org/10.5281/zenodo.3835956</ext-link>.
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup>
            </p>
            <p>This project contains the human TCR samples from individuals exposed to the novel SARS-CoV-2 virus (the third use case), available from Zenodo in MiXCR format.</p>
            <p>Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).</p>
        </sec>
        <sec id="sec11">
            <title>Software availability</title>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/BioHPC/iCAT">https://github.com/BioHPC/iCAT</ext-link>.</p>
            <p>Archived source code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.4436485">http://doi.org/10.5281/zenodo.4436485</ext-link>.
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup>
            </p>
            <p>License: MIT</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rosati</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dowds</surname>
                            <given-names>CM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liaskou</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Overview of methodologies for T-cell receptor repertoire analysis.</article-title>
                    <source>

                        <italic toggle="yes">BMC Biotechnol</italic>
</source>
                    <year>2017</year>;<volume>17</volume>(<issue>1</issue>):<fpage>61</fpage>.
                    <issn>ISSN 1472-6750</issn>.
                    <pub-id pub-id-type="pmid">28693542</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12896-017-0379-9</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5504616</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Venturi</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kedzierska</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Turner</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Methods for comparing the diversity of samples of the T cell receptor repertoire.</article-title>
                    <source>

                        <italic toggle="yes">J Immunol Methods</italic>
</source>
                    <year>2007</year>;<volume>321</volume>.
                    <pub-id pub-id-type="pmid">17337271</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jim.2007.01.019</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cabaniols</surname>
                            <given-names>JP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fazilleau</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Casrouge</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Most alpha/beta T cell receptor diversity is due to terminal deoxynucleotidyl transferase.</article-title>
                    <source>

                        <italic toggle="yes">J Exp Med</italic>
</source>
                    <year>2001</year>;<volume>194</volume>(<issue>9</issue>):<fpage>1385</fpage>&#x2013;<lpage>1390</lpage>.
                    <issn>ISSN 0022-1007 1540-9538</issn>.
                    <pub-id pub-id-type="pmid">11696602</pub-id>
                    <pub-id pub-id-type="doi">10.1084/jem.194.9.1385</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2195970</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Robins</surname>
                            <given-names>HS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campregher</surname>
                            <given-names>PV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Srivastava</surname>
                            <given-names>SK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Comprehensive assessment of T-cell receptor beta-chain diversity in alphabeta T cells.</article-title>
                    <source>

                        <italic toggle="yes">Blood</italic>
</source>
                    <year>2009</year>;<volume>114</volume>(<issue>19</issue>):<fpage>4099</fpage>&#x2013;<lpage>4107</lpage>.
                    <issn>ISSN 1528-0020 (Electronic) 0006-4971 (Linking)</issn>.
                    <pub-id pub-id-type="pmid">19706884</pub-id>
                    <pub-id pub-id-type="doi">10.1182/blood-2009-04-217604</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wolf</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hether</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gilchuk</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Identifying and tracking low-frequency virus-specific TCR clonotypes using high-throughput sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Cell Rep</italic>
</source>
                    <year>2018</year>;<volume>25</volume>(<issue>9</issue>):<fpage>2369</fpage>&#x2013;<lpage>2378</lpage>,
                    <elocation-id>e4</elocation-id>,
                    <issn>ISSN 2211-1247</issn>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.sciencedirect.com/science/article/pii/S2211124718317558">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">30485806</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2018.11.009</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7770954</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pogorelyy</surname>
                            <given-names>MV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fedorova</surname>
                            <given-names>AD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McLaren</surname>
                            <given-names>JE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exploring the pre-immune landscape of antigen-specific T cells.</article-title>
                    <source>

                        <italic toggle="yes">Genome Med</italic>
</source>
                    <year>2018</year>;<volume>10</volume>:<fpage>68</fpage>.
                    <pub-id pub-id-type="doi">10.1186/s13073-018-0577-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Emerson</surname>
                            <given-names>RO</given-names>
                        </name>

                        <name name-style="western">
                            <surname>DeWitt</surname>
                            <given-names>WS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vignali</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire.</article-title>
                    <source>

                        <italic toggle="yes">Nat Genet</italic>
</source>
                    <year>2017</year>;<volume>49</volume>(<issue>5</issue>):<fpage>659</fpage>&#x2013;<lpage>665</lpage>.
                    <issn>ISSN 1546-1718</issn>.
                    <pub-id pub-id-type="pmid">28369038</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.3822</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>DeWitt</surname>
                            <given-names>WS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Emerson</surname>
                            <given-names>RO</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lindau</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Dynamics of the cytotoxic T cell response to a model of acute viral infection.</article-title>
                    <source>

                        <italic toggle="yes">J Virol</italic>
</source>
                    <year>2015</year>;<volume>89</volume>(<issue>8</issue>):<fpage>4517</fpage>&#x2013;<lpage>4526</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://jvi.asm.org/content/jvi/89/8/4517.full.pdf">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">25653453</pub-id>
                    <pub-id pub-id-type="doi">10.1128/JVI.03474-14</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4442358</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kirsch</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vignali</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robins</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>T-cell receptor profiling in cancer.</article-title>
                    <source>

                        <italic toggle="yes">Mol Oncol</italic>
</source>
                    <year>2015</year>;<volume>9</volume>(<issue>10</issue>):<fpage>2063</fpage>&#x2013;<lpage>2070</lpage>.
                    <issn>ISSN 1574-7891</issn>.
                    <ext-link ext-link-type="uri" xlink:href="https://febs.onlinelibrary.wiley.com/doi/abs/10.1016/j.molonc.2015.09.003">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">26404496</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molonc.2015.09.003</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5528728</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gerritsen</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pandit</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andeweg</surname>
                            <given-names>AC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics</italic>
</source>
                    <year>2016</year>;<volume>32</volume>(<issue>20</issue>):<fpage>3098</fpage>&#x2013;<lpage>3106</lpage>.
                    <issn>ISSN 1367-4811 (Electronic) 1367-4803 (Linking)</issn>.
                    <pub-id pub-id-type="pmid">27324198</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw339</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5048062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nazarov</surname>
                            <given-names>VI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pogorelyy</surname>
                            <given-names>MV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Komech</surname>
                            <given-names>EA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>tcR: an R package for T cell receptor repertoire advanced data analysis.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics</italic>
</source>
                    <year>2015</year>;<volume>16</volume>:<fpage>175</fpage>.
                    <issn>ISSN 1471-2105 (Electronic) 1471-2105 (Linking)</issn>.
                    <pub-id pub-id-type="pmid">26017500</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12859-015-0613-1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4445501</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="web">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cheng</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allaire</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">shiny: Web Application Framework for R.</italic>
                    </article-title>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/shiny/shiny.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Minervina</surname>
                            <given-names>AA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Komech</surname>
                            <given-names>EA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Titov</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv</italic>
</source>
                    <year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/early/2020/05/18/2020.05.18.100545">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">33399535</pub-id>
                    <pub-id pub-id-type="doi">10.7554/eLife.63502</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7806265</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wolf</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>Identifying and Tracking Low Frequency Virus-Specific TCR Clonotypes Using High-Throughput Sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Mendeley Data</italic>
</source>
                    <year>2018</year>;<volume>V1</volume>.
                    <pub-id pub-id-type="pmid">30485806</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2018.11.009</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7770954</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Minervina</surname>
                            <given-names>AA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Komech</surname>
                            <given-names>EA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Titov</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection (Version 1.0) [Data set].</article-title>
                    <publisher-name>Zenodo</publisher-name>;<year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.3835956">Reference Source</ext-link>
                    <pub-id pub-id-type="pmid">33399535</pub-id>
                    <pub-id pub-id-type="doi">10.7554/eLife.63502</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7806265</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rajeh</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ahn</surname>
                            <given-names>TH</given-names>
                        </name>
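</person-group>: <article-title>BioHPC/iCAT: First release of iCAT (Version v1.0.0).</article-title>
                    <publisher-name>Zenodo</publisher-name>;<year>2021</year> (<month>January</month> <day>13</day>).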
                    <pub-id pub-id-type="doi">10.5281/zenodo.4436485</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report88494">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.57705.r88494</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Jun</surname>
                        <given-names>Se-Ran</given-names>
                    </name>
                    <xref ref-type="aff" rid="r88494a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2681-3950</uri>
                </contrib>
                <aff id="r88494a1">
                    <label>1</label>Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>7</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Jun SR</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport88494" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.27214.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Thank you so much for the improved manuscript. I have minor comments: 
                <list list-type="bullet">
                    <list-item>
                        <p>Please add one or more sentences on how you calculated uncertainty.</p>
                    </list-item>
                    <list-item>
                        <p>Please check the following sentence: "The p-value with the largest 
                            <italic>C
                                <sub>v</sub>
                            </italic>:
                            <italic>C
                                <sub>n</sub>
                            </italic> ratio and offers significant coverage to distinguish vaccinated (
                            <italic>v</italic>) from na&#x00ef;ve (
                            <italic>n</italic>) samples was chosen."</p>
                    </list-item>
                    <list-item>
                        <p>Is the following sentence correct: "If the value is bigger, we can conclude that the sample is more associated with the training group."?</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genomic Epidemiology, Microbiome Epidemiology, Multiomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report88495">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.57705.r88495</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Qiao</surname>
                        <given-names>Shuo-Wang</given-names>
                    </name>
                    <xref ref-type="aff" rid="r88495a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0935-8397</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Yao</surname>
                        <given-names>Ying</given-names>
                    </name>
                    <xref ref-type="aff" rid="r88495a2">2</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8679-1557</uri>
                </contrib>
                <aff id="r88495a1">
                    <label>1</label>Department of Immunology, Oslo University Hospital, Rikshospitalet, University of Oslo, Oslo, Norway</aff>
                <aff id="r88495a2">
                    <label>2</label>Department of Immunology, University of Oslo, Oslo, Norway</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>6</day>
                <month>7</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Qiao SW and Yao Y</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport88495" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.27214.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>No further comments.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>No</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Immunology, T cells, TCR, single-cell receptor sequencing</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report82161">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.30069.r82161</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Jun</surname>
                        <given-names>Se-Ran</given-names>
                    </name>
                    <xref ref-type="aff" rid="r82161a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2681-3950</uri>
                </contrib>
                <aff id="r82161a1">
                    <label>1</label>Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>4</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Jun SR</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport82161" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.27214.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This article is a valuable resource for T-cell receptor (TCR) sequencing data-based infection diagnostics. The authors provide an R software package named iCAT, which offers a step-by-step analysis of TCR sequencing data to predict whether individual subjects are infected. First, iCAT takes positive and negative pathogen-specific TCR datasets to estimate the parameters of its model. Second, iCAT provides the Library tab, which summarizes the target-associated receptor sequences with their p-values. Third, iCAT provides the Prediction tab, which takes test TCR datasets and makes a diagnostic decision. I enjoyed reading this article and the GitHub tutorial.</p>
            <p> 
                <bold>Major comments:</bold>
            </p>
            <p>It seems that the performance of this tool depends on the number of samples. Please provide a discussion of, or guidelines on, how the numbers of negative and positive samples relate to performance.</p>
            <p>It seems that this is the first TCR data-based diagnostic tool. Please clarify whether this is the first tool that uses TCR sequencing data for diagnostic purposes. If it is not, comparison results with other tools should be included.</p>
            <p> 
                <bold>Minor comments:</bold>
            </p>
            <p>First, a question for clarification: before TCR sequencing data are uploaded, do they need any preprocessing step to make the input compatible with iCAT?</p>
            <p>The installation steps beginning with install.packages(&#x201c;devtools&#x201d;) do not work with R 4.0.3 on Mac OS X. The error I got is as follows:</p>
            <p> Downloading GitHub repo BioHPC/iCAT@HEAD</p>
            <p> Error in utils::download.file(url, path, method = method, quiet = quiet,&#x00a0; :</p>
            <p> download from 'https://api.github.com/repos/BioHPC/iCAT/tarball/HEAD' failed</p>
            <p>Each input file is limited to 10 gigabytes in size. Where does this limitation come from?</p>
            <p>I downloaded one of the test cases, which contains many files, and it was not easy to figure out which files should be uploaded. It would be helpful if this R package shipped with example datasets so that they can be explored directly in R.</p>
            <p>There is an option named &#x2018;Other&#x2019; for &#x2018;analyze clonotypes by (column names in parantheses)&#x2019;. What does &#x2018;Other&#x2019; refer to?</p>
            <p>Please report the computational time of the training and prediction steps for each case.</p>
            <p>Although the &#x2018;TCRV-CDR3-TCRJ&#x2019; feature option is recommended, please include the performance of the other features for each test case; this would help readers understand and compare the features.</p>
            <p>What does TCRBV stand for?</p>
            <p>What does VATS stand for?</p>
            <p>A correction constant is not shown in equation (2).</p>
            <p>Whole-genome sequences can distinguish dengue virus from Zika virus with 100% accuracy. In the sentence &#x2018;current laboratory testing cannot distinguish between the two&#x2019;, are you excluding whole-genome sequencing data?</p>
            <p>Is it possible to indicate the acceptable range as something like 1:99 or 1~99 instead of 1 99?</p>
            <p>Does the max p-value change automatically according to the training data provided, or should it be changed manually to the value selected by Fisher&#x2019;s exact test?</p>
            <p>Please correct the typo in Figure 1(B).</p>
            <p>Please correct the typo in &#x2018;the minimum samples&#x2019;.</p>
            <p> Please revise the following sentences:</p>
            <p> - The ratio of Cv to Cn is determined for each p-value. The p-value with the largest Cv:Cn ratio and offers significant coverage to distinguish (v)accinated from (n)a&#x00ef;ve samples was chosen.</p>
            <p> - iCAT was configured to analyze by CDR3 amino acid sequence only</p>
            <p> If the value is bigger, we can conclude that the sample is more associated to the the training group. Is this a correct sentence?</p>
            <p> From the training data, iCAT correctly classified 32 of 32 na&#x00ef;ve samples as &#x201d;unexposed&#x201d; and 58 of 58 vaccinated samples as &#x201d;exposed&#x201d; (100% accuracy). Is this a correct sentence?</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genomic epidemiology, Multiomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment6777-82161">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ahn</surname>
                            <given-names>Tae-Hyuk</given-names>
                        </name>
                        <aff>Saint Louis University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>6</month>
                    <year>2021</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Major comments:</bold>
                </p>
                <p> </p>
                <p> 1. It seems like that the performance of this tool does depend on the number of samples.&#x00a0; Please provide discussion or guideline on the number of negative and positive samples relating to the performance.</p>
                <p> </p>
                <p> Response: We greatly appreciate the reviewer&#x2019;s thoughtful comments. Sample size is, of course, very important in determining the power and accuracy of the diagnostic. We generated positive and negative datasets of various sizes to determine the impact of sample number on accuracy. We concluded that the number of samples required for an accurate diagnosis depends on the nature of the samples. For example, a dataset from Zika-infected mice produced a small but conserved virus-specific TCR repertoire owing to the small number of immunogenic epitopes (Hassert and Wolf et al. 2020); this allowed a powerful diagnostic assay to be generated from relatively few samples. In contrast, a dataset from mice before and after smallpox vaccination yielded a large pool of virus-specific TCRs owing to the large number of potential epitopes; this less focused response required more samples in each group to achieve sufficient coverage and diagnostic accuracy (Wolf et al. 2018).</p>
                <p> </p>
                <p> 2. This appears to be the first diagnostic tool based on TCR data. Please clarify whether this is the first tool that uses TCR sequencing data for diagnostic purposes. If it is not, comparisons with other tools should be included.</p>
                <p> </p>
                <p> Response: To the best of our knowledge, this is the first publicly available tool for diagnostic classification using TCR sequencing data.</p>
                <p> </p>
                <p> 
                    <bold>Minor comments:</bold>
                </p>
                <p> </p>
                <p> 1. First, a question for clarification. Before TCR sequencing data are uploaded, do they need any preprocessing to make the input compatible with iCAT?</p>
                <p> </p>
                <p> Response: iCAT takes TCR data as tab-delimited clonotype abundance tables. These tables result from mapping reads to the Variable, Diversity, and Joining gene segments and then assembling clonotypes. iCAT does not perform these steps, so they may be considered preprocessing. Clonotype tables are provided through the immunoSEQ assay from Adaptive Biotechnologies, which is the most commonly used immunosequencing pipeline in recently published studies. Although iCAT&#x2019;s input parsing was built around the immunoSEQ format, it also has a flexible interface that can handle other formats through user-selected column names.</p>
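                <p>The input handling described above can be illustrated with a minimal Python sketch (iCAT itself is an R package, and this is not its code). It assumes a tab-delimited table whose amino acid, V-gene and J-gene columns default to the immunoSEQ-style names aminoAcid, vGeneName and jGeneName used elsewhere in this exchange; the helper name load_features and its parameters are hypothetical, and overriding a column name loosely mimics the &#x2018;Other&#x2019; option.</p>

```python
import csv
import io


def load_features(tsv_text, aa_col="aminoAcid", v_col="vGeneName", j_col="jGeneName"):
    """Collect (V gene, CDR3 amino acid, J gene) keys from one clonotype table.

    Hypothetical helper: the column names are parameters so that files using
    other headers (for example "aa" instead of "aminoAcid") can also be read.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    features = set()
    for row in reader:
        cdr3 = (row.get(aa_col) or "").strip()
        if not cdr3:
            # Skip rows whose CDR3 amino acid field is empty
            continue
        features.add((row[v_col], cdr3, row[j_col]))
    return features


# Usage with a file that labels its amino acid column "aa" instead of "aminoAcid"
example = "aa\tvGeneName\tjGeneName\nCASSLGQAYEQYF\tTRBV7-2\tTRBJ2-7\n"
keys = load_features(example, aa_col="aa")
```
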
                <p> </p>
                <p> 2. install.packages("devtools") does not work with R 4.0.3 on Mac OS X. The error I got is as follows:</p>
                <p> </p>
                <p> Response: This issue has been resolved, and iCAT has been updated accordingly.</p>
                <p> </p>
                <p> 3. Each input file is limited to 10 GB in size. Where does this limitation come from?</p>
                <p> </p>
                <p> Response: This is an arbitrary limit that we coded into the tool&#x2019;s interface; it is not required by any of the underlying functions in iCAT. We put this size limit in place because one of our future goals is to host iCAT on a website where users can upload their data for analysis, in which case data size can become a limiting factor.</p>
                <p> </p>
                <p> 4. I downloaded one of the test cases, which contains many files, and it was not easy to figure out which files should be uploaded. It would be helpful if the R package provided example datasets so that they can be explored in R directly.</p>
                <p> </p>
                <p> Response: Example negative, positive, and independent datasets are included in the R package under the path &#x201c;inst/extdata/&#x201d;.</p>
                <p> </p>
                <p> 5. There is an option named &#x2018;Other&#x2019; for &#x2018;analyze clonotypes by (column names in parentheses)&#x2019;. What does &#x2018;Other&#x2019; refer to?</p>
                <p> </p>
                <p> Response: This option allows users to enter a custom column name to analyze by. TCR data formats vary (e.g. some files label the amino acid column &#x201c;aminoAcid&#x201d; while others label it &#x201c;aa&#x201d;), so this option helps users load different data formats into iCAT.</p>
                <p> </p>
                <p> 6. Please include the computation time for the training and prediction steps for each case.</p>
                <p> </p>
                <p> Response:</p>
                <p> </p>
                <p> Use case 1:</p>
                <p> Training time: 2.36 minutes</p>
                <p> Classification time: 30.6 seconds</p>
                <p> </p>
                <p> Use case 2:</p>
                <p> Training time: 47.81 seconds</p>
                <p> Classification time: 6.83 seconds</p>
                <p> </p>
                <p> Use case 3:</p>
                <p> Training time: 1.50 minutes</p>
                <p> Classification time: 33.12 seconds</p>
                <p> </p>
                <p> The manuscript was updated to reflect this information.</p>
                <p> </p>
                <p> 7. Although the &#x2018;TCRV-CDR3-TCRJ&#x2019; feature option is recommended, please include the performance of the other features for each test case, which would help readers understand and compare them.</p>
                <p> </p>
                <p> Response: In general, we found that using the V-gene and J-gene information improves accuracy in our testing, but other features are available in case researchers believe they might obtain better results with them. Using the amino acid sequence alone on the monkeypox data, 1/5 negative samples and 6/10 positive samples were identified correctly. Using the DNA sequence, 5/5 negative samples and 3/10 positive samples were identified correctly.</p>
                <p> </p>
                <p> 8. What does TCRBV stand for?</p>
                <p> </p>
                <p> Response: It stands for T cell receptor beta chain variable region.</p>
                <p> </p>
                <p> 9. What does VATS stand for?</p>
                <p> </p>
                <p> Response: It stood for vaccine-associated target sequences. This is now changed to target-associated receptor sequences (TARSs) for consistency in the manuscript.</p>
                <p> </p>
                <p> 10. A correction constant is not shown in equation (2).</p>
                <p> </p>
                <p> Response: That is correct. We updated the sentence describing the equation in the manuscript.</p>
                <p> </p>
                <p> 11. Whole-genome sequencing can distinguish dengue virus from Zika virus with 100% accuracy. Regarding the sentence &#x2018;current laboratory testing cannot distinguish between the two&#x2019;, are you excluding whole-genome sequencing data?</p>
                <p> </p>
                <p> Response: We are limiting that statement to standard laboratory diagnostics, which typically do not include whole-genome sequencing.</p>
                <p> </p>
                <p> 12. Is it possible to change the format of the acceptable range to something like 1:99 or 1~99, instead of 1 99, to indicate a range?</p>
                <p> </p>
                <p> Response: We have addressed this comment: iCAT now accepts either a space or a colon to delimit range values.</p>
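                <p>A minimal Python sketch of this relaxed range parsing, assuming a range is two integers separated by whitespace, a colon, or both; parse_range is a hypothetical name, and this is not iCAT&#x2019;s R implementation.</p>

```python
import re


def parse_range(text):
    """Parse 'low high' or 'low:high' into a pair of ints (hypothetical helper)."""
    # Split on any run of whitespace and/or colons, ignoring empty pieces
    parts = [p for p in re.split(r"[\s:]+", text.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two values, got {text!r}")
    low, high = int(parts[0]), int(parts[1])
    # Reject reversed ranges such as '99:1'
    if max(low, high) != high:
        raise ValueError(f"range start exceeds range end: {text!r}")
    return low, high
```
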
                <p> </p>
                <p> 13. Does the max p-value change automatically according to the training data provided, or should it be changed manually based on the value selected by Fisher&#x2019;s exact test?</p>
                <p> </p>
                <p> </p>
                <p> Response: The max p-value can be changed manually. Its purpose is to let the user adjust the stringency of the diagnostic. This can be done after the diagnostic library has been generated, if the user decides that more (or less) stringent parameters are preferable.</p>
                <p> 14. Please correct the typo in Figure 1(B).</p>
                <p> </p>
                <p> Response: Fixed typo in the new manuscript version.</p>
                <p> </p>
                <p> 15. Please correct the typo in &#x2018;the minimum samples&#x2019;.</p>
                <p> </p>
                <p> 16. Please revise the following sentences:</p>
                <p> - The ratio of Cv to Cn is determined for each p-value. The p-value with the largest Cv:Cn ratio and offers significant coverage to distinguish (v)accinated from (n)a&#x00ef;ve samples was chosen.</p>
                <p> - iCAT was configured to analyze by CDR3 amino acid sequence only</p>
                <p> </p>
                <p> Response: These sentences were revised for clarity in the new manuscript version.</p>
                <p> </p>
                <p> 17. If the value is bigger, we can conclude that the sample is more associated to the the training group. Is this a correct sentence?</p>
                <p> </p>
                <p> Response: The sentence was revised in the new manuscript version.</p>
                <p> </p>
                <p> 18. From the training data, iCAT correctly classified 32 of 32 na&#x00ef;ve samples as &#x201d;unexposed&#x201d; and 58 of 58 vaccinated samples as &#x201d;exposed&#x201d; (100% accuracy). Is this a correct sentence?</p>
                <p> </p>
                <p> Response: The sentence was revised in the new manuscript version.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report82160">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.30069.r82160</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Qiao</surname>
                        <given-names>Shuo-Wang</given-names>
                    </name>
                    <xref ref-type="aff" rid="r82160a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0935-8397</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Yao</surname>
                        <given-names>Ying</given-names>
                    </name>
                    <xref ref-type="aff" rid="r82160a2">2</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8679-1557</uri>
                </contrib>
                <aff id="r82160a1">
                    <label>1</label>Department of Immunology, Oslo University Hospital, Rikshospitalet, University of Oslo, Oslo, Norway</aff>
                <aff id="r82160a2">
                    <label>2</label>Department of Immunology, University of Oslo, Oslo, Norway</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>4</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Qiao SW and Yao Y</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport82160" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.27214.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Diagnostic methods based on the detection of disease-specific or disease-associated T-cell receptor (TCR) sequences are novel approaches whose development is still in its infancy. This paper introduces an intuitive, easy-to-use software package for the analysis of TCR deep sequencing data with the purpose of grouping data into &#x2018;na&#x00ef;ve/healthy&#x2019; or &#x2018;diseased/exposed&#x2019; groups. The statistical methods used are rather rudimentary, but serve the purpose. The tool demonstrates remarkably high accuracy in three use cases, including a small COVID-19 dataset. However, the choice of parameter settings and the way the data are split into training and test datasets are not well explained. Thus, one is left with the impression that some over-fitting and over-tuning may have occurred. Notwithstanding these shortcomings, this software package is the first of its kind and will be welcomed by the growing community of scientists working with immune receptors and disease prediction.</p>
            <p> </p>
            <p> 
                <bold>Major comments:</bold> 
                <list list-type="order">
                    <list-item>
                        <p>Introduction: Aside from the immensity of TCR diversity, both theoretical and within each individual, by far the largest challenge to identifying signals of T-cell memory to a particular antigen is the rarity of T cells specific to any given antigen. In circulation during the homeostatic memory phase, this frequency can be as low as 1 in 100&#x00a0;000 for CD4+ T cells, and may be up to 1% in the CD8 compartment.</p>
                    </list-item>
                    <list-item>
                        <p>Operation: Does iCAT support both TCRB and TCRA data, separately, or in the same dataset? Or does it only support TCRB data?</p>
                    </list-item>
                    <list-item>
                        <p>Setting (2) during training: at which level of TRBV and TRBJ allelic similarity is the tool tuned? In other words, would human TRBV7-2*01 and TRBV7-2*02 be considered the same, or different? How about human TRBV4-2 and TRBV4-3 that are very similar, especially at the 3&#x2019;-end where the last 40 amino acids are identical. Many HTS short reads would not be able to distinguish between these two gene segments.&#x00a0;</p>
                    </list-item>
                    <list-item>
                        <p>Use Case 1: The authors wrote that they analysed this dataset either by the CDR3 amino acid sequence only, or with the V- and J-gene identity in addition. Did these two options behave differently?</p>
                    </list-item>
                    <list-item>
                        <p>In the use cases, it is not common to report the accuracy on the training dataset. It is not clear how the data were split into training and testing datasets. A more common practice is to perform cross-validation and report the averaged precision, sensitivity and perhaps F-score, especially for unbalanced samples. For Use case 2, for example, how were the 5 na&#x00ef;ve and 10 exposed samples in the test group chosen: randomly? Would one get the same 100% accuracy if another random set of 5 na&#x00ef;ve and 10 exposed samples was chosen?</p>
                    </list-item>
                    <list-item>
                        <p>In the training tab, the default Max p-value is 0.1. In disease-association studies, a more stringent p-value cut-off is usually recommended. Given that the main purpose of iCAT is diagnosis, where specificity is not as crucial, 0.1 might be a good choice. Is there any analysis supporting this choice, e.g. in the use cases or on other publicly available data in which antigen-specific sequences are known?</p>
                    </list-item>
                    <list-item>
                        <p>In the prediction tab, it would be better to provide a measure of uncertainty in addition to %TARS.</p>
                    </list-item>
                    <list-item>
                        <p>Some studies such as,&#x00a0;Schneider-Hohendorf et al. (2018)
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-82160-1">1</xref>
                            </sup>,&#x00a0;Britanova et al. (2014)
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-82160-2">2</xref>
                            </sup>, and&#x00a0;DeWitt&#x00a0;et al. (2018)
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-82160-3">3</xref>
                            </sup> suggested that MHC, age and sex also affect the immune receptor repertoire. Incorporating such metadata should be useful; even where the sample size is too limited for these factors to be modelled during training, they should preferably be randomized across the training and testing datasets.</p>
                    </list-item>
                    <list-item>
                        <p>How about computing time? How long did the training and classification take for, for instance, the use case 2 data?</p>
                    </list-item>
                    <list-item>
                        <p>There are substantial public resources, including antigen-specific databases and repertoire databases. It might be useful to allow loading some commonly used datasets as controls in case the user has a limited sample size.</p>
                    </list-item>
                </list> 
                <bold>Minor comments:</bold> 
                <list list-type="order">
                    <list-item>
                        <p>Figure legend 1: TCR sequencing of blood samples can be done with, but not limited to, genomic DNA. Many groups also use cDNA.</p>
                    </list-item>
                    <list-item>
                        <p>Why does the tip of iCAT&#x2019;s tail have an antibody? If this is a TCR analysis tool, would a cartoon drawing of a TCR not be more appropriate?</p>
                    </list-item>
                    <list-item>
                        <p>What is VATS in the last paragraph on page 5? Is it the same as TARS?</p>
                    </list-item>
                    <list-item>
                        <p>In the library tab, showing patterns among the significant TARSs would be informative, e.g. are they all similar, or do they form a number of clusters based on similarity?</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>No</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Immunology, T cells, TCR, single-cell receptor sequencing</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-82160-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Sex bias in MHC I-associated shaping of the adaptive immune system</article-title>.
                        <source>
                            <italic>Proceedings of the National Academy of Sciences</italic>
                        </source>.<year>2018</year>;<volume>115</volume>(<issue>9</issue>) :
                        <elocation-id>10.1073/pnas.1716146115</elocation-id>
                        <fpage>2168</fpage>-<lpage>2173</lpage>
                        <pub-id pub-id-type="doi">10.1073/pnas.1716146115</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-82160-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling.</article-title>
                        <source>
                            <italic>J Immunol</italic>
                        </source>.<year>2014</year>;<volume>192</volume>(<issue>6</issue>) :
                        <elocation-id>10.4049/jimmunol.1302064</elocation-id>
                        <fpage>2689</fpage>-<lpage>98</lpage>
                        <pub-id pub-id-type="pmid">24510963</pub-id>
                        <pub-id pub-id-type="doi">10.4049/jimmunol.1302064</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-82160-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity</article-title>.
                        <source>
                            <italic>eLife</italic>
                        </source>.<year>2018</year>;<volume>7</volume>:
                        <elocation-id>10.7554/eLife.38358</elocation-id>
                        <pub-id pub-id-type="doi">10.7554/eLife.38358</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment6776-82160">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ahn</surname>
                            <given-names>Tae-Hyuk</given-names>
                        </name>
                        <aff>Saint Louis University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>6</month>
                    <year>2021</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Major comments:</bold>
                </p>
                <p> </p>
                <p> 1. Introduction: Aside from the immensity of TCR diversity, both theoretical and within each individual, by far the largest challenge to identifying signals of T-cell memory to a particular antigen is the rarity of T cells specific to any given antigen. In circulation during the homeostatic memory phase, this frequency can be as low as 1 in 100 000 for CD4+ T cells, and may be up to 1% in the CD8 compartment.</p>
                <p> </p>
                <p> Response: We greatly appreciate the reviewer&#x2019;s thoughtful comments. As the reviewer notes, the rarity of T cells specific to a given antigen is a further challenge, in addition to the diversity and size of the TCR repertoire. We updated the Introduction in response to this comment and cited a supporting reference.</p>
                <p> </p>
                <p> 2. Operation: Does iCAT support both TCRB and TCRA data, separately, or in the same dataset? Or does it only support TCRB data?</p>
                <p> </p>
                <p> Response: iCAT has been developed and tested to analyze the TCR beta chain of alpha/beta T cells. The columns used for the V and J calls are vGeneName and jGeneName, which do not distinguish between alpha and beta chains. Therefore, as long as the remaining formatting is the same, there is no reason iCAT could not analyze either TCRA or TCRB data. However, it is not designed to analyze both in tandem.</p>
                <p> </p>
                <p> 3. Setting (2) during training: at which level of TRBV and TRBJ allelic similarity is the tool tuned? In other words, would human TRBV7-2*01 and TRBV7-2*02 be considered the same, or different? How about human TRBV4-2 and TRBV4-3 that are very similar, especially at the 3&#x2019;-end where the last 40 amino acids are identical. Many HTS short reads would not be able to distinguish between these two gene segments.</p>
                <p> </p>
                <p> Response: We do not resolve V and J genes to the allele level. Allele-level calls such as TRBV7-2*01 and TRBV7-2*02 appear in the vMaxResolved and jMaxResolved columns, which we do not use for V and J calling. Instead, we use the vGeneName and jGeneName columns, in which both of these alleles would be called TRBV7-2. TRBV4-2 and TRBV4-3, however, are distinguished in the vGeneName column and are therefore treated as different genes. This can be seen in the data table of the generated library, where the selected V and J genes do not include the *01 or *02 allele designations; the same is true of the TSV file.</p>
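                <p>The gene-level grouping described in this response can be sketched in Python. iCAT reads the gene-level vGeneName and jGeneName columns directly, so the hypothetical gene_level helper below only illustrates the effect: stripping the allele suffix after &#x2018;*&#x2019; merges TRBV7-2*01 and TRBV7-2*02, while TRBV4-2 and TRBV4-3 remain distinct genes.</p>

```python
def gene_level(call):
    """Drop the allele suffix after '*': 'TRBV7-2*01' becomes 'TRBV7-2' (hypothetical helper)."""
    return call.split("*", 1)[0]


calls = ["TRBV7-2*01", "TRBV7-2*02", "TRBV4-2*01", "TRBV4-3*01"]
# Alleles of the same gene collapse to one entry; distinct genes stay separate
genes = sorted({gene_level(c) for c in calls})
```
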
                <p> </p>
                <p> 4. Use Case 1: The authors wrote that they analysed this dataset either by the CDR3 amino acid sequence only, or with the V- and J-gene identity in addition. Did these two options behave differently?</p>
                <p> </p>
                <p> Response: Using the V- and J-gene identity in addition to the CDR3 amino acid sequence resulted in considerably higher prediction accuracy in our testing, likely because it identifies more specific markers; in particular, it produces fewer false positives. For example, in use case 1, using the amino acid sequence only correctly identified just 1/10 pre-vaccination samples, while still correctly identifying 10/10 post-vaccination samples.</p>
                <p> </p>
                <p> 5. In the use cases, it is not common to report the accuracy on the training dataset. It is not clear how the data were split into training and testing datasets. A more common practice is to perform cross-validation and report the averaged precision, sensitivity and perhaps F-score, especially for unbalanced samples. For Use case 2, for example, how were the 5 na&#x00ef;ve and 10 exposed samples in the test group chosen: randomly? Would one get the same 100% accuracy if another random set of 5 na&#x00ef;ve and 10 exposed samples was chosen?</p>
                <p> </p>
                <p> Response: We agree it is not common to report the accuracy on the training data and it is not as useful as the accuracy on the independent testing data in evaluating a classification tool. However, we chose to include it to demonstrate this feature of the iCAT interface. This feature may be useful as a baseline-check after training in the usual scenarios where the ground truth for the testing data is not available.</p>
                <p> The independent samples in use case 2 were chosen randomly before training and excluded from the training set. After receiving this question, we also repeated the test three times, each time using a script to randomly select 15 samples for testing. Accuracy was 100% in all three repeated tests.</p>
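                <p>The authors&#x2019; resampling script is not shown; the following Python sketch is a hypothetical illustration of one random hold-out, assuming the 15 test samples are drawn as 5 na&#x00ef;ve and 10 exposed samples, as in the original test group. The function name, sample identifiers and seed are illustrative only.</p>

```python
import random


def split_holdout(naive_ids, exposed_ids, n_naive=5, n_exposed=10, seed=None):
    """Randomly hold out n_naive naive and n_exposed exposed samples for testing (hypothetical)."""
    rng = random.Random(seed)
    test_naive = rng.sample(naive_ids, n_naive)
    test_exposed = rng.sample(exposed_ids, n_exposed)
    held_out = set(test_naive) | set(test_exposed)
    # Every sample that was not held out is kept for training
    train_naive = [s for s in naive_ids if s not in held_out]
    train_exposed = [s for s in exposed_ids if s not in held_out]
    return (train_naive, train_exposed), (test_naive, test_exposed)
```

<p>Calling the function with different seeds, retraining on each training set and classifying each held-out set is one way to obtain repeated random tests of the kind described above.</p>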
                <p> </p>
                <p> 6. In the training tab, the default Max p-value is 0.1. In disease-association studies, a more stringent p-value cut-off is usually recommended. Given that the main purpose of iCAT is diagnosis, where specificity is not as crucial, 0.1 might be a good choice. Is there any analysis supporting this choice, e.g. in the use cases or on other publicly available data in which antigen-specific sequences are known?</p>
                <p> </p>
                <p> Response: The max p-value defaults to 0.1 but can be changed to any value. In our own testing, we found this to be a reasonable default cut-off, as higher p-values tended to generate libraries that were not stringent enough. However, the user can manually set a more stringent p-value. It is worth noting that, although the max p-value defaults to 0.1, iCAT includes a feature for determining the optimal p-value threshold for library generation, which is likely to be well below the max allowed p-value.</p>
                <p> </p>
                <p> 7. In the prediction tab, it would be better to provide a measure of uncertainty in addition to %TARS.</p>
                <p> </p>
                <p> Response: This is a great comment, and we agree that a measure of uncertainty in addition to %TARS would be useful for iCAT users. We devised an uncertainty metric based on the difference between the probability density of the pre-exposure distribution and that of the post-exposure distribution. This new feature has been implemented and is now available in the iCAT GitHub repository, and we have updated the README screenshot of the prediction tab.</p>
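                <p>The exact form of iCAT&#x2019;s uncertainty metric is not given in this response. As a rough Python sketch only, one could fit a normal distribution to the %TARS scores of the pre-exposure (na&#x00ef;ve) and post-exposure training samples and report the normalized density difference at a new sample&#x2019;s %TARS value. The normal fit, the normalization and the name tars_certainty are all assumptions, not the published method.</p>

```python
from statistics import NormalDist


def tars_certainty(x, naive_scores, exposed_scores):
    """Normalized density difference at %TARS value x (assumed formulation).

    Fits a normal distribution to each group's training %TARS scores and
    returns |p_exposed(x) - p_naive(x)| / (p_exposed(x) + p_naive(x)):
    0 where the two densities are equal, approaching 1 where one dominates.
    """
    p_naive = NormalDist.from_samples(naive_scores).pdf(x)
    p_exposed = NormalDist.from_samples(exposed_scores).pdf(x)
    total = p_naive + p_exposed
    if total == 0:
        # Both densities underflow to zero far outside the training data
        return 0.0
    return abs(p_exposed - p_naive) / total
```
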
                <p> </p>
                <p> 9. Some studies such as, Schneider-Hohendorf et al. (2018)1, Britanova et al. (2014)2, and DeWitt et al. (2018)3 suggested that MHC, age and sex also effect the immune receptor repertoire. Incorporating such meta information should be useful, even if in cases where the sample size is too limited for these factors to be modelled in the training, they are better to be randomized in training and testing datasets.</p>
                <p> </p>
                <p> Response: We agree this meta information can provide insightful context for different use cases. This is one of our goals for future versions of iCAT (iCAT 2.0). For example, we are especially excited to study the effect of clustering data based on HLA subtypes. In our opinion, this would be better explored in a separate paper.</p>
                <p> </p>
                <p> *manuscript modification*10. How about computing time? How long did it take to do the training, and classification of for instance the use case 2 data?</p>
                <p> </p>
                <p> Response: Following are the results from use case 2. 
                    <list list-type="bullet">
                        <list-item>
                            <p>Training time: 47.81 seconds</p>
                        </list-item>
                        <list-item>
                            <p>Classification time: 6.83 seconds</p>
                        </list-item>
                    </list> We updated the manuscript to reflect this information.</p>
                <p> </p>
                <p> 11. There are substantial public resources, including antigen-specific databases and repertoire databases. It might be useful to allow users to load commonly used datasets as controls when their own sample size is limited.</p>
                <p> </p>
                <p> Response: Providing commonly used datasets as default controls would be a convenient feature for some use cases. We considered including such datasets with the iCAT software package but decided against it in the current version due to size considerations. We aimed to create a versatile, easy-to-use software tool. Using the instructions in this paper and on GitHub, we believe the average iCAT user can conveniently load public datasets as needed. We will consider this feature for iCAT 2.0.</p>
                <p> </p>
                <p> 
                    <bold>Minor comments:</bold>
                </p>
                <p> </p>
                <p> 1. Figure 1 legend: TCR sequencing of blood samples can be performed on genomic DNA but is not limited to it. Many groups also use cDNA.</p>
                <p> </p>
                <p> Response: Fixed in the new manuscript version.</p>
                <p> 2. Why does the tip of iCAT&#x2019;s tail have an antibody? If this is a TCR analysis tool, would a cartoon drawing of a TCR not be more appropriate?</p>
                <p> </p>
                <p> Response: While it is true that a TCR tail would be more descriptive of our tool, in our opinion an antibody makes a more aesthetically pleasing cat tail than a TCR. Its shape is also more widely recognizable and, at first glance, gives non-specialists a general idea of what iCAT deals with.</p>
                <p> 3. What is VATS in the last paragraph on page 5? Is it the same as TARS?</p>
                <p> </p>
                <p> Response: Yes. Fixed in the new manuscript version.</p>
                <p> 4. In the library tab, showing patterns among the significant TARS would be informative: e.g., are they all similar, or do they form several clusters based on similarity?</p>
                <p> </p>
                <p> Response: The current version of iCAT does not perform clustering of the target-associated receptor sequences based on similarity, but this can be developed and incorporated in future versions. We agree that it would be a useful addition for exploratory analyses. iCAT currently focuses on using the sequences for prediction, but provides a starting point for further analyses by allowing the download of those significant sequences and some statistical information about their frequency in the training data.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
