<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.53842.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Opinion Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Python for gene expression</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Bystrykh</surname>
                        <given-names>Leonid</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6924-5602</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>ERIBA, University Medical Center Groningen, Groningen, 9713 AV, The Netherlands</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:l.bystrykh@rug.nl">l.bystrykh@rug.nl</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>31</day>
                <month>8</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>870</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>1</day>
                    <month>7</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Bystrykh L</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-870/pdf"/>
            <abstract>
                <p>Genome biology has made substantial analytical and computational progress in recent decades. Differential gene expression analysis is one of many computationally intensive areas, and it has largely been developed in the R programming language. Here we explain possible reasons for the dominance of R in gene expression analysis. Next, we discuss the prospects for Python to become competitive in this area of research in the coming years. We indicate that Python can already be used for single-cell differential gene expression analysis. We pinpoint the parts still missing in Python and the possibilities for improvement.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>differential gene expression</kwd>
                <kwd>single cell expression</kwd>
                <kwd>python</kwd>
                <kwd>R</kwd>
                <kwd>limma</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Fundamental breakthroughs in sequencing technologies in the late 1990s promoted explosive growth of the data accumulated in biology over the last two decades. First, the introduction of expression microarrays initiated the accumulation of genome-wide gene expression data from different organisms, which stimulated the creation of dedicated databases and the development of computational tools for their analysis. Second, a more substantial wave of expression data arrived along with progress in high-throughput DNA sequencing,
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>,
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> which demanded even bigger data storage and more sophisticated means of maintenance, programming support and analysis.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>-
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> This coincided with improved performance of our computers accompanied by the development of programming languages, especially those that paid attention to the biology-specific demands in data analysis, such as R and Python.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>-
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> Although the current list of known programming languages is approaching 400 (compiled by Wikipedia), there are only a handful of languages supporting dedicated biology-oriented packages (
                <xref ref-type="table" rid="T1">Table 1</xref>). Thus, the choice of languages with specialized support for biological applications is still very limited.</p>
            <table-wrap id="T1" orientation="portrait" position="float">
                <label>Table 1. </label>
                <caption>
                    <title>Programming languages supporting biological packages, their names and major focus.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Programming language</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Name</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Major applications</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">C++</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Bio++</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Sequencing and phylogenetics</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Java</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioJava</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">DNA/RNA/Protein sequence analysis</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">JavaScript</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioJS</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Mostly Sequence analysis, some elements of GO and visualizations</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Perl</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioPerl</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Mostly sequencing related</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">PHP</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioPHP</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Mostly sequencing related</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Python</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioPython
                                <break/>Snakemake
                                <sup>
                                    <xref ref-type="fn" rid="tfn1">
                                        <sup>1</sup>
                                    </xref>
                                </sup>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Mostly sequencing related
                                <break/>Special package to reproducibly organize complex pipelines</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Ruby</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">BioRuby</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Mostly sequencing related</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="middle">R</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Bioconductor</td>
                            <td align="left" colspan="1" rowspan="1" valign="middle">Huge collection of packages of different kinds with no single subject focus; not primarily for sequencing</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn id="tfn1">
                        <label>
                            <sup>1</sup>
                        </label>
                        <p>Snakemake
                            <sup>
                                <xref ref-type="bibr" rid="ref9">9</xref>
                            </sup> is a Python-based workflow management system, in other words software for organizing pipelines, which makes it more than a regular package (compared to the others mentioned in this table).</p>
                    </fn>
                    <p>It is also worth mentioning Bioconda
                        <sup>
                            <xref ref-type="bibr" rid="ref10">10</xref>
                        </sup> installation package, which assists in finding and installing various tools for biological data analysis. It is a sort of spin-off of the Anaconda installation package for Python, but with an extended spectrum of options and possibilities.</p>
                </table-wrap-foot>
            </table-wrap>
        </sec>
        <sec id="sec2">
            <title>R and Perl</title>
            <p>Python, as a fully functional language ready for general-purpose programming, arrived as version 2.0 in 2000. By that time R was already a well-established language in bioinformatics, especially for statistical applications.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> At that time Perl was probably the most used programming language in genome biology (especially suited for string operations on DNA, RNA and protein sequences), due to its better computational performance,
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> and it remains strong in the field of genome sequence analysis even now, although its difficult syntax and accumulating problems with package maintenance have caused a gradual decline in popularity (as recorded, for instance, on the codementor.io site in its list of the worst programming languages). Nevertheless, Perl scripts can still be seen on the back pages of Ensembl BioMart and Unigene.</p>
            <p>The introduction of Affymetrix expression microarrays in the 2000s immediately created a demand for programming tools, and the R language, with its strong statistical component, was ready for immediate use in the field of expression data analysis. A key element in establishing R as a standard language in the field was the solution to the problem of (microarray) data normalization
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> and subsequent implementation as an R package (for instance Bioconductor preprocessCore
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup>). Publishing the Limma package
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>,
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> was crucial for the success of R in that area; it addressed the problem of the small sample sizes typical of the microarray expression data provided by biologists at that time. From 2003-2005 onward, a clear separation of tasks became visible: Perl focused on sequence analysis, while R covered statistics and differential analysis, including expression microarrays.</p>
        </sec>
        <sec id="sec3">
            <title>Limma package</title>
            <p>Since the first publication of the Limma package by the group of Gordon Smyth,
                <sup>
                    <xref ref-type="bibr" rid="ref16">16</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup> it has been a central and indispensable element of major differential expression protocols in R, and it remained so for at least a decade after its introduction. In the early 2000s, microarrays were expensive and many labs could afford only a limited number of samples to analyze. The core issue resolved by this package was how to obtain credible, statistically validated results despite a small number of samples per group. Suppose you apply a t-test to data with only 2 or 3 replicates per group and repeat the test up to 20000 times. This is equivalent to analyzing expression microarray data containing 20000 genes measured in 2 control and 2 experimental samples. A regular t-test with correction for multiple testing has little chance of success. Limma circumvents this problem in two essential steps. The first fits a linear model to the entire table of data; the second uses empirical Bayes statistics to recalculate probabilities based on the distribution of the expression data for all genes across the expression array.</p>
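            <p>As a rough illustration of the second step, the sketch below shrinks the sample variance of a single gene toward a common prior value and uses the result in a moderated t-statistic for a two-group comparison. The prior variance (s0_sq) and prior degrees of freedom (d0) are supplied by hand here as assumptions; Limma estimates them from all genes on the array, which this sketch does not attempt. The function names and toy values are hypothetical.</p>
            <code language="python"><![CDATA[
```python
from statistics import mean


def pooled_variance(group_a, group_b):
    """Pooled residual variance of two groups and its degrees of freedom."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    df = len(group_a) + len(group_b) - 2
    return ss / df, df


def moderated_t(group_a, group_b, s0_sq, d0):
    """Two-group t-statistic with the gene variance shrunk toward s0_sq.

    s0_sq (prior variance) and d0 (prior degrees of freedom) are assumed
    inputs; in limma both are estimated from the whole table of genes.
    Returns (t, total degrees of freedom).
    """
    s_sq, df = pooled_variance(group_a, group_b)
    # Weighted average of the prior variance and the observed variance.
    s_tilde_sq = (d0 * s0_sq + df * s_sq) / (d0 + df)
    unscaled_se = (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    diff = mean(group_b) - mean(group_a)
    return diff / (s_tilde_sq ** 0.5 * unscaled_se), d0 + df


# Toy log2 intensities for one gene: 2 controls and 2 treated samples.
control = [7.0, 7.2]
treated = [8.0, 8.4]
t_mod, df_mod = moderated_t(control, treated, s0_sq=0.02, d0=4.0)
# With d0 = 0 the prior has no weight and the ordinary t-statistic results.
t_plain, df_plain = moderated_t(control, treated, s0_sq=0.0, d0=0.0)
```
]]></code>
            <p>In this toy case the ordinary statistic has only 2 degrees of freedom, whereas the moderated one borrows 4 more from the assumed prior, which is the mechanism that helps with very small groups.</p>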
            <p>This concept was directly inherited by later protocols for bulk RNAseq analysis with the edgeR package.
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> Namely, the voom function (distributed with Limma and typically used together with edgeR) implements data conversion steps very similar to those of the original Limma package. Another popular protocol in R, namely DESeq2
                <sup>
                    <xref ref-type="bibr" rid="ref20">20</xref></sup> (as well as its predecessor DESeq) uses a similar approach, although it does not directly copy the Limma algorithms. Details can be found in the corresponding package tutorials in Bioconductor.</p>
            <p>No comparable packages were developed in other programming languages. This lack of choice created a monopoly of R protocols for &#x201c;classic&#x201d; gene-expression analysis based on microarray data or bulk RNAseq with a limited number of samples per group.</p>
            <p>Technically, Python allows one to &#x201c;wrap&#x201d; or quote other programming languages within its own scripts. Python can currently &#x201c;quote&#x201d; some lines of JavaScript, especially when the ipynb file format is used. For the R language there is a special wrapper, rpy2,
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup> which can incorporate R functions within Python. In principle, functions from any R package can be wrapped into Python. However, this is not a genuine alternative in another language, and besides, it is not a popular approach in current publications that could be recommended to biologists as a standard protocol. Note that by standard protocol we mean a script suggested by the package developers that can be followed by a user with &#x201c;average&#x201d; programming skills.</p>
            <p>Consequently, for quite a while Python had no usable tools for differential gene expression analysis, especially at the time when expression microarrays and bulk RNAseq data with small sample numbers dominated the literature. Occasionally, one can find reports of unusual options available in Python. For instance, a &#x201c;geometrical approach&#x201d; was suggested some time ago for finding differentially expressed genes,
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup> for which an implementation as a Jupyter notebook is also available.
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> A similar &#x201c;geometric approach&#x201d; is discussed in another publication
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup> (although the latter analysis was performed in R). Inspection of some of those scripts
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> reveals that the &#x201c;geometric&#x201d; approach essentially reproduces fold-change statistics rather than the eBayes probability approach and is thus not recommended.</p>
        </sec>
        <sec id="sec4">
            <title>Why Python?</title>
            <p>Indeed, if R and Perl performed so well, each in its own niche, why do we need Python at all? In fact, with the further evolution of the biological sciences, more biologists realized the need to perform some elementary data analysis themselves. Whereas R remains strong and powerful for professional statisticians, it is also recognized as a difficult language to learn and to comprehend (see, for instance, the introduction in Quick-R, 
                <ext-link ext-link-type="uri" xlink:href="https://www.statmethods.net/">https://www.statmethods.net/</ext-link>). The same is partly true for Perl. Python, by contrast, was originally designed to be a more human-friendly, more transparent, and clearer computer language than Perl and R. A more detailed comparison of the languages can be found on the official Python site (
                <ext-link ext-link-type="uri" xlink:href="https://www.python.org/doc/essays/comparisons/">https://www.python.org/doc/essays/comparisons/</ext-link>). This was gradually recognized by a broad community of users, including scientists and non-scientists of all kinds in universities, secondary education and business. It made Python the most popular computer language in recent years (according to 
                <ext-link ext-link-type="uri" xlink:href="https://pypl.github.io/PYPL.html">https://pypl.github.io/PYPL.html</ext-link> for instance).</p>
            <p>The second useful feature of Python is how functions are organized and stored. In R, each individual contributor writes their own package, so the ecosystem gradually becomes a collection of millions of functions, many of them redundant. Python, by contrast, favors bigger consortia and bigger collections of functions within libraries, with less redundancy in their content (although small packages also exist). Core packages like SciPy and Numpy collect long lists of useful functions for elementary math and statistics. They are universally used as a source of scientific and numeric functions. On top of these, other more dedicated libraries have been developed, such as Scikit-learn (a.k.a. Sklearn) for machine learning applications, Pandas for file and table management, and Statsmodels for various kinds of model fitting. Regarding visualization, most core options are in the Matplotlib library; beyond that, more specific illustrations can be found in Seaborn, others in Bokeh, and so on. Notably, Pandas, Statsmodels and Seaborn are stylistically similar and to some degree resemble R in their outward style. Unfortunately, Sklearn currently offers only limited support for Pandas data frame structures, although this can be worked around via conversion to Numpy arrays.</p>
        </sec>
        <sec id="sec5">
            <title>What is missing in Python for expression microarrays analysis?</title>
            <p>Saying that the entire Limma package is missing in Python is a rather vague statement. It is important to specify exactly what is missing and which parts cannot be replaced by existing alternatives. Typically, a microarray protocol is built in steps, many of which are already available in Python. Expression microarray data are deposited in public databases, the best known being GEO, which also has a built-in tool, GEO2R, with an R script attached
                <sup>
                    <xref ref-type="bibr" rid="ref25">25</xref>
                </sup>; the script begins with a package that fetches the data from the site. Next, the data are converted to a table. The values and their distribution are inspected using histograms, boxplots, and possibly an MDS plot. This shows whether the data are already log2-transformed and normalized (judging by the high or low scale of intensities, and by whether the data look reasonably normally distributed) or still have to be log2-transformed and normalized (equalized) to one another. If required, we add a log2-transform step (available as a core function in R) and quantile normalization (available in Limma and preprocessCore). All of these steps are also available, or can be reproduced, in Python (see 
                <xref ref-type="table" rid="T2">Table 2</xref> for details). Next, we define groups, then the model for our 
                <italic toggle="yes">lmFit</italic> function. This is a sort of 
                <italic toggle="yes">lm</italic> function, as available in core R (with analogues in Python statsmodels), but 
                <italic toggle="yes">lmFit</italic> works on the entire table of data and collects the results for the entire table as well. It is accompanied by a further 
                <italic toggle="yes">contrasts.fit</italic> step, which does much the same for specified groups of data. Next comes the function 
                <italic toggle="yes">eBayes</italic>, which recalculates the statistics obtained from the fit steps above and finally generates Bayes-corrected significance values. This is essentially the heart of Limma, and it is not available in Python in any form. Finally, the 
                <italic toggle="yes">topTable</italic> function assembles the final table of differential expression, familiar to us from our own work and publications. The results can be illustrated further, for instance with a volcano plot or another PCA plot. All those illustrative functions are available in Python as well. To summarize, 
                <italic toggle="yes">lmFit</italic> and 
                <italic toggle="yes">eBayes</italic> are the only critical elements missing in Python that preclude its use for microarray gene expression analysis.</p>
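            <p>To give an idea of how little custom code one of the reproducible steps needs, the following is a minimal sketch of quantile normalization in plain Python. It assumes a table stored as a list of sample columns of equal length and breaks ties by position, which is a simplification compared with established implementations; the function name and the toy values are hypothetical.</p>
            <code language="python"><![CDATA[
```python
def quantile_normalize(table):
    """Quantile-normalize a table given as a list of sample columns.

    Each column holds the expression values of one sample, in the same
    gene order. Ties are broken by position, a deliberate simplification.
    """
    n_genes = len(table[0])
    # Sort every sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in table]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(table) for r in range(n_genes)
    ]
    normalized = []
    for col in table:
        # Gene indices ordered from the lowest to the highest value.
        order = sorted(range(n_genes), key=lambda g: col[g])
        new_col = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Replace each value by the mean of the values at its rank.
            new_col[gene] = rank_means[rank]
        normalized.append(new_col)
    return normalized


# Three toy samples (columns) with four genes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```
]]></code>
            <p>After this step every sample contains the same set of values, so the distributions are identical and only the ranking of genes within each sample is preserved.</p>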
            <table-wrap id="T2" orientation="portrait" position="float">
                <label>Table 2. </label>
                <caption>
                    <title>Steps and functions for differential expression microarrays analysis in R and analogues in Python.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Step</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">R package/function</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Python analogue</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Fetch data from GEO</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">GEOquery (Bioconductor)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">GEOparse</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Visualize data</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">hist(), boxplot()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">plt.hist(), plt.boxplot() (Matplotlib)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Log2 transform</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">log2()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">log2() (math, Numpy)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Quantile normalization</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">normalizeBetweenArrays() (Limma), normalize.quantiles() (preprocessCore)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not directly available; the procedure is described in detail and can be written as custom code</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Model fit</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">lmFit(), contrasts.fit() (Limma)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not directly available, may be made from statsmodels package functions.</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Calculate significance</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">eBayes() (Limma)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Missing</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Generate differential expression table</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">topTable() (Limma)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Missing, but can be written as custom script</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Extra visualizations</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Volcanoplot (Limma), PCA (multiple packages)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Basic plots, plt.scatter() (Matplotlib); PCA, MDS (Scikit-learn)</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn-group content-type="footnotes">
                        <fn id="tfn2">
                            <p>Note: the package providing each function is given in brackets after the function.</p>
                        </fn>
                    </fn-group>
                </table-wrap-foot>
            </table-wrap>
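            <p>The last tabulation step is the easiest to reproduce. The sketch below is one possible custom script, not the Limma implementation: it adjusts p-values by the Benjamini-Hochberg procedure and returns rows sorted by raw p-value. The function names, the row layout and the toy numbers are hypothetical, and the p-values are assumed to come from an earlier significance step.</p>
            <code language="python"><![CDATA[
```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down to keep the result monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def top_table(genes, log_fc, p_values, number=10):
    """Rows of (gene, logFC, p, adjusted p), most significant first."""
    adj = benjamini_hochberg(p_values)
    rows = sorted(zip(genes, log_fc, p_values, adj), key=lambda row: row[2])
    return rows[:number]


# Toy input: four genes with made-up fold changes and p-values.
genes = ["g1", "g2", "g3", "g4"]
log_fc = [1.8, -0.2, 2.5, 0.4]
p_values = [0.01, 0.80, 0.002, 0.04]
table = top_table(genes, log_fc, p_values)
```
]]></code>
            <p>Sorting by raw p-value is a simplification chosen for this sketch; other ranking criteria could be substituted in the sort key.</p>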
        </sec>
        <sec id="sec6">
            <title>What is missing in Python for bulk RNAseq analysis?</title>
            <p>Major R packages for RNAseq differential gene expression analysis use the concepts and functionality implemented in the Limma package directly or indirectly. For instance, the edgeR package, designed for bulk RNAseq differential expression, imports Limma as a dependency and uses elements of it. The basic steps are slightly different, but the outline is very much the same. The first step usually either reads a single file with a trivial read function or reads raw mapped data from a series of separate files and assembles a table from them. The data can be either raw read counts, coming directly from the step of counting sequencing reads per transcript, or counts corrected for transcript length (in RNAseq this is essential for comparing expression levels across different genes). Unlike microarray data, which are the smallest of all expression data sets, primary RNAseq data are much bigger and contain many genes with low or no expression. Consequently, there is a step that removes genes with low read counts. Those genes are useless for differential analysis and only burden the memory. Since different samples in RNAseq can have different read coverage, and also a different number of detected genes (above zero), the whole philosophy of normalization is rather complicated. However, the resulting normalization procedure reduces to the familiar log2-transform step followed by dividing all gene-expression values by so-called normalizing factors. Fortunately, the algorithms for finding normalizing factors are mostly well described, especially for DESeq2 (an outline can be found in Maza, 2016
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup>). Therefore, it is possible to write a custom script in any available language, including Python, that reproduces this normalization step. Once normalization is done, the next important step is estimating data dispersion. The details of this step are rather complicated and beyond the scope of this article. edgeR offers many alternative options for this step. After that, statistical significance is estimated. The resulting differential expression table follows the layout of a 
                <italic toggle="yes">topTable</italic> from Limma. If we inspect the options for Python, we find that, as with microarrays, Python largely lacks functions for dispersion analysis, estimation of fold-change statistics, and statistical significance. Other steps can be replaced by known functions or custom scripts (
                <xref ref-type="table" rid="T3">Table 3</xref>).</p>
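            <p>As an illustration of how the filtering and normalization steps described above could be written as a custom script, the following is a minimal sketch in pandas and NumPy. It assumes a genes-by-samples table of raw counts and uses a median-of-ratios calculation of the kind outlined for DESeq2. The function names, thresholds and toy data are hypothetical, and the sketch is not a re-implementation of calcNormFactors() or of any R package.</p>
            <code language="python"><![CDATA[
```python
import numpy as np
import pandas as pd


def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    The thresholds are illustrative assumptions, not recommended defaults.
    """
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    return counts.loc[keep]


def size_factors(counts):
    """Median-of-ratios normalizing factors, one per sample (column)."""
    # Use only genes detected in every sample so that the logarithm is defined.
    expressed = counts.loc[(counts > 0).all(axis=1)]
    log_counts = np.log(expressed)
    # The per-gene log geometric mean across samples acts as a pseudo-reference.
    log_reference = log_counts.mean(axis=1)
    # Median log-ratio to the reference per sample, back on the linear scale.
    return np.exp(log_counts.sub(log_reference, axis=0).median(axis=0))


def normalize_log2(counts, pseudocount=1.0):
    """Divide each sample by its normalizing factor, then log2-transform."""
    factors = size_factors(counts)
    return np.log2(counts.div(factors, axis=1) + pseudocount)


# Toy genes-by-samples table: s2 is sequenced twice as deep as s1, s3 half as deep.
counts = pd.DataFrame(
    {"s1": [100, 200, 0, 50], "s2": [200, 400, 1, 100], "s3": [50, 100, 0, 25]},
    index=["geneA", "geneB", "geneC", "geneD"],
)

filtered = filter_low_counts(counts)   # geneC is removed as a low-count gene
factors = size_factors(filtered)       # one normalizing factor per sample
log_expr = normalize_log2(filtered)    # normalized log2 expression table
print(factors)
```
]]></code>
            <p>In this toy table the samples differ only in sequencing depth, so after normalization the three columns become identical, which is a convenient sanity check for a script of this kind.</p>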
            <table-wrap id="T3" orientation="portrait" position="float">
                <label>Table 3. </label>
                <caption>
                    <title>Steps and functions for RNAseq DE analysis in edgeR and analogues in Python.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Step</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">R package/function</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Python analogue</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Read the data from file</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">read.csv(), read.table()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">read_csv() (pandas)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Visualize data</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">hist(), boxplot()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">plt.hist(), plt.boxplot() (Matplotlib)</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Convert to special data format</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">DGEList()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not used</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Calculate normalizing factors (normalize and log-transform)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">calcNormFactors()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not directly available; the procedure described for DESeq2 can be written as custom code</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Estimate dispersion</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">estimateDisp() and related functions</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not available</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Calculate significance</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">exactTest()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Not available in this context</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Generate differential expression table</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">topTable() (Limma)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Missing, but can be written as custom script</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Extra visualisations</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">volcanoplot() (Limma), PCA (multiple packages)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Basic plots in Matplotlib, e.g. plt.scatter(); PCA and MDS in scikit-learn</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn-group content-type="footnotes">
                        <fn id="tfn3">
                            <p>Note: the DESeq2 protocol performs the steps from normalization to the differential expression table in one function.</p>
                        </fn>
                    </fn-group>
                </table-wrap-foot>
            </table-wrap>
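            <p>The last table row notes that a differential expression table can be written as a custom script. A minimal sketch of such a script is given below, assuming a genes-by-samples table of normalized log2 expression values. It uses a per-gene Welch t-test from SciPy and a Benjamini-Hochberg adjustment written by hand. This is a plain stand-in and does not reproduce the dispersion modelling of edgeR or the moderated statistics of Limma, so with only a few samples per group its p-values should be treated with caution. The column names merely imitate a topTable layout, and the function names and toy data are hypothetical.</p>
            <code language="python"><![CDATA[
```python
import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def de_table(log_expr, group_a, group_b):
    """Per-gene Welch t-test of group_b versus group_a, sorted by p-value."""
    a = log_expr[group_a].to_numpy()
    b = log_expr[group_b].to_numpy()
    t_stat, p_val = stats.ttest_ind(b, a, axis=1, equal_var=False)
    table = pd.DataFrame(
        {
            "logFC": b.mean(axis=1) - a.mean(axis=1),
            "AveExpr": log_expr[group_a + group_b].mean(axis=1).to_numpy(),
            "t": t_stat,
            "P.Value": p_val,
        },
        index=log_expr.index,
    )
    table["adj.P.Val"] = bh_adjust(table["P.Value"].to_numpy())
    return table.sort_values("P.Value")


# Toy data: 50 genes, 4 control and 4 treated samples; gene0 is shifted upwards.
rng = np.random.default_rng(0)
samples = ["c1", "c2", "c3", "c4", "t1", "t2", "t3", "t4"]
genes = [f"gene{i}" for i in range(50)]
values = rng.normal(8.0, 0.3, size=(50, 8))
values[0, 4:] += 5.0
log_expr = pd.DataFrame(values, index=genes, columns=samples)

result = de_table(log_expr, samples[:4], samples[4:])
print(result.head())
```
]]></code>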
        </sec>
        <sec id="sec7">
            <title>Single cell RNAseq in R</title>
            <p>Since R set the trend for all earlier differential gene expression protocols, it also pioneered protocols for single cell gene expression. Of the many protocols developed so far, the most frequently used are Scater,
                <sup>
                    <xref ref-type="bibr" rid="ref27">27</xref>
                </sup> Scran,
                <sup>
                    <xref ref-type="bibr" rid="ref28">28</xref>
                </sup> Seurat
                <sup>
                    <xref ref-type="bibr" rid="ref29">29</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref30">30</xref>
                </sup> and Monocle. Scater and Scran packages are built on a common data type, SingleCellExperiment,
                <sup>
                    <xref ref-type="bibr" rid="ref31">31</xref>
                </sup> and thus can be combined in one script using the same data type (which is often the case). In contrast, Seurat is built on its own data type and aims to be a self-sufficient package. It is currently a popular choice; it is especially appreciated for good tutorials and colorful illustrations, although integration of Seurat with other tools or packages is limited.</p>
            <p>The single cell protocol for differential gene expression likely originated from bulk RNAseq, but it has diverged from its ancestor in subsequent years. Some steps are still common to both protocols; others differ. For instance, SC-RNAseq acquired a step that checks sample quality and removes poor-quality samples (a sample here being the gene expression profile of one cell). Normalization and log2-transform are carried out in a similar fashion to bulk RNAseq, although normalization became even simpler: samples are usually adjusted to the median read count across the entire data set and in proportion to the detected genes per sample. Next comes the tedious step of identifying groups of cells for differential expression analysis and other characterization. Unlike other differential expression protocols, SC-RNAseq is aimed at characterizing cells, not genes, and at the possible discovery and/or classification of cell types. This part is unique to SC-RNAseq. Differential gene expression is then tested using regular statistical tests (with no particular preference among them). Toward the end of SC-RNAseq protocols, we observe an increasing diversity of options and specific interests.</p>
            <p>It is important to emphasize that while R scripts in general often serve as standard protocols (or are claimed to be standard protocols), this is not really the case for bulk RNAseq and SC-RNAseq protocols. Currently used packages are known to differ substantially in detail, as do the results of their analyses. Therefore, we cannot pinpoint any particular protocol as the standard in the field of differential gene expression analysis in R. This, together with the availability of alternative commercial protocols for differential gene expression, might be an extra source of data irreproducibility in this particular field of research.</p>
        </sec>
        <sec id="sec8">
            <title>Single cell RNAseq in Python</title>
            <p>Unlike an expression microarray or bulk RNAseq experiment, a single cell expression experiment contains many samples (and many samples in each group, if groups are defined). Therefore, the major constraint of the early years, namely the dilemma of small sample numbers, does not apply here. With hundreds of samples per group we can apply regular statistics, which are available in Python and other languages. Consequently, with the introduction and development of single cell differential gene expression analysis, it became possible to assemble the entire protocol from available Python functions. Of course, a dedicated package might facilitate the use and popularity of Python for such analysis. In this regard, it is worth mentioning the release of the very first dedicated package of this sort, namely Scanpy.
                <sup>
                    <xref ref-type="bibr" rid="ref32">32</xref>
                </sup> Scanpy basically follows the sequence of data transformation and analysis used in Seurat. Both provide tutorials based on the same data sources, which makes them especially attractive to use and open to cross-analysis and cross-validation. Hopefully, Scanpy will stimulate developers to start more interesting projects in the field of single cell analysis.</p>
            <p>There is also an alternative: writing specific functions that can be combined with regular tools and functions already available in different Python packages. 
                <xref ref-type="table" rid="T4">Table 4</xref> shows a rough comparison of how a minimal protocol is organized in Seurat, in Scanpy, and when reassembled from scratch.</p>
            <table-wrap id="T4" orientation="portrait" position="float">
                <label>Table 4. </label>
                <caption>
                    <title>Steps and functions for SC-RNAseq DE analysis in Seurat, Scanpy and regular Python.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Step</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Seurat</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Scanpy</th>
                            <th align="left" colspan="1" rowspan="1" valign="middle">Python</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Read the data from file</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">read.csv()
                                <xref ref-type="table-fn" rid="tfn4">*</xref>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">scanpy.read_csv()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">pandas.read_csv()</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Convert to special data format</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">CreateSeuratObject()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Already converted as AnnData</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Keep as pd.DataFrame</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Filter off outliers</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Regular R functions</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">filter_cells(), filter_genes()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Use general pandas functions for subsetting by threshold values</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Normalize and log-transform</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">NormalizeData()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">normalize_total()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">normalize from Sklearn or self-made script</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Remove invariant genes</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">FindVariableFeatures()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">highly_variable_genes()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Use pandas DataFrame filter by 
                                <italic toggle="yes">var</italic> value. Use VarianceThreshold() from Sklearn</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Scale gene expression to zero mean and unit variance</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">ScaleData()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">scale()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">scale() or StandardScaler() in Sklearn</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Run PCA, estimate significant components</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">RunPCA(), JackStraw()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">pca()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Sklearn PCA()</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Find or use predefined clusters</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">FindNeighbors(), FindClusters()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Import leiden, other options possible</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Different options in Sklearn.cluster</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Run tSNE, visualize clusters</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">RunTSNE(), TSNEplot()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Prefers UMAP (as imported package)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">tSNE and other options in sklearn.manifold</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Perform differential expression check</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">FindMarkers(), FindAllMarkers()</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">Built-in options for Wilcoxon, t-test, logistic regression</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">t-test, one-way ANOVA, Wilcoxon, Kruskal-Wallis etc. in scipy.stats; RandomForest, AdaBoost in Sklearn</td>
                        </tr>
                    </tbody>
                </table>
                <table-wrap-foot>
                    <fn-group content-type="footnotes">
                        <fn id="tfn4">
                            <label>*</label>
                            <p>With Seurat, read.csv() is used to read regular tables; Read10X() is used to read the matrix data format.</p>
                        </fn>
                    </fn-group>
                </table-wrap-foot>
            </table-wrap>
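            <p>To illustrate the rightmost column of the table, the following is a minimal sketch of a single cell protocol assembled from pandas, SciPy and scikit-learn functions only. It runs on simulated counts with two cell groups, so the data, thresholds, number of components and choice of KMeans clustering are assumptions made for illustration. It does not reproduce Seurat or Scanpy and should not be taken as a standard protocol.</p>
            <code language="python"><![CDATA[
```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def simulate_counts(n_cells=200, n_genes=300, n_markers=20, seed=0):
    """Hypothetical toy data: two cell groups, marker genes elevated in group 1."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_cells // 2)
    mean = np.full((n_cells, n_genes), 2.0)
    mean[labels == 1, :n_markers] = 12.0
    cells = [f"cell{i}" for i in range(n_cells)]
    genes = [f"gene{j}" for j in range(n_genes)]
    counts = pd.DataFrame(rng.poisson(mean), index=cells, columns=genes)
    return counts, pd.Series(labels, index=cells)


# Rows are cells and columns are genes.
counts, true_labels = simulate_counts()

# 1. Filter: drop cells with few detected genes and genes seen in few cells.
counts = counts.loc[(counts > 0).sum(axis=1) >= 50]
counts = counts.loc[:, (counts > 0).sum(axis=0) >= 3]
true_labels = true_labels.loc[counts.index]

# 2. Normalize each cell to the median library size, then log-transform.
lib_size = counts.sum(axis=1)
log_expr = np.log1p(counts.div(lib_size, axis=0) * lib_size.median())

# 3. Keep the most variable genes for clustering.
top_var = log_expr.var(axis=0).nlargest(100).index

# 4. Scale genes to zero mean and unit variance, then run PCA.
scaled = StandardScaler().fit_transform(log_expr[top_var])
pcs = PCA(n_components=10, random_state=0).fit_transform(scaled)

# 5. Find clusters of cells.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)

# 6. Differential expression between clusters: Mann-Whitney U test per gene.
in_first = clusters == 0
pvals = pd.Series(
    {
        gene: stats.mannwhitneyu(
            log_expr.loc[in_first, gene],
            log_expr.loc[~in_first, gene],
            alternative="two-sided",
        ).pvalue
        for gene in log_expr.columns
    }
)
top_genes = pvals.nsmallest(20).index
print(list(top_genes))
```
]]></code>
            <p>Because the marker genes are known in simulated data, the resulting gene list can be checked directly against them. On real data a multiple-testing correction and a visualization step (for example tSNE from sklearn.manifold) would normally follow.</p>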
            <p>Currently this field is wide open for more examples of Python-based analysis of differential expression in single cells. Some simple examples can be found on GitHub as 
                <italic toggle="yes">Extended data</italic> (which should not be taken as a standard protocol for differential expression). Researchers should not be confused by the fact that different protocols produce different lists of differentially expressed genes. This has already been described for different RNAseq protocols in R, where it is caused, for instance, by differences in normalization
                <sup>
                    <xref ref-type="bibr" rid="ref26">26</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref33">33</xref>
                </sup> or other steps.
                <sup>
                    <xref ref-type="bibr" rid="ref34">34</xref>
                </sup> The differences between those protocols are acceptable, since the steps and functions we use are not identical but only comparable. The major, most prominent differentially expressed genes are usually consistent and not prone to variation when options are changed within or between protocols. In addition, the researcher can use artificial data to check the reproducibility and consistency of the protocols in detail.
                <sup>
                    <xref ref-type="bibr" rid="ref35">35</xref>
                </sup>
            </p>
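            <p>A check on artificial data of the kind mentioned above could look like the following minimal sketch. It simulates counts with a known set of differentially expressed genes, applies two normalization variants (counts per million and a median-of-ratios scaling), and compares the resulting top gene lists with each other and with the known truth. All function names, simulation parameters and the simple ranking by absolute log fold change are assumptions made for illustration.</p>
            <code language="python"><![CDATA[
```python
import numpy as np


def simulate(n_genes=1000, n_per_group=5, n_de=50, fold=8.0, seed=1):
    """Simulated genes-by-samples counts; the first n_de genes are truly DE."""
    rng = np.random.default_rng(seed)
    base = rng.lognormal(mean=5.0, sigma=0.5, size=n_genes)
    mu = np.tile(base[:, None], (1, 2 * n_per_group))
    mu[:n_de, n_per_group:] *= fold
    # Each sample gets its own sequencing depth.
    mu = mu * rng.uniform(0.5, 2.0, size=2 * n_per_group)
    # Negative binomial counts as a gamma-Poisson mixture (dispersion 0.05).
    dispersion = 0.05
    return rng.poisson(rng.gamma(shape=1.0 / dispersion, scale=mu * dispersion))


def log_cpm(counts):
    """Variant 1: counts per million, log2-transformed."""
    return np.log2(counts / counts.sum(axis=0) * 1e6 + 1.0)


def log_median_ratio(counts):
    """Variant 2: median-of-ratios scaling, log2-transformed."""
    detected = (counts > 0).all(axis=1)
    log_counts = np.log(counts[detected])
    log_ratios = log_counts - log_counts.mean(axis=1, keepdims=True)
    factors = np.exp(np.median(log_ratios, axis=0))
    return np.log2(counts / factors + 1.0)


def top_genes(log_expr, n_per_group, k):
    """Indices of the k genes with the largest absolute log fold change."""
    group_a = log_expr[:, :n_per_group].mean(axis=1)
    group_b = log_expr[:, n_per_group:].mean(axis=1)
    return set(np.argsort(-np.abs(group_b - group_a))[:k].tolist())


def jaccard(a, b):
    """Overlap of two gene sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


n_per_group, n_de = 5, 50
counts = simulate(n_per_group=n_per_group, n_de=n_de)
truth = set(range(n_de))

top_cpm = top_genes(log_cpm(counts), n_per_group, k=n_de)
top_mor = top_genes(log_median_ratio(counts), n_per_group, k=n_de)

overlap = jaccard(top_cpm, top_mor)
recall_cpm = len(top_cpm & truth) / n_de
recall_mor = len(top_mor & truth) / n_de
print(f"overlap={overlap:.2f} recall_cpm={recall_cpm:.2f} recall_mor={recall_mor:.2f}")
```
]]></code>
            <p>With a strong simulated effect the two gene lists are expected to agree closely. Lowering the fold change or the number of samples per group in the simulation is one way to see where the variants start to diverge.</p>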
        </sec>
        <sec id="sec9">
            <title>Concluding remarks</title>
            <p>Even though R remains the major language for differential gene expression analysis, a further rise in the popularity of Python in biological applications is expected in the coming years. For single cell expression data, Python offers broad possibilities for data analysis. Moreover, the rise and diversification of single cell protocols will require more programming flexibility, where Python might offer more options than R. This also depends on community efforts among Python developers. We might expect some restructuring of existing packages and the emergence of specialized, dedicated packages for single cell analysis. The time is right for more efforts in Python applications. Regarding flexibility, it is essential to keep all options open for integrating functions from different existing and future packages.</p>
            <p>More active use of Python in biological studies will certainly improve the transparency and reproducibility of currently used protocols for differential gene expression and beyond. It is also satisfying to see biological science make a substantial shift from a descriptive, empirical style toward a more exact and analytical mode.</p>
        </sec>
        <sec id="sec10">
            <title>Data availability</title>
            <sec id="sec11">
                <title>Underlying data</title>
                <p>No data is associated with this article.</p>
            </sec>
            <sec id="sec12">
                <title>Extended data</title>
                <p>Extra information and example scripts are available: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/LeonidBystrykh/PY4DE/tree/main">https://github.com/LeonidBystrykh/PY4DE/tree/main</ext-link>.</p>
                <p>Archived scripts as at time of publication: 
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.5044809">http://doi.org/10.5281/zenodo.5044809</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref36">36</xref>
                    </sup>
                </p>
                <p>License: GPL-2</p>
            </sec>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgments</title>
            <p>Many thanks to David Porubsky for thorough reading and detailed comments.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xuan</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qing</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Next-generation sequencing in the clinic: promises and challenges.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Lett.</italic>
</source>
                    <year>2013</year>;<volume>340</volume>(<issue>2</issue>):<fpage>284</fpage>&#x2013;<lpage>295</lpage>.
                    <pub-id pub-id-type="pmid">23174106</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.canlet.2012.11.025</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5739311</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carrasco-Ramiro</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Peir&#x00f3;-Pastor</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aguado</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Human genomics projects and precision medicine.</article-title>
                    <source>

                        <italic toggle="yes">Gene Ther.</italic>
</source>
                    <year>2017</year>;<volume>24</volume>(<issue>9</issue>):<fpage>551</fpage>&#x2013;<lpage>561</lpage>.
                    <pub-id pub-id-type="pmid">28805797</pub-id>
                    <pub-id pub-id-type="doi">10.1038/gt.2017.77</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ching</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Himmelstein</surname>
                            <given-names>DS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beaulieu-Jones</surname>
                            <given-names>BK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Opportunities and obstacles for deep learning in biology and medicine.</article-title>
                    <source>

                        <italic toggle="yes">J R Soc Interface.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>141</issue>).
                    <pub-id pub-id-type="pmid">29618526</pub-id>
                    <pub-id pub-id-type="doi">10.1098/rsif.2017.0387</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5938574</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bolouri</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>Modeling genomic regulatory networks with big data.</article-title>
                    <source>

                        <italic toggle="yes">Trends Genet.</italic>
</source>
                    <year>2014</year>;<volume>30</volume>(<issue>5</issue>):<fpage>182</fpage>&#x2013;<lpage>191</lpage>.
                    <pub-id pub-id-type="pmid">24630831</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.tig.2014.02.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Roy</surname>
                            <given-names>SS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mukherjee</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chowdhury</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Insights about genome function from spatial organization of the genome.</article-title>
                    <source>

                        <italic toggle="yes">Hum Genomics.</italic>
</source>
                    <year>2018</year>;<volume>12</volume>(<issue>1</issue>):<fpage>8</fpage>.
                    <pub-id pub-id-type="pmid">29458419</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s40246-018-0140-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5819253</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>VJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gentleman</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Orchestrating high-throughput genomic analysis with Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2015</year>;<volume>12</volume>(<issue>2</issue>):<fpage>115</fpage>&#x2013;<lpage>121</lpage>.
                    <pub-id pub-id-type="pmid">25633503</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3252</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4509590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sadowski</surname>
                            <given-names>MI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grant</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fell</surname>
                            <given-names>TS</given-names>
                        </name>
</person-group>:
                    <article-title>Harnessing QbD, Programming Languages, and Automation for Reproducible Biology.</article-title>
                    <source>

                        <italic toggle="yes">Trends Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>3</issue>):<fpage>214</fpage>&#x2013;<lpage>227</lpage>.
                    <pub-id pub-id-type="pmid">26708960</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.tibtech.2015.11.006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Madsen</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goni Moreno</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Palchick</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Synthetic Biology Open Language Visual (SBOL Visual) Version 2.</article-title>
                    <source>

                        <italic toggle="yes">J Integr Bioinform.</italic>
</source>
                    <year>2019</year>;<volume>16</volume>(<issue>2</issue>).
                    <pub-id pub-id-type="pmid">31199768</pub-id>
                    <pub-id pub-id-type="doi">10.1515/jib-2018-0101</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6798824</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>K&#x00f6;ster</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rahmann</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Snakemake&#x2013;a scalable bioinformatics workflow engine.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2012</year>;<volume>28</volume>(<issue>19</issue>):<fpage>2520</fpage>&#x2013;<lpage>2522</lpage>.
                    <pub-id pub-id-type="pmid">22908215</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts480</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gr&#x00fc;ning</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dale</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sj&#x00f6;din</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Bioconda: sustainable and comprehensive software distribution for the life sciences.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>7</issue>):<fpage>475</fpage>&#x2013;<lpage>476</lpage>.
                    <pub-id pub-id-type="pmid">29967506</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-018-0046-7</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Szustakowski</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schinke</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Bioinformatics analysis of microarray data.</article-title>
                    <source>

                        <italic toggle="yes">Methods Mol Biol.</italic>
</source>
                    <year>2009</year>;<volume>573</volume>:<fpage>259</fpage>&#x2013;<lpage>284</lpage>.
                    <pub-id pub-id-type="pmid">19763933</pub-id>
                    <pub-id pub-id-type="doi">10.1007/978-1-60761-247-6_15</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fourment</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gillings</surname>
                            <given-names>MR</given-names>
                        </name>
</person-group>:
                    <article-title>A comparison of common programming languages used in bioinformatics.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2008</year>;<volume>9</volume>:<fpage>82</fpage>.
                    <pub-id pub-id-type="pmid">18251993</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-9-82</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2267699</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bolstad</surname>
                            <given-names>BM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Irizarry</surname>
                            <given-names>RA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x00c5;strand</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2003</year>;<volume>19</volume>(<issue>2</issue>):<fpage>185</fpage>&#x2013;<lpage>193</lpage>.
                    <pub-id pub-id-type="pmid">12538238</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/19.2.185</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Irizarry</surname>
                            <given-names>RA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hobbs</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Collin</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exploration, normalization, and summaries of high density oligonucleotide array probe level data.</article-title>
                    <source>

                        <italic toggle="yes">Biostatistics.</italic>
</source>
                    <year>2003</year>;<volume>4</volume>(<issue>2</issue>):<fpage>249</fpage>&#x2013;<lpage>264</lpage>.
                    <pub-id pub-id-type="pmid">12925520</pub-id>
                    <pub-id pub-id-type="doi">10.1093/biostatistics/4.2.249</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bolstad</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <source>

                        <italic toggle="yes">Bmbolstad/PreprocessCore.</italic>
</source>
                    <year>2021</year>. Accessed April 2021.
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/bmbolstad/preprocessCore">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
</person-group>:
                    <article-title>Linear models and empirical bayes methods for assessing differential expression in microarray experiments.</article-title>
                    <source>

                        <italic toggle="yes">Stat Appl Genet Mol Biol.</italic>
</source>
                    <year>2004</year>;<volume>3</volume>:<fpage>Article3</fpage>.
                    <pub-id pub-id-type="pmid">16646809</pub-id>
                    <pub-id pub-id-type="doi">10.2202/1544-6115.1027</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wettenhall</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
</person-group>:
                    <article-title>limmaGUI: A graphical user interface for linear modeling of microarray data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2004</year>;<volume>20</volume>(<issue>18</issue>):<fpage>3705</fpage>&#x2013;<lpage>3706</lpage>.
                    <pub-id pub-id-type="pmid">15297296</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bth449</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ritchie</surname>
                            <given-names>ME</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Phipson</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>limma powers differential expression analyses for RNA-sequencing and microarray studies.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2015</year>;<volume>43</volume>(<issue>7</issue>):<fpage>e47</fpage>.
                    <pub-id pub-id-type="pmid">25605792</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
</person-group>:
                    <article-title>edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2010</year>;<volume>26</volume>(<issue>1</issue>):<fpage>139</fpage>&#x2013;<lpage>140</lpage>.
                    <pub-id pub-id-type="pmid">19910308</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp616</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2796818</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Love</surname>
                            <given-names>MI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>12</issue>):<fpage>550</fpage>.
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gautier</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2010</year>;<volume>11</volume>(<issue>12</issue>):<fpage>S11</fpage>.
                    <pub-id pub-id-type="pmid">21210978</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-11-S12-S11</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3040525</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Clark</surname>
                            <given-names>NR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>KS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Feldmann</surname>
                            <given-names>AS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The characteristic direction: a geometrical approach to identify differentially expressed genes.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>1</issue>):<fpage>79</fpage>.
                    <pub-id pub-id-type="pmid">24650281</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-15-79</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4000056</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ma&#x2019;ayan</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2016</year>;<volume>5</volume>:<fpage>1574</fpage>.
                    <pub-id pub-id-type="pmid">27583132</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.9110.1</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4972086</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tambonis</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Boareto</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Leite</surname>
                            <given-names>VBP</given-names>
                        </name>
</person-group>:
                    <article-title>Differential Expression Analysis in RNA-seq Data Using a Geometric Approach.</article-title>
                    <source>

                        <italic toggle="yes">J Comput Biol.</italic>
</source>
                    <year>2018</year>;<volume>25</volume>(<issue>11</issue>):<fpage>1257</fpage>&#x2013;<lpage>1265</lpage>.
                    <pub-id pub-id-type="pmid">30133310</pub-id>
                    <pub-id pub-id-type="doi">10.1089/cmb.2017.0244</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Barrett</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilhite</surname>
                            <given-names>SE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ledoux</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NCBI GEO: archive for functional genomics data sets&#x2013;update.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2013</year>;<volume>41</volume>(<issue>Database issue</issue>):<fpage>D991</fpage>&#x2013;<lpage>D995</lpage>.
                    <pub-id pub-id-type="pmid">23193258</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gks1193</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3531084</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Maza</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>In Papyro Comparison of TMM (edgeR), RLE (DESeq2), and MRN Normalization Methods for a Simple Two-Conditions-Without-Replicates RNA-Seq Experimental Design.</article-title>
                    <source>

                        <italic toggle="yes">Front Genet.</italic>
</source>
                    <year>2016</year>;<volume>7</volume>:<fpage>164</fpage>.
                    <pub-id pub-id-type="pmid">27695478</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fgene.2016.00164</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5025571</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campbell</surname>
                            <given-names>KR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>ATL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>(<issue>8</issue>):<fpage>1179</fpage>&#x2013;<lpage>1186</lpage>.
                    <pub-id pub-id-type="pmid">28088763</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw777</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5408845</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>ATL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marioni</surname>
                            <given-names>JC</given-names>
                        </name>
</person-group>:
                    <article-title>A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2016</year>;<volume>5</volume>:<fpage>2122</fpage>.
                    <pub-id pub-id-type="pmid">27909575</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.9501.2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5112579</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Satija</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Farrell</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gennert</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Spatial reconstruction of single-cell gene expression data.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2015</year>;<volume>33</volume>(<issue>5</issue>):<fpage>495</fpage>&#x2013;<lpage>502</lpage>.
                    <pub-id pub-id-type="pmid">25867923</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3192</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4430369</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hao</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hao</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andersen-Nissen</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrated analysis of multimodal single-cell data.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <month>October</month> <day>12</day>, <year>2020</year>:<fpage>2020.10.12.335331</fpage>.
                    <pub-id pub-id-type="doi">10.1101/2020.10.12.335331</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Amezquita</surname>
                            <given-names>RA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>ATL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Becht</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Orchestrating single-cell analysis with Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2020</year>;<volume>17</volume>(<issue>2</issue>):<fpage>137</fpage>&#x2013;<lpage>145</lpage>.
                    <pub-id pub-id-type="pmid">31792435</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-019-0654-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7358058</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wolf</surname>
                            <given-names>FA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Angerer</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Theis</surname>
                            <given-names>FJ</given-names>
                        </name>
</person-group>:
                    <article-title>SCANPY: large-scale single-cell gene expression data analysis.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2018</year>;<volume>19</volume>(<issue>1</issue>):<fpage>15</fpage>.
                    <pub-id pub-id-type="pmid">29409532</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-017-1382-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5802054</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zyprych-Walczak</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Szabelska</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Handschuh</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Impact of Normalization Methods on RNA-Seq Data Analysis.</article-title>
                    <source>

                        <italic toggle="yes">Biomed Res Int.</italic>
</source>
                    <year>2015</year>;<volume>2015</volume>:<fpage>621690</fpage>.
                    <pub-id pub-id-type="pmid">26176014</pub-id>
                    <pub-id pub-id-type="doi">10.1155/2015/621690</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4484837</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schurch</surname>
                            <given-names>NJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schofield</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gierli&#x0144;ski</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?</article-title>
                    <source>

                        <italic toggle="yes">RNA.</italic>
</source>
                    <year>2016</year>;<volume>22</volume>(<issue>6</issue>):<fpage>839</fpage>&#x2013;<lpage>851</lpage>.
                    <pub-id pub-id-type="pmid">27022035</pub-id>
                    <pub-id pub-id-type="doi">10.1261/rna.053959.115</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4878611</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rigaill</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Balzergue</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brunaud</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.</article-title>
                    <source>

                        <italic toggle="yes">Brief Bioinform.</italic>
</source>
                    <year>2018</year>;<volume>19</volume>(<issue>1</issue>):<fpage>65</fpage>&#x2013;<lpage>76</lpage>.
                    <pub-id pub-id-type="pmid">27742662</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbw092</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>36</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bystrykh</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>LeonidBystrykh/PY4GE: Python for gene expression (Version v0.0.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2021, June 30</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.5044809</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report136994">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.57265.r136994</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Hasija</surname>
                        <given-names>Yasha</given-names>
                    </name>
                    <xref ref-type="aff" rid="r136994a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0116-0711</uri>
                </contrib>
                <aff id="r136994a1">
                    <label>1</label>Delhi Technological University, Delhi, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>5</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Hasija Y</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport136994" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.53842.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The article "Python for gene expression" discusses the applicability of Python and R for gene expression data analysis. It begins with a brief history of several programming languages and then discusses their compatibility with biological problems/data. The author then describes the advantages of R packages for the processing and statistical analysis of big expression data, as well as their possible replacements in Python. The article concludes that the Python programming language is widely used in biological data processing and that the scientific community should consider adopting it.</p>
            <p> </p>
            <p> The piece is well-written and effectively conveys its intended message. A few of my recommendations are: 
                <list list-type="bullet">
                    <list-item>
                        <p>The sections on microarray data, RNAseq data, and SC-RNAseq data analysis describe the application of R packages and the limitations of Python due to the absence of a few libraries. It would be interesting to list a few advantages of using Python for bulk data processing.</p>
                    </list-item>
                    <list-item>
                        <p>The advantages of Python over R in terms of automation, integration, and application development could also be included.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the topic of the opinion article discussed accurately in the context of the current literature?</p>
            <p>Yes</p>
            <p>Are arguments sufficiently supported by evidence from the published literature?</p>
            <p>Yes</p>
            <p>Are all factual statements correct and adequately supported by citations?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn balanced and justified on the basis of the presented arguments?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, Machine Learning, Polymorphisms</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report136995">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.57265.r136995</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Peignier</surname>
                        <given-names>Sergio</given-names>
                    </name>
                    <xref ref-type="aff" rid="r136995a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9004-3033</uri>
                </contrib>
                <aff id="r136995a1">
                    <label>1</label>Universit&#x00e9; de Lyon, INSA Lyon, INRA, Villeurbanne, France</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>17</day>
                <month>5</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Peignier S</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport136995" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.53842.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Dear Leonid Bystrykh,</p>
            <p> </p>
            <p> The opinion article "Python for gene expression" is well written and clear. It provides an interesting historical and contextual description of, and explanation for, the dominance of R in differential gene expression analysis, and it clearly points out the interest and benefits of developing Python projects dedicated to differential gene expression analysis.</p>
            <p> </p>
            <p> I hope the following remarks will be useful to improve this interesting paper.</p>
            <p> </p>
            <p> Kind regards,</p>
            <p> </p>
            <p> Sergio Peignier 
                <list list-type="bullet">
                    <list-item>
                        <p>Maybe you can show from the title that the paper is mostly oriented towards differential gene expression analysis (e.g., "Python for differential gene expression").</p>
                    </list-item>
                    <list-item>
                        <p>In Table 1 you could rename the column "Name" to "Main library", or something similar, to be more explicit.</p>
                    </list-item>
                    <list-item>
                        <p>"such as R and Python.
                            <sup>
                                <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/10-870/v1#ref6">6</ext-link>-
                                <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/10-870/v1#ref8">8</ext-link>
                            </sup>" &lt;- keeping the citations for R and Python separate might give the reader better insight.</p>
                    </list-item>
                    <list-item>
                        <p>"approaching 400 (compiled by Wikipedia)" &lt;- consider adding a citation.</p>
                    </list-item>
                    <list-item>
                        <p>"decline in popularity (as for instance recorded in codementor.io site for the worst programming languages)" &lt;- consider adding a citation.</p>
                    </list-item>
                    <list-item>
                        <p>"in e establishing" &lt;- "in establishing".</p>
                    </list-item>
                    <list-item>
                        <p>"One is using a linear model [...] the expression array." &lt;- consider adding a citation to the paper.</p>
                    </list-item>
                    <list-item>
                        <p>"Potentially, there is a possibility to wrap R-functions from any R package into Python." &lt;- some wrapped versions of DESeq2 are available, e.g.,&#x00a0;
                            <ext-link ext-link-type="uri" xlink:href="https://hal.archives-ouvertes.fr/hal-02863880/document">GReNaDIne: Data-Driven Approaches to Infer Gene Regulatory Networks in Python</ext-link> (
                            <ext-link ext-link-type="uri" xlink:href="https://gitlab.com/bf2i/grenadine">gitlab link</ext-link>).</p>
                    </list-item>
                    <list-item>
                        <p>"peculiar options available in Python" &lt;- maybe "specific" instead of "peculiar"?</p>
                    </list-item>
                    <list-item>
                        <p>"rehearses a fold change statistics " &lt;- "rehearses fold change statistics".</p>
                    </list-item>
                    <list-item>
                        <p>The following sentence could be clarified and a justification or citation to support it could be incorporated: "approach rehearses a fold change statistics rather than eBayes probability approach and thus is not recommended."</p>
                    </list-item>
                    <list-item>
                        <p>Regarding the comparison between Python and other languages, the website cited by the author (<ext-link ext-link-type="uri" xlink:href="https://www.python.org/doc/essays/comparisons/">https://www.python.org/doc/essays/comparisons/</ext-link>) states: "<bold>Disclaimer:</bold>&#x00a0;This essay was written sometime in 1997. It shows its age. It is retained here merely as a historical artifact.", so a more recent citation could be included instead. Moreover, I think that the comparison between Python and R could be extended to better support the idea that developing this research field in Python would be valuable.</p>
                    </list-item>
                    <list-item>
                        <p>Include citations for SciPy, NumPy, Scikit-learn, Pandas, statsmodels, Matplotlib, Bokeh and seaborn.</p>
                    </list-item>
                    <list-item>
                        <p>"does not support Pandas" &lt;- I would replace this with "does not fully support Pandas", since some operations can be executed on Pandas DataFrames, but the output is always a NumPy array.</p>
                    </list-item>
                    <list-item>
                        <p>There are also classical methods for RNAseq normalization, such as TPM and RPKM, that are not mentioned in the article. What is the place of such techniques in this context?</p>
                    </list-item>
                    <list-item>
                        <p>"Unlike other differential expression protocols, SC-RNAseq is aimed on characterization of cells, not genes, and possible discovery and/or classification of existing cell types" &lt;- these datasets can also be used to study genes, and especially to infer gene regulatory networks
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-136995-1">1</xref>
                            </sup>.</p>
                    </list-item>
                    <list-item>
                        <p>SC-RNAseq data also suffer from a missing-values problem that should be addressed by pre-processing techniques; it could be interesting to discuss this problem.</p>
                    </list-item>
                    <list-item>
                        <p>You could include a few citations to recent Python programs dedicated to the analysis of gene expression, to support the idea that there is a community in computational biology and bioinformatics working in Python.</p>
                    </list-item>
                    <list-item>
                        <p>The test scripts associated with the paper could be turned into small tutorials, which could be very beneficial for the community.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the topic of the opinion article discussed accurately in the context of the current literature?</p>
            <p>Yes</p>
            <p>Are arguments sufficiently supported by evidence from the published literature?</p>
            <p>Yes</p>
            <p>Are all factual statements correct and adequately supported by citations?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn balanced and justified on the basis of the presented arguments?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Gene regulatory network inference, gene expression analysis, hyperspectral image analysis, subspace clustering</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-136995-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>A scalable SCENIC workflow for single-cell gene regulatory network analysis.</article-title>
                        <source>
                            <italic>Nat Protoc</italic>
                        </source>.<volume>15</volume>(<issue>7</issue>):
                        <fpage>2247</fpage>&#x2013;<lpage>2276</lpage>.
                        <pub-id pub-id-type="pmid">32561888</pub-id>
                        <pub-id pub-id-type="doi">10.1038/s41596-020-0336-2</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
</article>
