<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.129929.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Disease association and comparative genomics of compositional bias in human proteins</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kouros</surname>
                        <given-names>Christos E.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <uri content-type="orcid">https://orcid.org/0009-0005-8678-0451</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Makri</surname>
                        <given-names>Vasiliki</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ouzounis</surname>
                        <given-names>Christos A.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0086-8657</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Chasapi</surname>
                        <given-names>Anastasia</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1986-5007</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece</aff>
                <aff id="a2">
                    <label>2</label>BCPL, Chemical Process &amp; Energy Resources Institute, Centre for Research &amp; Technology Hellas (CERTH), Thessaloniki, Greece</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:chasapia@gmail.com">chasapia@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>2</month>
                <year>2023</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2023</year>
            </pub-date>
            <volume>12</volume>
            <elocation-id>198</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>2</day>
                    <month>2</month>
                    <year>2023</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Kouros CE et al.</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/12-198/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution.</p>
                <p>
                    <bold>Methods:</bold> The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The UniProt Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the human genome, including 100 characteristic, compositionally biased proteins, some linked to disease.</p>
                <p>
                    <bold>Results:</bold> Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns.</p>
                <p>
                    <bold>Conclusions:</bold> This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>disease-associated gene</kwd>
                <kwd>low complexity</kwd>
                <kwd>compositional bias</kwd>
                <kwd>intrinsically disordered protein (IDP)</kwd>
                <kwd>intrinsically disordered region (IDR)</kwd>
                <kwd>phylogenetic profile</kwd>
                <kwd>human genome</kwd>
                <kwd>human disease</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100004895">
                    <funding-source>European Social Fund</funding-source>
                    <award-id>MIS-5033021</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/501100008530">
                    <funding-source>European Regional Development Fund</funding-source>
                    <award-id>MIS5002780</award-id>
                    <award-id>NSRF2014-2020</award-id>
                </award-group>
                <funding-statement>This research was co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme &#x00ab;Human Resources Development, Education and Lifelong Learning&#x00bb; in the context of the project &#x201c;Reinforcement of Postdoctoral Researchers - 2nd Cycle&#x201d; (MIS-5033021), implemented by the State Scholarships Foundation (IKY). The work was also supported by Elixir-GR (grant # MIS 5002780), implemented under the Action &#x201c;Reinforcement of the Research &amp; Innovation Infrastructure,&#x201d; funded by the Operational Program Competitiveness, Entrepreneurship, &amp; Innovation (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).</funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <sec id="sec2">
                <title>Disordered proteins exhibit specific patterns at the sequence level</title>
                <p>The classical view that protein function requires a defined three-dimensional (3D) structure has been challenged by recent developments where many proteins and protein regions are shown to perform distinct biological functions, despite their propensity for disordered conformations. These 
                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/intrinsically-disordered-proteins">intrinsically disordered proteins</ext-link> (IDPs) and intrinsically disordered protein regions (IDRs) are defined as lacking a precise 3D folding pattern. The difference between ordered and disordered proteins is already reflected at the primary structure level, with IDPs being characterized by regions typically enriched in specific amino acids, resulting in an overall low sequence complexity. Specifically, IDPs and IDRs contain substantially fewer residues that promote order (typically C, F, I, L, N, V, W and Y) and are substantially enriched in residues that promote disorder (typically A, E, G, K, P, Q, R, and S) (
                    <xref ref-type="bibr" rid="ref57">Williams 
                        <italic toggle="yes">et al.</italic>, 2000</xref>; 
                    <xref ref-type="bibr" rid="ref16">Dunker 
                        <italic toggle="yes">et al.</italic>, 2001</xref>; 
                    <xref ref-type="bibr" rid="ref18">Harbi 
                        <italic toggle="yes">et al.</italic>, 2011</xref>).</p>
            </sec>
            <sec id="sec3">
                <title>IDP/IDR identification</title>
                <p>With the increasing number of predicted and experimentally validated IDPs and proteins containing IDRs, disordered proteins and regions are no longer considered exceptions, but rather an object of extensive study with regard to their structure and function. A wide range of disorder predictors has been successfully developed over the past years, adopting different approaches such as compositional bias detection (
                    <xref ref-type="bibr" rid="ref19">Harrison, 2021</xref>; 
                    <xref ref-type="bibr" rid="ref41">Promponas 
                        <italic toggle="yes">et al.</italic>, 2000</xref>; 
                    <xref ref-type="bibr" rid="ref58">Wootton &amp; Federhen, 1993</xref>), residual energy-based disorder prediction (
                    <xref ref-type="bibr" rid="ref15">Dosztanyi 
                        <italic toggle="yes">et al.</italic>, 2009</xref>, 
                    <xref ref-type="bibr" rid="ref14">2005</xref>) and others (
                    <xref ref-type="bibr" rid="ref28">Linding 
                        <italic toggle="yes">et al.</italic>, 2003</xref>; 
                    <xref ref-type="bibr" rid="ref49">Tang 
                        <italic toggle="yes">et al.</italic>, 2021</xref>; 
                    <xref ref-type="bibr" rid="ref54">Walsh 
                        <italic toggle="yes">et al.</italic>, 2012</xref>; 
                    <xref ref-type="bibr" rid="ref55">Wang 
                        <italic toggle="yes">et al.</italic>, 2016</xref>; 
                    <xref ref-type="bibr" rid="ref63">Zhang 
                        <italic toggle="yes">et al.</italic>, 2012</xref>). Integrative tools have made their appearance, such as MobiDB-lite (
                    <xref ref-type="bibr" rid="ref34">Necci 
                        <italic toggle="yes">et al.</italic>, 2017</xref>), a data fusion tool making use of eight distinct predictors. The prediction accuracy of such tools varies greatly, with deep learning-based methods typically outperforming methods based on physicochemical characteristics (
                    <xref ref-type="bibr" rid="ref8">CAID Predictors 
                        <italic toggle="yes">et al.</italic>, 2021</xref>). DisProt, a manually curated, dedicated database for IDPs (
                    <xref ref-type="bibr" rid="ref45">Sickmeier 
                        <italic toggle="yes">et al.</italic>, 2007</xref>) has developed into the main resource for IDP/IDR information (
                    <xref ref-type="bibr" rid="ref20">Hatos 
                        <italic toggle="yes">et al.</italic>, 2019</xref>; 
                    <xref ref-type="bibr" rid="ref43">Quaglia 
                        <italic toggle="yes">et al.</italic>, 2022</xref>).</p>
            </sec>
            <sec id="sec4">
                <title>IDPs, phylogeny and disease</title>
                <p>Multiple computational and experimental analyses of a wide range of species at the genome level have established widespread presence of intrinsic disorder across the tree of life (
                    <xref ref-type="bibr" rid="ref20">Hatos 
                        <italic toggle="yes">et al.</italic>, 2019</xref>; 
                    <xref ref-type="bibr" rid="ref36">Ntountoumi 
                        <italic toggle="yes">et al.</italic>, 2019</xref>; 
                    <xref ref-type="bibr" rid="ref39">Peng 
                        <italic toggle="yes">et al.</italic>, 2015</xref>; 
                    <xref ref-type="bibr" rid="ref56">Ward 
                        <italic toggle="yes">et al.</italic>, 2004</xref>). In fact, proteins at all taxonomic levels, including viruses, exhibit noticeable intrinsic disorder that apparently increases with organism complexity. Disorder presence is particularly prominent in eukaryotes, in which at least half of their genome-encoded proteins possess long IDRs (
                    <xref ref-type="bibr" rid="ref1">Ahrens 
                        <italic toggle="yes">et al.</italic>, 2017</xref>; 
                    <xref ref-type="bibr" rid="ref2">Basile 
                        <italic toggle="yes">et al.</italic>, 2019</xref>; 
                    <xref ref-type="bibr" rid="ref39">Peng 
                        <italic toggle="yes">et al.</italic>, 2015</xref>; 
                    <xref ref-type="bibr" rid="ref56">Ward 
                        <italic toggle="yes">et al.</italic>, 2004</xref>; 
                    <xref ref-type="bibr" rid="ref60">Xue 
                        <italic toggle="yes">et al.</italic>, 2012</xref>). This high prevalence of IDPs and IDRs in eukaryotes indicates that key functions, such as cell signalling and regulation, are transiently associated with intrinsic disorder in nucleated cells (
                    <xref ref-type="bibr" rid="ref7">B&#x00fc;rgi 
                        <italic toggle="yes">et al.</italic>, 2016</xref>; 
                    <xref ref-type="bibr" rid="ref50">Tantos 
                        <italic toggle="yes">et al.</italic>, 2012</xref>).</p>
                <p>The same trend holds for an ever-increasing emergence of disease-associated genes in more recent speciation events (
                    <xref ref-type="bibr" rid="ref13">Dickerson &amp; Robertson, 2012</xref>; 
                    <xref ref-type="bibr" rid="ref29">Lopez-Bigas &amp; Ouzounis, 2004</xref>), raising the question whether specific residues can be directly implicated in particular diseases. A correlation between intrinsic disorder and various human diseases such as cancer, diabetes, amyloidosis, and neurodegenerative diseases has already been established in specific cases (
                    <xref ref-type="bibr" rid="ref11">Choudhary 
                        <italic toggle="yes">et al.</italic>, 2022</xref>; 
                    <xref ref-type="bibr" rid="ref32">Monti 
                        <italic toggle="yes">et al.</italic>, 2021</xref>, 
                    <xref ref-type="bibr" rid="ref33">2022</xref>), and is emerging as a significant biomedical research endeavour.</p>
                <p>Due to a lack of structural constraints, the evolutionary rate of disordered proteins varies, with some IDPs/IDRs being highly conserved while others appear particularly diversified (
                    <xref ref-type="bibr" rid="ref4">Brown 
                        <italic toggle="yes">et al.</italic>, 2011</xref>; 
                    <xref ref-type="bibr" rid="ref26">Khan 
                        <italic toggle="yes">et al.</italic>, 2015</xref>; 
                    <xref ref-type="bibr" rid="ref59">Xue 
                        <italic toggle="yes">et al.</italic>, 2013</xref>). So far, few studies have investigated the phylogenetic profiling of IDRs in conjunction with human disease (
                    <xref ref-type="bibr" rid="ref38">Pajkos 
                        <italic toggle="yes">et al.</italic>, 2020</xref>). To address this gap, we use a curated list of 100 annotated proteins from the human genome with well-characterised compositionally biased regions (CBRs) (
                    <xref ref-type="bibr" rid="ref31">Mier 
                        <italic toggle="yes">et al.</italic>, 2020</xref>), as a first step for the comparative genomics of compositionally biased genes, some of which are in fact disease associated. We identify those instances known to be linked with human disease and assess their phylogenetic depth. This framework, with human queries against multiple species, forms the basis for follow-up, large-scale studies that would encompass the entire human genome and all known gene-disease associations.</p>
            </sec>
        </sec>
        <sec id="sec5" sec-type="methods">
            <title>Methods</title>
            <sec id="sec6">
                <title>Data compilation</title>
                <p>The Human Genome protein set recorded in the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ensembl.org/index.html">Ensembl</ext-link> database (GRCh38.p13) was retrieved, containing ~119K gene transcripts to be used as reference (
                    <xref ref-type="bibr" rid="ref61">Yates 
                        <italic toggle="yes">et al.</italic>, 2016</xref>).</p>
                <p>For the disease mapping on gene transcripts, the 
                    <ext-link ext-link-type="uri" xlink:href="https://diseases.jensenlab.org/Search">DISEASES</ext-link> database was chosen, which integrates disease-gene associations derived from text mining, as well as manually curated disease&#x2013;gene associations, cancer mutation data, and genome-wide association studies from existing databases (
                    <xref ref-type="bibr" rid="ref40">Pletscher-Frankild 
                        <italic toggle="yes">et al.</italic>, 2015</xref>). Specifically, the &#x201c;Knowledge channel&#x201d; was selected, containing manually curated associations from GHR (
                    <xref ref-type="bibr" rid="ref27">Koos &amp; Bassett, 2018</xref>) and UniProtKB (
                    <xref ref-type="bibr" rid="ref51">The UniProt Consortium 
                        <italic toggle="yes">et al.</italic>, 2022</xref>), for a total of 7269 high-confidence disease-gene associations.</p>
                <p>Disease associations are provided with the use of 
                    <ext-link ext-link-type="uri" xlink:href="https://disease-ontology.org/">Disease Ontology identifiers</ext-link> (DOID) (
                    <xref ref-type="bibr" rid="ref44">Schriml 
                        <italic toggle="yes">et al.</italic>, 2019</xref>). For each entry of the Ensembl dataset, DOIDs were mapped from the DISEASES knowledge channel dataset and added to the header description of the corresponding gene transcript.</p>
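                <p>The header annotation step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the column order of the association table, the gene_symbol: and doid: header fields, and all identifiers in the toy data are assumptions made for illustration only.</p>
                <code language="python"><![CDATA[
```python
import csv
import io
from collections import defaultdict


def load_doid_map(tsv_handle):
    """Collect DOIDs per gene symbol from a tab-separated association table.

    Assumed (hypothetical) column order: protein id, gene symbol, DOID, ...
    """
    doids = defaultdict(set)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 3 or not row[2].startswith("DOID:"):
            continue  # skip malformed rows and non-DOID disease identifiers
        doids[row[1]].add(row[2])
    return doids


def annotate_header(header, doids):
    """Append a doid: field to a FASTA header that carries a gene_symbol: field."""
    for token in header.split():
        if token.startswith("gene_symbol:"):
            hits = sorted(doids.get(token.split(":", 1)[1], ()))
            if hits:
                return header + " doid:" + ",".join(hits)
    return header  # no association found: header is left unchanged


# Toy association table and headers (synthetic, not real data).
table = io.StringIO(
    "ENSP_A\tGENE1\tDOID:0000001\tdisease one\tUniProtKB\tCURATED\t5\n"
    "ENSP_A\tGENE1\tDOID:0000002\tdisease two\tGHR\tCURATED\t5\n"
)
doid_map = load_doid_map(table)
annotated = annotate_header(">ENSP_A pep gene_symbol:GENE1", doid_map)
plain = annotate_header(">ENSP_B pep gene_symbol:GENE2", doid_map)
```
]]></code>
                <p>Keeping the disease identifiers in the header means that every downstream file derived from the FASTA collection carries the disease annotation without a separate lookup table.</p>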
                <p>For phylogenetic analysis, the 
                    <ext-link ext-link-type="uri" xlink:href="https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/">UniProt Reference Proteome</ext-link> (URP) dataset was selected, containing a total of 11297 proteomes, excluding viruses. The URP set has been selected manually and algorithmically among all proteomes, to provide broad coverage of the tree of life, representing the taxonomic diversity found within UniProtKB and including the proteomes of well-studied model organisms and other proteomes of interest for biomedical research (
                    <xref ref-type="bibr" rid="ref10">Chen 
                        <italic toggle="yes">et al.</italic>, 2011</xref>). Specifically, the URP (version: Reference_Proteomes_2022_04) contains 349 Archaeal, 8763 Bacterial and 2185 Eukaryotic proteomes.</p>
                <p>The low complexity query set, investigated for both its disease association and phylogenetic depth, was previously reported (
                    <xref ref-type="bibr" rid="ref31">Mier 
                        <italic toggle="yes">et al.</italic>, 2020</xref>) and contains 100 human proteins with characteristic compositional bias.</p>
            </sec>
            <sec id="sec7">
                <title>Data transformation</title>
                <p>The computational pipeline 
                    <bold>cogent_utils</bold>, part of the CGG toolkit v1.0.1 (Vasileiou et al., submitted), was used to create a CoGenT-style sequence collection (
                    <xref ref-type="bibr" rid="ref22">Janssen 
                        <italic toggle="yes">et al.</italic>, 2003</xref>) from Ensembl GRCh38.p13 as well as the URP. This identifier encoding scheme was selected as robust and convenient for both human interpretation and programmatic processing. Specifically, cogent_utils enables header modification for all entries of FASTA sequence files, based on user-defined criteria. Below we present the example of the oleosin protein of 
                    <italic toggle="yes">Camellia sinensis</italic>, as it appears originally in URP and after cogent_utils transformation:</p>
                <p>
                    <italic toggle="yes">URP original header</italic>
                </p>
                <p>&gt;tr|A0A7J7IAQ7|A0A7J7IAQ7_CAMSI Oleosin OS=Camellia sinensis OX=4442 GN=HYC85_002860 PE=3 SV=1</p>
                <p>
                    <italic toggle="yes">Modified header</italic>
                </p>
                <p>&gt;UP000593564-00004442-Came_sine-22-000001-E-000699 tr|A0A7J7IAQ7|A0A7J7IAQ7_CAMSI Oleosin OS=Camellia sinensis OX=4442 GN=HYC85_002860 PE=3 SV=1</p>
                <p>The first part of the header has been added, and corresponds to the following format: [URP identifier]-[NCBI Taxonomy ID]-[organism name]-[URP year release]-[proteome counter]-[taxonomic domain]-[protein counter].</p>
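                <p>As an illustration of why this encoding is convenient to process, the following Python sketch splits the identifier into its seven fields. The class and function names are hypothetical and not part of cogent_utils, and the sketch assumes that no field contains a hyphen.</p>
                <code language="python"><![CDATA[
```python
from typing import NamedTuple


class CogentId(NamedTuple):
    proteome: str          # URP identifier
    taxid: int             # NCBI Taxonomy ID
    organism: str          # abbreviated organism name
    release: str           # URP year release
    proteome_counter: int  # running number of the proteome
    domain: str            # taxonomic domain, e.g. "E" in the example header
    protein_counter: int   # running number of the protein


def parse_cogent_id(header: str) -> CogentId:
    """Parse the leading identifier of a modified FASTA header."""
    identifier = header.lstrip(">").split(maxsplit=1)[0]
    fields = identifier.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(fields)}")
    proteome, taxid, organism, release, pcount, domain, protein = fields
    return CogentId(proteome, int(taxid), organism, release,
                    int(pcount), domain, int(protein))


example = (">UP000593564-00004442-Came_sine-22-000001-E-000699 "
           "tr|A0A7J7IAQ7|A0A7J7IAQ7_CAMSI Oleosin OS=Camellia sinensis OX=4442")
parsed = parse_cogent_id(example)
```
]]></code>
                <p>Because the taxonomy ID and taxonomic domain are encoded in every identifier, search hits can be grouped by organism or domain directly from the hit list.</p>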
                <p>
                    <bold>MagicMatch</bold> v1.0.1 (
                    <xref ref-type="bibr" rid="ref46">Smith 
                        <italic toggle="yes">et al.</italic>, 2005</xref>) was used for sequence matching across databases to verify the identity of the reference proteome collection against the modified identifier space.</p>
            </sec>
            <sec id="sec8">
                <title>Masking, searching, phylogenetic profiling</title>
                <p>For the detection of compositional bias as a proxy for low-complexity sequence tracts, we deployed 
                    <bold>CAST v1.0.1</bold> (
                    <xref ref-type="bibr" rid="ref41">Promponas 
                        <italic toggle="yes">et al.</italic>, 2000</xref>) on all protein sequences of the human genome. The CAST algorithm was applied to the DOID-annotated Ensembl FASTA dataset using default parameters, i.e. a threshold score of 40 for reported regions. The analysis produced two files that partition the original dataset: one containing all entries in which low complexity regions were detected and one containing all remaining entries.</p>
                <p>Query datasets were searched against the proteomes to create phylogenetic profile patterns with 
                    <bold>DIAMOND</bold> blastp (
                    <xref ref-type="bibr" rid="ref5">Buchfink 
                        <italic toggle="yes">et al.</italic>, 2021</xref>), using the URP dataset as target database and adjusting the alignment algorithm to enable compositional bias statistics (option: --comp-based-stats 3), conditioned on sequence properties (
                    <xref ref-type="bibr" rid="ref62">Yu &amp; Altschul, 2005</xref>). All hits considered significant had an E-value &lt; 0.001 and a sequence similarity of 21% or above.</p>
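                <p>The hit filtering and the construction of a presence/absence profile can be sketched as follows. This is an illustrative sketch, not the code used in this study. It assumes the 12-column BLAST-style tabular output, uses the percent identity column as a stand-in for the 21% threshold, and assumes that target identifiers follow the modified header format with the NCBI Taxonomy ID as second field. All hit lines are synthetic.</p>
                <code language="python"><![CDATA[
```python
from collections import defaultdict

MAX_EVALUE = 1e-3    # significant hits have an E-value below 0.001
MIN_PERCENT = 21.0   # and at least 21% in the percent identity column (assumption)


def phylogenetic_profile(lines):
    """Map each query to the set of taxonomy IDs with a significant hit."""
    profile = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip incomplete lines
        query, target = cols[0], cols[1]
        pident, evalue = float(cols[2]), float(cols[10])
        if evalue < MAX_EVALUE and pident >= MIN_PERCENT:
            # second hyphen-separated field of the target id is the taxonomy ID
            profile[query].add(int(target.split("-")[1]))
    return profile


# Synthetic hits: only the first one passes both thresholds.
hits = [
    "Q1\tUP000593564-00004442-Came_sine-22-000001-E-000699"
    "\t35.0\t120\t70\t2\t1\t120\t5\t124\t1e-20\t95.0",
    "Q1\tUP000000001-00000562-Esch_coli-22-000002-B-000010"
    "\t19.5\t80\t60\t1\t1\t80\t3\t82\t1e-5\t40.0",
    "Q1\tUP000000002-00009606-Homo_sapi-22-000003-E-000020"
    "\t50.0\t60\t30\t0\t1\t60\t1\t60\t0.5\t25.0",
]
profile = phylogenetic_profile(hits)
```
]]></code>
                <p>Each query then maps to a set of taxonomy IDs, which can be converted to a binary presence/absence vector over all target proteomes.</p>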
                <p>For the calculation of amino acid frequencies across the Ensembl protein set, the 
                    <ext-link ext-link-type="uri" xlink:href="https://biopython.org/">BioPython</ext-link> Bio.SeqUtils.ProtParam module (
                    <xref ref-type="bibr" rid="ref12">Cock 
                        <italic toggle="yes">et al.</italic>, 2009</xref>) was used, which counts the occurrences of each residue in a protein sequence and reports them as counts and percentages. The output was used for data normalisation as explained in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Compositionally biased protein regions identified using CAST on the Ensembl human genome.</title>
                        <p>Sum of CAST scores for all compositionally biased occurrences by amino acid, normalised with respect to general amino acid frequency in the human genome (i.e. total score/frequency). The column colour corresponds to an amino acid classification according to the chemical nature of their side chains (
                            <xref ref-type="bibr" rid="ref23">Katchalski-Katzir 
                                <italic toggle="yes">et al.</italic>, 2006</xref>).</p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/142650/67932c45-6597-4223-a295-33834e531b86_figure1.gif"/>
                </fig>
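                <p>The normalisation described in the caption of Figure 1 (total score divided by frequency) can be sketched in plain Python. The sketch uses collections.Counter as a dependency-free stand-in for the BioPython module, and the sequences and score sums are toy values, not results of this study.</p>
                <code language="python"><![CDATA[
```python
from collections import Counter


def residue_frequencies(sequences):
    """Return the relative frequency of each residue over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def normalise_scores(score_sums, frequencies):
    """Divide each per-residue score sum by the background residue frequency."""
    return {aa: score / frequencies[aa]
            for aa, score in score_sums.items()
            if frequencies.get(aa, 0) > 0}


toy_proteins = ["MEEEEK", "MKKQ"]      # synthetic sequences
freqs = residue_frequencies(toy_proteins)
toy_scores = {"E": 80.0, "K": 45.0}    # hypothetical summed scores per residue
normalised = normalise_scores(toy_scores, freqs)
```
]]></code>
                <p>Dividing by the background frequency prevents abundant residues from dominating the comparison merely because they occur more often across the protein set.</p>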
                <p>The phylogenetic profile heatmap (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>) was produced using the heatmap3 R library (
                    <xref ref-type="bibr" rid="ref64">Zhao 
                        <italic toggle="yes">et al.</italic>, 2014</xref>) with default dissimilarity matrix calculation parameters.</p>
                <p>The 2&#x00d7;2 chi-square test, comparing low complexity presence in protein transcripts and disease-association (
                    <xref ref-type="table" rid="T1">Table 1</xref>), was performed with a significance threshold of 0.01 and without Yates continuity correction.</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Contingency table between disease association in genes presenting compositionally biased regions versus genes without any detectable compositional bias.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                                <th align="left" colspan="1" rowspan="1" valign="top">Low complexity</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">High complexity</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Total</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Non-disease</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">36250</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">62827</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">99077</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Disease</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1845</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">1780</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">3625</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <bold>Total</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">38095</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">64607</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">102702</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>
                    <bold>Lifemap</bold>, an interactive cartography-type tool for exploring the NCBI taxonomy, was chosen for the visualisation of the taxonomic distribution of data subsets (
                    <xref ref-type="bibr" rid="ref52">de Vienne, 2016</xref>). For each visualisation, a list of the NCBI taxonomy IDs of interest, retrieved in each case from the phylogenetic profiling hit list, was used as input for the tool. NCBI taxonomy ID visualisations are provided for all URP hits of ALG13, SIX3 and RP9 (
                    <xref ref-type="fig" rid="f5">Figure 5</xref>).</p>
            </sec>
        </sec>
        <sec id="sec9" sec-type="results">
            <title>Results</title>
            <sec id="sec10">
                <title>Disease association across the human genome</title>
                <p>The dataset upon which all transformations and analyses were performed was the Ensembl Human Genome export (GRCh38.p13), containing 119068 gene transcripts. The dataset was annotated with regard to disease association, using curated associations from GHR and UniProtKB, which are indexed in the DISEASES database (
                    <xref ref-type="bibr" rid="ref40">Pletscher-Frankild 
                        <italic toggle="yes">et al.</italic>, 2015</xref>). Of these, 3625 transcripts are confidently associated with disease, whereas the remaining 115443 are not verified for any strong disease association in the &#x201c;knowledge channel&#x201d; of DISEASES.</p>
                <p>To remove noise, e.g. putative or alternative mini-transcripts (some with multiple stop codons), the Ensembl dataset was filtered and all transcripts with length &lt;80 amino acid residues were removed, with the exception of short transcripts with at least one disease (i.e. DOID) association. The filtered set contains 102702 transcripts, which include all 3625 instances associated with disease (
                    <xref ref-type="table" rid="T1">Table 1</xref>).</p>
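                <p>The length filter described above can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the <monospace>Transcript</monospace> record, the function names and the placeholder DOID are hypothetical, and only the 80-residue threshold and the exemption for disease-associated transcripts are taken from the text.</p>
                <code language="python">
```python
from dataclasses import dataclass, field

MIN_LENGTH = 80  # residues; the threshold stated in the text


@dataclass
class Transcript:
    # Hypothetical record layout, used only for this illustration.
    transcript_id: str
    sequence: str
    doids: list = field(default_factory=list)


def keep_transcript(t):
    """Keep transcripts of at least MIN_LENGTH residues, plus any
    shorter transcript that carries at least one DOID association."""
    return len(t.sequence) >= MIN_LENGTH or bool(t.doids)


def filter_transcripts(transcripts):
    """Return the transcripts that pass the length/disease filter."""
    return [t for t in transcripts if keep_transcript(t)]


# Toy input: the identifiers and the DOID below are placeholders.
example = [
    Transcript("T1", "M" * 120),                   # long, no disease
    Transcript("T2", "M" * 40),                    # short, no disease
    Transcript("T3", "M" * 40, ["DOID:0000000"]),  # short, with disease
]
kept_ids = [t.transcript_id for t in filter_transcripts(example)]
print(kept_ids)
```
                </code>
                <p>In this toy input, the short transcript without a disease association is the only one discarded, mirroring the rule that short transcripts are retained only when they carry a DOID association.</p>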
            </sec>
            <sec id="sec11">
                <title>Compositional bias and human disease</title>
                <p>For the evaluation of low complexity presence in the transcripts of the human genome we performed compositional bias detection using CAST (
                    <xref ref-type="bibr" rid="ref41">Promponas 
                        <italic toggle="yes">et al.</italic>, 2000</xref>). Of the 102702 transcripts in the filtered Ensembl human genome dataset, 38095 contained at least one compositionally biased sequence tract. Cross-evaluation of compositional bias and disease association in the dataset, using the chi-square test of independence, revealed a significant bias towards low complexity regions in disease-associated transcripts (1845 of 3625, or 50.9%, versus 36250 of 99077, or 36.6%, of the remaining transcripts), &#x03c7;
                    <sup>2</sup> (1, N = 102702) = 306.8467, p-value &lt; 0.00001 (
                    <xref ref-type="table" rid="T1">Table 1</xref>). This significant pattern alone provides a strong indication for the involvement of low complexity in human disease on genome scale, seen here for the first time, complementing previous, well-established classifications of protein structure and function (
                    <xref ref-type="bibr" rid="ref37">Ouzounis 
                        <italic toggle="yes">et al.</italic>, 2003</xref>).</p>
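                <p>For readers who wish to check the reported statistic, the test can be recomputed directly from the counts in Table 1. The sketch below is an illustrative Python calculation using only the standard library. It assumes the standard Pearson chi-square for a 2&#x00d7;2 table without Yates correction, and it is not necessarily the software the authors used. The function name is hypothetical.</p>
                <code language="python">
```python
import math

# Observed counts from Table 1
# rows: non-disease, disease; columns: low complexity, high complexity
observed = [[36250, 62827],
            [1845, 1780]]


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table, without Yates correction.

    Returns the statistic and the upper-tail p-value for 1 degree
    of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, P(X2 exceeds x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.4f}, p = {p_value:.3g}")
```
                </code>
                <p>Under these assumptions the calculation should agree with the reported statistic to rounding, with a p-value far below the 0.01 threshold.</p>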
                <p>Examination of the low complexity dataset revealed marked differences among amino acids in how frequently they form compositionally biased regions. 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows the regions rich in each amino acid, expressed as the sum of CAST scores for each compositionally biased region and normalised with respect to the overall amino acid frequency in the human genome, as calculated from the filtered Ensembl dataset using Biopython protein analysis modules (
                    <xref ref-type="bibr" rid="ref12">Cock 
                        <italic toggle="yes">et al.</italic>, 2009</xref>). Charged, hydrophilic residues appear over-represented, while hydrophobic, order-promoting amino acids are less frequent, in agreement with what is known about IDP/IDR composition (
                    <xref ref-type="bibr" rid="ref57">Williams 
                        <italic toggle="yes">et al.</italic>, 2000</xref>; 
                    <xref ref-type="bibr" rid="ref16">Dunker 
                        <italic toggle="yes">et al.</italic>, 2001</xref>; 
                    <xref ref-type="bibr" rid="ref18">Harbi 
                        <italic toggle="yes">et al.</italic>, 2011</xref>). The striking over-representation of serine/threonine (S/T) tracts, along with glutamate/glutamine (E/Q) and proline (P) followed by lysine (K) is indicative of the main residue types that might affect functional properties of human proteins, including their potential association with known phenotypes, such as polyglutamine tracts with neurodegenerative diseases (
                    <xref ref-type="bibr" rid="ref6">Bunting 
                        <italic toggle="yes">et al.</italic>, 2022</xref>).</p>
                <p>To assess the relationship between disease association and compositional bias across the human proteome, the associated DOID vector for each amino acid-enriched region was used as a multidimensional clustering parameter for Principal Component Analysis (PCA) (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). Consistent with the above, the amino acid types that dominate low complexity regions (e.g. S, E, P, Q) exhibit the highest contribution to the main principal components with regard to disease association, reinforcing the link between low complexity and disease and establishing a direction for further study.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Principal component analysis (PCA) of DOID correlation with proteins containing low complexity regions, across the human genome.</title>
                        <p>The colour coding of each amino acid is the same as in 
                            <xref ref-type="fig" rid="f1">Figure 1</xref> and reflects the chemical nature of their side chains. DOID=Disease Ontology identifier.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/142650/67932c45-6597-4223-a295-33834e531b86_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec12">
                <title>Phylogenetic profiling of disease-associated LC proteins</title>
                <p>To further our investigation into the phylogenetic depth of low complexity proteins with or without known disease associations, we selected a published list of 100 human proteins with well-characterised compositionally biased regions (
                    <xref ref-type="bibr" rid="ref31">Mier 
                        <italic toggle="yes">et al.</italic>, 2020</xref>). The proteins were mapped to the enriched human genome datasets derived from Ensembl. Out of the 100 proteins in this curated dataset, 17 are confidently associated with disease, with one or more associated DOIDs, covering a wide range of disorders from metabolic and cardiovascular diseases to autoimmune conditions and cancer (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Compositionally biased regions of the 100-gene subset associated with disease, linked to associated DOIDs.</title>
                        <p>The first column shows the amino acid that is enriched in each case. The size of the last column is proportional to the sum of CAST scores for amino acid-rich regions in each gene. Although each gene was originally listed as an exemplar of a single compositional bias type, in this analysis a gene can appear more than once alongside its DOID associations, as a result of the CAST analysis used to derive the total scores. DOID=Disease Ontology identifier.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/142650/67932c45-6597-4223-a295-33834e531b86_figure3.gif"/>
                </fig>
                <p>To examine in more detail the emergence of compositional bias for the curated dataset of 100 human proteins as an exemplary case, protein sequence alignment was performed against the URP dataset. Homologues were detected in &gt;11000 species, with just 269 cases not containing any of these regions, largely corresponding to Archaeal and Bacterial taxa. This preliminary, targeted comparative analysis using a limited query of 100 human proteins is a first glimpse into the dynamics of compositional bias across phylogenies. Our ongoing effort to investigate the presence of compositional bias and the connection to human disease will assess these discovered phylogenetic patterns across the entire human genome in the near future. The complete phylogenetic profiling matrix is provided as 
                    <italic toggle="yes">Extended data</italic> (
                    <xref ref-type="bibr" rid="ref9">Chasapi, 2022</xref>).</p>
                <p>Within the subset of the 100 genes that have confident disease associations, most genes had detectable homologues across Eukaryotic organisms, with only a few scattered Bacterial hits (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). An exception is the DnaJ heat shock protein family (Hsp40) member C5 (DNAJC5), which exhibits an extended phylogenetic depth, covering 86% of the URP (i.e. 9751 proteomes), consistent with the DnaJ domain being well known and abundant (
                    <xref ref-type="bibr" rid="ref47">Stetler 
                        <italic toggle="yes">et al.</italic>, 2010</xref>; 
                    <xref ref-type="bibr" rid="ref42">Qiu 
                        <italic toggle="yes">et al.</italic>, 2006</xref>).</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Phylogenetic profile of the 100-gene subset with confident disease associations.</title>
                        <p>The heatmap range reflects the dissimilarity matrix of the plotted values. The row side colours correspond to the log of the sum of CAST scores for all detected compositionally biased regions for each protein (darker colour indicates higher sum of CAST scores). The column side colours indicate the taxonomic level of each target proteome (orange = eukaryotes, purple = bacteria). DNAJC5 is not displayed.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/142650/67932c45-6597-4223-a295-33834e531b86_figure4.gif"/>
                </fig>
                <p>The remaining 16 disease-associated genes were detected in 1350 proteomes with one or several hits. 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> shows the phylogenetic profile map of these genes across the URP target proteome set. Most genes display homologues in higher eukaryotic organisms, whereas almost no homologous genes are detected in plant genomes, with the exception of the E3 ubiquitin-protein ligase RLIM. Similarly, the subunit of the rod cyclic GMP-gated cation channel (CNGA1) is the only query gene with ion channel homologues in ciliates and fungi, with the exception of the Ascomycota. Genes with an overall high CAST score seem to have more sequence hits, both in number and in taxonomic distribution. This may be due in part to the sequence alignment analysis, which was tailored to compositionally biased sequences and thus increased hit sensitivity.</p>
                <p>This comparative genomics framework is a useful tool both for investigating tendencies among gene sets that are confidently associated with diseases and contain compositionally biased regions, and for identifying specific taxonomic signatures for each gene. A selected number of specific cases are reviewed below.</p>
            </sec>
            <sec id="sec13">
                <title>ALG13 has a restricted phylogenetic depth</title>
                <p>The protein encoded by ALG13 is a subunit of a bipartite UDP-N-acetylglucosamine transferase, which heterodimerizes with asparagine-linked glycosylation 14 homolog to form a functional UDP-GlcNAc glycosyltransferase that catalyses the second sugar addition of the highly conserved oligosaccharide precursor in endoplasmic reticulum N-linked glycosylation. ALG13 has been associated with several disease conditions including developmental and epileptic encephalopathy as well as genetic intellectual disability (
                    <xref ref-type="bibr" rid="ref17">Epi4K Consortium &amp; Epilepsy Phenome/Genome Project, 2013</xref>; 
                    <xref ref-type="bibr" rid="ref3">Bissar-Tadmouri 
                        <italic toggle="yes">et al.</italic>, 2014</xref>; 
                    <xref ref-type="bibr" rid="ref35">Ng 
                        <italic toggle="yes">et al.</italic>, 2020</xref>). ALG13 homologues are detected in 248 proteomes, all of which belong to higher Eukaryotes, specifically to the infraphylum Gnathostomata, and mostly to Euteleostomi. 
                    <xref ref-type="fig" rid="f5">Figure 5A</xref> shows a general view of the tree of life, with the species in which ALG13 homologue hits were retrieved highlighted, whereas 
                    <xref ref-type="fig" rid="f5">Figure 5B</xref> provides a closer look at the same result. The restricted phylogenetic depth of ALG13 may indicate that the interaction pathways including ALG13 are restricted to functions specific to bony vertebrates, a hypothesis that can be assessed by jointly analysing all participating proteins for their evolutionary emergence.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Taxonomic distribution of sequence alignment hits for selected genes, across the URP proteome dataset.</title>
                        <p>A) A broad view of the tree of life, visualised by Lifemap (
                            <xref ref-type="bibr" rid="ref52">de Vienne, 2016</xref>). ALG13 hits are highlighted in blue. B) A zoomed-in view of all ALG13 hits. C) SIX3 hits that belong to the Ascomycota phylum and are not found for any other gene of the dataset. D) RP9 hits that correspond to the Dictyostelia clade and are not found for any other gene of the dataset.</p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/142650/67932c45-6597-4223-a295-33834e531b86_figure5.gif"/>
                </fig>
            </sec>
            <sec id="sec14">
                <title>SIX3 has a unique conservation signature</title>
                <p>SIX Homeobox 3 (SIX3) encodes a member of the sine oculis homeobox transcription factor family. The expressed protein plays a role in brain and eye development, and mutations in SIX3 are associated with holoprosencephaly and schizencephaly (
                    <xref ref-type="bibr" rid="ref53">Wallis 
                        <italic toggle="yes">et al.</italic>, 1999</xref>; 
                    <xref ref-type="bibr" rid="ref21">Hehr 
                        <italic toggle="yes">et al.</italic>, 2010</xref>). SIX3 homologues were detected in 869 reference proteomes, including proteomes of filamentous ascomycetes, for which SIX3 is the only disease-associated gene with significant hits (
                    <xref ref-type="fig" rid="f5">Figure 5C</xref>). A follow-up study could further investigate this distinct conservation pattern.</p>
            </sec>
            <sec id="sec15">
                <title>RP9 is uniquely matched in Dictyostelia</title>
                <p>Retinitis Pigmentosa 9 (RP9 or PAP1) is thought to be a target protein for the PIM1 serine/threonine protein kinase. The protein localises in nuclear speckles and has a role in pre-mRNA splicing. Mutations in the RP9 gene result in autosomal dominant retinitis pigmentosa (
                    <xref ref-type="bibr" rid="ref30">Maita 
                        <italic toggle="yes">et al.</italic>, 2004</xref>; 
                    <xref ref-type="bibr" rid="ref25">Keen 
                        <italic toggle="yes">et al.</italic>, 2002</xref>). The comparative genomics analysis of RP9 detects homologues in 507 species, including all representatives of the Dictyostelia clade, which were uniquely matched to RP9 among all disease genes (
                    <xref ref-type="fig" rid="f5">Figure 5D</xref>). 
                    <italic toggle="yes">Dictyostelium discoideum</italic>, the most studied representative of Dictyostelia (i.e. dictyostelid cellular slime molds), has been used extensively as a model organism for cell communication, differentiation, and programmed cell death studies (
                    <xref ref-type="bibr" rid="ref24">Kawabe 
                        <italic toggle="yes">et al.</italic>, 2019</xref>; 
                    <xref ref-type="bibr" rid="ref48">Strassmann 
                        <italic toggle="yes">et al.</italic>, 2000</xref>). The specific presence of RP9 homologues in Dictyostelia, including 
                    <italic toggle="yes">D. discoideum</italic>, raises questions about their specific roles in this taxon and the possibility that functional analysis can shed further light on the human disease.</p>
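                <p>As a rough way to compare the phylogenetic depth of the genes discussed in this section, the hit counts quoted in the text can be expressed as fractions of the 11297 reference proteomes in the URP. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes that each quoted count (including the 507 species reported for RP9) refers to distinct URP proteomes.</p>
                <code language="python">
```python
URP_PROTEOMES = 11297  # total reference proteomes quoted in the article


def phylogenetic_depth(proteomes_with_hit, total=URP_PROTEOMES):
    """Fraction of target proteomes with at least one homologue hit."""
    return proteomes_with_hit / total


# Hit counts quoted in the Results text.
# Assumption: each count refers to distinct URP proteomes.
hits = {"DNAJC5": 9751, "SIX3": 869, "RP9": 507, "ALG13": 248}

depth = {gene: phylogenetic_depth(n) for gene, n in hits.items()}

# List the genes from widest to narrowest taxonomic coverage.
for gene, value in sorted(depth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {value:.1%}")
```
                </code>
                <p>For DNAJC5 this gives a coverage of about 86%, the value quoted above, whereas SIX3, RP9 and ALG13 each cover less than 10% of the reference proteomes under this assumption.</p>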
            </sec>
        </sec>
        <sec id="sec16" sec-type="discussion">
            <title>Discussion</title>
            <p>A major objective of biomedical research is the detection of genetic factors involved in human disease at multiple levels, including variation, gene expression and cellular roles. The evolutionary perspective of human disease is less appreciated than the functional genomics of human genes and proteins, whether studied by computational or experimental means. Combining evolutionary characters with structural features such as IDR presence, which has yet to be systematically studied in conjunction with specific disease classes, can provide a novel analysis framework of the human genome with respect to disease.</p>
            <p>In this study, we report a genome-wide analysis of the association of compositional bias with disease in human proteins, together with their taxonomic distribution. It is the first time that a combined genome-wide analysis of these aspects is reported, from various structural, functional and evolutionary angles. Our analysis includes novel views on the relation between compositional bias and disease association, demonstrating a strong correlation between the two features. Delving deeper into the contribution of specific amino acids to compositionally biased regions of disease-associated genes across the human genome, we demonstrate that charged, hydrophilic residues are over-represented in genes with confident disease associations.</p>
            <p>We adopt a comparative genomics perspective for the evaluation of disease association of compositional bias in human proteins, using a curated list of 100 human proteins as a first, controlled step in this direction. We delineate conservation patterns of the annotated gene set across taxonomic categories, taking advantage of the wealth of sequenced genomes across the tree of life, using a total of 11297 representative proteomes.</p>
            <p>The described framework of structurally and functionally annotated gene queries against multiple species has been developed with a view to future extensions that encompass the entire human genome and all known gene-disease associations. This will potentially allow us to elucidate specific evolutionary patterns of groups of genes involved in the same disease, serving as a tool to better understand the underlying mechanisms and identify appropriate model organisms for experimental investigation.</p>
        </sec>
    </body>
    <back>
        <sec id="sec19" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec20">
                <title>Underlying data</title>
                <p>All data underlying the analyses are available as part of the article or as referenced external data sources and no additional source data are required.</p>
            </sec>
            <sec id="sec21">
                <title>Extended data</title>
                <p>Zenodo: Phylogenetic profile of 100 annotated low complexity proteins against the Uniprot Reference Proteome dataset. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.7486339">https://doi.org/10.5281/zenodo.7486339</ext-link> (
                    <xref ref-type="bibr" rid="ref9">Chasapi, 2022</xref>).</p>
                <p>This project contains the following extended data:
                    <list list-type="bullet">
                        <list-item>
                            <label>-</label>
                            <p>cb100-query-20221223.map (The phylogenetic profile of the 100 selected annotated low complexity proteins against the Uniprot Reference Proteome dataset)</p>
                        </list-item>
                    </list>
</p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>The authors would like to thank the IDP implementation study community participating in the Elixir Commissioned Service entitled &#x201c;Standardising Intrinsically Disordered Proteins (IDPs) Data&#x201d; for the useful knowledge exchange and excellent collaboration on the topic of IDP standardisation.</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ahrens</surname>
                            <given-names>JB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nunez-Castilla</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Siltberg-Liberles</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Evolution of intrinsic disorder in eukaryotic proteins.</article-title>
                    <source>

                        <italic toggle="yes">Cell. Mol. Life Sci.</italic>
</source>
                    <year>2017</year>;<volume>74</volume>:<fpage>3163</fpage>&#x2013;<lpage>3174</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00018-017-2559-0</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Basile</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salvatore</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bassot</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Why do eukaryotic proteins contain more intrinsically disordered regions?</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput. Biol.</italic>
</source>
                    <year>2019</year>;<volume>15</volume>:<fpage>e1007186</fpage>.
                    <pub-id pub-id-type="pmid">31329574</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1007186</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6675126</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bissar-Tadmouri</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Donahue</surname>
                            <given-names>WL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Al-Gazali</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>X chromosome exome sequencing reveals a novel 
                        <italic toggle="yes">ALG 13</italic> mutation in a nonsyndromic intellectual disability family with multiple affected male siblings.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Med. Genet. A.</italic>
</source>
                    <year>2014</year>;<volume>164</volume>:<fpage>164</fpage>&#x2013;<lpage>169</lpage>.
                    <pub-id pub-id-type="doi">10.1002/ajmg.a.36233</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dunker</surname>
                            <given-names>AK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evolution and disorder.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Struct. Biol.</italic>
</source>
                    <year>2011</year>;<volume>21</volume>:<fpage>441</fpage>&#x2013;<lpage>446</lpage>.
                    <pub-id pub-id-type="pmid">21482101</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.sbi.2011.02.005</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3112239</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Buchfink</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Reuter</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Drost</surname>
                            <given-names>H-G</given-names>
                        </name>
</person-group>:
                    <article-title>Sensitive protein alignments at tree-of-life scale using DIAMOND.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2021</year>;<volume>18</volume>:<fpage>366</fpage>&#x2013;<lpage>368</lpage>.
                    <pub-id pub-id-type="pmid">33828273</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-021-01101-x</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8026399</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bunting</surname>
                            <given-names>EL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hamilton</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tabrizi</surname>
                            <given-names>SJ</given-names>
                        </name>
</person-group>:
                    <article-title>Polyglutamine diseases.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Neurobiol.</italic>
</source>
                    <year>2022</year>;<volume>72</volume>:<fpage>39</fpage>&#x2013;<lpage>47</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.conb.2021.07.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>B&#x00fc;rgi</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xue</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Uversky</surname>
                            <given-names>VN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Intrinsic Disorder in Transmembrane Proteins: Roles in Signaling and Topology Prediction.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2016</year>;<volume>11</volume>:<fpage>e0158594</fpage>.
                    <pub-id pub-id-type="pmid">27391701</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0158594</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4938508</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <collab>CAID Predictors</collab>

                        <collab>DisProt Curators</collab>

                        <name name-style="western">
                            <surname>Necci</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Critical assessment of protein intrinsic disorder prediction.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Methods.</italic>
</source>
                    <year>2021</year>;<volume>18</volume>:<fpage>472</fpage>&#x2013;<lpage>481</lpage>.</mixed-citation>
            </ref>
            <ref id="ref9">
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chasapi</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <data-title>Phylogenetic profile of 100 annotated low complexity proteins against the Uniprot Reference Proteome dataset.</data-title>[Dataset].
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2022</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.7486339</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Natale</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Finn</surname>
                            <given-names>RD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2011</year>;<volume>6</volume>:<fpage>e18910</fpage>.
                    <pub-id pub-id-type="pmid">21556138</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0018910</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3083393</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Choudhary</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lopus</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hosur</surname>
                            <given-names>RV</given-names>
                        </name>
</person-group>:
                    <article-title>Targeting disorders in unstructured and structured proteins in various diseases.</article-title>
                    <source>

                        <italic toggle="yes">Biophys. Chem.</italic>
</source>
                    <year>2022</year>;<volume>281</volume>:<fpage>106742</fpage>.
                    <pub-id pub-id-type="pmid">34922214</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bpc.2021.106742</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cock</surname>
                            <given-names>PJA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Antao</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chang</surname>
                            <given-names>JT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Biopython: freely available Python tools for computational molecular biology and bioinformatics.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2009</year>;<volume>25</volume>:<fpage>1422</fpage>&#x2013;<lpage>1423</lpage>.
                    <pub-id pub-id-type="pmid">19304878</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp163</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2682512</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dickerson</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robertson</surname>
                            <given-names>DL</given-names>
                        </name>
</person-group>:
                    <article-title>On the Origins of Mendelian Disease Genes in Man: The Impact of Gene Duplication.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Biol. Evol.</italic>
</source>
                    <year>2012</year>;<volume>29</volume>:<fpage>61</fpage>&#x2013;<lpage>69</lpage>.
                    <pub-id pub-id-type="pmid">21705381</pub-id>
                    <pub-id pub-id-type="doi">10.1093/molbev/msr111</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3709195</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dosztanyi</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Csizmok</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tompa</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2005</year>;<volume>21</volume>:<fpage>3433</fpage>&#x2013;<lpage>3434</lpage>.
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bti541</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dosztanyi</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Meszaros</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simon</surname>
                            <given-names>I</given-names>
                        </name>
</person-group>:
                    <article-title>ANCHOR: web server for predicting protein binding regions in disordered proteins.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2009</year>;<volume>25</volume>:<fpage>2745</fpage>&#x2013;<lpage>2746</lpage>.
                    <pub-id pub-id-type="pmid">19717576</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp518</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2759549</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dunker</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lawson</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Intrinsically disordered protein.</article-title>
                    <source>

                        <italic toggle="yes">J. Mol. Graph. Model.</italic>
</source>
                    <year>2001</year>;<volume>19</volume>:<fpage>26</fpage>&#x2013;<lpage>59</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S1093-3263(00)00138-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <mixed-citation publication-type="journal">
                    <collab>Epi4K Consortium &amp; Epilepsy Phenome/Genome Project</collab>:
                    <article-title>De novo mutations in epileptic encephalopathies.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2013</year>;<volume>501</volume>:<fpage>217</fpage>&#x2013;<lpage>221</lpage>.
                    <pub-id pub-id-type="pmid">23934111</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature12439</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3773011</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Harbi</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harrison</surname>
                            <given-names>PM</given-names>
                        </name>
</person-group>:
                    <article-title>LPS-annotate: complete annotation of compositionally biased regions in the protein knowledgebase.</article-title>
                    <source>

                        <italic toggle="yes">Database.</italic>
</source>
                    <year>2011</year>;<volume>2011</volume>:<fpage>baq031</fpage>.
                    <pub-id pub-id-type="doi">10.1093/database/baq031</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Harrison</surname>
                            <given-names>PM</given-names>
                        </name>
</person-group>:
                    <article-title>fLPS 2.0: rapid annotation of compositionally-biased regions in biological sequences.</article-title>
                    <source>

                        <italic toggle="yes">PeerJ.</italic>
</source>
                    <year>2021</year>;<volume>9</volume>:<fpage>e12363</fpage>.
                    <pub-id pub-id-type="pmid">34760378</pub-id>
                    <pub-id pub-id-type="doi">10.7717/peerj.12363</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8557692</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hatos</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hajdu-Solt&#x00e9;sz</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Monzon</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DisProt: intrinsic protein disorder annotation in 2020.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>48</volume>:<fpage>D269</fpage>&#x2013;<lpage>D276</lpage>.
                    <pub-id pub-id-type="pmid">31713636</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkz975</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7145575</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hehr</surname>
                            <given-names>U</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pineda-Alvarez</surname>
                            <given-names>DE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Uyanik</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Heterozygous mutations in SIX3 and SHH are associated with schizencephaly and further expand the clinical spectrum of holoprosencephaly.</article-title>
                    <source>

                        <italic toggle="yes">Hum. Genet.</italic>
</source>
                    <year>2010</year>;<volume>127</volume>:<fpage>555</fpage>&#x2013;<lpage>561</lpage>.
                    <pub-id pub-id-type="pmid">20157829</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s00439-010-0797-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4101187</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Janssen</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Enright</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Audit</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>COmplete GENome Tracking (COGENT): a flexible data environment for computational genomics.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2003</year>;<volume>19</volume>:<fpage>1451</fpage>&#x2013;<lpage>1452</lpage>.
                    <pub-id pub-id-type="pmid">12874064</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btg161</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Katchalski-Katzir</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kasher</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fridkin</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <chapter-title>Amino Acids: Physicochemical Properties.</chapter-title>
                    <source>

                        <italic toggle="yes">Encyclopedic Reference of Genomics and Proteomics in Molecular Medicine.</italic>
</source>
                    <publisher-loc>Berlin Heidelberg</publisher-loc>:
                    <publisher-name>Springer</publisher-name>;<year>2006</year>; pp. <fpage>55</fpage>&#x2013;<lpage>68</lpage>.</mixed-citation>
            </ref>
            <ref id="ref24">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kawabe</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Du</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schilde</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evolution of multicellularity in Dictyostelia.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Dev. Biol.</italic>
</source>
                    <year>2019</year>;<volume>63</volume>:<fpage>359</fpage>&#x2013;<lpage>369</lpage>.
                    <pub-id pub-id-type="pmid">31840775</pub-id>
                    <pub-id pub-id-type="doi">10.1387/ijdb.190108ps</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6978153</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Keen</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hims</surname>
                            <given-names>MM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McKie</surname>
                            <given-names>AB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mutations in a protein target of the Pim-1 kinase associated with the RP9 form of autosomal dominant retinitis pigmentosa.</article-title>
                    <source>

                        <italic toggle="yes">Eur. J. Hum. Genet.</italic>
</source>
                    <year>2002</year>;<volume>10</volume>:<fpage>245</fpage>&#x2013;<lpage>249</lpage>.
                    <pub-id pub-id-type="pmid">12032732</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sj.ejhg.5200797</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Khan</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Douglas</surname>
                            <given-names>GM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Patel</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol. Evol.</italic>
</source>
                    <year>2015</year>;<volume>7</volume>:<fpage>1815</fpage>&#x2013;<lpage>1826</lpage>.
                    <pub-id pub-id-type="pmid">26047845</pub-id>
                    <pub-id pub-id-type="doi">10.1093/gbe/evv105</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4494057</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Koos</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bassett</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Genetics Home Reference: A Review.</article-title>
                    <source>

                        <italic toggle="yes">Med. Ref. Serv. Q.</italic>
</source>
                    <year>2018</year>;<volume>37</volume>:<fpage>292</fpage>&#x2013;<lpage>299</lpage>.
                    <pub-id pub-id-type="doi">10.1080/02763869.2018.1477716</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Linding</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jensen</surname>
                            <given-names>LJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Diella</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Protein Disorder Prediction.</article-title>
                    <source>

                        <italic toggle="yes">Structure.</italic>
</source>
                    <year>2003</year>;<volume>11</volume>:<fpage>1453</fpage>&#x2013;<lpage>1459</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.str.2003.10.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lopez-Bigas</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ouzounis</surname>
                            <given-names>CA</given-names>
                        </name>
</person-group>:
                    <article-title>Genome-wide identification of genes likely to be involved in human genetic disease.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2004</year>;<volume>32</volume>:<fpage>3108</fpage>&#x2013;<lpage>3114</lpage>.
                    <pub-id pub-id-type="pmid">15181176</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkh605</pub-id>
                    <pub-id pub-id-type="pmcid">PMC434425</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Maita</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kitaura</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Keen</surname>
                            <given-names>TJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PAP-1, the mutated gene underlying the RP9 form of dominant retinitis pigmentosa, is a splicing factor.</article-title>
                    <source>

                        <italic toggle="yes">Exp. Cell Res.</italic>
</source>
                    <year>2004</year>;<volume>300</volume>:<fpage>283</fpage>&#x2013;<lpage>296</lpage>.
                    <pub-id pub-id-type="pmid">15474994</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.yexcr.2004.07.029</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mier</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paladin</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tamana</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Disentangling the complexity of low complexity proteins.</article-title>
                    <source>

                        <italic toggle="yes">Brief. Bioinform.</italic>
</source>
                    <year>2020</year>;<volume>21</volume>:<fpage>458</fpage>&#x2013;<lpage>472</lpage>.
                    <pub-id pub-id-type="pmid">30698641</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbz007</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7299295</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Monti</surname>
                            <given-names>SM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Simone</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langella</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>The Amazing World of IDPs in Human Diseases.</article-title>
                    <source>

                        <italic toggle="yes">Biomolecules.</italic>
</source>
                    <year>2021</year>;<volume>11</volume>:<fpage>333</fpage>.
                    <pub-id pub-id-type="pmid">33672116</pub-id>
                    <pub-id pub-id-type="doi">10.3390/biom11020333</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7926885</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Monti</surname>
                            <given-names>SM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>De Simone</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langella</surname>
                            <given-names>E</given-names>
                        </name>
</person-group>:
                    <article-title>The Amazing World of IDPs in Human Diseases II.</article-title>
                    <source>

                        <italic toggle="yes">Biomolecules.</italic>
</source>
                    <year>2022</year>;<volume>12</volume>:<fpage>369</fpage>.
                    <pub-id pub-id-type="pmid">35327561</pub-id>
                    <pub-id pub-id-type="doi">10.3390/biom12030369</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8945807</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Necci</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Piovesan</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Doszt&#x00e1;nyi</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MobiDB-lite: Fast and highly specific consensus prediction of intrinsic disorder in proteins.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>:<fpage>1402</fpage>&#x2013;<lpage>1404</lpage>.
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btx015</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>BG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Eklund</surname>
                            <given-names>EA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shiryaev</surname>
                            <given-names>SA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Predominant and novel de novo variants in 29 individuals with <italic toggle="yes">ALG13</italic> deficiency: Clinical description, biomarker status, biochemical analysis, and treatment suggestions.</article-title>
                    <source>

                        <italic toggle="yes">J. Inherit. Metab. Dis.</italic>
</source>
                    <year>2020</year>;<volume>43</volume>:<fpage>1333</fpage>&#x2013;<lpage>1348</lpage>.
                    <pub-id pub-id-type="pmid">32681751</pub-id>
                    <pub-id pub-id-type="doi">10.1002/jimd.12290</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7722193</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ntountoumi</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vlastaridis</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mossialos</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>9998</fpage>&#x2013;<lpage>10009</lpage>.
                    <pub-id pub-id-type="pmid">31504783</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkz730</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6821194</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ouzounis</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Coulson</surname>
                            <given-names>RMR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Enright</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Classification schemes for protein structure and function.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Rev. Genet.</italic>
</source>
                    <year>2003</year>;<volume>4</volume>:<fpage>508</fpage>&#x2013;<lpage>519</lpage>.
                    <pub-id pub-id-type="doi">10.1038/nrg1113</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pajkos</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zeke</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Doszt&#x00e1;nyi</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <article-title>Ancient Evolutionary Origin of Intrinsically Disordered Cancer Risk Regions.</article-title>
                    <source>

                        <italic toggle="yes">Biomolecules.</italic>
</source>
                    <year>2020</year>;<volume>10</volume>:<fpage>1115</fpage>.
                    <pub-id pub-id-type="pmid">32731489</pub-id>
                    <pub-id pub-id-type="doi">10.3390/biom10081115</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7465906</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Peng</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yan</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fan</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life.</article-title>
                    <source>

                        <italic toggle="yes">Cell. Mol. Life Sci.</italic>
</source>
                    <year>2015</year>;<volume>72</volume>:<fpage>137</fpage>&#x2013;<lpage>151</lpage>.
                    <pub-id pub-id-type="pmid">24939692</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s00018-014-1661-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pletscher-Frankild</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pallej&#x00e0;</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tsafou</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DISEASES: Text mining and data integration of disease&#x2013;gene associations.</article-title>
                    <source>

                        <italic toggle="yes">Methods.</italic>
</source>
                    <year>2015</year>;<volume>74</volume>:<fpage>83</fpage>&#x2013;<lpage>89</lpage>.
                    <pub-id pub-id-type="pmid">25484339</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ymeth.2014.11.020</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Promponas</surname>
                            <given-names>VJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Enright</surname>
                            <given-names>AJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tsoka</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CAST: an iterative algorithm for the complexity analysis of sequence tracts.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2000</year>;<volume>16</volume>:<fpage>915</fpage>&#x2013;<lpage>922</lpage>.
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/16.10.915</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Qiu</surname>
                            <given-names>X-B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shao</surname>
                            <given-names>Y-M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Miao</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The diversity of the DnaJ/Hsp40 family, the crucial partners for Hsp70 chaperones.</article-title>
                    <source>

                        <italic toggle="yes">Cell. Mol. Life Sci.</italic>
</source>
                    <year>2006</year>;<volume>63</volume>:<fpage>2560</fpage>&#x2013;<lpage>2570</lpage>.
                    <pub-id pub-id-type="pmid">16952052</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s00018-006-6192-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Quaglia</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>M&#x00e9;sz&#x00e1;ros</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Salladini</surname>
                            <given-names>E</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2022</year>;<volume>50</volume>:<fpage>D480</fpage>&#x2013;<lpage>D487</lpage>.
                    <pub-id pub-id-type="doi">10.1093/nar/gkab1082</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schriml</surname>
                            <given-names>LM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mitraka</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Munro</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Human Disease Ontology 2018 update: classification, content and workflow expansion.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>D955</fpage>&#x2013;<lpage>D962</lpage>.
                    <pub-id pub-id-type="pmid">30407550</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky1032</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6323977</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sickmeier</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hamilton</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>LeGall</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DisProt: the Database of Disordered Proteins.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2007</year>;<volume>35</volume>:<fpage>D786</fpage>&#x2013;<lpage>D793</lpage>.
                    <pub-id pub-id-type="pmid">17145717</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkl893</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1751543</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Smith</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kunin</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goldovsky</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MagicMatch--cross-referencing sequence identifiers across databases.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2005</year>;<volume>21</volume>:<fpage>3429</fpage>&#x2013;<lpage>3430</lpage>.
                    <pub-id pub-id-type="pmid">15961438</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bti548</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stetler</surname>
                            <given-names>RA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gan</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Heat shock proteins: Cellular and molecular mechanisms in the central nervous system.</article-title>
                    <source>

                        <italic toggle="yes">Prog. Neurobiol.</italic>
</source>
                    <year>2010</year>;<volume>92</volume>:<fpage>184</fpage>&#x2013;<lpage>211</lpage>.
                    <pub-id pub-id-type="pmid">20685377</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.pneurobio.2010.05.002</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2939168</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Strassmann</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Queller</surname>
                            <given-names>DC</given-names>
                        </name>
</person-group>:
                    <article-title>Altruism and social cheating in the social amoeba <italic toggle="yes">Dictyostelium discoideum</italic>.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2000</year>;<volume>408</volume>:<fpage>965</fpage>&#x2013;<lpage>967</lpage>.
                    <pub-id pub-id-type="pmid">11140681</pub-id>
                    <pub-id pub-id-type="doi">10.1038/35050087</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tang</surname>
                            <given-names>Y-J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pang</surname>
                            <given-names>Y-H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2021</year>;<volume>36</volume>:<fpage>5177</fpage>&#x2013;<lpage>5186</lpage>.
                    <pub-id pub-id-type="pmid">32702119</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btaa667</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tantos</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Han</surname>
                            <given-names>K-H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tompa</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Intrinsic disorder in cell signaling and gene transcription.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Cell. Endocrinol.</italic>
</source>
                    <year>2012</year>;<volume>348</volume>:<fpage>457</fpage>&#x2013;<lpage>465</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.mce.2011.07.015</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <collab>The UniProt Consortium</collab>

                        <name name-style="western">
                            <surname>Bateman</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Martin</surname>
                            <given-names>M-J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>UniProt: the Universal Protein Knowledgebase in 2023.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2022</year>;<fpage>gkac1052</fpage>.</mixed-citation>
            </ref>
            <ref id="ref52">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vienne</surname>
                            <given-names>DM</given-names>
                            <prefix>de</prefix>
                        </name>
</person-group>:
                    <article-title>Lifemap: Exploring the Entire Tree of Life.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Biol.</italic>
</source>
                    <year>2016</year>;<volume>14</volume>:<fpage>e2001624</fpage>.
                    <pub-id pub-id-type="pmid">28005907</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pbio.2001624</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5179005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wallis</surname>
                            <given-names>DE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Roessler</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hehr</surname>
                            <given-names>U</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mutations in the homeodomain of the human <italic toggle="yes">SIX3</italic> gene cause holoprosencephaly.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>1999</year>;<volume>22</volume>:<fpage>196</fpage>&#x2013;<lpage>198</lpage>.
                    <pub-id pub-id-type="doi">10.1038/9718</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref54">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Walsh</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Martin</surname>
                            <given-names>AJM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Di Domenico</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ESpritz: accurate and fast prediction of protein disorder.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2012</year>;<volume>28</volume>:<fpage>503</fpage>&#x2013;<lpage>509</lpage>.
                    <pub-id pub-id-type="pmid">22190692</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btr682</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref55">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ma</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xu</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>32</volume>:<fpage>i672</fpage>&#x2013;<lpage>i679</lpage>.
                    <pub-id pub-id-type="pmid">27587688</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw446</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5013916</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref56">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>JJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sodhi</surname>
                            <given-names>JS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McGuffin</surname>
                            <given-names>LJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Prediction and Functional Analysis of Native Disorder in Proteins from the Three Kingdoms of Life.</article-title>
                    <source>

                        <italic toggle="yes">J. Mol. Biol.</italic>
</source>
                    <year>2004</year>;<volume>337</volume>:<fpage>635</fpage>&#x2013;<lpage>645</lpage>.
                    <pub-id pub-id-type="pmid">15019783</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jmb.2004.02.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref57">
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Williams</surname>
                            <given-names>RM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Obradovic</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mathura</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <chapter-title>The Protein Non-folding Problem: Amino Acid Determinants of Intrinsic Order and Disorder</chapter-title>.
                    <source>

                        <italic toggle="yes">Biocomputing 2001.</italic>
</source>
                    <publisher-loc>Mauna Lani, Hawaii</publisher-loc>:
                    <publisher-name>World Scientific</publisher-name>;<year>2000</year>; pp.<fpage>89</fpage>&#x2013;<lpage>100</lpage>.</mixed-citation>
            </ref>
            <ref id="ref58">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wootton</surname>
                            <given-names>JC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Federhen</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Statistics of local complexity in amino acid sequences and sequence databases.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Chem.</italic>
</source>
                    <year>1993</year>;<volume>17</volume>:<fpage>149</fpage>&#x2013;<lpage>163</lpage>.
                    <pub-id pub-id-type="doi">10.1016/0097-8485(93)85006-X</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref59">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xue</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dunker</surname>
                            <given-names>AK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Intrinsically disordered regions of p53 family are highly diversified in evolution.</article-title>
                    <source>

                        <italic toggle="yes">Biochim Biophys Acta BBA - Proteins Proteomics.</italic>
</source>
                    <year>2013</year>;<volume>1834</volume>:<fpage>725</fpage>&#x2013;<lpage>738</lpage>.
                    <pub-id pub-id-type="pmid">23352836</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bbapap.2013.01.012</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3905691</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref60">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xue</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dunker</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Uversky</surname>
                            <given-names>VN</given-names>
                        </name>
</person-group>:
                    <article-title>Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life.</article-title>
                    <source>

                        <italic toggle="yes">J. Biomol. Struct. Dyn.</italic>
</source>
                    <year>2012</year>;<volume>30</volume>:<fpage>137</fpage>&#x2013;<lpage>149</lpage>.
                    <pub-id pub-id-type="pmid">22702725</pub-id>
                    <pub-id pub-id-type="doi">10.1080/07391102.2012.675145</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref61">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yates</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Akanni</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Amode</surname>
                            <given-names>MR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Ensembl 2016.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>;<volume>44</volume>:<fpage>D710</fpage>&#x2013;<lpage>D716</lpage>.
                    <pub-id pub-id-type="pmid">26687719</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv1157</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4702834</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref62">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>Y-K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Altschul</surname>
                            <given-names>SF</given-names>
                        </name>
</person-group>:
                    <article-title>The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2005</year>;<volume>21</volume>:<fpage>902</fpage>&#x2013;<lpage>911</lpage>.
                    <pub-id pub-id-type="pmid">15509610</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bti070</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref63">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Faraggi</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xue</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-Network Based Method.</article-title>
                    <source>

                        <italic toggle="yes">J. Biomol. Struct. Dyn.</italic>
</source>
                    <year>2012</year>;<volume>29</volume>:<fpage>799</fpage>&#x2013;<lpage>813</lpage>.
                    <pub-id pub-id-type="pmid">22208280</pub-id>
                    <pub-id pub-id-type="doi">10.1080/073911012010525022</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3297974</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref64">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sheng</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Heatmap3: an improved heatmap package with more powerful and convenient features.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>10</issue>):<fpage>1</fpage>&#x2013;<lpage>2</lpage>.
                    <pub-id pub-id-type="doi">10.1186/1471-2105-15-S10-P16</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report164958">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.142650.r164958</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Amoutzias</surname>
                        <given-names>Gregory</given-names>
                    </name>
                    <xref ref-type="aff" rid="r164958a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5961-964X</uri>
                </contrib>
                <aff id="r164958a1">
                    <label>1</label>Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>3</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Amoutzias G</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport164958" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.129929.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors identified disordered regions in human proteins, investigated their annotation and association with diseases, and then examined the presence/absence pattern of these disordered proteins in homologs from various taxa. Interestingly, the authors found that disease-related proteins tend to have LCRs that are enriched in charged, hydrophilic amino acids. In addition, they investigated the phylogenetic distribution of these proteins. The authors discuss in detail some very interesting disease-associated genes with LCRs and their distinct phylogenetic depth. This is an interesting and very useful study that sheds more light (also from an evolutionary point of view) on the intriguing link between LCRs and disease, and importantly, it functions as a pilot study for a larger-scale planned analysis by the authors.</p>
            <p> </p>
            <p> Comments/suggestions:</p>
            <p> </p>
            <p> In the Introduction, I would also add a paragraph and explain in more detail the terms 'compositional bias' and 'low complexity' and explain that they are strongly linked to IDRs. Concerning the LCRs part, I would include the references of Wootton 
                <italic>et al.</italic> (1994
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-1">1</xref>
                </sup>), of Karlin 
                <italic>et al.</italic> (2002
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-2">2</xref>
                </sup>), of Schaper 
                <italic>et al. </italic>(2014
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-3">3</xref>
                </sup>) and explain that they were originally thought of as junk or linker regions, but not anymore (Haerty 
                <italic>et al.</italic>, 2010
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-4">4</xref>
                </sup>).</p>
            <p> </p>
            <p> In Methods: &#x201c;Searching with query datasets against Proteomes for the creation of phylogenetic profile patterns was performed with DIAMOND blastp&#x201d;. I understand that the authors search for homologs and not only orthologs, is that correct? Also, concerning the cutoff of 21% sequence similarity, is it for any local alignment that DIAMOND reports, or is it for a certain query coverage too?</p>
            <p> </p>
            <p> Maybe Figure 1 and Table 1 could be moved to Results.</p>
            <p> </p>
            <p> In Results, third paragraph. The authors clearly identify a strong compositional bias for disease-associated transcripts. It would be very informative to mention the enrichment fold of 1.8 ((1845/1780) / (36250/62827)). Also, better mention the p-value as 1e-5. I wonder if disease-associated genes tend to have more transcripts on average than non-disease genes? If they do, are the transcripts-isoforms more frequently retaining the compositionally biased region? Maybe the authors could repeat this chi-square test, where they use the longest transcript per gene.</p>
            <p> </p>
            <p> Some thoughts for the future, when the authors perform their planned large-scale analysis: Do disease-associated genes and transcripts with compositional bias have broader or more restricted gene expression profiles?</p>
            <p> </p>
            <p> Results: &#x201c;The striking over-representation of serine/threonine (S/T) tracts, along with glutamate/glutamine (E/Q) and proline (P) followed by lysine (K) is indicative of the main residue types that might affect functional properties of human proteins, including their potential association with known phenotypes, such as polyglutamine tracts with neurodegenerative diseases (Bunting et al., 2022)&#x201d;</p>
            <p> It would also be informative to add that tandem repeats of short oligopeptides that are rich in glycine, proline, serine or threonine are capable of forming flexible structures that bind ligands under certain pH and temperature conditions (Matsushima 
                <italic>et al.</italic>, 2008
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-5">5</xref>
                </sup>). Maybe also add Williamson 
                <italic>et al. </italic>(1994
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-164958-6">6</xref>
                </sup>), concerning proline rich regions.</p>
            <p> </p>
            <p> I would recommend that the authors explain somewhere in detail what the CAST score means.</p>
            <p> </p>
            <p> Concerning the section of phylogenetic profiling, first paragraph: It would be informative to show what is the enrichment of disease in the well characterized CB proteins, compared to the background, and the statistical significance (Hypergeometric test probably). For example, the background is X disease-proteins in Y total proteins (or genes) of the genome. Enrichment: (17/100)/(X/Y).</p>
            <p> Same section, second paragraph: I guess it's a pairwise local sequence alignment with DIAMOND?</p>
            <p> Same section, third paragraph: &#x201c;Focusing on the 100-gene subset with confident disease associations&#x201d;. I guess these are the 17 proteins of the 100 gene subset with known IDs?</p>
            <p> </p>
            <p> Within the abstract, the authors use all three related terms: intrinsically disordered regions (IDRs), compositional bias, and low complexity regions (LCRs). Maybe they could use only one (in the abstract). Also, they should explain in the Introduction their interchangeability.</p>
            <p> </p>
            <p> Abstract: &#x201c;The evolutionary rate of disordered proteins&#x201d;. I would rephrase that as disordered protein regions.</p>
            <p> </p>
            <p> Abstract: Low complexity proteins or proteins with LCRs?</p>
            <p> </p>
            <p> In keywords, I would rephrase towards: low complexity region (LCR).</p>
            <p> </p>
            <p> In Introduction, please correct the CAID Predictors et al reference.</p>
            <p> </p>
            <p> In Introduction: &#x201c;are transiently associated with intrinsic disorder in nucleated cells&#x201d;. Could the authors please explain in more detail what they mean by the term &#x201c;transiently&#x201d;?</p>
            <p> </p>
            <p> Introduction: the paragraph that discusses the link between intrinsic disorder and disease: It would be nice to briefly mention one specific example of how intrinsic disorder is associated with human disease.</p>
            <p> </p>
            <p> Introduction: &#x201c;while others appearing particularly diversified&#x201d;. Maybe the term &#x201c;rapidly evolving&#x201d; would be better.</p>
            <p> </p>
            <p> In Methods, when first mentioning GHR, I would include the entire name &#x201c;Genetics Home Reference&#x201d;.</p>
            <p> </p>
            <p> In Methods: &#x201c;a total of 7269 disease-gene, high-confidence associations&#x201d;. It would be even more informative if the number of unique genes and the number of unique diseases in these associations were included too.</p>
            <p> </p>
            <p> In Methods, could the authors please elaborate more on what MagicMatch does? I did not exactly understand this part: &#x201c;to verify the identity of the reference proteome collection against the modified identifier space&#x201d;.</p>
            <p> </p>
            <p> Concerning the calculation of amino acid frequencies across the Ensembl protein set, was that done for all protein isoforms of a certain gene, or only for one representative isoform (i.e. the longest one) from each gene?</p>
            <p> </p>
            <p> In Table 1, are the numbers for human genes or for transcripts/protein isoforms?</p>
            <p> </p>
            <p> In Methods: &#x201c;For each visualisation, a list of the NCBI IDs&#x201d;. Which types of IDs?</p>
            <p> </p>
            <p> Results: &#x201c;(GRCh38.p13), containing 119068 gene transcripts&#x201d;. Could the authors also mention the number of genes?</p>
            <p> </p>
            <p> Results: &#x201c;Examination of the low complexity gene dataset features highlighted the significant divergence among amino acid related, low complexity frequencies.&#x201d; It may be better to use the term &#x201c;difference&#x201d;, because &#x201c;divergence&#x201d; relates to conservation.</p>
            <p> </p>
            <p> Results: &#x201c;Charged, hydrophilic residues appear over-represented&#x201d;. I would also list these residues in parentheses.</p>
            <p> </p>
            <p> In Figure 3, this must be the subset of 17 disease-associated genes from the set of 100 human proteins with well-characterised compositionally biased regions? The title of Figure 3 should be changed accordingly. Is it possible to also include a key mapping DOIDs to diseases next to the figure, or are there too many of them?</p>
            <p> </p>
            <p> Figure 4 legend: could the authors explain this statement further: &#x201c;The heatmap range reflects the dissimilarity matrix of the plotted values.&#x201d;? I am not sure I understood Figure 4. Does the figure only show whether a homologue is present in a certain species, or does it also show whether the homologue contains an LCR or CB region? Does this correspond to the red/blue colour of the matrix?</p>
            <p> Concerning figure 4, as a thought/suggestion for future studies, when the authors move to larger-scale analyses, maybe they could also include an analogous analysis, where they show the presence of orthologs, not homologs. For that, they would have to use best reciprocal blast and one representative protein (the longest) from each gene of a genome. Or, maybe the authors could do that using the orthology presence from the OMA database (https://omabrowser.org/oma/home/).</p>
            <p> </p>
            <p> Figure 5: visualized by Lifemap.</p>
            <p> </p>
            <p> Concerning the supplementary .map file, I would simply convert it to a CSV file for import into Excel. I would also add two columns, one with the species name of the proteome and another with the wider taxonomic group that the species belongs to. Also, could the authors explain what the numbers in the cells correspond to? I guess it is the number of homologs in that species.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, evolution, sequence analysis</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-164958-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Non-globular domains in protein sequences: automated segmentation using complexity measures.</article-title>
                        <source>
                            <italic>Comput Chem</italic>
                        </source>.<year>1994</year>;<volume>18</volume>(<issue>3</issue>) :
                        <fpage>269</fpage>-<lpage>85</lpage>
                        <pub-id pub-id-type="pmid">7952898</pub-id>
                        <pub-id pub-id-type="doi">10.1016/0097-8485(94)85023-2</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164958-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Amino acid runs in eukaryotic proteomes and disease associations.</article-title>
                        <source>
                            <italic>Proc Natl Acad Sci U S A</italic>
                        </source>.<year>2002</year>;<volume>99</volume>(<issue>1</issue>) :
                        <fpage>333</fpage>-<lpage>8</lpage>
                        <pub-id pub-id-type="pmid">11782551</pub-id>
                        <pub-id pub-id-type="doi">10.1073/pnas.012608599</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164958-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Deep conservation of human protein tandem repeats within the eukaryotes.</article-title>
                        <source>
                            <italic>Mol Biol Evol</italic>
                        </source>.<year>2014</year>;<volume>31</volume>(<issue>5</issue>) :
                        <fpage>1132</fpage>-<lpage>48</lpage>
                        <pub-id pub-id-type="pmid">24497029</pub-id>
                        <pub-id pub-id-type="doi">10.1093/molbev/msu062</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164958-4">
                    <label>4</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences.</article-title>
                        <source>
                            <italic>Genome</italic>
                        </source>.<year>2010</year>;<volume>53</volume>(<issue>10</issue>) :
                        <fpage>753</fpage>-<lpage>62</lpage>
                        <pub-id pub-id-type="pmid">20962881</pub-id>
                        <pub-id pub-id-type="doi">10.1139/g10-063</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164958-5">
                    <label>5</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Flexible structures and ligand interactions of tandem repeats consisting of proline, glycine, asparagine, serine, and/or threonine rich oligopeptides in proteins.</article-title>
                        <source>
                            <italic>Curr Protein Pept Sci</italic>
                        </source>.<year>2008</year>;<volume>9</volume>(<issue>6</issue>) :
                        <fpage>591</fpage>-<lpage>610</lpage>
                        <pub-id pub-id-type="pmid">19075749</pub-id>
                        <pub-id pub-id-type="doi">10.2174/138920308786733886</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164958-6">
                    <label>6</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>The structure and function of proline-rich regions in proteins.</article-title>
                        <source>
                            <italic>Biochem J</italic>
                        </source>.<year>1994</year>;<volume>297</volume>(<issue>Pt 2</issue>) :
                        <fpage>249</fpage>-<lpage>60</lpage>
                        <pub-id pub-id-type="pmid">8297327</pub-id>
                        <pub-id pub-id-type="doi">10.1042/bj2970249</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment9534-164958">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Chasapi</surname>
                            <given-names>Anastasia</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>3</day>
                    <month>4</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>&#x201c;In the Introduction, I would also add a paragraph and explain in more detail the terms 'compositional bias' and 'low complexity' and explain that they are strongly linked to IDRs. Concerning the LCRs part, I would include the references of Wootton&#x00a0;
                    <italic>et al.</italic>&#x00a0;(1994
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-1">
                        <sup>1</sup>
                    </ext-link>), of Karlin&#x00a0;
                    <italic>et al.</italic>&#x00a0;(2002
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-2">
                        <sup>2</sup>
                    </ext-link>), of Schaper&#x00a0;
                    <italic>et al.&#x00a0;</italic>(2014
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-3">
                        <sup>3</sup>
                    </ext-link>) and explain that they were originally thought of as junk or linker regions, but not anymore (Haerty&#x00a0;
                    <italic>et al.</italic>, 2010
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-4">
                        <sup>4</sup>
                    </ext-link>).&#x201d;</p>
                <p> 
                    <bold>
                        <italic>The introduction has been enriched to accommodate most points raised by both reviewers.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Methods: &#x201c;Searching with query datasets against Proteomes for the creation of phylogenetic profile patterns was performed with DIAMOND blastp&#x201d;. I understand that the authors search for homologs and not only orthologs, is that correct? Also, concerning the cutoff of 21% sequence similarity, is it for any local alignment that DIAMOND reports, or is it for a certain query coverage too?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the comment. For the phylogenetic profiling analysis, only sequence similarity, not alignment geometry, was taken into consideration. We have added an explanatory phrase in the Methods section.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Maybe Figure 1 and Table 1 could be moved to Results.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Table 1 and Figure 1 have been moved.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Results, third paragraph. The authors clearly identify a strong compositional bias for disease-associated transcripts. It would be very informative to mention the enrichment fold of 1.8 ((1845/1780) / (36250/62827)). Also, better mention the p-value as 1e-5. I wonder if disease-associated genes tend to have more transcripts on average than non-disease genes? If they do, are the transcripts-isoforms more frequently retaining the compositionally biased region? Maybe the authors could repeat this chi-square test, where they use the longest transcript per gene.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>The enrichment fold and the revised p-value have been added. We have not examined transcript abundance, which remains an interesting question for future research.</italic>
                    </bold>
                </p>
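                <p>The enrichment fold in the reviewer's expression can be reproduced with a short calculation. The sketch below is a minimal illustration that uses only the four counts quoted in the comment; the variable names are hypothetical labels, and the grouping of the counts follows the reviewer's expression rather than any table in the manuscript.</p>
                <preformat>
```python
# Counts exactly as quoted in the reviewer's expression:
# (1845 / 1780) / (36250 / 62827)
# The variable names are hypothetical labels for illustration only.
numerator_a, numerator_b = 1845, 1780
background_a, background_b = 36250, 62827

# Ratio inside the subset of interest and ratio in the background.
ratio_in_subset = numerator_a / numerator_b
ratio_in_background = background_a / background_b

# The enrichment fold is the quotient of the two ratios.
enrichment_fold = ratio_in_subset / ratio_in_background

print(f"Enrichment fold: {enrichment_fold:.1f}")
```
                </preformat>
                <p>Rounded to one decimal place, the quotient is 1.8, the value mentioned by the reviewer.</p>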
                <p> </p>
                <p> &#x201c;Some thoughts for the future, when the authors perform their planned large-scale analysis: Do disease-associated genes and transcripts with compositional bias have more wide or more restricted gene expression profiles?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for this very interesting thought. We have not looked into gene expression for the time being, but it would be a very informative functional parameter for a follow-up study.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Results: &#x201c;The striking over-representation of serine/threonine (S/T) tracts, along with glutamate/glutamine (E/Q) and proline (P) followed by lysine (K) is indicative of the main residue types that might affect functional properties of human proteins, including their potential association with known phenotypes, such as polyglutamine tracts with neurodegenerative diseases (Bunting et al., 2022)&#x201d;. It would also be informative to add that tandem repeats of short oligopeptides that are rich in glycine, proline, serine or threonine are capable of forming flexible structures that bind ligands under certain pH and temperature conditions (Matsushima&#x00a0;
                    <italic>et al.</italic>, 2008
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-5">
                        <sup>5</sup>
                    </ext-link>). Maybe also add Williamson&#x00a0;
                    <italic>et al.&#x00a0;</italic>(1994
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164958-6">
                        <sup>6</sup>
                    </ext-link>), concerning proline rich regions.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the suggestion. The text has been enriched accordingly.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;I would recommend that the authors explain somewhere in detail what the CAST score means.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done (figure 1 legend)</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Concerning the section of phylogenetic profiling, first paragraph: It would be informative to show what is the enrichment of disease in the well characterized CB proteins, compared to the background, and the statistical significance (Hypergeometric test probably). For example, the background is X disease-proteins in Y total proteins (or genes) of the genome. Enrichment: (17/100)/(X/Y).&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the suggestion. We selected a 100-protein low-complexity query set with characteristic CB to explore disease association and phylogenetic depth further. However, this query set is not necessarily representative of the whole genome, and we therefore chose not to extrapolate.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Same section, second paragraph: I guess it's pairwise local sequence alignment with DIAMOND?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Indeed, as described in the Methods section. A reference to the Methods has been added.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Same section, third paragraph: &#x201c;Focusing on the 100-gene subset with confident disease associations&#x201d;. I guess these are the 17 proteins of the 100 gene subset with known IDs?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Indeed. A small clarification has been added.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Within the abstract, the authors use all three related terms, intrinsically disordered regions (IDRs), compositional bias, low complexity regions (LCRs). Maybe they could use only one (in the abstract). Also, they should explain in the Introduction their inter-changeability.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Abstract: &#x201c;The evolutionary rate of disordered proteins&#x201d;. I would rephrase that as disordered protein regions.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Abstract: Low complexity proteins or proteins with LCRs?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Thank you, it is indeed more accurate and has been changed.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In keywords, I would rephrase towards: low complexity region (LCR).&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Introduction, please correct the CAID Predictors et al reference.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Thank you for your comment, it has been corrected.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Introduction: &#x201c;are transiently associated with intrinsic disorder in nucleated cells&#x201d;. Could the authors please explain in more detail what they mean with the term &#x201c;transiently&#x201d;?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>&#x201c;Transiently&#x201d; was replaced with &#x201c;dynamically&#x201d; for better clarity.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Introduction: the paragraph that discusses the link between intrinsic disorder and disease: It would be nice to briefly mention one specific example of how intrinsic disorder is associated with human disease.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>A paragraph explaining the role of alpha-synuclein and its relation to the onset of Parkinson's disease (PD) has been added.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Introduction: &#x201c;while others appearing particularly diversified&#x201d;. Maybe the term &#x201c;rapidly evolving&#x201d; would be better.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Methods, when first mentioning GHR, I would include the entire name &#x201c;Genetics Home Reference&#x201d;.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Methods: &#x201c;a total of 7269 disease-gene, high-confidence associations&#x201d;. It would be even more informative if the number of unique genes and the number of unique diseases in these associations were included too.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Thank you for the comment, we have added the values.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Methods, could the authors please elaborate more on what MagicMatch does? I did not exactly understand this part: &#x201c;to verify the identity of the reference proteome collection against the modified identifier space&#x201d;.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the comment. MagicMatch performs a 100% sequence-identity check based on sequence alone, without comparing identifiers. A clarification has been added in the Methods section.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Concerning the calculation of amino acid frequencies across the Ensembl protein set, was that done for all protein isoforms of a certain gene, or only for one representative isoform (i.e. the longest one) from each gene?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>The calculation concerns the whole Ensembl protein dataset, therefore all reported isoforms for each gene.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In table 1, the numbers are for human genes or transcripts/protein isoforms?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the observation. The numbers correspond to gene transcripts; we have changed the Table legend accordingly.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Methods: &#x201c;For each visualisation, a list of the NCBI IDs&#x201d;. Which types of IDs?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>It has been clarified in the text.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Results: &#x201c;(GRCh38.p13), containing 119068 gene transcripts&#x201d;. Could the authors also mention the number of genes?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Results: &#x201c;Examination of the low complexity gene dataset features highlighted the significant divergence among amino acid related, low complexity frequencies.&#x201d; Maybe better use the term difference because divergence relates to conservation.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Results: &#x201c;Charged, hydrophilic residues appear over-represented&#x201d;. I would also mention them in parenthesis.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;In Figure 3, this must be the subset of 17 disease-associated genes from the set of 100 human proteins with well-characterised compositionally biased regions? The title of Figure 3 should be changed accordingly. Is it possible to also include a key of DOIDs-disease next to the figure, or are there too many of them?&#x201d;</p>
                <p> 
                    <bold>
                        <italic>The title has been changed accordingly. Regarding the DOIDs, we have chosen to include only the IDs, as some of the disease names are quite descriptive and would result in a &#x201c;noisy&#x201d; figure.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Figure 4 legend: could the authors explain this more: &#x201c;The heatmap range reflects the dissimilarity matrix of the plotted values.&#x201d;? I am not sure I understood Figure 4. Does the figure only show whether a homologue is present in a certain species, or does it also show whether the homologue contains an LCR or CB region? Does this correspond to the red/blue colour of the matrix?</p>
                <p> Concerning figure 4, as a thought/suggestion for future studies, when the authors move to larger-scale analyses, maybe they could also include an analogous analysis, where they show the presence of orthologs, not homologs. For that, they would have to use best reciprocal blast and one representative protein (the longest) from each gene of a genome. Or, maybe the authors could do that using the orthology presence from the OMA database (
                    <ext-link ext-link-type="uri" xlink:href="https://omabrowser.org/oma/home/">https://omabrowser.org/oma/home/</ext-link>).&#x201d;</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the comment. The heatmap represents the number of homologues found across species, without specifying whether these homologues contain LCR/CB regions. The colour range reflects this matrix of values, with normalization applied across rows (each row is scaled to have a mean of 0 and a standard deviation of 1). We have clarified these points in the figure legend and Methods.</italic>
                    </bold>
                </p>
                <p> 
                    <bold>
                        <italic>Although all analyses in this study concern homologue detection, we welcome the reviewer&#x2019;s suggestion to perform similar comparisons of orthologue presence, potentially annotating LCR presence in parallel.</italic>
                    </bold>
                </p>
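                <p>The row-wise scaling described above (each row brought to a mean of 0 and a standard deviation of 1) can be sketched as follows. This is a minimal illustration, not the code used for Figure 4: the function name and the example counts are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the response does not state which estimator was applied.</p>
                <preformat>
```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Scale each row to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample SD (n - 1); this choice is an assumption
        if sd == 0:
            # A constant row carries no contrast; map it to zeros.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - mu) / sd for value in row])
    return scaled


# Hypothetical homologue counts: rows are query proteins, columns are species.
counts = [
    [1, 2, 3],
    [0, 0, 6],
    [4, 4, 4],
]
scaled = zscore_rows(counts)

for row in scaled:
    print([round(value, 3) for value in row])
```
                </preformat>
                <p>Because every row is centred and scaled independently, the colours compare species within one protein's profile rather than absolute homologue counts between proteins.</p>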
                <p> </p>
                <p> &#x201c;Concerning supplementary .map file, I would simply convert it to a csv file for import in excel. I would also add two columns, one with the species name of the proteome and another one with the wider taxonomic group that the species belongs to. Also, could the authors explain what the numbers in the cells correspond to, I guess it's the number of homologs in that species?&#x201d;</p>
                <p> 
                    <italic>
                        <bold>We thank the reviewer for the suggestions. The file has been converted to .csv and the MagicMatch identifier has been split into separate columns for better readability: reference proteome identifier, NCBI taxonomy ID, species code, and taxonomic domain (eukaryota/bacteria/archaea). Moreover, a column with the full species name has been added. The numbers indeed correspond to the number of homologs; this information has been added to the Zenodo file description, along with descriptions of the remaining columns.</bold>
                    </italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report164957">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.142650.r164957</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Tamana</surname>
                        <given-names>Stella</given-names>
                    </name>
                    <xref ref-type="aff" rid="r164957a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3414-4972</uri>
                </contrib>
                <aff id="r164957a1">
                    <label>1</label>Molecular Genetics Thalassaemia Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>3</month>
                <year>2023</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2023 Tamana S</copyright-statement>
                <copyright-year>2023</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport164957" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.129929.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The presented manuscript &#x201c;Disease association and comparative genomics of compositional bias in human proteins&#x201d; focuses on the very interesting, but challenging to analyze and interpret, study of the structural, functional and evolutionary properties of compositional bias in human proteins. The authors report novel findings and a strong correlation between compositional bias and disease association. Studying compositional bias and disease association, both computationally and experimentally, has been quite challenging over the years, mainly because proteins with compositional bias tend to adopt non-globular structures or exist in a disordered state, and thus require special treatment in fundamental steps of comparative genomic analyses and in experimental procedures to determine their three-dimensional (3D) structures. This is the first time that a computational framework has successfully combined genome-wide disease-association analysis of compositional bias with consideration of its functional, structural, and evolutionary properties. The presentation is very clear, well-structured, and easy to comprehend. The reported computational framework and methodology take advantage of up-to-date computational tools and statistical packages. This paper is of interest to scientists within the field of human diseases and variant interpretation, as elucidating specific functional and structural patterns of groups of genes (especially those presenting extreme compositional bias) involved in the same disease will ultimately help discover the underlying mechanisms and develop new experimental procedures.</p>
            <p> </p>
            <p> 
                <bold>Minor comments:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <bold>Introduction &#x2013; Disordered proteins exhibit specific patterns at the sequence level</bold> 
                            <list list-type="bullet">
                                <list-item>
                                    <p>This section will be greatly enhanced if the authors consider adding a paragraph (or 2-3 lines) on the structural characteristics and underlying mechanisms of IDPs/IDRs, thus giving the reader the opportunity right from the start of the paper to understand the correlation between compositional bias and disease association. For example, they could explain in a bit more detail that order-promoting residue types are commonly found within the hydrophobic cores of foldable proteins, as opposed to disorder-promoting residues, which are typically located at the surface of foldable proteins (for references see Theillet 
                                        <italic>et al.</italic>, 2013
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-164957-1">1</xref>
                                        </sup>; Uversky, 2013
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-164957-2">2</xref>
                                        </sup>). Another example could be that hydrophobic-enriched regions, when exposed, are prone to induce self-aggregation and/or intermolecular interactions with surrounding proteins, and thus trigger aggregation (for reference see Grignaschi 
                                        <italic>et al.</italic>, 2018
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-164957-3">3</xref>
                                        </sup>).&#x00a0;</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list> Line 1-3: please consider adding references.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, Compositionally Biased Regions, Comparative genomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-164957-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>The alphabet of intrinsic disorder: I. Act like a Pro: On the abundance and roles of proline residues in intrinsically disordered proteins.</article-title>
                        <source>
                            <italic>Intrinsically Disord Proteins</italic>
                        </source>.<year>2013</year>;<volume>1</volume>(<issue>1</issue>) :
                        <elocation-id>10.4161/idp.24360</elocation-id>
                        <fpage>e24360</fpage>
                        <pub-id pub-id-type="pmid">28516008</pub-id>
                        <pub-id pub-id-type="doi">10.4161/idp.24360</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164957-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>The alphabet of intrinsic disorder: II. Various roles of glutamic acid in ordered and intrinsically disordered proteins.</article-title>
                        <source>
                            <italic>Intrinsically Disord Proteins</italic>
                        </source>.<year>2013</year>;<volume>1</volume>(<issue>1</issue>) :
                        <elocation-id>10.4161/idp.24684</elocation-id>
                        <fpage>e24684</fpage>
                        <pub-id pub-id-type="pmid">28516010</pub-id>
                        <pub-id pub-id-type="doi">10.4161/idp.24684</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-164957-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>A hydrophobic low-complexity region regulates aggregation of the yeast pyruvate kinase Cdc19 into amyloid-like aggregates in vitro.</article-title>
                        <source>
                            <italic>J Biol Chem</italic>
                        </source>.<year>2018</year>;<volume>293</volume>(<issue>29</issue>) :
                        <elocation-id>10.1074/jbc.RA117.001628</elocation-id>
                        <fpage>11424</fpage>-<lpage>11432</lpage>
                        <pub-id pub-id-type="pmid">29853641</pub-id>
                        <pub-id pub-id-type="doi">10.1074/jbc.RA117.001628</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment9533-164957">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Chasapi</surname>
                            <given-names>Anastasia</given-names>
                        </name>
                        <aff/>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>3</day>
                    <month>4</month>
                    <year>2023</year>
                </pub-date>
            </front-stub>
            <body>
                <p>"Introduction &#x2013; Disordered proteins exhibit specific patterns at the sequence level</p>
                <p> This section will be greatly enhanced if the authors consider adding a paragraph (or 2-3 lines) of the structural characteristics and underlying mechanisms of IDPs/IDRs and thus, giving the reader the opportunity right from the start of the paper to understand the correlation between compositional bias and disease-association. For example, providing in a bit more detail that order-promoting residue types are commonly found within the hydrophobic cores of foldable proteins as opposed to disorder-promoting residues typically located at the surface of foldable proteins (for references see Theillet&#x00a0;
                    <italic>et al.</italic>, 2013
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164957-1">
                        <sup>1</sup>
                    </ext-link>; Uversky, 2013
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164957-2">
                        <sup>2</sup>
                    </ext-link>). Also, another example could be that hydrophobic enriched regions are prone to induce either self-aggregation or/and intermolecular interactions with surrounding proteins when exposed and thus, trigger aggregation (for reference see Grignaschi&#x00a0;
                    <italic>et al.</italic>, 2018
                    <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/12-198/v1#rep-ref-164957-3">
                        <sup>3</sup>
                    </ext-link>). "</p>
                <p> 
                    <bold>
                        <italic>We thank the reviewer for the suggestion. The text has been enriched accordingly.</italic>
                    </bold>
                </p>
                <p> </p>
                <p> &#x201c;Line 1-3: please consider adding references.&#x201d;</p>
                <p> 
                    <bold>
                        <italic>Done</italic>
                    </bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
