<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.16252.3</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Method Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Improved inference of chromosome conformation from images of labeled loci</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 3; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Ross</surname>
                        <given-names>Brian C.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2432-4678</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Costello</surname>
                        <given-names>James C.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Computational Bioscience Program, Department of Pharmacology, University of Colorado, Anschutz Medical Campus, Aurora, CO, 80045, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:brian.ross@ucdenver.edu">brian.ross@ucdenver.edu</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:JAMES.COSTELLO@ucdenver.edu">JAMES.COSTELLO@ucdenver.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>28</day>
                <month>3</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>ISCB Comm J-1521</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>26</day>
                    <month>3</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Ross BC and Costello JC</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/7-1521/pdf"/>
            <abstract>
                <p>We previously published a method that infers chromosome conformation from images of fluorescently tagged genomic loci, for the case when there are many loci labeled with each distinguishable color. Here we build on that work and improve the reconstruction algorithm to address its limitations. We show that these improvements 1) increase the reconstruction accuracy and 2) allow the method to be used on large-scale problems involving several hundred labeled loci. Simulations indicate that full-chromosome reconstructions at 1/2 Mb resolution are possible using existing labeling and imaging technologies. The updated reconstruction code and the script files used for this paper are available at: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/align3d">https://github.com/heltilda/align3d</ext-link>.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>chromosome</kwd>
                <kwd>conformation</kwd>
                <kwd>reconstruction</kwd>
                <kwd>fluorescence</kwd>
                <kwd>genetic</kwd>
                <kwd>loci</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100009141">
                    <funding-source>Cancer League of Colorado</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000002">
                    <funding-source>National Institutes of Health</funding-source>
                    <award-id>2T15LM009451</award-id>
                </award-group>
                <award-group id="fund-3" xlink:href="http://dx.doi.org/10.13039/100005508">
                    <funding-source>Boettcher Foundation</funding-source>
                </award-group>
                <funding-statement>Funding was provided by the Boettcher Foundation (J.C.C.), National Institutes of Health [2T15LM009451 to B.C.R.], and a Cancer League of Colorado grant (J.C.C. and B.C.R.).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 2</title>
                <p>We improved a sentence comparing our method to ChromoTrace, in the Introduction section.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Measurement of 
                <italic toggle="yes">in vivo</italic> chromosome conformation is a major unsolved problem in structural biology despite its known biological importance
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. The present state of the art is to either obtain indirect information about conformations using 3C-derived methods, which measure DNA-DNA contacts (typically in a cell-averaged population)
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>, or else directly measure the cellular locations of individual chromosomal loci in single cells by microscopy
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. The major limitation of direct localization is one of throughput: only ~3&#x2013;5 labeled loci can be uniquely identified &#x2018;by color&#x2019; in a standard microscope image, whereas a whole-chromosome reconstruction would involve labeling and identifying hundreds or thousands of loci.</p>
            <p>Several research efforts aim to remove the color limitation, either through experimental improvements or through computational inference. The experimental approaches aim to increase the number of labels that can be distinguished in an image
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. Alternatively, attempts have been made to infer the identity of labels that cannot be uniquely identified in an image, by comparing the image to the known label positions along the DNA contour. The first attempt to do this was &#x2018;by eye&#x2019;
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>, but subsequently two computational algorithms have been developed to automate this inference: 
                <monospace>align3d</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> and 
                <monospace>ChromoTrace</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. There are two important differences between these algorithms. First, 
                <monospace>align3d</monospace> has less stringent experimental requirements than 
                <monospace>ChromoTrace</monospace>, as it allows for missing labels in the image and does not require a uniform label spacing along the chromosome. Second, 
                <monospace>ChromoTrace</monospace> outputs explicit conformations, whereas 
                <monospace>align3d</monospace> outputs likelihoods of the various possible identities for each labeled locus. Both approaches have their advantages: 
                <monospace>ChromoTrace</monospace> output is straightforward to interpret, whereas 
                <monospace>align3d</monospace> output gives information on the range of possible conformational solutions along with their likelihoods.</p>
            <p>This paper presents improvements to 
                <monospace>align3d</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> that allow it to generate high-quality, chromosome-scale conformational reconstructions. First, we briefly describe the algorithm. Using a) the genomic locations and colors of labeled loci and b) the spatial locations and colors of spots in a microscope image, together with a relation tying the genomic distance between two loci to their average spatial displacement, this method constructs a table of &#x2018;mapping probabilities&#x2019; 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic> &#x2192; 
                <italic toggle="yes">s</italic>), each giving the probability that labeled genomic locus 
                <italic toggle="yes">L</italic> produced spot 
                <italic toggle="yes">s</italic> in the microscope image. Each mapping probability 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic> &#x2192; 
                <italic toggle="yes">s</italic>) is calculated by dividing the summed statistical weights of conformations where locus 
                <italic toggle="yes">L</italic> maps to spot 
                <italic toggle="yes">s</italic>, which we term a mapping partition function and denote 
                <italic toggle="yes">Z</italic>
                <sub>
                    <italic toggle="yes">L</italic>&#x2192;
                    <italic toggle="yes">s</italic>
                </sub>, by the full partition function 
                <italic toggle="yes">Z</italic>, which is the summed weight of all conformations. A proper calculation of 
                <italic toggle="yes">Z</italic>
                <sub>
                    <italic toggle="yes">L</italic>&#x2192;
                    <italic toggle="yes">s</italic>
                </sub> and 
                <italic toggle="yes">Z</italic> would consider all conformations having no more than one locus at any given spot in the image
                <sup>
                    <xref ref-type="other" rid="FN1">1</xref>
                </sup>, similar to a traveling salesman tour
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>, but this exact calculation is intractable for large problems. Instead, 
                <monospace>align3d</monospace> counts all conformations for which 
                <italic toggle="yes">adjacent</italic> loci do not overlap at the same spot (see 
                <xref ref-type="fig" rid="f1">Figure 1</xref>), using a variant of the forward-backward algorithm
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup> that can propagate between non-adjacent layers. Counting overlapping conformations is a major source of error, because the vast majority of conformations contributing to the partition function overlap at non-adjacent loci. One consequence is that the normalization of the mapping probabilities is inconsistent with any non-overlapping conformation, as &#x2211;
                <sub>
                    <italic toggle="yes">L</italic>
                </sub> 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic> &#x2192; 
                <italic toggle="yes">s</italic>) can exceed 100% for certain spots. To recover from this error, 
                <monospace>align3d</monospace> assigns a penalty to each spot and iteratively adjusts these penalties until the spot normalization is sensible. Although somewhat ad hoc, the use of spot penalties recovers significant information about medium-sized conformations (&#x223c;30 labeled loci). Larger simulated experiments (&#x223c;300 loci), however, have convergence problems because the cost function plateaus at very small or very large values of the spot penalties.</p>
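            <p>The distinction between the exact and the adjacent-only sums can be illustrated on a toy problem. The Python sketch below is not the 
                <monospace>align3d</monospace> implementation: it enumerates conformations by brute force rather than using a forward-backward recursion, it assumes a single color with no missing spots, and it assumes a Gaussian weight for each step between consecutive loci. The function names, the weight model and the spot coordinates are all hypothetical choices made for illustration.</p>
            <p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import itertools
import math


def step_weight(a, b, sigma=1.0):
    """Assumed Gaussian weight for consecutive loci sitting at spots a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mapping_probabilities(spots, n_loci, allow_nonadjacent_overlap):
    """Brute-force p(L -> s) = Z_{L->s} / Z over the allowed conformations.

    Adjacent loci are never allowed to share a spot. When
    allow_nonadjacent_overlap is False, no two loci may share a spot at all
    (the exact sum); when True, only the adjacent constraint is enforced.
    """
    n_spots = len(spots)
    z_total = 0.0
    z_map = [[0.0] * n_spots for _ in range(n_loci)]
    for conf in itertools.product(range(n_spots), repeat=n_loci):
        # Reject conformations in which adjacent loci overlap.
        if any(conf[i] == conf[i + 1] for i in range(n_loci - 1)):
            continue
        # For the exact sum, also reject any repeated spot.
        if not allow_nonadjacent_overlap and len(set(conf)) != n_loci:
            continue
        w = 1.0
        for i in range(n_loci - 1):
            w *= step_weight(spots[conf[i]], spots[conf[i + 1]])
        z_total += w
        for locus, spot in enumerate(conf):
            z_map[locus][spot] += w
    return [[z / z_total for z in row] for row in z_map]


def spot_totals(p):
    """Sum p(L -> s) over loci L for each spot s."""
    return [sum(row[s] for row in p) for s in range(len(p[0]))]


# Hypothetical image: two nearby spots and one distant spot, three loci.
spots = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
exact = mapping_probabilities(spots, 3, allow_nonadjacent_overlap=False)
relaxed = mapping_probabilities(spots, 3, allow_nonadjacent_overlap=True)
print("exact spot totals:  ", spot_totals(exact))
print("relaxed spot totals:", spot_totals(relaxed))
```
]]></code>
            </p>
            <p>In this toy case the exact sum gives every spot a total mapping probability of one, whereas the adjacent-only sum is dominated by conformations that hop back and forth between the two nearby spots, so their totals exceed one. The brute-force loop scales exponentially with the number of loci and is only useful for small checks of this kind.</p>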
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Legal versus illegal (overlapping) conformations.</title>
                    <p> Schematic showing one legal and one illegal conformation passing through spots 
                        <italic toggle="yes">A</italic>, 
                        <italic toggle="yes">B</italic> and 
                        <italic toggle="yes">C</italic>. 
                        <monospace>align3d</monospace> counts both legal and overlapping conformations in estimating the partition function 
                        <italic toggle="yes">Z</italic> (although it is able to prevent 
                        <italic toggle="yes">adjacent</italic> loci from overlapping).</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure1.gif"/>
            </fig>
            <p>The final step is to use the mapping probabilities to construct the range of likely conformations compatible with the microscope image. Uncertainty in the conformation results from inaccuracy or uncertainty in the mapping probabilities, which has three sources: inaccuracy in the DNA model (the relation between genomic and spatial distance), error in estimating the partition functions, and the inherent uncertainty in the data even with a perfect reconstruction algorithm. The DNA model can be calibrated by a control experiment, and we argue that the remaining model error can reduce our method&#x2019;s confidence in its results but generally does 
                <italic toggle="yes">not</italic> cause our method to reconstruct mistaken conformations. The main focus of this paper is on improving the partition function estimate, using two different strategies. First, we give an efficient method for optimizing the spot penalties when there are hundreds of spots in the image. Next, we provide formulas for the partition functions which allow them to be estimated to arbitrarily high accuracy (given enough computation time), without using spot penalties or any optimization. As we show using simulations, these two methods used individually or in tandem permit confident, chromosome-scale conformational reconstructions using existing experimental technologies.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>First, we provide an update rule for efficiently optimizing the spot penalties regardless of the number of labeled loci. This rule guarantees that a) the rate of missing spots is as expected, and b) the mapping probabilities are properly normalized. Let 
                <italic toggle="yes">q
                    <sub>s</sub>
                </italic> denote the penalty attached to spot 
                <italic toggle="yes">s</italic>; then the update rule for that spot penalty is:</p>
            <p>
                <disp-formula id="e1">
                    <mml:math display="block" id="math1">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mi>q</mml:mi>
                                <mml:mi>s</mml:mi>
                                <mml:mo>&#x2032;</mml:mo>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>P</mml:mi>
                                            <mml:mrow>
                                                <mml:mo>(</mml:mo>
                                                <mml:mi>s</mml:mi>
                                                <mml:mo>)</mml:mo>
                                            </mml:mrow>
                                            <mml:mo>/</mml:mo>
                                            <mml:mi>N</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mn>1</mml:mn>
                                            <mml:mo>&#x2212;</mml:mo>
                                            <mml:msub>
                                                <mml:mi>p</mml:mi>
                                                <mml:mrow>
                                                    <mml:mi>f</mml:mi>
                                                    <mml:mi>n</mml:mi>
                                                </mml:mrow>
                                            </mml:msub>
                                            <mml:mrow>
                                                <mml:mo>(</mml:mo>
                                                <mml:mi>c</mml:mi>
                                                <mml:mo>)</mml:mo>
                                            </mml:mrow>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mspace width="0.1em"/>
                            <mml:mo>&#x22c5;</mml:mo>
                            <mml:mspace width="0.1em"/>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>P</mml:mi>
                                            <mml:mrow>
                                                <mml:mo>(</mml:mo>
                                                <mml:mi>s</mml:mi>
                                                <mml:mo>)</mml:mo>
                                            </mml:mrow>
                                            <mml:mo>/</mml:mo>
                                            <mml:mi>N</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>min</mml:mi>
                                            <mml:mo>&#x2061;</mml:mo>
                                            <mml:mrow>
                                                <mml:mo>(</mml:mo>
                                                <mml:mrow>
                                                    <mml:mn>1</mml:mn>
                                                    <mml:mo>,</mml:mo>
                                                    <mml:mi>P</mml:mi>
                                                    <mml:mrow>
                                                        <mml:mo>(</mml:mo>
                                                        <mml:mi>s</mml:mi>
                                                        <mml:mo>)</mml:mo>
                                                    </mml:mrow>
                                                </mml:mrow>
                                                <mml:mo>)</mml:mo>
                                            </mml:mrow>
                                            <mml:mo>/</mml:mo>
                                            <mml:mi>N</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>&#x22c5;</mml:mo>
                            <mml:msub>
                                <mml:mi>q</mml:mi>
                                <mml:mi>s</mml:mi>
                            </mml:msub>
                            <mml:mspace width="4em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>1</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                    </mml:math>
                </disp-formula>
            </p>
            <p>where 
                <italic toggle="yes">N</italic> is the number of loci, 
                <italic toggle="yes">P</italic>(
                <italic toggle="yes">s</italic>) = &#x2211;
                <sub>
                    <italic toggle="yes">L</italic>
                </sub> 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic> &#x2192; 
                <italic toggle="yes">s</italic>) is the total probability of mapping any locus to spot 
                <italic toggle="yes">s</italic>, and 
                <italic toggle="yes">p
                    <sub>fn</sub>
                </italic>(
                <italic toggle="yes">c</italic>) is the estimated rate of missing spots having color 
                <italic toggle="yes">c</italic>, the color of spot 
                <italic toggle="yes">s</italic>. The justification for this rule is given in 
                <xref ref-type="other" rid="SF1">Appendix 1</xref> (
                <xref ref-type="other" rid="SF1">Supplementary File 1</xref>).</p>
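            <p>For readers who prefer code to notation, the following Python sketch transcribes Equation 1 term by term as it is displayed above. It is an illustrative reading of the formula, not code taken from 
                <monospace>align3d</monospace>; the function and argument names are hypothetical, and the sketch assumes 
                <italic toggle="yes">P</italic>(
                <italic toggle="yes">s</italic>) &gt; 0, 0 &lt; 
                <italic toggle="yes">p
                    <sub>fn</sub>
                </italic>(
                <italic toggle="yes">c</italic>) &lt; 1 and 
                <italic toggle="yes">N</italic> &gt; 1 so that no denominator vanishes.</p>
            <p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
def odds_against(x):
    """Return 1/x - 1, the term that recurs throughout Equation 1."""
    return 1.0 / x - 1.0


def update_spot_penalty(q_s, P_s, N, p_fn_c):
    """One application of Equation 1 to a single spot penalty.

    q_s    : current penalty of spot s
    P_s    : total mapping probability P(s), summed over loci
    N      : number of loci
    p_fn_c : estimated missing-spot rate for the color of spot s
    """
    # First factor of Equation 1: compares P(s)/N with 1 - p_fn(c).
    missing_factor = odds_against(P_s / N) / odds_against(1.0 - p_fn_c)
    # Second factor: differs from 1 only when P(s) exceeds 1.
    normalization_factor = (
        odds_against(P_s / N) / odds_against(min(1.0, P_s) / N)
    )
    return missing_factor * normalization_factor * q_s


# Hypothetical values, chosen only to exercise both branches of min(1, P(s)).
print(update_spot_penalty(q_s=2.0, P_s=0.5, N=10, p_fn_c=0.1))
print(update_spot_penalty(q_s=1.0, P_s=2.0, N=10, p_fn_c=0.1))
```
]]></code>
            </p>
            <p>Written this way, it is easy to see that the second factor equals one whenever 
                <italic toggle="yes">P</italic>(
                <italic toggle="yes">s</italic>) &#x2264; 1, so in that regime only the first factor changes the penalty.</p>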
            <p>We can also update a penalty 
                <inline-formula>
                    <mml:math display="inline" id="M10">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>
                <sub>
                    <italic toggle="yes">c</italic>
                </sub> that is associated with 
                <italic toggle="yes">missing</italic> spots of color 
                <italic toggle="yes">c</italic>. This gives a faster way to enforce a desired missing spot rate because there are fewer 
                <inline-formula>
                    <mml:math display="inline" id="M11">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> penalties than 
                <italic toggle="yes">q</italic> penalties. An update to 
                <inline-formula>
                    <mml:math display="inline" id="M22">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>
                <sub>
                    <italic toggle="yes">c</italic>
                </sub> is equivalent to a reverse update to all 
                <italic toggle="yes">q
                    <sub>s</sub>
                </italic> for spots 
                <italic toggle="yes">s</italic> of color 
                <italic toggle="yes">c</italic>, so the update rule is:</p>
            <p>
                <disp-formula id="e2">
                    <mml:math display="block" id="math2">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mrow>
                                    <mml:mover>
                                        <mml:mi>q</mml:mi>
                                        <mml:mo>&#x2212;</mml:mo>
                                    </mml:mover>
                                </mml:mrow>
                                <mml:mi>c</mml:mi>
                                <mml:mo>&#x2032;</mml:mo>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:mfrac>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mn>1</mml:mn>
                                            <mml:mo>&#x2212;</mml:mo>
                                            <mml:msub>
                                                <mml:mi>p</mml:mi>
                                                <mml:mrow>
                                                    <mml:mi>f</mml:mi>
                                                    <mml:mi>n</mml:mi>
                                                </mml:mrow>
                                            </mml:msub>
                                            <mml:mo stretchy="false">(</mml:mo>
                                            <mml:mi>c</mml:mi>
                                            <mml:mo stretchy="false">)</mml:mo>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mfrac>
                                        <mml:mn>1</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>P</mml:mi>
                                            <mml:mo stretchy="false">(</mml:mo>
                                            <mml:mi>s</mml:mi>
                                            <mml:mo stretchy="false">)</mml:mo>
                                            <mml:mo>/</mml:mo>
                                            <mml:mi>N</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfrac>
                            <mml:mo>&#x22c5;</mml:mo>
                            <mml:msub>
                                <mml:mrow>
                                    <mml:mover>
                                        <mml:mi>q</mml:mi>
                                        <mml:mo>&#x2212;</mml:mo>
                                    </mml:mover>
                                </mml:mrow>
                                <mml:mi>c</mml:mi>
                            </mml:msub>
                            <mml:mo>.</mml:mo>
                            <mml:mspace width="12em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>2</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                    </mml:math> </disp-formula>
            </p>
            <p>Typically, we first optimize the 
                <inline-formula>
                    <mml:math display="inline" id="M13">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> parameters to achieve a target missing spot rate, then optimize the 
                <italic toggle="yes">q</italic> parameters to enforce 
                <italic toggle="yes">P</italic>(
                <italic toggle="yes">s</italic>) &#x2264; 1 while maintaining the missing spot rate. In either case, we apply 
                <xref ref-type="other" rid="e1">Equation 1</xref> or 
                <xref ref-type="other" rid="e2">Equation 2</xref> to bring the 
                <italic toggle="yes">q</italic> or 
                <inline-formula>
                    <mml:math display="inline" id="M24">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> parameters close to their final values. When the cost function stops improving, we switch to the steepest-descent algorithms used in Ross and Wiggins, 2012
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> to polish 
                <italic toggle="yes">q</italic> or 
                <inline-formula>
                    <mml:math display="inline" id="M15">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>.</p>
            <p>Next, we give two exact formulas for the partition functions 
                <italic toggle="yes">Z</italic>
                <sub>
                    <italic toggle="yes">L</italic>&#x2192;
                    <italic toggle="yes">s</italic>
                </sub> and the full partition function 
                <italic toggle="yes">Z</italic> that determine our locus-to-spot mapping probabilities. We focus on the full partition function 
                <italic toggle="yes">Z</italic> since the formulas for 
                <italic toggle="yes">Z</italic>
                <sub>
                    <italic toggle="yes">L</italic>&#x2192;
                    <italic toggle="yes">s</italic>
                </sub> are identical. The largest term in each formula, which we denote 
                <inline-formula>
                    <mml:math display="inline" id="M12">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> (or 
                <inline-formula>
                    <mml:math display="inline" id="M53">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math> </inline-formula> when spot penalty optimization is used), is the original estimate from Ross and Wiggins, 2012
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> calculated using a variant of the forward-backward algorithm
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>. Additional terms are computed in the same way, except that certain loci are constrained to map to certain spots. All of the constraints we will apply are 
                <italic toggle="yes">illegal constraints</italic>, in that they force multiple loci to overlap at some spot in the image; therefore these terms only count illegal conformations that we would like to remove from the baseline calculation. By computing these terms and subtracting them from 
                <inline-formula>
                    <mml:math display="inline" id="M14">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> we eliminate the overlapping conformations and improve the calculation. However, this subtraction removes conformations with multiple overlaps more than once, so we must add back higher-order corrections (i.e. partition functions having multiple constrained spots). Repeating this logic leads to exact formulas for 
                <italic toggle="yes">Z</italic> taking the form of series expansions, which are dominated by the lowest-order terms as those have the fewest restrictions on conformational overlaps. 
                <xref ref-type="fig" rid="f2">Figure 2A</xref> illustrates an example of such a series expansion, where each parenthetical subscript (
                <italic toggle="yes">X Y</italic> . . . )
                <sub>
                    <italic toggle="yes">s</italic>
                </sub> on a term label denotes an illegal constraint forcing loci 
                <italic toggle="yes">X</italic>, 
                <italic toggle="yes">Y</italic>, . . . to overlap at spot 
                <italic toggle="yes">s</italic> when that term is calculated. We use this notation throughout.</p>
            <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                <label>Figure 2. </label>
                <caption>
                    <title>Series expansions.</title>
                    <p>
                        <bold>A</bold>. Schematic showing terms in a series expansion, in a case where series 1 and series 2 have the same terms. The full series gives the exact partition function for the 4-locus experiment shown where only 2 spots appeared in the image (due to a high rate of missing spots). Cartoons show only the constrained loci for each term (so for example each term includes the illegal conformation visiting spots 
                        <italic toggle="yes">s</italic> &#x2192; 
                        <italic toggle="yes">t</italic> &#x2192; 
                        <italic toggle="yes">s</italic> &#x2192; 
                        <italic toggle="yes">t</italic>). 
                        <bold>B</bold>. An illegal conformation for which loci 
                        <italic toggle="yes">A</italic>, 
                        <italic toggle="yes">C</italic> and 
                        <italic toggle="yes">E</italic> overlap at spot 
                        <italic toggle="yes">s</italic>, and loci 
                        <italic toggle="yes">F</italic> and 
                        <italic toggle="yes">H</italic> overlap at spot 
                        <italic toggle="yes">t</italic>. Series expansion 1 includes this conformation in terms 
                        <inline-formula>
                            <mml:math display="inline" id="M25">
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mover accent="true">
                                            <mml:mi>Z</mml:mi>
                                            <mml:mo>&#x02dc;</mml:mo>
                                        </mml:mover>
                                        <mml:mn>0</mml:mn>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula>, 
                        <inline-formula>
                            <mml:math display="inline" id="M16">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(ACE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                        </sub>, 
                        <inline-formula>
                            <mml:math display="inline" id="M17">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(FH)</italic>
                            <sub>
                                <italic toggle="yes">t</italic>
                            </sub>
                        </sub>, and 
                        <inline-formula>
                            <mml:math display="inline" id="M18">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(ACE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                            <italic toggle="yes">(FH)</italic>
                            <sub>
                                <italic toggle="yes">t</italic>
                            </sub>
                        </sub>. Series expansion 2 includes this conformation in the same terms with the addition of 
                        <inline-formula>
                            <mml:math display="inline" id="M19">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(AC)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                        </sub>, 
                        <inline-formula>
                            <mml:math display="inline" id="M20">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(AE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                        </sub>, 
                        <inline-formula>
                            <mml:math display="inline" id="M21">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(CE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                        </sub>, 
                        <inline-formula>
                            <mml:math display="inline" id="M42">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(AC)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                            <italic toggle="yes">(FH)</italic>
                            <sub>
                                <italic toggle="yes">t</italic>
                            </sub>
                        </sub>, 
                        <inline-formula>
                            <mml:math display="inline" id="M23">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(AE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                            <italic toggle="yes">(FH)</italic>
                            <sub>
                                <italic toggle="yes">t</italic>
                            </sub>
                        </sub>, and 
                        <inline-formula>
                            <mml:math display="inline" id="M44">
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                            </mml:math>
                        </inline-formula>
                        <sub>
                            <italic toggle="yes">(CE)</italic>
                            <sub>
                                <italic toggle="yes">s</italic>
                            </sub>
                            <italic toggle="yes">(FH)</italic>
                            <sub>
                                <italic toggle="yes">t</italic>
                            </sub>
                        </sub>.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure2.gif"/>
            </fig>
            <p>There are two ways we might remove conformations containing overlapping loci, leading us to two different series expansions for the true partition function 
                <italic toggle="yes">Z</italic>. Suppose that we are calculating the term 
                <inline-formula>
                    <mml:math display="inline" id="M101">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mrow>
                                            <mml:mo stretchy="false">(</mml:mo>
                                            <mml:mi>A</mml:mi>
                                            <mml:mi>C</mml:mi>
                                            <mml:mo>&#x2026;</mml:mo>
                                            <mml:mo stretchy="false">)</mml:mo>
                                        </mml:mrow>
                                        <mml:mi>s</mml:mi>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>
 whose single illegal constraint forces loci 
                <italic toggle="yes">A</italic>, 
                <italic toggle="yes">C</italic>, . . . to overlap at spot 
                <italic toggle="yes">s</italic>. One option is to forbid any of the other unconstrained loci from also mapping to spot 
                <italic toggle="yes">s</italic>, since spot 
                <italic toggle="yes">s</italic> is already overused. This leads to series expansion 1. Alternatively, allowing further overlaps with spot 
                <italic toggle="yes">s</italic> from the unconstrained loci gives us series expansion 2. 
                <xref ref-type="fig" rid="f2">Figure 2B</xref> illustrates the differences between the two series.</p>
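            <p>The term lists in the caption of 
                <xref ref-type="fig" rid="f2">Figure 2B</xref> can be reproduced by a short enumeration. The Python sketch below is illustrative only and is not taken from the published software: the function name 
                <monospace>terms_containing</monospace> and the dictionary encoding of the overlaps are assumptions made for this example. It uses the rule stated above: in series 1 a constrained spot must hold exactly the loci that overlap there, whereas in series 2 any subset of two or more of those loci may be constrained.</p>
            <code language="python"><![CDATA[
```python
from itertools import combinations, product


def terms_containing(overlaps, series):
    """List the labels of the series terms that include one conformation.

    overlaps: dict mapping a spot name to the tuple of loci (two or more)
              that overlap at that spot in the conformation.
    series:   1 or 2, selecting which expansion's rule to apply.
    """
    per_spot = []
    for spot, loci in overlaps.items():
        if series == 1:
            # Series 1: unconstrained loci may not reuse a constrained spot,
            # so the constraint must name every locus at that spot.
            choices = [None, tuple(loci)]
        else:
            # Series 2: extra loci may also sit on a constrained spot,
            # so any subset of two or more loci is a possible constraint.
            choices = [None] + [
                sub
                for k in range(2, len(loci) + 1)
                for sub in combinations(loci, k)
            ]
        per_spot.append([(c, spot) for c in choices])

    labels = []
    for combo in product(*per_spot):
        label = "".join(f"({''.join(c)})_{s}" for c, s in combo if c is not None)
        labels.append(label or "0")  # "0" stands for the unconstrained baseline
    return labels


# Conformation of Figure 2B: A, C, E overlap at spot s; F, H overlap at spot t
overlaps = {"s": ("A", "C", "E"), "t": ("F", "H")}
print(len(terms_containing(overlaps, 1)), terms_containing(overlaps, 1))
print(len(terms_containing(overlaps, 2)), terms_containing(overlaps, 2))
```
]]></code>
            <p>For this conformation the sketch lists 4 terms for series 1 and 10 terms for series 2, matching the terms named in the caption.</p>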
            <p>Each of the two series expansions is a weighted sum over 
                <italic toggle="yes">all possible illegally-constrained terms</italic> having two properties: 1) each locus and each spot appear at most once in the indices, and 2) two or more loci map to each constrained spot. To be formal, we use &#x03a9; to represent the set of all possible illegal constraints: each element of &#x03a9; consists of a set of two or more non-adjacent loci and a single spot where they are forced to overlap. Each expansion thus takes the form</p>
            <p>
                <disp-formula>
                    <mml:math display="block" id="math3">
                        <mml:mrow>
                            <mml:mi>Z</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mstyle displaystyle="true">
                                <mml:munder>
                                    <mml:mo>&#x2211;</mml:mo>
                                    <mml:mrow>
                                        <mml:mi>&#x03d5;</mml:mi>
                                        <mml:mo>&#x2286;</mml:mo>
                                        <mml:mi mathvariant="normal">&#x03a9;</mml:mi>
                                    </mml:mrow>
                                </mml:munder>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>w</mml:mi>
                                        <mml:mi>&#x03d5;</mml:mi>
                                    </mml:msub>
                                    <mml:msub>
                                        <mml:mover>
                                            <mml:mi>Z</mml:mi>
                                            <mml:mo>&#x02dc;</mml:mo>
                                        </mml:mover>
                                        <mml:mi>&#x03d5;</mml:mi>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:mstyle>
                            <mml:mspace width="13em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>3</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                    </mml:math>
                </disp-formula>
            </p>
            <p>where 
                <inline-formula>
                    <mml:math display="inline" id="M110">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mi mathvariant="italic">&#x03d5;</mml:mi>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> is zero if any two constraints share a locus or spot. We will choose the integer weights 
                <italic toggle="yes">w
                    <sub>&#x03d5;</sub>
                </italic> so as to cancel out the overlapping conformations. By symmetry arguments, the weighting factor should not depend on the identities of the loci or spots, but only on the number of constrained spots 
                <italic toggle="yes">n
                    <sub>&#x03d5;</sub>
                </italic>, and the number of loci 
                <inline-formula>
                    <mml:math display="inline" id="M26">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mi>n</mml:mi>
                                <mml:mi>k</mml:mi>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> involved in the 
                <italic toggle="yes">k
                    <sup>th</sup>
                </italic> constraint. For example, 
                <italic toggle="yes">w</italic>
                <sub>(
                    <italic toggle="yes">ACE</italic>)
                    <sub>
                        <italic toggle="yes">s</italic>
                    </sub>
                </sub>
                <sub>(
                    <italic toggle="yes">BD</italic>)
                    <sub>
                        <italic toggle="yes">t</italic>
                    </sub>
                </sub> is determined by 
                <italic toggle="yes">n</italic>
                <sub>
                    <italic toggle="yes">&#x03d5;</italic>
                </sub> =  2, 
                <inline-formula>
                    <mml:math display="inline" id="M27">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mi>n</mml:mi>
                                <mml:mn>1</mml:mn>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> = 3 and 
                <inline-formula>
                    <mml:math display="inline" id="M28">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mi>n</mml:mi>
                                <mml:mn>2</mml:mn>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> = 2.</p>
            <p>Here we specify each series expansion by giving a formula for the weights 
                <italic toggle="yes">w
                    <sub>&#x03d5;</sub>
                </italic> in terms of 
                <italic toggle="yes">n
                    <sub>&#x03d5;</sub>
                </italic> and the various 
                <inline-formula>
                    <mml:math display="inline" id="M29">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mi>n</mml:mi>
                                <mml:mi>k</mml:mi>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>. We also explain how to select an appropriate set of terms 
                <italic toggle="yes">&#x03c8;</italic> when there are too many terms to evaluate. Our selection prevents any conformation, whether legal or overlapping, from contributing a negative weight to the partition function estimate, thereby guaranteeing non-negative mapping probabilities and allowing use of the reconstruction-quality metrics given in Ross and Wiggins, 2012
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>. Derivations of the coefficient formulas and the term-selection criteria for each series expansion appear in 
                <xref ref-type="other" rid="SF1">Appendix 2</xref> (
                <xref ref-type="other" rid="SF1">Supplementary File 1</xref>).</p>
            <p>
                <bold>Series expansion 1</bold> For series expansion 1, we do not allow the unconstrained loci to map to spots that were used in constraints. Then the weights 
                <italic toggle="yes">w
                    <sub>&#x03d5;</sub>
                </italic> in the series formula given by 
                <xref ref-type="other" rid="e3">Equation 3</xref> are:</p>
            <p>
                <disp-formula id="e3">
                    <mml:math display="block" id="math4">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:msup>
                                <mml:mrow>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                    <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mi>n</mml:mi>
                                        <mml:mi>&#x03d5;</mml:mi>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:msup>
                            <mml:mspace width="16em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>4</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                    </mml:math>
                </disp-formula>
            </p>
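            <p>The cancellation produced by these weights can be checked by brute force on a small case. The Python sketch below is a toy under stated assumptions and is not the published implementation: loci are treated as independent, so each conformation's weight is a plain product of hypothetical per-locus weights 
                <monospace>w</monospace> and missing-spot weights 
                <monospace>miss</monospace>; every term is summed by direct enumeration rather than by the forward-backward algorithm; and any two or more loci may form a constraint, without the non-adjacency restriction. All function names are illustrative.</p>
            <code language="python"><![CDATA[
```python
from itertools import combinations, product
from math import prod

MISSING = -1  # hypothetical marker: the locus has no detected spot


def conformation_weight(w, miss, conf):
    """Product of per-locus weights for one locus-to-spot assignment."""
    return prod(miss[l] if s == MISSING else w[l][s] for l, s in enumerate(conf))


def exact_z(w, miss):
    """Reference value: sum over conformations in which no spot is shared."""
    n_loci, n_spots = len(w), len(w[0])
    total = 0.0
    for conf in product([*range(n_spots), MISSING], repeat=n_loci):
        hits = [s for s in conf if s != MISSING]
        if len(hits) == len(set(hits)):
            total += conformation_weight(w, miss, conf)
    return total


def constrained_term(w, miss, phi):
    """Series-1 term: constrained loci are pinned to their spot, and the
    remaining loci may not map to any constrained spot."""
    n_loci, n_spots = len(w), len(w[0])
    pinned = {l: s for loci, s in phi for l in loci}
    used = {s for _, s in phi}
    free = [s for s in range(n_spots) if s not in used] + [MISSING]
    options = [[pinned[l]] if l in pinned else free for l in range(n_loci)]
    return sum(conformation_weight(w, miss, conf) for conf in product(*options))


def valid_terms(omega):
    """All subsets of omega in which no locus and no spot appears twice."""
    terms = [()]
    for loci, s in omega:
        terms += [t + ((loci, s),) for t in terms
                  if all(s != s2 and not set(loci) & set(l2) for l2, s2 in t)]
    return terms


def series1_z(w, miss):
    """Weighted sum over all valid terms with weights (-1) ** n_phi."""
    n_loci, n_spots = len(w), len(w[0])
    omega = [(loci, s)
             for k in range(2, n_loci + 1)
             for loci in combinations(range(n_loci), k)
             for s in range(n_spots)]
    return sum((-1) ** len(phi) * constrained_term(w, miss, phi)
               for phi in valid_terms(omega))


# 4 loci and 2 spots; the numbers are arbitrary illustrative weights
w = [[0.6, 0.1], [0.3, 0.4], [0.2, 0.5], [0.1, 0.7]]
miss = [0.3, 0.3, 0.3, 0.2]
print(series1_z(w, miss), exact_z(w, miss))
```
]]></code>
            <p>In this toy setting the two printed values agree to rounding error: a conformation with 
                <italic toggle="yes">m</italic> overused spots appears once for every subset of those spots, with alternating sign, so its total coefficient vanishes unless 
                <italic toggle="yes">m</italic> = 0.</p>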
            <p>To select terms for a series approximation, we first choose a set of illegal constraints 
                <italic toggle="yes">&#x03c8;</italic> to disallow, then include all series terms 
                <inline-formula>
                    <mml:math display="inline" id="M30">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mi mathvariant="italic">&#x03d5;</mml:mi>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> containing only those constraints: i.e. 
                <italic toggle="yes">&#x03d5;</italic> &#x2286;  
                <italic toggle="yes">&#x03c8;</italic>. This guarantees non-negative mapping probabilities. To evaluate the largest terms efficiently, we recommend selecting the 
                <italic toggle="yes">N</italic>
                <sub>&#x03c8;</sub> constraints having the highest product of mapping probabilities in the baseline calculation 
                <inline-formula>
                    <mml:math display="inline" id="M31">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> (or 
                <inline-formula>
                    <mml:math display="inline" id="M300">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math> </inline-formula> if spot penalties will be used). For example, we would include (
                <italic toggle="yes">AC</italic>)
                <sub>
                    <italic toggle="yes">s</italic>
                </sub> if 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">A</italic> &#x2192; 
                <italic toggle="yes">s</italic>) &#x00b7; 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">C</italic> &#x2192; 
                <italic toggle="yes">s</italic>) is sufficiently large.</p>
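            <p>The ranking step described above might be sketched as follows. This is a minimal illustration, not the published code: the function name 
                <monospace>top_constraints</monospace>, the probability table 
                <monospace>p</monospace> (indexed as locus, then spot) and the cap 
                <monospace>max_loci</monospace> on constraint size are all assumptions introduced here. Loci are assumed to be indexed in order along the chromosome, and "non-adjacent" is read as meaning that no two loci in a constraint are consecutive.</p>
            <code language="python"><![CDATA[
```python
import heapq
from itertools import combinations
from math import prod


def top_constraints(p, n_psi, max_loci=3):
    """Return the n_psi candidate constraints with the largest product of
    baseline mapping probabilities, as (loci, spot) pairs.

    p[l][s]  : baseline probability that locus l maps to spot s
    max_loci : largest constraint size considered (an illustrative cap)
    """
    n_loci, n_spots = len(p), len(p[0])
    scored = []
    for k in range(2, max_loci + 1):
        for loci in combinations(range(n_loci), k):
            # combinations() yields sorted tuples, so checking neighbouring
            # entries is enough to detect consecutive (adjacent) loci
            if any(b - a == 1 for a, b in zip(loci, loci[1:])):
                continue
            for s in range(n_spots):
                score = prod(p[l][s] for l in loci)
                scored.append((score, loci, s))
    best = heapq.nlargest(n_psi, scored)
    return [(loci, s) for _, loci, s in best]


# Hypothetical baseline probabilities for 4 loci (rows) and 2 spots (columns)
p = [[0.9, 0.1],
     [0.5, 0.5],
     [0.8, 0.2],
     [0.1, 0.9]]
print(top_constraints(p, n_psi=2))
```
]]></code>
            <p>With these illustrative numbers the two selected constraints are loci 0 and 2 at spot 0 (product 0.72) and loci 1 and 3 at spot 1 (product 0.45). The series approximation would then include every term whose constraints all belong to this selected set.</p>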
            <p>
                <bold>Series expansion 2</bold> For series expansion 2, the unconstrained loci are allowed to map to spots that were used in constraints. Then the weights 
                <italic toggle="yes">w
                    <sub>&#x03d5;</sub>
                </italic> in 
                <xref ref-type="other" rid="e3">Equation 3</xref> are:</p>
            <p>
                <disp-formula id="e4">
                    <mml:math display="block" id="math5">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mi>w</mml:mi>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:mstyle displaystyle="true">
                                <mml:munderover>
                                    <mml:mo>&#x220f;</mml:mo>
                                    <mml:mrow>
                                        <mml:mi>k</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mn>1</mml:mn>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>n</mml:mi>
                                            <mml:mi>&#x03d5;</mml:mi>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:munderover>
                                <mml:mo stretchy="false">(</mml:mo>
                            </mml:mstyle>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:mn>1</mml:mn>
                            <mml:msup>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mrow>
                                    <mml:msubsup>
                                        <mml:mi>n</mml:mi>
                                        <mml:mi>k</mml:mi>
                                        <mml:mi>&#x03d5;</mml:mi>
                                    </mml:msubsup>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:msup>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:msubsup>
                                <mml:mi>n</mml:mi>
                                <mml:mi>k</mml:mi>
                                <mml:mi>&#x03d5;</mml:mi>
                            </mml:msubsup>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:mn>1</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                            <mml:mo>.</mml:mo>
                            <mml:mspace width="9em"/>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mn>5</mml:mn>
                            <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                    </mml:math> </disp-formula>
            </p>
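            <p>As a minimal sketch of how the weight in Equation 5 could be evaluated, the Python function below multiplies the per-constraint factors. It assumes that each <italic toggle="yes">n<sub>k</sub></italic> is the number of loci placed on a shared spot by the <italic toggle="yes">k</italic>-th constraint of a term. The function name <monospace>series2_weight</monospace> and the list-of-integers input are illustrative choices and are not part of <monospace>align3d</monospace>.</p>
            <code language="python"><![CDATA[
```python
from math import prod


def series2_weight(constraint_sizes):
    """Weight of one series term, following the product in Equation 5.

    constraint_sizes: one integer per constraint in the term (assumed here
    to be the number of loci sharing a spot in that constraint).
    A term with no constraints gets weight 1 (empty product).
    """
    return prod((-1) ** (n - 1) * (n - 1) for n in constraint_sizes)


# A term with a single two-locus constraint, such as (AC)_s
print(series2_weight([2]))
# A term with one three-locus constraint and one two-locus constraint
print(series2_weight([3, 2]))
```
]]></code>
            <p>Under this reading, a two-locus constraint contributes a factor of -1 and a three-locus constraint a factor of +2, so the sign of a term alternates with the size of each constraint.</p>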
            <p>To select terms for a series approximation, we first choose a set of 
                <italic toggle="yes">N</italic>
                <sub>&#x03c8;</sub> single-locus-to-spot mappings &#x03a8;, then include all terms 
                <italic toggle="yes">Z
                    <sub>
                        <italic toggle="yes">&#x03d5;</italic>
                    </sub>
                </italic> whose illegal constraints use only mappings in &#x03a8;. For example, the constraint (
                <italic toggle="yes">AC</italic>)
                <sub>
                    <italic toggle="yes">s</italic>
                </sub> would be included if &#x03a8; &#x2287; {
                <italic toggle="yes">A</italic> &#x2192; 
                <italic toggle="yes">s</italic>, 
                <italic toggle="yes">C</italic> &#x2192; 
                <italic toggle="yes">s</italic>}. In order to select the largest terms, we recommend building &#x03a8; from the 
                <italic toggle="yes">N</italic>
                <sub>
                    <italic toggle="yes">&#x03c8;</italic>
                </sub> largest mapping probabilities calculated from 
                <inline-formula>
                    <mml:math display="inline" id="M32">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> or 
                <inline-formula>
                    <mml:math display="inline" id="M33">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>.</p>
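            <p>The selection rule described above can be sketched in a few lines of Python. The sketch assumes that mapping probabilities are available as a dictionary keyed by (locus, spot) pairs and that a constraint is given as a list of loci plus the spot they share. The names <monospace>select_mappings</monospace> and <monospace>constraint_allowed</monospace>, and the example probabilities, are hypothetical and serve only as illustration.</p>
            <code language="python"><![CDATA[
```python
def select_mappings(mapping_probs, n_psi):
    """Return the set of the n_psi most probable (locus, spot) mappings."""
    ranked = sorted(mapping_probs, key=mapping_probs.get, reverse=True)
    return set(ranked[:n_psi])


def constraint_allowed(loci, spot, psi):
    """A constraint placing all `loci` on `spot` is kept only if every
    single-locus mapping it uses belongs to psi."""
    return all((locus, spot) in psi for locus in loci)


# Hypothetical mapping probabilities, for illustration only
probs = {("A", "s"): 0.6, ("C", "s"): 0.5, ("B", "s"): 0.1, ("B", "t"): 0.7}
psi = select_mappings(probs, 3)

print(constraint_allowed(["A", "C"], "s", psi))  # both mappings are in psi
print(constraint_allowed(["A", "B"], "s", psi))  # B -> s was not selected
```
]]></code>
            <p>A term with several constraints would be included only if every one of its constraints passes this check.</p>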
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>We tested the improved 
                <monospace>align3d</monospace> method by generating random chromosome conformations using our software tool 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/wormulator">wormulator</ext-link> (version 1.1), and simulating the process of error-prone labeling, imaging and finally producing the locus-to-spot mapping probabilities. We considered three scenarios for our simulations. 1) The &#x2018;Toy&#x2019; scenario involves 10 genomic loci, where each locus is labeled using one of 3 colors. For these simple problems the partition function can be calculated exactly. 2) Our simulated Experiment 1 uses standard DNA labeling methods and traditional 3-color microscopy to label 30 loci with 3 colors, thus interrogating a significant fraction of a chromosome contour. 3) Our simulated Experiment 2 labels 300 loci across a chromosome-length contour. The reconstruction of Experiment 2 is made possible by using the Oligopaints labeling technique
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> to label in 20 different colors.</p>
            <p>For each scenario, we randomly generated 100 conformations using a wormlike chain model (packing density 
                <italic toggle="yes">n
                    <sub>l</sub>
                </italic> = 0.3 kb/nm, persistence length 
                <italic toggle="yes">l
                    <sub>p</sub>
                </italic> = 300 kb, as suggested by the measurements of Trask, Pinkel and van den Engh, 1989
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>); applied a random labeling at a mean density of 1 locus per megabase; and simulated experimental error: 100/200-nm Gaussian localization error in xy/z, a 10% rate of missing labels, and a 10% rate of nonspecifically-bound labels. A typical simulated experiment from the Toy scenario is shown in 
                <xref ref-type="fig" rid="f3">Figure 3A</xref>.</p>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>Figure 3. </label>
                <caption>
                    <title>Example reconstruction.</title>
                    <p>
                        <bold>A</bold>. Randomly generated and labeled chromosome contour with simulated experimental error: localization error (lines offsetting spots from the labeled genomic loci) and missing labels (open circles). This example lacks nonspecifically-bound labels (floating spots). 
                        <bold>B</bold>. Spot mapping probabilities calculated using both the largest series term 
                        <inline-formula>
                            <mml:math display="inline" id="M34">
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mover accent="true">
                                            <mml:mi>Z</mml:mi>
                                            <mml:mo>&#x02dc;</mml:mo>
                                        </mml:mover>
                                        <mml:mn>0</mml:mn>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula> (grey circles), and the exact 
                        <italic toggle="yes">Z</italic> that can be computed using 2210 series terms (blue circles). The dotted red line connects the true locus-to-spot mappings, which are used to calculate the unrecovered information. In this example 
                        <italic toggle="yes">I</italic>(
                        <inline-formula>
                            <mml:math display="inline" id="M35">
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mover accent="true">
                                            <mml:mi>Z</mml:mi>
                                            <mml:mo>&#x02dc;</mml:mo>
                                        </mml:mover>
                                        <mml:mn>0</mml:mn>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula>) = 1.54 bits/locus and 
                        <italic toggle="yes">I</italic>(
                        <italic toggle="yes">Z</italic> ) = 0.32 bits/locus. 
                        <bold>C</bold>. Unrecovered information 
                        <italic toggle="yes">I</italic> and entropy 
                        <italic toggle="yes">S</italic> (left panel) and log 
                        <italic toggle="yes">Z</italic> (right panel) versus the number of terms used in the series expansions.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure3.gif"/>
            </fig>
            <p>Next, we specified a DNA model relating the genomic distance between two loci 
                <italic toggle="yes">L</italic> to their expected RMS spatial distance 
                <italic toggle="yes">R</italic>, which is used by 
                <monospace>align3d</monospace> to estimate the probability density of spatial displacement 
                <bold>r</bold> using a Gaussian chain model: 
                <italic toggle="yes">&#x03c3;</italic>(
                <bold>r</bold>) &#x221d; exp[&#x2013;3|
                <bold>r</bold>|
                <sup>2</sup>/2
                <italic toggle="yes">R</italic>
                <sup>2</sup>]. Our current implementation requires a power-law relation between 
                <italic toggle="yes">R</italic> and 
                <italic toggle="yes">L</italic>, where the exponent may depend on 
                <italic toggle="yes">L</italic>. Since any realistic polymer model predicts straight DNA on very short scales, we chose the model 
                <italic toggle="yes">R</italic> = 
                <italic toggle="yes">n
                    <sub>l</sub> L</italic> for 
                <italic toggle="yes">L &lt; l
                    <sub>p</sub>
                </italic> and 
                <italic toggle="yes">R</italic> = 
                <italic toggle="yes">A
                    <sub>
                        <italic toggle="yes">&#x03c1;</italic>
                    </sub> L
                    <sup>
                        <italic toggle="yes">&#x03c1;</italic>
                    </sup>
                </italic> for 
                <italic toggle="yes">L &gt; l
                    <sub>p</sub>
                </italic>, where 
                <inline-formula>
                    <mml:math display="inline" id="M103">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mi>A</mml:mi>
                                <mml:mi>&#x03c1;</mml:mi>
                            </mml:msub>
                            <mml:mo>=</mml:mo>
                            <mml:msubsup>
                                <mml:mi>l</mml:mi>
                                <mml:mi>p</mml:mi>
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:mi>&#x03c1;</mml:mi>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                    <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                            </mml:msubsup>
                            <mml:mo>&#x22c5;</mml:mo>
                            <mml:msub>
                                <mml:mi>n</mml:mi>
                                <mml:mi>l</mml:mi>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> for continuity. In a real experiment the three free parameters 
                <italic toggle="yes">n
                    <sub>l</sub>
                </italic>, 
                <italic toggle="yes">l
                    <sub>p</sub>
                </italic> and 
                <italic toggle="yes">&#x03c1;</italic> would be fit to pairwise distance distributions between different pairs of loci in a separate calibration experiment. For our purposes 
                <italic toggle="yes">n
                    <sub>l</sub>
                </italic> and 
                <italic toggle="yes">l
                    <sub>p</sub>
                </italic> were set to the same values used to generate the wormlike chain conformations, and since these conformations were random walks we set 
                <italic toggle="yes">&#x03c1;</italic> = 1/2.</p>
            <p>For each simulated conformation, we fed the label positions and colors together with the simulated 3D images and our DNA model into the 
                <monospace>align3d</monospace> algorithm to produce locus-to-spot mapping probabilities. For example, the simulated experiment shown in 
                <xref ref-type="fig" rid="f3">Figure 3A</xref> produced the mapping probabilities shown graphically in 
                <xref ref-type="fig" rid="f3">Figure 3B</xref> using circles, where the size of each circle indicates probability magnitude. Here grey circles show the mapping probabilities computed from 
                <inline-formula>
                    <mml:math display="inline" id="M37">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> with no use of spot penalties, and blue circles show those same probabilities computed using the exact 
                <italic toggle="yes">Z</italic>. This example shows how excluding high-weight and heavily-overlapping conformations reduces and improves the partition function estimate (see 
                <xref ref-type="fig" rid="f3">Figure 3C</xref>) and concentrates the probability mass into the &#x2018;true&#x2019; locus-to-spot mappings (shown connected by the dotted red line in 
                <xref ref-type="fig" rid="f3">Figure 3B</xref>).</p>
            <p>Our reconstruction quality metric is the amount of 
                <italic toggle="yes">unrecovered information</italic> from the mapping probabilities, defined as 
                <italic toggle="yes">I</italic> = &#x2013; &#x2329;log 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic>
                <sub>i</sub> &#x2192; 
                <italic toggle="yes">s</italic>
                <sub>i</sub>)&#x232a;
                <sub>i</sub> where the average &#x2329;.&#x232a; is taken over the set of true locus-to-spot mappings (
                <italic toggle="yes">L
                    <sub>i</sub>
                </italic> , 
                <italic toggle="yes">s
                    <sub>i</sub>
                </italic>). The ideal case of 
                <italic toggle="yes">I</italic> &#x2192; 0 implies a perfect reconstruction with no mistakes and zero uncertainty, but in practice 
                <italic toggle="yes">I</italic> is always positive. In a real experiment where the true mappings are not known, we use a proxy for unrecovered information that we term entropy, defined as 
                <italic toggle="yes">S</italic> = &#x2212; &#x2329;
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic>
                <italic toggle="yes">
                    <sub>i</sub>
                </italic> &#x2192; 
                <italic toggle="yes">s</italic>
                <italic toggle="yes">
                    <sub>j</sub>
                </italic>) log 
                <italic toggle="yes">p</italic>(
                <italic toggle="yes">L</italic>
                <italic toggle="yes">
                    <sub>i</sub>
                </italic> &#x2192; 
                <italic toggle="yes">s</italic>
                <italic toggle="yes">
                    <sub>j</sub>
                </italic>)&#x232a;
                <italic toggle="yes">
                    <sub>ij</sub>
                </italic> whose average is taken over all locus-to-spot mappings, not just the correct mappings. The goal is to have 
                <italic toggle="yes">S</italic> &#x2248; 
                <italic toggle="yes">I</italic> so that a real experiment will have an accurate estimate of the reconstruction performance. The left-hand panel of 
                <xref ref-type="fig" rid="f3">Figure 3C</xref> shows how 
                <italic toggle="yes">I</italic> and 
                <italic toggle="yes">S</italic> depend on the accuracy of the calculation for the simple example shown, using either of the two series expansions and varying the number of terms from 1 (simply 
                <inline-formula>
                    <mml:math display="inline" id="M38">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>) to 2210, which is the full set of terms for either series and thus computes 
                <italic toggle="yes">Z</italic> exactly. Entropy generally overestimates the amount of unrecovered information (see 
                <xref ref-type="other" rid="SM1">Supplementary Figure S1</xref> and 
                <xref ref-type="other" rid="SM1">Supplementary Figure S2</xref>, 
                <xref ref-type="other" rid="SM1">Supplementary File 1</xref>), because the large mapping probabilities should be even larger, and the small ones even smaller, than their assigned values (see 
                <xref ref-type="other" rid="SM1">Supplementary Figure S3</xref>, 
                <xref ref-type="other" rid="SM1">Supplementary File 1</xref>). 
                <xref ref-type="other" rid="SF1">Appendix 3</xref> (
                <xref ref-type="other" rid="SF1">Supplementary File 1</xref>) argues that this miscalibration is caused by the mismatch between the wormlike chain DNA model used to generate the simulated conformations and the Gaussian chain model used by 
                <monospace>align3d</monospace> in the reconstruction.</p>
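            <p>As a hedged illustration of the two metrics, the Python sketch below computes <italic toggle="yes">I</italic> and <italic toggle="yes">S</italic> from a dictionary of mapping probabilities. It assumes base-2 logarithms (matching the bits/locus units quoted in Figure 3) and, for <italic toggle="yes">S</italic>, that the average is a sum over spots followed by a mean over loci; this normalization is one possible reading of the definition, and the function names and example values are hypothetical.</p>
            <code language="python"><![CDATA[
```python
import math


def unrecovered_information(probs, true_mappings):
    """I = -<log2 p(L_i -> s_i)>, averaged over the true mappings."""
    values = [probs.get(m, 0.0) for m in true_mappings]
    if any(p <= 0.0 for p in values):
        return math.inf  # a true mapping was assigned zero probability
    return -sum(math.log2(p) for p in values) / len(values)


def entropy(probs):
    """S = -sum over spots of p log2 p, averaged over loci (assumed form)."""
    per_locus = {}
    for (locus, _spot), p in probs.items():
        term = -p * math.log2(p) if p > 0.0 else 0.0
        per_locus[locus] = per_locus.get(locus, 0.0) + term
    return sum(per_locus.values()) / len(per_locus)


# Hypothetical probabilities for two loci and two spots
probs = {("A", "s1"): 0.5, ("A", "s2"): 0.5,
         ("B", "s1"): 0.25, ("B", "s2"): 0.75}
truth = [("A", "s1"), ("B", "s2")]
print(unrecovered_information(probs, truth), entropy(probs))
```
]]></code>
            <p>Only <monospace>unrecovered_information</monospace> needs the true mappings, which is why <italic toggle="yes">S</italic> can serve as a proxy in a real experiment.</p>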
            <p>
                <bold>Validation of 
                    <xref ref-type="other" rid="e1">Equation 1</xref>&#x2013;
                    <xref ref-type="other" rid="e3">Equation 5</xref>.</bold> We first validated each of the two series expansions by comparing them against exact partition function calculations for the simulated Toy experiments. In all cases, both series expansions, when taken to their maximum number of terms, exactly reproduced the partition function calculations obtained by direct enumeration over all possible non-overlapping conformations. This test validates 
                <xref ref-type="other" rid="e4">Equation 4</xref> and 
                <xref ref-type="other" rid="e3">Equation 5</xref>. We also verified that both series expansions could be used in conjunction with spot penalty optimization (
                <xref ref-type="other" rid="e1">Equation 1</xref> and 
                <xref ref-type="other" rid="e2">Equation 2</xref>), both by numerically validating the cost function gradient calculation and by testing for convergence on these small problems.</p>
            <p>
                <bold>Improved optimization allows large-scale reconstructions.</bold> Next, we tested whether the iterative spot-penalty optimization rules given by 
                <xref ref-type="other" rid="e1">Equation 1</xref> and 
                <xref ref-type="other" rid="e2">Equation 2</xref> could work on large-scale problems such as those of Experiment 2, where the old gradient descent optimizer in 
                <monospace>align3d</monospace> had difficulty
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>. The results are shown in 
                <xref ref-type="fig" rid="f4">Figure 4</xref>, which compares the number of iterative steps required to converge the 
                <inline-formula>
                    <mml:math display="inline" id="M116">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> (missing-spot penalty) and 
                <italic toggle="yes">q</italic> (spot penalty) parameters without/with use of our improved optimization rules (labeled &#x2018;old&#x2019;/&#x2018;new&#x2019; respectively in the legend). Since the spot penalties 
                <italic toggle="yes">q</italic> are optimized for probability normalization only after 
                <inline-formula>
                    <mml:math display="inline" id="M117">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> parameters have been optimized to achieve a desired missing spot frequency, we only attempted to optimize the 
                <italic toggle="yes">q</italic> parameters for simulations where 
                <inline-formula>
                    <mml:math display="inline" id="M118">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> converged. There were two results from this experiment. First, more attempts to optimize the 
                <inline-formula>
                    <mml:math display="inline" id="M119">
                        <mml:mrow>
                            <mml:mover>
                                <mml:mi>q</mml:mi>
                                <mml:mo>&#x2212;</mml:mo>
                            </mml:mover>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> and 
                <italic toggle="yes">q</italic> parameters successfully converged when using the new optimization rules in conjunction with gradient descent, as indicated by the greater volume of the &#x2018;new&#x2019; histogram and the correspondingly larger numbers shown in the legends. Second, of the trials that did converge, our new method required significantly fewer iterations and thus less computation time than the old method, as indicated by the relative skews of the distributions.</p>
            <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                <label>Figure 4. </label>
                <caption>
                    <title>Comparison of old and new optimization methods.</title>
                    <p>Each panel compares the number of iterations required to achieve convergence using the old (purple) versus new (yellow) optimization methods. Only trials that successfully converged are counted, so the histograms are not normalized relative to each other. The first number in parentheses of each legend entry shows the number of converged trials, and the second number shows the total number of trials. Note that the second numbers in the right-hand panel equal the first numbers in the left-hand panel, since we required convergence in 
                        <inline-formula>
                            <mml:math display="inline" id="M120">
                                <mml:mrow>
                                    <mml:mover>
                                        <mml:mi>q</mml:mi>
                                        <mml:mo>&#x2212;</mml:mo>
                                    </mml:mover>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula> in order to attempt optimization of the 
                        <italic toggle="yes">q</italic> parameters.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure4.gif"/>
            </fig>
            <p>
                <bold>Use of more colors dramatically improves reconstructions.</bold> Our most striking result is that simulations of the ambitious Experiment 2 produce far better results than even the Toy scenario, despite the fact that these simulations have more loci per color than either the Toy scenario or Experiment 1. This can be seen in the amount of unrecovered information 
                <italic toggle="yes">I</italic> shown in the simulation-averaged plots of 
                <xref ref-type="fig" rid="f5">Figure 5A</xref>. High-quality reconstructions using ~20 colors were also reported for the ChromoTrace reconstruction method
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup> even for large numbers of labeled loci. Our explanation is that the reconstruction quality has more to do with the average spatial density of loci per color than the total number of loci per color, because each &#x2018;propagator&#x2019; evolving one potential locus-to-spot mapping to the next sees only the spots within some reasonable radius, as determined by the genomic distance to the next locus. These arguments really pertain to the information recovery of the baseline calculation of 
                <inline-formula>
                    <mml:math display="inline" id="M68">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula>; the story is more complicated when better approximating the true 
                <italic toggle="yes">Z</italic>, which forbids spot reuse between loci, but a simple heuristic is that some average fraction of the competing spots were used earlier along the contour and should thus be removed from consideration. If our reasoning is correct, then reconstructions based on very large numbers of labeled loci (for example, whole-genome reconstructions) should be possible as long as the spot density does not become too high.</p>
            <p>At the end of this section we revisit Experiment 2, in order to assess the reconstruction quality when analyzing more realistic DNA contours having tighter confinement and thus more closely-packed spots.</p>
            <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                <label>Figure 5. </label>
                <caption>
                    <title>Comparison of the convergence rates of series expansion 1 and series expansion 2.</title>
                    <p>
                        <bold>A</bold>. Median unrecovered information 
                        <italic toggle="yes">I</italic> as a function of the number of terms used in each series expansion, without spot penalty optimization (solid lines) versus with optimization (dotted lines), for the three simulation scenarios (panels left to right). Each curve is the point-by-point median of the 100 individual curves from the 100 simulations in that scenario. 
                        <bold>B</bold>. Percentile distribution of the difference between the unrecovered information using series 2 minus the unrecovered information using series 1; the fact that this difference quickly drops below zero in nearly all individual simulations shows that series 2 recovers more information in the first few terms than does series 1.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure5.gif"/>
            </fig>
            <p>
                <bold>Series expansion 2 outperforms series expansion 1.</bold> Next, we compared the convergence properties of our two expansions on the three scenarios of simulated experiments. 
                <xref ref-type="fig" rid="f5">Figure 5A</xref> gives a sense of how the amount of unrecovered information varies with the number of terms taken in each series, without (solid lines) and with (dotted lines) the use of spot penalties. Each of the 3 panels summarizes all 100 simulated experiments of that scenario, and each experiment in that scenario shows a unique relationship between information recovery and number of series terms computed. Representative curves of individual experiments in each scenario are shown in 
                <xref ref-type="other" rid="SM1">Supplementary Figure S1</xref> (
                <xref ref-type="other" rid="SM1">Supplementary File 1</xref>). In order to summarize these very dissimilar curves, 
                <xref ref-type="fig" rid="f5">Figure 5A</xref> shows the median of all 100 individual experimental curves, taken at each data point. Note that this summary does not necessarily preserve the shape of the curves from typical individual simulations.</p>
            <p>In order to directly compare the two series expansions, we plotted their difference in unrecovered information 
                <italic toggle="yes">I</italic>
                <sub>2</sub> &#x2212; 
                <italic toggle="yes">I</italic>
                <sub>1</sub> versus the number of series terms in 
                <xref ref-type="fig" rid="f5">Figure 5B</xref>. In this case, we plotted the full distribution showing the median (50th percentile) as well as the 10th, 25th, 75th and 90th percentile curves. These plots show directly that series 2 almost always outperforms series 1 when only a few terms can be evaluated. The reason is that the terms in series 2 are larger in magnitude owing to their looser constraints, and thus remove the extraneous part of the partition function more quickly than the terms of series 1 (see 
                <xref ref-type="other" rid="SM1">Supplementary Figure S1</xref> and 
                <xref ref-type="other" rid="SM1">Supplementary Figure S4</xref>, 
                <xref ref-type="other" rid="SM1">Supplementary File 1</xref>). Based on these results, we recommend using series expansion 2 in all situations where the partition function cannot be evaluated exactly.</p>
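            <p>The percentile curves described above can be sketched as follows. The data layout, the function name and the use of inclusive linear interpolation between data points are assumptions made for illustration; the plotted curves may have been computed with a different percentile convention.</p>
            <code language="python">
```python
from statistics import quantiles


def percentile_curves(series1, series2, percentiles=(10, 25, 50, 75, 90)):
    """Percentiles of the per-simulation difference I2 - I1 at each point.

    series1[i][k] and series2[i][k] hold the unrecovered information of
    simulation i at the k-th sampled number of terms of series 1 and 2
    (hypothetical layout, assumed here for illustration only).
    """
    if len(series1) != len(series2):
        raise ValueError("both series need the same number of simulations")
    # One difference curve per simulation; negative means series 2 is ahead
    diffs = [[b - a for a, b in zip(c1, c2)]
             for c1, c2 in zip(series1, series2)]
    curves = {p: [] for p in percentiles}
    for values in zip(*diffs):  # all simulations at one sample point
        # 99 cut points; cuts[p - 1] is the p-th percentile
        cuts = quantiles(values, n=100, method="inclusive")
        for p in percentiles:
            curves[p].append(cuts[p - 1])
    return curves


# Toy data: five made-up simulations sampled at two points each
toy_series1 = [[4.0, 3.0] for _ in range(5)]
toy_series2 = [[4.0, 2.5], [4.0, 2.0], [4.0, 1.5], [4.0, 1.0], [4.0, 0.5]]
toy_curves = percentile_curves(toy_series1, toy_series2)
print(toy_curves[50])  # [0.0, -1.5]
```
            </code>
            <p>A percentile curve that lies below zero indicates that, in the corresponding fraction of simulations, series 2 has left less information unrecovered than series 1 at that number of terms.</p>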
            <p>
                <bold>Spot penalty optimization is the most efficient way to recover information.</bold> Spot penalty optimization is an iterative process where each iterative step requires the evaluation of some number of series terms. An optimization requiring 
                <italic toggle="yes">t</italic> iterations thus multiplies computation time by a factor of 
                <italic toggle="yes">t</italic> relative to the simple evaluation of the series. Alternatively, one could spend the extra computation time on taking the series to a higher order without spot penalty optimization. 
                <xref ref-type="fig" rid="f6">Figure 6A</xref> plots the unrecovered information when a) taking series 2 to a certain order without optimization, versus b) using spot penalty optimization on only the first term yielding 
                <inline-formula>
                    <mml:math display="inline" id="M40">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math></inline-formula>. The dotted line in each panel shows the median number of terms requiring the same computation time as 
                <inline-formula>
                    <mml:math display="inline" id="M500">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math></inline-formula>. The Toy scenario shows that, if the series expansion is carried deep enough, it becomes more accurate than 
                <inline-formula>
                    <mml:math display="inline" id="M450">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math></inline-formula>: in other words, the difference 
                <inline-formula>
                    <mml:math display="inline" id="M43">
                        <mml:mrow>
                            <mml:mi>I</mml:mi>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:msubsup>
                                <mml:mi>I</mml:mi>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> becomes negative. However, for the practical scenarios of Experiments 1 and 2 this crossover point requires taking more terms than would be needed to match the computational cost of calculating 
                <inline-formula>
                    <mml:math display="inline" id="M400">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math> </inline-formula> (the dotted line). Based on these results, we recommend always performing spot penalty optimization, especially for larger reconstructions.</p>
            <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                <label>Figure 6. </label>
                <caption>
                    <title>Optimization in conjunction with series expansions.</title>
                    <p>
                        <bold>A</bold>. Comparison of unrecovered information using series expansions without iteration, denoted 
                        <italic toggle="yes">I</italic>, to the unrecovered information obtained by optimizing spot penalties using only the first series term, denoted 
                        <inline-formula>
                            <mml:math display="inline" id="M45">
                                <mml:mrow>
                                    <mml:msubsup>
                                        <mml:mi>I</mml:mi>
                                        <mml:mn>0</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>o</mml:mi>
                                            <mml:mi>p</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:mrow>
                                    </mml:msubsup>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula>, over three experimental situations. Vertical dotted lines indicate the median number of series terms computable with the same computational time as was required to obtain 
                        <inline-formula>
                            <mml:math display="inline" id="M46">
                                <mml:mrow>
                                    <mml:msubsup>
                                        <mml:mi>I</mml:mi>
                                        <mml:mn>0</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>o</mml:mi>
                                            <mml:mi>p</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:mrow>
                                    </mml:msubsup>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula>. For Experiments 1 and 2 the difference 
                        <inline-formula>
                            <mml:math display="inline" id="M47">
                                <mml:mrow>
                                    <mml:mi>I</mml:mi>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:msubsup>
                                        <mml:mi>I</mml:mi>
                                        <mml:mn>0</mml:mn>
                                        <mml:mrow>
                                            <mml:mi>o</mml:mi>
                                            <mml:mi>p</mml:mi>
                                            <mml:mi>t</mml:mi>
                                        </mml:mrow>
                                    </mml:msubsup>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula> is typically positive where the curve crosses the dotted line, indicating that spot penalty optimization is the more efficient way of recovering information. 
                        <bold>B</bold>. Comparison of unrecovered information using spot-penalty optimization in conjunction with multiple series terms versus optimization of 
                        <inline-formula>
                            <mml:math display="inline" id="M48">
                                <mml:mrow>
                                    <mml:msub>
                                        <mml:mover accent="true">
                                            <mml:mi>Z</mml:mi>
                                            <mml:mo>&#x02dc;</mml:mo>
                                        </mml:mover>
                                        <mml:mn>0</mml:mn>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:math>
                        </inline-formula> alone, showing the added benefit of including more terms in the series.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure6.gif"/>
            </fig>
            <p>
                <bold>Series expansions can improve optimization information recovery.</bold> Although spot penalty optimization is the most efficient way to recover information, that process alone can only extract a certain fraction of the recoverable information: once the cost function is zero, optimization can proceed no further despite the problem not having been solved exactly. At this point, the only way forward is to go higher in the order of series terms used; we can still apply spot penalties to this sum of terms and iteratively optimize them as before using 
                <xref ref-type="other" rid="e1">Equation 1</xref> and 
                <xref ref-type="other" rid="e2">Equation 2</xref>. 
                <xref ref-type="fig" rid="f6">Figure 6B</xref> plots the difference in unrecovered information when applying spot penalty optimization between a) a variable number of terms in series expansion 2, and b) only 
                <inline-formula>
                    <mml:math display="inline" id="M49">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> (the first series term). This figure shows that including additional series terms in the optimization improves the information recovery, albeit at a slow rate (especially for large problems).</p>
            <p>
                <bold>20-color labeling leads to near-perfect reconstructions.</bold> As shown in 
                <xref ref-type="fig" rid="f5">Figure 5A</xref>, the unrecovered information for the whole-chromosome Experiment 2 averages around 0.2 bits per locus, implying near-perfect mapping probabilities. However, because these results were based on randomly generated unconfined conformations, they do not establish whether such good information recovery is possible with real chromosomes, which are likely to be more compact. To test Experiment 2 on realistic chromosome conformations, we generated four plausible conformations of human chromosome 4 by running the 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/MagiKoco/GEM">GEM</ext-link> software package
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup> on the smoothed human Hi-C data set provided by Yaffe and Tanay, 2011
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup> and using a 3D spline interpolation to increase the resolution from 1 Mb to 50 kb. These conformations were then virtually labeled at 300 randomly-selected loci and simulated experimental error was added in as before. One set of experiments assumed diffraction-limited 100/200 nm localization error in xy/z, and a second set of experiments assumed superresolution 30/50 nm localization error in xy/z; in both sets the missing- and extra-spot rates were 10%. For this experiment we determined the DNA model parameters 
                <italic toggle="yes">n
                    <sub>l</sub>
                </italic> and 
                <italic toggle="yes">l
                    <sub>p</sub>
                </italic> by fitting pairwise locus distributions, as one would do in an experiment, and for 
                <italic toggle="yes">L &gt; l
                    <sub>p</sub>
                </italic> we set 
                <italic toggle="yes">&#x03c1;</italic> = 1/3 as that has been reported in the literature for locus separations under 7 Mb
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. Mapping probabilities were reconstructed by taking series expansion 2 to the lowest order that included at least 1000 terms, then applying and optimizing spot penalties. Compared with the random-walk conformations used to test the Experiment 2 scenario, the diffraction-limited reconstructions did somewhat worse (~ 0.4 versus ~ 0.2 bits of unrecovered information per locus) owing to the fact that physical confinement of chromosomes increases the density of competing spots in the image. The superresolution reconstruction quality was unchanged at ~ 0.2 bits of unrecovered information.</p>
            <p>Despite the drop in performance when localizing spots at the diffraction limit, 0.4 bits of unrecovered information per locus is still an extremely strong reconstruction, implying that the correct locus-to-spot mappings are assigned 
                <italic toggle="yes">p</italic>-values averaging around 2
                <sup>&#x2212;0.4</sup> &#x2248; 76%. Starting from such accurate and confident mapping probabilities, one can infer a reasonable conformation simply by assigning each locus to the unassigned spot to which it maps with the highest probability (or calling a missing spot if 
                <inline-formula>
                    <mml:math display="inline" id="M50">
                        <mml:mrow>
                            <mml:mn>1</mml:mn>
                            <mml:mo>&#x2212;</mml:mo>
                            <mml:mstyle displaystyle="true">
                                <mml:msub>
                                    <mml:mo>&#x2211;</mml:mo>
                                    <mml:mrow>
                                        <mml:mi>s</mml:mi>
                                        <mml:mo>&#x2032;</mml:mo>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mrow>
                                    <mml:mtext>&#x2009;</mml:mtext>
                                    <mml:msub>
                                        <mml:mi>p</mml:mi>
                                        <mml:mrow>
                                            <mml:mi>L</mml:mi>
                                            <mml:mo>&#x2192;</mml:mo>
                                            <mml:msup>
                                                <mml:mi>s</mml:mi>
                                                <mml:mo>&#x2032;</mml:mo>
                                            </mml:msup>
                                        </mml:mrow>
                                    </mml:msub>
                                </mml:mrow>
                            </mml:mstyle>
                        </mml:mrow>
                    </mml:math> </inline-formula> &gt; any 
                <italic toggle="yes">p</italic>
                <sub>
                    <italic toggle="yes">L</italic>&#x2192;
                    <italic toggle="yes">s</italic>
                </sub>), repeating the process for overlapping loci, and drawing a line in the image that connects these spots in genomic order. The conformations produced by this simple rule are shown in 
                <xref ref-type="fig" rid="f7">Figure 7</xref>: the correct conformation is shown with a blue line and errors in the inferred conformation are shown in red. The reconstructed conformations are ~ 90% accurate at diffraction-limited resolution and ~ 96% accurate at superresolution, as determined by an alignment between the true and inferred spot sequences traveling along the DNA contour. Most mistakes are of a sort that does not change the large-scale structure. For example, one common error is to erroneously skip one or more spots in the image, thus &#x2018;looping out&#x2019; a small part of the conformation and effectively lowering the resolution.</p>
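            <p>The simple assignment rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the <monospace>align3d</monospace> implementation: the function names and the dictionary layout are hypothetical, loci are assumed to be processed in genomic order, the missing-spot test is assumed to compare against the best spot that is still unassigned, and the repetition of the procedure for overlapping loci is left out.</p>
            <code language="python">
```python
def greedy_conformation(mapping_probs):
    """Assign each locus to its most probable still-unassigned spot.

    mapping_probs[L] is a dict {spot_id: probability} for locus L, with
    loci listed in genomic order (hypothetical layout, for illustration).
    Returns one entry per locus: a spot id, or None for a missing spot.
    """
    assigned = set()
    path = []
    for probs in mapping_probs:
        # Probability that this locus has no spot in the image
        p_missing = 1.0 - sum(probs.values())
        # Only spots not already claimed by an earlier locus are candidates
        candidates = {s: p for s, p in probs.items() if s not in assigned}
        best = max(candidates, key=candidates.get, default=None)
        if best is None or p_missing > candidates[best]:
            path.append(None)  # call a missing spot
        else:
            path.append(best)
            assigned.add(best)
    return path


def trace(path):
    """Spots to connect in genomic order, skipping loci called missing."""
    return [s for s in path if s is not None]


# Toy mapping probabilities for four loci and three spots (made-up values)
toy_probs = [
    {"a": 0.8, "b": 0.1},  # locus 0: clearly spot a
    {"a": 0.5, "b": 0.4},  # locus 1: a is taken, so it falls back to b
    {"c": 0.2},            # locus 2: missing (0.8) beats spot c (0.2)
    {"c": 0.9},            # locus 3: spot c
]
toy_path = greedy_conformation(toy_probs)
print(toy_path)         # ['a', 'b', None, 'c']
print(trace(toy_path))  # ['a', 'b', 'c']
```
            </code>
            <p>Connecting the spots returned by <monospace>trace</monospace> in order gives the inferred contour; a locus called as missing simply contributes no point to the line.</p>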
            <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                <label>Figure 7. </label>
                <caption>
                    <title>Simulated reconstructions of 4 plausible conformations of human chromosome 4.</title>
                    <p>The left-hand reconstruction of each conformation was obtained using a simulated image from diffraction-limited microscopy (shown in inset; localization error is shown as lines connecting spots to DNA), and the right-hand reconstruction used a simulated superresolution image. Grey shaded lines indicate the underlying DNA contours; blue lines trace the ideal reconstructed contours given the measured spot positions; red lines show our reconstructed contours where they deviate from the ideal contours. Captions above each reconstruction indicate the amount of unrecovered information 
                        <italic toggle="yes">I</italic> per locus after/before the reconstruction process; captions below indicate the number of alignment errors between the spot ID sequences read along the true versus inferred conformations. For both superresolution reconstructions 2 and 3 we calculated 
                        <italic toggle="yes">I</italic> excluding a single locus whose true spot mapping was given 0 probability; including that locus sends 
                        <italic toggle="yes">I</italic> &#x2192; &#x221e;.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20533/443bf5d9-6659-4411-ad00-936624286941_figure7.gif"/>
            </fig>
            <p>
                <xref ref-type="fig" rid="f7">Figure 7</xref> shows that the benefit of superresolution is twofold: 1) the locus-to-spot mapping quality improves relative to diffraction-limited resolution (i.e. fewer red lines), and 2) the small-scale structure of an ideal mapping (blue line) more faithfully traces the underlying contour (grey line). This shows the importance of measuring spot locations to sub-pixel resolution, even in experiments where normal-resolution microscopes using standard fluorophores are used to localize spots separated by two pixels or more. In our GEM conformations 23 spots were closer than 200 nm to another spot of the same color, which would indicate problems localizing these spots, but this is inconsistent with the data shown in Wang 
                <italic toggle="yes">et al</italic>., 2016
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>, which indicates that virtually all spots in our experimental scenarios should be well separated in at least some cell lines.</p>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>We have developed and evaluated two improvements to the 
                <monospace>align3d</monospace> method for reconstructing chromosome structure. Both of these improve the partition function estimates that determine the locus-to-spot mapping probabilities, which can provide the basis for (probabilistic) reconstructed conformations. The first improvement is a more robust spot-penalty optimizer that allows for large-scale reconstructions involving hundreds of labeled loci, such as will be needed to uncover whole-chromosome conformations. The second improvement consists of two series expansion formulas for the partition functions, which in principle allow the mapping probabilities to be solved to arbitrary accuracy within the limitations of the experiment and the underlying DNA model. In practice, the series approach is difficult for two reasons: 1) each series expansion contains a huge number of terms, and 2) the lowest-order approximation 
                <inline-formula>
                    <mml:math display="inline" id="M51">
                        <mml:mrow>
                            <mml:msub>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                            </mml:msub>
                        </mml:mrow>
                    </mml:math>
                </inline-formula> overestimates 
                <italic toggle="yes">Z</italic> by many orders of magnitude, unlike other series expansions where the initial approximation is close to the final answer. Despite the difficulties, the series formulas that we give offer some way forward to improve on the original estimate 
                <inline-formula>
                    <mml:math display="inline" id="M52">
                        <mml:mrow>
                            <mml:msubsup>
                                <mml:mover accent="true">
                                    <mml:mi>Z</mml:mi>
                                    <mml:mo>&#x02dc;</mml:mo>
                                </mml:mover>
                                <mml:mn>0</mml:mn>
                                <mml:mrow>
                                    <mml:mi>o</mml:mi>
                                    <mml:mi>p</mml:mi>
                                    <mml:mi>t</mml:mi>
                                </mml:mrow>
                            </mml:msubsup>
                        </mml:mrow>
                    </mml:math> </inline-formula>. Of the two formulas, we recommend using series expansion 2, which has the larger-magnitude terms and thus recovers the most information when only a few terms can be evaluated.</p>
            <p>Our problem of finding likely (i.e. low-free-energy) DNA conformations passing through a set of imaged spots is similar to the well-known traveling salesman problem (TSP), in which a salesman must find the shortest route connecting a set of cities. Somewhat more closely related is a generalization of the TSP called the time-dependent traveling salesman problem (TDTSP)
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>, where the intercity distances change every step on the tour; this is analogous to our situation where the free energy needed to thread DNA between two spots depends not only on their separation but also on the length of DNA used to connect them. In our case, the presence of missing and extra spots generalizes our problem still further: in the TDTSP analogy the salesman would be allowed to skip stops and cities for a penalty. Our main finding is that the partition function of this generalized TDTSP (which encompasses traditional TSP and TDTSP problems) can be expressed as a sum of terms computable using a (modified) forward-backward algorithm, a result which should also apply to other route-finding applications where research has historically focused on route optimization rather than route inference.</p>
            <p>Both our mapping 
                <italic toggle="yes">p</italic>-values and our entropy proxy for information recovery show a systematic bias, which comes from the use of a different DNA model for reconstruction than was used to create the simulated DNA contours. The fact that our reconstructions were nonetheless quite strong shows that the reconstruction method itself is quite robust to model error. This is very fortunate given the uncertainty in the true 
                <italic toggle="yes">in vivo</italic> biological model describing the cells in a real experiment. For our results to be accurate, we had to calibrate our model so as to reproduce the peak in the distance distribution of pairs of distinguishable loci. An experimenter would perform this calibration by imaging distinguishable pairs of loci in a parallel experiment. Due to 
                <monospace>align3d</monospace>&#x2019;s use of a very permissive Gaussian chain DNA model, both systematic biases work in the direction of causing the method to underestimate its performance: high 
                <italic toggle="yes">p</italic>-values should be higher (and low 
                <italic toggle="yes">p</italic>-values lower) than reported, and the unrecovered information tends to be less than the entropy estimate. Thus the results are at least as good as they appear to be.</p>
            <p>From a genomic standpoint, our most exciting result is that the combination of our computational improvements together with 20-color labeling technology gives almost perfect reconstructions at the whole-chromosome scale. Out of ~ 4 bits per locus of uncertainty inherent in the reconstruction problem, our method recovers ~ 3.6&#x2013;3.8 bits. Such confident mapping probabilities allow for the direct construction of individual conformations that are &#x2265; 90% accurate. High-quality piecewise reconstructions are likewise possible with two overlapping copies of the same chromosome (data not shown), although sometimes the fragments cannot be assembled. We want to emphasize that our reconstructions require only a few parameters that would be known experimentally with proper controls: the 3 DNA model parameters which in a real scenario would be calibrated using a control experiment, and the correct average rates of missing and extra spots averaged over all experiments, used by 
                <monospace>align3d</monospace> to estimate the actual number of missing spots per color in each experiment. The robustness of the analysis to experimental unknowns gives evidence that reconstructions using real-world experimental data will be of similar quality to those in our simulations, and if so then direct measurement of chromosome conformations is possible today with current technology.</p>
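            <p>The arithmetic connecting these figures can be checked with a short Python sketch. It assumes, as in the Results, that an unrecovered information of <italic toggle="yes">I</italic> bits per locus corresponds to a typical probability of 2 to the power of minus <italic toggle="yes">I</italic> for the correct mapping, and it takes the total uncertainty to be the approximate value of 4 bits per locus quoted above; the function and constant names are hypothetical.</p>
            <code language="python">
```python
def mapping_probability(unrecovered_bits):
    """Typical probability given to the correct mapping: 2 ** (-I)."""
    return 2.0 ** (-unrecovered_bits)


TOTAL_BITS = 4.0  # approximate per-locus uncertainty quoted in the text

for label, bits in [("diffraction-limited", 0.4), ("superresolution", 0.2)]:
    recovered = TOTAL_BITS - bits
    prob = mapping_probability(bits)
    print(f"{label}: recovered {recovered:.1f} bits, p = {prob:.0%}")
```
            </code>
            <p>For 0.4 bits of unrecovered information this reproduces the figure of roughly 76% given in the Results, and 3.6 recovered bits out of about 4.</p>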
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The simulated conformations and labelings used to generate the plots in this paper, together with the output of the 
                <monospace>align3d</monospace> analysis, can be found at: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/align3d/blob/master/seriesExpansions/a3dRawData.zip">https://github.com/heltilda/align3d/blob/master/seriesExpansions/a3dRawData.zip</ext-link>
            </p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Results in this paper were generated using version 1.1 of 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/align3d">
                    <monospace>align3d</monospace>
                </ext-link>, built using version 1.1 of the 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/cicada">Cicada scripting language</ext-link>. Simulated conformations and labelings were generated using version 1.1 of 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/wormulator">
                    <monospace>wormulator</monospace>
                </ext-link>.</p>
            <p>All source files used in preparing this paper are available from the GitHub page for this paper: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/heltilda/align3d/tree/master/seriesExpansions">https://github.com/heltilda/align3d/tree/master/seriesExpansions</ext-link>.</p>
            <p>License: GPL 3.0</p>
            <p>Archived code at time of publication:</p>
            <p>align3d: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.2580342">https://doi.org/10.5281/zenodo.2580342</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>
            </p>
            <p>License: GPL 3.0</p>
            <p>wormulator: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1411503">https://doi.org/10.5281/zenodo.1411503</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>
            </p>
            <p>License: GPL 3.0</p>
            <p>Cicada scripting language: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1411505">https://doi.org/10.5281/zenodo.1411505</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>
            </p>
            <p>License: MIT License</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>The authors want to thank Rani Powers and Jenny Mae Samson for helping review the manuscript.</p>
        </ack>
        <sec id="SM1" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">Supplementary File 1: align3dSupplement.pdf. File containing four appendices: derivations of the equations used in this text (Appendix 1 and Appendix 2), a discussion of model error (Appendix 3), and the supplemental figures (Appendix 4).</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16252/bdc6b253-d39b-459d-b1f8-91ce179623ee_Supplementary_file_1_v2.pdf">Click here to access the data</ext-link>
            </p>
        </sec>
        <fn-group>
            <fn id="FN1">
                <label>1</label>
                <p>Depending on how the experiment is done, two spots of the same color that lie sufficiently close together in the image may appear as a single spot where the conformation self-overlaps. We prefer to treat this scenario as a missing-spot measurement error rather than relax the one-spot-per-locus rule. If the spots have been properly localized, then the underlying conformation visits any given spot at most once.</p>
            </fn>
        </fn-group>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dekker</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Belmont</surname>
                            <given-names>AS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guttman</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The 4D nucleome project.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2017</year>;<volume>549</volume>(<issue>7671</issue>):<fpage>219</fpage>&#x2013;<lpage>226</lpage>.
                    <pub-id pub-id-type="pmid">28905911</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature23884</pub-id>
                    <pub-id pub-id-type="pmcid">5617335</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Imakaev</surname>
                            <given-names>MV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fudenberg</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mirny</surname>
                            <given-names>LA</given-names>
                        </name>
</person-group>:
                    <article-title>Modeling chromosomes: Beyond pretty pictures.</article-title>
                    <source>

                        <italic toggle="yes">FEBS Lett.</italic>
</source>
                    <year>2015</year>;<volume>589</volume>(<issue>20 Pt A</issue>):<fpage>3031</fpage>&#x2013;<lpage>3036</lpage>.
                    <pub-id pub-id-type="pmid">26364723</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.febslet.2015.09.004</pub-id>
                    <pub-id pub-id-type="pmcid">4722799</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kocanova</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goiffon</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bystricky</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>3D FISH to analyse gene domain-specific chromatin re-modeling in human cancer cell lines.</article-title>
                    <source>

                        <italic toggle="yes">Methods.</italic>
</source>
                    <year>2018</year>;<volume>142</volume>:<fpage>3</fpage>&#x2013;<lpage>15</lpage>.
                    <pub-id pub-id-type="pmid">29501423</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ymeth.2018.02.013</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Su</surname>
                            <given-names>JH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beliveau</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Spatial organization of chromatin domains and compartments in single chromosomes.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2016</year>;<volume>353</volume>(<issue>6299</issue>):<fpage>598</fpage>&#x2013;<lpage>602</lpage>.
                    <pub-id pub-id-type="pmid">27445307</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.aaf8084</pub-id>
                    <pub-id pub-id-type="pmcid">4991974</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Takei</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shah</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Harvey</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Multiplexed Dynamic Imaging of Genomic Loci by Combined CRISPR Imaging and DNA Sequential FISH.</article-title>
                    <source>

                        <italic toggle="yes">Biophys J.</italic>
</source>
                    <year>2017</year>;<volume>112</volume>(<issue>9</issue>):<fpage>1773</fpage>&#x2013;<lpage>1776</lpage>.
                    <pub-id pub-id-type="pmid">28427715</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bpj.2017.03.024</pub-id>
                    <pub-id pub-id-type="pmcid">5425380</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ma</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tu</surname>
                            <given-names>LC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Naseri</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Multiplexed labeling of genomic loci with dCas9 and engineered sgRNAs using CRISPRainbow.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>528</fpage>&#x2013;<lpage>530</lpage>.
                    <pub-id pub-id-type="pmid">27088723</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3526</pub-id>
                    <pub-id pub-id-type="pmcid">4864854</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lowenstein</surname>
                            <given-names>MG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goddard</surname>
                            <given-names>TD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sedat</surname>
                            <given-names>JW</given-names>
                        </name>
</person-group>:
                    <article-title>Long-range interphase chromosome organization in 
                        <italic toggle="yes">Drosophila</italic>: a study using color barcoded fluorescence 
                        <italic toggle="yes">in situ</italic> hybridization and structural clustering analysis.</article-title>
                    <source>

                        <italic toggle="yes">Mol Biol Cell.</italic>
</source>
                    <year>2004</year>;<volume>15</volume>(<issue>12</issue>):<fpage>5678</fpage>&#x2013;<lpage>5692</lpage>.
                    <pub-id pub-id-type="pmid">15371546</pub-id>
                    <pub-id pub-id-type="doi">10.1091/mbc.e04-04-0289</pub-id>
                    <pub-id pub-id-type="pmcid">532046</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ross</surname>
                            <given-names>BC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wiggins</surname>
                            <given-names>PA</given-names>
                        </name>
</person-group>:
                    <article-title>Measuring chromosome conformation with degenerate labels.</article-title>
                    <source>

                        <italic toggle="yes">Phys Rev E Stat Nonlin Soft Matter Phys.</italic>
</source>
                    <year>2012</year>;<volume>86</volume>(<issue>1 Pt 1</issue>):<fpage>011918</fpage>.
                    <pub-id pub-id-type="pmid">23005463</pub-id>
                    <pub-id pub-id-type="doi">10.1103/PhysRevE.86.011918</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Barton</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Morganella</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oedegaard</surname>
                            <given-names>O</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Chromotrace: Reconstruction of 3D chromosome configurations by super-resolution microscopy.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2017</year>; 115436.
                    <pub-id pub-id-type="doi">10.1101/115436</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gouveia</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vo&#x00df;</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>A classification of formulations for the (time-dependent) traveling salesman problem.</article-title>
                    <source>

                        <italic toggle="yes">Eur J Oper Res.</italic>
</source>
                    <year>1995</year>;<volume>83</volume>(<issue>1</issue>):<fpage>69</fpage>&#x2013;<lpage>82</lpage>.
                    <pub-id pub-id-type="doi">10.1016/0377-2217(93)E0238-S</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baum</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process.</article-title>
                    <source>

                        <italic toggle="yes">Inequalities.</italic>
</source>
                    <year>1972</year>;<volume>3</volume>:<fpage>1</fpage>&#x2013;<lpage>8</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://ci.nii.ac.jp/naid/10000006411/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Trask</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pinkel</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>van den Engh</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>The proximity of DNA sequences in interphase cell nuclei is correlated to genomic distance and permits ordering of cosmids spanning 250 kilobase pairs.</article-title>
                    <source>

                        <italic toggle="yes">Genomics.</italic>
</source>
                    <year>1989</year>;<volume>5</volume>(<issue>4</issue>):<fpage>710</fpage>&#x2013;<lpage>717</lpage>.
                    <pub-id pub-id-type="pmid">2591960</pub-id>
                    <pub-id pub-id-type="doi">10.1016/0888-7543(89)90112-2</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Deng</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reconstructing spatial organizations of chromosomes through manifold learning.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>(<issue>8</issue>):<fpage>e50</fpage>.
                    <pub-id pub-id-type="pmid">29408992</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky065</pub-id>
                    <pub-id pub-id-type="pmcid">5934626</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yaffe</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tanay</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture.</article-title>
                    <source>

                        <italic toggle="yes">Nat Genet.</italic>
</source>
                    <year>2011</year>;<volume>43</volume>(<issue>11</issue>):<fpage>1059</fpage>&#x2013;<lpage>1065</lpage>.
                    <pub-id pub-id-type="pmid">22001755</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.947</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>heltilda</surname>
                        </name>
</person-group>:
                    <article-title>heltilda/align3d: Final version of align3d incorporating series formulas (Version 1.1.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2580342">http://www.doi.org/10.5281/zenodo.2580342</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>heltilda</surname>
                        </name>
</person-group>:
                    <article-title>heltilda/wormulator: wormulator version for paper (Version 1.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.1411503">http://www.doi.org/10.5281/zenodo.1411503</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>heltilda</surname>
                        </name>
</person-group>:
                    <article-title>heltilda/cicada: Cicada version used in F1000Research paper (Version 1.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.1411505">http://www.doi.org/10.5281/zenodo.1411505</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report46353">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20533.r46353</article-id>
            <title-group>
                <article-title>Reviewer response for version 3</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lagomarsino</surname>
                        <given-names>Marco Cosentino</given-names>
                    </name>
                    <xref ref-type="aff" rid="r46353a1">1</xref>
                    <xref ref-type="aff" rid="r46353a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0235-0445</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Scolari</surname>
                        <given-names>Vittore</given-names>
                    </name>
                    <xref ref-type="aff" rid="r46353a3">3</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3490-0579</uri>
                </contrib>
                <aff id="r46353a1">
                    <label>1</label>Physics Department, University of Milan, Milan, Italy</aff>
                <aff id="r46353a2">
                    <label>2</label>IFOM Foundation&#x2014;FIRC Institute of Molecular Oncology, Milan, Italy</aff>
                <aff id="r46353a3">
                    <label>3</label>Pasteur Institute, Paris, France</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>19</day>
                <month>7</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Lagomarsino MC and Scolari V</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport46353" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16252.3"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>We have now both read the replies and the new manuscript and we are satisfied.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report45547">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20223.r45547</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Birney</surname>
                        <given-names>Ewan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45547a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8314-8497</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Barton</surname>
                        <given-names>Carl</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45547a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r45547a1">
                    <label>1</label>European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>3</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Birney E and Barton C</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport45547" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16252.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In my initial review of the paper "Improved inference of chromosome conformation from images of labeled loci", a few concerns were raised. My comments mainly concerned placing the authors' current work in better context with respect to previous work, and explaining a number of points in the paper in more detail.</p>
            <p>I believe the authors have satisfactorily addressed my comments. They have added an appendix giving more information on the convergence properties of the equations used. They have also addressed the differences between align3d and previous work, explaining how the approaches are quite different.</p>
            <p>Some concerns were also raised about the effect of the number of colors on reconstruction quality, and they have added another paragraph explaining in more detail why this effect is seen. In addition, they appear to have addressed or responded to all of the other reviewer comments.</p>
            <p>We are happy to approve indexing.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>genomics</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report40510">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17750.r40510</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lagomarsino</surname>
                        <given-names>Marco Cosentino</given-names>
                    </name>
                    <xref ref-type="aff" rid="r40510a1">1</xref>
                    <xref ref-type="aff" rid="r40510a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0235-0445</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Scolari</surname>
                        <given-names>Vittore</given-names>
                    </name>
                    <xref ref-type="aff" rid="r40510a3">3</xref>
                    <role>Co-referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3490-0579</uri>
                </contrib>
                <aff id="r40510a1">
                    <label>1</label>Physics Department, University of Milan, Milan, Italy</aff>
                <aff id="r40510a2">
                    <label>2</label>IFOM Foundation&#x2014;FIRC Institute of Molecular Oncology, Milan, Italy</aff>
                <aff id="r40510a3">
                    <label>3</label>Pasteur Institute, Paris, France</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>12</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Lagomarsino MC and Scolari V</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport40510" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16252.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In the manuscript &#x201c;Improved inference of chromosome conformation from images of labeled loci&#x201d;, Ross and Costello present a computational inference method to reconstruct genome conformation from measurements of the positions of m labelled loci of known coordinates with n&lt;&lt;m colored fluorescent foci. The presented tool is a new version of their computational tool "align3d" with multiple improvements. The tool aims to infer the polymer conformation of chromosomes in vivo from images of fluorescently tagged genomic loci, where each color tags several different loci at the same time.</p>
            <p>The authors provide a test of the algorithm with computationally generated data, in a simple (short polymers) or more complex (longer polymers, more colors) setting. Finally, they provide a simplified (and limited; see point 5 below) test using data derived from empirical data.</p>
            <p>The question is interesting because of its experimental motivation, although the method is probably not yet close to concrete applicability to experimental data. We think that this could develop into a useful tool for the global effort to understand the chromosome conformation of organisms in vivo. As physicists, we are concerned with some aspects of the representation (modelling) of the polymer and of the experimental situation. Our observations might be useful for the authors or for other scientists who intend to analyse this kind of experimental data.</p>
            <p>
                <list list-type="order">
                    <list-item>
                        <p>The authors state that the inaccuracy of the DNA (conformation) model, i.e. how the physical distance between two loci scales with arclength distance along the genomic coordinate, is a major source of error (more precisely, this model is a conditional distribution of spatial distances given the distance along the chain). They further state that nothing is known about this. However, this is not really the case, as both Hi-C and FISH experiments with labelled loci give information about these quantities (Lagomarsino 
                            <italic>et al.</italic>, 2015
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-1">1</xref>
                            </sup> and Fudenberg and Mirny, 2012
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-2">2</xref>
                            </sup>).</p>
                        <p>In particular, the assumption that the polymer is a Gaussian chain seems very restrictive. A much less restrictive (though still limited) assumption would be that this scaling relation is a tunable power law. This assumption is particularly interesting because in this case the scaling law relating physical distance to distance along the genome is related to the contact probability measured in Hi-C data (Fudenberg and Mirny, 2012
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-2">2</xref>
                            </sup>). Indeed, in this scenario the contact probability (sometimes called &#x201c;P(s)&#x201d;, where s is the arclength distance) and the typical spatial distance R(s) at genomic distance s are related by a scaling relation (Polovnikov 
                            <italic>et al.</italic>, 2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-3">3</xref>
                            </sup>). Thus Hi-C data could be used to directly constrain the inference, or to compare with the results.</p>
                        <p>In this last scenario one could use the inference to learn the scaling from data. It seems quite reasonable to us that this scaling should be one of the main observables to infer from the data. Imposing this scaling amounts to imposing a specific behaviour on the configurations that we are attempting to infer. In this regard, one big question is whether the observable &#x201c;scaling of physical distance with arclength distance&#x201d; can be inferred from the data without making the problem under-determined. We encourage the authors to address this question.</p>
                        <p> As we suggest above, there are multiple possible approaches to this practical issue, such as the use of the observable quantity &#x201c;P(s)&#x201d;, the contact probability measured with Hi-C, or the use of an ansatz, such as a power law (Marie-Nelly 
                            <italic>et al.</italic>, 2014
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-4">4</xref>
                            </sup>), accompanied by a procedure to optimize the parameters.</p>
                    </list-item>
                    <list-item>
                        <p>The authors&#x2019; main hypothesis is that only one locus can map to each identified spot in the image, and, for this reason, the solution proposed is a heuristic method to solve the traveling salesman problem for the polymer on those loci. We observe that this might be a good practical assumption but it is not necessarily a good one for the chromosome, and for polymers in general. Polymers can have loops, even randomly. The definition of those loops depends on the resolution of observation (which experimentally will be limited by diffraction). The frequency of loops in chromosomes depends on important physical and biological parameters such as active looping (Fudenberg 
                            <italic>et al.</italic>, 2016
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-5">5</xref>
                            </sup>), the presence of different solvent phases and the balance between steric and other kinds of interactions (Scolari and Lagomarsino, 2015
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-6">6</xref>
                            </sup>) as well as on steps of the experimental protocols (Scolari 
                            <italic>et al.</italic>, 2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-7">7</xref>
                            </sup>). Hi-C experiments measure loops and quantify their specific and generic properties. In terms of genomic distance, it has been shown that at small distances the chromosomes are very compact, and the amount of this compaction varies widely across conditions (Lazar-Stefanita 
                            <italic>et al.,</italic> 2017
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-8">8</xref>
                            </sup> and Muller 
                            <italic>et al.,</italic> 2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-40510-9">9</xref>
                            </sup>) even for the same organism. At longer distances, the probability of forming a loop generally decreases monotonically with genomic distance. Thus, we think that the authors&#x2019; approach should be applicable to an increasing number of cases as the scale of observation and modelling increases, provided that the relation tying genomic distance to three-dimensional distance is chosen correctly.</p>
                    </list-item>
                    <list-item>
                        <p>The algorithm is focused on a single chain conformation and does not exploit ensembles. Typically in such experiments one expects fairly low localisation precision, but an almost arbitrarily large number of realisations (different cells). Each realisation will be different, but all will share common properties, and relaxing the question could make the inference process much easier. After all, precisely inferring a single configuration is not so relevant, because it will change in time due to natural fluctuations of the system. It is more useful (and better defined) to infer some ensemble properties (at fixed conditions for the cells, such as time and phase in the cell cycle), and then quantify the cell-to-cell diversity with respect to such average behavior.</p>
                    </list-item>
                    <list-item>
                        <p>These images will come from microscopy and they will likely be 2D projections, or have lower resolution in the z direction. The authors do not address this issue (and in general the issue of resolution seems underestimated), but we expect it to be quite important in any concrete situation.</p>
                    </list-item>
                    <list-item>
                        <p>Regarding the final example, we notice that the data are binned at 1 Mb and then interpolated to 100 kb with a spline. We wonder whether this increase in resolution introduces any alterations in the reconstructed conformations of the polymer. For this reason, it seems reasonable to perform a more thorough statistical analysis with different levels of interpolation to support this choice.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Statistical Physics, Quantitative Biology, Chromosome Organization</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-40510-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>From structure to function of bacterial chromosomes: Evolutionary perspectives and ideas for new experiments.</article-title>
                        <source>
                            <italic>FEBS Lett</italic>
                        </source>.<year>2015</year>;<volume>589</volume>(<issue>20 Pt A</issue>) :
                        <elocation-id>10.1016/j.febslet.2015.07.002</elocation-id>
                        <fpage>2996</fpage>-<lpage>3004</lpage>
                        <pub-id pub-id-type="pmid">26171924</pub-id>
                        <pub-id pub-id-type="doi">10.1016/j.febslet.2015.07.002</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Higher-order chromatin structure: bridging physics and biology.</article-title>
                        <source>
                            <italic>Curr Opin Genet Dev</italic>
                        </source>.<year>2012</year>;<volume>22</volume>(<issue>2</issue>) :
                        <elocation-id>10.1016/j.gde.2012.01.006</elocation-id>
                        <fpage>115</fpage>-<lpage>24</lpage>
                        <pub-id pub-id-type="pmid">22360992</pub-id>
                        <pub-id pub-id-type="doi">10.1016/j.gde.2012.01.006</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Fractal Folding and Medium Viscoelasticity Contribute Jointly to Chromosome Dynamics.</article-title>
                        <source>
                            <italic>Phys Rev Lett</italic>
                        </source>.<year>2018</year>;<volume>120</volume>(<issue>8</issue>) :
                        <elocation-id>10.1103/PhysRevLett.120.088101</elocation-id>
                        <fpage>088101</fpage>
                        <pub-id pub-id-type="pmid">29542996</pub-id>
                        <pub-id pub-id-type="doi">10.1103/PhysRevLett.120.088101</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-4">
                    <label>4</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>High-quality genome (re)assembly using chromosomal contact data.</article-title>
                        <source>
                            <italic>Nat Commun</italic>
                        </source>.<year>2014</year>;<volume>5</volume>:
                        <elocation-id>10.1038/ncomms6695</elocation-id>
                        <fpage>5695</fpage>
                        <pub-id pub-id-type="pmid">25517223</pub-id>
                        <pub-id pub-id-type="doi">10.1038/ncomms6695</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-5">
                    <label>5</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Formation of Chromosomal Domains by Loop Extrusion</article-title>.
                        <source>
                            <italic>Cell Reports</italic>
                        </source>.<year>2016</year>;<volume>15</volume>(<issue>9</issue>) :
                        <elocation-id>10.1016/j.celrep.2016.04.085</elocation-id>
                        <fpage>2038</fpage>-<lpage>2049</lpage>
                        <pub-id pub-id-type="doi">10.1016/j.celrep.2016.04.085</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-6">
                    <label>6</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Combined collapse by bridging and self-adhesion in a prototypical polymer model inspired by the bacterial nucleoid.</article-title>
                        <source>
                            <italic>Soft Matter</italic>
                        </source>.<year>2015</year>;<volume>11</volume>(<issue>9</issue>) :
                        <elocation-id>10.1039/c4sm02434f</elocation-id>
                        <fpage>1677</fpage>-<lpage>87</lpage>
                        <pub-id pub-id-type="pmid">25532064</pub-id>
                        <pub-id pub-id-type="doi">10.1039/c4sm02434f</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-7">
                    <label>7</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Kinetic Signature of Cooperativity in the Irreversible Collapse of a Polymer.</article-title>
                        <source>
                            <italic>Phys Rev Lett</italic>
                        </source>.<year>2018</year>;<volume>121</volume>(<issue>5</issue>) :
                        <elocation-id>10.1103/PhysRevLett.121.057801</elocation-id>
                        <fpage>057801</fpage>
                        <pub-id pub-id-type="pmid">30118310</pub-id>
                        <pub-id pub-id-type="doi">10.1103/PhysRevLett.121.057801</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-8">
                    <label>8</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Cohesins and condensins orchestrate the 4D dynamics of yeast chromosomes during the cell&#x00a0;cycle</article-title>.
                        <source>
                            <italic>The EMBO Journal</italic>
                        </source>.<year>2017</year>;<volume>36</volume>(<issue>18</issue>) :
                        <elocation-id>10.15252/embj.201797342</elocation-id>
                        <fpage>2684</fpage>-<lpage>2697</lpage>
                        <pub-id pub-id-type="doi">10.15252/embj.201797342</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-40510-9">
                    <label>9</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Characterizing meiotic chromosomes' structure and pairing using a designer sequence optimized for Hi-C.</article-title>
                        <source>
                            <italic>Mol Syst Biol</italic>
                        </source>.<year>2018</year>;<volume>14</volume>(<issue>7</issue>) :
                        <elocation-id>10.15252/msb.20188293</elocation-id>
                        <fpage>e8293</fpage>
                        <pub-id pub-id-type="pmid">30012718</pub-id>
                        <pub-id pub-id-type="doi">10.15252/msb.20188293</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4478-40510">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ross</surname>
                            <given-names>Brian</given-names>
                        </name>
                        <aff>University of Colorado Anschutz Medical Campus, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>11</day>
                    <month>3</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We wish to thank Reviewers 2 for their many helpful comments and insights. To address their comments as well as several concerns of our own, we have made a number of changes to our analysis, our results and the content of the main paper, and added a new appendix. The changes made to the code required us to regenerate all 9 figures that show results, although only Figure 7 changed significantly. We have also updated our GitHub repository containing all our code and example data.</p>
                <p>Reviewers 2 pointed out that we were too strong in our language stating that 'nothing is known' about the DNA model. We agree and we have removed this wording from the last paragraph in the Introduction. They also suggest the use of contact probabilities as a basis for inferring the distance function used in the model. This would be possible, but there is actually a direct measurement of the distance function (Wang 
                    <italic>et al.,</italic> 2016) which we have decided to use instead. Wang 
                    <italic>et al.</italic> found a 1/3 scaling exponent between L and R at the relevant length range, which we incorporated into our model for analyzing the Hi-C reconstruction (but not the random chains, whose exponent is 1/2) and then reran the results shown in Figure 7. The results did not change very much because most adjacent loci are close enough that R = k*L, i.e. the exponent is 1. The reviewers point out this could be due to our spline interpolation used to increase the resolution of the conformation. Unfortunately we were not able to directly infer higher-resolution conformations because the Hi-C data set recommended by Zhu 
                    <italic>et al.,</italic> 2018 (the Hi-C inference method called 'GEM') is binned at 1 Mb resolution, and this sets the resolution of the GEM conformations. We do not believe this is a problem for 2 reasons: 1) our average label spacing is ~2/3 Mb, not far below the 1 Mb GEM conformation subunit length, and 2) the inferred conformations seem to have a persistence length somewhat above the length of a subunit, although we cannot rule out that this might change with a different bin density. We agree that it would be ideal to have a higher-resolution structure (although bin sampling would become an issue using this Hi-C data set), but we suspect that the errors in Hi-C inferences probably overwhelm the resolution issue.</p>
                <p>We want to point out that our use of a Gaussian chain for reconstruction (but not for producing the DNA chains) is not incompatible with the scaling relations mentioned by Reviewers 2, because these scaling relations determine the average spatial separation R of two loci based on their genomic separation L, but not the form of the distribution p(r|R). We have chosen to model p(r|R) with a Gaussian (partly for convenience, since it is easy to factor in localization error, but partly for other reasons; see below) having &lt;r^2&gt; = R^2, where R is in turn calculated as R = L^power. In our earlier draft this power was fixed at 1/2 for distances above a persistence length, but as Reviewers 2 pointed out, recent experimental data show exponents of 1/3 - 1/5. To address this issue, we generalized our program to accept more general DNA models consisting of different power laws at different inter-locus distance regimes, and our new results use exponents of either 1/2 or 1/3 for long DNA segments, depending on the simulated experiment. To make this clearer, we have added a new paragraph to the Results section (2nd paragraph) explaining the model selection in our simulations, as well as how a model would be chosen in a real experiment.</p>
                <p>In our initial submission we claimed that a systematic error seen in the mapping probabilities was due to overestimation of the missing-spot rate. Since then we have both fixed the missing-spot rate estimation and made major progress in figuring out the real cause of the error, which we explain in detail in a new Appendix 3 and refer to in several places in the text. The error comes from the fact that the Gaussian chain model used for reconstruction differs significantly from the wormlike chain model used to generate our simulated contours. While Reviewers 2 were concerned that the use of a 'wrong' model would skew the results, we believe that the opposite interpretation is more accurate: the fact that we obtain high-quality results even when the reconstruction model differs from the model used to produce the conformations shows that our approach is robust to model error. Appendix 3 justifies this intuition by showing that model error causes our results to appear less certain than they are, but does not cause reconstruction errors if the reconstruction model is less sharply peaked than the true model. This is the other justification for using the Gaussian chain model: it is quite permissive of unexpected behavior, which matters because the true 
                    <italic>in-vivo</italic> DNA model may sometimes behave unexpectedly in ways that are very difficult to measure exactly in calibration experiments. We have also added a new 3rd paragraph to the Discussion explaining this.</p>
                <p>We agree with Reviewers 2 that loops certainly can happen and, to the extent that they can be distinguished by microscopy, our algorithm is certainly capable of finding looped conformations (even if the loop is over 2 adjacent loci -- our Gaussian model peaks at r = 0). If two loci of the same color happen to overlap in a microscope image, one may be missed -- this is considered a missing-spot or false-negative error, as mentioned in the footnote in the Introduction. Since our algorithm is capable of handling both false negatives and false positives (extra unbound spots), we do not anticipate loops to be a problem. If there are many points of overlap coming from an identical color sequence (e.g. if two copies of the same chromosome overlap) then the reconstruction can become fragmented, with ambiguity as to which piece goes with which other piece -- we have added a brief note about this to the Discussion section.</p>
                <p>Reviewers 2 point out that align3d is a single-cell method, not an ensemble method, and we completely agree. We believe that aggregating single-cell conformations will give many interesting insights that one could not get by aggregating, for example, pairwise distances. Our method should be seen as one possible means of obtaining these cell conformations.</p>
                <p>Finally, Reviewers 2 raise the issue of resolution in the z dimension: we certainly do consider localization error in z, both in generating the spot localizations (which have z error as well as x/y error) and in the reconstructions (where the errors in x/y/z are required inputs). In all simulations we set the z localization error higher than x/y error (200 vs 100 nm in normal resolution, 50 vs 30 nm in superresolution), reflecting the fact that axial resolution is worse in most setups. We have updated the main text to more explicitly give the localization error in the various simulations.</p>
                <p>References:</p>
                <p>Wang, S., Su, J. H., Beliveau, B. J., Bintu, B., Moffitt, J. R., Wu, C. T., &amp; Zhuang, X. (2016). Spatial organization of chromatin domains and compartments in single chromosomes.&#x00a0;
                    <italic>Science</italic>,&#x00a0;
                    <italic>353</italic>(6299), 598-602.</p>
                <p>Zhu, G., Deng, W., Hu, H., Ma, R., Zhang, S., Yang, J., ... &amp; Zeng, J. (2018). Reconstructing spatial organizations of chromosomes through manifold learning.&#x00a0;
                    <italic>Nucleic acids research</italic>,&#x00a0;
                    <italic>46</italic>(8), e50-e50.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report38643">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17750.r38643</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Birney</surname>
                        <given-names>Ewan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r38643a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8314-8497</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Barton</surname>
                        <given-names>Carl</given-names>
                    </name>
                    <xref ref-type="aff" rid="r38643a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r38643a1">
                    <label>1</label>European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>11</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Birney E and Barton C</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport38643" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16252.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In this paper the authors update and build on a method they have previously published known as &#x2018;align3d&#x2019;. This method attempts to infer the chromosome conformation based on images of fluorescently tagged genomic loci. The authors claim that this updated method increases the accuracy of the inferred conformation as well as allowing the method to run on larger instances of the problem. They then go on to demonstrate where the method allows for the near perfect reconstruction of larger scale, simulated, labelled images. We&#x00a0;believe that the article is worthy of indexing on the condition that some minor issues, outlined below, are addressed.</p>
            <p> In the introduction the authors mention a couple of other methods attempting to resolve similar problems. I think that this section should be expanded as there is no critical comparison of how this method compares to each of those mentioned. In particular the computational methods should be compared and contrasted so it is clear to the reader how this method differs from others.</p>
            <p>Whilst reviewing this paper we were unable to access the supplementary data. This must be made available before the paper can be indexed. From the computational perspective, it would be useful for the supplementary section to state the type of series expansion being used and how quickly the series expansion converges to the original formula. The authors have checked the convergence properties experimentally, but it is sometimes quite simple to determine theoretically how quickly an approximation converges. This information could be useful in determining better expansions and would explain more concretely why they get some of the results they see.</p>
            <p> In the experimental section of the paper the authors generate three different types of simulation that they denote 'Toy', 'Experiment 1' and 'Experiment 2'. In the discussion of the results the authors make the following comment:</p>
            <p> &#x2018;Use of more colors dramatically improves reconstructions. Our most striking result is that simulations of the ambitious Experiment 2 produce far better results than even the Toy scenario, despite the fact that these simulations have more loci per color than either the Toy scenario or Experiment 1. This can be seen in the amount of unrecovered information shown in the simulation-averaged plots of Figure 5A. Thus a push to 20-color labeling could prove critical for genomic reconstruction at the chromosome scale and beyond. At the end of this section we revisit Experiment 2, in order to assess the reconstruction quality when analyzing more realistic DNA contours having tighter confinement.&#x2019;</p>
            <p>The authors should make some attempt to explain this situation. If you increase the number of colours and also increase the number of loci with the same colour, you would not necessarily expect the problem to become harder. It very much depends on how each is increased in proportion to the other.</p>
            <p> An increase in the number of unique colours available should lead to the problem being exponentially easier as you are effectively exponentially decreasing the ambiguity in the data set. Should you also increase the number of loci labelled with the same colour then you wouldn't expect the problem to become harder unless that increase was large enough to outweigh the effects of the increase in the number of available colours. In this sense it could be argued that many instances of the 'Toy' example are fundamentally more challenging than the (on the face of it) more complicated 'Experiment 2'. This should somehow be addressed by the authors.</p>
            <p> Also in the discussion of the experimental results the authors note that '20-color labelling leads to near-perfect reconstructions.' This result is consistent with our results reported by Barton 
                <italic>et al. </italic>(2017
                <sup>
                    <xref ref-type="bibr" rid="rep-ref-38643-1">1</xref>
                </sup>). It would be good to mention this: although the computational methods are different, the simulations are generated in different ways and the simulated resolution is different, both methods suggest that if ~20 colours are available then near-perfect reconstruction is possible. The authors should also point out the similarity of the number of colours needed in both their method and ours.</p>
            <p>That the computational methodologies differ, yet the numbers of colours needed for near-perfect reconstruction are similar, perhaps suggests that both methods are in some sense 'naive'. There must exist a minimum number of colours required for a certain average reconstruction performance (with the appropriate caveats) but we would be surprised if it was as high as 20. It could be interesting to see the authors add some discussion about this connection and any insight they might have into it.</p>
            <p> Finally there have been a number of different attempts to simulate super resolved images of the type used in this and other computational methods. If the authors can use this data as input or the data can easily be coerced into an appropriate format for this method then the paper would be much stronger with the addition of results of using the method against these datasets. In this way the authors can clearly demonstrate that the method they propose is not simply good on their own simulated data, but also performs robustly on other independently generated simulations.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>genomics</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-38643-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>ChromoTrace: Computational reconstruction of 3D chromosome configurations for super-resolution microscopy.</article-title>
                        <source>
                            <italic>PLoS Comput Biol</italic>
                        </source>.<volume>14</volume>(<issue>3</issue>) :
                        <elocation-id>10.1371/journal.pcbi.1006002</elocation-id>
                        <fpage>e1006002</fpage>
                        <pub-id pub-id-type="pmid">29522506</pub-id>
                        <pub-id pub-id-type="doi">10.1371/journal.pcbi.1006002</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4477-38643">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ross</surname>
                            <given-names>Brian</given-names>
                        </name>
                        <aff>University of Colorado Anschutz Medical Campus, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>11</day>
                    <month>3</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We wish to thank Reviewers 1 for their many helpful comments and insights. Based on these comments and those of Reviewers 2, we have made some&#x00a0;changes to our analysis, updated our results (particularly those shown in Figure 7) and the content of the main paper, and added a new appendix.</p>
                <p>We apologize for the problems Reviewers 1 had in accessing the Supplemental Material. The material was uploaded and available (to us), but the 'Appendix x' links lead nowhere in the published version. These seemingly dead links have been removed.</p>
                <p>Reviewers 1 suggested that in the Introduction we compare our align3d method to the other published method that we are aware of (ChromoTrace) to highlight their differences. We agree that this is indeed a useful addition, and so we have added several sentences to the Introduction (2nd paragraph) contrasting the two algorithms. We are not experts on ChromoTrace, and if we have mischaracterized it in some way we apologize and hope the reviewers will correct us.</p>
                <p>Reviewers 1 also inquired about the exact series expansion formulas we used. The expansion formulas are in the Methods section of the main text, not the Supplemental material. To make this clearer we have added an equation number to the series definition preceding the coefficient formulas (this is the new Equation 3), and referenced that equation explicitly in the two coefficient formulas (which are now Equations 4-5). Thus the series definitions are fully in the main body of the paper, and only their derivations are in Appendix 2.</p>
                <p>One technical detail is that our original code could not use our series expansions in conjunction with the preexisting capability to 'fix' certain loci to map to certain spots in the image, in order to obtain mapping probabilities that are conditional on the fixed loci. This has been addressed in the new version of the code. This oversight did not affect the results shown in the paper, but it did require us to add an explanatory paragraph to the end of Appendix 2.</p>
                <p>Reviewers 1 asked about the finding that our simulated Experiment 2 reconstructions came out much better than the Experiment 1 reconstructions, despite having more labeled loci of a given color. We have added several sentences to the Results section ("Use of more colors dramatically improves reconstructions" section) explaining that we believe that it is the spatial density of labeled loci rather than the absolute number that determines the reconstruction quality. Reviewers 1 noticed the same finding in Barton 
                    <italic>et al.</italic>&#x00a0;(2018); we have added this citation. We have not systematically tested performance as a function of the number of colors; we chose 20 simply based on the fact that 10 sequential hybridizations is reasonable for our planned experiment based on conversations with our collaborator (Wang 
                    <italic>et al.,</italic> 2016, demonstrate 17 rounds). Since we haven't noticed a plateau in reconstruction performance versus number of colors, as evidenced by the fact that the 20-color reconstructions still have some uncertainty, we do not see a reason to go towards fewer colors.</p>
                <p>A final question raised by Reviewers 1 concerned the issue of superresolution in the simulated images. Since our spots are presumed well-separated (based on the data of Wang 
                    <italic>et al.,&#x00a0;</italic>2016) we believe we can get super-resolved spot localization without having to use special microscopes or fluorophores, and without having to resolve individual fluorophores. Thus the superresolution comes for free on normal images at the scale we consider here. We have added text explaining this (new final paragraph of Results), and also a second set of conformational reconstructions to Figure 7 showing explicitly the benefit of superresolving the spot locations. If we were to push to higher-genomic-resolution labeling (say, 10s-100s kb locus separation; current simulations are at ~600 kb) then we would indeed need superresolution microscopes, but since those are not the experiments simulated here we did not try to simulate those images. In fact this is why we chose to label these simulations at the 600 kb resolution.</p>
                <p>Although we were not able to increase the Hi-C inferred resolution, we did discover that we had misinterpreted the scale of the Hi-C-derived conformations of Figure 7, thus underestimating the relative magnitude of microscope error in these simulations. Our new plots have corrected this error. Owing to the larger microscope error our new reconstruction quality is somewhat worse as measured by our information metric. To compensate we improved our script that estimates a likely conformation from our output (mapping p-values), and as a result these likely conformations are roughly of the same quality as before. We also added a parallel set of superresolution reconstructions to this figure, in order to show explicitly the benefit of reducing microscope error.</p>
                <p>References:</p>
                <p>Barton, C., Morganella, S., Oedegaard, O., Alexander, S., Ries, J., Fitzgerald, T., ... &amp; Birney, E. (2018). Chromotrace: reconstruction of 3D chromosome configurations by super-resolution microscopy.&#x00a0;
                    <italic>bioRxiv</italic>, 115436.</p>
                <p>Wang, S., Su, J. H., Beliveau, B. J., Bintu, B., Moffitt, J. R., Wu, C. T., &amp; Zhuang, X. (2016). Spatial organization of chromatin domains and compartments in single chromosomes.&#x00a0;
                    <italic>Science</italic>,&#x00a0;
                    <italic>353</italic>(6299), 598-602.</p>
                <p>Zhu, G., Deng, W., Hu, H., Ma, R., Zhang, S., Yang, J., ... &amp; Zeng, J. (2018). Reconstructing spatial organizations of chromosomes through manifold learning.&#x00a0;
                    <italic>Nucleic acids research</italic>,&#x00a0;
                    <italic>46</italic>(8), e50-e50.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
