Identification and analysis of functional associations among natural eukaryotic genome editing components

During development in the ciliate excess DNA interspersed Paramecium, throughout the germline genome is deleted to generate a new somatic genome. In this process, most of the intervening DNA is excised by a Piggybac-derived transposase, assisted by small RNAs (scnRNAs and iesRNAs) and chromatin remodelling. As the list of genes involved in DNA elimination has been growing, a need for a general approach to discover functional relationships among these genes now exists. We show that deep sequencing-based comparisons of experimentally-induced DNA retention provide a sensitive, quantitative approach to identify and analyze functional associations among genes involved in native genome editing. This reveals two functional molecular groups: (i) iesRNAs/scnRNAs, the putative Piwiand RNA-binding Nowa1/2 proteins, and the transcription elongation factor TFIIS4; and (ii) PtCAF1 and Ezl1, two proteins involved in chromatin remodelling. Comparative analyses of silencing effects upon the largely unstudied regions comprising most developmentally eliminated DNA in suggests a continuum between precise and Paramecium imprecise DNA elimination. These findings show there is now a way forward to systematically elucidate the main components of natural eukaryotic genome editing systems. 1 2 1 1


Introduction
Paramecium, like other ciliates, contains both a large, somatic, polyploid macronucleus (MAC), and a relatively tiny, gametic, diploid micronucleus (MIC), within the same cell.During sexual or asexual "reproduction" (conjugation or autogamy), a copy of Paramecium's micronucleus develops into a new macronucleus, and at the same time the micronuclear genome it contains is transformed into a new macronuclear genome by extensive, targeted DNA deletion.Concomitant with DNA deletion, the developing MAC genome is amplified from 2N to ~800N.In Paramecium, the best studied germline DNA sequences destined to be deleted, known as internally eliminated sequences (IESs), are interspersed among the macronuclear-destined sequences (MDSs) (Figure 1).Characterized Paramecium IESs are typically short (> 90% are < 150 bp), TA-flanked, and unique (∼45,000 IESs per Paramecium tetraurelia MIC genome 1 ).These IESs are recognized and generally precisely removed by the PiggyMac transposase complex 2 .For some IESs, development-specific small RNAs (sRNAs) are required for the recognition and efficient removal of all their copies 3 .
Paramecium's current annotated IESs comprise just a small fraction (∼13%) of all the micronuclear-specific DNA that is eliminated during development 1 .Though much of the remaining eliminated DNA has been assembled during IES annotation 1 , it has yet to be assembled as a complete micronuclear genome, and comparatively little is known about it.For the sake of explanation, and because some of this DNA may reside at chromosome ends, and hence may not be internally eliminated, we classify this DNA as "other eliminated sequences" (OESs; see Figure 1).
Developmental elimination of the few OESs investigated in detail [4][5][6] is associated with two possible outcomes: either breakage of the MIC chromosomes into smaller MAC chromosomes, accompanied by end capping with telomeres, or fusion of the regions flanking the removed region [5][6][7] .The lengths of studied OESs are on the order of kilobases to tens of kilobases [4][5][6] .The boundaries of MAC chromosome telomere addition sites occur at similar locations to the boundaries of the internally removed sequences 6 .In contrast to classical IESs, the cut sites of these internally removed sequences are imprecise 6 .During chromosome breakage, the exact locations of the telomere addition sites vary over a span of at least hundreds of base pairs, but whether this is due to resectioning of DNA before telomere addition or due to initial imprecise DNA cleavage, or both, is unknown 6 .
An important question has therefore been whether the mechanisms underlying imprecise DNA elimination are distinct from those employed in precise IES excision, as previously proposed 6 , or whether some of the same developmental DNA deletion pathway is shared, as previously proposed 4 .Suggesting the latter may be true, substantial retention of OESs occurs after silencing the gene encoding PiggyMac (PGM) 1 , and OESs and transposons embedded within them are typically retained when gene silencing leads to IES retention 3,[8][9][10][11][12] .
In ciliates, small RNAs (> 21 nt and < 32 nt) involved in MAC genome development have proposed roles either in DNA targeting and excision (as in Tetrahymena and Paramecium [13][14][15][16] ) or in marking/protecting DNA for retention (as in Oxytricha 17,18 ).Paramecium tetraurelia has two classes of sRNAs involved in DNA elimination: scnRNAs and iesRNAs 3,13 .scnRNAs are produced in gametic MICs during micronuclear meiosis, early during sexual development.Newly formed scnRNAs are transported to the existing, old MAC where MDS-matching scnRNAs are removed (i.e.removed by "RNA scanning", as originally proposed in the ciliate Tetrahymena 19 ), after which the non-MDS-matching scnRNAs (including IES-matching scnRNAs) are transported to the developing new MAC where they target DNA deletion 3,13 .Paramecium's scnRNAs have been proposed to bind to longer RNA transcripts, rather than directly to DNA, in both the old and new MAC 12,20 .iesRNAs are produced in the developing MAC, where, in principle, they can immediately target unexcised IESs, and their production peaks later during development 3 .iesRNAs are hypothesized to be produced from transcripts of excised IESs 3 .
scnRNAs and RNA scanning were discovered in the ciliate Tetrahymena (belonging to class Oligohymenophorea, like Paramecium) 14,16,21 .Other than the use of scnRNAs to target excision, Tetrahymena's macronuclear genome development resembles that of Paramecium in a number of other key ways, Macronuclear-destined sequences (MDSs) inbetween are joined and capped with new telomeres to give rise to the final macronuclear chromosomes.When DNA is imprecisely eliminated, the telomere-capped ends of the macronuclear chromosomes occur at different sites.
In some cases, this process also produces alternative macronuclear chromosome forms.Note that the distinction between OESs and IESs, though useful for the purposes of description and analysis here, is an artificial one, and that there may only be one main class of eliminated DNA in Paramecium (see later in Discussion).
most notably in employing a domesticated PiggyBac in DNA elimination 22 , and that both species have two phases of development-specific sRNA production 16 .There are also important differences between Paramecium and Tetrahymena genome development, in particular, unlike Paramecium's classical short, precisely excised, single copy IESs, Tetrahymena's conventional IESs tend to be longer, imprecisely excised, multicopy elements 23,24 .In contrast to Paramecium, which produces scn-RNAs and iesRNAs with distinct Dicer-like proteins (Dcl2/3 for scnRNAs 13 ; Dcl5 for iesRNAs 3 ), Tetrahymena produces two types of scnRNAs (early-scnRNAs and late-scnRNAs 16 ) with a single Dicer-like protein (Dcl1) 25,26 .These scnRNAs are differentiated by the Piwi proteins (Twi1 and Twi11) processing Dcl1-generated dsRNA precursors 16 .At present, it is not known how the development-specific sRNAs of Paramecium and Tetrahymena guide the domesticated PiggyBacs to their DNA targets, nor what role proteins implicated in their genome editing have in mediating the intervening interactions.
In the past, the effect on the excision of Paramecium IESs following microinjection of IES DNA copies into the old MAC during autogamy was investigated for a small set of IESs.IESs were classified either as maternally controlled (i.e.governed by the old or maternal MAC) when the DNA injection led to IES retention in the new MAC, and non-maternally controlled when there was no observable IES retention 27,28 .In principle, since DCL2/3 co-silencing stops transmission of epigenetic information from the old macronucleus to the new one by eliminating the scnRNAs bearing this information, we previously classified IESs affected by DCL2/3 silencing as epigenetically controlled, and those that were not, as non-epigenetically controlled 3,15 .This classification is a simplification, since there is no clear boundary between epigenetically and non-epigenetically controlled IESs (and between maternally and non-maternally controlled IESs), and because it may be difficult to detect low levels of epigenetic control.After DCL2/3 silencing, some non-maternally controlled IESs are weakly retained, below the detection level of conventional PCR analyses 3 , and hence are weakly epigenetically controlled.IESs affected by DCL5 silencing do not correlate well with those affected by DCL2/3 silencing, suggesting that a substantial amount of Paramecium's sRNA-dependent IES excision is not epigenetically controlled 3,15 .
In the last few years, additional proteins influencing IES removal in Paramecium have been reported, including homologs of proteins with known functions in other eukaryotes, such as those involved in histone modification (PtCAF1, Ezl1) and RNA transcription (TFIIS4) [10][11][12] (see also Table 1 and Table 2).From these studies, following existing models in Tetrahymena 29,30 , this has led to the proposal that scnRNAs in Paramecium direct certain chromatin modifications that may facilitate DNA excision 10,11 .Proteins involved in IES removal with no readily detectable homologs in other eukaryotes have also been investigated 8 .Other than these proteins, two closely related protein paralogs, Nowa1 and Nowa2 (Nowa1/2), were previously implicated in IES elimination and proposed to be involved in scnRNA-mediated, trans-nuclear crosstalk 31 .Identification and establishment of the relationships between all these proteins and the development-specific sRNAs is necessary to determine how genome editing functions in Paramecium, and the main objective of the research presented here.
In this paper, highlighting the utility of DNA retention analyses in discovering associations between genome editing proteins, we present what we have learnt about the role of Nowa1/2 proteins and their relation to other Paramecium genome editing components.We report strong correlations between the IESs affected by NOWA1/2 co-silencing (NOWA1/2-KD; KD=knockdown), DCL2/3/5 triple silencing (DCL2/3/5-KD) and TFIIS4 silencing (TFIIS4-KD), suggesting that Dcl2/3/5, Nowa1/2 and TFIIS4 are all components of an sRNA-guided DNA excision subsystem.Strong correlations also exist between PTCAF1-KD and EZL1-KD IES retention, whereas the correlations between the IES retention of these gene knockdowns and those of RNA-related genes are somewhat weaker.We also observe more IES retention due to the silencing of chromatin modifying genes (PTCAF1 and EZL1) than due to depletion of most IES-matching sRNAs by DCL2/3/5-KD, suggesting that chromatin modification due to PtCaf1 and Ezl1 also influences genome editing in an sRNA-independent manner.The correlations in OES retention following gene knockdown suggest most OESs are effectively the longest IESs.

Paramecium strains, cultivation, and autogamy
All experiments were carried out with Paramecium tetraurelia stock 51, which may be obtained from the ATCC as ATCC30567.Cells were grown in a wheat grass powder (WGP; Pines International, Lawrence, KS) infusion medium bacterized the previous day with Klebsiella pneumoniae supplemented with 0.8 mg/l β-sitosterol.Cultivation and autogamy were carried out at 27°C as previously described 28 .

Gene silencing
Gene silencing (knockdown) was carried out by RNA interference by feeding of Paramecium cells with E. coli expressing doubledstranded RNA from a portion of the target gene cloned into the L4440 vector, as previously described 3,8,10,31 .Sequence regions used for each of the genes silenced in DCL2/3/5-KD were as previously described 3,13 .Silencing of PTCAF1 and ND7 were as previously described 10,31 .
Three knockdowns of NOWA1/2 were performed; where necessary we distinguish between these knockdowns (i.e.NOWA1/2_a-KD, NOWA1/2_b-KD, NOWA1/2_c-KD).Macronuclear DNA from postautogamous cells from the first and third knockdowns was used to produce Illumina paired-end sequence libraries.Illumina sRNA libraries were produced from NOWA1/2_b-KD at the same time as those produced for sRNAs from DCL2/3-KD, DCL5-KD and control cells -the latter knockdowns were described in "Experiment 1" of 3.
The following controls were used in this manuscript: Control 1 = feeding with an empty vector (EV) construct.This was the control for DCL2/3/5-KD and PDSG2-KD; Control 2 = ND7-KD.This was the control for PGM-KD 1 , Control 3 = ND7-KD.This was the control for an unpublished experiment (Singh et al., in preparation); Control 4 = ND7-KD.This was the control for EZL1-KD 11 .ND7 is a trichocyst discharge gene which is not expected to affect IES retention 32 .Survival tests and cytological staging Tests of survival of sexual progeny cells were performed by isolating and refeeding 30 individual cells.NOWA1/2 knockdown is generally lethal, i.e., typically all cells die within at most the time taken for 3 divisions of control cells following autogamy; for DCL2/3/5-KD 90% of cells were dead 3 days after autogamy; for PDSG2-KD after 3 days 50% of cells were dead and 47% appeared to be "sick" (i.e.dividing slower than usual); for PTCAF1-KD after 3 days 80% of cells were dead and 17% were sick.
Morphological characteristics of MAC development were used to assign the developmental stages of the control and knockdown samples: "Early", "Middle" and "Late", correspond to 50% of cells with fragmented MACs; 100% of cells with fragmented MACs; and 100% cells with fragmented MACs and a new MAC; Figure S3A,C).Note that because Paramecium cells are not completely synchronous during induction of autogamy 33 and because the underlying assessments are somewhat subjective, due caution should be exercised when comparing developmentally staged analyses between gene knockdowns.

Nowa1 antibody production and use in western blotting
A portion of Nowa1's C-terminal domain (amino acids 744-1024) fused to an N-terminal Strep-Tag was expressed in E. coli BL21-CodonPlus(DE3)-RIL cells (formerly Stratagene, currently Agilent, USA).The protein was column purified, followed by removal of the tag by HRV 3C protease cleavage.Immunization of a rabbit with this protein was performed by Eurogentec, Belgium, following their 28-day polyclonal antibody protocol, then affinity purified using the matrix bound C-terminal domain portion of Nowa1, before performing western blotting using a 1:500 antibody dilution (Figure S3B).As a loading control, western blotting was performed using 1:5000 dilution of monoclonal antialpha-tubulin (Sigma T8203 lot 111M4828) and 2% BSA for blocking.
Protein translation was performed at 37°C; cells were supplied with 1 mM IPTG and 0.2% glucose.Protein was purified according to the pMAL™ Protein Fusion and Purification System (NEB e8200s).EMSA was performed with the LightShift® Chemiluminescent RNA EMSA Kit (Thermo scientific, 201558) using 6.25 μM of RNA or DNA, 4 μg of MBP, and 12 μg of MBP-Nowa1.
DNA and RNA extraction, library preparation and sequencing Macronuclear DNA was isolated from post-autogamous cells as previously described 1 .For PCR analyses of IES retention, DNA was extracted from approximately 800 cells with a Nucleospin Tissue kit (Macherey-Nagel).
Total RNA was extracted from 400 ml of culture per the TRI Reagent BD protocol (Sigma) and was resuspended in RNase free water.A MirVana miRNA isolation kit (Ambion) was used for sRNA enrichment.
MAC genomic DNA and small RNA libraries were produced and sequenced per standard Illumina protocols.Paired-end Illumina DNA-seq libraries were constructed and sequenced on either the Illumina GAIIX, HiSeq 2000 or HiSeq 3000 sequencers.

PCR analysis of IES retention
PCR conditions and primers used to test IES retention following NOWA1/2_a-KD were the same as those in 3.

IES and OES retention score regressions
To generate the IES and OES score regression matrices, shown in Figures 3A-C, Figure 4E and Figures S1A-B, we provide the program "after_ParTIES.py"(see Data and software availability), which can be run on Dataset 1 and Dataset 2 (IES and OES retention scores, respectively).This was tested using Python3.6 and relevant modules for scientific computing from the Anaconda software distribution from Continuum Analytics (version: Anaconda3-4.3.1).
Three regressions are graphically shown for IRS pairs: an ordinary least squares (OLS) regression, a LOWESS regression (nonparametric) and an orthogonal distance regression (ODR) with a linear function initialized by the slope and intercept from the OLS.OLS and ODR used standard SciPy functions.LOWESS regressions used the Python statsmodels library.ODR was performed since errors in both the "dependent" and "independent" regression variables are expected (both IRSs), not just the dependent variable as assumed by OLS.The combined effect of background IES retention and systematic IES retention errors on the regressions, and the effect of subtracting the mean of multiple control IES retention scores is shown in Figure S1A.For each pair of knockdown IES retention scores, Spearman's rank correlation coefficient (r s ) was calculated instead of Pearson's correlation coefficient to mitigate the effect of outliers.Reproducibility of IES retention scores between experiments is demonstrated by the strongly correlated scores of different NOWA1/2 silencing experiments (Figure S1B).Two-sided p-values, calculated with SciPy's spearmanr function, were used to test the null hypothesis of no correlation (for p ≤ α/m ≤ 0.01/45; i.e. with a Bonferroni correction for multiple hypothesis testing; uncorrected p-values are given in Supplementary Data S1-4), which was rejected in all the displayed correlations in the main figures, except where indicated by a circumflex accent.To facilitate assessment of the statistical significance of differences between r s values, tables containing the two-tailed p-values are provided in Supplementary Data S5-7.p-values were calculated using methods from the R statistical package cocor version 1.1-3 36 that employ Fisher's r-to-Z transformation (which are also applicable to r s

37
). Appropriate methods for calculating test statistics for correlations with independent groups or dependent groups with either overlapping or non-overlapping variables were used from cocor (noted in the supplementary data files).All the comparisons of r s differences described in the text are statistically significant at α=0.01 with a Bonferroni correction for multiple hypothesis testing.

Assembly and selection of OESs
Genomic regions that are not part of the current reference Paramecium MAC+IES assembly were previously assembled 1 , but we decided to use an alternative, metagenome assembler to attempt to produce a better assembly, and to minimize bacterial contamination.Reads from PGM-KD 1 not mapping to the current Paramecium tetraurelia strain 51 MAC genome and 51 MAC genome with IESs (bwa 38 version 0.7.12-r1039; default parameters) were assembled with IDBA-UD 39 version 1.0.9 (default parameters).To remove bacterial contaminants assembled during the assembly, only scaffolds with GC < 40% were selected (bacterial scaffolds peak around 58% GC; Figure S5A).This procedure resulted in a ∼20 Mb assembly consisting of 7266 scaffolds.As a small fraction of the P. tetraurelia MAC genome is currently missing from the reference assembly, this fraction was present in the ∼20 Mb assembly.To exclude the missing MAC genome regions, empty vector control (control 1) reads were mapped to the 7266 scaffolds with HISAT2 40 version 2.0.4 (parameters: "-X 600 -pen-noncansplice 0 --mp 24,8 --max-intronlen 1000), and then, using a custom Python script, only those regions with little or no sequence coverage (≤ 2.0×; compared to a mean MAC coverage of ∼100×) were extracted (Figure S5C shows an illustrative scaffold from which OESs were extracted).Coverage was smoothed using a bi-directional average of two exponentially weighted moving averages (EWMA) with a 200 bp span.This procedure extracted 4799 sequence regions (12.3 Mb).
To minimize the effects of potential genome assembly chaff (short assembled regions, esp.those that might arise from sequence errors), analyses were continued with only the low control sequence coverage regions extracted from scaffolds longer than 300 bp and with a PGM-KD sequence coverage of between 3× and 70× (Figure S5B).These 3882 regions, obtained from 5236 scaffolds, corresponding to 11.8 Mb of sequence data, are classified as putative OESs (Supplementary Data S8).Note that OES boundaries are a first approximation, since imprecise DNA elimination generates variable boundaries (as illustrated in Figure 1 and shown in Figure S5C), and many OESs situated at the ends of the scaffolds are likely incomplete due to the limitations of genome assembly (esp.since the assembly may end when untraversable repetitive regions are encountered).

Estimation of OES retention
For each knockdown, to quantify the effects of gene knockdown on OES retention, reads were mapped to the OESs (identified in the previous methods section) with HISAT2 40 version 2.0.4 (parameters: "-X 600 --pen-noncansplice 0 --mp 24,8 --maxintronlen 1000) before determining coverage, and normalized to the median coverage of the knockdown reads mapped with HISAT2 to the Paramecium tetraurelia strain 51 MAC genome plus IES assembly, i.e.OES retention score=OES coverage /median(MAC-with-IES-genome coverage ).OES retention scores may not be as accurate as IES retention scores, both because their denominators use a measure of global, rather than local, sequence coverage and because only reads that map within OES boundaries are counted.In the latter case, read mapping may fail to detect short matches to an OES when only a short part of an OES/MDS-derived read matches, and so the shorter the OES, the greater the underestimation of its retention.Shorter OESs are more weakly retained in most knockdowns affecting genome reorganization (Figure S5D) in Paramecium, but at least some of this effect may be due to boundary-related underestimation of retention.In future, when a complete Paramecium MIC genome is assembled, the relationship between OES length and retention can be fully assessed.OES retention estimates are provided in Dataset 2. Control OES retention was subtracted from the knockdown OES retention for each OES used in the regression analyses.
sRNA mapping sRNAs were mapped and histograms of their genomic distribution were produced as previously 3 with the addition of OESs.

Sequence logos
Sequence logos were created with WebLogo 41 , version 3.3, with redundant sRNAs and base frequencies A=U=0.4 and C=G=0.1.

Examination of basic IES retention properties
Previously, each newly silenced Paramecium gene involved in IES excision appeared to generate a different distribution of IES retention 3,11,12 .From additional knockdowns we have subsequently performed, it can now be seen that the IES retention distributions for NOWA1/2 knockdown, TFIIS4 knockdown and DCL2/3/5 triple knockdown are very similar (Figure 2A; see Methods for the IES retention score calculation).Given the role of TFIIS4 and Dcl2/3/5 in producing and processing of RNAs, and the suggested role of Nowa1 in RNA interactions (see "Nowa1/2 is involved in RNA scanning, may mediate scnRNA-and iesRNA-targeted DNA cutting and excision" later in the results), we classify these proteins as "RNA-associated" proteins involved in Paramecium MAC genome development.The IES retention score distributions of the EZL1 knockdown and PTCAF1 knockdown also have similar forms, though the effect of the former knockdown is slightly stronger (Figure 2B).Since both Ezl1 and PtCAF1 affect histone modifications 10,11 , we classify these proteins as those involved in "histone modification".
As we previously reported, the effects of either DCL2/3 knockdown or DCL5 knockdown on IES retention alone are quite modest 3,15 .Even when their IES retention scores are summed, it is apparent that the knockdowns of DCL2/3 and DCL5 influence considerably fewer IESs, and to a lesser degree than the PGM knockdown.For DCL2/3/5, NOWA1/2 and TFIIS4, given that the gene knockdowns are efficient (as judged by depletion of their mRNAs or proteins; Figure S3B,D and 12 ), we think the similarity of their IES retention score distributions indicates that the upper bounds of IES retention for these knockdowns have been approached.Consequently, these knockdowns may indicate most of the Paramecium IESs requiring scnRNAs and iesRNAs for their excision.
Though an interesting idea, we find no evidence of a proposed missing class of sRNAs 11,12 responsible for targeting and excision of additional IESs.Instead, we observe that after DCL2/3/5-KD the levels of IES-matching sRNAs are negligible (Figure 5C).We therefore infer that most IES excision in Paramecium does not require IES-targeting sRNAs.This is consistent with past studies reporting that most tested IESs were non-maternally controlled 27,28 , i.e. that scnRNAs alone are insufficient for most IES excision 3,15 .On the other hand, from IES retention following DCL2/3/5-KD we infer that many IESs require some scnRNAs or iesRNAs to cleanly remove all their copies (the exact number of affected IESs depending on the chosen retention score threshold, e.g.19043 IESs are affected by DCL2/3/5-KD above a threshold of 0.1 retention).
An important observation from the DCL2/3/5 knockdown is the considerably stronger IES retention compared to the separate DCL2/3 and DCL5 knockdowns (Figure 2C); for example, at an IES retention score > 0.1, for DCL2/3/5-KD 19043 IESs are affected vs. 4815 IESs for DCL2/3-KD, and 5384 IESs for DCL5-KD.This suggests some co-operation between scnRNAs and iesRNAs in IES excision.From the DCL2/3/5 triple silencing, it is clear that multiple gene co-silencing may be necessary to gauge the full contributions of individual genes to IES excision.This is a critical consideration in gene function investigations in Paramecium tetraurelia with its abundance of paralogs that arose from successive whole genome duplications 42 .
As we previously proposed, IES excision/recognition efficiency and the need for particular genome reorganization proteins vary as a function of IES end bases and lengths 15 .A trend of increasing IES retention with IES length was shown for the knockdowns of DCL2/3, TFIIS4 and EZL1 3,11,12,15 .It can now be seen that, with the exceptions of PGM-KD and DCL5-KD, most knockdowns leading to IES retention show this trend (Figure 2E,F).A new observation with respect to IES length is that IES retention for the knockdowns of DCL2/3/5, TFIIS4 and NOWA1/2 also has an obvious 10-11 base periodicity for ~45-130 bp IESs (Figure 2E), mirroring that of the IES length peaks (which are proposed to reflect the constraints of DNA twist on transposase excision 1 ).This signal is not apparent in knockdowns with weaker effects (DCL2/3-KD, DCL5-KD and PDSG2-KD), and only weakly visible in the knockdowns with stronger effects (PTCAF1 and EZL1).We previously also observed that the terminal IES base frequencies relative to IES length show a periodic, 10-11 bp signal (Figure 2A of 15).As in knockdowns of DCL2/3, DCL5 and PGM 15 , for all the new knockdowns we examined, the end base frequencies of longer IESs (e.g.45-55 bp) vary in a similar manner with respect to their IES retention scores (Figure 2G-H; Figure S2), consistent with the idea that some IESs have a greater requirement for the products of these genes than others 15 .

Relationships among genes involved in genome editing inferred from IES retention
While similar IES retention distributions suggest different gene knockdowns might affect retention of particular IESs in a similar manner, examination of pairwise correlations is necessary to clearly establish this.We therefore determined all the possible pairwise correlations between IES retention scores of different gene knockdowns (Figure 3).From this figure, it can be seen that the DCL2/3/5, NOWA1/2, TFIIS4 (group 1), and histone modification   For the purposes of regression, for each IES the mean of two negative control OES retention scores (control 1 and control 2) have been subtracted, and zeroed when the subtraction yields a negative value, i.e.ORS regression =max(ORS experiment -mean(ORS controls ), 0).This calculation was performed to reduce the effects of natural, background OES retention and systematic errors in OES retention score estimation as much as possible (but note that when OES retention scores are very low, some residual background retention may lead to some correlation, e.g. between DCL5-KD and control 3).Hexagonal binning of OES retention scores was used to generate the plots in the lower triangular matrix.Red lines are for OLS regression; orange lines for LOWESS and grey lines for ODR.For each plot in the lower triangular matrix (M i,j ), Spearman's rank correlation coefficient (r s ) is given in the corresponding position diagonally opposite in the upper triangular matrix (M j,i ).S1), maternal control scores were obtained from the maximal IES retention observed in Figure 6  genes (group 2) not only have the most similar IES retention score distributions within each of these two groups, but also the strongest IES retention score correlations.Though not as strong as those within the DCL2/3/5, NOWA1/2, TFIIS4, and histone modification groups, positive correlations are also evident between these groups (Figure 3; e.g.Spearman's rank correlation coefficient, r s =0.74 between DCL2/3/5-KD and PTCAF1-KD).The correlations between PGM knockdown and other gene knockdowns are generally weak -not much greater than the correlations between the control (ctrl3) and other gene knockdowns.Such weak correlations are an expected consequence of the ability of PiggyMac to excise most IES copies without scnRNAs/iesRNAs and without chromatin state changes brought about by Ezl1 and PtCAF1.
Since IES retention increases with IES length for many of the knockdowns influencing IES excision (see Figure 2E,F), we examined the retention score correlations between the abundant IESs corresponding to the first length peak (e.g.≤ 35 bp; ∼15600 IESs; the IES length peaks can be seen in the background of Figure 2E) compared to longer IESs (e.g.≥ 500 bp; ~380 IESs; patterns of correlations between these length intervals are intermediate to those in Figure 2E,F).For short IESs there are good correlations between all the possible pairs of IES retention scores from DCL2/3/5-KD, NOWA1/2-KD, TFIIS4-KD, PTCAF1-KD and EZL1-KD.This could be because the protein products of all the genes involved in these knockdowns are cooperating in the excision of short IESs.For the subset of short IESs, the right-skewed IES retention score distributions of these knockdowns also all resemble each other (diagonal of Figure 3B).We previously noted that DCL5-KD leads to greater retention of short IESs than DCL2/3-KD, i.e. iesRNAs are marginally more important for short IES excision than scnRNAs and short IESs are generally weakly epigenetically controlled 3,15 .This is reflected by the stronger correlations in IES retention score for short IESs between DCL5-KD and the knockdowns of NOWA1/2, TFIIS4, PTCAF1 and EZL1, relative to the correlations between DCL2/3-KD and the same genes.
When focusing on longer IESs (e.g., those > 500 bp) the IES retention score correlations between the knockdowns of histoneassociated genes, and also those between DCL2/3/5, NOWA1/2, TFIIS4 and histone-associated genes weaken substantially.Compared to their short counterparts the retention score distributions of longer IESs also look very different, with those of DCL2/3, DCL2/3/5, NOWA1/2, TFIIS4 closer to uniform, and those of the histone modification group closer to normal.The forms of the histone modification group IES retention score distributions now also become more like that of PGM-KD.These patterns are consistent with the chromatin state changes brought about by EZL1-KD and PTCAF1-KD exerting more influence on long IESs than short ones, with a much lesser requirement for iesRNAs, and generally some requirement for scnRNAs (i.e.long IESs are generally more strongly epigenetically controlled).At longer scales, the effect of PTCAF1-KD and EZL1-KD on IES retention (e.g. at ≥ 500 bp, r s =0.58) does not correlate as well as at the shorter scale (e.g. at ≤ 35 bp, r s =0.88), likely reflecting that the products of these genes do not have exactly the same consequences on chromatin state at this scale.The decrease in IES retention score correlations between either DCL2/3-KD or DCL2/3/5-KD and histone modification groups largely reflects that iesRNAs have little influence on long IES excision.

Relationships among genes involved in genome editing inferred from OES retention resemble those of long IESs
To date, information about the effects of gene silencing upon Paramecium's other eliminated sequences (OESs), the remaining deleted DNA currently not annotated as IESs, has typically come from Southern blotting or dot blotting, e.g.8,10,11,13,31, though more recent studies have also begun to investigate the overall effects of gene knockdown on these regions 11,12 .Most gene knockdowns that lead to IES retention also lead to retention of transposons within OESs 8,[10][11][12][13]31 . Foreach of the knockdowns used to investigate IES retention, we examined OES retention by mapping the knockdown reads to genomic regions which do not match MDSs and IESs from the reference P. tetraurelia assemblies that were extracted from a PGM-KD DNA assembly (see Methods for more details).Since PGM-KD leads to the most IES retention, like previous reports 11,12 , we assume that much of the remaining DNA eliminated from the MIC genome can be recovered from this knockdown.
Examination of the distributions of OES retention (Figure 4A-D), generated similarly to those for IES retention (Figure 2A-D), shows that while some features of OES retention resemble those of IES retention, e.g. the forms of the NOWA1/2-KD, TFIIS4-KD and DCL2/3/5-KD OES retention distributions are roughly similar, there are also important differences.OES retention due to DCL5-KD is very weak, hardly differing from that of a negative control, and DCL2/3/5-KD OES retention is weaker than that of DCL2/3-KD.Compared to the very strong IES retention following PGM-KD versus all other knockdowns (Figure 2D), PGM-KD OES retention is not as pronounced (Figure 4D).Like IESs, in part this may be because the strength of OES retention in other knockdowns is greater, but it is also possible that DNA cutting enzymes other than PiggyMac may be involved in OES elimination.
Examination of the correlations in OES retention in Figure 4E reveals a number of important effects: (i) between PGM-KD and other knockdowns, OES retention correlation is greater than general IES retention score correlations (Figure 3A); (ii) NOWA1/2-KD correlates most strongly with TFIIS4-KD and DCL2/3-KD; PTCAF1-KD and EZL1-KD correlate the best with each other, and moderately with DCL2/3-KD, DCL2/3/5-KD, TFIIS4-KD and NOWA1/2-KD; (iii) though a clear role for the Pdsg2 protein has yet to be defined, OES retention following PDSG2-KD correlates most strongly with PGM-KD; (iv) as OES length increases (e.g.> 4 kb) the PTCAF1-KD and EZL-KD OES retention scores become very narrowly distributed in contrast to those of DCL2/3-KD, NOWA1/2-KD and PGM-KD, suggesting that chromatin state changes brought about by PtCaf1 and Ezl1 may have a very general effect on OES retention.From these observations and those of the effects of previously reported analyses of the overall sequence complexity of retained OES DNA following gene silencing 11,12 , it follows that OES elimination mostly does not need iesRNAs, whereas scnRNAs, together with TFIIS4 and Nowa1/2 are necessary for OES elimination.The stronger correlations in OES retention scores of other knockdowns relative to those of PGM-KD suggests greater cooperation between PiggyMac, scnRNAs and other protein factors in OES elimination than in general IES excision.Overall, the patterns of OES retention correlation for the different knockdowns resemble those of retention correlations for long IESs, suggesting most OESs may be treated like long IESs.
Nowa1/2 is involved in RNA scanning, and may mediate scnRNA-and iesRNA-targeted DNA cutting and excision As the Nowa1/2 proteins have previously been implicated in removal of maternally controlled IESs 31 , but their precise role in Paramecium genome development has not yet been established, we wished to obtain additional insight into the function of these proteins, and their relation to other molecules involved in this process.Nowa1/2 proteins contain two distinct domains: an unstructured N-terminal domain consisting of alternating blocks of 'FRG' repeats and 'GGWG' repeats, and a structured C-terminal domain which is sufficient to route Nowa1 from the maternal MAC to the zygotic nuclei 31 .Nowa1/2's 'FRG' repeats resemble the 'RGG' boxes present in many hnRNP and other RNA-binding proteins, raising the possibility that they could facilitate interaction between Piwi-bound sRNAs and nascent transcripts on the old and the new MAC.One way Nowa1/2 may mediate these interactions is via its "GGWG" repeats, which resemble the "WG/GW" repeats found in Argonaute binding proteins 43,44 (Argonaute proteins are members of the Piwi protein superfamily 45 )."WG/GW" repeat-containing proteins in Tetrahymena (Wag1 and CnjB) have been shown to interact with its Twi1 Piwi protein 46 .In principle, Nowa1/2 could interact with Paramecium's sRNA-bound Piwi proteins in both the old and the developing new MAC, and during the internuclear transport of the sRNAs.Like PtCAF1-KD, NOWA1/2-KD also substantially decreases H3K9me3 and H3K27me3 in the old and developing new macronuclei 10 .
To gain more insight into the role of Nowa1/2 in Paramecium genome reorganization we analyzed the effects of NOWA1/2-KD upon IES retention and the sRNA population during macronuclear development (Figure 5).We previously showed that in control and DCL5-KD cells, MAC genome-matching scnRNAs decrease during development, consistent with RNA scanning, and that there is a progressive increase in iesRNAs 3 .It can now also be seen that OESs give rise mainly to 25 nt scnRNAs and, compared to IESs, relatively few iesRNAs (Figure 5A; see also Figure S3G,H).
Since it is known that smaller IESs can be nested within larger IESs 47 , we searched for IESs embedded within OESs.Visual inspection of DNA reads mapping to the assembled OESs from different gene knockdowns confirms the presence of shorter IESs in OESs, often with matching iesRNAs.We previously noted that, though both scnRNAs and iesRNAs are enriched at IES ends, scnRNAs overlap IES ends, whereas iesRNAs map very specifically within the limits of IESs 3 .Given the patterns of OES retention shown in Figure 4E, we therefore suggest that iesRNAs primarily facilitate precise IES excision, whereas scnRNAs facilitate both precise IES excision and imprecise OES elimination.
Compared to control cells, scnRNAs accumulate and iesRNAs are reduced after NOWA1/2 silencing (Figure 5A, B). scnRNA accumu-lation is consistent with inhibition of RNA scanning when NOWA1/2 is silenced, i.e.MDS scnRNAs are not removed.We do not think that NOWA1/2-KD affects RNA scanning by interfering with transport of scnRNA-bearing Piwis (Ptiwi01/09 9 ), as we observed no differences in localization of GFP-tagged Ptiwi09 with and without silencing NOWA1/2 (Figure S4).On the other hand, the pronounced mobility shift in an electrophoretic mobility shift assay upon addition of synthetic MBP-Nowa1 protein (Figure 5F) indicates Nowa1 may interact with RNA, RNA:DNA hybrids, and, to a lesser degree, DNA.Hence NOWA1/2-KD-induced interference in RNA scanning in the old MAC could be due to disruption of interactions normally mediated by Nowa1/2 between nucleic acids and sRNA-carrying Ptiwi proteins.
From the electrophoretic visualization of sRNAs (Figure 5B) it can be seen that iesRNAs are reduced following NOWA1/2-KD (since Figure 5A shows only proportions, it is not possible to deduce whether absolute levels of sRNAs change from this figure).A decrease in iesRNAs was also observed following DCL2/3 cosilencing, and, to a lesser degree for the independent DCL2 and DCL3 silencings 3 .For the lower quantity of iesRNAs in NOWA1/2-KD cells, there are two possible explanations, both of which may be valid: (i) Nowa1/2 is directly involved in iesRNA production or stability; (ii) Nowa1/2 depletion inhibits scnRNA-dependent IES excision due to the inhibition of RNA scanning, which in turn inhibits iesRNA production.Teasing apart these possibilities is a possible future experimental avenue to gain additional insight into the function of the Nowa1/2 proteins.NOWA1/2 silencing was previously reported to lead to retention of maternally controlled IESs, and indeed maternally controlled IES retention levels (following microinjection of complementary IESs into the old MAC) correlate well with the IES retention scores (Figure 5D).However, following NOWA1/2 silencing we also observe low levels of retention of some non-maternally controlled IESs by PCR analyses and in DNA-seq data (Figure 5D, E; Table S1).As the non-maternally controlled IESs that are weakly retained following NOWA1/2-KD are also weakly retained following DCL2/3-KD, we infer that they are weakly epigenetically controlled (Table S1).Therefore, due to the sensitivity of IES retention detection in DNA-seq data, some "non-maternally controlled" IESs may actually be weakly epigenetically controlled (it is also possible that with more sensitive methods these IESs would be classified as weakly maternally controlled).From Figure 3, we infer that Nowa1/2 is not only involved in scnRNA-dependent IES excision (epigenetically controlled IES excision), but also iesRNAdependent IES excision, since IES retention scores of NOWA1/2-KD are positively correlated with those of both DCL2/3-KD and DCL5-KD, and strongly correlated with those of DCL2/3/5-KD.
From these observations of the effects of NOWA1/2-KD, we propose that Nowa1/2's WG/GW (Argonaute-binding) repeats could mediate IES targeting by both Ptiwi01/09-bound scnRNAs (both old and new MAC) and iesRNAs bound to other Ptiwi proteins 48 in the new MAC.The apparent failure to remove MAC genome-matching scn-RNAs after NOWA1/2 silencing (Figure 5A) is consistent with the involvement of Nowa1/2 in RNA scanning in the old macronucleus, as expected if these proteins mediate pairing of scnRNAs with nascent maternal non-coding transcripts.In principle, Nowa1/2 proteins could interact with such transcripts via their FRG (RNAbinding RG boxes) repeats.From the correlations in Figure 4, we infer that Nowa1/2 proteins may operate predominantly in concert with scnRNAs and not iesRNAs during OES elimination.

Discussion
The strong similarity of IES retention following silencing of DCL2/3/5, NOWA1/2 and TFIIS4 suggests that the protein products of these genes may work in the same sRNA-dependent DNA elimination pathway during short IES excision, and that DCL2/3, NOWA1/2 and TFIIS4 work in the predominantly scnRNA-dependent DNA pathway responsible for OES and long IES elimination.PGM-KD-induced DNA retention correlates very weakly with that of other knockdowns for short IESs, but moderately with longer IESs and OESs.This suggests that for most short DNA excision, interactions between PiggyMac and the other proteins and scn-RNAs/iesRNAs are less important than for longer DNA excision.For comparison, the silencing effects upon these and related genes affecting Paramecium genome development, and the expression/ localization patterns of their encoded proteins are summarized in Table 1 and Table 2, respectively.
Even though the knockdowns of DCL2/3/5, NOWA1/2 and TFIIS4 seem to affect short IESs in a very similar manner, they affect the development-specific sRNAs in quite different ways.DCL2/3/5-KD leads to elimination of both scnRNAs and iesRNAs; NOWA1/2-KD inhibits RNA scanning (MAC genome-matching scnRNAs are not properly removed) and iesRNA production; TFIIS4-KD has little, if any, effect upon RNA scanning, but does eliminate iesRNA production 12 .Though limited to four analyzed IESs, it is thought that TFIIS4 may be involved in general IES transcription in the developing new MAC 12 , including that of an IES whose excision has no apparent requirement for scnRNAs, iesRNAs or TFIIS4 12 (51A4404; see Table S1).This is proposed to produce RNA to which the sRNAs bind, facilitating IES targeting 12 .Since TFIIS4-KD also leads to OES retention, it is likely that TFIIS4 is also involved in extensive transcription of these regions.It is also apparent that even when there is a good correlation in IES or OES retention scores, the normal expression and localization patterns of tagged proteins encoded by the genes tested in these knockdowns may be quite distinct, e.g.Nowa1/2 proteins are present in both old and new macronuclei, whereas TFIIS4 is predominantly present in new macronuclei (Table 1).
At present, the simplest explanation which reconciles the observed correlations in IES and OES retention scores between gene knockdowns, the effects of these knockdowns on sRNAs, and the expression and localization of the proteins encoded by these genes is that: (i) TFIIS4-KD removes the RNA transcripts which both scnRNAs and iesRNAs need to bind to their target IESs (or OESs, in the case of scnRNAs), hence IES retention following TFIIS4-KD correlates well with that following DCL2/3/5-KD (and TFIIS4-KD OES retention correlates modestly with DCL2/3-KD); (ii) NOWA1/2-KD prevents binding of Piwi-bound scnRNAs and iesRNAs to their targets via the nascent RNAs produced by TFIIS4, hence NOWA1/2-KD IES retention correlates well with both TFIIS4-KD and DCL2/3/5-KD IES retention (and NOWA1/2-KD OES retention correlates well with both TFIIS4-KD and DCL2/3-KD IES retention).
For the shortest IESs (those < 35 bp), we propose that the high IES retention score correlations between DCL2/3/5-KD, NOWA1/2-KD, TFIIS4-KD, PTCAF1-KD and EZL1-KD, may be a consequence of cooperation of scnRNAs, iesRNAs, RNA transcripts and chromatin state changes allowing these genomic regions to be recognized and targeted (Figure 3B).For longer IESs the correlations in IES retention score between the different possible pairs of DCL2/3-KD, DCL2/3/5-KD, NOWA1/2-KD, TFIIS4-KD are all strong, as are those between PTCAF1-KD and EZL1-KD, but those between the latter two knockdowns and the former ones are modest (Figure 3C).This reflects that long IESs have a greater requirement for scnRNAs, Nowa1/2 proteins and TFIIS4-dependent RNA transcripts, and very little requirement for iesRNAs, but that only some IES retention due to PTCAF1-KD and EZL1-KD also occurs when either scnRNAs, Nowa1/2 or RNA transcripts are depleted.
Though potential co-operation between chromatin modifying genes and scnRNA/iesRNA genes in IES excision is suggested by the modest correlations between their knockdown-induced IES retention scores, much of the excision requiring the Ezl1 and PtCAF1 chromatin-modifying proteins does not appear to need development-specific sRNAs, as there is generally much less IES retention after DCL2/3/5-KD (median 0.053) than after EZL1-KD or PTCAF1-KD (median 0.37 and 0.23, respectively) even though DCL2/3/5-KD effectively eliminates the IES-matching sRNAs (Figure 5C).DCL2/3 knockdown abolishes most H3K27me3 and H3K9me3 during new MAC development 11 , like EZL1-KD, PTCAF1-KD and NOWA1/2-KD 10,11 , whereas DCL5-KD does not influence these modifications 11 .Thus it has been suggested that scn-RNAs alone may guide histone modifications that facilitate DNA elimination in Paramecium 11 .
The idea that Paramecium's scnRNAs guide chromatin state changes in the developing new MAC should, however, be interpreted cautiously: though both PTCAF1-KD and NOWA1/2-KD affect H3K27me3 and H3K9me3 histone modifications in the new MAC, both PtCAF1 and Nowa1/2 also localize in the actively expressed, old MAC, and both PTCAF1-KD and NOWA1/2-KD 10 affect H3K9me3 and H3K27me3 and scanning of scnRNAs proposed to occur in this nucleus, showing that PtCAF1 and Nowa1/2 (and potentially also the chromatin modifications that they influence) operate upstream of any possible scnRNA-directed chromatin modifications in the new MAC.It will thus be important to determine whether, like PTCAF1-KD and NOWA1/2-KD, EZL1-KD influences RNA scanning and the scnRNA population.Assuming fairly accurate quantification, more IESs are very strongly retained following DCL2/3/5-KD than EZL1-KD or PTCAF1-KD, e.g. at a threshold of > 0.7, 4316 IESs for DCL2/3/5-KD, compared to 1077 and 649 IESs for EZL1-KD or PTCAF1-KD), suggesting there may be some scnRNA/iesRNA-dependent IES excision which does not require the chromatin state changes induced by Ezl1-and PtCAF1.In future, it will thus be necessary to determine how the silencing of chromatin modifying genes, such as EZL1 and PTCAF1, and the removal of histone modifications, prevent excision of scnRNA/ iesRNA-independent IESs, and, reciprocally, how depletion of scnRNAs and iesRNAs by DCL2/3/5 silencing prevents IES excision independently of Ezl1-or PtCAF1-dependent histone modifications.
Previously, it was proposed that the mechanisms involved in imprecise DNA elimination in Paramecium are distinct from those involved in IES excision 6 .For imprecise DNA elimination, it was suggested that the resultant boundaries are a mixture of AT-rich direct repeats 6 .Since these repeats are an arbitrary length and are in relatively AT rich genomic regions it is straightforward to find them purely by chance: e.g.39 out 40 of the proposed direct repeats bounding the imprecisely eliminated DNA contained a TA dinucleotide, with all 7 of the two nucleotide repeats being TA 6 .We examined additional junctions of imprecisely eliminated regions (OESs) in DNA from control cells and consistently found that where direct repeats were present and unambiguous, they invariably contained a TA dinucleotide.We therefore favor the simple explanation that, like IESs, the imprecisely eliminated OESs are excised by Pig-gyMac transposases, with TA dinucleotides as the invariant bases necessary for this cleavage 47 .We previously noted that long IESs in Paramecium tend to be underrepresented in MIC genome regions that develop into coding sequences in the MAC genome, suggesting counterselection against such IESs 49 , either due to their lower excision efficiency or higher excision imprecision.Supporting the latter, we find that IESs with evidence of alternative excision in their vicinity are longer than those that do not (Figure S5E).
For the correlations between pairs of gene knockdowns, the effects on short IESs can be seen to differ from those on long IESs, and OES retention correlations more closely resemble those of long IESs than those of short IESs.We therefore propose there is a continuum of excision precision, which is highest for the shortest IESs and lowest for the longest IESs and OESs.In this case, the abbreviation "IES" may be a misnomer, since excision of longer DNA may lead to chromosome fragmentation rather than internal elimination.In a model in which PiggyMac operates as a dimer 1 that recognizes and cuts at two sites, for short IESs it is most likely that both subunits recognize and cut where expected.As IES length increases, DNA elimination imprecision increases.When IESs/ OESs become sufficiently long, the two PiggyMac proteins become increasingly unlikely to make contact, or the cut sites may be too far apart to be rejoined efficiently (the latter, proposed by 6), thus imprecise chromosome breakage results.Thus, consistent with our overall observations, it follows that the recognition and delimitation of the longest eliminated DNA sequences, including transposons, benefits most from additional factors, such as sRNAs or chromatin alterations.
In conclusion, we have shown that comparisons of DNA excision readout following gene silencing by RNAi now provide the means to identify and study associations among genome editing components in Paramecium on a genomic scale.Using this approach, we have proposed putative associations between pairs of Dicer-like and Piwi proteins involved in DNA elimination, which were subsequently confirmed by experimental pulldowns of these proteins and their bound sRNAs 48 .In principle, the application of similar methods would also be useful in identifying and analyzing the molecular components of pathways in forms of developmental DNA deletion found in diverse eukaryotes, including that occurring in parasitic nematodes 50 , sea lamprey 51 , and copepods 52 .In the case of the parasitic Ascaris nematodes, this includes DNA elimination which is considered unlikely to have a requirement for developmentspecific sRNAs such as those in ciliates 53 , but which could conceivably require specific chromatin modifications.As highlighted by the analysis of NOWA1/2 knockdown effects in this paper, these methods, together with complementary experiments, set the stage to begin generally disentangling interactions between candidate genome editing genes.1.

Open Peer Review
Current Referee Status: In as well as other ciliates, massive genome re-arrangements in form of elimination of Paramecium thousands of transposon derived IES sequences occur after sexual fertilization.Some of these excision events have been described to be controlled by small RNAs acting in a genome comparison mechanism.Swart and colleagues compared IES retention and OES retention in F1 progeny after dsRNA induced knockdown of genes involved in these genome re-arrangements and aimed to classify functional relationship of genes by similarity of the individual IES/OES retention.They show that RNA associated and chromatin associated genes form individual groups.In addition, useful information is provided by the assembly of eliminated sequences apart of IESs.The paper represents an interesting analysis de novo providing useful tools and solid data for future research applications.

Major points:
Although the described method to classify IES retention after silencing of different genes is innovative, the paper is a balancing act between methods and research paper.In its current shape, the conclusions drawn and the main results of this paper are of biological significance pushing the methodological aspect into the background.Right now, the paper is more a research paper.The authors need to decide for a strategy.

Minor Points:
The identification of the functional groups of Dcl2/3/5, TFIIS4, NowaX and Ezl1, PtCAF also reflects the expression behavior of these enzymes during development (with the summary of Dcl2/3 early -Dcl5 late).As it makes sense at first glance that cooperating enzymes need to become induced at the same time, the authors should also discuss whether the dsRNA feeding efficiency could be different in early and late stages of development.
In As most of the analyses here relies on the calculation of IES retention scores, please give a more
As most of the analyses here relies on the calculation of IES retention scores, please give a more precise description of their definition in the text, as this term could be new for readers outside the field.Maybe also give more information in Methods, which reads are counted that map to an IES junction or to IES, in particular those, which only partially overlap to the IES.Also, are there any cutoffs set at any time of calculation?
In Methods: IES and OES retention score regressions: It's nice to see the effort put in creating different kinds of regression.However, in the paper, authors do not address about regression in the text (Also, not in Supp.Fig. S1).They only focus on the correlation co-efficient and the IES retention score distributions.
Introduction: asexual reproduction would be cell division in which is not autogamy or paramecia conjugation In Methods: Estimation of OES retention section, the first sentence is too long and difficult to follow.Kindly rephrase it.Authors do not explain how exactly OES scores are obtained.I'm looking for how the authors mapped the 3882 OESs with variable boundaries (obtained from the 5236 scaffolds in the new assembly) to the already available 697 scaffolds (in MAC+IES assembly).
In Results: Examination of basic IES retention properties (last paragraph) Shouldn't it be Figure 2G-J.
Figure 2: In the legend H should be replaced with J. Also in the sentence "(G-H) Mean base frequencies of positions 1-3 after the TA repeat relative are plotted relative to IES retention score" there seems to be a typographical error with the use of the word relative.And, for G-J, along the Y-axis it would be nice to say "Bases after TA repeat" along the Y-axis for the base frequency plots.Discussion: The idea that….This sentence is too long.
Authors say they have attempted to produce a better assembly.It would be helpful to provide this assembly.

For readers not used to
Mac development, the different classifications of IESs into Paramecium epigenetic/not epigenetic, scnRNA dependent/non scnRNA dependent, long/short etc. can be confusing.Together with the classification of the genetic requirements, is it possible to sum the discussion up in kind of a venn diagram to compare IES properties and genetic requirements for excision?As a suggestion… Is the rationale for developing the new method (or application) clearly explained?Yes

Is the description of the method technically sound? Yes
Are sufficient details provided to allow replication of the method development and its use by others?Yes 1.

If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?Yes No competing interests were disclosed.

Competing Interests:
We have read this submission.We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.In this manuscript, the authors aimed to classify genes reported to be involved in DNA elimination of Paramecium by comparing sensitivities of internal eliminated sequences (IESs) and other eliminated sequences (OESs) to RNAi knockdown (KD) of the genes.Their classification method provides quantitative scores for pairwise correlations of genes in DNA elimination pathway and they successfully separated two groups of genes: RNA-associated genes and histone modification genes.The authors also showed that shorter and longer IESs have different sensitivities to RNAi KD of different genes and the sensitivities of OESs were similar to those of longer IESs.Therefore, they suggested that the mechanisms for IES and OES eliminations are overlapping.I believe the presented data is compelling and the strategy described is useful to elucidate the function of uncharacterized genes in DNA elimination in future study.

Major points
Because this is a method article, I suggest the authors to minimize biological interpretations and rather extensively discuss methodological/strategical validities and technical/theoretical disadvantages, if any.Related to this point, it may be better to present the data shown in Figure 5 in another paper.
The authors' strategy does not distinguish direct roles of a genes in DNA elimination from its indirect effects in DNA elimination.I believe this point should be discussed.Some possibilities may be discusses are: a) histone modification genes may affect DNA elimination not only directly through histone modifications on IESs/OESs but also indirectly through regulating expressions of genes involved in DNA elimination; b) RNA-associated genes may act not only on chromatin but also in post-transcriptional regulations of genes involved in DNA elimination, like classical RNAi.In these cases, direct targets of RNA-associated genes and histone modification genes may be identical and the observed differences are due to indirect effects.4.

4.
a) Though they first mentioned that "these knockdowns may indicate most of the Paramecium IESs requiring scnRNAs and iesRNAs for their excision", they then wrote in the next paragraph that "We therefore infer that most IES excision in Paramecium does not require IES-targeting sRNAs."b) I do not see why "most IES excision in Paramecium does not require IES-targeting sRNAs" is "consistent" with the fact that "scnRNAs alone are insufficient for most IES excision".I think a requirement cannot be inferred from an insufficiency.
According to the authors' recent publication (Allen et al. 2017), the production of iesRNA requires circularization of IESs.Therefore, different sensitivities of short and long IESs (and OESs) to KDs of the DNA elimination genes are likely related to their ability to make circle when they are eliminated.I think this possibility should be discussed.
Very little discussion for PDSG2 in the manuscript.The IES retention scores of PDSG2 KD shows no strong correlation with those of any other KDs.Does this mean PDSG2 forms the third group?If so, what would be the function of the third group?If they do not provide any additional explanation, it may be better to consider removing PDSG2-related data from this study.

Other minor points
Abstract: I do not see "comparisons of experimentally-induced DNA retention" in this study.
Abstract: although I agree that the approach is quantitative but not clear how the authors can claim it is sensitive.
PtCAF1 encodes a protein similar to chromatin remodeling factor and thus classifying this gene as "histone modification" would be misnomer.
Figure 3A: The colors for "IESs per bin" are not visible in the individual plots (all dots look purple).Same for Figure 4E.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound?Yes

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?Yes Are the conclusions about the method and its performance adequately supported by the findings presented in the article?Yes No competing interests were disclosed.Competing Interests:

Figure 1 .
Figure 1.Paramecium genome development schematic.A representative micronuclear chromosome is transformed into macronuclear chromosomes via precise excision of internally eliminated sequences (IESs) and imprecise elimination of other eliminated sequences (OESs).Macronuclear-destined sequences (MDSs) inbetween are joined and capped with new telomeres to give rise to the final macronuclear chromosomes.When DNA is imprecisely eliminated, the telomere-capped ends of the macronuclear chromosomes occur at different sites.In some cases, this process also produces alternative macronuclear chromosome forms.Note that the distinction between OESs and IESs, though useful for the purposes of description and analysis here, is an artificial one, and that there may only be one main class of eliminated DNA in Paramecium (see later in Discussion).

Figure 2 .
Figure 2. IES retention scores and their relation to basic IES properties.(A-D) For the sake of clarity, histogram bars with more than 10000 IESs have been truncated.The number of IESs binned in the left-most bars are as follows: (A) NOWA1/2-KD -17468, TFIIS4-KD -14152, DCL2/3/5-KD 18915 (Figure S3B and S3D provide Western and Northern blots, for NOWA1/2-KD and DCL2/3/5-KD, respectively); (B) PTCAF1-KD -12614, EZL1-KD -8047; (C) DCL2/3-KD -37410, DCL5-KD -35019, DCL2/3/5-KD -22136; (D) Control -43294; PDSG2-KD -20730.Note that, as judged by survival tests in comparison to those previously published 8, PDSG2-KD may have been of reduced efficiency.(E,F) IES retention scores (scales on right y-axes) vs. IES length over shorter (E) (≤ 200 bp) and longer (F) (≤ 1000 bp) IES length scales.The IES length distribution is shown in grey in the background (scale on left y-axes).Lines are the exponentially-weighted mean averages (EWMA) with spans of 5 (E) or 50 (F) bp.(G-H) Mean base frequencies of positions 1-3 after the TA repeat relative are plotted relative to IES retention score for IESs from the third (45-55 bp) length peak (only the highest frequency bases are shown -see (H) for the base color scheme; similar trends are visible for ~10 bp windows surrounding longer IES length peaks).IES length is restricted to control for IES end base frequencies variation associated with large scale (26-1000 bp) IES length variation.EWMA lines for spans of 10 data points (intervals of 0.01) are plotted for data within 2 standard deviations (dotted vertical lines) of the mean retention score (dashed vertical line).Mean IES lengths (bp; counting only one of the two TA pointers) vs. IES retention score are potted in the bottom subgraphs (navy diamonds).See Figure S2 for end base frequency graphs of EZL1-KD and TFIIS4-KD and for end base frequency graphs of shorter IESs (26-36 bp).Equivalent graphs for DCL2/3-KD, DCL5-KD and PGM-KD can be seen in 15.

Figure 3 .
Figure 3. IES retention score correlations and regressions.(A)For the purposes of regression, for each IES, negative control IES retention scores have been subtracted and zeroed when the subtraction yields a negative value, i.e.IRS regression =max(IRS experiment -mean(IRS controls ), 0) (controls=controls 1-4).This calculation was performed to reduce the effects of natural, background IES retention and systematic errors in IES retention score estimation as much as possible (see FigureS1, which examines these effects on regression).Hexagonal binning of IES retention scores was used to generate the plots in the lower triangular matrix.Red lines are for OLS regression, orange lines for LOWESS, and grey lines for ODR.IES retention score distributions given along the diagonal are shown to a maximum of 4000 IESs except for PGM-KD (maximum 8500 IESs).For each plot in the lower triangular matrix (M i,j ), Spearman's rank correlation coefficient (r s ) is given in the corresponding position diagonally opposite in the upper triangular matrix (M j,i ).Note r s =0.91 for biological replicates of NOWA1/2-KD (FigureS1B).(B) as in (A) but for short IESs (≤ 35 bp).IES retention score distributions given along the diagonal are shown to a maximum of 1500 IESs except for PGM-KD (maximum 3200 IESs).(C) as in (A) but for long IESs (≥ 500 bp).IES retention score distributions given along the diagonal are shown to a maximum of 50 IESs except for PGM-KD (maximum 75 IESs).

Figure 4 .
Figure 4. OES retention and OES retention correlations.(A-D) For the sake of clarity, histogram bars with more than 2000 OESs have been truncated.The number of OESs binned in the left-most bars are as follows: (A) NOWA1/2-KD -1238, TFIIS4-KD -1230, DCL2/3/5-KD 2812; (C) DCL2/3-KD -473, DCL5-KD -3788, DCL2/3/5-KD -2812; (D) Control -3882; PDSG2-KD -3062; PGM-KD -709.(E) OES retention score correlations and regressions.For the purposes of regression, for each IES the mean of two negative control OES retention scores (control 1 and control 2) have been subtracted, and zeroed when the subtraction yields a negative value, i.e.ORS regression =max(ORS experiment -mean(ORS controls ), 0).This calculation was performed to reduce the effects of natural, background OES retention and systematic errors in OES retention score estimation as much as possible (but note that when OES retention scores are very low, some residual background retention may lead to some correlation, e.g. between DCL5-KD and control 3).Hexagonal binning of OES retention scores was used to generate the plots in the lower triangular matrix.Red lines are for OLS regression; orange lines for LOWESS and grey lines for ODR.For each plot in the lower triangular matrix (M i,j ), Spearman's rank correlation coefficient (r s ) is given in the corresponding position diagonally opposite in the upper triangular matrix (M j,i ).

Figure 5 .
Figure 5. Analysis of the effects and properties of NOWA1/2-KD.(A) Length distribution of small RNAs from control cells and NOWA1/2-KD cells (see Methods for details of histogram construction and normalization).Vector=vector backbone only.Figure 2SA shows the cytological characterization of the cells these sRNAs were isolated from.(B) Radioactively end-labelled total RNA from NOWA1/2-KD cells and control cells (cells fed with E. coli producing dsRNAs corresponding to the plasmid L4440 with no sequence target in the Paramecium genome; the gel portion showing the control sRNAs was previously shown in Figure 2A of 3 as "Control 1; Experiment 1").(C) sRNA length distribution for DCL2/3/5-KD; the peak at 33 nt corresponds to an unknown sRNA family (e.g.5'-GGAUCUAUCGUAUAGUGGUUAGUACCUGAGGCU-3') that matches a handful of locations in the MAC genome, which we have also be observed in other controls and knockdowns (e.g. from 8).See Figure 2SE for cytological characterization.(D) To examine the relationship between maternal control and IES retention scores (values determined in TableS1), maternal control scores were obtained from the maximal IES retention observed in Figure6of 28.Linear regression lines are shown along with Pearson's r and the two-tailed p-values for hypothesis testing with a null hypothesis that the regression slope is zero.(E) Retention of IESs tested by PCR for control and NOWA1/2-KD cells.NOWA1/2-KD IES retention scores are given to the right.Additional bands indicate the presence of heterodimers of ssDNAs with excised and unexcised IESs.(F) Electrophoretic mobility shift assay with MBP (maltose-binding protein), MBP-Nowa1 (alternating lanes) and oligonucleotides, visualized on a native polyacrylamide gel.
Figure 5. Analysis of the effects and properties of NOWA1/2-KD.(A) Length distribution of small RNAs from control cells and NOWA1/2-KD cells (see Methods for details of histogram construction and normalization).Vector=vector backbone only.Figure 2SA shows the cytological characterization of the cells these sRNAs were isolated from.(B) Radioactively end-labelled total RNA from NOWA1/2-KD cells and control cells (cells fed with E. coli producing dsRNAs corresponding to the plasmid L4440 with no sequence target in the Paramecium genome; the gel portion showing the control sRNAs was previously shown in Figure 2A of 3 as "Control 1; Experiment 1").(C) sRNA length distribution for DCL2/3/5-KD; the peak at 33 nt corresponds to an unknown sRNA family (e.g.5'-GGAUCUAUCGUAUAGUGGUUAGUACCUGAGGCU-3') that matches a handful of locations in the MAC genome, which we have also be observed in other controls and knockdowns (e.g. from 8).See Figure 2SE for cytological characterization.(D) To examine the relationship between maternal control and IES retention scores (values determined in TableS1), maternal control scores were obtained from the maximal IES retention observed in Figure6of 28.Linear regression lines are shown along with Pearson's r and the two-tailed p-values for hypothesis testing with a null hypothesis that the regression slope is zero.(E) Retention of IESs tested by PCR for control and NOWA1/2-KD cells.NOWA1/2-KD IES retention scores are given to the right.Additional bands indicate the presence of heterodimers of ssDNAs with excised and unexcised IESs.(F) Electrophoretic mobility shift assay with MBP (maltose-binding protein), MBP-Nowa1 (alternating lanes) and oligonucleotides, visualized on a native polyacrylamide gel.
Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France

I
am confused by the following descriptions in the 2nd and the 3rd paragraphs of Results (page 8), which need some clarification.a) Though they first mentioned that "these knockdowns may indicate most of the Paramecium Page 22 of 24 F1000Research 2017, 6:1374 Last updated: 30 AUG 2017 3.

Table 2 . Properties of proteins implicated in Paramecium genome editing
33Proteins are those considered in this study.Timing of expression during development is gauged from microarray data33.Localization is that determined from GFP fusion proteins.Putative function indicates the clearest known function.

Table 1 . Gene knockdown effects during development. Effects
of knockdowns as described in this and existing studies.See Results for a description of the functional group assignment.

PubMed Abstract | Publisher Full Text | Free Full Text 53
. Wang J, Czech B, Crunk A, et al.:

Table 2 ,
the induction time points differ for some enzymes to the microarray data in ParameciumDB described in Arnaiz 2010.For instance, TFIIS4 is indicated to be expressed et al. late in Table 2 but according to Maliszewska-Olejniczak 2015 and ParameciumDB, this gene et al. is induced early during autogamy.