A role for MIR 828 in pineapple fruit development

Chen et al. (Nature Genet. 51: 1549–1558; Oct. 2019) sequenced Ananas comosus var. bracteatus accession CB5, cultivated for its bright pink-tored colored fruit, and yellow-fleshed A. comosus accession F153, reporting an improved F153 reference assembly while annotating MICRORNA (MIRNA) loci and gene family expressions relevant to lignin and anthocyanin biosynthesis. An independent article (Xiong et al. Sci. Rep. 8: 1947; 2018) reported var. bracteatus MIRNAs but not MIR828, a negative regulator of anthocyanin and polyphenolics biosynthesis by targeting MYB transcription factors associated with UV lightand sugar-signaling in dicots. MIR828 has been reported in gymnosperms, Amborella (sister to flowering plants), and basal monocot orders Liliales, Asparagales, Zingiberales, Arecales, but not in the Poales, a sister order comprising grasses and ~3,000 species of bromeliads including pineapple. Here I show MIR828 exists in pineapple and directs post-transcriptional gene silencing of mRNAs encoding MYB family members with inferred function to regulate the conspicuous red fruit trait in var. bracteatus. MIR828 plesiomorphy (an ancient basal trait) may shed light on monocot apomorphic fruit development, postulated for 21 monocot families with fleshy fruits as due to homoplasy/convergence driven by tropical climate and/or enticements to vertebrate endozoic seed dispersers.

Chen et al. 1 sequenced Ananas comosus var. bracteatus accession CB5, cultivated for its bright pink-to-red colored fruit, and yellow-fleshed A. comosus accession F153, reporting an improved F153 reference assembly 2 while annotating MICRORNA (MIRNA) loci 2-4 and gene family expressions relevant to lignin and anthocyanin biosynthesis. An independent article reported var. bracteatus MIRNAs 5 but not MIR828 6 , a negative regulator of anthocyanin and polyphenolics biosynthesis by targeting MYB transcription factors associated with UV light-and sugar-signaling in dicots 7,8 . MIR828 has been reported in gymnosperms 8 , Amborella (sister to flowering plants) 9 , and basal monocot orders Liliales 8,10 , Asparagales 11 , Zingiberales 12,13 , Arecales 14 , but not in the Poales, a sister order comprising grasses and ~3,000 species of bromeliads including pineapple 15 . Here I show MIR828 exists in pineapple and directs posttranscriptional gene silencing of mRNAs encoding MYB family members with inferred function to regulate the conspicuous red fruit trait in var. bracteatus. MIR828 plesiomorphy (an ancient basal trait) may shed light on monocot apomorphic fruit development, postulated for 21 monocot families with fleshy fruits as due to homoplasy/convergence driven by tropical climate and/or enticements to vertebrate endozoic seed dispersers 16 .
The astronomer Carl Sagan popularized the quip (a corollary of Occam's Razor) "absence of evidence is NOT evidence of absence." Taking a conservative approach applies especially to quantitative transcriptomics. For example, the existence of regulatory and post-transcriptional processes increase the complexity of non-coding RNA space, where annotation is sparse. Analysis of Chen et al's A. comosus MD-2 cultivar leaf small RNA (sRNA) libraries and stranded RNA-seq libraries 4 from flowers and fruits of F153 and CB5 genotypes establish the existence of aco-miR828 (Extended data: Figure S1 17 ) and pri-MIR828 expressions, with the novel observation that pri-MIR828 is properly transcribed in leaves, flowers and fruits yet ~50% of mature miR828 species abundance (0.3 reads per million in leaves from 264.7 million reads; Extended data: Table S1 17 ) are 21 nt, while 93% of the equally abundant 22 nt species appear to have undergone non-templated 3' uridylation of the 21 nt species 18 (Extended data: Table S1 17 ). Analysis of independent bracteatus cultivar leaf sRNA libraries (NCBI SRA SRR5677552-7;113.2 million reads) 5 of unknown provenance relative to the subject CB5 genotype failed to identify any MIR828 reads.
Fortuitously, a unique aspect of miR828 is that despite its very low abundance, it has very high activity 7,19 that serves as diagnostic. miR828 guides ARGONAUTE slicing of target MYB mRNAs within the deeply conserved SANT domain region 8 by Watson-Crick complementarity, with consequent knock-on production of easily quantified DICER-mediated sense-and antisense 21 nt phased small-interfering RNAs (phasiRNAs) mapping downstream (3') on target MYB transcripts. The improved F153 reference assembly 1 contains two candidate miR828-targeted MYBs: Aco017254.1 (LG4), with two introns of 1113 and 1340 nts, and Aco020986.1 (LG14) without RNA-seq evidence of intron splicing, whereas the CB5 bracteatus genome only contains one Aco017254.1 homologous gene with RNA-seq splicing evidence for conserved introns of 1109 and 1332 nt (Extended data: Table S2 17 ). When phasiRNA expressions from leaf sRNA libraries 4,5 are respectively mapped to the F153 and CB5 candidate MYB target mRNAs, it is apparent that F153 MYB transcripts clearly undergo miR828-guided slicing, evidenced by D1(+) phased siRNA reads mapping to the 10 th nucleotide position of miR828 homology to the mRNA target, and unique sense and antisense secondary phasi-RNAs mapping precisely in multiples of 21 nt downstream from the detected slice sites (Figure 1; Extended data: Figure S2, Table  S3 17 ). The CB5 reference genome target MYB locus Aco017254.1 homolog (contig tig00012294, CABWKS010000088.1:25590-28868rc) encodes two missense codons at residues 41 (M➔R) and 200 (N➔K compared to F153), four silent codon substitutions, and RNA-seq analysis reveals mis-annotation of the CB5 genome which lacks two Gs (at contig residues 28780 and 27566) that exist and result in a CB5 open reading frame of the same size as Aco017254.1 in F153, including six instead of seven trinucleotide GGC glycine codon repeats templated in the CB5 genome at residue 204 (Extended data: Table S2 17 ).
In contrast to demonstrated post-transcriptional silencing activity of miR828 on target MYB abundance in F153 leaf samples (Figure 1), analysis of six bracteatus leaf sRNA libraries from an independent study 5 did not provide any evidence of target MYB Aco017254.1 mRNA slicing or target phasiRNA accumulation (Extended data: Table S3 17 ). Taken together, subject to the caveat that the provenance of the bracteatus cultivar used for the sRNA analysis 5 may be different than subject CB5 genotype, the data suggest there may be differences in expression and/or regulation of pri-MIR828, and/or target MYB Aco017254.1 between F153 yellow-fleshed versus CB5

Amendments from Version 1
Two citations (20, 21) were added as suggested by Reviewer 2 to document prior uses of sRNA libraries as confirmatory evidence of secondary sRNAs as products of miRNA targeting. A software analysis was added as suggested by Reviewer 2: "strucVis" for visualization of predicted RNA secondary structures with overlaid sRNA depths display of ShortStack output. Extended data: Figure S1 was revised to include the strucVis analysis and the legend modified as requested by Reviewer 2. The titles of Extended data: Tables S1 and S2 were changed for clarity as requested by Reviewer 1. The legend to Figure 1 was modified as requested by Reviewer 2 to describe the images as T plots of 5' ends of sRNA alignments, which I previously (citation 7) named as pseudo-degradome analyses and referenced in the software availability description. The legend of Figure 2 was modified to direct readers to Extended data Tables where underlying evidence is documented and file rows are referenced to substantiate claims, as requested by Reviewer 2. Minor changes to the text were made as requested by Reviewers to improve clarity and articulate comparisons made across Figure 2 panels for interpretations underlying claims. Reviewer 2's suggestion to elaborate on methods was addressed by a statement added at the end of the Software availability section: "The options parameters used for various algorithms are detailed in Extended data: Tables S1-S3." No changes were made to data in Figures or Extended data Tables in this revision.
Any further responses from the reviewers can be found at the end of the article REVISED red-fleshed genotypes. Figure 2 shows this is indeed the case in various tissues examined, with the evidence supporting higher pri-MIR828 expression in F153 ovules concordant with lower target MYB Aco017254.1 mRNA levels (Figure 2A), significant decreases over time from stage 1 early fruit development for target MYBs Aco017254.1 and Aco020986.1 in yellow-fleshed MD-2 cultivar ( Figure 2B), whereas in contrast there is a trend of lower expression of CB5 pri-MIR828 concordant with sustained higher abundance of Aco017254.1 target MYB ( Figure 2C stage 7 versus stage 1) than seen in MD-2 during the ripening stage of red-fleshed CB5 genotype (compare Figure 2C stage 7 showing high CB5 Aco017254.1 abundance Concatenated sRNA libraries were used as pseudo-degradome on grounds sliced mRNAs subject to production of amplified diced dsRNAs are manifest in sRNA libraries 7,20,21 . Slicing is at nt10 from 5' end of miR828 (arrow, inset; same target sequence for Aco020986.1 at mRNA nt 115). Y axis is numbers of reads in sRNA libraries mapped to cDNAs; red dot is the documented miR828 sliced sRNA species that sets the register for 3' phasiRNA production.  Table S1, rows 23-36) to target MYB Aco017254.1 (low; Extended data: Table S2, rows 406-470) anthocyanin effector expressions in F153 ovules. Error bars are s.e.m., n=3 biological replicates. B) Down regulation of miR828-targeted MYB anthocyanin effectors during MD2 yellow-fleshed fruit development (n=1 per stage; Extended data: Table S2, rows 134-405). Asterisk (*) denotes stage 1 significantly different across Aco017254.1 and Aco020986.1 than stages 2-8, p = 0.02 (Student's two-sided paired t-test, equal variance assumed). Stage 1 error bars show the 95% confidence interval for significance. C) Apparent maintenance of positive anthocyanin effector MYB Aco017254.1 expression at "ripe" two-month stage of CB5 red-fleshed fruit development, concordant with trend of lower pri-MIR828 expression (see Extended data: Table S1, rows 37-52; Table S2, rows 471-883). Error bars are s.e.m., n=5 biological replicates.
Plant development gives rise to an astounding complexity of shapes, colors, and functions that Darwin called 'an abominable mystery' in his efforts to integrate species complexity with the unifying theory of evolution. The observations reported here potentially offer insight into the conservation of a developmental control mechanism whereby miR828 in pineapple, like in dicots silences MYBs inferred to act as positive effectors of anthocyanin biosynthesis that could give rise to the redfleshed trait in the bracteatus variety. Consistent with this view is the finding from genome-wide functional phylogenomic approaches that ARGONAUTE1, RNA-DEPENDENT RNA POLYMERASE6, and mRNA export factor homolog SILENCING DEFECTIVE5, each required for trans-acting siRNA accumulation, played significant roles in the evolution of monocot metabolic and developmental traits 22,23 . Also of interest in this context is the claim of bracteatus genome authors 24 that convergent expansion in several Crassulacean Acid Metabolism (CAM) bromeliad lineages of XAP5 CIRCADIAN TIMEKEEPER/XCT, a nuclear-localized regulator of blue light responses 25 and sRNA production 26 , supports evolution of the myriad metabolic and physiological transitions from C3 to CAM photosynthesis by duplication/differentiation of a highly pleiotropic effector 24 .

Data availability
Underlying data RNA-seq data 1 , CB5 reference genome GCA_902506285.1, and F153 improved assembly genome GCA_902162155. This project contains the following extended data: • Figure S1. strucVis graphical linear output of Aco-MIR828 hairpin structure and sRNA abundance evidence from 23 MD-2 leaf sRNA libraries.
• Table S3: bowtie mapping of sRNAs to miR828 target MYB mRNAs Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Software availability
Fastx-toolkit version 0.0.14 for trimming Illumina universal adapterAGATCGGAAGAGCACACGTCTGAACTCCAGTCA (fastx_clipper option -l 18; discards short reads) and fasta file manipulations (fastx_uncollapser to expand the pre-processed data 4 ; fastq_to_fasta for inputs to CleaveLand) is available at http://hannonlab.cshl.edu/fastx_toolkit/. FastQC version 0.11.5 for quality control of fastq raw sequence data is available at https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. ShortStack 28 version 3.8.5 (options --mincov 15 --foldsize 340; Extended data: Figure S1, Table S1 17 ) for comprehensive annotation and quantitation of MIRNAs and sRNA cluster phasing is available at https://github.com/MikeAxtell/ShortStack. strucVis for visualization of predicted RNA secondary structures with overlaid sRNA depths display of ShortStack output (Extended data: Figure S1, page 2) is available at https://github. com/MikeAxtell/strucVis. CleaveLand 29 version 4.4 for plotting 5' ends of sRNA libraries as pseudo-degradome inputs (amplified sense and anti-sense siRNAs in phase and derived from miRNA-sliced mRNAs 7,20,21 ) to find sliced miRNA targets is available at https://github.com/MikeAxtell/CleaveLand4. PhaseTank 30 version 1.0 for quantifying and aligning phasiRNAs to miRNA target mRNAs (Extended data: Figure S2 17 ) is available at http://phasetank.sourceforge.net/; the linux command sed -i 's/-/_x/' was used to reconfigure fastx_collapser output from concatenated sRNA fasta library files to PhaseTank input style ">t1_xN" where N is number of collapsed reads). Magic-BLAST 31 version 1.5.0 for RNA-seq fastq read alignment (Extended data: Table S2 row 4 for options parameters 17 ) to reference genomes is available at https://ncbi.github.io/magicblast/. Blastn 32 version 2.6.0 is available at ftp://ftp.ncbi.nlm.nih. gov/blast/executables/blast+/LATEST/. Bowtie 33 version 1.1.2 for short read alignment (option -v 1 to allow one mismatch) is available at http://bowtie-bio.sourceforge.net/manual.shtml. RNAfold 34 web server was used for generating miR828 hairpin graphics (Extended data: Figure S1 17 ) at http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi. The options parameters used for various algorithms are detailed in Extended data: Tables S1-S3. University, University Park, PA, USA This article focuses on identifying the presence of a miRNA loci (MIR828) within several cultivars of pineapple. The author leverages prior sequencing data from two studies to annotate a very lowly expressed locus which is apparently MIR828, marking the first example of this miRNA family in Poales. Despite the low expression, the author supports the presence of the miRNA through identification of targeted mRNAs, shown through detection of secondary siRNAs resulting from cleavage by a 22-nt miRNA. They also provide some evidence to speculate about the role miR828 may play in pineapple coloration relative to development.
My background is in small RNA annotation and interpretation of function in plants. I believe that this article does a good job identifying a miRNA that is challenging to find in pineapple. I also think that the author does a compelling job proving its actual function through secondary siRNAs. False negatives are always present in the computational annotation of genomes and this type of careful analysis is often needed to find exceptions that are not caught by programs using default settings.
I believe the conclusions of this article are true and the results to be adequate, pending some revision. My criticisms of this article mainly have to do with clarity. I think that several changes must be made to make the article more transparent in the analyses performed, the language used to describe it, and the presentation of data. I do not suggest any additional experiments to support their result, but I would like a response and modification to the manuscript based on the following notes: Table S1) are 21 nt, while 93% of the equally abundant 22 nt species appear to have undergone non-templated 3' uridylation of the 21 nt species (Extended data: Table S1)." ---This sentence does not clearly describe what "analysis" has taken place. I need to see the author describe what evidence supports the 3 conclusions listed here, both in writing and with a description of the analysis pipeline in the methods section (see below).

1.
It is preferred to include a genome-browser shot showing the read coverage for the pri-miRNA regions listed in supp figure 1, as it can be difficult to identify from a table what the locus actually looks like for the reader. It also appears that many of the rna-seq reads originate from the 'minus' strand. This is surprising, as the hairpin is never double stranded. Could you explain this discrepancy? 2.
I like that the ShortStack output is included in supp.

3.
Your RPMs for this miRNA locus are extremely low (0.3 RPM), more than 10 fold lower than ShortStacks default locus identification threshold (5 RPM). This is a real lack and makes it 4. challenging to identify this as a miRNA. Lacking a mir-star is also a problem, but might not be surprising for a low-expression locus. As this is a published miRNA you are identifying, I am inclined to agree that it is real. However, it is inappropriate to include an artificial sequence to make ShortStack call it as such, as is done in Supp. Figure 1. Please correct this and clearly identify that this locus lacks mir-star support, though is still likely a miRNA.
Consider looking into the package strucVis, an add-on for RNAfold which shows nucleotideresolution read depth on the folded diagram. I think this would greatly clarify this figure. 5.

"Analysis of independent bracteatus cultivar leaf sRNA libraries (NCBI SRA SRR5677552-7;
113.2 million reads) of unknown provenance relative to the subject CB5 genotype failed to identify any MIR828 reads." ---This is surprising! Did you also try analysis using the F153 reference, considering the provenance of this variety is unknown? I think this is important considering the considerable depth of these libraries, where we might expect around 30 reads coming from this locus. Despite the lack of a locus identified, it might be useful to look directly at the alignment through a browser, to see if any small RNAs are coming from this locus.
6. Table  S1)..." ---Please show the approximate read depth here also, as was shown with the Xiong et al libraries.

7.
Figure 1 is labeled as a degradome, when it is actually a plot of 5' ends of sRNA alignments, thought to be secondary siRNAs. While I agree that secondary siRNAs can be strong evidence of targeting. However, this is entirely distinct from a degradome, which are derived from isolated mRNA degradation products. Please remove references of this as a degradome and correct the language to clearly describe this. This also should contain a citation for one of the following papers: where this evidence was used for confirming targeting (Shahid et al. 2018 1 ), or evidence of secondary sRNAs as a product of miRNA targeting (Allen et al. 2005 2 ).
8. Figure 2 has several critical problems. In general, I think the author must provide more detail in the conclusions and evidence shown in each section of the figure. I have detailed some more specific points as follows: It is deceiving to use bar plots to represent such small numbers of data points. Bars that represent an N of ideally 5 or less are much more informative shown as points representing the actual data. To show error bars (I assume... it is not indicated if these are standard error or deviation) greatly misrepresents the variance of these data. I would like to see these figures redone showing points rather than bars, or at least showing the raw data in a supplementary table (see below).
1. Figure 2B is vague in its description of the test, and replicates involved. The legend describes an N=1 for all stages. If this is true, how can a statistical comparison be made? Stage 1 shows error bars, does this represent N>1? Additionally, precisely which samples are being grouped and compared here for the test? This must be made clearer.

2.
The interpretation of results in Figure 2C is unclear. There is little to no textual 3.

9.
description of what the author concludes from Figure 2 in general and it is mainly confined to one sentence where all three sub-figures are mentioned. This needs to be greatly expanded upon for me to understand what they perceive as the outcome of these analyses.
I recommend including *all* of the raw data shown in these charts in supplementary tables in addition to a more clear demonstration of the data-points mentioned above. This allows the reader to clearly identify what is being shown in each.

4.
A description of methods is lacking. Though the sequencing data is derived from prior publications, the analysis performed in this paper is new. You need to describe the logical steps taken for each analysis, including processing, tools used in a step, the actual commands used, and data that is output. This would go a long way to clarifying the actual analyses they performed. This needs to be shown for RNA-seq analyses, sRNA-seq analyses, and identification of pri-miRNAs. I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 27 Mar 2020
Christopher Rock, Texas Tech University, Lubbock, USA Author responses to verbatim copied comments from Reviewer2 below, in conjunction with revised article submission.
"Analysis of Chen et al's A. comosus MD-2 cultivar leaf small RNA (sRNA) libraries and stranded RNA-seq libraries from flowers and fruits of F153 and CB5 genotypes establish the existence of aco-miR828 (Extended data: Figure S1) and pri-MIR828 expressions, with the novel observation that pri-MIR828 is properly transcribed in leaves, flowers and fruits yet ~50% of mature miR828 species abundance (0.3 reads per million in leaves; Extended data: Table S1) are 21 nt, while 93% of the equally abundant 22 nt species appear to have undergone non-templated 3' uridylation of the 21 nt species (Extended data: Table S1)." ---This sentence does not clearly describe what "analysis" has taken place. I need to see the author describe what evidence supports the 3 conclusions listed here, both in writing and with a description of the analysis pipeline in the methods section (see below). Table S1; i) mapping of sRNAs to the candidate aco-MIR828 locus/primary-MIR828 transcript corresponding to ShortStack Cluster_43525 in the new improved F153 reference genome, and ii) phasing of secondary siRNAs mapped as clusters to target MYB mRNA/cDNAs (older F153 annotated genome) triggered by inferred aco-miR828programmed AGO slicing activity. Fig. S1 shows default ShortStack graphical aco-MIR828 hairpin secondary structure; the legend refers to Suppl. Table S1 for ShortStack tabular output details. Those tabular outputs on row 11 (primary-MIR828 Cluster_43525, length 333 nt) shows "Major RNA species"= UCUUGCUCAAAUGAGUAUUCCU, which is grounds for the claimed 3' non-templated uridylated species (see elaboration below in Conclusion3 response). Tabular and graphical ShortStack outputs in Suppl. Table S1 and Fig. S1, respectively, are intended to document parameters required, according to community standards, for MIRNA annotation (Axtell MJ and Meyers BC [2018] "Revisiting criteria for plant microRNA annotation in the era of Big Data" Plant Cell 30: 272-284). Suppl. Table S1 rows 13 and 14 document ShortStack parameters of phased secondary siRNA clusters derived from two MYB target mRNAs/cDNAs: row13 "MajorRNA species" is D2(+) 21mer in phase (phasing score=20.2) with miR828 slice site activity independently substantiated by CleaveLand degradome analysis detailed in Fig. 1 (left panel). Furthermore, Fig. S2 further substantiates independent algorithmic evidence by PhaseTank software for the siRNA phasing on both MYB mRNA/cDNA targets triggered by miR828 activity. Suppl. Table S1 Row 14 "MajorRNA species"= CUAGGAGUGAAAGAUUACGCC was subsequently found (and annotated as such in a revised Suppl. Table S1) to map exactly 12 registers downstream from the 10 th nt position of D2(-) phasiRNA shown in Fig. S2 for mRNA/cDNA Aco020986. Thus, this additional subsequent analysis supports that Aco020986 D2(-) phasiRNA may analogously program AGO for secondary slicing of Aco020986 mRNA/cDNA and help explain the low ShortStack phasing score of 2.9 for Aco020986. Another plausible reason for the low phasing score of Aco020986 versus Aco017254 is the ~50% fewer reads (drawn from the multi-mapper functionality of ShortStack, parameter "RPM") and higher reads complexity for Aco020986 per ShortStack output parameters. Notwithstanding, CleaveLand (Fig. 1 right panel) degradome and PhaseTank algorithmic analyses (Suppl. Fig. S2) independently show Aco020986 MYB mRNA/cDNA is a target of aco-miR828, substantiating not only aco-miR828 existence from expression data, but also its function.

Conclusion 2 "existence [and proper transcription] of pri-MIR828 expressions [by analysis] of stranded RNA-seq libraries from flowers, and fruits of F153 and CB5
genotypes": See Suppl. Table S1, rows 17-52 output parameters from MagicBLAST analysis. Row 22 is singular evidence of an RNA-seq read in genotype F153 leaf tissue (not formally claimed). Rows 23-36 are reads from F153 ovules and stamens. Rows 37-52 are reads from genotype CB5 fruit.
Conclusion 3 "~50% of mature miR828 species abundance are 21 nt, while 93% of the equally abundant 22 nt species appear to have undergone non-templated 3' uridylation of the 21 nt species": See above re: row 11 of Suppl. Table S1 that establishes the MajorRNA species expressed in the sRNA libraries is the hypothesized non-templated 22 nt aco-miR828. Note that ShortStack output reports five reads of 20 nt size, 29 reads of 21 nt size, and 35 reads of 22mer size. Thus at face value assuming all species of these sizes correspond to mature miR828s (see below), then 51% would be the miR828 22mer species (other ShortStack-reported 23 and 24 nt sizes are few and discounted, yet documented in Suppl. Table S1). Rows 54-136 of Suppl. Table S1 detail every sRNA read in the subject libraries sorted by position having at most one mismatch aligned by bowtie to the F153 primary-MIR828 locus corresponding to ShortStack full Cluster_43525, including 113 nt upstream and 97 nt downstream of the miR828 hairpin shown in Suppl. Fig. S1 as default ShortStack output. Each relevant read is annotated manually as 20mer mature miR828, 21mer mature miR828, 22mer perfect match, or 22mer non-perfect matches to mature miR828. There are four 20mer mature miR828, 26 21mer mature miR828, one perfect match mature miR828 22mer, one 22mer with a "C" mismatch at nt 22, and 28 22mers with "T" mismatch at nt 22 of miR828 that is the claimed non-templated 3' uridylated species. Thus (26 +4)/(26+4+30) = 50% 20mer plus 21mer mature miR828 (my claim of "~50% 21 mer"), and 93% of 22mer species are 3' uridylated (28/30). It takes >700 words here to pedantically articulate the grounds that prove the compound claims in that one sentence of the manuscript text.
It is preferred to include a genome-browser shot showing the read coverage for the pri-miRNA regions listed in supp figure 1, as it can be difficult to identify from a table what the locus actually looks like for the reader. It also appears that many of the rna-seq reads originate from the 'minus' strand. This is surprising, as the hairpin is never double stranded. Could you explain this discrepancy? RESPONSE: I have modified Suppl. Fig. S1, keeping the default ShortStack dataset graphical output (instead of the suggested browser screenshot) but adding rows for species to visualize all relevant data described above for bowtie output in Suppl. Table  S1, rows 57-136, where parameters of strand and importantly sRNA read location on the primary MIR828 transcript are detailed, along with manual annotations (including for nine anti-sense sRNA reads mapping only up-and downstream of the hairpin per se). Note also that formally speaking, the Reviewer's reference to the 'pri-miRNA regions' at issue is most analogous to the "pre-MIRNA" after 'stem-to-loop' biogenesis as described in Adams, L.
[2017] "Pri-miRNA processing: structure is key." Nat. Rev. Genet. 18: 145) which is the default output of ShortStack software showing only the minimal foldback encompassing expression reads for mature miR828 and miR828-star. Suppl. Fig. S1 was verbatim output from ShortStack software that automatically trims the foldback stem distal to the miRNA/miRNA-star duplex. In this matter the full length cluster #43525 corresponding to the 'primary-MIRNA' transcript is described in Suppl. Fig. S1 image header as "Cluster_43525 Original Location: CABGUK010000004.1:12043665-12043997 Displayed Location: CABGUK010000004.1:12043762-12043883 Strand: -". Therefore the graphical presentation in Suppl. Fig. S1 represents those pertinent sRNA species detailed in the Suppl. Table S1 list of reads corresponding to pri-MIRNA transcript positions 114 to 235. The bowtie-mapped reads described in Suppl. Table S1, rows 57-136 sorted by primary-MIR828 position are now fully accounted for in the revised Suppl. Fig. S1 which retains the graphical default ShortStack output style. I added to the original Suppl. Fig. S1 four additional rows for the other species that are germane to the issue (excluding only four 23mer reads that have non-templated "CU" residues at the 3' terminus of miR828, where the second "U" residue could be templated thus confounding description). Now both graphical (revised Suppl. Fig. S1) and tabular representations (Suppl. Table S1) quantify the principal species at issue: the 20 and 21 nt mature miR828 (~50%), the 22 nt mature miR828 with either a non-templated "T" as the last residue (lower case; ~50% of all reads), a non-templated "C", or a perfect match 22 nt template mature miR828, and a single phased read (in the terminal loop) upstream of the predicted miR828-star species (shown all lowercase).
Regarding the RNA-seq antisense issue, note that row 20 of Suppl. Table S1 has a disclaimer description which the Reviewer may have overlooked: "Read 2 is the confirmed sense mRNA strand for CB5 datasets, based on an abundant transcript (bromelain RD21A, Aco011478.1) described by Chen et al. authors, used as control. The F153 ovule and stamen libraries are unstranded, based on the same control gene assessment." It was deduced empirically from the RD21A reference gene characterization that the authors used un-stranded RNA-Seq kits early in the chronology of the project when many of the F153 ovule and stamen libraries were prepared and sequenced using different Illumina platforms. As evidence to support this interpretation, note there only exists a forward strand "read 1" for leaf, ovule, and stamen SRA datasets ERR3413090 and ERR3413005-8 (most are read lengths 1 x 100 bp, supporting their earlier chronology), despite being incorrectly annotated at NCBI SRA archive as "paired read" libraries. Specifically, rows 27, 28, and 36 are antisense strand reads challenged by the Reviewer and accounted for in the row 20 note as a computational artifact, thus counted as proper sense reads based on evidence from a control gene RD21A. For rows 38, 39, 46, 48, and 51, which are from CB5 fruit tissues (SRA datasets ERR3412993-3413002; read lengths 2 x 150 bp indicating newer Illumina chemistry and chronology) the libraries are verified empirically by expression evidence to be stranded paired read libraries. Note that in these antisense read cases, every read is from the antisense direction (read 1) of paired reads, as described in the row 20 disclaimer note.
I like that the ShortStack output is included in supp. table 1. However, it is not clear to me what each locus refers to. What is the reference genome for each of these? It appears that cluster_1381 and cluster_1104 are using the Chen et al. sequencing data on the older reference genome. Why have these been shown and can you explain the lack of this locus in the older genome? I'm not sure what these loci are referring to compared to the first locus shown (cluster_4352). RESPONSE: as described above for "Conclusion1" evidence, row 11 aco-MIR828 ShortStack output is based on the new improved F153 reference genome made available as BioProject accession code PRJEB33121. ShortStack output for Rows 13 and 14 clusters were drawn from the cDNA fasta file of the original F153 reference genome available for download in phytozome12 as described in Data availability. Thus the two ShortStack analyses are different and cluster numbers are arbitrary. The reason why the older cDNA reference set from phytozome12 was used is because the new improved F153 reference genome does not have any cDNA annotation reference set. See https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/162/155/GCA_902162155.1_F153/ .
Your RPMs for this miRNA locus are extremely low (0.3 RPM), more than 10 fold lower than ShortStacks default locus identification threshold (5 RPM). This is a real lack and makes it challenging to identify this as a miRNA. Lacking a mir-star is also a problem, but might not be surprising for a low-expression locus. As this is a published miRNA you are identifying, I am inclined to agree that it is real. However, it is inappropriate to include an artificial sequence to make ShortStack call it as such, as is done in Supp. Figure 1. Please correct this and clearly identify that this locus lacks mir-star support, though is still likely a miRNA. Fig. S1 legend to state "This locus formally lacks miR828-star support." Suppl. Fig. S1 legend clearly explained the miR828-star species was not sequenced from the dataset, but rather was seeded manually to drive ShortStack to formally call the MIRNA as "Yes" by default parameters, instead of "N15" in which case no graphical ShortStack output like Suppl. Fig. S1 is possible. It is argued that the thought experiment as described and presented is appropriate, if it can be justified. The Reviewer will recognize, as one skilled in the art of ShortStack, that without seeding a miR828-star sequence into the dataset, there would be no graphical ShortStack output at all, now revised (see above point) to meet the Reviewer's specification to show the bona fide small RNA reads topology and abundances. The justification for the miR828-star "thought experiment" was to ensure annotation took into account default folding and trimming parameters built into the software where outputs are styled to objectively meet community standards for evidence-based MIRNA annotation. The reason for the exercise was not to obfuscate the truth that the star species was not found (which is clearly disclosed and the Reviewer concedes is not fatal because this is deeply conserved plant MIRNA), but rather to address a potential criticism that the "bulged" distal loop tripartite secondary structure (Suppl. Fig. S1) is an atypical feature of MIRNA foldbacks that could be an important determinant for Dicer recognition as substrate. This concern is grounded in the possibility a plant-specific "terminal loop-to-base" non-canonical processing sequence operates on aco-MIR828, like known for ath-miR156a biogenesis. Candidate aco-miR828 has a loop-side bulge analogous to the three nt loop-side bulge in ath-miR156a which appears as a "kink" in the foldback and could be argued as potentially critical for duplex biogenesis (Kim et al. [2016] "Structural determinants of miR156a precursor processing in temperature-responsive flowering in Arabidopsis." J. Exp. Bot. 67: 4659). Being able to pass all algorithmic criteria as demonstrated by successful ShortStack annotation of candidate aco-MIR828 (conceding the miR828-star aspect) discounts theoretical concerns about a "bulged" distal loop tripartite secondary structure possibly antagonizing Dicer duplex biogenesis at the locus.

RESPONSE: I have revised Suppl.
Consider looking into the package strucVis, an add-on for RNAfold which shows nucleotideresolution read depth on the folded diagram. I think this would greatly clarify this figure.
RESPONSE: I have revised Suppl. Fig. S1 to add the suggested strucVis image of small RNA read depth coverage across the aco-primary-MIR828 locus encompassing the entire ShortStack reads Cluster_43525.
"Analysis of independent bracteatus cultivar leaf sRNA libraries (NCBI SRA SRR5677552-7; 113.2 million reads) of unknown provenance relative to the subject CB5 genotype failed to identify any MIR828 reads." ---This is surprising! Did you also try analysis using the F153 reference, considering the provenance of this variety is unknown? I think this is important considering the considerable depth of these libraries, where we might expect around 30 reads coming from this locus. Despite the lack of a locus identified, it might be useful to look directly at the alignment through a browser, to see if any small RNAs are coming from this locus. RESPONSE: Note that rows 18 and 19 of Suppl. Table S1 state: "subject database F153 pri-MIR828 genome coordinates: CABGUK010000004.1:12043665-12043997rc; subject database CB5 pri-MIR828 genome coordinates: CABWKS010000025.1:21332774-21333104rc." I have compared these two sequences by BLASTing the entire F153 aco-MIR828 locus and they are identical except for a single nucleotide polymorphism on the 3' leg of the foldback, far downstream of the miR828-star at position 195 of the alignment (mature miR828 and miR828-star underlined). Thus the F153 and CB5 loci are identical for the purposes of running ShortStack and mapping small RNAs from bracteatus genotype sources.
Please see here.
I have also subsequently run the strucVis software on the CB5 bam-indexed ShortStack output, as suggested to look directly at the alignment: there is no evidence of any reads mapping from the independent bracteatus cultivar leaf sRNA libraries to the CB5 MIR828, as claimed (data not shown). This is despite ShortStack calling 52 bona fide MIRNAs (47 21mers, five 22mers), compared with 29 MIRNAs (24 21mers, five 22mers) called with the F153 small RNA dataset.
"...mature miR828 species abundance (0.3 reads per million in leaves; Extended data: Table  S1)..." ---Please show the approximate read depth here also, as was shown with the Xiong et al libraries.
RESPONSE: The answer is 264.7 million reads, available in Suppl. Table S1, row 5; that information has been added accordingly to the text. Figure 1 is labeled as a degradome, when it is actually a plot of 5' ends of sRNA alignments, thought to be secondary siRNAs. While I agree that secondary siRNAs can be strong evidence of targeting. However, this is entirely distinct from a degradome, which are derived from isolated mRNA degradation products. Please remove references of this as a degradome and correct the language to clearly describe this. This also should contain a citation for one of the following papers: where this evidence was used for confirming targeting ( Figure 2 has several critical problems. In general, I think the author must provide more detail in the conclusions and evidence shown in each section of the figure. I have detailed some more specific points as follows: It is deceiving to use bar plots to represent such small numbers of data points. Bars that represent an N of ideally 5 or less are much more informative shown as points representing the actual data. To show error bars (I assume... it is not indicated if these are standard error or deviation) greatly misrepresents the variance of these data. I would like to see these figures redone showing points rather than bars, or at least showing the raw data in a supplementary table (see below).

RESPONSE:
A factor perhaps not considered by the Reviewer when asking for "whisker plots," i.e. drawn from separate data points: note that library size/depth determine normalized data points per library sample, summed within each library and reported as the mean of three biological replicate libraries. Thus only three such data points could possibly be shown for such a requested whisker plot, despite the underlying basis being many independent data points making up the normalized reads per library. Error bars were expressly described as standard error of three biological replicates in the legend of Fig. 2. The legend has been revised to direct readers to the raw data which was provided in Suppl. Table S1 rows 23-36 for MIR828 (panel A: 11 independent data points for ovules, 3 data points for stamens), and Suppl. Table S2, rows 406-470 for MYB mRNAs (panel A: 11 independent data points for Aco017254 ovules, 28 for stamens; 3 data points for Aco020986 ovules, 11 for stamens). It is rebutted that bars are thus not representing N < 5 data points in Fig. 2A as critiqued except for two: MIR828 stamens and Aco020986 ovules. It is conceded the error bars are not compelling statistically, but pointed out that no claims are made for significance while maintaining data integrity by reporting observed biological variance as found in the data. The verbatim claim made from Fig. 2A data states: "evidence supports higher pri-MIR828 expression in F153 ovules concordant with lower target MYB Aco017254.1 mRNA levels." This qualified claim is grounded from 11 independent ovule MIR828 and Aco017254 data points, respectively, averaged across three biological replicates which is derivative of the experimental design as found and interpreted conservatively. Figure 2B is vague in its description of the test, and replicates involved. The legend describes an N=1 for all stages. If this is true, how can a statistical comparison be made? Stage 1 shows error bars, does this represent N>1? Additionally, precisely which samples are being grouped and compared here for the test? This must be made clearer.

RESPONSE:
The Reviewer is appreciated for due diligence and critical insights. Here are the raw independent reads data summarized from Suppl. Table S2, rows 134-405, which the reader is directed to in the revised legend for completeness: Aco017254 stages 1, 2, 3, 4, 5, 7 and 8 have respectively 38, 37, 45,11, 5, 6, and 26 data points; Aco020986 has respectively 6, 7, 7, 0, 0, 1, and 3 unique independent data points (not cross-mapped to homologous Aco017254) for stages 1-8. The statistical comparison is made by taking independent abundances for Aco017254 and Aco020986 in stage 1 MD2 fruit as one condition matrix, and stages 2-8 as the second paired condition matrix across both genes to conduct a paired two-sided Student's t-test for significance; the result is p = 0.02. The legend has been revised to clarify as requested. Stage 1 "error" bars in question are actually calculated 95% confidence intervals for hypothetical significance if there were a second biological replicate (extant experimental dataset is n=1) by using an unpaired Student's t-test and introducing a second matrix datapoint for each gene of a magnitude sufficient to give p = 0.05. That magnitude, determined empirically, is the boundary marked by the low error bar as the 95% percentile confidence interval for statistical significance. The purpose of the analysis is to graphically show by an independent estimation, consistent with the paired t-test actual result, that stages 2-8 means are all well below the lower error bar confidence interval for significance calculated for stage 1. Figure 2C is unclear. There is little to no textual description of what the author concludes from Figure 2 in general and it is mainly confined to one sentence where all three sub-figures are mentioned. This needs to be greatly expanded upon for me to understand what they perceive as the outcome of these analyses. I recommend including *all* of the raw data shown in these charts in supplementary tables in addition to a more clear demonstration of the data-points mentioned above. This allows the reader to clearly identify what is being shown in each.

RESPONSE:
Again it is noted the issue of terse description of interpretations was driven by publisher proscription on word limits. Here are the raw independent reads data summarized from Suppl. Table S1 rows 37-52 for MIR828 and Suppl. Table S2, rows 471-883 for CB5 Aco017254, referenced in a revised Fig. 2 legend: 8 MIR828 independent data points each for 1 month-and 2 month-old CB5 fruits, and 171 data points for 1 month old-, and 222 data points for 2 month-old Aco017254 expression in CB5 fruits. Here is restated a revised but nonetheless succinct paraphrased claim from the text that intends to clarify how to compare evidences in Fig. 2C with Fig. 2B: "there may be differences in expression and/or regulation of pri-MIR828, and/or target MYB Aco017254.1 between F153 yellow-fleshed versus CB5 red-fleshed genotypes…. there is a trend of lower expression of CB5 pri-MIR828 concordant with sustained higher abundance of Aco017254.1 target MYB ( Figure 2C stage 7

abundance)."
A description of methods is lacking. Though the sequencing data is derived from prior publications, the analysis performed in this paper is new. You need to describe the logical steps taken for each analysis, including processing, tools used in a step, the actual commands used, and data that is output. This would go a long way to clarifying the actual analyses they performed. This needs to be shown for RNA-seq analyses, sRNA-seq analyses, and identification of pri-miRNAs.