Ribo-Seq and RNA-Seq of TMA46 (DFRP1) and GIR2 (DFRP2) knockout yeast strains

In eukaryotes, stalled and collided ribosomes are recognized by several conserved multicomponent systems, which either block protein synthesis in situ and resolve the collision locally, or trigger a general stress response. Yeast ribosome-binding GTPases RBG1 (DRG1 in mammals) and RBG2 (DRG2) form two distinct heterodimers with TMA46 (DFRP1) and GIR2 (DFRP2), respectively, both involved in mRNA translation. Accumulated evidence suggests that the dimers play partially redundant roles in elongation processivity and resolution of ribosome stalling and collision events, as well as in the regulation of GCN1-mediated signaling involved in ribosome-associated quality control (RQC). They also genetically interact with the SLH1 (ASCC3) helicase, a key component of the RQC trigger (RQT) complex that disassembles collided ribosomes. Here, we present RNA-Seq and ribosome profiling (Ribo-Seq) data from S. cerevisiae strains with individual deletions of the TMA46 and GIR2 genes. Raw RNA-Seq and Ribo-Seq data as well as gene-level read counts are available in the NCBI Gene Expression Omnibus (GEO) repository under accessions GSE185458 and GSE185286.


Introduction
Here, we present Ribo-Seq and RNA-Seq data for S. cerevisiae strains lacking translation-associated proteins Tma46 and Gir2, as well as for the wild type BY4742 parent strain. Tma46 and Gir2 are yeast orthologs of two mammalian DRG family regulatory proteins: DFRP1 and DFRP2, respectively. 1-3 Mammalian DFRP1 and DFRP2 are binding partners of two closely related proteins, developmentally regulated GTPases DRG1 and DRG2, 1,4 while yeast Tma46 and Gir2 dimerize with their orthologs, the ribosome-binding GTPases Rbg1 and Rbg2, respectively. 2,3,5 Thus, in both yeast and mammals, two distinct heterodimers exist, RBG1•TMA46 (DRG1•DFRP1) and RBG2•GIR2 (DRG2•DFRP2), although under some conditions RBG1 may interact with GIR2 as well. 5 The RBG1 (DRG1) containing complexes associate with mono- and polysomes. 2,3,5-7 Using 5P-Seq, it was recently shown that RBG1•TMA46 promotes efficient translation in yeast, alleviating ribosome pausing at Arg/Lys-rich regions. 7 In contrast, the RBG2 (DRG2) containing dimers are not bound to ribosomes under normal conditions. 3,6 However, they are also clearly related to translation, as GIR2 interacts with the ribosome-bound GCN1, and RBG2•GIR2 is responsible for efficient cell growth under amino acid starvation. 5,8 GCN1 is a large protein necessary for activation of GCN2, the evolutionarily conserved eIF2 kinase. 9 Recently, the RBG2•GIR2 complex was detected on the leading stalled ribosome in the cryo-EM reconstruction of a GCN1-disome complex. 10 These results suggest that GIR2 is a physical linker between RBG2 and GCN1 and that this interaction could prevent excessive activation of the GCN2 pathway upon incidental ribosome stalling.
Interestingly, neither the yeast rbg1Δ or rbg2Δ knockout strains nor the double rbg1Δrbg2Δ mutants display any defects in translation or cell growth. 3 However, a genetic screen for triple synthetic interactions demonstrated that the RBGs function redundantly with SLH1, 3 an RNA helicase involved in ribosome-associated quality control (RQC). SLH1 is an ortholog of mammalian ASCC3, a component of the ASC-1 complex that disassembles collided ribosomes (see 11 and references therein).
Taken together, the above data suggest that the RBG1•TMA46 (DRG1•DFRP1) and RBG2•GIR2 (DRG2•DFRP2) complexes play a role in elongation processivity and resolution of ribosome stalling and collision events, as well as in control of GCN1-mediated signaling accompanying these processes. However, many questions remain unanswered. In particular, the individual roles of the two distinct complexes are still unclear. To improve our understanding of their functions, we systematically characterized translational defects in S. cerevisiae strains with individual deletions of the TMA46 or GIR2 genes using ribosome profiling. 12 We present RNA-Seq and Ribo-Seq data for the yeast tma46Δ and gir2Δ knockout strains. For comparison, we also provide corresponding data for three strains bearing deletions of other translation-related genes: STM1, PUB1 and YGR054W (encoding translation factor eIF2A), as well as for the wild type BY4742 parent strain. Raw sequencing data are available online in the NCBI Gene Expression Omnibus (GEO accessions: GSE185458 and GSE185286).
Here we focus on wt, tma46Δ, and gir2Δ strains. The data from the other strains were used to correct for batch effects within each series. The libraries were sequenced, resulting in 31 RNA-Seq and 28 Ribo-Seq data sets, including 18 RNA-Seq and 16 Ribo-Seq data sets for wt, gir2Δ, and tma46Δ strains. Supplementary Table 1 in the Extended data 14 summarizes information about the sequencing experiments.
The experimental procedure followed the ribosome profiling protocol described previously. 15 Briefly, yeast cells were grown to exponential phase (OD = 0.5-0.6) in yeast extract peptone dextrose (YPD) medium (1% yeast extract, 2% peptone, 2% glucose). Cells were harvested by filtration, scraped into liquid nitrogen, and ground using a liquid nitrogen-cooled mortar and pestle with drop-by-drop addition of polysome lysis buffer (20 mM Tris-HCl pH 8.0, 140 mM KCl, 1.5 mM MgCl2, 0.1 mg/ml cycloheximide, 1% Triton). Cell lysates were clarified by two sequential centrifugation steps: 3000 g for 5 minutes at 4°C, then 20000 g for 10 minutes at 4°C. One portion of the cell lysate was used for mRNA isolation with oligo(dT) beads. Another portion was treated with ribonuclease I for polysome disassembly, applied to a linear 10-50% sucrose gradient in fractionation buffer (20 mM Tris pH 8.0, 140 mM KCl, 15 mM MgCl2, 1 mM DTT, 0.1 mg/ml cycloheximide, 1% Triton), and separated in a SW-41 rotor (Beckman) at 35000 rpm for 3 hours at 4°C. Subsequently, ribosome-bound RNA fragments were collected from the monosome fraction. Ribosome-bound RNA was isolated using acidic-phenol extraction. Further Ribo-Seq and RNA-Seq library preparations were performed as described previously. 12

Sequencing data processing and analysis
Reads were trimmed using cutadapt v. 2.10 16 with the following parameters for RNA-Seq (-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --minimum-length 20 -q 20) and Ribo-Seq samples (-a CTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --trimmed-only -q 20). Additionally, for Ribo-Seq, the reads were deduplicated with seqkit rmdup v. 0.10.1, 17 and unique barcodes were then removed with cutadapt v. 2.10 (-q 20 --minimum-length 20 -u -4). Afterwards, reads were aligned against a eukaryotic rRNA sequence set obtained from the silva-euk 18 and rfam 19 databases using bowtie2 v. 1.2.3. 20 Only unmapped non-rRNA reads were used in the further analysis.
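The order of the Ribo-Seq preprocessing steps described above (adapter trimming, deduplication, barcode removal, length filtering) can be illustrated with a minimal Python sketch. This is not the cutadapt or seqkit implementation: the function names `trim_adapter` and `preprocess_ribo_reads` are hypothetical, adapter matching is exact rather than error-tolerant, quality trimming (`-q 20`) is omitted, and deduplication by full sequence (insert plus barcode) is an assumption about how `seqkit rmdup` was run. The adapter is written without internal whitespace.

```python
from typing import Iterable, List, Optional

# 3' adapter quoted in the text for the Ribo-Seq libraries (no internal spaces).
RIBO_ADAPTER = "CTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> Optional[str]:
    """Remove a 3' adapter by exact matching; return None if none is found.

    Simplification: real trimmers tolerate mismatches, this sketch does not.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for a truncated adapter (a prefix of it) at the 3' end of the read,
    # trying the longest candidate overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


def preprocess_ribo_reads(
    reads: Iterable[str],
    adapter: str = RIBO_ADAPTER,
    barcode_len: int = 4,
    min_len: int = 20,
) -> List[str]:
    """Trim adapter, drop duplicates, strip the 3' barcode, filter by length."""
    seen = set()
    kept = []
    for read in reads:
        trimmed = trim_adapter(read, adapter)
        if trimmed is None:
            continue  # analogous to --trimmed-only
        if trimmed in seen:
            continue  # assumed: duplicates share insert AND barcode
        seen.add(trimmed)
        # Analogous to -u -4: remove the barcode from the 3' end.
        insert = trimmed[:-barcode_len] if barcode_len else trimmed
        if len(insert) >= min_len:  # analogous to --minimum-length 20
            kept.append(insert)
    return kept
```

Deduplicating before the barcode is removed means that two reads with the same footprint but different barcodes are both retained, whereas identical footprint-barcode pairs are collapsed into one.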
Read mapping and counting against the Saccharomyces_cerevisiae.R64-1-1.95 (Ensembl) 21 genome assembly were performed with STAR v. 2.7.9a. 22 We estimated the position of the P-site for each dataset from the 5′ end of the reads on the basis of the length of each footprint using plastid v0.5.1. 23 The fraction of reads in each phase and the read length distribution were also obtained with plastid (see Figure S1 in the Extended data). 24 The results show that almost 90% of 28-nt reads are in phase 0 throughout the annotated coding sequences (CDSs). We then produced bedGraph profiles from SAM data with samtools v. 1.10 25 and bedtools v2.27.1. 26 Coverage profiles were normalized using normalization factors and library size estimates from the differential expression analysis (see below) separately for each bedGraph profile. Finally, we visualized coverage tracks in the modified genomic loci using svist4get. 27 Figure S2 in the Extended data 24 shows that the read counts originating from the mRNA encoded by the knockout gene in the corresponding strain are negligible. The coverage of the neighboring genes remains unaltered, i.e. there are no indications of the so-called neighboring gene effect (NGE). 28

Differential expression and Gene Ontology (GO) enrichment analysis
Statistical analyses were performed in R v. 4.1.2 using the edgeR Bioconductor package. 29 As mentioned above, the data were produced in two independent series, which were analyzed separately. Genes not reaching 10 read counts per million (CPM) in at least 4 RNA-Seq and 4 Ribo-Seq libraries were excluded from the analysis. Then, we performed batch correction using the ComBat-seq R package. 30 Principal component analysis (PCA) plots of the raw and batch-corrected expression profiles are shown in Figure S3 in the Extended data.
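The expression filter described above can be sketched in Python as follows. This is only an illustration, not the R/edgeR code used for the analysis: the function name `filter_genes` and the `"rna"`/`"ribo"` labels are hypothetical, CPM is computed from plain library sizes without normalization factors, and the criterion is read here as requiring at least 10 CPM in at least 4 RNA-Seq libraries and in at least 4 Ribo-Seq libraries.

```python
from typing import Dict, List, Sequence


def filter_genes(
    counts: Dict[str, Sequence[int]],
    assays: Sequence[str],
    min_cpm: float = 10.0,
    min_libs: int = 4,
) -> List[str]:
    """Return genes passing the CPM threshold in enough libraries of each assay.

    counts: gene -> raw read counts, one value per library.
    assays: per-library label, either "rna" or "ribo" (hypothetical labels).
    """
    n_libs = len(assays)
    # Library size = total counts in that library (no normalization factors).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]

    kept = []
    for gene, row in counts.items():
        passing = {"rna": 0, "ribo": 0}
        for count, size, assay in zip(row, lib_sizes, assays):
            if size > 0 and 1e6 * count / size >= min_cpm:
                passing[assay] += 1
        # Assumed reading: the threshold must be met in both assay types.
        if passing["rna"] >= min_libs and passing["ribo"] >= min_libs:
            kept.append(gene)
    return kept
```

With four RNA-Seq and four Ribo-Seq libraries, a gene that is well expressed at the mRNA level but has almost no footprints would be dropped under this reading, because ribosome occupancy cannot be estimated reliably for it.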
24 A generalized linear model (glmQLFit, glmQLFTest of the edgeR package) with the strain as a categorical variable was used to detect differentially expressed genes for RNA-Seq, Ribo-Seq, and ribosome occupancy (RO), defined as the Ribo-Seq coverage of a CDS normalized to its RNA-Seq coverage. Genes with a false discovery rate (FDR) below 0.05 were considered differentially expressed. We also performed Gene Ontology (GO) enrichment analysis for upregulated and downregulated genes with yeastmine. 31 The results are shown in Figure 1.

The work by Artyom Egorov and others touches upon proteins implicated in the processivity of translation of mRNA into polypeptides by eukaryotic ribosomes, specifically in resolving the stalled (or 'collided') translational intermediates. Research in this area is accelerating, with many new studies uncovering unforeseen depths of the cells' mechanisms of translation fidelity control. Not too many datasets exist in this space to date, and few are multi-target studies that provide a package of data obtained in a highly similar fashion for several translation-related genes, which is the rationale that justifies well the generated datasets.

Data availability
In a series of yeast knock-out strains, the authors accurately employ common approaches of RNA sequencing and RNA-sequencing-based ribosome (translation) footprint profiling, to provide high-throughput short-read sequencing data resulting from the individual gene knock-out effects. The data are provided for the wild-type yeast of BY4742 strain used as a control, the GIR2 and TMA46 knock-out mutant strains which are in the focus of the dataset, and additionally the YGR054W (eIF2A), STM1, and PUB1 knock-out mutant strains, either commercially purchased or generated in the work.
The methods are written succinctly but capture most of the relevant detail where it matters. Perhaps, expanding on the particulars of the method used to isolate mRNA with oligo(dT) beads could add to the better understanding of the RNA-seq data (including any information about the other treatments and purifications used, such as RNA fragmentation and size cut-offs). Regarding the ribosomal material isolation, although not strictly necessary, it would add to the value of the datasets by outlining their applicability if the authors could provide more detail on the intensity of the digestion with RNase I to generate the footprints, the homogeneity of the resultant sedimentation profiles, and the boundaries used to define the monosomal fraction. For the lysate clarification, the second spin uses very high acceleration; comments about whether the spin does or does not result in any loss of the heavy polysomes (e.g., if a suitably long particle sedimentation path was employed) could be beneficial. Although the authors are referring to the prior works, to bolster the applicability of the datasets and make their uptake convenient, any information about the additional size selection of the ribosome-protected fragments used (biochemical and/or bioinformatic) would be very helpful.
The data appear accurate. The authors provide some differential abundance and GO term enrichment information for the more and less abundant gene categories within the pairwise comparisons, to characterise the datasets. This is very useful information; more information regarding the general properties of the datasets (such as total read/mappable/unique read number tables or histograms, gene coverage saturation plots etc.) could be a further attractive addition to the main figure. The replicated datasets are validated through principal component analysis plots in Supplementary Figure 3, which show generally good groupings between the replicated sample types. The validity of the ribo-seq data is confirmed by Supplementary Figure 1 through demonstrating triplet periodicity of the dominant-length footprint fraction usually attributed to the cycloheximide-stabilised ribosomal protection extent (28 nt; although this is shown only for the wild-type, GIR2 and TMA46 knock-out strains). The datasets are most definitely usable, presented accessibly including uploads to the NCBI Gene Expression Omnibus, and are of a high utility.
Altogether, the authors provide an excellent resource to the field, which perfectly fits the scope of F1000Research Data Note publication, and will likely be cited.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

In this manuscript, Egorov and co-authors presented transcriptome and ribosome profiling data for Saccharomyces cerevisiae strains with individual deletions of the TMA46 and GIR2 genes. The interest in the protein products encoded by these genes has increased substantially in recent years, as these protein products have been shown to be involved in translation elongation and ribosome stalling. However, to date, there were limited "omics" data available from the respective knockout strains that could help shed additional light on the function of these genes. Particularly, ribosomal profiling of the knockout strains has not been performed. The present study fills in this gap and presents comparative ribosome profiling data obtained using the wild-type and knockout yeast strains. The ribosome profiling was followed by the standard quality checks and some basic downstream analysis, i.e. analysis of the differentially expressed genes and Gene Ontology (GO) enrichment analysis. Overall, these data will be useful for further research on TMA46 and GIR2 genes' functions.
The sample preparation was performed using the standard protocols. The raw and processed data were deposited in the relevant open access repositories such as the Gene Expression Omnibus (GEO). Data processing has been described with reasonable detail (including the description of the command-line tools options and adapter sequences). The reliability of Ribo-Seq data has been demonstrated by read length distribution and the corresponding fraction of reads in each frame (Supplementary figure 1). All in all, the data seem to be technically sound and accurately described, but I would suggest adding illustrative plots showing triplet periodicity of the ribosome footprints.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Translational control of gene expression

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.