RNA and Proteins: Mutual Respect

Proteins and RNA are often found in ribonucleoprotein particles (RNPs), where they function in cellular processes to synthesize proteins (the ribosome), chemically modify RNAs (small nucleolar RNPs), splice pre-mRNAs (the spliceosome), and, on a larger scale, sequester RNAs, degrade them, or process them (P bodies, Cajal bodies, and nucleoli). Each RNA–protein interaction is a story in itself, as both molecules can change conformation, compete for binding sites, and regulate cellular functions. Recent studies of Xist long non-coding RNP, the U4/5/6 tri-small nuclear RNP complex, and an activated state of a spliceosome reveal new features of RNA interactions with proteins, and, although their stories are incomplete, they are already fascinating.


Introduction
RNA molecules in the cell are rarely naked. Rather, proteins are bound to them in some arrangement consistent with their regulation, protection from nucleases, transport, or formation of ribonucleoprotein particles (RNPs). A 2014 compendium of RNA-binding proteins in humans 1 concluded that 7.5% of 20,500 known protein-coding genes encode proteins that are found in RNPs or bound to mRNAs, where they regulate RNA metabolism. This is likely to be an underestimate, since the structural heterogeneity of RNA-binding proteins makes them difficult to identify de novo.
The recent discovery of a plethora of non-coding RNAs 2 in cells has invigorated investigation of proteins that bind to RNA. New methods of probing the proteins in a transcriptome have allowed simultaneous identification of a protein and its RNA-binding site. Typically, these are crosslinking-immunoprecipitation (CLIP) experiments [3][4][5][6][7][8][9]. Intact cells can be irradiated with ultraviolet (UV) light or treated with formaldehyde to crosslink proteins to RNA, then the complexes are purified from the milieu by immunoprecipitation. To identify proteins bound to mRNAs, cellular UV RNA-protein crosslinking is followed by isolation of all poly(A)-RNA 7. Alternatively, proteins bound to a specific RNA could be recovered by annealing biotin-oligonucleotides complementary to the RNA and selective purification by streptavidin 9. Proteins bound to RNAs could then be identified by mass spectrometry. Several groups applied this method to identify mRNA-binding proteins in human cell lines, mouse embryonic stem cells (ESCs), and Saccharomyces cerevisiae yeast cells (reviewed in Gerstberger et al. 1).
Assuming that there are indeed more than 1,500 RNA-binding proteins in human cells, books will be written about them and their roles in RNA biology. Here, I focus on recent advances that reveal the variety and mystery of RNPs.
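The figure of more than 1,500 RNA-binding proteins follows directly from the percentages quoted from the compendium. As a minimal arithmetic sketch (the function and constant names are illustrative, not taken from the cited work):

```python
from fractions import Fraction
import math

# Values quoted in the text from the 2014 compendium.
PROTEIN_CODING_GENES = 20_500
RBP_PERCENT = Fraction("7.5")  # exact 7.5%, avoids floating-point rounding


def estimated_rbp_genes(total_genes: int, percent: Fraction) -> int:
    """Return the number of genes implied by a percentage, rounded down."""
    return math.floor(total_genes * percent / 100)


print(estimated_rbp_genes(PROTEIN_CODING_GENES, RBP_PERCENT))  # prints 1537
```

Because the compendium is described as a likely underestimate, this count is best read as a lower bound.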

Xist, the RNA that inactivates an X chromosome
Xist is a long non-coding RNA (lncRNA) that is responsible for transcriptional silencing of one of two X chromosomes in female cells [10][11][12][13]. There are approximately 200 Xist molecules bound to a single X chromosome, and each 18 kb Xist molecule is bound by proteins (Figure 1). Proteins could participate in any aspect of its biology: Xist has to associate with the X chromosome, then spread along it, and finally inhibit RNA polymerase II (Pol II) transcription. After more than twenty years of efforts to identify those proteins, the power of mass spectrometry has been applied to proteins crosslinked in cellulo to Xist.
Two research groups have recently published compendia of Xist-bound proteins. Each group first crosslinked RNA to protein in cellulo, selected Xist through oligonucleotide-directed annealing, then used quantitative mass spectrometry to identify bound proteins. An overall comparison of their results shows great similarity but also some curious and intriguing differences. Table 1 and Table 2 list the most abundant proteins recovered from each study.
The groups of Heard and Chang 14 identified 81 proteins in toto bound to Xist. Using formaldehyde, they crosslinked proteins to Xist in three different mouse cell types: a male ESC line containing […]
Figure 1. Approximately 200 Xist molecules bind to an X chromosome, spread along it, and inhibit RNA polymerase II from transcribing the DNA. Xist is bound by many proteins at unknown sites and with unknown stoichiometry, which subsequently interact with each other through disordered regions or structured domains. RNA is shown as a yellow/orange strand and protein linkers as blue strands. RRM, RNA recognition motif.
In contrast, a group of investigators headed by Guttman 23 took a different approach to finding Xist proteins during transcriptional silencing. After Xist induction in mouse ESCs, cells were UV-crosslinked, Xist RNP was recovered with long antisense oligonucleotides, and Xist proteins were identified by mass spectrometry. Two batches of mouse ESCs were cultured, one in 15N- and one in 14N-media, to allow quantification by mass spectrometry (SILAC). Among their ten most abundant proteins, they found SHARP (SPEN) and RBM15, two proteins related in their architecture (they are SPEN family proteins). They also recovered six hnRNP proteins […]
The identification of LBR bound to Xist explains localization of the Xist-X chromosome to the nuclear lamina 12. Transmembrane helices anchor LBR to the lamina, while its tail contacts Xist. Positioning of Xist-X on the lamina changes the structure of the DNA and facilitates protein-mediated spreading of the Xist molecules along the length of the chromatin.
Rather than discovering unknown proteins, these investigations have re-discovered known proteins. They present a new challenge: to understand why these proteins are particularly useful in the Xist context and how their use, and corresponding abundance, is modulated according to developmental stage or cell lineage. The general challenge is not only to understand how proteins use their RNA-binding domains, intervening sequences, and disordered tails to control formation of RNPs but also to account for their temporal exchange.

RNA recognition motifs
A striking feature of proteins bound to Xist is the recurring use of tandem RRM domains. There are certainly advantages to this scheme, since affinity and specificity can be modulated by increasing the number of contacts between RNA and protein. However, neither Xist-binding sites for its associated proteins nor their binding stoichiometry are known. These biochemical characterizations are important to understand how they select their target sites on the RNA, how they bind to Xist in the milieu of other RNAs in the cell, and how they hang onto the RNA while they also bind to other cellular compartments or recruit other proteins.
RRMs 1 are the most common structural motif used in eukaryotes to bind RNA (Figure 2) and are estimated to be found in 225 human genes. When RRMs are present in multiples, deciphering the contributions of each RRM to the whole can be quite difficult 28-31. A recent biophysical study of two tandem RRMs revealed how they partition function.
U2 auxiliary factor (U2AF) is a heterodimer of U2AF65 and U2AF35 32,33, which in pre-mRNA splicing aids in the recognition of a 3′ splice site 34-38. U2AF65 has two RRMs (RRM1 and RRM2) that bind polypyrimidine tracts, but U2AF35 has a single UHM, a "U2AF homology motif", that is structurally homologous to an RRM 39,40. RRM1 and RRM2 are tethered by a short linker (~20 amino acids) that allows them to move and reorient relative to each other 36. Since they bind to polypyrimidine tracts of variable length and sequence, they must be able to expand or contract to span the site 41. Regulation by intermolecular and intramolecular interactions adds another level of complexity to RNA-binding proteins.
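To make "polypyrimidine tracts of variable length and sequence" concrete, the sketch below scans an RNA sequence for runs of C and U. It is a deliberately simplified illustration: it requires uninterrupted pyrimidine runs, whereas natural tracts may contain purines, and the function name, the `min_length` threshold, and the example sequence are all hypothetical rather than taken from the cited studies.

```python
import re


def find_pyrimidine_tracts(rna: str, min_length: int = 8):
    """Return (start, end, tract) for each uninterrupted run of C/U
    that is at least min_length nucleotides long.

    Coordinates are 0-based, end-exclusive. DNA input is tolerated by
    converting T to U.
    """
    sequence = rna.upper().replace("T", "U")
    pattern = re.compile(r"[CU]{%d,}" % min_length)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical sequence containing one 12-nucleotide tract.
example = "ACGUUUCUCUUUUCCAGGU"
for start, end, tract in find_pyrimidine_tracts(example):
    print(start, end, tract, len(tract))
```

Lowering `min_length` returns shorter tracts, which mirrors the range of site lengths that the two tethered RRMs would have to span.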

The spliceosome and its small nuclear ribonucleoprotein particles
It is estimated that 94% of all human genes contain introns 53-55 , thereby providing protein isoform diversity. The process of removing introns and joining exons is carried out by the spliceosome, a multi-component and dynamic assembly of RNPs 56 . A great challenge in the field of pre-mRNA splicing has been to understand how the spliceosome is physically able to carry out the concerted transesterification reactions of the splicing chemistry to yield mRNAs.
The spliceosome consists of five small nuclear RNPs (snRNPs) that dynamically associate with each other and with pre-mRNA. The major spliceosome uses U1, U2, U4, U5, and U6 snRNPs in the process of splicing 57. Each snRNP contains a single RNA (snRNA) and multiple proteins, but while U1 and U2 snRNPs are independent, U4 and U6 form a di-snRNP that goes on to become a U4/U5/U6 tri-snRNP 58. The tri-snRNP is recruited to a bona fide intron and is then remodeled, losing U4 snRNP and leaving U5 and U6 snRNPs to form the active spliceosome.
The goal of snRNP rearrangement is to allow and facilitate snRNA conformational rearrangements in the spliceosome to produce the active site for catalysis 59-61. Rearrangements of pre-mRNA and snRNAs to prepare and position them for catalysis are mainly accomplished by protein helicases 62. There are eight such SF2-type helicases that associate with the spliceosome along the reaction pathway 63,64. These ATP-dependent RNA helicases are not sequence specific; they can unwind any RNA duplex. Rather, their specific targets appear to be defined by where and when they associate with the spliceosome. The Brr2 helicase is particularly critical in the transformation of pre-spliceosome intermediates 64-67. Brr2 is unusual: it has two helicase domains (only one is active) and a long (450-amino-acid) N-terminal domain 64,65,68,69.

Brr2, a unique RNA helicase
Brr2 enters the nucleus independently and associates with the U5 snRNP. U5 snRNP then joins the U4/U6 di-snRNP to become the U4/U5/U6 tri-snRNP 68 . The tri-snRNP is recruited by U1 and U2 snRNPs to form a pre-spliceosome.
To form the active spliceosome, two snRNPs must be displaced. U1 snRNP is released from the 5′ splice site, and U4 snRNP is removed from the tri-snRNP. It is the latter remodeling that requires Brr2, as U4 and U6 snRNAs are joined by 22 perfect base pairs and Brr2 is the helicase that separates them. Only when U6 snRNA is free of U4 snRNA can it rearrange to base pair with U2 snRNA and pre-mRNA and so form the catalytic center of the spliceosome. Clearly, Brr2 activity must be regulated such that it is inactive in the tri-snRNP but active in the pre-spliceosome. How is it regulated?
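What "perfect base pairs" means for a duplex that a helicase must separate can be illustrated with a short check of Watson-Crick complementarity between two antiparallel strands. This is a toy sketch: the sequences are invented for illustration and are not the U4 or U6 snRNA sequences, G-U wobble pairs are deliberately not counted, and the function names are hypothetical.

```python
# Canonical Watson-Crick pairs only; G-U wobble is intentionally excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs perfectly with rna, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def count_watson_crick_pairs(strand_5to3: str, partner_5to3: str) -> int:
    """Count paired positions when partner is aligned antiparallel to strand."""
    partner_3to5 = partner_5to3.upper()[::-1]
    return sum(
        (a, b) in WATSON_CRICK for a, b in zip(strand_5to3.upper(), partner_3to5)
    )


# Invented 8-nucleotide strands, not real snRNA sequence.
strand = "GGCAUCGA"
perfect_partner = reverse_complement(strand)   # "UCGAUGCC"
mismatched_partner = "UCGAUGCA"                # last base changed from C to A

print(count_watson_crick_pairs(strand, perfect_partner))     # prints 8
print(count_watson_crick_pairs(strand, mismatched_partner))  # prints 7
```

A duplex is "perfect" in this sense when the count equals the length of the paired region, as in the first call.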
Several recent studies have delved into the details of Brr2 regulation. In a series of papers from the Wahl lab [70][71][72][73][74], Brr2 structure and function were addressed by crystallography and biochemistry. The goal of Brr2 in the tri-snRNP is to maintain stasis. As biochemical experiments on Brr2 show 64, there is a plug domain at the N-terminus of Brr2's long N-terminal region (NTR). This plug folds back over the entrance of the helicase to block access of the U4/U6 snRNA duplex to the active site of Brr2. This is a unique intramolecular regulatory device, and more experiments are required to understand how it is directed to this position (and how it is displaced).
The tri-snRNP is an intermediate in the pathway to spliceosome formation. Years of enormous efforts to map intermediates 42,63,[75][76][77] have now been coupled with technological advances in cryoelectron microscopy (cryo-EM) to visualize select transitional complexes [70][71][72]78,79 . Those efforts have produced a cryo-EM structure of human tri-snRNP that captures Brr2 in its plugged conformation 72 (PDB ID 3jcr). This state of the tri-snRNP, illustrated in Figure 3, might represent its structure as an autonomous particle before it joins the pre-spliceosome, where U4 and U6 snRNAs are still base-paired to each other. If so, then proteins and RNAs in the tri-snRNP must rearrange to present U4 and/or U6 tails to the helicase active site.
In the tri-snRNP, Brr2 sits on the Jab1 domain of Prp8, but its orientation and contacts change during activation of the particle. In contrast to the structure of the human tri-snRNP, in a structure of yeast tri-snRNP, a single-stranded region of U4 snRNA occupies the RNA-binding tunnel of Brr2 73,80,81 (illustrated in Figure 3). Is Brr2 now poised to completely separate U4 snRNA from U6 snRNA? Does this separation occur before the tri-snRNP is recruited to the pre-spliceosome, or is this a paused state that requires further activation?
There is another competitive inhibitor of Brr2. Prp8's Jab1 domain has a C-terminal disordered tail that sneaks into the RNA tunnel of Brr2 to compete with U4 82 . The intramolecular plug interaction and Prp8 Jab1 cooperate to inhibit unwinding. Removing the Jab1 tail activates Brr2 helicase activity; Brr2 without its intramolecular plug also has enhanced activity 75 . Do both inhibitors operate in the isolated tri-snRNP?
Brr2 remains in the spliceosome after U4 snRNP has been expelled. It is seen in a structure of the activated yeast spliceosome, which is defined by the loss of U1 and U4 snRNPs and rearrangements of the remaining snRNAs to interact with each other and pre-mRNA. A cryo-EM structure of activated yeast spliceosomes (Bact) shows Brr2 perched on Prp8's Jab1 domain 79, with its helicase activity blocked by both inhibitor interactions (PDB ID 5lqw). In an illustration from this structure, U2, U5, and U6 snRNAs are remote from Brr2 (Figure 4). Although not clear from the perspective of Figure 4, Prp8 is entwined with other proteins and the snRNAs in this complex, even as it binds Brr2.
Figure 4. Brr2 has separated U4 and U6 small nuclear RNAs (snRNAs), and U4 small nuclear ribonucleoprotein particle (snRNP) has been expelled from the spliceosome. Brr2 is bound to the Jab1 domain of Prp8. All 27 proteins are shown in surface representation; most are colored white. Visualized with visual molecular dynamics (VMD).
As the spliceosome progresses through its cycle, there are many short RNA duplexes that need to be unwound. The other seven SF2 RNA helicases are recruited to the spliceosome when they are needed, and then they dissociate. Brr2 remains with the spliceosome until it has completed a splicing cycle, but there are no data suggesting that it is active at any time other than in the conversion from pre-spliceosome to Bact. If it is not required for its helicase activity, perhaps its long NTR contributes something to splicing. Brr2 is reported to contribute to catalysis 74,83, to stabilize U5 and U6 in the spliceosome 68, and to assist in the final disruption of the spliceosome and release of ligated exons 84. If these states of the spliceosome could be trapped for structural studies, Brr2 might be captured in action.
The spliceosome is composed of hundreds of proteins 56, many of which simply bind RNA, but others actively remodel it. In the past year, spliceosome structures have revealed connections between RNA and proteins that explain previous observations but also raise new questions. This year, structures of the spliceosome C/C* complex show another helicase, Prp16, at work on remodeling 85-87. Slowly, this RNA enzyme is giving up its secrets.

Conclusions
There is a need to not only understand specific RNPs but also define general rules of engagement, since RNA-protein interactions dominate RNA biology. Indeed, the most mysterious are the membrane-less organelles that contain RNAs and proteins 88,89 . These conglomerates of RNAs bound by RNA-binding proteins are variously thought to be centers of RNA processing, degradation, transcription, and exchange: P bodies and stress granules in the cytoplasm and nucleoli, Cajal bodies, speckles, and PML bodies in the nucleus. A current model is that disordered domains of the proteins form a fluid matrix that allows a flux of molecules through these liquid droplets 90,91 . It is a sure bet that these droplets will be objects of intense scrutiny for years to come.

Competing interests
The author declares that she has no competing interests.

Grant information
The author(s) declared that no grants were involved in supporting this work.