Alternative splice variants of rhomboid proteins: In silico analysis of database entries for select model organisms and validation of functional potential [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]



Introduction
Rhomboid proteins are found widely in all types of organisms. In higher-order organisms, rhomboid proteins are often encoded by a large group of genes 1 , for example, upwards of twenty-two database entries for Arabidopsis and thirteen for humans (assessed as of May 2017). Rhomboids are also part of an even larger group of proteins involved in regulated intramembrane proteolysis 2-31 . So far, rhomboid genes can be divided into two subgroups encoding proteolytically active forms, the secretase-type and Presenilin Associated Rhomboid Like (PARL)-type rhomboids, and two subgroups of catalytically inactive forms ("non-proteases") such as iRhoms, Derlins, and other distantly related forms 1,26-29 .
The 'active' category of rhomboid proteases is often found to occupy regulatory roles in different cellular activities, typically by interacting with and cleaving specific membrane-residing substrates. Examples include the epidermal growth factor signaling pathway in fruit flies 32 , the quorum-sensing mechanism of the bacterium Providencia stuartii 33,34 , yeast mitochondrial membrane remodelling 35 , and the health status of mitochondria in human cell lines 36 .
The 'inactive' subcategories of rhomboid proteins are believed to lack the catalytic residues used by the active rhomboid proteases 1 . Despite the absence of the catalytic residues, some of the inactive rhomboid proteins are found to be functionally significant without being active proteases 32-35,37 . In some of the reported cases, interaction alone with an inactive rhomboid, without proteolysis, is sufficient to cause effects, such as growth signaling in cancer cells (human iRhom1), dislocation of proteins in the endoplasmic reticulum (ER) (mammalian Derlins), ER protein quality control (Drosophila iRhom Rhomboid-5), development, and organelle biogenesis (Arabidopsis At1g74130) 26-29,37-43 . Even the active human rhomboid protease, RHBDL4, a promoter of ER-associated degradation of membrane proteins, physically interacts with ubiquitin in order to proceed with its protease activities 44 .
In addition to the subcategories of active and inactive rhomboid proteases, alternative splicing appears to generate even more variants with modified functionalities 1 . For example, one case was confirmed for the human RHBDD2 gene 45 . Levels of the two alternatively spliced RHBDD2 mRNAs were elevated in breast cancer cell lines 41 , suggesting a link between cancer cell activity, and the presence of splice variants. In Arabidopsis, the active plastid rhomboid At1g25290 was confirmed to exist as two functionally significant splice variants that differ by the presence of a potential cyclin-binding motif, a motif known to be involved in cell cycling 46 . One of the inactive plastid rhomboids predicted for Arabidopsis, At1g74130 1 , also exists as three splice variants, with distinct functionalities and different levels of interactions with the Tic40 substrate 47 . Functionality of the three At1g74130 splice variant proteins was apparent upon testing at the whole cell level in bacteria and yeast. Despite being plant-derived, the At1g74130 splice variants exhibited physiological interactions with the mitochondrial rhomboid protease Rbd1 in yeast cells, and modulated the cleavage ratio of the resident mitochondrial protein Mgm1, a ratio that governs mitochondria remodeling and respiratory status 47 .
To date, the phenomenon of diversifying rhomboid protein functionality through alternative splicing has been documented for two Arabidopsis plastid rhomboids. It has not yet been assessed whether this phenomenon is limited, or represents a mechanism for expanding the number of functional rhomboid forms in a wide range of rhomboid systems and organisms. Therefore, using the findings reported for At1g25290 and At1g74130 as guidance, we carried out a periodic analysis of genetic databases of model experimental species to determine the possible extent of alternative splicing in rhomboid genes and, based on coding RNA sequence entries, how splice variants may reflect a way to diversify functionality. A limited selection of alternatively spliced variants was then analyzed to document evidence of potential activity using different types of assays.

Periodic analysis of variant sequences in current databases and categorization scheme
The sequence entries used were assembled from current versions of the publicly accessible databases (last sampled on May 31, 2017). The databases contained both validated and currently unverified entries. We further analysed all available ESTs as verification of the different sequences. In the end, both types of entries were analyzed. We used the NCBI database (RefSeq, RRID:SCR_003496) as our primary source for the selected model organisms: Homo sapiens (human), Mus musculus (mouse), Arabidopsis thaliana (Arabidopsis), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode) and Saccharomyces cerevisiae (Baker's yeast). We also cross-checked with other databases such as the Mouse Genome Informatics resource (Mouse Genome Informatics, RRID:SCR_006460), TAIR for Arabidopsis (TAIR, RRID:SCR_004618), FlyBase (FlyBase, RRID:SCR_006549), WormBase (WormBase, RRID:SCR_003098), and the Saccharomyces Genome Database (SGD, RRID:SCR_004694). All entries used are compiled in Table S1 along with their relevant details. Table S2 provides a listing of formal and alternate names used for each rhomboid, along with Gene ID numbers and related references. For this study's objective of periodically analyzing alternatively spliced products and their potential functionality, any more recent database updates are not expected to impact the conceptual findings.
Database entries were compiled and assessed for alternative splicing before use. This was necessary to acquire sequences that resulted from alternative splicing only, as opposed to derivations from other routes. These assessments utilized all sequences deemed relevant, independent of the validation status indicated. This analysis required multiple comparisons between entries and genomic sequences; the sequences were then converted into protein sequences, compared, and aligned with current models of rhomboid proteins. All variant RNA sequences were assumed to result in the generation of functional proteins, including extensively truncated products. The resulting predictions gave no indication that this assumption would fail. Any changes in the entries encountered in the periodic rounds of searching were assessed before updating our analyses. A prime example of such a situation occurred with the human database entries, where the newest version contained more verifications than the previous versions.
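The conversion and comparison step described above can be illustrated with a minimal sketch. It assumes the standard genetic code and uses short made-up coding sequences; the function names (`translate`, `is_frameshift`, `first_difference`) are hypothetical helpers for illustration and do not represent the pipeline actually used in this study.

```python
from itertools import product
from typing import Optional

# Standard genetic code in compact form (assumption: nuclear standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first base up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def is_frameshift(ref_cds: str, var_cds: str) -> bool:
    """A length change that is not a multiple of three shifts the reading frame."""
    return (len(ref_cds) - len(var_cds)) % 3 != 0


def first_difference(ref: str, var: str) -> Optional[int]:
    """Return the 1-based position where two proteins first differ, or None."""
    for i, (a, b) in enumerate(zip(ref, var)):
        if a != b:
            return i + 1
    return None if len(ref) == len(var) else min(len(ref), len(var)) + 1


# Hypothetical reference and a variant lacking a single nucleotide.
reference_cds = "ATGGCTGGTGCTTCTGGTCATTAA"
variant_cds = reference_cds[:6] + reference_cds[7:]
reference_protein = translate(reference_cds)
variant_protein = translate(variant_cds)
```

In this toy case the single-nucleotide deletion changes every residue downstream of the event, which is the kind of outcome recorded as a frame shift in the Supplementary Tables.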
The categorization of alternative splicing events/splice variants was based on the potential impact of such changes on motifs along the protein, from amino to carboxyl terminus. The protein models and motifs used for categorization were adapted from the models reported by Lemberg and Freeman 1 . In many cases, alternative splicing events tend to influence one motif, but there were a number of cases where the impact may affect more than one motif, depending on the type and location of the splicing event. For this study, sequences and splicing events are categorized at the motif-specific or region-specific level, and may thus appear in more than one category. For instance, a frequent occurrence is an alternative splicing event in a particular motif that would likely also impact the adjacent linker region. The various splice variants are listed in Tables S3 to S10 by category. Information concerning the impact of the splicing events is also included in these Supplementary Tables. Overviews are provided in the Results section. Specific examples are highlighted when applicable.
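As an illustration of why one splicing event can appear in more than one category, the sketch below assigns an altered residue span to every region it overlaps. The region boundaries are invented for illustration and are not taken from any database entry or from the Lemberg and Freeman models; `categorize_event` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical region boundaries (1-based, inclusive residue positions).
REGIONS: Dict[str, Tuple[int, int]] = {
    "amino terminus": (1, 90),
    "TMD1": (91, 112),
    "L1 loop": (113, 150),
    "TMD2": (151, 172),
    "catalytic TMD4": (200, 221),
    "Gate TMD5": (230, 251),
    "catalytic TMD6": (260, 281),
    "carboxyl terminus": (282, 320),
}


def categorize_event(start: int, end: int,
                     regions: Dict[str, Tuple[int, int]] = REGIONS) -> List[str]:
    """Return every region whose residue range overlaps the altered span."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [name for name, (lo, hi) in regions.items()
            if start <= hi and end >= lo]


# A deletion spanning the end of TMD1 and the start of the L1 loop
# falls into both categories.
overlapping = categorize_event(105, 120)
```

A real categorization would also need to handle frame shifts, where every region downstream of the event is affected.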
Alignments and analyses of changes to protein structure
Structural predictions were facilitated by comparing alignments with predicted structures constructed and reported by Lemberg and Freeman 1 . Three-dimensional predictions of splice variant protein structures were created and compared by utilizing Phyre Version 2 services (RRID:SCR_010270) 48 and visualized using PyMol 1.1 (RRID:SCR_000305). These 3D predictions were carried out solely for the purpose of investigating and speculating on the capacity for structural impact of the different splicing events. All of these 3D predictions are provided and reported in the Supplementary Material for consideration only and do not represent established, solved structure data.
Production of select splice rhomboid protein variants and immunoblot assessments
Select proteins were synthesized in Escherichia coli JM109 (DE3), with expression driven by the T7 promoter of pET20b (EMD4 Biosciences, Rockland, MA, USA). Histidine tags were joined in-frame to the carboxyl-terminus. Cells were grown at 16°C in ampicillin-containing Terrific Broth (Bioshops Inc., Burlington, Canada). Recombinant histidine-tagged proteins were purified using nickel-nitrilotriacetic acid affinity chromatography (Qiagen, Toronto, Canada). The protein expression and chromatography procedures followed were as reported and cited previously 47 . Proteins were quantified using the Bradford assay system (Biorad, Hercules, CA, USA) and normalized before use.
Protein samples, typically 0.5 μg per lane, were analyzed when needed using standard one-dimensional (1D) 12% (m/v) sodium dodecyl sulfate (SDS)-polyacrylamide gels. Electrophoretic and immunoblotting protocols were performed according to Laemmli 49 and Towbin et al. 50 . Immunoreactive bands were analyzed using scans, quantitated by densitometry (Scion Image 4.0.3.2, Scion Corporation, USA), normalized, and compared relative to internal references when applicable. All immunoblots were repeated at least three times within each experiment and with biological replicates. Representative results are shown in the figures. When needed, quantitations were conducted using nonsaturated scans of the images presented in the figures.
If used, the rabbit polyclonal anti-rhomboid protein (At1g74130) antibodies were established by our lab and validated as reported in Powles et al. 51 . For samples derived from transgenic yeast cell assays, rabbit polyclonal anti-yeast mtHsp70 antibodies (Antibodies-Online Cat# ABIN488515, RRID:AB_11209968) were used for normalization as reported previously in Powles et al. 51 . The normalized ratio strategy was designed to allow semi-quantitative assessment of profile changes independent of experiment, gel origin, and exposures. Statistical analyses or details are noted where applicable.
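The normalized ratio strategy can be sketched as follows. This is an illustrative calculation only: the background subtraction step and the function names are assumptions, the densitometry values are invented, and the exact procedure applied in Scion Image is not specified in the text.

```python
def normalized_ratio(band: float, reference: float,
                     band_background: float = 0.0,
                     reference_background: float = 0.0) -> float:
    """Ratio of a band's densitometry signal to an internal reference band
    (for example mtHsp70) on the same blot. Background subtraction is an
    assumed, optional step."""
    corrected_reference = reference - reference_background
    if corrected_reference <= 0:
        raise ValueError("reference signal must exceed its background")
    return (band - band_background) / corrected_reference


def fold_change(sample_ratio: float, control_ratio: float) -> float:
    """Express a sample's normalized ratio relative to a control lane."""
    return sample_ratio / control_ratio


# Invented values: the same sample on two gels with different exposures
# yields the same normalized ratio.
gel_a = normalized_ratio(1500.0, 3000.0)
gel_b = normalized_ratio(4500.0, 9000.0)
```

Because each band is divided by a reference from the same lane, the ratio is insensitive to a uniform change in exposure, which is the property the text relies on for comparisons across gels.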
Assays for testing biological activity of select splice rhomboid protein variants
Activity assays using transgenic yeast - Protein expression in S. cerevisiae (Baker's yeast) was facilitated by the yeast-E. coli shuttle vector pACT2 (BD Biosciences-Clontech, San Jose, CA, USA). In all cases, expression of the inserted cDNAs was driven by the cloned yeast Rbd1 promoter. The introduction of plasmids was carried out using standard yeast transformation techniques. The host yeast strain C6000 was acquired from EUROSCARF (EUROpean Saccharomyces Cerevisiae ARchive for Functional Analysis, RRID:SCR_003093) (Frankfurt, Germany). Cells were grown at 30°C in glucose-supplemented yeast complete medium (without leucine where applicable). All strains were prepared as populations (as opposed to from single colonies) and stored at -80°C as glycerol stocks.
A disk diffusion method was employed to test different yeast strains for changes to nystatin sensitivity as a result of the indicated splice variant proteins being expressed. Yeast cells were suspended in top agar (1% w/v) media at 5 × 10⁷ cells/mL and poured onto agar plates (2% w/v). Sterilized filter paper disks (38.5 mm²) infused with nystatin were evenly positioned on the top agar, at a typical density of 10 disks per 90 mm plate. Each disk was infused with 10 μL of a 2.4% (v/v) nystatin solution, diluted with the appropriate yeast culturing medium. Plates were incubated overnight at 30°C. The zone of growth inhibition around each disk was then measured, and the corresponding area calculated. Multiple independent experiments were conducted to confirm the functional potential of the various splice variants.
Activity assays using yeast and externally added proteins - Non-transformed cells (without plasmids) were also assessed with exogenously added splice variant proteins. Since amphotericin B (AmB) has the ability to introduce transmembrane channels for protein delivery, cells were incubated for one hour with 10 μg (at 1 μg/mL) of the indicated recombinant variant protein along with a 1% (v/v) AmB solution (this level has no detrimental effect on yeast cell survival). Minimal volumes of cells, typically at 5,000 cells/mL, were used. Various levels of nystatin were then added and incubated for the last 15 minutes of the 1 hour treatment period. After the 1 hour incubation-treatment, 500 cells were plated (per 90 mm petri plate) and grown for 48 hours at 30°C to assess sensitivity to nystatin as a test for biological activity and for any differences between splice variants. Results from independent replicates were analyzed statistically (t-test) and noted with details where applicable.
Activity assays using transgenic bacteria - E. coli JM109 (DE3) cells harboring various pET20b-based splice variant constructs were plated on LB agar containing varying concentrations of ampicillin. The assessment of ampicillin sensitivity was not dependent on induction of expression, but instead relied upon the inherent "leaky" expression. Plating was normalized with equal colony numbers. Results from independent experiments were analyzed statistically (t-test), and noted with details where applicable. Changes in ampicillin sensitivity were further examined using whole cell extracts and immunoblotting to assess β-lactamase expression, secretion, and processing of the precursor β-lactamase form.
Activity assays using bacteria and externally added proteins - Bacteria (HB101) were assessed in some cases with exogenously added splice variant proteins. Cells harboring pET20b were used as the ampicillin resistance model for testing biological activity and differences between splice variant proteins. Dimethyl sulfoxide (DMSO) was utilized to permit the delivery of proteins into cells 52 by an initial incubation of 30 minutes with 5% (v/v) DMSO, 10 μg of variant protein (normalized with elution buffer), and LB broth. After the 30 minute protein delivery period, each treatment was incubated with 1.25 or 1.5 mg/mL of ampicillin (and adjusted if needed) for an additional 45 minutes. Treatments were carried out at 37°C with shaking (100 rpm). Bacteria were normalized to 800 cells per treatment in a total volume of 200 μL, and each treatment was plated in its entirety on LB plates and grown overnight at 37°C. Surviving colonies were counted and compared to mock treatments, which consisted of all components without proteins. Results from independent experiments were analyzed statistically (t-test) and noted with details where applicable.
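The area calculation for the disk diffusion assay can be sketched as below. The sketch assumes a circular inhibition zone whose diameter is measured across the disk, and it optionally subtracts the stated disk area of 38.5 mm²; whether the disk area was subtracted in the actual analysis is not stated in the text, and the function names are hypothetical.

```python
import math

DISK_AREA_MM2 = 38.5  # filter paper disk area given in the Methods


def disk_diameter_mm(disk_area_mm2: float = DISK_AREA_MM2) -> float:
    """Diameter of a circular disk with the given area."""
    return 2.0 * math.sqrt(disk_area_mm2 / math.pi)


def inhibition_area_mm2(zone_diameter_mm: float,
                        subtract_disk: bool = True) -> float:
    """Area of a circular zone of growth inhibition.

    Assumption: the zone diameter is measured edge to edge across the disk,
    so the disk area can be subtracted to leave only the cleared ring.
    """
    total = math.pi * (zone_diameter_mm / 2.0) ** 2
    return total - DISK_AREA_MM2 if subtract_disk else total


# Example with an invented 20 mm zone diameter.
ring_area = inhibition_area_mm2(20.0)
```

The stated disk area corresponds to a disk roughly 7 mm in diameter, which gives a quick sanity check on measured zone diameters.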
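The comparison of surviving colonies against mock treatments, followed by a t-test, can be sketched as follows. The text does not state which form of the t-test was applied; Welch's unequal-variance form is used here purely as an assumption, only the test statistic and degrees of freedom are computed, and all colony counts are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def percent_survival(treated: Sequence[float], mock: Sequence[float]) -> float:
    """Mean surviving colony count as a percentage of the mock-treatment mean."""
    return 100.0 * mean(treated) / mean(mock)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: unequal variances; the source only says "t-test".
    """
    var_a = variance(a) / len(a)  # squared standard error of group a
    var_b = variance(b) / len(b)  # squared standard error of group b
    t_stat = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t_stat, df


# Invented colony counts from three independent experiments.
variant_counts = [10, 12, 14]
mock_counts = [20, 22, 24]
t_stat, df = welch_t(variant_counts, mock_counts)
survival = percent_survival(variant_counts, mock_counts)
```

A p-value would then be obtained from the t distribution with the computed degrees of freedom, for instance with a statistics package.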

Results
Rationale and justification for this database study
We previously verified in separate studies a mechanism for diversifying rhomboid proteins and their functionality. This alternative splicing mechanism played diversifying roles for two different Arabidopsis plastid rhomboid genes: the active secretase type At1g25290 and the inactive PARL type At1g74130. Alternative splicing impacted different parts of the proteins with no apparent functional similarities to each other. The At1g25290 splice variants were focused on controlling the appearance of the cyclin-binding RVL motif in the protein's middle segment, right after the third predicted transmembrane region 46 . The splice variants created for At1g74130 resulted in different shortened proteins, each missing a key glutamine residue in the last carboxyl transmembrane region 47 . Both studies provided evidence that the resulting variant proteins display altered functionality 46,47 . These two sets of findings alone bring the total number of plastid rhomboid forms in Arabidopsis to at least seven: two for At1g25290, three for At1g74130 and at least one each for At1g74140 and At5g25752.
The outcomes discussed above prompted us to look at rhomboid genes of other model species for evidence of similar diversification mechanisms. We thus periodically analyzed the RNA sequence databases of model organisms with complete genome sequences: human, mouse, Arabidopsis, Drosophila, C. elegans, and S. cerevisiae. The entries were compiled for use only after extensive EST analyses (the in silico verification process). Even though the databases continue to evolve, the observations disclosed here continue to be applicable. This notion remained intact even after repeated assessments of the databases every six months since 2012.
Is alternative splicing present in different rhomboid gene systems?
The first aspect to establish was the presence of alternative splicing and its extent within a model species, and across the different selected model species. Using the current versions of the RNA sequence databases, we compiled all possible RNA sequence entries that were derived from alternative splicing, as opposed to other possibilities (see Methods for the sorting and verification process). The human, mouse, and Arabidopsis assessments revealed evidence of alternative splicing in different rhomboid genes of these species. There were 95 entries in human, 53 in mouse and 40 in Arabidopsis (Figure 1). In contrast, similar analyses of Drosophila, C. elegans and S. cerevisiae revealed minimal to no evidence of alternative splicing. We found one possibility each for Drosophila and C. elegans and none so far for S. cerevisiae. It is, however, possible that the outcomes observed for the latter three model species were due to the limited number of reported alternate RNA sequences. The discovery of currently unreported alternative RNA sequences, even with limited variations, may help determine more definitively the presence and extent of alternative splicing in these species. This was the case in Arabidopsis where alternative splice variants for At1g25290 and At1g74130 were discovered and verified upon further analysis of transcript populations 46,47 .
[Figure caption fragment: ... Table S1. Both validated and unvalidated entry types were analyzed.]
Of the sequenced model species analyzed, the human rhomboid system appears to exhibit the most alternative splice variants. All 13 of the rhomboid or rhomboid-like genes display multiple entries reflective of alternative splicing (see Figure 2 and Table S1). For instance, human PARL contains verified splice variants and additional predicted mRNA sequences or proteins. Similar situations were observed for human RHBDF2 (iRhom2), RHBDL1, RHBDD1, and RHBDD2.
The use of alternative splicing is also evident in the mouse rhomboid genes. The assessment revealed evidence of multiple alternative splice variants for mouse rhomboids (Figure 1 and Table S1).
Currently, of the sequenced model genomes available, Arabidopsis appears to possess the highest number of rhomboid and rhomboid-like genes. There are 22 entries and 10 are accompanied by 1 or 2 additional splice variant sequences (Figure 1 and Table S1). The splice variants arising from At1g25290 and At1g74130 were discovered and verified in two other studies 46,47 . Based on the trends observed for human and mouse, it is likely that there are other splice variants in Arabidopsis awaiting discovery, especially for the other 12 gene entries currently without accompanying sequence variants in the database.
What are the types of changes introduced by alternative splicing?
Further analyses of the Figure 1 entries indicate that many of the occurrences are likely reflective of mechanisms for diversifying functionality (Figure 2 and Table S3-Table S10). Structural changes were observed for both active rhomboid proteases (secretases and PARLs) and rhomboid-like proteins (inactive rhomboids and iRhoms) (Figure 3-Figure 6, Table S4-Table S9). Potentially impactful changes were located in domains found across the entire protein structure (Figure 2-Figure 6 and Table S3-Table S9). Changes were also observed within the 5' UTRs that may affect translation and 3' UTRs that may affect transcript properties (Tables S3 and S10). The changes impacting the protein range from subtle, affecting a few amino acid residues, to extensive, affecting entire sections of the protein. In some instances, there were extensive deletions, insertions, truncations, or shortenings of the protein.
Changes to the 5' untranslated region of the transcript - One of the most widely reported alternative splicing mechanisms is designed to control entry into translation or protein translation itself 53 . This continues to be the case for the rhomboid genes examined here (Table S3). Alternative splice variants with potential effects on translation were found in human, mouse and Arabidopsis. Twelve of the 13 human, and nine of the 13 mouse rhomboid genes were accompanied by entries with changes to the 5' UTRs. Interestingly, despite the higher number of genes documented for Arabidopsis, 5' UTR splice variants were found only for 7 of the 22 rhomboid genes. However, the absence of evidence for the other 15 Arabidopsis rhomboid genes may be due to sequences that are unreported or awaiting discovery.
[Figure caption: Each region of the generalized rhomboid protein is indicated along with the total number of rhomboid and rhomboid-like genes that were accompanied by splice variants found in the databases. The genes considered belong only to the model organisms selected for study here and are tallied together independent of species. The total number of accompanying alternative splice variants determined for each region is also provided. These variants are again tallied independent of species. Entry details are summarized in Table S1. Both validated and unvalidated entry types were analyzed.]
[Figure caption fragment: ... Table S2.]
Changes to the amino terminal region - Alternative splicing events affecting the amino termini appear to be common as well in our set of RNA sequences (Figure 2-Figure 6 and Table S4). Changes affecting the region between the start methionine and the first predicted TMD are placed into this category, which includes frame-shifts, alternate starting methionines, insertions and deletions. Changes in this region could affect functional aspects like protein targeting and transport, membrane insertion, assembly and topology, and assembly with complexes. With the exception of human RHBDD1, all of the other 12 human rhomboid genes are accompanied by alternative splice variants impacting their amino end sequences. The situation is slightly different in the mouse rhomboid system where 9 of the 13 genes contain variants impacting the amino end. Based on the mouse database, Rhbdf2, Rhbdd1 and Rhbdd3 do not so far have any variants impacting this region. Interestingly, only 3 Arabidopsis rhomboid genes have entries with predicted alternative splicing within the amino region: RBL4, RBL14 and RBL15. One Drosophila gene, Stet, has an altered amino terminus resulting in an alternative start methionine. Of the 5 C. elegans genes, one has an altered amino terminus due to alternative splicing. This gene, ROM-4, displays a frame-shift that shortens the length of the region.
Changes to the L1 Loop region - The L1 region contains a conserved "loop" structure. This structure lies either side of transmembrane helices and plays a role in rhomboid protease activity 32 . The L1 loop is also partially inserted into the membrane 54 . Site-directed mutagenesis of the conserved L1 loop residues revealed evidence that this loop controls the way in which rhomboids interact with lipid membranes 54 . Our analyses of the Figure 1 entries indicate that the L1 Loop region is likely subject to alternative splicing (Figure 2-Figure 6 and Table S5).
In human, splice variants involving the L1 loop accompany 9 different rhomboid genes. Generally, the outcomes of the variants range from deletion of residues from part of the L1 loop region, to altering a few residues at one end or the other of the structure.
A similar trend was found for 4 of the mouse rhomboid genes, Rhbdf1 (iRhom1), Rhbdd2, Derlin2, and Ubac2. Again, the two splice events resulted in the loss of residues from the L1 loop.
The Arabidopsis database contains splice variants of the L1 loop for two genes, RBL4 and RBL14. Like in human and mouse, alternative splicing resulted in the removal of residues or large sections of the L1 loop.
The C. elegans ROM-4 gene exhibits an alternate start methionine and a frame-shift within the L1 loop, giving rise to only the beginning part of the loop.
Changes to regions affecting other structural aspects - This category is defined as changes to the linker or other transmembrane regions (TMD) with no currently assigned functions, as opposed to the regions with distinct functions discussed above and below. Since changes to these regions, subtle or extensive, could potentially affect the other functional aspects of the protein itself, or its interactions, it would be important to assess these splicing events.
In human, 11 of the 13 rhomboid gene entries examined are accompanied by splice variants in regions of the protein that fall under this category. Examples include PARL variants lacking TMD3 and part of TMD4, or the amino end of TMD1 (see Figure 3-Figure 6 and Table S9).
Changes to neighbouring regions of the catalytic dyad - The catalytic sites of rhomboid proteases consist of residues contributed from two different transmembrane domains, TMD4 and TMD6, when using the "6+1" model 1 . PARL and secretase-type catalytic residues are characterized by the amino acids GxSx-H. The catalytic residues of iRhoms are characterized by the residues GPxx-H.
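A simple way to screen variant sequences for these signatures is a pattern search over the relevant transmembrane segments. The sketch below is an illustrative simplification: it searches for GxSx or GPxx anywhere in a TMD4 fragment and for any histidine in a TMD6 fragment, without checking membrane position. The fragments shown are invented rather than real rhomboid sequences, and `classify_catalytic_motifs` is a hypothetical helper.

```python
import re

ACTIVE_TMD4 = re.compile(r"G.S.")  # GxSx: PARL and secretase-type signature
IRHOM_TMD4 = re.compile(r"GP..")   # GPxx: iRhom-type signature


def classify_catalytic_motifs(tmd4: str, tmd6: str) -> str:
    """Classify a variant by the motifs found in its TMD4 and TMD6 segments.

    Simplification: any histidine in the TMD6 fragment counts as the
    catalytic histidine; a real analysis would use aligned positions.
    """
    has_histidine = "H" in tmd6
    if has_histidine and ACTIVE_TMD4.search(tmd4):
        return "protease-like (GxSx-H)"
    if has_histidine and IRHOM_TMD4.search(tmd4):
        return "iRhom-like (GPxx-H)"
    return "motif altered or absent"


# Invented fragments illustrating outcomes of the kinds described in the text.
intact_active = classify_catalytic_motifs("LLGASGAV", "AAHLGGI")
serine_lost = classify_catalytic_motifs("LLGAAV", "AAHLGGI")    # GASG reduced to GA
intact_irhom = classify_catalytic_motifs("LLGPAGAV", "AAHLGGI")
his_to_pro = classify_catalytic_motifs("LLGPAGAV", "AAPLGGI")   # histidine replaced
```

Checking the GxSx pattern first matters because a sequence such as GPSx would satisfy both patterns; the ordering here is a design choice for the sketch, not a rule from the source.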
There are currently 4 entries for human PARL rhomboids accompanied by splice variants that impact catalytic potential by altering the GASG or H sites through limited deletions, or extensions and the subsequent loss of the serine and glycine residues (GASG to GA) (Figure 2- Figure 6, Table S6).
Both human iRhoms, RHBDF1 and RHBDF2 (iRhom1 and iRhom2), contain frame shifts and early terminations within the L1 loop. RHBDL1 has a predicted variant resulting from a frame shift early in the transcript. The RHBDL1 frame shift alters the peptide sequence and removes the catalytic residues. The predicted mRNA/peptide for RHBDL3 displayed the same outcome as RHBDL1. The RHBDD1 gene is accompanied by two predicted forms with frame shifts and early terminations occurring before the catalytic residues.
In mice, there are 3 entries in our data set with splice variants that impact the catalytic residues. Rhbdf1 (iRhom1) is associated with extensive alternative splicing predictions. Nine different forms are predicted to impact the catalytic potential of the protein. Eight of these predictions display 2 additional residues within TMD4 (GPAG catalytic site), a loss of a residue within TMD6, and a changing of the histidine to a proline. The predicted transcript for the ninth form contains a frame shift in TMD2 that ultimately impacts the catalytic sites. Rhbdf2 has a predicted form with a frame shift in TMD2. Rhbdl3 gene is also accompanied by a splice variant with the potential to alter catalysis. Four of the seven forms result in a frame shift within TMD3 with resulting alterations to the catalytic regions. The resulting peptides remain out of frame, altering the peptide sequence downstream from TMD3.
In our Arabidopsis entries, 3 genes show evidence of altered catalytic potential through alternative splicing. One At1g74130 mRNA database entry exhibits a frame shift and early termination. The early termination resulted in the removal of the last TMD, which eliminates the final catalytic residue. Two additional splice variants of At1g74130 were discovered experimentally by Powles et al. 47 . These two variants displayed outcomes similar to the predicted form from the database entry above, namely early termination sites resulting in two different lengths at the carboxyl end of the protein. RBL3 (At5g07250) is accompanied by a form where the TMD containing the catalytic histidine is removed entirely. Although the Gate and the catalytic histidine residue are removed, the predicted carboxyl terminus of RBL3 is maintained in this RBL3 variant. The last Arabidopsis gene to highlight in this category is RBL14 (At3g17611). RBL14 is accompanied by a splice variant that may alter the catalytic potential of this rhomboid by using an alternate start methionine located immediately after the catalytic GFSG residues.
C. elegans ROM-4 has an alternate starting methionine and a frame shift that results in the removal of the catalytic residues.
Changes to neighbouring regions of the L5 Cap and the transmembrane domain 5 Gate (TMD5) - Based on the 6+1 rhomboid protein model 1 , transmembrane domain 5 (TMD5) is postulated to be a feature that controls the entry of a substrate into the active site, a gating control that determines enzymatic activity 55,56 . The Gate (TMD5) appears to be a region potentially subject to frequent alternative splicing (Figure 2-Figure 6, Table S7).
Of the 13 human rhomboid genes with entries in our data set, 6 are accompanied by alterations to the Gating TMD. PARL, RHBDF1, RHBDL1, RHBDL3, DERL2 and DERL3 are accompanied by forms with altered TMD6 (based on the 1+6 model). The most common resulting event appears to be early termination of the protein.
The same splicing outcomes appear to be present in our set of entries for mouse, Arabidopsis and C. elegans rhomboids. Mouse Rhbdl3, Rhbdf1 and Rhbdf2 (iRhom1 and iRhom2) are accompanied by variants with deletions or early terminations caused by frame shifts. The Arabidopsis data set contains two genes with predicted alternative spliced sequences affecting the gating TMD region. Arabidopsis RBL1 possesses an insertion between the catalytic TMD4 and the linker to the Gate. The RBL3 variant is missing both the gating TMD5 and the catalytic TMD6. The C. elegans ROM-4 variant is also missing the gating TMD as a result of a frame shift.
Changes to the carboxyl terminus region - Most of the carboxyl termini changes are due to frame shifts, giving rise to different carboxyl sequences. In human, PARL, RHBDF2, RHBDL1, RHBDD1, and Derlin3 all contain variants with different carboxyl ends. The situation is similar in mice where Rhbdf1, Rhbdf2 and Rhbdl3 are accompanied by variants with different carboxyl ends (Figure 2-Figure 6, Table S8).
In Arabidopsis, At1g74130 and At5g07250 variants have shortened carboxyl termini. The three alternative splice variants of At1g74130 lack the entire predicted carboxyl terminus. The introduction of a stop codon in TMD6 (or further upstream) resulted in the early termination of translation. The At5g07250 (RBL3) variant lacks TMD5 and TMD6, but the carboxyl terminus is restored with the removal of the first 4 residues of the predicted motif.
Changes to the 3' UTR region of the transcript - There are also changes associated with the 3' UTR of rhomboid transcripts (Table S10). In our set of entries, there are 9 human genes with splice variants in the 3' UTR. Some of the variants possess longer 3' UTR sequences, whereas others exhibit shorter 3' UTRs.
A similar situation is observed for mice where 5 genes are associated with variants containing altered 3' UTRs. Rhbdl2, Rhbdf1 and Rhbdf2 variants contain extended 3' UTRs, whereas Rhbdl3 variants exhibit shorter 3' UTRs.
The Arabidopsis genes At1g74130, At3g17611 and At3g58460 are accompanied by one variant each with shortened 3' UTRs. One other gene, At2g29050 (NM_001084504.1), is represented without a predicted 3' UTR in the database.
What are the potential impacts on the rhomboid protein structure upon alternative splicing? The data compiled above suggest potentially impactful changes to the functionality of the affected proteins, but this speculation is limited to linear protein sequences and motifs. We were next interested in testing possible functionality changes using currently available predictive tools for 3-D protein structures, despite the highly speculative nature of such tools. To this end, we decided to use the established 3-D structure/model of the bacterial rhomboid GlpG to test the potential impact exerted by the various splice variant types. The GlpG model is the most established of the rhomboids and offers a more complete structure for this analysis. It should be noted that there are caveats associated with the use of the bacterial GlpG to assess other rhomboid types, but this analysis is strictly focused on how changes could theoretically impact such a rhomboid structure. Because these assessments are judged to be too speculative, the outcomes are provided only as Supplementary Material (Supplementary File 1 and Figure S1-Figure S8). These outcomes may be useful starting points for guiding future structural studies that test these predictions. The actual impacts on the functionality and structure of particular alternatively spliced protein variants also need to be studied individually through experimentation. This notion was assessed here for a selection of splice variants using different types of activity tests. These tests were devised mainly to uncover evidence of functionality in splice variant proteins, as opposed to assigning possible biological roles. Different contexts were used to assess the functionality of the selected splice variants, ranging from transgenic expression to assays using recombinant proteins as exogenous additives.
The first series of tests was conducted for the Arabidopsis At1g74130 splice variants to obtain verification of functionality in a heterologous transgenic setting. For At1g74130, a functional relationship between its splice variants and a known yeast mitochondrial rhomboid substrate, Mgm1, was initially discovered using transgenic yeast 47 . As shown previously in Powles et al. 47 , each At1g74130 splice variant impacted the Mgm1 ratio (the amount of uncleaved (97 kDa) relative to cleaved (84 kDa) Mgm1). The At1g74130 M and S splice variants individually reduced the ratio by about a third (from ratios of 0.67-0.7 to 0.45 for M and 0.39 for S) 47 . Since the At1g74130 splice variants exist simultaneously in their natural Arabidopsis context, we further assessed different combinations of the same splice variants to see if such combined interactions influence the Mgm1 ratio, an indicator of splice variant activity. Further adjustments to the ratios by the various combinations of splice variants would uncover additional evidence of interaction and functionality. The results in Figure 7D

We next assessed activity by looking at changes in sensitivity to the fungicide nystatin. Changes to sensitivity were assessed in two ways: growth/survival of yeast cells around nystatin-infused disks as a longer treatment strategy, and nystatin treatments of cell cultures as a transient strategy. As shown in Figure 7A Figure 8C). In cells with pET20b only, most of the β-lactamase was present in the mature form (29 kDa) and at high total cell levels. In contrast, bacteria expressing At1g74130 splice variants exhibited shifts toward the precursor form (31.5 kDa), with (L) being the most impacted (Figure 8C). Additionally, there were lower levels of β-lactamase overall in these same cells, which may further contribute to the higher levels of sensitivity (Figure 8C). These results indicate that the splice variants display functionality in this setting.
Evidence of functionality was also observed using an exogenous additive approach, where recombinant splice variant proteins were used to pre-treat cells before testing for changes to antimicrobial sensitivity (see Methods). This treatment scheme would be considered a transient strategy. For yeast cells, recombinant splice variants were delivered using a sub-lethal level of amphotericin B as the pre-treatment step before testing for sensitivity (survival) to nystatin. Even though amphotericin B is a fungicide (especially at higher levels), it was feasible to utilize amphotericin B at sub-lethal levels (1% (v/v)) for delivery purposes because this compound is capable of altering the permeability of fungal membranes, allowing rhomboid proteins cellular access. The treatment matrix used for these yeast assays is shown in Figure 9A. The yeast strain tested here was the same parental host line used in the earlier assays. Sensitivity to nystatin was then tested at a level of 0.5% (v/v). Overall, treatment resulted in smaller colonies at the time of plate growth documentation (compare treatment 1 to treatments 2 to 9 in Figure 9B and C; Dataset 3 59 ). All control or mock-type treatments display higher survival percentages compared to the three recombinant rhomboid pre-treatments (Figures 9B and C; Dataset 3 at 83.23 ± 2.53%. Other components of the fungicides, such as the deoxycholate present in the amphotericin B solution, were also tested and found to have no impact on survivability at the levels used in these assays (Figure S11). Finally, all three treatments with recombinant splice variants (which include amphotericin B and nystatin) displayed decreased survivability (albeit at different levels, between 6 and 14%) relative to control treatments (cells pre-treated with At1g74130 (L) displayed 13.93 ± 1.16% survival, (S) 6.92 ± 4.39%, and (M) 10.08 ± 2.53%).
Yeast cells pre-treated with recombinant At1g74130 variant proteins and then treated with both amphotericin B and nystatin exhibited the smallest colony sizes (Figure 9B). Only pre-treatments with recombinant splice variants and amphotericin B resulted in higher sensitivity to nystatin. Protein pre-treatments without the delivery agent amphotericin B behaved like the controls (Figure S11).
Changes to ampicillin sensitivity in bacteria were also observed using the transient approach. As a commonly used delivery component in many drug applications, DMSO was utilized here as the protein delivery agent in place of amphotericin B. Bacteria were pre-treated with DMSO and recombinant proteins before testing ampicillin sensitivity. Relative to the untreated (media only) and mock treatment (all components without recombinant proteins), pre-treatments with exogenously added At1g74130 splice variants decreased the number of colonies (a proxy for cells surviving the treatment) at the time of plate growth documentation (Figure 8D and Dataset 2 58 ). The mock treatment did not differ significantly (T-test, P = 0.36) from the no treatment control, indicating that the components used in the buffer did not contribute significantly to the enhanced level of ampicillin sensitivity. Relative to the mock treatment or no treatment control, the pre-treatment of bacteria with the recombinant protein additives At1g74130 (L), (M), or (S) resulted in significant reductions in colony numbers (T-test: P = 0.022, P = 0.026, P = 0.029, respectively). The No Treatment control, using 1.25 mg/ml ampicillin without DMSO, represents the reference point of 100% survival. The Mock Treatment without protein additives and with DMSO resulted in 95.85 ± 6.08% survival. Treatment (5% DMSO and 1.25 mg/ml ampicillin) of bacteria pre-treated with At1g74130 (L), (M), or (S) resulted in 71.36 ± 8.96%, 79.54 ± 3.10%, and 77.00 ± 1.27% survival, respectively.
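As a minimal sketch of how survival percentages of this kind can be expressed relative to a 100% reference, the example below normalizes replicate colony counts to the mean of a control. The colony counts, the variable names, and the choice of the sample standard deviation as the spread measure are all illustrative assumptions; the values are not taken from Dataset 2, and the calculation is not necessarily the one used to produce the figures reported above.

```python
from statistics import mean, stdev


def percent_survival(treated_counts, control_counts):
    """Return (mean, sample SD) of treated colony counts as a percentage
    of the mean control colony count."""
    control_mean = mean(control_counts)
    percentages = [100.0 * count / control_mean for count in treated_counts]
    return mean(percentages), stdev(percentages)


# Hypothetical replicate colony counts (assumed values, for illustration only).
NO_TREATMENT = [200, 210, 190]   # reference condition, defined as 100% survival
PRE_TREATED = [140, 150, 130]    # cells pre-treated with a recombinant variant

avg, sd = percent_survival(PRE_TREATED, NO_TREATMENT)
print(f"{avg:.2f} ± {sd:.2f}% survival")  # 70.00 ± 5.00% survival
```

Normalizing each replicate to the control mean keeps the reference condition at exactly 100% while preserving the replicate-to-replicate spread of the treated samples.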
The functionality assays used for At1g74130 were applied to splice variants from two other categories of rhomboid proteins. One splice variant pair was derived from Arabidopsis At1g25290 (named (L) and (S)) and another variant originated from human Ubac2. At1g25290 is from the "Active Rhomboid Proteases" category and Ubac2 is from the "Other Inactive Rhomboid Proteins" category. The overall results for variants from these two other categories indicate functionality as well in our assay settings (select assay results are reported here).
For the At1g25290 splice variants (L) and (S), similar responses were observed in the exogenous additive-transient setting, albeit with impacts that differed from those observed for the At1g74130 variants. Functionality was displayed in both bacterial (Figure 10A and Dataset 4 60 ) and yeast cell settings (Figure 10B and Dataset 4 60 ). The outcomes for the two At1g25290 splice variants were themselves different. The different sensitivity levels displayed by (S) relative to (L) suggest that the phenomenon observed is attributable to the added recombinant rhomboid variants (whose differences derive from alternative splicing) and not to other components in the mixtures.
The human Ubac2 splice variant tested is a fusion between a rhomboid protein sequence (considered a pseudoprotease) and ubiquitin-associating domains 61,62 . As with the above phenomenon, recombinant Ubac2 variant proteins showed functionality as an additive in bacterial and yeast assays, albeit at a more modest level of influence on antimicrobial sensitivity (Figure 11; Dataset 5 63 ).
On the whole, not all variants will show functionality in the different test settings. The different assays do, however, provide evidence that splice variants from different rhomboid categories possess functionality, supporting the notion derived from the in silico analysis that alternative splicing provides a mechanism for diversifying the number of working rhomboid proteins and the roles that rhomboid proteins play in a particular system.

Discussion and conclusions
Alternative splicing is used by many organisms to control and to diversify protein function. Historical examples include human tropomyosin, human kallikreins (secreted serine proteases), and fungal Ski7/Hbs proteins [64][65][66] . The same appears to be happening with rhomboid genes, but despite reports of alternative splicing as a mechanism for diversifying functionality in Arabidopsis and human breast cancer cells 35,41,[45][46][47]51,67 , the number of demonstrated cases remains limited. We were thus interested in assessing how often this happens using the information available in the different genetic databases. Our periodic database analysis over a five-year period was focused specifically on functionality, as opposed to function, to capture slight changes (potential changes at this juncture) to functionality, such as those reported above 46,47 . We also wanted to determine the extent of alternative splicing within the different rhomboid gene systems of a particular organism as well as between different species. We thus limited our analysis to splice variant entries in current RNA sequence databases of select model organisms with completely sequenced genomes. It was important to limit our analysis so that we could address the issue of extent appropriately and then assess how alternative splicing is used to modify the functionality of distinct regions of the affected rhomboid proteins of our data set, especially regions with defined purposes. We then assessed the overall notion of the in silico findings by testing a selection of variant proteins from three different categories.
The overall in silico evidence supports the hypothesis that alternative splicing is likely used to diversify rhomboid functionality in a number of cases. This is especially the situation in human, mouse, and Arabidopsis, organisms with relatively high numbers of rhomboid or rhomboid-like genes. Currently, there are a total of 95 entries for 13 human rhomboid genes that reflect alternative splice products, 53 for 13 mouse genes and 40 for 22 Arabidopsis genes. These splice variants were also not limited to a particular rhomboid category. The diversification appears to occur generally in distinct groupings across the entire rhomboid protein sequence. Although the in silico data suggest potentially impactful changes to functionality, this speculation remains limited to linear protein sequences and motifs. Therefore, we next tested the possible changes to functionality using currently available predictive tools for 3-D protein structures, despite the highly speculative nature of these tools. This analysis was focused strictly on how changes could theoretically impact such a rhomboid structure. Because these assessments are judged to be too speculative, the outcomes of these tests are provided only as Supplementary Material (Supplementary File 1 and Figure S1-Figure S8). These outcomes may be useful for guiding future structural studies as well as for validating the outcomes through extensive experimentation. The notion of diversification in functionality by alternative splicing mechanisms was tested experimentally using recombinant proteins of six different rhomboid splice variants from three different categories: Active Rhomboid Proteases, Inactive Rhomboid Proteins, and Other Inactive Rhomboid Proteins. These splice variants represent different structural changes from active sites, to truncations, to fusions.
Based on the compiled in silico data, some of the potential impacts were quite extensive and obvious. Some of the more obvious ones appear to arise from subtle changes to the protein sequence, such as the introduction or removal of a few residues. Many impacted important structural motifs, sometimes from afar or indirectly (Figure 2-Figure 6 and Figure S1-Figure S8). Although the degree of amino acid sequence conservation of rhomboid and rhomboid-like proteins is relatively low between species and types, there are distinct conserved residues/motifs that serve the same important functions. Some of the functional motifs potentially impacted did include motifs of known functions like the L1 loop, the TMD5-L5 cap, and the catalytic dyad region. Such situations are likely to bear significant consequences with respect to functionality. For instance, the L1 loop has been the focus of several studies because it contains one of the conserved motifs outside of the catalytic cluster 54 . Although the function of the L1 loop is not entirely understood, the importance of the loop for protease functionality was demonstrated by mutagenesis experiments 32,54 . Normally, the L1 loop is partially embedded in the membrane. Mutation of the conserved WR motif in the L1 loop of GlpG decreased proteolytic activity, suggesting a modulatory role for the loop 54 . Additional evidence suggests a regulatory role for the L1 loop, and this role is linked to the formation of a rigid L1 loop structure 54 . If the L1 loop serves as an anchor to the lipid bilayer, alterations to this structure could modulate the protein's enzymatic activity with substrates. In 2007, Baker and coworkers 56 found an enhancement of proteolytic activity with their set of mutagenized L1 loop experiments, which further demonstrates the link between the L1 loop and functionality.
Such outcomes are certainly possible and were observed in the 3-D models predicted for the different splice variants tested here (Figure S7). Similar outcomes were also observed for splice variants involving changes to the L5 cap and TMD5 region (Figure S8). In addition to experiments aimed at the L1 loop, Baker and coworkers 56 also carried out mutagenesis-based experiments on the TMD5 region and observed enhancement of cleavage activity with some of the structural changes in TMD5. It was hypothesized that destabilization of the TMD5 helix allowed enhanced substrate entry 56 . TMD5's destabilization is believed to alter its configuration by changing its angle/tilt and proximity to neighboring helices. This in turn alters the efficiency of gating by this region. Alterations to the efficiency of gating then in turn affect proteolytic activity 56 . This is because, normally, when TMD5 is positioned in the 'open' conformation, the TMD5 helix pulls the L5 loop outward 55 . The movement of the L5 cap structure allows substrate entry into the catalytic cavity. The open conformation of the L5 cap is also believed to permit the entry of water into the catalytic cavity 55 . Therefore, alterations to TMD5 and the L5 cap could potentially change features that affect substrate entry. The predicted outcomes in the examples shown in Figure S8 could possibly manifest in a similar manner with equally consequential effects. The structural outcomes revealed in our theoretical assessment of splice variants are thus likely to represent a mechanism for diversifying rhomboid functionality, since these alternative splice variants likely exist in the organisms studied.
Based on a combination of the previous findings for the two Arabidopsis plastid rhomboids 46,47 , the human RHBDD2 45 , and the trends revealed in this study, the overall evidence suggests that alternative splicing is a functionally significant mechanism for diversifying rhomboid functionality. This means that splice variants of rhomboids and rhomboid-like proteins likely exist simultaneously in the cell or sub-cellular compartment. Like the two Arabidopsis plastid proteins, the alternative splice variants are likely co-expressed, modulated relative to each other to respond to the cell's needs, and interacting in some manner. The possibility of interactions between rhomboid units themselves has been reported by Wu et al. 55 for the bacterial rhomboid protease GlpG. Such interactions with different rhomboid variants/forms and populations could therefore manifest in a number of ways that affect rhomboid functionality. The theoretical approach used in this study, and the predicted outcomes that may arise, are thus not without merit and should be considered as guidance for further experimentation. The possibilities, such as the ones discussed in the above examples, are observed experimentally in other studies and in the functionality assays conducted for our select rhomboid splice variants. There are a number of other experimentally tested examples where truncations have been observed to impact functionality. In addition to rhomboid proteins, examples of other types of proteins include those recently reported by Stoddart et al. 68 for an integral membrane pore, and by Quemeneur et al. 69 where shape influences protein mobility within membranes. Whatever the situation may be for rhomboids, it is clear that it is necessary to characterize splice variants for each rhomboid and to determine how splicing influences rhomboid functionality. This would be important for elucidating how the different rhomboids work as a network.

Competing interests
Parts of this study may be considered as being potentially related to material contained in a patent application (US 2016/0129078 A1). There are no other competing interests disclosed at the time of submission.

Grant information
The research was supported in part by a grant (number 43698) from the Natural Sciences and Engineering Research Council of Canada (to K.K.) and by funds from Queen's University (to K.K.). Joshua Powles was also supported by graduate awards from Queen's University.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements
The authors thank Nicholas Ostan and Cassandra Nelson for their technical assistance in some of the activity assays and data collection (both were at Queen's University during their roles on this research project).
Part of the work presented in this publication was derived from the doctoral thesis by the first author, Joshua Powles. Dr. Powles's doctoral thesis was deposited according to graduate program requirements into the Queen's University electronic repository for graduate theses and dissertations.

Supplementary material
Supplementary File 1: Supplementary notes for Figures S1 to S8; References cited in the supplementary notes for Figures S1 to S8; Supplementary figure legends for Figures S1 to S11.
Table S1. List of alternative splice sequence entries sorted by model organisms.
Table S2. List of names and alternate names used for the various rhomboid genes and proteins.
Table S3. Summary of alternative splice variants impacting the 5' UTR and translation.
Table S4. Summary of alternative splice variants impacting the amino terminus.
Table S5. Summary of alternative splice variants impacting the L1 loop region.
Table S6. Summary of alternative splice variants impacting regions of the catalytic dyad.
Table S7. Summary of alternative splice variants impacting the L5 cap region.
Table S8. Summary of alternative splice variants impacting the carboxyl terminus.
Table S9. Summary of alternative splice variants impacting the other regions and motifs.
Table S10. Summary of alternative splice variants impacting the 3' UTR.
Figure S1. Examples of alternative splicing and their potential impacts on the structure of the amino terminal region.
Figure S2. Examples of alternative splicing and their potential impacts on the structure of the carboxyl terminal region.
Figure S3. The potential impact of alternative splicing on the structure of the Arabidopsis rhomboid At3g17611.
Figure S4. The potential impact of alternative splicing on the structure of the Arabidopsis rhomboid At3g53780.
Figure S5. Examples of alternative splicing and their potential impacts on the L1 loop structure.
Figure S6. Examples of alternative splicing and their potential impacts on the L5 Cap-TMD5 structure.
Figure S7. The potential impact of alternative splicing on the structure of the Arabidopsis plastid rhomboid At1g25290.
Figure S8. Examples of alternative splicing and their potential impacts on the structure of the catalytic region.
Figure S9. Immunoblot images used for analyzing the impact of different rhomboid combinations on the Mgm1 ratios presented in Figure 7D.
Figure S10. Immunoblot images used for analyzing the impact of rhomboid variants on β-lactamase production and secretion presented in Figure 9C.

The manuscript presents original materials on alternative splicing of rhomboid proteins. The authors discuss potential impacts on the rhomboid protein structure. The paper has good illustration and rich Supplementary materials. A lot of work done. Thus, it is technically sound.
But the presentation has to be improved.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

The rhomboids are a very large family of genes found across all kingdoms from bacteria to plants to animals. Based on what we know about the functions of these genes, which unfortunately is not very much at the moment, they often play critical roles in life wherever they are involved. Also interesting is that their functions are highly diversified, as mentioned in this paper. For instance, the human rhomboid family 1 gene, RHBDF1, was found to be expressed at significantly higher levels in breast cancer tissues than in normal breast tissues, with functions as different as promotion of G-protein coupled receptor mediated transactivation of epidermal growth factor receptor (reference 1), or protection of hypoxia-inducible factor 1-alpha from degradation under hypoxic conditions (reference 2); both functions are consistent with facilitating tumor growth, however. RHBDF1 belongs to the group of so-called inactive rhomboids (iRhoms) because of an apparent lack of the protease activities associated with otherwise enzymatically active rhomboids. This group of rhomboids, having lost their abilities to catalyze proteolysis reactions due to mutations, seem to have been destined in evolution to perform other functions, such as those of chaperones, with their abilities to bind to a variety of proteins conserved and utilized in assisting protein folding, transportation, and degradation.
The paper by Powles and Ko is interesting as it addresses a fundamental cause of the massive variations of the functions of the rhomboid gene products. The authors carried out an extensive data mining operation to reveal complex differential splicing patterns of the rhomboid genes. Their findings indicate that differential splicing is common and substantial within one gene as well as throughout the gene family. Apart from the multiple transmembrane domains, the rhomboid proteins often possess a large N-terminal domain before the first transmembrane domain and a sizable loop between the first and second ones. The N-terminal domain and the loop are likely located on opposite sides of the endoplasmic reticulum, Golgi, or plasma membrane of the cell, indicating that they have opportunities to interact with different biomolecules. In addition, in the case of RHBDF1 there are a number of amino acid residues within the protein molecule that could be subjected to post-translational modifications such as phosphorylation and glycosylation. It would therefore seem highly likely that frequent and extensive splicing of the gene transcripts exerts considerable impact on the protein structures and thus their functions. Changes in the gene transcripts could also bring about alterations in terms of modulations by micro RNA and other non-translational means. The effort by the authors is very useful in putting together a database of immense and complex differential splicing patterns of nearly the entire rhomboid gene family. The findings should be beneficial to researchers in this field, even though many of the conclusions are speculative, as pointed out by the authors, because of the lack of our knowledge on the structures and functions of most of the individual products of this enormous gene family.

sequences, which they subsequently analysed by tool-z. Instead, the authors continue: "The resulting predictions did not indicate that this assumption would not hold true."
What does "resulting" mean here? Which tool was used? A translation of an RNA is usually not termed "prediction". Which tool was used for the verification? Apart from these style problems, the authors did not provide any evidence that the translations "do" result in functional proteins. It still is just their assumption. In terms of missing information to recapitulate/redo the analyses, the authors do not provide any information on:
A) how the set of starting proteins was assembled (search by name? Using BLAST?),
B) how the entries were assessed for alternative splicing (on RNA? On protein level? By alignment?),
C) the authors claim that their analyses required comparisons between genomic sequences but do not provide any genomic sequence in the main text or supplements,
D) the authors claim "alignments with current models of rhomboid proteins" without providing the tool used for doing these alignments or a list or reference for the "current models of rhomboid proteins",
E) the authors state that they used Ni-NTA chromatography for protein purification without providing any buffer. While column washing-buffers likely do not have any effect, the elution buffer has. Was the elution done in a gradient? By a step-wise process? Just by eluting with 500 mM Imidazole? Which pH? Was the purified protein dialysed afterwards? This is essential information, because the proteins were subsequently used in other assays, and these might be compromised by pH or imidazole concentration.
F) the authors do not provide any information on the ampicillin concentration used for selection, nor any cell lysis procedure.
G) was the purified protein always used directly? Frozen and reused later? How long is the protein stable in which buffer (pH? Imidazole?)?
H) I do not understand this sentence: "If used, the rabbit polyclonal anti-rhomboid protein (At1g74130) antibodies were established by our lab and validated as reported in Powles et al. 51 ." Did the authors use this antibody for the western blots? Why did they "validate" the antibody again?
I) which media were used for yeast cell growth? Just writing "Cells were grown in glucose-supplemented media" does not make the experiment reproducible. Which glucose concentration? What is a "complete medium"?
J) The authors experimentally analyse three variants for the At1g74130 gene, termed L, M, and S, but it is not described anywhere to which splicing events these transcripts belong. Every study that I know that deals with alternative splicing contains a gene structure scheme and shows which exons would be present in which transcript.
This is just an incomplete list of examples, where absolutely essential information is missing. The authors would need to check every sentence in the Methods section to see whether it is really describing a method, and they need to check where and which information is missing.
=> I have looked at the provided accession numbers for the genes/proteins. The numbers for human and mouse all represent "predicted transcripts", thus these could all be wrong. I recommend the authors to read a few papers about gene prediction software and pipelines; they will notice that on average every prediction in e.g. human, Drosophila, C. elegans contains 1 wrong exon (e.g. the paper from the eGASP comparison of gene prediction software on these 3 species would be a good starter). Prediction of alternative splice variants is even more error-prone. Thus, the authors should only use transcripts with cDNA evidence in their study (and keep in mind that cDNA/EST data also sometimes contains errors from e.g. missplicing). Discussing just gene predictions is completely artificial and fiction. The authors could, for example, download a few RNA-seq datasets from the ENCODE and modENCODE data and do the RNA-seq mapping themselves to validate the gene predictions.
If the authors cannot do this, they should only analyse and discuss transcript variants for which they find cDNA/EST evidence (and of course the accession numbers for these cDNA sequences need to be given). If the authors did an RNA-seq mapping themselves they could also provide some measure for the likeliness that a suggested variant is a true variant or the result of missplicing, transcription errors, etc. E.g. if thousands of reads are found for a certain gene, is it likely that a variant is a true variant and functional if only supported by 1-2 reads? Or are these 1-2 reads rather representing missplicing and other errors?
=> The authors claimed several times that they re-did the analysis every 6 months since 2012. Wouldn't it be a much better way to just look first whether updates on these species were made available at all before spending time in redoing an analysis? I know that these updates only happen occasionally, and not even every second year. Also, the techniques completely changed since about 2010. At least I am not aware of any major study providing new EST/cDNA datasets for the species studied here. All the new data since about 2008 is generated as RNA-seq data. Thus, which further cDNA evidence did the authors expect for the transcripts since 2012 so that they decided to redo the analysis every 6 months? I highly recommend the authors to read all the README-files for the various gene prediction datasets that GenBank and the other databases provide. E.g. Ensembl decided in 2012 (if I remember correctly) to not use any cDNA data anymore for validating their gene predictions. Thus, many exons with cDNA data available (e.g. from isolating and sequencing single genes by research groups) are not present in Ensembl's gene predictions anymore, if these exons are not predicted by gene prediction tools.
Similarly, there are many predicted exons for which no evidence (cDNA/RNA-seq) is available. Please check the GTEx project: there you can see how many RNA-seq reads in each tissue are found for each transcript. You will find out that for many of the "alternative" transcripts, there is not a single read. The presented discussion of the variants (figures 3 to 6) supports my assumption that most predictions are just wrong. Could the authors provide any reference that it would be possible for such a seven-transmembrane protein (the rhomboid proteases) to result in a functional protein if e.g. 1 transmembrane region in the middle would be missing due to alternative splicing? If 1 transmembrane region would be missing (this is suggested by several variants the authors discuss), this would turn the direction of the rest of the transmembrane helices and regions: what was inside before would be outside, and what was outside would be inside. From all that I know from membrane protein structures, the transmembrane helices stick together forming a dedicated tertiary structure. Could the authors provide a reference that transcripts with early terminations would ever result in stable and functional proteins, if e.g. only some of the 7 transmembrane helices are present anymore? The C. elegans ROM-4 was stated to contain a variant with just the first TM helix present. Do the authors really think that this would result in a functional protein? How do the authors exclude that such misspliced/unstable variants would result in NMD?
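The topology argument above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which every transmembrane helix crosses the membrane exactly once, so that consecutive loops alternate sides. The helix names (TM1 to TM7), the side labels, the choice of a cytosolic N-terminus, and the function name `exit_sides` are illustrative assumptions, not taken from the manuscript.

```python
# Simplified topology model (assumption): each helix crosses the membrane once,
# so the side on which the chain emerges alternates after every helix.
OTHER_SIDE = {"cytosol": "lumen", "lumen": "cytosol"}


def exit_sides(helices, n_term_side="cytosol"):
    """Map each helix name to the side on which the chain exits that helix."""
    sides = {}
    side = n_term_side
    for name in helices:
        side = OTHER_SIDE[side]  # one membrane crossing per helix
        sides[name] = side
    return sides


# Hypothetical seven-helix protein and a variant lacking the middle helix.
full_length = [f"TM{i}" for i in range(1, 8)]
variant = [h for h in full_length if h != "TM4"]

before = exit_sides(full_length)
after = exit_sides(variant)

# Helices whose orientation differs between the full-length form and the variant.
flipped = [h for h in variant if before[h] != after[h]]
print(flipped)
```

Under these assumptions, every helix downstream of the missing one changes orientation, while the upstream helices are unaffected. The sketch ignores real determinants of topology, such as charge distribution around each helix, so it only restates the reviewer's reasoning and makes no prediction for any specific variant.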
=> The authors claim several times throughout the entire manuscript that they analysed selected model species without giving any information on how these were selected. E.g. Wikipedia contains a list of about 100 model organisms. What was the rationale to just look at the 6 in the manuscript? Why not more plants? In this respect, there are many statements of the authors that are just wrong, such as "Currently, of the sequenced model genomes available,". Of the 100 model organisms at Wikipedia (and there are likely more model organisms if other researchers were asked), complete genomes are available for at least 80 of them. Altogether, there are about 5000 eukaryotes with genome assemblies available, of which at least 4000 are complete (e.g. check www.diark.org). In this respect, all the speculations and discussion about which organism contains the most homologs or the most variants are just speculations and should be explicitly termed as such. E.g. Brassica species underwent another whole-genome duplication after separation from Arabidopsis, and will thus contain many more than the 22 Arabidopsis homologs. Fish also underwent another 1 or 2 (depending on lineage) whole-genome duplications, and will therefore also contain more homologs.
=> The authors state several times that, although not observed, they expect more alternative variants for the Arabidopsis homologs, because many more variants were identified for human. What is the basis for this expectation? Is there any reference that demonstrates that Arabidopsis genes have as many alternative splice variants as mammalian genes, on average? Everything I am aware of contradicts this expectation. Arabidopsis genes have fewer exons (thus fewer possibilities for alternative splicing) and fewer alternative splice events. Why do the authors not expect more alternative variants for Drosophila or C. elegans, based on their rationale? Shouldn't yeast have at least some variants (it doesn't have any yet)? Of course, yeast doesn't, but this should make the authors aware of the problems in their argumentation.
=> Although the manuscript deals with alternative splice variants, I did not find any reference to the ample literature on this subject. Not even references to a few reviews. Similarly, there is not even a mention of the accepted types of alternative splicing, e.g. differentially included exons, alternative 5'/3' splice sites, mutually exclusive splicing, etc. Categorizing the detected variants with these categories would be much more informative than categorizing by region. Based on the descriptions in the supplementary tables, many variants are highly likely just the result of sequencing errors leading to frame-shifts or alternative amino acids. There is only a single alternative splice form that would lead to alternative amino acids for a certain region, which is mutually exclusive splicing, but I did not find this information anywhere. This could easily be confirmed.
=> I did not see any gene structure in the manuscript or supplements. The usual procedure in the field is to provide a gene structure drawing of each gene and mark the splicing events on these structures. This is common practice since >30 years. The authors do not show any protein sequence, nor any cDNA sequence, nor any alignment. I cannot see any use of the provided tables in the supplements for other researchers. Terms such as "Original UTR missing, new UTR generated from coding region" (table S3) are not useful. How can a UTR be generated from a coding region? Does this mean that this is just an alternative translation start site? What does "Default 5'UTR missing, extended downstream from isoform 1" mean? When I think of a gene structure with exons and introns, which splice event would represent this prosaic description? I will not provide more examples here, but by browsing through the supplementary tables I did not find a single useful description. Thus, the authors should provide a gene structure scheme for each gene and mark the events for each gene accordingly. Such schemes would represent exact descriptions of the splicing events. All prosaic descriptions need to be removed.
=> The authors state in the Methods section: "This was necessary to acquire sequences that resulted from alternative splicing only, as opposed to derivations from other routes." Which other routes would lead to variants, if not alternative splicing?
=> The authors state in the Methods section: "A prime example of such a situation occurred with the human database entries, where the newest version contained more verifications than the previous versions." How did the authors determine whether the newest version contains "more verifications"? Is there any reference that GenBank obtained more full-length cDNA data in 2016 and that these data were used in the gene predictions?
=> The authors confirm by experiments the alternative splicing of two Arabidopsis genes, but exactly these variants have already been published by the same group some years ago. Thus the authors do not provide any further and new evidence that any of the predicted variants of the other genes is present and functional. The authors would not even need to do experiments to show functionality. If they only analyse those transcripts for which cDNA/EST data are available, these cDNA/EST data are already the evidence. Further evidence could be obtained if the same transcripts were found for closely related species.
=> The Introduction is extremely unbalanced. E.g. for this very general statement "Rhomboids are also part of an even larger group of proteins involved in regulated intramembrane proteolysis 2-31." the authors provide 30! references, but the authors fail to provide a useful description of the protein superfamily in general. As far as I understand, these proteins are present in bacteria, archaea and eukaryotes. It would be essential information how the various subgroups are phylogenetically separated. Without such information, there is no possibility to understand the variants. For example, which of the Arabidopsis homologs are the result of the many plant whole-genome duplications? Do such duplicates have identical alternative splicing? Similar for human and mouse: Which homologs are related to which Drosophila and C. elegans homologs, do the orthologs in human/mouse (derived by the two

We now have version 2 available for your further assessment. Based on the feedback, Version 2 was revised substantially to address the aspects raised and to enhance clarity for the readers in the areas of writing style and additional details.
As per your suggestions, improvements were made to the presentation by removing references that may not be necessary to the readers. Removal of these extra references allowed us to revise the Abstract, Introduction, and Methodology sections to enhance clarity with respect to the study's focus or purpose. The title was also modified slightly to reflect these revisions.
We have revised the Methods section substantially to clarify the approaches used and to provide more needed details concerning the analysis of available database entries and the functionality assays using recombinant proteins. Revisions were also made to the Source Datasets and the Supplementary Tables to enhance their presentation, to provide more descriptive titles and related aspects. We hope that there are now sufficient methodological details for others to follow.
The style of writing and presentation in the Methods was revised substantially. We have removed unneeded information and added needed details/information on the tools used, tools that reflect what was being done in the comparative analysis of available entries, namely that we compiled all available entries and assessed their status as splice variants using RNA and protein data. We have added information in the text and in the tables to clarify what the compiled entries represent, e.g., entries or accession numbers listed could represent predicted entries but they are listed and used as reference points after assessment with RNA and protein data.
We have added more information (or information deemed missing) concerning the recombinant protein work and revised the information to enhance clarity. For splicing information pertaining to the At1g74130 (L, M, S) and At1g25290 (L, S) protein variants used later in the study, we have added citations and direction to the Supplementary Material for protein information.
The multi-year facet of the database work was removed to avoid confusion and to enhance clarity. This removal allowed us to streamline the database analysis protocols and provide more details of the tools used. The Results and Discussion sections were modified similarly to reflect these changes. The removal of the multi-year aspect in combination with the above revisions should help the readers interpret the work more clearly.
Concerning the aspect of protein variants with substantial changes being potentially functional, we have provided citations as possible indications that major changes could still give rise to rhomboid proteins with functionality, such as early termination resulting in proteins with one missing transmembrane region. The choice of model organisms used was explained better by revising the style of writing/presentation and by characterizing the work as a comparative analysis or survey of available entries, as opposed to a direct comparison of splice variant numbers between species and its interpretation in this way. We hope that we have interpreted this suggestion correctly. This aspect, we believe, was also raised by Reviewer 3, Dr. Orlov.
We did not include references dealing with alternative splicing because the study was not intended to be at this level, but working at the level of surveying and comparing available entries and what these entries may reflect at the protein level. As described above, substantial revisions were focused on clarifying the intent of the work to avoid this impression. This reasoning applies to why gene structures and splicing events were not provided. All of the supplementary tables were revised to enhance the writing style of the descriptions and render them more useful. We hope that the revisions provided clarity on these issues.
The latter comments concerning passages in the Methods section were addressed by the revisions outlined above. This would also be the case for the comments pertaining to the Introduction, which were likewise addressed by the revisions outlined above.

Sincerely,

Kenton Ko and Josh Powles
Competing Interests: No competing interests were disclosed.