In silico study on RNA structures of intronic mutations of beta-globin gene [version 3; peer review: 1 approved, 1 not approved]

Background: Mutation of the beta-globin gene (HBB) interferes with primary mRNA transcription, leading to beta-thalassemia disease. The IVS1nt1 and IVS1nt5 mutations were reported as two of the most prevalent intronic mutations associated with beta-thalassemia major. These mutations may affect the mRNA structure of the human beta-globin ( HBB ) gene. However, the mechanism by which variation in HBB alters the mRNA structure remains unclear. The objective of this study was to unveil the secondary and tertiary conformation difference of the mutants compared to the wildtype using in silico analysis. Methods: The sequence of HBB was obtained from Ensemble database and mutated manually at nucleotides 143 (IVS1nt1G>T) and 147 (IVS1nt5G>C). The RNA secondary and tertiary structure were performed by ViennaRNA Web Services and 3dRNA v2.


Introduction
Thalassemia is a hereditary blood disorder that induces the production of an abnormal form or inadequate amount of hemoglobin in the body.Hemoglobinopathy has been identified in approximately 71% of all countries around the world, and more than 50% of cases of β-thalassemia occur in Southeast Asia 1 .β-thalassemia is one of the most prevalent blood disorders worldwide.It can result in various traits and the coinheritance of thalassemia minor, intermedia, or major depends on many factors.People with β-thalassemia exhibit reduced hemoglobin production, and low levels of hemoglobin results in a lack of oxygen supply throughout the body 2 .
β-thalassemia is caused by mutations in the human beta-globin (HBB) gene, which is responsible for producing β-globin, a subunit of hemoglobin 3 .Most mutations associated with β-thalassemia are caused by the substitution of one or a limited number of nucleotides in HBB.These mutations affect the functions of the gene including transcription, RNA processing or translation of β-globin mRNA.β zero (β 0 )-thalassemia is caused by mutations in HBB that stop the production of beta-globin.Conversely, other mutations only reduce the amount of beta-globin protein produced, a condition termed β plus (β + )-thalassemia 4 .
People with β 0 -and β + -thalassemia have been diagnosed with thalassemia major and thalassemia intermedia, respectively.Thalassemia major has a more severe phenotype than thalassemia intermedia.Most patients with thalassemia major die at a young age 5 .Unfortunately, as many as 23,239 babies are born with inherited β-thalassemia major every year 1 .
Conformational changes in regulatory RNA induce specific human diseases.More than 95% of mutations result in only small local changes in the conformation of RNA.The same phenotype (disease) can be caused by different mutations with varying degrees of effects on the overall RNA structure 6 .
Numerous variants of non-deletional β-thalassemia are caused by mutations that interfere with processing of the primary mRNA transcript 7 .These mutations affect AG or GT dinucleotides at the exon-intron splice junction.These mutations eventually ablate regular splicing and induce β 0 -thalassemia, resulting in a β-thalassemia major phenotype 4 .Mutations in intervening sequence 1 (IVS1) at nucleotides 143 (change from guanine to thymine, IVS1nt1G>T) and 147 (change from guanine to cytosine, IVS1nt5G>C) are common mutations in Southeast Asia that result in β-thalassemia minor and major, respectively 8 .
A study on mutations that interfere with mRNA transcription may explain the mechanism by which the mRNA structure is changed by mutations in HBB.Specific in silico RNA analysis and visualization tools to study these structural changes have been developed, such as ViennaRNA Web Services, 3dRNA v2.0, and UCSF Chimera [9][10][11][12] .ViennaRNA Web Services and 3dRNA v2.0 are web interface packages that offer structure modeling for RNA.This study evaluated the effects of the intronic HBB mutations IVS1nt1 and IVS1nt5 on the RNA structure using specific in silico tools.

Secondary structure predictions
We used the RNAfold Server from ViennaRNA Web Services to predict the RNA secondary structures 9 .By selecting this applet, thermodynamic parameters, including centroids, partition function, and minimum free energy (MFE), were automatically calculated by the software as considered for structure folding.To create the RNA best structure tree plot, the wild-type and mutants sequences and structural elucidations were forwarded to the Barriers server, which is part of the ViennaRNA Web Services package 15,16 .The tools were utilized with their respective default parameters.
Free energy minimization and base pairing probabilities are the most used methods in predicting single-sequence structure using the following equations 17 .When a structure i is at equilibrium with the unpaired state, the equilibrium is described by the equilibrium constant K i as presented in Equation 1: The Gibbs free energy change for structure i, namely ΔGi, quantifies the favorability of a given secondary structure compared with the unpaired state.Similarly, the equilibrium between two structures i and j is described in Equation 2 as follows: (2) Thus, the Gibbs free energy change quantifies the favorability of a structure at a given temperature.The structure with the lowest Gibbs free energy change will be the most prevalent in solution at equilibrium.
Supplementing a free energy minimization calculation with a partition function calculation can identify the pairs that are

Amendments from Version 2
We modified the view of Figure 1.Only the mutations area was displayed in previously published, whereas complete paired sequences are retrieved in this version.
Any further responses from the reviewers can be found at the end of the article more likely to be correct.The partition function Q is defined as the sum of the equilibrium constants K i for all possible structures.
0 / all structures all structures Thus, the probability of a particular structure i being found in solution can be calculated according to Equation 4 as follows: Furthermore, the probability of a base pair i-j can be calculated by summing the equilibrium constants for structures that contain that pair and dividing by the partition function.
( ) In Equation 5, the sum of the index k is taken over all structures that have the base pair i-j.

3D tertiary structure visualization
The 3dRNA v2.0 from Biomolecular Physics and Modeling Group Xiao Lab (http://biophy.hust.edu.cn/services.html)was used to construct the tertiary structures.Whole genome sequence and 2D structure with dot-bracket form, obtained from RNAfold Server, were entered to generate 3D structures for wild-type and each mutant 10 .The prediction files were downloaded in the .pdbformat.The interactive model was visualized using the University of California at San Francisco (UCSF) Chimera version 1.13.1 18 .

Results and discussion
Intronic mutations of HBB can lead to β-thalassemia major.These mutations have spread widely around the world 19 .In this study, we observed the effects of intronic mutations on HBB mRNA structure.HBB is encoded by 3931 nucleotides located on chromosome 11 with its various regulatory genes.This gene contains 1608 bp and consists of three exons and two introns.Introns 1-2 cover nucleotides 143-272.This region is known as IVS1 (Figure 1a).In these introns, we highlighted two mutations, namely IVS1nt5G>C and IVS1nt1G>T, located at nucleotides 143 and 147, respectively (Figure 1b).
Our observation revealed differences in the MFE and other thermodynamic characteristics between wild-type and mutant HBB.We found that different point mutations occurred in the same IVS region, reflecting unique features.MFE, centroid, and partition function properties are depicted in a mountain plot in Figure 2.
The MFE has important roles for secondary structure prediction.Lower values are associated with better structures.Conformation of MFE secondary structure by RNA folding is irreversible because the sequence space sets up the shape space 20 .Centroid-based analysis is an alternative method for predicting the secondary structure.This calculation yields 30% less prediction errors than the MFE 21 .The stability of RNA molecules fluctuates in different low energy structure.
Therefore, we need to analyze the equilibrium ensemble of RNA structures via the partition function 22 .The partition function measures the quality of secondary structure by determining the confidence in base pairs predicted by free energy minimization 23 .
Based on the mountain plot result, the mutations of IVS1nt1G>T and IVS1nt5G>C changes the mRNA structure of HBB.However, IVS1nt5G>C leads to a much larger free energy change versus the wild-type.A previous study found that IVS1nt1G>T and IVS1nt5G>C take place in splice junction and consensus splice site, respectively, that might result in splice junction disturbances 24 .Therefore, we believe that this disturbance may affect the mRNA structure.
Although introns are known as non-coding regions, their alteration influences mRNA expression, which can lead to several diseases.Using the RNAfold applet from the ViennaRNA package, we analyzed the thermodynamic characteristics and structural changes of wild-type and mutant HBB (Figure 3 and Figure 4) 7,9 .
The secondary structure of mRNA is beneficial for tracing the refolding pathways of an RNA molecule.The next question that arises is how the secondary structure of mRNA is affected by mutation of HBB.To obtain the mRNA structure, the RNAfold system was used (see Methods).The prediction of the mRNA secondary structure is based on the centroid-based pair probability (Figure 3).It was found that the three structures have a distinct structural backbone.Based on the structure, unwanted folding occurred in the IVS1nt1G>T and IVS1nt5G>C mRNA structures compared with the wild-type structure.The principle of this methods is to create the secondary structure with minimal base pair distance to all other secondary structures in the Boltzmann ensemble 9 .The blue box in Figure 3 highlighted the main alteration formed within the three sequences.However, the structure of IVS1nt5G>C seems to be the best predicted according to the red color in the base-pairing probability.To further clarify the effect of folding, barrier tree plot analysis was performed (Figure 4).The result revealed that changes in the population occur earlier in IVS1nt5G>C carriers than in people carrying wild-type or IVS1nt1G>T HBB.The earlier changes in the population result in different populations toward the end of the barrier tree plot.
These intronic HBB mutations may lead to critical folding changes that could drastically affect the tertiary structure of the beta-globin chain as well as the quaternary structure.To further clarify the effects of the mutations, 3D models of wild-type and mutant and HBB were visualized using UCSF Chimera software (Figure 5).In general, the tertiary structure of each sequence was unique.The point mutation gave tremendous folding changes in the 3D model.Once the base shifted, base pairing mechanism was influenced massively.Previous research reported the predicted pathogenic effects of these intronic mutations using the HSF server.IVS1nt5G>C was predicted to be a silencer motif new site, intronic identifying elements and exonic splicing silencer (ESS) site were broken.ESS provides binding sites for proteins, promoting exon exclusion, and helps the spliceosome to decoy splice sites 25 .In addition, Seo et al. (2013) stated that most of the mutations affecting splicing disrupted the highly conserved donor and acceptor sites (GT/AG) at the exon-intron junctions and polypyrimidine tract, and the branch-point sequence may be disrupted by mutations affecting the splicing sites 26 .
RNA structure stability depends on metal or ligand binding, the alteration of which can affect the binding site activity and structure stability.The properties of RNA structures related to multiple ligand binding, ligand binding-induced conformational changes, and the ion-stabilized catalytic core have been explored.The study by Miao et al. stated that the positive charge of metal cations plays compensates for the negative charge of the RNA phosphate backbone 27 .In addition to that function, metal ions also bind to extremely specific locations on RNA 3D structures 27 .All of these findings reveal that IVS1nt5G>C significantly changes the mRNA structure.However, the protein structure of HBB still needs to be elucidated to understand the effects of the mutations on protein features after translation.
The mRNA plays important roles in protein regulation.RNAs start to form after the transcription process is completed.Once done, the structure of RNA is critical for its activities.Both IVS1nt5G>C and IVS1nt1G>T give implications to the clinical condition due to protein instability.There are some disruptions in producing beta globin chains by these mutations, hence the beta thalassemia major (BTM) patients produce beta globin chain in a limited number or even not at all.This mechanism results in unbalanced alpha and beta globin chains in hemoglobin that expedite the hemolysis and ineffective erythropoiesis contributing to reduced production of mature red blood cells 28 .Thus, the BTM patients requires lifelong blood transfusions and commonly suffer complications, such as endocrine complications and cardiac complications 29 .
This study examined the effects of mutations in terms of the minimum energy and the impact on the secondary and tertiary structures of HBB mRNA.Changes in the RNA formation may lead to disturbances of the protein structure.Protein expression can be influenced by mRNA folding, other local and global sequence features, and modulation of mRNA stability by codon usage 30 .Regarding the length of introns 1-2, the mutations occur in small RNA-affected regions where analysis is important for detection, annotation, quantification, and de novo discovery of putative small RNA molecules 31 .However, it will be interesting to clarify the effects of splicing, debranching, and trimming on the mRNA structure in future research.

Conclusion
Our research revealed the impact of intronic mutations on the secondary and tertiary structure of mRNA.Different point mutations occurring in the same IVS region were associated with unique characteristics.We have expanded our understanding of the thermodynamic aspects of these mutants.The limitations of this study included our inability to clarify the interaction of secondary structures due to the lack of an RNA analysis platform function for intronic mutations.RNA molecular simulation and wet lab experiments are required to elucidate the detailed features of the intronic mutants and their interactions with ions or ligands to further assess their functional failure, especially in genetic disease.

Zhichao Miao
European Bioinformatics Institute (EMBL-EBI), Cambridge, UK Although the authors have tried different approaches to improve the manuscript, some critical problems in 3D structure modeling still exist.I suggest removing the 3D structure modeling section and avoid discussing 3D folding.The major comments are listed below: The authors show MSA in Figure 1, but still do not show the regions that paired with the mutations.It is not necessary to show the complete MSA without annotating the secondary structure.It is better to show just the mutational regions together with the regions that form base pairs with them. 1.
Similar to RNAComposer, 3dRNA is not aiming at RNA structures longer than 200nt either.The validity of the 3D structure modeling is still problematic.There are several modes in 3dRNA.It is not clear which prediction mode the authors are using.It is difficult to believe such a long structure modeling in the fast assembly mode.

2.
According to Figure 5, the structures seem to show severe atomic clashes, which should not happen.Therefore, it is suggested to remove the 3D structure modeling section.

3.
Minor comments: It is better to show the mutational sites in Figure 2 (not just the regions).1.
The structures in Figure 5 are difficult to compare.It is better to show the stem-loop regions with different colors, while the colors in the three models correlate to each other.

2.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: RNA structure prediction, single cell omics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for
The authors have made some changes according to the other reviewer but have not addressed my comments yet.

I would like to highlight the comments below:
It is better to support the computational method with experimental clues.

○
The authors should show the paired positions in Figure 1.
○ Figure 3, it is better to use multiple sequence alignment (MSA) to predict secondary structure and show the co-evolution.

○
For 3D structure prediction, it is difficult for an automatic method (e.g., RNAComposer) to handle a structure of more than 150nt without a good secondary structure assignment.This paper tries to use RNAComposer to predict a structure of >350nt.This is beyond the current capability of automatic prediction.From Figure 5, we can find severe atomic clashes, which are not expected in 3D structures.To improve, I suggest to focus on the mutational region, use the MSA-based 2D prediction result to predict 3D structure, and use energy minimization (e.g., Rosetta) to optimize the 3D structure.However, the only information from 3D structure prediction is the feasibility of the 2D structure rather than the functional inferences expected from this paper.

○
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: RNA structure prediction, single cell omics I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Figure 2 showed a subtle difference in MFE and other thermodynamics characteristics, is there any value that can be presented as a result of statistical analysis or based on best fitting model?

○
Figure 3 seems to have significant importance in terms of the structure stability (clinical consequence, for example).Any supporting literature?○ However, the author did not elaborate more on this.For example, IVS1nt5G>T mRNA structures that showed a significant difference (blue-circled).This needs to be discussed more.

○
Again, the value needs to be shown here: "In the secondary structure, the higher the number of the larger loops, the more unstable the structure".

○
Figure 5 seems to differ significantly between WT and the mutants, but the author stated otherwise?Except this statement: "excluding a small change as presented in the lower left part".This needs to be clarified and explain/discuss any structure-function relationship alteration.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others?Yes If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable
Are all the source data underlying the results available to ensure full reproducibility?Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: My area of research is in protein engineering, protein-protein interactions/ biomolecules interactions/ structure-function relationships which include but not limited to DNA manipulation, DNA mutagenesis, cloning and expression, protein purification (various chromatography methods), protein characterization employing different biophysical methods (CD, MALDI-MS, FLuorescence Spectroscopy), Enzyme kinetics.My research interest is in the area of infectious diseases and drug discovery.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Response: We show all the value of thermodynamic characteristics in the mountain plot to show the different on the every position paired.By using plot, we can easily compare the structures.For example, we can easily compare wild-type, IVS1nt1G>T, and IVS1nt5G>C for the MFE as shown in Figure 2(b) that shows IVS1nt5G>C has the lowest value compare with wild-type and IVS1nt1G>T.Therefore, the plot is sufficient to show subtle difference in thermodynamic characteristics.
Figure 3 seems to have significant importance in terms of the structure stability (clinical consequence, for example).Any supporting literature?
Response: We agree with this and have incorporated your suggestion throughout the manuscript."The principle of this methods is to create the secondary structure with minimal base pair distance to all other secondary structures in the Boltzmann ensemble 9 .""Both IVS1nt5G>C and IVS1nt1G>T give implications to the clinical condition due to protein instability.There are some disruptions in producing beta globin chains by these mutations, hence the beta thalassemia major (BTM) patients produce beta globin chain in a limited number or even not at all.This mechanism results in unbalanced alpha and beta globin chains in haemoglobin that expedite the haemolysis and ineffective erythropoiesis contributing to reduced production of mature red blood cells 28 .Thus, the BTM patients requires lifelong blood transfusions and commonly suffer complications, such as endocrine complications and cardiac complications 29 " However, the author did not elaborate more on this.For example, IVS1nt5G>C mRNA structures that showed a significant difference (blue-circled).This needs to be discussed more.
Response: We have updated our discussion for general alteration mainly happens in IVS1nt5G>C."The blue box in Figure 4 highlighted the main alteration formed within the three sequences.However, the structure of IVS1nt5G>C seems to be the best predicted according to the red color in the base-pairing probability." Again, the value needs to be shown here: "In the secondary structure, the higher the number of the larger loops, the more unstable the structure".

Zhichao Miao
European Bioinformatics Institute (EMBL-EBI), Cambridge, UK Nur and colleagues report a computational study of the beta-globin gene mRNA structure based on secondary and tertiary structure prediction.However, the prediction is not reliable enough to fully support the conclusion.Therefore, I suggest to keep the paper in record but not use it as a conclusion.Major and minor comments are listed here: Major comments: The paper is fully based on computational prediction.However, both the prediction approaches are not yet accurate enough to draw a conclusion without experimental validation.

○
For secondary structure prediction, MFE (RNAfold) only consider the cis-Watson-Crick base pair situation without pseudoknots.However, different sequences that can be aligned together are expected to have the same secondary structure.Using the sequence coevolution information may further improve secondary structure prediction accuracy.That is why ViennaRNA has RNAalifold to predict 2D structure from sequence alignment.Considering the modeling only includes point mutations, one cannot rule out the possibility that both wild type and mutants use the same structure but the mutation point use a base pair conformation different from cis-Watson-Crick (e.g., WC-sugar).In this case, the main topology of the structure should stay the same.The authors show the mutation in Figure 1, but do not show the positions paired with the mutational position.

○
According to the 2D structures predicted in Figure 3, the proposed structures are not likely to fold in 3D.To improve, I suggest: 1) Predict the existence of Pseudoknots; 2) Check if the mutants can use the same 2D structure according to the base pair; 3) Use multiple sequence alignment (MSA) to predict secondary structure and show the co-evolution.
For 3D structure prediction, it is difficult for an automatic method (e.g., RNAComposer) to handle a structure of more than 150nt without a good secondary structure assignment.This paper tries to use RNAComposer to predict a structure of >350nt.This is beyond the current capability of automatic prediction.From Figure 5, we can find severe atomic clashes, which are not expected in 3D structures.To improve, I suggest to focus on the mutational region, use the MSA-based 2D prediction result to predict 3D structure, and use energy minimization (e.g., Rosetta) to optimize the 3D structure.However, the only information from 3D structure prediction is the feasibility of the 2D structure rather than the functional inferences expected from this paper.

Minor comments:
The free energy calculation is not part of this paper.The authors should just cite the related publications rather introducing the ideas. 1.
The 3D structure prediction method should be spelt as "RNAComposer ". 2.  It could be good to share the predicted structure information to increase the reproducibility.

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility?Partly

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: RNA structure prediction, single cell omics I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

The benefits of publishing with F1000Research:
Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com

Figure 1 .
Figure 1.Regional and sequence detail of mutant human beta-globin (HBB) gene.(A) Details of HBB showing the target mutations.(B) Sequence alignment of wild-type and mutant HBB was performed using ClustalW Multiple alignment in BioEdit version 7.2, mutations were highlighted in blue.

Figure 2 .
Figure 2. Mountain plots of the thermodynamic characteristics of the predicted RNA secondary structures of the wild-type and mutant HBB.(A) Centroid, (B) minimum free energy, and (C) partition function.

Figure 3 .
Figure 3. Centroid-based pair probability of the human beta-globin gene.(A) Wild-type, (B) IVS1nt1G>T, and (C) IVS1nt5G>C.Structural changes of RNA are highlighted by blue boxes.
/doi.org/10.5256/f1000research.28654.r70559© 2020 Miao Z.This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

:Figure 5
Figure 5 seems to differ significantly between WT and the mutants, but the author stated otherwise?Except this statement: "excluding a small change as presented in the lower left part".This needs to be clarified and explain/discuss any structure-function relationship

Figure
Figure 1B should include the base pair information (2D structure) of the alignment.3.

Figure 4
Figure 4 needs more explanation.Its meaning is not clear.5.