In silico prediction of structure and function for a large family of transmembrane proteins that includes human Tmem41b

Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'/'DedA'). One prominent member, Tmem41b, has been shown to be involved in the early stages of autophagosome formation, is vital in mouse embryonic development and has been identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl⁻/H⁺ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate, possibly using H⁺ antiporter activity as their mechanism for transport.


Introduction
A protein's structural information is crucial to understanding its function and evolution. Currently, there is experimental structural data for only a tiny fraction of proteins (Khafizov et al., 2014). For instance, membrane proteins are encoded by 30% of the protein-coding genes of the human genome (Almén et al., 2009), but they have only a 3.3% representation in the Protein Data Bank (PDB) (5785 membrane proteins out of 174507 PDB entries). Membrane protein families are particularly poorly understood due to experimental difficulties, such as over-expression, which can result in toxicity to host cells (Grisshammer & Tate, 1995), as well as difficulty in finding a suitable membrane mimetic to reconstitute the protein. Additionally, membrane proteins are much less conserved across species than water-soluble proteins (Sojo et al., 2016), making sequence-based homologue identification a challenge and in turn rendering homology modelling of these proteins more difficult. Membrane proteins can be grouped according to their interaction with various cell membranes: integral membrane proteins (IMPs) are permanently anchored, whereas peripheral membrane proteins transiently adhere to cell membranes. IMPs that span the membrane are known as transmembrane proteins (TMEMs), as opposed to IMPs that adhere to one side of the membrane (Fowler & Coveney, 2006). Membrane proteins also include various lipid-modified proteins (Resh, 2016).
One IMP protein family is Tmem41, which has two human representatives, namely Tmem41a and Tmem41b; both share the PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'/'DedA') Pfam (El-Gebali et al., 2019) domain. The profile of Tmem41b has recently risen due to experimental evidence pointing to its involvement in macroautophagy regulation (making it a possible Atg protein, i.e. an autophagy-related protein) and lipid mobilisation (Moretti et al., 2018). Other studies implicate Tmem41b in motor circuit function, with TMEM41B-knockout Drosophila showing neuromuscular junction defects and knockout zebrafish showing aberrant motor neuron development (Lotti et al., 2012). It has also been reported that Zika virus replication is inhibited in TMEM41B-knockout HeLa cells (Scaturro et al., 2018). Tmem41b has further been identified as a host cell factor for SARS-CoV-2 (Schneider et al., 2020). Tmem41b is the only common host cell factor identified for flaviviruses and coronaviruses and is the only autophagy-related protein identified as a viral host factor (Hoffmann et al., 2021).
Additionally, Tmem41b has been shown to be essential for mouse embryonic development: homozygous knockout mouse embryos suffer early termination of their development after [...]. Phenotypically, DedA knock-out E. coli cells display increased temperature sensitivity, cell division defects, activation of envelope stress pathways, compromised proton motive force, sensitivity to alkaline pH and increased antibiotic susceptibility (Doerrler et al., 2013; Keller et al., 2014). As E. coli expresses multiple DedA homologues, lethal effects are not observed as long as at least one DedA is expressed (Kumar & Doerrler, 2014; Thompkins et al., 2008). Borrelia burgdorferi contains only one DedA protein in its genome and knockout cells display the same phenotype as the E. coli knockout strains. The B. burgdorferi homologue is indeed essential (Liang et al., 2010). Interestingly, E. coli knockout cells can be rescued with the B. burgdorferi homologue, which shows only 19% sequence identity with YqjA. The functions of DedA have also been studied in the pathogen Burkholderia thailandensis, where one family member was found to be required for resistance to polymyxin (Panta et al., 2019).

Amendments from Version 1
Input from the referees led to the conclusion that the re-entrant PDBTM screen needed to be reimplemented: using re-entrant loop sequences to perform the screen may not be appropriate because of the poor sequence similarity between re-entrant loops, and a structural comparison was considered more informative. Consequently, PDB structures of the loops were used for the clustering exercise. The boundaries for the experimentally determined structures were extracted from the PDBTM and the boundaries for the models were predicted using the OMP server. As this investigation focused on re-entrant loops that are immediately preceded by a TM helix that is packed against the re-entrant loop, all re-entrant loops together with the preceding 30 residues were extracted from a non-redundant, re-entrant loop-containing subset of the PDB. The resulting 193 library entries, supplemented with the re-entrant loop features from the ab initio models, underwent an all-against-all structural alignment using Dali. The Z-scores for these alignments were then used to cluster all the structures. The reimplemented screen resulted in the query re-entrant loop feature structures clustering with the re-entrant loop features of Cl⁻/H⁺ antiporters; this was a similar result to the original sequence-based clustering.
An additional figure has been added to the manuscript showing a multiple sequence alignment for a selection of DedA domain proteins. The alignment has been annotated to highlight the relative positions of the DedA and the PF06695 domains as well as the re-entrant loop positions for the example DedA proteins that were modelled.
The amphipathic helix prediction test paragraph in the results section has been re-written for clarity.
Finally, in addition to the correction of typographical errors, the citations have been updated as recommended by the referees and to reflect the changes in the experimental procedure.

Until the structure of poorly characterised protein families such as Pfam family PF09335 can be elucidated experimentally, ab initio protein modelling can be used to predict a fold, allowing for structure-based function inferences (Rigden et al., 2017). Such methods have made significant strides recently due to the availability of contact predictions (Kinch et al., 2016). Prediction of residue-residue contacts relies on the fact that each pair of contacting residues covaries during evolution. Co-variation occurs because the properties of the two residues complement each other in order to maintain the structural integrity of that local region and, consequently, its original functionality. Therefore, if one residue of the pair is replaced, the other must also change to compensate for the physicochemical variation and hence preserve the original structure (Lapedes et al., 1999). The link between two residues can then be reliably detected in multiple sequence alignments by using direct coupling analysis (Morcos et al., 2011) as well as machine learning algorithms (Wu et al., 2020). The predicted contacts can be used for a range of analyses such as the identification of domain boundaries (Rigden, 2002; Simkovic et al., 2017a), but their main application is in contact-based modelling methods, which can address larger targets than conventional fragment-assembly-based ab initio methods (Yang et al., 2020). Contact-based modelling methods have previously proven successful in modelling membrane proteins (Hopf et al., 2012).
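The idea that contacting residues covary can be illustrated with a minimal sketch. It scores pairs of alignment columns by mutual information, which is a deliberately simpler stand-in for direct coupling analysis (DCA additionally separates direct from indirect correlations and is not reproduced here). The toy alignment and the function names are hypothetical and are not taken from the study.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 3 swap charge together (K/E),
# column 1 is invariant and column 2 varies independently of the others.
toy_msa = ["KAGE", "KALE", "EAGK", "EALK"]
ranked = rank_column_pairs(toy_msa)
top_pair, top_score = ranked[0]
```

In this toy case the only pair with non-zero mutual information is the compensating K/E pair, which is the kind of signal that contact predictors exploit on real, much deeper alignments.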
In the current study, we first linked the Pfam PF09335 family to the PF06695 family and chose a conveniently small archaeal sequence. We then utilised state-of-the-art methods to make structural predictions not only for the archaeal sequence but also for two prominent members of the Pfam family PF09335 (Tmem41b and YqjA) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling. We are able to predict that both PF09335 homologues (DedA proteins) and PF06695 homologues contain re-entrant loops (stretches of protein that enter the bilayer but exit on the same side of the membrane) as well as a pseudo-inverted repeat topology. The predicted presence of both of these structural features strongly suggests that DedA proteins are secondary active transporters for an uncharacterised substrate.
Dataset for custom re-entrant database
A library of re-entrant loop PDB structures, together with the putative re-entrant loop structures from the query protein models, was clustered on structural similarity. The library was built by obtaining a non-redundant set (redundancy removed at a 40% sequence identity threshold) of 125 chains from the PDBTM (RRID:SCR_011962) (Kozma et al., 2013) that contain at least one re-entrant loop. As this investigation focuses on re-entrant loops that are immediately preceded by a TM helix that is packed against the loop, all re-entrant loops (boundaries defined by PDBTM) together with the preceding 30 residues were extracted. The resulting 193 library entries (https://figshare.com/articles/dataset/repository_zip/14055212), supplemented with the re-entrant loop features from the ab initio modelling (defined by the OMP server (Lomize et al., 2012) and accompanied by the preceding 30 residues), underwent an all-against-all structural alignment using a local installation of Dali v4.0 (Holm & Laakso, 2016).
The Z-scores for these alignments were then used for clustering with CLANS v1.0 (Frickey & Lupas, 2004) with a Z-score of 4.5 used as the cut-off threshold.
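The thresholding step can be sketched as follows. This is not the CLANS algorithm, which lays out entries using a force-directed procedure; it is a simpler stand-in that links any two structures whose pairwise Z-score reaches the cut-off and reports the connected groups. The entry names and Z-scores are invented for illustration, and treating the 4.5 cut-off as inclusive is an assumption.

```python
from collections import defaultdict


def cluster_by_zscore(pairwise_z, cutoff=4.5):
    """Group structures into connected components, linking pairs with Z >= cutoff."""
    neighbours = defaultdict(set)
    names = set()
    for (a, b), z in pairwise_z.items():
        names.update((a, b))
        if z >= cutoff:  # assumption: the cut-off is inclusive
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for name in sorted(names):
        if name in seen:
            continue
        # Depth-first walk over everything reachable from this entry.
        stack, component = [name], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical Z-scores for two query loops and two library entries.
toy_scores = {
    ("query_loop_1", "query_loop_2"): 9.8,
    ("query_loop_1", "library_A"): 5.1,
    ("query_loop_2", "library_A"): 4.2,
    ("query_loop_1", "library_B"): 1.3,
    ("query_loop_2", "library_B"): 0.9,
    ("library_A", "library_B"): 2.0,
}
clusters = cluster_by_zscore(toy_scores)
```

Note that library_A joins the query cluster through a single link above the cut-off, which is why a stringent threshold matters for this kind of single-linkage grouping.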

Model building
Ab initio models were built using the trRosetta (Yang et al., 2020) server with default settings. Conservation was mapped on to the models using the ConSurf server (Ashkenazy et al., 2016). Visualisation of models was achieved using PyMOL (RRID:SCR_000305) v2.3.0 (DeLano, 2002).
[...] (Figure 2). Both the HHpred and contact density results therefore pointed to a specific domain structure being present.
Sequence & contact prediction map analysis indicate that PF06695 is made up of a tandem repeat
When the Mt2055 sequence was split between residues 60 and 61, the resulting N-terminal region of 60 residues and the C-terminal section of 79 residues could be aligned using HHalign (Soding, 2005) with a 78% probability and an E-value of 1.9E-3. Examination of the map of predicted contacts for Mt2055 reveals features that are present in both the N- and C-terminal halves of the protein (Figure 2c). Taken together, these data strongly support the existence of a tandem repeat within the Mt2055 protein and hence across the PF06695 and PF09335 protein families.
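The splitting step that precedes such a half-against-half alignment can be sketched as below. The sequence is a placeholder of 139 residues (60 + 79, assuming the C-terminal section runs to the end of the protein), not the real Mt2055 sequence, and the function and record names are hypothetical. The two FASTA records would then be passed to an alignment tool such as HHalign.

```python
def split_sequence(sequence, last_n_terminal_residue):
    """Split a sequence after the given 1-based residue number."""
    return sequence[:last_n_terminal_residue], sequence[last_n_terminal_residue:]


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Placeholder only: 139 'X' characters standing in for the real sequence.
placeholder_sequence = "X" * 139
n_half, c_half = split_sequence(placeholder_sequence, 60)

n_record = to_fasta("query_N_half_1-60", n_half)
c_record = to_fasta("query_C_half_61-139", c_half)
```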
Interestingly, an equivalent sequence analysis with HHpred of other PF09335 homologues, including Tmem41b itself, does not reveal a repeat. However, inspection of their corresponding predicted contact maps does reveal features that are repeated when the N- and C-halves of the protein are compared (Figure 3). Apparently, evolutionary divergence has removed all trace of the repeat sequence signal in bacterial and eukaryotic proteins, although the feature remains visible by evolutionary covariance analysis.
[...] not reflect the conserved structural domain that we predict. Given that the available ab initio models were inconsistent with the transmembrane helix, secondary structure and contact predictions, we constructed our own models of Mt2055 as well as Tmem41b and YqjA with trRosetta (https://figshare.com/articles/dataset/repository_zip/14055212). The Mt2055, Tmem41b and YqjA models had estimated TM-scores from the trRosetta server of 0.633, 0.624 and 0.635 respectively, suggesting that they were likely to have captured the native fold of the family. All-against-all pairwise structural superposition of the models with DALI gave a mean Z-score of 11.9, confirming their strong similarity. We also used satisfaction of predicted contacts to validate the models ([...]).
The presence of a re-entrant loop packed against each TM helix can also be seen on predicted contact maps for these proteins (Figure 4b). Interestingly, each of the re-entrant helices is predicted as a single transmembrane region in the TopCons predictions. When cross-referenced with the PSIPRED secondary structure prediction, there is a predicted two-residue region of coil around the mid-point of the first TM helix prediction.
A similar observation can be made for the fourth TM helix prediction, with the equivalent coil region being six residues in length (see the diagonal of Figure 4b). Such a prediction would more usually be treated as indicative of some kind of kink in the helix (Law et al., 2016), but the explanation here is that these regions form re-entrant helices. Similar contact map features, indicative of re-entrant loops packing against TM helices, can be seen clearly on the contact maps of other DedA proteins (data not shown). The MSA in Figure 1 shows the relative positions of the re-entrant loops in their respective sequences.
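Satisfaction of predicted contacts, mentioned above as a model validation step, can be sketched as the fraction of predicted residue pairs that lie within a distance cut-off in the model. The 8 Å cut-off between representative atoms (typically Cβ) is a common convention and an assumption here, since the threshold used in the study is not stated in this text; the coordinates and contact list are invented.

```python
import math


def contact_satisfaction(coords, predicted_contacts, cutoff=8.0):
    """Fraction of predicted residue pairs closer than `cutoff` (in Å) in a model.

    coords: dict mapping residue number -> (x, y, z) of a representative atom.
    predicted_contacts: list of (residue_i, residue_j) pairs.
    """
    if not predicted_contacts:
        raise ValueError("no predicted contacts supplied")
    satisfied = sum(
        1 for i, j in predicted_contacts
        if math.dist(coords[i], coords[j]) <= cutoff
    )
    return satisfied / len(predicted_contacts)


# Hypothetical coordinates (Å) for four residues of a model.
toy_coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    10: (5.0, 5.0, 0.0),
    30: (20.0, 0.0, 0.0),
}
toy_contacts = [(1, 10), (2, 10), (1, 30), (2, 30)]
fraction_satisfied = contact_satisfaction(toy_coords, toy_contacts)
```

In practice the coordinates would be read from the model's PDB file and the contact list taken from the highest-scoring predictions.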
To test whether the membrane-parallel helices (green in Figure 3) were amphipathic, an analysis of helical wheel diagrams for the fifteen residues preceding the putative re-entrant loops was performed with HELIQUEST (Gautier et al., 2008). The quantitative measures of the hydrophobic moment for the regions analysed (Figure 5) support the conclusion that they are indeed amphipathic helices. The hydrophobic moments ranged from 0.298 to 0.546.
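The hydrophobic moment can be sketched with the Eisenberg formulation, in which each residue's hydrophobicity is treated as a vector rotated by 100° per position along an ideal α-helix and the vector sum is divided by the number of residues. The sketch below uses the Kyte-Doolittle scale purely for illustration; HELIQUEST uses a different hydrophobicity scale, so the values produced here are not comparable with the 0.298 to 0.546 range reported above. The example peptides are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumption for this sketch; not the
# scale used by HELIQUEST, so absolute numbers differ from those in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE, angle_deg=100.0):
    """Mean hydrophobic moment of a peptide modelled as an ideal alpha-helix."""
    angle = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(n * angle) for n, aa in enumerate(sequence))
    cos_sum = sum(scale[aa] * math.cos(n * angle) for n, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum) / len(sequence)


# Hypothetical peptides: an idealised amphipathic pattern and a uniform helix.
amphipathic_moment = mean_hydrophobic_moment("LKKLLKKLLKKLLKKL")
uniform_moment = mean_hydrophobic_moment("L" * 18)
```

A helix with hydrophobic and polar residues segregated on opposite faces gives a large moment, whereas a uniformly hydrophobic helix of 18 residues (five full turns at 100° per residue) gives a moment of essentially zero.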
The predicted presence of the amphipathic helix/re-entrant loop/TM helix features in DedA domain proteins prompted us to map sequence conservation onto the ab initio models. Mapping sequence conservation onto the query models with the ConSurf server revealed that the re-entrant loop sequences are highly conserved (Figure 6). This high sequence conservation indicates that the re-entrant loops are likely to be functionally and/or structurally important.

Re-entrant loops are also present in Cl⁻/H⁺ antiporters
The presence of re-entrant loops and the high density of conserved residues within them prompted us to examine experimentally characterised re-entrant loops in the PDBTM database. A total of 193 non-redundant re-entrant helices were identified (see Methods). All 193 were compared with the putative re-entrant loops from Mt2055, Tmem41b and YqjA using Z-scores derived from an all-against-all DALI run and subsequently clustered in CLANS (Frickey & Lupas, 2004) with a Z-score cut-off of 4.5.
As expected, all six re-entrant structures from the query models clustered together. The CLC transporter re-entrant structures of 3orgA (re-entrant 1 and re-entrant 2), 7bxu and 5tqq also clustered with the queries. Additionally, the re-entrant structure from an undecaprenyl pyrophosphate phosphatase (UppP) (6cb2) clustered with the queries. UppP is an integral membrane protein that recycles lipid and has structural similarities to CLC transporters (Workman et al., 2018). Contact maps derived from the PDB files of CLC and UppP structures show the contact map signature corresponding to the re-entrant/TM helix structural feature. Interestingly, UppP is more similar to the query proteins, being only 271 residues in length and having only 6 TM helices.
[...] (Figure 8). Re-entrant loops are known to form pores, and here we have two proton-titratable residues (E39, D51) in close proximity to essential basic residues (R130 and R136) within a putative pore. This three-dimensional arrangement of key residues could serve a role in coupling protonation status with the binding of a yet to be characterised substrate, as is postulated for the multi-drug H⁺ antiporter MdfA (Heng et al., 2015), where these same residues are located inside a central cavity.

Conclusions
This study demonstrates how covariance prediction data have multiple roles in modern structural bioinformatics: not just by acting as restraints for model making and serving for validation of the final models but by predicting domain boundaries and revealing the presence of cryptic internal repeats not evidenced by sequence analysis. Furthermore, we characterised a contact map feature characteristic of a re-entrant helix which may in future allow detection of this feature in other protein families.
Sequence, co-variance and ab initio modelling analyses show that the Pfam PF09335 and PF06695 domains are distantly homologous. These domains contain a structural core composed of a pseudo-inverse repeat of an amphipathic helix, a re-entrant loop and a TM helix. All PF09335 homologues contain this central core with additional TM-helices flanking either side.
Querying the models against the PDB using Dali did not yield any significant hits. However, analysis of the prediction data revealed two features of DedA proteins that independently suggest that they are secondary transporters: an inverted repeat architecture and the presence of a re-entrant loop, both of which are independently and strongly associated with transporter function ([...]). Additionally, the fact that DedA proteins show structural similarities with H⁺ antiporters indicates that these proteins may also couple substrate transport with an opposing H⁺ current. Indeed, the YqjA homologue also contains strategically placed residues known to be involved in H⁺ antiporter activity.
The ab initio models show that the essential residues come together in the region that would be buried in the membrane potentially forming a substrate chamber consistent with the transport of a specific substrate. Further research needs to be carried out to determine what this substrate is and confirm the mechanism of transport.

Open Peer Review
However, two points are unclear in the text and need to be clarified, plus, some minor changes will improve the readability of the manuscript. Finally, the models could be made available to ensure full reproducibility.

Major:
The sentence: "The analysis was performed by HELIQUEST (Gautier et al., 2008) which constructed helical wheel diagrams and provided a quantitative measure of the hydrophobic moment for the region being analysed (Figure 4)." is out of context. In that paragraph are described the reentrant helices, shouldn't the sentence (and the figure) be in the paragraph before where are mentioned the amphipathic helices? The figure discussion in the text should be extended.
○ We agree. Colouring on the B-factor column directly produces the results we show but the new blue-purple spectrum does seem to be well-adopted. We have therefore replaced the figure and updated the legend to read [...]

László Dobson
Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary

Gábor Tusnády
Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
In this manuscript Mesdaghi et al. describe the in silico structure modeling of three homologous integral membrane proteins: Mt2055, YqjA and human Tmem41b. Structure determination of transmembrane proteins lags behind that of globular ones for several reasons, giving space for computational tools. The proper use of these tools may unveil important structural aspects of transmembrane proteins, but interpretation of the results of such analysis should be done carefully.
While the generated models in the manuscript are interesting and might be fully or partly true, the sequence analysis and interpretation of the results are problematic.
Major: -The authors should be more specific about the exact boundaries of Pfam domains in different proteins as well as the sequence relations of proteins presented in Table 1. Please provide multiple sequence alignment for these proteins indicating the localization of the two pfam domains and the proposed re-entrant loops/transmembrane regions in the sequences.
-The authors propose Mt2055 contains a tandem repeat and suggest the duplication is present in Tmem41b and Yqja structure as well even if it is undetectable from sequence analysis. The proposed domain boundary in Figure1a and arguments for tandem duplication does not seem convincing. The e-value of 1.9E-3 is quite large for the alignment. The authors should rule out that results in their paper may occur purely by chance. Please test the statistical significance of this value by generating pairwise alignments of transmembrane regions of unrelated transmembrane proteins with similar length. Moreover, contact maps for Mt2055 and Tmem41b were generated from the same multiple alignment, and therefore they must be identical/similar. Thus the similarities does not prove the tandem duplication occurred in Tmem41b too.
-Structure modeling of membrane proteins is somewhat different from globular ones for several reasons. It is highly recommended to use specific software for this task or argue why used a nonspecific one. On one hand, in general, topology prediction is more accurate than structure modeling and should be used as an input to aid the modeling. The reviewer is not sure the result of a standard ab initio structure modeling program is sufficient to question topology prediction results. On the other hand, topology prediction results are different for Tmem41b (6 TM helix) and Mt2055 (4 TM helix). Notably, other consensus topology method (CCTOP) have a similar result for Mt2055 (4 helix), but different for Tmem41b (6 helix). Using a third method (Octopus) a re-entrant loop is predicted. The authors should elaborate on such results instead of picking one method and running it on only one of the sequences.
-Authors state: "For many of the subsequent analyses, the shorter archaeal sequence was used initially but the clear homology among this set of proteins means that inferences can be drawn across the group." -Please provide the used multiple sequence alignment with pairwise similarities to support this statement.
-It is not clear how helical wheels and hydrophobic moments support the manuscript -please provide a better description or omit these results.
-Problems/Validation of re-entrant loops: The authors selected 56 sequence regions from the PDBTM database, ran an all-against-all Blast search and created clusters based on the search results. Since the sequence complexity of membrane regions is lower than that of regions of globular proteins, the analysis should be repeated on randomly selected transmembrane segments. Please provide the list of the selected 56 re-entrant loops together with the results of the repeated analysis.
○ Authors state: "The presence of a re-entrant loop packed against each TM helix can also be seen on predicted contact maps for these proteins (Figure 3b)." Re-entrant loops cannot be seen on contact map, only parallel and anti-parallel structures. A similar contact map can be easily generated from 3 transmembrane helices (1 parallel pair and two anti-parallel ones).

○ The authors filtered removing any sequences of less than 10 residues and more than 20. Although the exact sequence localisation and length of the predicted re-entrant loop are not provided, the regions indicated as the "sign" of re-entrant loops on Figure 3b is larger than 20 residues and on the structures the orange regions contain 7 turns, thus the sequence length of them should be more than 20 residues (7*3.5=24.5).
Minor:
○ Abstract/Results: "The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis." -As I showed above, this statement might not be true. Moreover evolutionary covariance data is the results of sequence analysis, so this sentence is void of sense. Please rephrase.
○ Introduction: "In the current study, we utilised state of the art methods to make structural predictions for two prominent members of the Pfam family PF09335 (Tmem41b and Yqja) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling." -The most part of the manuscript deal with the sequence analysis of Mt2055, please rephrase this sentence in order to mirror this fact.
○ "Interestingly, each of the re-entrant helices is predicted as a single transmembrane region in the TopCons predictions (see the diagonal of Figure 3b) with a two-residue region of coil in the centre." -TOPCONS does not predict coils and such details cannot be seen on the figure -please clarify this sentence.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? No

Major: -The authors should be more specific about the exact boundaries of Pfam domains in different proteins as well as the sequence relations of proteins presented in Table 1. Please provide multiple sequence alignment for these proteins indicating the localization of the two pfam domains and the proposed re-entrant loops/transmembrane regions in the sequences.
Yes, we agree that an MSA is useful. An MSA generated using PSI/TM-COFFEE has been added as Figure 1 to the manuscript. The Pfam domains in question, as well as the putative re-entrant loops for the modelled proteins, have been highlighted to illustrate their relative positions.

Utilisation of HHalign does result in an e-value of 1.9E-3 which on its own is not compelling.
However, as highlighted, HHalign also expressed a probability score (a measure of statistical significance) of 78% which the software developers argue is a better indicator of significance than the e-value alone. Arguably the above findings alone do not provide absolutely conclusive evidence of the presence of a repeat. However, reinforcing these findings we have the repeat that is revealed by the plotting of the predicted contacts and, consequently, the inverse repeat that is witnessed by the modelling.
Moreover, contact maps for Mt2055 and Tmem41b were generated from the same multiple alignment, and therefore they must be identical/similar. Thus the similarities does not prove the tandem duplication occurred in Tmem41b too.
Interesting point; however, we can confirm that the MSAs used to generate the contact maps for Mt2055 and Tmem41b were not identical. The MSAs used by DMP were constructed independently using HHblits against the UniProt database. The manuscript used the predicted contacts from the server, and the MSAs generated to make the contact predictions are not made available to download with the results. However, performing the contact prediction locally with the same HHblits settings as the DMP server generates MSAs with 5000 sequences for each of the query proteins. The predicted contact maps are very similar to those presented in the paper, yet analysis reveals that the MSAs had only 1010 sequences in common.
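The overlap comparison described in this reply can be sketched with simple set operations on sequence identifiers. The identifiers below are synthetic and sized only to reproduce the counts quoted above (two alignments of 5000 sequences sharing 1010); they are not the actual alignment contents, and the function name is hypothetical.

```python
def msa_overlap(ids_a, ids_b):
    """Summarise how many sequence identifiers two alignments share."""
    set_a, set_b = set(ids_a), set(ids_b)
    shared = set_a & set_b
    return {
        "shared": len(shared),
        "fraction_of_a": len(shared) / len(set_a),
        "fraction_of_b": len(shared) / len(set_b),
        "jaccard": len(shared) / len(set_a | set_b),
    }


# Synthetic identifiers: 5000 per alignment, 1010 of them in common.
alignment_a_ids = [f"seq_{i}" for i in range(0, 5000)]
alignment_b_ids = [f"seq_{i}" for i in range(3990, 8990)]
overlap = msa_overlap(alignment_a_ids, alignment_b_ids)
```

With these counts, roughly one fifth of each alignment is shared, which is the basis for the argument that the similar contact maps were not derived from the same data.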
-Structure modeling of membrane proteins is somewhat different from globular ones for several reasons. It is highly recommended to use specific software for this task or argue why used a non-specific one.
At the beginning of this project we had similar thoughts to you; therefore, Rosetta membrane was initially used to build the models. However, the membrane protocol 'forced' TM helices where it was later clear from contact map analysis that re-entrant loops should be present. It was therefore decided to use contact-restrained modelling software with proven success in ab initio modelling of membrane proteins. Both DMPfold (local and server) and trRosetta (server) models were constructed and similar folds were observed. We note that the DMPfold paper benchmarked the method on transmembrane proteins and explicitly says it 'works just as well for transmembrane proteins' (https://www.nature.com/articles/s41467-019-11994-0). The trRosetta method was benchmarked against CASP13 targets, which included transmembrane proteins (https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.25775). The use of these covariance-based methods for membrane proteins has a long history, so the following citation has been included in the revised manuscript: Hopf, T. A., Colwell, L. J., Sheridan, R., Rost, B., Sander, C., & Marks, D. S. (2012). Three-dimensional structures of membrane proteins from genomic sequencing. Cell, 149(7), 1607-1621.
On one hand, in general, topology prediction is more accurate than structure modelling and should be used as an input to aid the modelling. The reviewer is not sure the result of a standard ab initio structure modelling program is sufficient to question topology prediction results. On the other hand, topology prediction results are different for Tmem41b (6 TM helix) and Mt2055 (4 TM helix). Notably, other consensus topology method (CCTOP) have a similar result for Mt2055 (4 helix), but different for Tmem41b (6 helix). Using a third method (Octopus) a re-entrant loop is predicted. The authors should elaborate on such results instead of picking one method and running it on only one of the sequences.
The different membrane topology prediction tools were used initially to predict the TM helix boundaries for the query proteins. We observed the same variability between the results of the different methods as you did. It was the variability of the topology predictions, in addition to the contact map features, that led to the conclusion that something other than straightforward TM helices is present in the Pfam domains in question. Indeed, TMHMM does show lower-probability TM helix predictions for the regions that the contact prediction and model making predict to be re-entrant loops.
To investigate further, visual representations of the membrane topology from TopCons and the psipred secondary structure prediction were plotted along the diagonal of the contact prediction for the query proteins. This clearly highlights that the N and C halves of the predicted TM helices in question are making contact with each other (by a length of around 10 residues). Additionally, the secondary structure plot shows an interruption at the halfway point of the predicted TM helices which would account for the abrupt change in direction of helix in the membrane.
Additionally, we have identified a crystal structure that is comparable in terms of size (293 residues) and has common structural features (inverted repeat with 2 re-entrant/TM helix structures) to our query proteins: 6cb2. For this protein, the TopCons topology prediction was compared to the actual topology of the crystal structure.
The above figure shows actual contacts for 6cb2 (black points) and a visual representation of the [...]

-Authors state: "For many of the subsequent analyses, the shorter archaeal sequence was used initially but the clear homology among this set of proteins means that inferences can be drawn across the group." -Please provide the used multiple sequence alignment with pairwise similarities to support this statement.
A multiple sequence alignment is now provided as Figure 1 and we note that all query sequences share the same Pfam domains so their homology is assured.
-It is not clear how helical wheels and hydrophobic moments support the manuscript; please provide a better description or omit these results.
- Yes, we agree; the paragraph did seem out of place as well as unfinished. In response, we have reworked the paragraph in question, providing more clarity and analysis for the amphipathic analysis of the queries; "In order to test for the presence of the amphipathic helices, an analysis of helical wheel diagrams for the fifteen residues preceding the putative re-entrant loops was performed with HELIQUEST (Gautier et al., 2008).
Yes, we agree; the imposition of the re-entrant loop boundaries for the ab initio models was relatively arbitrary. Therefore, the re-entrant loop screen against the PDBTM has been completely re-implemented. Membrane boundaries for the models have now been predicted using the OMP server. These boundaries provide the lengths of the putative re-entrant loops. Consequently, it is now recognised that the 20-residue 'typical length' of re-entrant loops may not be valid for the query models, and that the filtering of the larger loops for the clustering stage of this research had weak justification. A list of the 125 chains from which the re-entrant structures were extracted will be made available in a repository.

• Authors state: "The presence of a re-entrant loop packed against each TM helix can also be seen on predicted contact maps for these proteins (Figure 3b)." Re-entrant loops cannot be seen on a contact map, only parallel and anti-parallel structures. A similar contact map can easily be generated from 3 transmembrane helices (1 parallel pair and two anti-parallel ones).
Yes, a similar contact map feature can easily be generated from 3 transmembrane helices; however, this would result in a box feature of around 20x20 residues (reflected across the diagonal). Since the re-entrant loop is making contact with itself, this can only result in an approximately 10-residue antiparallel feature on the contact map. Only approximately half of the TM helix that is packed against the re-entrant helix will be making contact with the re-entrant loop; this would therefore result in a further 10-residue antiparallel feature in addition to a 10-residue parallel feature. Together with the diagonal, these will display an approximately 10x10 box feature (also reflected across the diagonal) on the contact map, rather than the 20x20 box feature that three transmembrane helices (1 parallel pair and two anti-parallel ones) would produce. This can be seen below;
Minor:
• Abstract/Results: "The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis." -As I showed above, this statement might not be true. Moreover, evolutionary covariance data are the result of sequence analysis, so this sentence is void of sense. Please rephrase.
We do not agree with this statement, as sequence comparisons and covariance comparisons are alternative methods to identify tandem repeats. Yes, covariance is derived from sequence analysis; however, covariance data contain information that may not be present in data acquired from conventional sequence analysis.
Thank you, corrected.

• Introduction: "In the current study, we utilised state of the art methods to make structural predictions for two prominent members of the Pfam family PF09335 (Tmem41b and Yqja) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling." -Most of the manuscript deals with the sequence analysis of Mt2055; please rephrase this sentence to reflect this fact.
Thank you, the introduction has been updated to reflect the emphasis on the PF06695 Pfam domain and its representative Mt2055; 'In the current study, we first linked the Pfam PF09335 family to the PF06695 family and chose a conveniently small Archaeal sequence and then utilised state of the art methods to make structural predictions for not only the Archaeal sequence but also for two prominent members of the Pfam family PF09335 (Tmem41b and Yqja) by exploiting data derived from sequence, evolutionary covariance and ab initio modelling. We are able to predict that both PF09335 homologues (VTT proteins) and PF06695 homologues contain re-entrant loops (stretches of protein that enter the bilayer but exit on the same side of the membrane) as well as a pseudoinverted repeat topology. The predicted presence of both of these structural features strongly suggests that VTT proteins are secondary active transporters for an uncharacterised substrate.'
• "Interestingly, each of the re-entrant helices is predicted as a single transmembrane region in the TopCons predictions (see the diagonal of Figure 3b) with a two-residue region of coil in the centre." -TOPCONS does not predict coils and such details cannot be seen on the figure -please
used, their prediction do align with published experimental work. The manuscript is well written and informative, but with a number of factual errors. I also suggest additional citations.
I would like to begin with nomenclature. I received an email from Dr. Noboru Mizushima several months ago. He has published work on the TMEM41B protein. Also included on the email were Lucy Forrest, Dirk Schneider, and Rebecca Keller. It was Dr. Mizushima's suggestion to name this protein family the "DedA superfamily", which includes both prokaryotic and eukaryotic proteins (DedA, VMP, and TMEM41 families). Accordingly, the shared domain will be called the "DedA domain", and "VTT domain" would no longer be used. All recipients of this email agreed to use this nomenclature moving forward. Therefore, to avoid confusion, I would like the authors to adopt this nomenclature. I can forward the email upon request.
Since the manuscript contains no line numbers, I will list the suggested corrections by paragraph:

Introduction:
Paragraph 1: Formally, "membrane proteins" also include various lipid-modified proteins of both prokaryotes and eukaryotes in addition to integral and peripheral membrane proteins.
Paragraph 4: "DedA" does not stand for "death effector domain". It was named in a 1987 paper 1. See page 12213 of that article. I would like to see this article cited as well for historical purposes.
The sentence that begins with "Phenotypically, DedA knockout E. coli…" should instead read "Phenotypically, E. coli lacking both yqjA and yghB (encoding proteins with 60% amino acid identity and partially overlapping functions)…." This paragraph should also cite 2.
The sentence that reads "As E. coli expresses multiple DedA homologues, the redundancy protects the cells from the phenotypical effects of single or multiple knock-outs as long as at least one DedA is expressed" should read "As E. coli expresses multiple DedA homologues, lethal effects are not observed as long as at least one DedA is expressed". Cite the following article 3.
You may also point out that the sole DedA family gene in Borrelia burgdorferi is indeed essential 4.
YqjA is misspelled "YdjA".
The sentence "Attempts to rescue…." should be removed, as it does not make sense.
The final sentence about Pseudomonas cites a non-peer-reviewed proceedings abstract. I would like all citations to "Justice et al. 2016" removed from this article. This sentence can be replaced with the equally effective "The functions of DedA have also been studied in the pathogen Burkholderia thailandensis where one family member was found to be required for resistance to polymyxin" 5.
Paragraph 6: "YqjA" is spelled "Yqja" here and throughout the manuscript and should be corrected. This includes Table 1.

Methods:
Paragraph 1: Please spell "Ydjx" and other bacterial proteins as "YdjX" with the final letter capitalized.

Results and discussion:
Paragraph 5: first sentence, remove "however".
Paragraph 14: "A possible role for VTT proteins", final sentence: remove "Justice et al." and instead cite 6,7. Also, in this sentence, define "SDM" as "site directed mutagenesis".
Paragraph 15, first sentence. This statement is incorrect. Mutation of D51, E39, R130 or R136 in YqjA resulted in properly folded (membrane localized) but nonfunctional proteins unable to complement alkaline pH sensitivity of E. coli YqjA mutant and antibiotic sensitivity of YqjA/YghB double mutant.
Finally, another interesting example of a membrane protein antiporter with re-entrant helices is the undecaprenyl pyrophosphate phosphatase UppP. It is up to the authors whether they would like to cite these articles 8,9.
Thank you, this omission has been rectified; 'Membrane proteins can be grouped according to their interaction with various cell membranes: integral membrane proteins (IMPs) are permanently anchored whereas peripheral membrane proteins transiently adhere to cell membranes. IMPs that span the membrane are known as transmembrane proteins (TMEMs) as opposed to IMPs that adhere to one side of the membrane (Fowler & Coveney, 2006). Membrane proteins also include various lipid-modified proteins (Resh, 2016).'
Paragraph 4: "DedA" does not stand for "death effector domain". It was named in a 1987 paper 1. See page 12213 of that article. I would like to see this article cited as well for historical purposes.
Your clarification on this nomenclature is appreciated. We have amended the manuscript to correct this error and have included the reference for its historical importance.
The sentence that begins with "Phenotypically, DedA knockout E. coli…" should instead read "Phenotypically, E. coli lacking both yqjA and yghB (encoding proteins with 60% amino acid identity and partially overlapping functions)…." This paragraph should also cite 2.

○ The sentence that reads "As E. coli expresses multiple DedA homologues, the redundancy protects the cells from the phenotypical effects of single or multiple knock-outs as long as at least one DedA is expressed" should read "As E. coli expresses multiple DedA homologues, lethal effects are not observed as long as at least one DedA is expressed". Cite the following article 3.
where one family member was found to be required for resistance to polymyxin" 5.

Thanks for these helpful suggestions. The inclusion of these points brings more clarity to the paragraph in question; "Borrelia burgdorferi contains only one DedA protein in its genome and knockout cells display the same phenotype as the E. coli knockout strains. The B. burgdorferi homologue is indeed essential (Liang et al., 2010). Interestingly, E. coli knockout cells can be rescued with the B. burgdorferi homologue that shows only 19% sequence identity with YqjA. The functions of DedA have also been studied in the pathogen Burkholderia thailandensis where one family member was found to be required for resistance to polymyxin (Panta et al., 2019)."
○ YqjA is misspelled "YdjA"
○ Paragraph 6: "YqjA" is spelled "Yqja" here and throughout the manuscript and ○