A recall-by-genotype study on polymorphisms in the TMPRSS6 gene and oral iron absorption: a study protocol [version 2; peer review: 2 approved with reservations]

Background: Oral iron supplementation is commonly used to treat and prevent anaemia. The transmembrane protease serine 6 gene (TMPRSS6), which encodes matriptase-2, is a negative regulator of hepcidin, the key controller of iron homeostasis. Genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) in the TMPRSS6 gene that are associated with an increased risk of iron-deficiency anaemia. We will investigate the in vivo effects of three previously reported TMPRSS6 variants (rs855791, rs4820268 and rs2235321) on oral iron absorption in non-anaemic volunteers in The Gambia.
Methods: A recall-by-genotype study design will be employed. Pre-genotyped participants will be recruited from the West African BioResource (WABR), which currently contains over 3000 genotyped individuals. Male and female volunteers will be selected based on three polymorphisms (rs855791, rs4820268 and rs2235321) in the TMPRSS6 gene in the Gambian population. The effects of a single variant allele at one SNP and the additive effect of two or three variant alleles from two or all three SNPs will be investigated. Study participants will be given …
Abbreviations: AGP: alpha-1-acid glycoprotein; CRP: C-reactive protein; FBC: full blood count; G6PD: glucose-6-phosphate dehydrogenase; GWAS: genome-wide association study; IRIDA: iron-refractory iron deficiency anaemia; KWLPS: Kiang West Longitudinal Population Study
The study protocol aims to test associations between three variants (rs855791, rs4820268 and rs2235321) in the TMPRSS6 gene and serum iron concentration after oral iron supplementation. While this study is interesting and the findings are of potential importance, there are some concerns and suggestions that the authors are strongly recommended to address.


Introduction
Despite aggressive implementation of iron supplementation programmes, either alone or in combination with food-based supplementation, the prevalence of anaemia remains high in low- and middle-income countries 1,2 . The World Health Organisation (WHO) has set 2025 as the target date for halving the prevalence of anaemia in women of reproductive age. To achieve this goal, it will be important to identify the major drivers of anaemia.
The transmembrane protease serine 6 gene (TMPRSS6), which encodes matriptase-2, is one of the negative regulators of hepcidin 3 , the key regulator of iron homeostasis 4 . When serum iron levels are low, matriptase-2 suppresses hepcidin expression, allowing more dietary iron to be absorbed from the intestine into the bloodstream 5,6 . A single nucleotide polymorphism (SNP) in the TMPRSS6 gene can decrease the expression or activity of matriptase-2 7 , which would inappropriately elevate hepcidin levels, inhibit iron absorption and thereby increase the risk of anaemia 5 .
Multiple SNPs in the TMPRSS6 gene have been linked to iron-refractory iron deficiency anaemia (IRIDA), a hereditary anaemia that does not respond to oral iron supplementation 8 . In addition, several common SNPs in TMPRSS6 (including rs855791, rs4820268 and rs2235321) have been linked to an increased risk of iron deficiency anaemia (IDA) in genome-wide association studies (GWAS) [9][10][11] . In Caucasian populations, rs855791 has been reported to be in strong linkage disequilibrium (LD) with rs4820268 (r² = 0.83) and in moderate LD with rs2235321 (r² = 0.44) 12 . Similarly, in Asian populations, rs855791 is reported to be in substantial LD with rs4820268 (r² = 0.65) 12 .
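The r² values above measure how strongly alleles at two SNPs co-occur on the same haplotype. As a minimal illustration of the standard definition, r² = D² / [pA(1 - pA)pB(1 - pB)] with D = pAB - pA·pB, the sketch below computes r² from haplotype and allele frequencies. The frequencies in the example are hypothetical and are not taken from reference 12.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Linkage disequilibrium r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)   # product of the allelic variances
    return d * d / denom


# Hypothetical frequencies, chosen only to illustrate the formula
print(round(ld_r2(p_ab=0.35, p_a=0.40, p_b=0.42), 3))
```

An r² of 1 means one SNP perfectly predicts the other; an r² of 0 means the alleles are distributed independently across haplotypes.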
The minor allele frequency (MAF) of these SNPs varies between racial and ethnic groups. In African populations, the MAF of rs855791 is lower (10%) than in East Asians (57%), South Asians (54%) and Europeans (39%) 13 . Similarly, the MAF of rs4820268 is lower in Africans (28%) than in Europeans (42%), whereas the MAF of rs2235321 in Africans (41%) is similar to that in Europeans (42%) 13 . The effects of these SNPs (rs855791, rs4820268 and rs2235321) on iron absorption and hepcidin levels in sub-Saharan African populations have not been studied.
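Assuming Hardy-Weinberg equilibrium (not verified here for the Gambian population), the African MAFs quoted above imply the expected genotype frequencies computed in the sketch below. With a MAF of 10%, only about 1% of individuals are expected to be homozygous for the rs855791 variant allele. The MAFs are continental estimates and may differ from those in the WABR cohort.

```python
def hwe_genotype_freqs(maf: float) -> tuple:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    Returns (major-allele homozygote, heterozygote, minor-allele homozygote).
    """
    q = maf        # minor (variant) allele frequency
    p = 1 - q      # major allele frequency
    return p * p, 2 * p * q, q * q


# African MAFs quoted in the text (reference 13)
african_maf = {"rs855791": 0.10, "rs4820268": 0.28, "rs2235321": 0.41}
for snp, maf in african_maf.items():
    hom_ref, het, hom_alt = hwe_genotype_freqs(maf)
    print(f"{snp}: {hom_ref:.3f} / {het:.3f} / {hom_alt:.3f}")
```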
We hypothesize that these genetic variants, and similar variants in other iron regulatory genes, may contribute to the high anaemia prevalence in sub-Saharan Africa. Here, we propose to investigate the effects of these three TMPRSS6 SNPs on oral iron absorption in Gambian adults.
We anticipate that this study will provide biological insight into the association between these three TMPRSS6 variants and anaemia.

Study objectives and outcome measures
The primary objective of this study is to assess the impact of single and multiple copies of variant alleles of the TMPRSS6 SNPs (rs855791, rs4820268 and rs2235321) on oral iron absorption. The primary outcome measure will be the change in serum iron concentration from before to five hours after a single oral 400 mg dose of ferrous sulfate (Figure 1). Secondary endpoints related to the primary objective are:
(1) Increase in transferrin saturation (TSAT) above baseline after a single oral 400 mg dose of ferrous sulfate.
(2) Change in serum unsaturated iron-binding capacity (UIBC) from baseline after a single oral 400 mg dose of ferrous sulfate.
(3) Increase in serum hepcidin levels above baseline after a single oral 400 mg dose of ferrous sulfate.
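The primary and secondary endpoints are all differences between a post-supplementation value and the corresponding baseline value. A minimal sketch of that bookkeeping is shown below; the data structure, the units (µmol/L for serum iron and UIBC, ng/mL for hepcidin) and the example values are assumptions for illustration, and TSAT is taken as already calculated.

```python
from dataclasses import dataclass


@dataclass
class IronPanel:
    """Iron measurements for one participant at one time point (hypothetical structure)."""
    serum_iron: float  # µmol/L (assumed unit)
    tsat: float        # %, already calculated from serum iron and UIBC
    uibc: float        # µmol/L (assumed unit)
    hepcidin: float    # ng/mL (assumed unit)


def change_from_baseline(baseline: IronPanel, post: IronPanel) -> dict:
    """Post-supplementation value minus baseline value for each outcome."""
    return {
        "serum_iron": post.serum_iron - baseline.serum_iron,  # primary outcome
        "tsat": post.tsat - baseline.tsat,                    # secondary endpoint (1)
        "uibc": post.uibc - baseline.uibc,                    # secondary endpoint (2)
        "hepcidin": post.hepcidin - baseline.hepcidin,        # secondary endpoint (3)
    }


# Illustrative values only, not study data
baseline = IronPanel(serum_iron=14.0, tsat=25.0, uibc=42.0, hepcidin=5.0)
five_hours = IronPanel(serum_iron=30.0, tsat=53.6, uibc=26.0, hepcidin=9.5)
print(change_from_baseline(baseline, five_hours))
```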

Amendments from Version 1
Version 2 contains modifications that were made in response to the independent reviews.

Introduction:
The statement "and may partially be responsible for disproportionately high anaemia prevalence in sub-Saharan Africa" has now been changed to "these and other genetic variations may contribute to the high anaemia prevalence in sub-Saharan Africa".

Methods
The manuscript has been modified to indicate that: (1) data on weight and height will be collected and BMI will be included as a variable in the analysis.
(2) Participants will be instructed to fast overnight (minimum 12 hours) before sample collection.
(3) Sickle cell haemoglobin and glucose-6-phosphate dehydrogenase (G6PD) deficiency status will be assessed at baseline to evaluate potential confounding effects of these two genetic conditions, which are common in this population.

Study design
We will employ a recall-by-genotype study design, in which participant selection will be based on TMPRSS6 SNPs reported to be associated with the risk of iron-deficiency anaemia: rs855791, rs4820268 and rs2235321 10,14,15 . We will use the West African BioResource (WABR), which contains the Kiang West Longitudinal Population Study (KWLPS), as the basis for selecting pre-genotyped participants 16 .

Study site
The proposed study will be conducted within the population of West Kiang (WK) District, in the Lower River Region of The Gambia, and study procedures will be carried out at the Medical Research Council The Gambia (MRCG) at London School of Hygiene & Tropical Medicine (LSHTM), Keneba Field Station 16 . Individuals who are eligible for the study but have moved to the coastal region of The Gambia will be followed up by a fieldworker, and their study procedures will be conducted at the MRCG Fajara site. Participants currently residing in WK will be prioritised.

Participants
A total of 300 participants (male and female) will be recruited. Participants will be chosen based on three TMPRSS6 SNPs (rs855791, rs4820268 and rs2235321), from which we will generate nine composite genotype combinations, as detailed in Table 1. This will allow the effect of each SNP to be investigated individually and in combination. Composite genotype group 3 is the control group, with no variant alleles. Because of the low MAF of rs855791 in our study population, we are unable to include homozygotes for its variant allele. This limited the possible genotype combinations, and only nine combinations had enough participants to be included in the study.
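The composite genotypes are defined by how many variant alleles a participant carries at each of the three SNPs. The sketch below shows one way such a classification could be coded; the 0/1/2 allele-count coding and the exclusion of rs855791 variant homozygotes follow the text, but the function names are hypothetical and the nine groups of Table 1 are not reproduced here (which candidates were kept depended on how many genotyped individuals were available).

```python
from itertools import product

SNPS = ("rs855791", "rs4820268", "rs2235321")


def composite_key(genotype: dict) -> tuple:
    """Variant-allele counts (0, 1 or 2) for each SNP, in a fixed SNP order."""
    return tuple(genotype[snp] for snp in SNPS)


def eligible_combinations() -> list:
    """All composite genotypes, excluding rs855791 variant homozygotes (count == 2)."""
    return [combo for combo in product((0, 1, 2), repeat=len(SNPS)) if combo[0] < 2]


# Hypothetical participant: heterozygous at rs4820268 and rs2235321 only
person = {"rs855791": 0, "rs4820268": 1, "rs2235321": 1}
key = composite_key(person)
print(key, "total variant alleles:", sum(key))
print(len(eligible_combinations()), "candidate combinations before checking group sizes")
```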
For inclusion, participants must be aged 18 years or older, be in good physical health, have genotype data available, be able to fast overnight before the study visit and be able to give informed consent. Individuals will be excluded from the study if they have any signs of infection at the time of enrolment, are severely anaemic (Hb <7 g/dl), are pregnant or breastfeeding, or have a positive malaria test at screening.
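The eligibility rules above can be expressed as a simple screening check. The sketch below is illustrative only: the record fields are hypothetical names, and "good physical health" is a clinical judgement that is not encoded.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreeningRecord:
    """Screening data for one candidate (hypothetical field names)."""
    age_years: int
    has_genotype_data: bool
    can_fast_overnight: bool
    gave_informed_consent: bool
    signs_of_infection: bool
    haemoglobin_g_dl: float
    pregnant_or_breastfeeding: bool
    malaria_test_positive: bool


def is_eligible(r: ScreeningRecord) -> bool:
    """Apply the inclusion and exclusion criteria; clinical judgement is not encoded."""
    meets_inclusion = (
        r.age_years >= 18
        and r.has_genotype_data
        and r.can_fast_overnight
        and r.gave_informed_consent
    )
    meets_exclusion = (
        r.signs_of_infection
        or r.haemoglobin_g_dl < 7.0  # severe anaemia
        or r.pregnant_or_breastfeeding
        or r.malaria_test_positive
    )
    return meets_inclusion and not meets_exclusion


candidate = ScreeningRecord(34, True, True, True, False, 12.1, False, False)
print(is_eligible(candidate))                                 # True
print(is_eligible(replace(candidate, haemoglobin_g_dl=6.5)))  # False: severe anaemia
```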

Sample size calculation
The total sample size will be 300, including approximately 62 wild-type participants and an average of 31 participants in each of the eight variant genotype groups. This sample size will allow detection of a 12% mean difference in serum iron five hours after oral iron supplementation between the wild-type and variant genotype groups with 90% power and a type I error rate (α) of 0.1.
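The protocol does not state the standard deviation assumed for serum iron. The sketch below uses a normal approximation to a two-sided two-sample comparison with unequal group sizes (62 vs 31) to back-calculate the coefficient of variation (CV) at which a 12% difference would give 90% power at α = 0.1. This is an illustrative reconstruction, not the authors' calculation; they may have used a different test, allocation or adjustment for multiple comparisons.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(rel_diff: float, cv: float, n1: int, n2: int, alpha: float = 0.1) -> float:
    """Approximate power of a two-sided z-test for a difference in means.

    rel_diff and cv are both expressed relative to mean serum iron, so the
    mean itself cancels out. The negligible opposite tail is ignored.
    """
    se = cv * sqrt(1 / n1 + 1 / n2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(rel_diff) / se - z_crit)


def implied_cv(rel_diff: float, n1: int, n2: int, alpha: float = 0.1, power: float = 0.9) -> float:
    """CV at which rel_diff is detectable with the requested power."""
    k = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return rel_diff / (k * sqrt(1 / n1 + 1 / n2))


cv = implied_cv(0.12, n1=62, n2=31)
print(f"implied CV of serum iron: {cv:.3f}")
print(f"power check: {power_two_sample(0.12, cv, 62, 31):.3f}")
```

Under these assumptions, the stated design corresponds to a serum iron CV of a little under 19%; if the between-person variability of the five-hour value is larger, the detectable difference grows proportionally.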

Study procedures
Potential participants with the candidate composite genotypes of interest will be selected from the study database by the principal investigator, and their contact details (including address and phone number) will be extracted from the WK Demographic Surveillance System 16 by the study data manager. Participants will be contacted either in person or by telephone. Participants who provide informed consent will be invited to the study site, where the remaining study procedures will be conducted, as summarised in Figure 2. Each participant will be instructed to fast overnight for a minimum of 12 hours and will provide a blood sample on arrival at the clinic. Weight and height will be measured to calculate body mass index (BMI).
Each participant will be given a single 400 mg dose of oral ferrous sulfate (2 × 200 mg ferrous sulfate tablets), equivalent to 130 mg elemental iron. To ensure that the iron tablets are taken, a nurse will observe ingestion and record its time. Participants will be asked to remain at the study site until the study is completed, that is, until the five-hour post-supplementation blood sample has been collected (Figure 1).
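The stated dose implies an elemental iron fraction of 130/400 = 32.5%, which matches the figure commonly quoted for dried ferrous sulfate 200 mg tablets (65 mg iron per tablet). The arithmetic is sketched below; expressing the dose in millimoles uses the molar mass of iron (55.845 g/mol) and is added only for illustration.

```python
FE_MOLAR_MASS_G_PER_MOL = 55.845  # standard atomic weight of iron


def elemental_iron_mg(salt_mg: float, iron_fraction: float) -> float:
    """Elemental iron delivered by a given mass of an iron salt."""
    return salt_mg * iron_fraction


dose_salt_mg = 2 * 200   # two 200 mg ferrous sulfate tablets
fraction = 130 / 400     # implied by the protocol's stated figures (0.325)
iron_mg = elemental_iron_mg(dose_salt_mg, fraction)
# mg divided by g/mol gives mmol
print(f"{iron_mg:.0f} mg elemental iron = {iron_mg / FE_MOLAR_MASS_G_PER_MOL:.2f} mmol")
```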
All data generated from this study will be anonymised by allocating a unique study ID to each participant. Screening, enrolment and sample collection details will be collected in standard study forms and entered into the study database. Data will be double-entered by two data entry clerks and verified by a data supervisor.
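Double data entry is typically verified by comparing the two entry passes field by field and referring each mismatch to the data supervisor for resolution against the paper form. A minimal sketch of such a comparison is shown below; the study ID format (TMP-001) and the field names are hypothetical.

```python
def compare_double_entry(entry_a: dict, entry_b: dict) -> list:
    """Return (study_id, field, value_a, value_b) for every mismatch between two entry passes.

    Each argument maps a study ID to a dict of field -> entered value.
    A record or field missing from one pass is reported with None.
    """
    discrepancies = []
    for study_id in sorted(set(entry_a) | set(entry_b)):
        rec_a = entry_a.get(study_id, {})
        rec_b = entry_b.get(study_id, {})
        for field in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(field) != rec_b.get(field):
                discrepancies.append((study_id, field, rec_a.get(field), rec_b.get(field)))
    return discrepancies


# Hypothetical entries by two data entry clerks
clerk_1 = {"TMP-001": {"weight_kg": 61.5, "height_cm": 168},
           "TMP-002": {"weight_kg": 55.0, "height_cm": 160}}
clerk_2 = {"TMP-001": {"weight_kg": 61.5, "height_cm": 168},
           "TMP-002": {"weight_kg": 50.0, "height_cm": 160}}
for mismatch in compare_double_entry(clerk_1, clerk_2):
    print(mismatch)
```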
In order to prevent bias in treatment, the composite genotype of individuals will not be disclosed to the study team (data management, field and clinical staff). In addition, participants will be recruited in groups at random, and individuals with different composite genotype groups will be mixed during study visits.
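One way to mix composite genotype groups within study visits, while giving field staff only study IDs, is to shuffle the participant list and split it into visit batches. The sketch below assumes that the shuffle is run by someone outside the blinded study team; the seed, batch size and labels are hypothetical. A plain shuffle mixes groups on average but does not guarantee that every batch contains several groups.

```python
import random


def schedule_visits(participants, batch_size, seed):
    """Shuffle participants and split them into visit batches.

    participants: list of (study_id, composite_genotype_group) pairs.
    Returns batches of study IDs only, so staff receiving the schedule
    never see genotype groups.
    """
    rng = random.Random(seed)          # reproducible for the unblinded scheduler
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return [
        [study_id for study_id, _group in shuffled[i:i + batch_size]]
        for i in range(0, len(shuffled), batch_size)
    ]


# Hypothetical participants spread across three composite genotype groups
participants = [(f"TMP-{i:03d}", group)
                for i, group in enumerate(["G1", "G2", "G3"] * 4, start=1)]
for visit, batch in enumerate(schedule_visits(participants, batch_size=4, seed=2024), start=1):
    print(f"visit {visit}: {batch}")
```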

Sample collection
A 3 ml whole blood sample will be collected at baseline: 2.5 ml in lithium heparin tubes and 500 µl in EDTA (ethylenediaminetetraacetic acid) microtubes for full blood count (FBC), malaria rapid testing and sickle cell screening.
Post-supplementation blood samples (3 ml in lithium heparin tubes) will be collected two hours and five hours after iron ingestion. Pre- and post-supplementation lithium heparin samples will be centrifuged, and the plasma will be aliquoted into barcode-labelled tubes and stored at -20°C for iron biomarker analysis.
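Because the post-supplementation draws are timed from the nurse-recorded time of ingestion, each participant's sampling times follow directly from that record. A minimal sketch, with a hypothetical date and ingestion time:

```python
from datetime import datetime, timedelta

SAMPLING_HOURS = (2, 5)  # post-ingestion blood draws stated in the protocol


def sampling_times(ingestion: datetime) -> dict:
    """Scheduled post-supplementation draw times, keyed by hours after ingestion."""
    return {f"{h}h": ingestion + timedelta(hours=h) for h in SAMPLING_HOURS}


# Hypothetical ingestion time recorded by the observing nurse
for label, when in sampling_times(datetime(2024, 1, 15, 8, 15)).items():
    print(label, when.strftime("%H:%M"))
```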
Laboratory analyses
FBC will be analysed using a 3-part haematology analyser (Medonic M-series, Boule Medical, Sweden). Iron biomarkers [serum iron, unsaturated iron-binding capacity (UIBC), ferritin, soluble transferrin receptor (sTfR) and haptoglobin (HP)] and inflammatory markers [C-reactive protein (CRP) and alpha-1-acid glycoprotein (AGP)] will be measured using a Cobas Integra 400 plus biochemistry analyser (Roche Diagnostics). Total iron-binding capacity (TIBC) and transferrin saturation (TSAT) will be calculated from serum iron and UIBC. Plasma hepcidin levels will be measured using a commercially available ELISA (DRG Instruments GmbH, Germany). Sickle cell rapid testing will use the sodium metabisulphite method, and positive samples will be genotyped by haemoglobin electrophoresis. G6PD deficiency will be assessed using a qualitative enzyme assay (G6PD Hb+, R&D Diagnostics). Individuals with sickle cell haemoglobin variants or G6PD deficiency will be excluded from the analysis if this does not substantially reduce the sample size; otherwise, a retrospective sensitivity analysis will be performed to assess the impact of these variants.
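TIBC and TSAT follow from the two measured quantities using the standard relations TIBC = serum iron + UIBC and TSAT (%) = 100 × serum iron / TIBC, with both inputs in the same units (e.g. µmol/L). A minimal sketch with illustrative values:

```python
def tibc(serum_iron: float, uibc: float) -> float:
    """Total iron-binding capacity: iron already bound plus unsaturated capacity."""
    return serum_iron + uibc


def tsat_percent(serum_iron: float, uibc: float) -> float:
    """Transferrin saturation (%) from serum iron and UIBC in the same units."""
    return 100.0 * serum_iron / tibc(serum_iron, uibc)


# Illustrative values in µmol/L, not study data
print(tibc(14.0, 42.0), tsat_percent(14.0, 42.0))  # 56.0 25.0
```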
Statistical analysis plan
The primary analysis will compare the change in serum iron between composite genotype groups at the five-hour post-supplementation time point. A linear model will be fitted with serum iron or TSAT as the response variable and genotype group as the main predictor, with age, sex, inflammation status (CRP and AGP levels) and BMI included as covariates.
Using the same approach, we will also examine the effect of genotype on the secondary outcome measures; all secondary analyses are exploratory.
Because baseline iron levels may vary between participants, we will adjust for baseline serum iron in the regression analysis to remove this potential source of bias. If more than 5% of data are missing, we will consider imputation. The follow-up period is short, so we expect little bias from loss to follow-up. We will also consider a sensitivity analysis fitting a multivariate regression model in which the main outcomes of interest (including TSAT, serum iron and hepcidin) are jointly regressed on the same set of predictors.
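The primary model can be sketched as an ordinary least-squares regression of the five-hour outcome on dummy-coded genotype group plus the covariates named above, including baseline serum iron. The sketch below uses NumPy; the group labels, variable names and synthetic data are hypothetical, and standard errors, confidence intervals and p-values (which a statistical package such as statsmodels or R would report) are omitted. Passing several outcomes as columns fits them jointly: the point estimates equal those of separate fits, so a multivariate sensitivity analysis adds value mainly through the covariance between outcomes.

```python
import numpy as np

# Hypothetical group labels; "G3" stands in for the no-variant control group (reference level)
GROUPS = ["G3", "G1", "G2"]
COVARIATES = ["age", "sex", "crp", "agp", "bmi", "baseline_iron"]


def design_matrix(group, covariates):
    """Intercept + one dummy per non-reference genotype group + covariate columns."""
    group = np.asarray(group)
    cols = [np.ones(len(group))]
    cols += [(group == g).astype(float) for g in GROUPS[1:]]
    cols += [np.asarray(covariates[name], dtype=float) for name in COVARIATES]
    names = ["intercept"] + [f"group_{g}" for g in GROUPS[1:]] + COVARIATES
    return np.column_stack(cols), names


def fit_ols(X, Y):
    """Least-squares coefficients; Y may be 1-D (one outcome) or 2-D (outcomes as columns)."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


# Synthetic, noise-free data purely to check that the fit recovers known coefficients
rng = np.random.default_rng(0)
n = 60
group = GROUPS * (n // len(GROUPS))
covariates = {
    "age": rng.uniform(18, 60, n),
    "sex": rng.integers(0, 2, n),          # coded 0/1
    "crp": rng.uniform(0, 10, n),
    "agp": rng.uniform(0.5, 1.5, n),
    "bmi": rng.uniform(17, 32, n),
    "baseline_iron": rng.uniform(8, 25, n),
}
X, names = design_matrix(group, covariates)
true_beta = np.array([5.0, -3.0, -6.0, 0.02, 1.0, -0.3, -2.0, 0.1, 0.9])
y_iron = X @ true_beta
y_tsat = X @ (2 * true_beta)

beta_iron = fit_ols(X, y_iron)
beta_joint = fit_ols(X, np.column_stack([y_iron, y_tsat]))  # both outcomes at once
print(dict(zip(names, beta_iron.round(3))))
```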

Ethical statement
This study has been approved by the MRC Unit The Gambia at the LSHTM Scientific Coordinating Committee, the MRC Unit The Gambia at the LSHTM / Gambia Government Joint Ethics Committee (SCC1429), and the LSHTM Ethics Committee (LSHTM Ethics reference number 11679). A trained fieldworker will visit each potential study participant to provide an information sheet detailing the purpose and nature of the study (see Extended data) 17 . For individuals who cannot read, the fieldworker will translate the information sheet into a language they understand, in the presence of an independent witness. Participants will also have the opportunity to ask the investigators any questions they consider important. Participants will be informed that they are free to withdraw from the study at any time and that they can raise any further questions about the study with the investigators.
Participants will provide written informed consent, and those who cannot write will provide a thumbprint prior to enrolling into the study. Confidentiality of study participants will be protected by anonymising all study samples and forms by allocating a study number to each participant.
This study was retrospectively registered with ClinicalTrials.gov (NCT03341338) on 14th November 2017.

Dissemination of information
The study results will be published in relevant peer-reviewed journals and key findings will be presented at international scientific meetings. Data sharing will be in agreement with the MRC policy on research data sharing.

Study status
The study is in the data collection phase at the time of publication.

Discussion
GWAS have identified several genetic variants associated with iron status 3,11,15,[18][19][20] . However, a detailed understanding of genotype-phenotype relationships is required to determine their effects on iron absorption. The recall-by-genotype (RbG) study design is an efficient tool for detailed investigation of genotype-phenotype relationships because it minimizes confounding and improves statistical power while reducing the required sample size 21 . In this study, we will use the RbG design to assess the functional effects of the three common TMPRSS6 variants on iron absorption. We expect that this study will provide new insights into the association between these TMPRSS6 gene variants and oral iron absorption in a population where anaemia prevalence is high.

Division of Human Nutrition and Health, Wageningen University & Research, Wageningen, The Netherlands
Major comments:
In the introduction the following is stated: 'We hypothesize that the variant alleles at these SNPs may impair iron absorption and may be partially responsible for the disproportionately high anaemia prevalence in sub-Saharan Africa.' This, however, contradicts the lower or at most equal MAF of the TMPRSS6 variants in African populations compared with Caucasian populations. The low MAF of rs855791 is especially important, since this SNP has so far shown the strongest inverse association with haemoglobin and iron concentrations in Caucasian and Asian populations. It seems to me that the conclusion should then be that TMPRSS6 variants are less likely to play a major role in the development of anaemia in African populations, unless variants other than the ones under study are responsible for such an effect.
Exclusion criteria: It would seem more appropriate to exclude participants with sickle cell anaemia or G6PD deficiency instead of using these conditions as covariates in the analysis. However, if this reduces the numbers in each genotype combination too much, the authors may include them and perform a retrospective sensitivity analysis.
The primary outcome measure will be the change in serum iron concentration before and five hours after a single 400 mg dose of ferrous sulfate iron given orally. This is not the best way to measure iron absorption, which would ideally be assessed with a stable isotope method. Past studies have shown that change in serum iron concentration cannot be used as a measure of iron bioavailability at the individual level. Authors should provide references that back up the validity of their approach.

Minor comments:
Under study procedures, second paragraph: 'injestion' is misspelled.
Sample size calculation: A type 1 error of 0 seems ideal, yet unrealistic. Is this a typo?
Is the rationale for, and objectives of, the study clearly described? Yes

Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Not applicable

Comment 1:
In the introduction the following is stated: 'We hypothesize that the variant alleles at these SNPs may impair iron absorption and may be partially responsible for the disproportionately high anaemia prevalence in sub-Saharan Africa.' This, however, contradicts the lower or at most equal MAF of the TMPRSS6 variants in African populations as compared to Caucasian populations. Especially the low MAF of rs855791 is important since it has so far shown the strongest inverse association with haemoglobin and iron concentrations in Caucasian and Asian populations. It seems to me that the conclusion then should be that TMPRSS6 variants are less likely to play a major role in the development of anaemia in African populations, unless other variants than the ones under study are responsible for such an effect.

Response 1:
Thank you for this comment. The statement "and may partially be responsible for disproportionately high anaemia prevalence in sub-Saharan Africa" has now been changed to "these and other genetic variations may contribute to the high anaemia prevalence in sub-Saharan Africa".
We agree that the MAF of rs855791 is low in African populations (MAF = 0.1). In fact, we will not be able to address the impact of rs855791 in this study precisely because of this low MAF. However, the other two SNPs in this study, rs4820268 and rs2235321, each have a MAF > 0.3 in this population.

… 3 , the authors should consider collecting these measures and incorporating them into a more complex regression model.

Comment 2:
Blood collection should follow a detailed plan that takes into account participants' diet, time of day and similar factors. More importantly, the authors could improve serum iron measurement by using a fasting blood test with a predefined fasting period, such as 12 hours.
In the 'Sample size calculation' section, the authors state that 'This study size will be able to detect a 12% mean decrease in serum iron at five hours after oral iron supplementation between the wild type and the variant genotype groups with 90% power and a type 1 error of 0 in this study.' Please elaborate on this sentence and explain how the 12% mean decrease in serum iron was estimated (e.g. please state the assumed parameters, such as the mean and standard deviation of serum iron).
To further understand the causal connection between the rs855791, rs4820268 and rs2235321 SNPs and serum iron level or iron-deficiency anaemia, the authors would need to conduct molecular experiments on mRNAs and proteins to identify the direct effect of the listed SNPs on gene expression experimentally, and then relate the expression level and corresponding genotypes to the trait, in this case serum iron level. In addition, the wide availability of GWAS, eQTL and pQTL data enables the authors to perform in silico analyses to verify their protocol and to generate new hypotheses. Below, I summarise some GWAS results relevant to these three SNPs that could be investigated further in this proposed study.
While the rs855791 SNP is a missense variant in the TMPRSS6 gene, it is associated with protein levels of TFRC (Sun et al. 2018) 4 and transcript expression of ALAS2 in blood (Westra et al. 2013) 5 . The rs4820268 SNP is a synonymous coding variant of the TMPRSS6 gene, but it is also recognised as a trans-eQTL of ALAS2 in blood (Westra et al. 2013) 5 . The rs2235321 SNP is another synonymous coding variant of the TMPRSS6 gene, and the SNiPA tool reports that it is associated with neither complex traits (e.g. iron status biomarkers) nor transcript or protein expression (Arnold et al. 2014) 6 . Altogether, it seems that the rs855791 and rs4820268 variants affect the expression of ALAS2, which is most highly expressed in bone marrow (Fagerberg et al. 2014) 7 and contributes to heme metabolism and iron homeostasis (Barman-Aksözen et al. 2015) 8 .
Lastly, it is necessary to make sure that this study will add novel findings to a literature that already includes several studies of TMPRSS6 polymorphisms and iron-related traits (Nalado et al. 2019 9 ; Sørensen et al. 2019 10 ). The authors may consider focusing on knowledge gaps and aiming to relate gene variants comprehensively to gene expression or activity and then to serum iron level. Notably, in a functional analysis, population-based differences are of limited interest because the functional effects of variants are unlikely to vary much between populations.