A refinement to the formalin test in mice

Continuous refinement of the tests used in animal research is crucial for the scientific community. This is particularly true in pain research, where ethical concerns are especially acute. The formalin test is widely used in pain research, and some of its mechanisms resemble those underlying clinical pain in humans. Immediately upon injection, formalin triggers two waves (an early and a late phase) of strong nociceptive behaviour, characterised by licking, biting, lifting and shaking of the injected paw. Although the test is well characterised at the behavioural level, it has not been significantly refined since it was proposed over four decades ago, in particular in ways that minimise animal distress while preserving its behavioural outcomes. Here, we propose a modified and improved method for the formalin test. We show that anaesthetising the animal with the inhalable anaesthetic sevoflurane at the time of injection can produce reliable, robust and reproducible results whilst reducing animal distress during the initial phase. Importantly, our results were validated by pharmacological suppression of late-phase behaviour with gabapentin, with no interference from the anaesthetic. In addition, we demonstrate that this method is also useful for screening transgenic lines for changes in pain behaviour in response to formalin.


Introduction
Since it was first reported over 40 years ago, the formalin test 1 has been widely used in pain research and is known to capture mechanisms that are likely to be relevant to many patients with pain in the clinic 2,3, including poorly localised, burning and throbbing pain sensations 4. The unique feature of the formalin test is that it triggers two phases of nociceptive behaviour: the first is directly linked to the stimulation of primary sensory neurons and is followed by a second phase, which is associated with inflammation and involves central sensitisation 1,5-7. These phases are marked by striking pain behaviour in which the animals lick, bite, lift and shake the injected paw. Over many years, different groups have extensively characterised the test, demonstrating robust and reproducible quantitative behavioural outcomes 5,8-12.
The constant refinement of experimental procedures involving animals is important for the whole research community but is particularly important for the field of pain research, due to the obvious ethical implications of this type of research.
Regulations regarding the use of animals for research in the UK date back to the 19th century, with strict safeguards to avoid or minimise animal suffering and cruelty and to ensure that high animal welfare standards are met 13,14. Furthermore, the 3Rs principles (replacement, reduction and refinement) ensure that animals are used only when absolutely necessary 13,15-17. Beyond these ethical considerations, and of particular relevance to the formalin test, there is evidence that restraint itself induces stress, behavioural and other physiological changes in animals 18-25, including hyperalgesia 26-28, which can affect the outcome of the test. In line with these guidelines, and because the standard formalin test procedure involves physically restraining the animals, which may cause unnecessary stress, the present study aimed to refine the method used for formalin injection without compromising its reproducibility. We asked whether anaesthetising the animal at the time of formalin injection could make injections more consistent and reduce the stress experienced by the animals, without losing the behavioural effects that formalin triggers.
We show that the use of an inhalable anaesthetic during the time of formalin administration minimises animal stress and improves injection consistency. Whilst the anaesthetic reduced the behaviours observed during the first response phase, it appears not to affect the responses observed during the second phase of the test. We validated our proposed method by showing its sensitivity to a known analgesic agent, gabapentin, as well as its efficacy in different transgenic mouse lines. Together, our data present a refined method for the formalin test, whilst also demonstrating that the second phase can occur without a behavioural response during the first phase.

Ethical statement
All experiments were performed in accordance with the UK Animals (Scientific Procedures) Act 1986 and Local Ethical Committee approval. All efforts were made to minimise the suffering of animals during the experiments by carefully following the procedures.

Experimental animals
In all experiments, adult (11 to 13 weeks of age) homozygous and wildtype, male and female littermates were used. For the gabapentin and sevoflurane experiments, wildtype C57BL/6NTac mice were used. Studies on the anaesthetised groups were performed on animals bred at the Mary Lyon Centre (MLC; Harwell, UK), whereas the remaining animals were bred at King's College London (KCL; UK). For the experiments using transgenic animals, mice carrying the null alleles Pink1 tm1b(EUCOMM)Wtsi and Slit1 tm1b(Komp)Wtsi were generated at Harwell (UK) as part of the International Mouse Phenotyping Consortium (IMPC) and maintained as heterozygotes on a C57BL/6NTac background. The colony was intercrossed and genotyped by an independent experimenter to ensure effective blinding during behavioural testing. Males and females were used in all experiments except the gabapentin study (Figure 3). No sex difference in behavioural responses to the formalin test was found in this study; visual representations and statistical analyses demonstrating the absence of differences between the sexes are presented in Supplementary Figure 1. The transgenic lines chosen in this study were part of a parallel neuroscience programme being carried out at the MLC. In addition, a link has been suggested between Pink1 and nociceptive processing 29,30 and between Slit1 expression and peripheral injury 31-36.
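The blinding described above relies on the scorer seeing only coded animals while the genotype key is held by someone else. The Python sketch below illustrates one way such a key could be generated and later used to unblind the scored data. The function names, the `M001`-style codes and the whole workflow are hypothetical illustrations, not the procedure used in this study.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

def make_blinding_key(animal_ids: List[Hashable], seed: Optional[int] = None) -> Dict[Hashable, str]:
    """Give every animal an opaque code in random order.

    Hypothetical workflow: this key is held by the independent experimenter
    who genotyped the colony; the person scoring behaviour sees only codes.
    """
    if len(set(animal_ids)) != len(animal_ids):
        raise ValueError("animal IDs must be unique")
    rng = random.Random(seed)
    codes = [f"M{i:03d}" for i in range(1, len(animal_ids) + 1)]
    rng.shuffle(codes)  # code order carries no information about ID or genotype
    return dict(zip(animal_ids, codes))

def unblind(scores_by_code: Dict[str, float],
            key: Dict[Hashable, str],
            genotype_by_id: Dict[Hashable, str]) -> Dict[Hashable, Tuple[str, float]]:
    """Once scoring is finished, re-attach animal ID and genotype to each score."""
    id_by_code = {code: animal for animal, code in key.items()}
    return {id_by_code[code]: (genotype_by_id[id_by_code[code]], score)
            for code, score in scores_by_code.items()}

# Hypothetical example: four mice; the scorer only ever sees the codes
key = make_blinding_key(["A1", "A2", "A3", "A4"], seed=7)
scorer_sees = sorted(key.values())  # ['M001', 'M002', 'M003', 'M004']
```

Keeping the key in a separate object from the scores mirrors the separation of roles: `unblind` is only called after all behaviour has been annotated.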

Housing and husbandry
Animals were housed in IVC cages (Tecniplast 1284L and 1285L, with autoclaved Datesand Aspen bedding) in groups of 2 to 5 per cage, under a 12-hour light/12-hour dark cycle (with 30-minute dusk-to-dawn and dawn-to-dusk transition periods), at controlled

Amendments from Version 1
Following the reviewers' comments, for which we are very grateful, we have uploaded a new version addressing all their comments and suggestions. In summary, we conducted statistical tests comparing males and females across the different experiments and observed no differences between the sexes. A sentence stating these findings, with a link to the Supplementary Figure containing further details, has been added to the Materials and methods section. The raw data, including the sex of each animal used, have also been uploaded to the supplementary information.
We specified the genetic background of the wildtype C57BL/6 animals (C57BL/6NTac), and we addressed the comments on the differing response of one control group relative to the other groups: a paragraph was added to the manuscript describing and explaining the differences observed. Unfortunately, this line is no longer readily available (it has been cryopreserved and archived), so we are unable to repeat these experiments as suggested by one of the reviewers. Furthermore, an explanation of why these specific transgenic mouse lines were chosen has been added to the Experimental animals section. All other minor points raised by the reviewer (typos and verb tense) were also addressed.

Experimental procedure
Mice were anaesthetised by inhalation of sevoflurane (5% flow) (Zoetis, UK) for 2 minutes, followed by subcutaneous injection of formalin (20 µl at 1.85% concentration) (Sigma, UK, Cat Nº 252549) into the right hind paw. A 30-gauge needle was used for the injections. A single animal was then placed into the arena (details below, and in Figure 1). Perspex acrylic glass arenas (36 × 40 × 13 cm), with three mirrored walls and one transparent wall, were built in house (see the apparatus set-up in Figure 1). A recording camera was positioned facing the arena, at an angle of approximately 30° to the surface on which the arena was placed and 50 cm from the transparent wall, so that all mouse behaviour could be recorded for later analysis. The set-up also allows annotations to be made by independent experimenters, providing accurate and reproducible observations of changes in behaviour. Pain behaviour was scored with a stopwatch: the timer was started when flinching, licking, flinching followed by licking, flicking or shaking of the paw began, and stopped once the behaviour ceased. Limping, altered locomotion and grooming of other parts of the body were not counted as pain behaviours.
The first (early) phase of the test was defined as 0 to 15 minutes after formalin injection, and the late (second) phase as 15 to 45 minutes after injection. Animals were humanely culled by a Schedule 1 method at the end of the test. All experiments were annotated by an experimenter blinded to genotype and treatment.
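Stopwatch scoring yields a list of behaviour bouts, each with a start and a stop time, which then has to be summed into time bins and into the two phases defined above. The Python sketch below assumes 5-minute bins (the bin width is not stated in this section) and the 0-15 min / 15-45 min phase split given above; `bin_bouts` and `phase_totals` are hypothetical helper names and the bouts are invented, not data from this study.

```python
from typing import List, Tuple

Bout = Tuple[float, float]  # (start, stop) of one pain-behaviour bout, seconds after injection

def bin_bouts(bouts: List[Bout], bin_s: float = 300.0, total_s: float = 2700.0) -> List[float]:
    """Seconds of pain behaviour per bin; bouts crossing a bin edge are split.

    Any part of a bout after total_s (45 min by default) is ignored.
    """
    n_bins = int(round(total_s / bin_s))
    bins = [0.0] * n_bins
    for start, stop in bouts:
        if stop <= start:
            raise ValueError(f"bout must end after it starts: {(start, stop)}")
        for i in range(n_bins):
            lo, hi = i * bin_s, (i + 1) * bin_s
            overlap = min(stop, hi) - max(start, lo)
            if overlap > 0:
                bins[i] += overlap
    return bins

def phase_totals(bins: List[float], bin_s: float = 300.0, split_s: float = 900.0) -> Tuple[float, float]:
    """Split binned totals into (first phase, late phase) at split_s (15 min by default)."""
    n_first = int(round(split_s / bin_s))
    return sum(bins[:n_first]), sum(bins[n_first:])

# Hypothetical bouts for one mouse
bouts = [(10, 40), (290, 320), (1000, 1060), (1790, 1810)]
bins = bin_bouts(bouts)
first, late = phase_totals(bins)
print(bins)         # [40.0, 20.0, 0.0, 60.0, 0.0, 10.0, 10.0, 0.0, 0.0]
print(first, late)  # 60.0 80.0
```

Splitting bouts at bin edges keeps a bout that straddles the 15-minute boundary from being counted entirely in one phase.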

Statistical analysis
Statistical analyses were performed using OriginLab 2017 software (Origin Group Corp.) for the area under the curve (AUC) and SPSS Statistics 20 for repeated-measures ANOVA. For all sets of samples, normality was assessed with the Shapiro-Wilk test, at the 95% confidence level, to check whether the data were consistent with a Gaussian distribution. Power calculations were performed using the Columbia University Biomath Calculator, following previously described guidelines 37. For details of the power calculation for each individual experiment, please refer to the Extended data section of this manuscript 38. For all hypothesis testing, the significance threshold was set at p ≤ 0.05; that is, the null hypothesis was rejected when the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis to be true, was 5% or less. The AUC was calculated from the pain response (s) over time.
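For readers reproducing the analysis without Origin or the Biomath calculator, the two calculations named above can be approximated in a few lines of Python. The AUC below uses the trapezoidal rule, and the sample-size function uses the standard normal-approximation formula for a two-sided, two-sample comparison of means. Both are assumptions about the method rather than a reproduction of the software used here, and results may differ slightly from the calculator's output. If time is in minutes and the response in seconds of pain behaviour, the AUC is in s × min.

```python
import math
from statistics import NormalDist
from typing import Sequence

def auc_trapezoid(times_min: Sequence[float], response_s: Sequence[float]) -> float:
    """Area under a pain-response curve by the trapezoidal rule (units: s x min)."""
    if len(times_min) != len(response_s) or len(times_min) < 2:
        raise ValueError("need matching time and response sequences with at least 2 points")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        area += dt * (response_s[i] + response_s[i - 1]) / 2.0
    return area

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2,
    rounded up; delta is the smallest difference worth detecting.
    """
    z = NormalDist().inv_cdf
    n = 2.0 * ((z(1.0 - alpha / 2.0) + z(power)) * sd / delta) ** 2
    return math.ceil(n)

# Hypothetical values, not data from this study
print(auc_trapezoid([0, 5, 10], [0, 10, 20]))  # 100.0
print(n_per_group(delta=1.0, sd=1.0))          # 16
```

Halving the detectable difference roughly quadruples the required group size, which is why the minimum effect of interest should be fixed before the experiment.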

Results
The formalin test is a robust model for studying pain, and it has been shown to be sensitive to various analgesic drugs 10,39-42. The stress and anxiety-like behaviours triggered by aversive handling and restraint of the animal whilst it is being injected with formalin, together with exposure to an unfamiliar environment, may interfere with the stimulus and the outcome of the test 43. To check whether restraint stress could be minimised during formalin injection without a major effect on the results of the test, we anaesthetised the animals with sevoflurane at the time of injection. Our results show that the anaesthetic virtually abolished the first phase of the test, whilst preserving the second phase (Figure 2a) 38. Compared with non-anaesthetised mice, anaesthetised mice showed a reduction in pain behaviour of approximately 90% during the first 15 minutes of the test (first phase; Figure 2b). Notably, no significant differences in behaviour between anaesthetised and non-anaesthetised mice were observed during the late phase of the test (20-45 min after injection) (Figure 2a & 2b).
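The percentages quoted in the Results (approximately 90% here, over 50% for gabapentin and 40% for the Pink1 late phase) read as relative reductions of one group's mean pain behaviour against its control group. Assuming that is how they were derived (the formula is not stated), the arithmetic is sketched below in Python; `percent_reduction` is a hypothetical helper and the group values are invented for illustration.

```python
from statistics import mean
from typing import Sequence

def percent_reduction(treated: Sequence[float], control: Sequence[float]) -> float:
    """Relative reduction of the treated-group mean versus the control-group mean, in %.

    100 * (mean(control) - mean(treated)) / mean(control)
    """
    c = mean(control)
    if c <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (c - mean(treated)) / c

# Hypothetical first-phase totals (s) per mouse
control = [180.0, 220.0, 200.0]
anaesthetised = [15.0, 25.0, 20.0]
print(round(percent_reduction(anaesthetised, control), 1))  # 90.0
```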
The effect of the anti-epileptic drug gabapentin as an analgesic is well documented. Indeed, many studies demonstrate that gabapentin attenuates nociceptive behaviour following formalin injection, specifically during the late phase of the test 44-49 . As the formalin test is widely used as a powerful tool to screen the analgesic effect of novel compounds at the preclinical level, we next tested whether gabapentin would also decrease the nociceptive behaviour observed after formalin injection using our newly proposed method. Our data demonstrate that in the gabapentin-treated group, there was a clear decrease in pain behaviour in the late phase of the test when compared to the control group (Figure 3a). Gabapentin treatment led to a reduction of over 50% in nociceptive behaviour after formalin injection (Figure 3b) in comparison with the non-treated group.
The use of transgenic animals in research has been crucial in elucidating numerous biological mechanisms involved in both health and disease, including in the field of pain research. Given the importance of transgenic mice and behavioural testing for screening potential molecular targets, we went on to investigate whether our refined formalin method could be used to phenotype newly generated transgenic mouse lines. For these experiments, we used two knockout mouse lines in which the target genes are known to be expressed in the dorsal root ganglia (DRG) 50-52 and in spinal cord neurons 53.
The PTEN-induced kinase 1 (Pink1) gene has been extensively studied in the context of Parkinson's disease 54,55 and has also been linked to nociceptive processing 29,30. Our data show that animals lacking Pink1 have a significantly lower response to the formalin test (Figure 4a). Although the Pink1 knockout group appeared marginally more responsive during the first 5 minutes of the test, their overall response during the first phase was very similar to that of their control littermates (Figure 4b). In contrast, their nociceptive behaviour during the late phase was reduced by 40% relative to the control group (Figure 4b). Despite being less responsive to formalin, the Pink1 knockout animals appear to have a steadier response in the second phase than their control littermates, with no obvious peak over time (Figure 4a).
Following the same principle, we went on to screen a transgenic mouse line with a disrupted Slit Guidance Ligand 1 (Slit1) gene using our refined formalin test. Slit1 is a secreted protein that has been reported to be involved in DRG and spinal cord development, and previous studies have suggested a link between injury and increased Slit1 expression 31-36. Our data show no overall difference in pain behaviour in the formalin test between mice with disrupted Slit1 function and their control littermates (Figure 5a). As with the control mice, pain behaviour during the first phase of the test is dramatically reduced by the anaesthetic. Notably, loss of function of the Slit1 gene did not change nociceptive behaviour during the late phase of the test (Figure 5a and 5b), as both groups spent a similar amount of time exhibiting pain behaviour (Figure 5b).

Discussion
In this study we proposed a modified and improved method for the formalin test. We showed that anaesthetising the animal at the time of injection produces reliable, robust and reproducible results whilst reducing stress to the animals. With our method, pain behaviour in the first phase was suppressed by the anaesthetic, whereas the response in the late phase remained comparable to that of the control, non-anaesthetised group. Our results were validated by pharmacological suppression of the late phase with gabapentin. Furthermore, we demonstrate that this is a useful method for screening transgenic lines for changes in pain behaviour in response to formalin. Reducing stress levels in the animals is a great advantage when performing behavioural tests. Many studies have shown that mishandling of laboratory animals can have profound impacts on some behavioural tests 43,56,57. In particular, restraint has a drastic, negative effect on anxiety levels 58,59, with evidence suggesting that the adverse effects also extend to pain behaviour, where restraint can lead to areflexia, hyperalgesia and, in some cases, abnormal flinching 60-64. Given the potential adverse effect on pain behaviour of stress resulting from either mishandling or restraining the animal during the administration of drugs or compounds, we propose the use of anaesthesia during drug application. We show that, with our method, animals can be injected with formalin without any apparent distress, while the reproducibility of the test increases, as the consistency of both the injection site and the injected volume is improved under anaesthesia. Indeed, on ethical grounds it has recently been suggested that brief anaesthesia immediately before injection would benefit the animals and increase the accuracy of substance application, which matters because the site of formalin injection is crucial 65.
Importantly, we show that the outcome of the test is similar to that of the traditional method and that, although the anaesthesia reduced pain behaviour during the first phase, second phase behaviour times remained almost identical. Notably, our method has a significant advantage over a previous study that employed different inhalation anaesthetics, which suppressed pain behaviour dramatically in both the early and late phases of the test, producing results remarkably different from those of non-anaesthetised animals 66. In summary, our method resulted in minor changes to overall behavioural responses and provided significant advantages for ethical and stress-free animal handling.
Central sensitisation by formalin appears to be the most crucial aspect of the test when evaluating nocifensive behaviour.
The underlying mechanisms of the formalin test are not fully understood. Historical experimental data indicate that the behavioural response observed after the injection is solely due to the direct stimulation and activity of C-fibre nociceptors 1,9,67 , whereas subsequent studies suggest the involvement of Aδ and non-nociceptive Aβ-fibres 68,69 . Whilst there is still controversy regarding the circuitry and cellular and molecular mechanisms triggered by formalin, subjects which are beyond the scope of this study, we demonstrate that the second phase of the test can be used to screen pain behaviour independently of the first phase. Supporting our findings, studies showed that knockout or ablation of distinct nerve fibre populations in animal models resulted mostly in a reduction of the pain behaviour in the second phase of the test, while the first phase was not necessarily affected 69-71 . Furthermore, these studies show that, in all instances in which the first phase is affected, the second phase will also be influenced 69-71 , demonstrating therefore that suppressing or diminishing the pain behaviour during the first phase of the test is almost inconsequential when screening phenotypes and testing drugs. Therefore, our study highlights that the formalin test could be improved by diminishing unnecessary animal distress without compromising the results, given that the first phase of the test is, in most cases, not very informative.
The second phase of the formalin test alone can be used for phenotypic screening. We confirmed the sensitivity of our modified protocol by showing that the effects of the extensively characterised analgesic gabapentin 44-48 are maintained as expected. We further validated the sensitivity of the modified formalin procedure in two different transgenic lines. The mitochondrial kinase Pink1 has been extensively studied, and mutations in this gene are well known to be linked to neuronal dysfunction in Parkinson's disease 55,72-74. Previous studies have also linked loss of function of Pink1 in humans with abnormal pain sensation, with affected individuals presenting higher mechanical and pressure thresholds 29,30. Notably, and similarly to the phenotype found in humans, we show here that animals with a disrupted Pink1 gene display less nociceptive behaviour after formalin injection. However, it cannot be excluded that this hypoalgesic phenotype is due to symptoms arising from Parkinson's disease itself, as Pink1-null mouse models may present motor dysfunction 75. Nevertheless, our results not only present, for the first time, a pain phenotype in a Pink1-null rodent model in the context of the formalin test, but also support the refinement of the proposed formalin method. It should be noted that control littermates in the Pink1 group showed a more pronounced response to formalin than the other control groups. We believe this reflects the natural variation that can arise when performing behavioural tests, and it underlines the importance of using control littermates in these experiments. We see no obvious biological reason for the variation observed, other than that these animals were a different cohort and thus likely differed in experimenter, testing day and uncontrollable aspects of their cage environment (e.g. genotype of parents).
Further to the pain phenotype observed in the Pink1 knockout line, we also screened the Slit1 knockout mice for a distinct phenotype. Despite previous studies suggesting a role for Slit1 in neuronal development 32,33,76, our data show that global deletion of the gene does not lead to any alteration in pain behaviour in the formalin test. Together, these results demonstrate that the refined formalin test proposed in this study can be broadly used, not only to test the efficacy of drugs, as shown with gabapentin, but also to evaluate pain phenotypes in newly generated transgenic models.
In conclusion, in this study we present a refinement to the already established formalin test. We propose that the use of an inhalable anaesthetic during the injection of formalin is not only a reliable way to improve injection consistency but, most importantly, represents a valuable refinement. We show that this method complies with the 3Rs sought by ethical committees, as well as meeting the additional 3Rs (relevancy, robustness and repeatability) sought by scientists 13. Moreover, we demonstrate that the test is sensitive enough to screen for possible pain phenotypes, and suggest that diminishing the first phase of the formalin test has little consequence for the global pain response of the animal.

In this article, the authors demonstrate a refinement of the widely used formalin test. They show that if the formalin injection is performed while mice are under brief general anaesthesia (sevoflurane), the second phase behaviour is largely unaltered, even though the first phase behaviour is largely lost. Since the second phase is the more useful measure, this means that the procedure can be performed in a less stressful way, resulting in a significant improvement in animal welfare.

Page 4: In line 4 of the Results section, "it is" should be inserted before "being"; otherwise the sentence reads as though the experimenter received the injection.

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

We would like to thank the reviewers for their very positive reviews of our paper. The reviewers' comments were very helpful, and we have now revised our manuscript to address all of the issues raised.
The results of the study are generally convincing, and the article is clearly written and well illustrated. One concern is the relatively high level of pain behaviour in the control group for the Pink1 study (Fig 4). While the area under the curve for other groups is typically around 1000, for the controls in Fig 4 it is nearly 2000. In fact, compared with other cohorts, the Pink1-/- mice seem to have normal behaviour, while the control group shows an exaggerated response. Some explanation is needed here.

This point was also raised by the other reviewer and has been addressed accordingly. We believe the behaviour observed reflects the natural variation that can arise when performing behavioural tests, which underlines the importance of using control animals bred at the same time. Although we understand and acknowledge the reviewer's concerns, we believe the data are a true representation of the behaviour of this particular line. As shown in Supplementary Figure 1B, all the control animals used in this test were above the average control response in the other figures (above ~250 s); we therefore believe the findings are unlikely to be a false positive. We see no obvious biological reason for this variation, other than that these animals were a different batch and thus likely differed in experimenter, testing day and uncontrollable aspects of their cage environment (such as genotype of parents, litters and litter size). Unfortunately, this line is no longer readily available (it has been cryopreserved and archived), so we are unable to repeat these experiments to investigate this further. We have, however, added a sentence to the manuscript describing the differences observed and the explanation above.

Also, what are the units for the area under the curve?
The AUC was calculated from the pain response (s) over time. This has been added to the Materials and methods section.

Minor points:
It would be helpful to have some explanation of why the Pink1 and Slit1 mouse lines were chosen for this study, at the end of the Experimental Animals paragraph in the Methods section.

These sentences were added: "The transgenic lines chosen in this study were part of a parallel neuroscience programme being carried out at the MLC. In addition, a link has been suggested between Pink1 and nociceptive processing (48,49) and between Slit1 expression and peripheral injury (50-55)."

Is the description of the method technically sound? Yes
Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I did a PhD in the neurophysiology of pain using a rat model of chronic pain, followed by a postdoc in neurobehavioural genetics using mouse models. I am now a new PI in neurobiology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

We would like to thank the reviewers for their very positive reviews of our paper. The reviewers' comments were very helpful, and we have now revised our manuscript to address all of the issues raised.
The study mentions that both female and male mice were used, which is very valuable and important in research. However, unless I have missed it, I could not find a statistical test confirming the absence of a sex difference in behavioural responses to the formalin test in this study, which matters because sex can be a factor influencing pain perception. A clarification would be appreciated, as well as the inclusion of each animal's sex in the source data.

Thank you for pointing that out.
We have now uploaded the files containing the required information. We have also conducted statistical tests comparing males and females.

Additionally, it is mentioned that wildtype C57BL/6 mice were used for the gabapentin and sevoflurane experiments, but the exact background is not given, whereas it is clearly stated for the transgenic mouse study (C57BL/6NTac). Pain-like behaviour can be influenced by mouse strain, and behavioural differences have been observed between C57BL/6N and C57BL/6J mice. It would be valuable to state the exact background for full reproducibility.

The background has now been added (C57BL/6NTac).
The results of the sevoflurane effect, the gabapentin treatment and the Slit1 KO studies are very clear. I did, however, notice a discrepancy in the data from the control group of the Pink1-/- study compared with the control group in the Slit1-/- study (both on a C57BL/6NTac background) and the wildtype C57BL/6 mice in the anaesthesia/gabapentin experiment. The average pain behaviour in the control mice of the Pink1-/- study is almost twice as high as that of all other control groups and of the Pink1-/- mice between 20 and 35 min. Could the authors provide a possible explanation for this discrepancy? It might lead to a false positive result for the effect of Pink1 in this study. Ideally, to confirm the effect of Pink1 deletion on the behaviour induced by the formalin test, it would be appreciated if this test (Pink1-/- vs littermate controls) were repeated, again with randomisation and blinding.

Thank you for pointing this out. We believe the behaviour observed reflects the natural variation that can arise when performing behavioural tests, which underlines the importance of using control animals bred at the same time. Although we understand and acknowledge the reviewer's concerns, we believe the data are a true representation of the behaviour of this particular line. As shown in Supplementary Figure 1B, all the control animals used in this test were above the average control response in the other figures (above ~250 s); we therefore believe the findings are unlikely to be a false positive. We see no obvious biological reason for this variation, other than that these animals were a different batch and thus likely differed in experimenter, testing day and uncontrollable aspects of their cage environment (such as genotype of parents, litters and litter size).
Unfortunately, this line is no longer readily available (it has been cryopreserved and archived), so we are unable to repeat these experiments to investigate this further. We have, however, added a sentence to the manuscript describing the differences observed and the explanation above.