Identifying stroke therapeutics from preclinical models: A protocol for a novel application of network meta-analysis

Introduction: Globally, stroke is the second leading cause of death. Despite the burden of illness and death, few acute interventions are available to patients with ischemic stroke. Over 1,000 potential neuroprotective therapeutics have been evaluated in preclinical models. It is important to use robust evidence synthesis methods to appropriately assess which therapies should be translated to the clinical setting for evaluation in human studies. This protocol details planned methods to conduct a systematic review to identify and appraise eligible studies and to use a network meta-analysis to synthesize available evidence to answer the following questions: in preclinical in vivo models of focal ischemic stroke, what are the relative benefits of competing therapies tested in combination with the gold standard treatment alteplase in (i) reducing cerebral infarction size, and (ii) improving neurobehavioural outcomes? Methods: We will search Ovid Medline and Embase for articles on the effects of combination therapies with alteplase. Controlled comparison studies of preclinical in vivo models of experimentally induced focal ischemia testing the efficacy of therapies with alteplase versus alteplase alone will be identified. Outcomes to be extracted include infarct size (primary outcome) and neurobehavioural measures. Risk of bias and construct validity will be assessed using tools appropriate for preclinical studies. Here we describe the steps to be undertaken to perform a preclinical network meta-analysis that synthesises all evidence for each outcome and yields a comprehensive ranking of all treatments. This will be a novel use of this evidence synthesis approach in stroke medicine to assess preclinical therapeutics. Combining all evidence to simultaneously compare multiple therapeutics tested preclinically may provide a rationale for the clinical translation of therapeutics for patients with ischemic stroke.
Dissemination: Review findings will be submitted to a peer-reviewed journal and presented at relevant scientific meetings to promote knowledge transfer. Registration: PROSPERO number to be submitted following peer review.


List of abbreviations
NMA, network meta-analysis; DIC, deviance information criterion; NMD, normalized mean difference

Introduction
Globally, an estimated 15 million people suffer a stroke; stroke is the second leading cause of death, with six million people dying and an additional five million becoming permanently disabled each year 1,2 . The costs of stroke are high due to a combination of immediate high costs from acute care and long-term costs from resulting disability. Worldwide cost estimates range from $266 billion to $1.038 trillion per year 3 . Despite the enormous human and economic burden, only four acute interventions are currently used clinically: patient care in a dedicated stroke unit 3 , reperfusion (by pharmacological thrombolysis or endovascular mechanical thrombectomy 4 ), oral aspirin, and surgical decompression.
In the search for novel therapies for acute stroke, more than 1,000 potential neuroprotective therapeutics (e.g. anticoagulants, calcium channel blockers, free radical scavengers, GABA mimetics, etc.) have been evaluated in preclinical models 5 . Of these, only reperfusion with tissue plasminogen activators 6 , such as alteplase 3 , has had a preclinical basis. Despite its efficacy, alteplase has inherent limitations such as the risk of hemorrhagic transformation, which warrants exploration of novel adjunctive therapies that can maximize therapeutic benefit. Combination therapies with alteplase might limit reperfusion injury and cell death that can sometimes occur with this drug. However, given the multitude of therapies tested preclinically (and multiple mechanisms of action) it is difficult to assess which therapies should proceed to clinical testing.
Preclinical systematic reviews have served as a robust form of knowledge synthesis to evaluate transparently experimental therapies for more than a decade 7-9 . Previous preclinical systematic reviews have compared treatments in isolation using pair-wise meta-analyses, which limits the ability to simultaneously evaluate comparative effectiveness in the presence of many treatments of interest. Use of network meta-analysis (NMA) in comparative effectiveness research to study the relative benefits and harms of multiple interventions in humans 9,10 has risen dramatically during the past decade 11 . Such analyses allow the comparison of many interventions based on all 'direct' and 'indirect' information. In addition, this approach has the potential to establish a more rigorous framework for decisions to embark on clinical trials while reducing risks to human trial participants and the enormous costs of preclinical translation 3,7,12 . Comparison of preclinical stroke therapeutics represents an excellent case study for such work. Given the novelty of this approach, this systematic review will also serve as a case study to empirically explore the methodological nuances of applying NMA in a preclinical setting.

Protocol
This protocol will be registered in the international prospective register of systematic reviews (PROSPERO, CRD) following peer review. Our review protocol is reported in accordance with the Preferred Reporting Items in Systematic reviews and Meta-Analysis-Protocol guidelines (a complete checklist is available as Supplementary File 1) 13 . Post-protocol adjustments will be included in the final report.
Objectives
Primary objective. We will perform a systematic review and NMAs to address the following question: amongst in vivo models of focal ischemic stroke, what are the relative benefits of competing therapies tested in combination with the gold standard treatment alteplase 14 in (i) reducing cerebral infarction size, and (ii) improving neurobehavioural outcomes?
Secondary objective. We will also (i) assess the risk of bias of the included studies, and (ii) explore what novel considerations for statistical adjustments are necessary for NMA of preclinical studies (e.g. method of ischemic induction, timing of treatment, species, sex, and comorbidities). We will also evaluate the challenges of applying NMA to preclinical studies (e.g. consistency, heterogeneity, availability of key study covariates).

Search and study identification
An information specialist (RS) will construct a search strategy based on a previous review of comparative stroke therapies and limit it to studies that compared therapies to alteplase (a representative search strategy is provided in Supplementary File 2) 15 . Search strategies will be peer reviewed by a second information specialist using the Peer Review of Electronic Search Strategies (PRESS) method 16 . Searches of Ovid MEDLINE and Embase will be carried out for articles on the effects of combination therapies with alteplase. No language or date restrictions will be used. We will also search the CAMARADES database, which contains data extracted from existing preclinical systematic reviews on stroke 15,17-28 . In addition to this search, we will assess the bibliographies of any new studies and reviews identified. Articles in languages other than English will be translated.

Study eligibility criteria
Eligibility criteria to identify relevant studies for the current review were established using the Population-Intervention-Comparators-Outcomes-Study design (PICOS) framework 29 .
Population. Preclinical in vivo models of experimentally induced focal ischemia will be sought. All species/strains of animals will be eligible. Both female and male animals will be included. Neonatal animals will be excluded; however, all other ages will be considered. Studies in which focal ischemic stroke was established by transient occlusion of the middle cerebral artery or anterior cerebral artery via any method (chemical, embolic, mechanical, thermal) will be eligible. Animal models of haemorrhagic stroke, global or hemispheric brain ischemia, models of permanent occlusion without reperfusion (e.g. photothrombosis, cauterization), or delayed reperfusion such that it is considered permanent will be excluded 30 . Human studies and tissue culture studies will be excluded.

Intervention and comparator.
Studies where the treatment in combination with alteplase (e.g. alteplase + hypothermia) is compared with alteplase alone in animals that have experimentally induced focal ischemia will be eligible. Studies that compare more than one active treatment such as alteplase + hypothermia versus alteplase +FK506 (i.e. head to head comparisons) will also be included. Studies must include alteplase as a 'foundational' therapeutic in experimental arms to be eligible. All delivery routes and doses will be considered. To increase potential clinical relevance (i.e. construct validity), only studies that deliver therapies within 6 hours of induction of focal ischemic stroke will be included.

Outcome measures
-Primary outcome. Infarct size is a measure of injury at the infarct site in the brain and can be quantified by noninvasive techniques (e.g. T2-weighted magnetic resonance imaging) or by post-mortem analysis (e.g. staining of brain sections using hematoxylin and eosin). This is the most widely reported outcome in preclinical stroke studies. Infarct size outcomes will be extracted at the latest time point for each study. Separate time-point specific analyses will be conducted (e.g. an early time point <30 days vs later time points >30 days).
-Secondary outcome. Neurobehavioural measures represent a valuable means of assessing functional recovery after treatment. Neurobehavioural assessments are sensitive to an array of impairments, including motor/sensory deficits (e.g. ladder rung walking-foot slip errors) as well as memory/learning deficits (e.g. Morris water maze) 31,32 . These outcomes are labour-intensive and are typically reported less frequently than infarct volume, even though functional outcomes may have the greatest clinical relevance 33,34 . Neurobehavioural outcomes will be extracted at all time points, and separate time-point specific analyses will be conducted as described above.

Study design
Controlled comparison studies testing the efficacy of therapies + alteplase versus alteplase alone will be sought.

Screening and study selection
Two reviewers (A.D. and H.S.C.) will review abstracts (Stage 1 screen) and full text reports (Stage 2 screen) from search results independently and in duplicate against the eligibility criteria described above using Distiller SR® software (Evidence Partners, Ottawa, ON) to identify relevant articles. Discrepancies will be resolved through discussion with a senior team member (M.L. and D.C.). Both stages of screening will begin with a calibration exercise to ensure consistent application of eligibility criteria. A PRISMA flow diagram 35 will be presented to document the process of study selection.

Data extraction
Two independent reviewers (A.D. and H.S.C.) will review studies and extract data into standardized, piloted forms implemented in Microsoft Excel (Microsoft Corporation, Seattle, Washington, USA). Discrepancies will be resolved through discussion with a senior team member. We will collect data related to, but not limited to, animal characteristics (Table 1); stroke model (Table 1); intervention (Table 2a, b); and outcomes (Table 3), as well as study ID (authors, year), and study design characteristics. Measures of central tendency (e.g. mean) and dispersion (e.g. standard deviation) will be extracted as reported. Data in graphical format will be extracted using Engauge Digitizer 36 . When measures of central tendency and dispersion or sample sizes are missing (or cannot be measured digitally), authors will be contacted; if authors do not respond, the data will be excluded.

Assessment of risk of bias and construct validity
Two independent reviewers will assess the risk of bias of each included study (quality of the design, conduct and analysis for the experiment) 37 . We will assess the risk of bias using a modified version of the Cochrane Risk of Bias Tool for randomized trials ( Table 4). Risk of bias will be summarized 38 with descriptive statistics and presented graphically using standard methods and radar charts. The assessment of risk of bias will play an important role in exploring potential limitations of the evidence base and establishing the feasibility of incorporating relevant adjustments in NMA models. The construct validity of included studies (i.e. degree to which experimental model and design reflect the clinical entity of stroke and its treatment) will be assessed using elements from the CAMARADES checklist alongside criteria established by expert consensus (Table 5).
Exploring the evidence and synthesizing outcome data using network meta-analysis
We will begin by exploring the pattern of treatment comparisons represented by the included set of studies using network diagrams (or using a tabular approach if necessary, should the number and pattern of comparisons be too broad to be summarized graphically). Effect estimates from all included studies will be summarized. We will summarize traits of included studies focusing on clinical (e.g. age, sex, species, stroke model, reperfusion vs. permanent model, comorbidities, severity of infarct pre-treatment, infarct location) 39 and methodological (e.g. risk of bias, timing of outcome assessment) features 27 , and review these with our clinical and preclinical experts to establish the degree of homogeneity within the included studies. For NMA, given the possibility that a large proportion of the studied interventions may have been evaluated in only a single study (and many could potentially yield very large effect sizes, which may not have been substantiated by more animals in more studies), we will exclude these interventions from NMAs performed; each of these treatments removed from NMA will neither benefit from "borrowing strength" through NMA, nor end up with a summary estimate and confidence interval different from what was reported in a single study. The reported findings for the outcomes of interest from studies removed from the NMA according to these criteria will be summarized separately in descriptive tables to ensure all relevant data are summarized. This approach will also restrict the network to a more practical size and reduce the risk of computational challenges. Where there is homogeneity of important effect modifiers, we will perform NMAs to compare interventions 9,10,40 , following procedures to assess the validity of the assumptions of homogeneity, similarity, and consistency 41 .
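The rule above for restricting the network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_interventions`, the `min_studies` parameter, and the example arms are hypothetical and are not part of the protocol, which specifies only that interventions evaluated in a single study are removed from the NMA and summarized descriptively.

```python
from collections import defaultdict


def split_interventions(arms, min_studies=2):
    """Split interventions into NMA-eligible and descriptive-only groups.

    arms: iterable of (study_id, intervention) pairs, one per experimental arm.
    min_studies: hypothetical threshold; 2 mirrors "more than a single study".
    """
    studies_by_intervention = defaultdict(set)
    for study_id, intervention in arms:
        studies_by_intervention[intervention].add(study_id)

    # Interventions tested in enough distinct studies stay in the network.
    in_network = sorted(
        name for name, studies in studies_by_intervention.items()
        if len(studies) >= min_studies
    )
    # The rest are reported separately in descriptive tables.
    descriptive_only = sorted(
        name for name, studies in studies_by_intervention.items()
        if len(studies) < min_studies
    )
    return in_network, descriptive_only


# Made-up arms for illustration only (not extracted data).
example_arms = [
    ("study_A", "hypothermia"),
    ("study_B", "hypothermia"),
    ("study_B", "FK506"),
]
in_network, descriptive_only = split_interventions(example_arms)
print(in_network, descriptive_only)
```

Counting distinct study identifiers rather than arms avoids treating a multi-arm study as independent replication of the same intervention.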
Based upon the extracted study characteristics, we will work with our clinical and preclinical experts to establish any additional novel aspects of preclinical studies that may be important to consider in relation to judgements regarding study homogeneity beyond those anticipated in preparing this protocol. We have anticipated different species of animals (rats, mice, gerbils, dogs, sheep, non-human primates) across studies. We also anticipate that multiple reporting formats will have been used to assess both infarct volume (e.g. mm³, % of hemisphere or total brain, etc.) and neurobehavioural changes (e.g. seconds, % of baseline).
Question / Responses
Sequence Generation
  Low risk = Randomization was mentioned and good method used
  High risk = Randomized but poor method used
  High risk = Non-randomized
  Unclear risk = Randomized but no method described
  Unclear risk = No mention of randomized or non-randomized
Allocation Concealment
  Low risk = Method used to conceal the allocation sequence is described in sufficient detail
  Unclear risk = Insufficient information to determine if the allocation sequence was concealed
  High risk = The allocation sequence was not concealed or was concealed in a poor manner
Blinding of Personnel
  Low risk = All personnel involved in giving intervention were blinded to the study groups
  Unclear = Insufficient information to determine if any personnel giving intervention were blinded to the study groups
  High risk = All personnel giving intervention were described to be unblinded to the study groups
Blinding of Outcome Assessment
  Low risk (all) = Outcome assessors were blinded to the study groups for each outcome assessed
  Low risk (some) = Outcome assessors were blinded to the study groups for at least one outcome assessed. Select the outcomes that were blinded
  Unclear = Insufficient information to determine if outcome assessors were blinded during assessment
  High risk = Outcome assessors not blinded to the study groups
Incomplete Outcome Data
  Low risk = N values were consistent between methods and results for all outcomes, or inconsistent N values were explained (e.g. only N=3 animals were selected for histological analysis)
  Unclear = The N value was either not presented in the methods or in the results, and therefore there is insufficient information to permit judgment
  High risk = N values were not consistent between methods and results for the final outcomes without explanation of attrition or were inconsistent between outcomes

For meta-analysis of preclinical studies, the normalized mean difference (NMD) scale is useful for synthesizing data reported in the multiple formats described above 42 . Prior to performing NMAs, we will perform traditional pairwise meta-analyses on the NMD scale for each comparison in the treatment networks where two or more studies are available to explore heterogeneity based on the $I^2$ statistic 29 . To perform network meta-analyses on the NMD scale, we will use an established model from the National Institute for Health and Care Excellence's TSD series 43 , adapting its identity link to the log link in order to conduct the NMA on a log ratio of means (logRoM) scale. The log ratio of means of the $k$th treatment and the "stroke only" control, $d_k = \mathrm{logRoM}_{C,T_k}$, can be estimated after model fitting, and the corresponding NMD estimate is $1 - \exp(d_k)$. The NMD of the $k$th treatment in comparison with alteplase ($k = 1$) is $1 - \exp(d_k - d_1)$.
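As a rough illustration of the scales involved, the sketch below computes a log ratio of means with a delta-method variance, converts it to an NMD relative to a chosen reference, and derives an I² value from Cochran's Q under inverse-variance weighting. The function names and the numbers are made up for illustration; the protocol itself fits a Bayesian model in OpenBUGS, so this frequentist arithmetic is an assumption-laden stand-in rather than the authors' implementation.

```python
import math


def log_ratio_of_means(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (logRoM, variance) for treatment vs control (delta method)."""
    estimate = math.log(mean_t / mean_c)
    variance = sd_t ** 2 / (n_t * mean_t ** 2) + sd_c ** 2 / (n_c * mean_c ** 2)
    return estimate, variance


def nmd_from_log_rom(d_k, d_ref=0.0):
    """NMD of treatment k relative to a reference: 1 - exp(d_k - d_ref)."""
    return 1.0 - math.exp(d_k - d_ref)


def i_squared(estimates, variances):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= df:
        return 0.0  # no heterogeneity beyond what chance would explain
    return 100.0 * (q - df) / q


# Made-up infarct volumes (mm^3): treated mean 30, control mean 50.
d, var_d = log_ratio_of_means(30.0, 5.0, 10, 50.0, 5.0, 10)
nmd = nmd_from_log_rom(d)  # proportional reduction relative to control
print(round(nmd, 3), round(var_d, 5))
```

Passing the alteplase estimate as `d_ref` gives the comparison against alteplase instead of the "stroke only" control.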
Both fixed- and random-effects Bayesian NMAs will be performed using a common heterogeneity parameter according to established methods 10,40,43 . Model fit will be assessed by comparing the model's posterior total residual deviance with the number of unconstrained data points 43 . Selection between models will be based on the deviance information criterion (DIC), with a difference of five points suggesting an important difference 43 . All pairwise comparisons between interventions will be expressed with both summary point estimates and corresponding 95% credible intervals. Vague prior distributions will be assigned for all measures of treatment effect, as well as for the between-study variance parameter in random effects analyses. NMAs will be performed using OpenBUGS software version 3.2.3 44 and the R package R2OpenBUGS 45 . Model convergence will be assessed using established methods, including checking that the Gelman-Rubin convergence diagnostic Rhat (the potential scale reduction factor) is near 1 9 . Surface under the cumulative ranking (SUCRA) values and the mean rank (with 2.5% and 97.5% quantiles) will also be estimated for each intervention 46 . Forest plots of treatment comparisons versus "stroke with no treatment" control as well as versus stroke + alteplase will be prepared for each outcome. Given the anticipated high number of interventions assessed in only a single study, a tabular approach will be used to summarize findings for those interventions. We will also prepare forest plots of effects in which interventions are ordered according to the mean rank estimated from NMA.
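To make the ranking summaries concrete, here is a minimal sketch of how mean ranks and SUCRA values could be derived from posterior draws of treatment effects. It assumes that lower values are better (e.g. a more negative logRoM for infarct size), ignores ties, and uses the identity SUCRA = (a - mean rank) / (a - 1) for a treatments. The function name, treatment labels, and draws are invented for illustration and do not come from the protocol.

```python
def mean_ranks_and_sucra(draws, names):
    """Summarize rankings from posterior draws.

    draws: list of draws; each draw holds one effect per treatment,
           in the same order as `names`. Lower effect = better.
    Returns (mean_rank, sucra) as dicts keyed by treatment name.
    """
    n_treat = len(names)
    rank_sums = [0] * n_treat
    for draw in draws:
        # Rank 1 goes to the treatment with the smallest effect in this draw.
        order = sorted(range(n_treat), key=lambda k: draw[k])
        for rank, k in enumerate(order, start=1):
            rank_sums[k] += rank
    mean_rank = {name: rank_sums[k] / len(draws) for k, name in enumerate(names)}
    sucra = {name: (n_treat - mean_rank[name]) / (n_treat - 1) for name in names}
    return mean_rank, sucra


# Three made-up draws for three hypothetical arms (not real results).
names = ["alteplase + A", "alteplase + B", "alteplase alone"]
draws = [
    [-0.5, -0.1, 0.0],
    [-0.4, -0.2, 0.0],
    [-0.1, -0.3, 0.0],
]
mean_rank, sucra = mean_ranks_and_sucra(draws, names)
print(mean_rank, sucra)
```

A SUCRA of 1 means a treatment ranked first in every draw and 0 means it always ranked last; the 2.5% and 97.5% rank quantiles mentioned above would be taken from the per-draw ranks rather than from their mean.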

Addressing heterogeneity and inconsistency
To check the validity of the consistency assumption (i.e., transitivity of the effect size through common comparators), a consistency model as well as an unrelated mean effects (inconsistency) model will be fit to the data 47 . We will compare their respective DIC values to check model fit and their posterior mean deviance contributions per study to check the consistency assumption. We will also assess the magnitude of the estimated between-study SD from both models, as a reduction in this parameter in the inconsistency model also provides evidence of inconsistency.
The likelihood of important clinical and methodological heterogeneity between studies is anticipated by the research team to be high and may include several nuances which are unique to the pre-clinical setting. First, several vital aspects of preclinical studies from our risk of bias assessments (described earlier) may be important adjustment factors that could have an important impact on the findings from NMAs, including randomization and blinding 24, 48 . In this work, we will use subgroup analyses or covariate-adjusted analyses to address and explore the impact that covariates have on findings and to establish the robustness of findings from primary syntheses 49,50 . We will assess the possibility to adjust for the following group level factors: animal species (e.g. mouse) and strain (e.g. C57Bl6 strain of mice), model of stroke, average animal age, percentage of female subjects, average time since stroke induction, combination therapies, cerebral blood flow, temperature, infarct location and severity, use of randomization and blinding of experimenters and outcome assessments. Alternatively, when combining data from different species, we could model animal species as an extra level in the hierarchical model for treatment effect, allowing for heterogeneity across species and assuming that treatment effects are similar across species around an overall mean effect. For the network structure, primary analyses will be performed at the treatment level. As dose may have an important effect on intervention benefits, we will also explore the range of doses associated with each intervention across studies to consider additional analyses. However, as dose response characteristics of different agents may also vary between animal species and an a priori source of information to establish appropriate dose categories is not available, any analyses pursued in relation to dose will be appropriately indicated as post-hoc. Findings from all analyses will be reported. 
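One way to write down the species-level hierarchy mentioned above is sketched below. The notation is assumed for illustration and is not taken from the protocol: $\delta_{i,k}$ is the effect of treatment $k$ in study $i$, $s(i)$ is the species used in study $i$, $d_{k,s}$ is the species-specific mean effect, $d_k$ is the overall mean effect, $\tau^2$ is the between-study variance, and $\sigma_{\text{species}}^2$ is the between-species variance. The block requires the amsmath package.

```latex
\begin{align}
\delta_{i,k} &\sim \mathcal{N}\!\left(d_{k,s(i)},\ \tau^{2}\right), \\
d_{k,s} &\sim \mathcal{N}\!\left(d_{k},\ \sigma_{\text{species}}^{2}\right).
\end{align}
```

Under this sketch, a small $\sigma_{\text{species}}^2$ would pull species-specific effects towards the overall mean, while a large value would let them differ substantially; whether a common between-species variance across treatments is reasonable is itself an assumption that would need checking.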
Given the anticipated complexity of this novel application of NMA, we anticipate separate publications will be required for the primary and secondary outcomes.

Dissemination
The results of the study will be submitted for publication to a peer-reviewed journal and presented at relevant national and international conferences and scientific meetings to promote knowledge transfer.

Amendments
If amendments are required for this protocol, the date of each amendment will be provided in this section, with a description of the change and its rationale.

Discussion
Current approaches to evaluating the relative therapeutic benefit of preclinical treatments for stroke are limited. Although systematic reviews have been conducted comparing more than a thousand candidates, many have never been systematically assessed, nor have they been assessed relative to one another, or more importantly, to the best available clinical treatment (alteplase). Use of NMA to synthesize data on all relevant available therapies may help address this knowledge gap. Thinking more broadly, the proposed review, with the application and evaluation of NMA to preclinical therapeutics, will inform translational scientists' knowledge of which preclinical stroke therapeutics have the most promise for either further preclinical research or translation to clinical trial.
In addition to addressing an important question for clinical research, we anticipate this study will inform empirical explorations of anticipated challenges of evidence synthesis that are unique to the preclinical setting. First, a debate among preclinical and clinical scientists is likely to exist regarding both the appropriateness of, and the approach to, synthesizing outcome data from different species as well as different models of stroke. Second, there is an especially important need to consider a broad range of adjustments to account for between-study heterogeneity related to animal characteristics or other features, and the limited availability of these key data may constrain such adjustments. Our study will provide an empirical evaluation of the degree of missingness of features, such as those significant to experimental design (e.g. randomization). This will provide an indication of the changes to the available evidence when exploring adjustments of comparisons. More specifically, if the lack of reporting proves to be severe, this will provide further high-level evidence that educational efforts are needed to improve the completeness of reporting of preclinical research 51 .
Other challenges potentially requiring consideration will include identifying optimal strategies for presenting findings (including those with many comparators rendering analysis unfeasible), analysis of studies with small sample sizes, and strategies to select the most promising therapy to translate clinically. We anticipate that this systematic review will provide insight into these and other methodologic challenges and thereby serve as an exemplar for future NMA of preclinical data to build upon.
Findings from this review will be shared with several key knowledge users including (i) the Stroke Treatment Academic Industry Roundtable 52 for development of future guidelines; (ii) the Heart & Stroke Foundation and the Canadian Partnership for Stroke Recovery to inform future potential trials 51 ; (iii) the Cochrane Stroke Group to inform a future clinical systematic review and NMA; and (iv) stroke survivors, via sharing of findings with our knowledge users.

Data availability
No data are associated with this article. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review

This is a fascinating research question and the methods set out in this protocol seem appropriate. I am satisfied that the authors have set out the protocol in accordance with the PRISMA-P checklist.
I have outlined a few points below:
I think the decision to reject studies which deliver therapies outside a 6 hour window warrants some additional background information. This may well be an appropriate decision; however, I don't have expertise in stroke and so it leads me to question if there is the possibility that combination therapies could lead to greater efficacy/less harm outside this time period.
In table 2, will the authors state what "other" is for species and type of model? This seems like useful information.
Table 2 part b, should it state "N Initially Reported"?
Is "potential bias due to sample size calculation" actually related to imprecision rather than risk of bias?
Dichotomous cut offs might lose valuable information e.g. infarct <40% is within reasonable limits vs >40% is not. Is 39% vs 41% really that different?
Is the rationale for, and objectives of, the study clearly described? Yes

Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Lalu et al. provide us with their research protocol for a network meta-analysis of novel stroke therapeutics in preclinical models. The protocol is comprehensive and complete, including all necessary items like search strategy, screening and extensive data analysis. The search seems complete, using an animal filter and searching the Embase library, including Medline through Embase.
There are some minor concerns and suggestions that could improve this paper and approach:
The paper describes both fixed and random effects meta-analysis. As a network meta-analysis is usually performed with random effects and your expected variation is considerable, I would skip the fixed effect meta-analysis as a whole.
Even when using an NMD, is it appropriate to combine MRI and histology based outcomes, as these are known to not generate equivalent outcomes in preclinical models (MRI > histology, see Milidonis, Stroke 2015)? Will the NMD completely correct for this or is a sensitivity analysis needed (MRI vs histology)? If you don't think a sensitivity analysis is needed, please explain why not.
Will the NMD also be used for the secondary outcome? This is not completely clear to me now.
Consider not dichotomising certain potential effect modifiers (for example the analysis time < or >30d as mentioned on page 4). Sometimes a continuous variable can give you more information in your analysis (for a potential linear effect for example). You can also choose to do both.
Following on from the previous comment: network meta-analysis is usually performed through a form of meta-regression, making it possible to correct (potentially multivariably) for a number of potential confounders/effect modifiers in the primary analysis itself. This is already mentioned on page 12 for the 'covariate-adjusted analyses'. Please provide a list upfront of the potential factors you want to correct for (in order of importance/usage) and provide an explanation of the number of factors you want to correct for (potentially based on the number of included studies?). To my knowledge this is different from the stated 'review these with our clinical and preclinical experts to establish the degree of homogeneity' and would add to your future primary analysis. This also means that the studies do not necessarily need to be homogeneous for your primary analysis, as the meta-regression will attribute a certain effect to these 'covariables' (and will correct for the covariable).
Please provide a minimum number of comparisons for a certain intervention/comparison to be included in the network meta-analysis. Will this also be 2 or more, as with the traditional pairwise meta-analysis mentioned? If no minimum can be mentioned upfront, please explain why.
Is the rationale for, and objectives of, the study clearly described? Yes Is the study design appropriate for the research question?

Are sufficient details of the methods provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Not applicable
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: preclinical meta-analysis, translational cardiology.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
