Measuring the outcome and impact of research capacity strengthening initiatives: A review of indicators used or described in the published and grey literature

Background: Development partners and research councils are increasingly investing in research capacity strengthening initiatives in low- and middle-income countries to support sustainable research systems. However, there are few reported evaluations of research capacity strengthening initiatives and no agreed evaluation metrics. Methods: To advance progress towards a standardised set of outcome and impact indicators, this paper presents a structured review of research capacity strengthening indicators described in the published and grey literature. Results: We identified a total of 668 indicators of which 40% measured output, 59.5% outcome and 0.5% impact. Only 1% of outcome and impact indicators met all four quality criteria applied. A majority (63%) of reported outcome indicators clustered in four focal areas, including: research management and support (97/400), the attainment and application of new research skills and knowledge (62/400), research collaboration (53/400), and knowledge transfer (39/400). Conclusions: Whilst this review identified few examples of quality research capacity strengthening indicators, it has identified priority focal areas in which outcome and impact indicators could be developed as well as a small set of ‘candidate’ indicators that could form the basis of development efforts.


Introduction
Research capacity strengthening (RCS) has been defined as the "process of individual and institutional development which leads to higher levels of skills and greater ability to perform useful research" 1 . National capacity to generate robust, innovative and locally appropriate research is considered essential to population health 2,3 and socioeconomic development 4,5 . However, wide global disparities in research capacity and productivity currently exist: South Asian countries account for 23% of the world's population yet produced less than 5% of the global output of scientific publications in 2013 6 ; and sub-Saharan Africa (accounting for 13% of the global population) contributes 1% of global investment in research and development and holds 0.1% of global patents 6 . Accordingly, international development partners and research funding bodies are increasingly investing in RCS initiatives in low- and middle-income countries (LMICs). The UK Collaborative on Development Research predicts the United Kingdom's total aid spend on research will rise to £1.2 billion by 2021 7 , a large proportion of which would be direct or indirect investment in RCS in LMICs. The total global spend on RCS in LMICs, while not yet calculated, would likely be many times this figure.
Despite this substantial investment, few robust evaluations of RCS initiatives in LMIC contexts have been presented in the published or grey literatures, with the available evidence base characterised by reflective, largely qualitative individual case studies or commentaries 8 . RCS evaluation frameworks have been described 9-11 , but a comprehensive set of standard outcome or impact indicators has not been agreed and common indicators are used inconsistently. For example, publication count has been used as both an output 12 and outcome indicator 13 , sometimes with 14 or without 10 accounting for publication quality.
The dearth of robust RCS programme evaluation and, more fundamentally, robust evaluation metrics available for consistent application across RCS programmes, has contributed to a paradoxical situation in which investments designed to strengthen the quantity, quality and impact of locally produced research in LMIC settings are themselves hindered by a lack of supporting evidence. As a substantial proportion of RCS investment is derived from publicly funded development assistance 15-17 , ensuring the means to reliably evaluate impact and value for money of research and health system investments assumes even further importance. This paper aims to advance progress towards the establishment of a standardised set of outcome and impact indicators for use across RCS initiatives in LMIC contexts. As a first step towards this goal, a systematic review of RCS outcome and impact indicators previously described in the published and grey literatures is presented. The review findings highlight the range, type and quality of RCS indicators currently available and allow inconsistencies, duplications, overlaps and gaps to be identified. These results may then be used to inform planning and decision making regarding the selection and/or development of standard RCS evaluation metrics. In the interim, the resulting list of indicators may also serve as a useful resource for RCS programme funders, managers and evaluators as they design their respective monitoring and evaluation frameworks.

Search strategy and study selection
Peer reviewed publications were sought via the following databases: PubMed, Global Health, CINAHL Complete and International Bibliography of the Social Sciences (IBSS). The search was limited to English language publications and was conducted using the keywords: (research capacity) AND (develop* OR build* OR strengthen*) AND (indicator) AND (monitoring OR evaluation). The search was conducted without date limitations up until March 2018. Following removal of duplicates, all retrieved publications were subject to an initial review of the title, abstract and listed keywords. Publications that met, or were suggestive of meeting, the inclusion criteria were then subjected to full text review. Publications subjected to full text review met the inclusion criteria if they: were peer-reviewed; pertained to 'research capacity' (as either a primary or secondary focus); and included at least one output, outcome or impact indicator that has been used to measure research capacity or was proposed as a possible measure of research capacity.
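The Boolean structure of the keyword search can be illustrated with a minimal sketch. This is not how the databases themselves evaluate queries; the function name `matches_search` is hypothetical, and treating `indicator`, `monitoring` and `evaluation` as simple substrings is an assumption made only for illustration.

```python
import re


def matches_search(text: str) -> bool:
    """Rough check of a title/abstract against the review's keyword logic.

    (research capacity) AND (develop* OR build* OR strengthen*)
    AND (indicator) AND (monitoring OR evaluation)
    Assumption: case-insensitive substring/prefix matching stands in for
    whatever matching rules each database actually applies.
    """
    t = text.lower()
    has_phrase = "research capacity" in t
    # Trailing * in the search string is a truncation wildcard.
    has_action = re.search(r"\b(develop|build|strengthen)\w*", t) is not None
    has_indicator = "indicator" in t
    has_monitoring_or_evaluation = "monitoring" in t or "evaluation" in t
    return all([has_phrase, has_action, has_indicator, has_monitoring_or_evaluation])


print(matches_search("Monitoring indicators for research capacity strengthening"))
print(matches_search("Research capacity building in Africa"))
```

The second example fails the check because it mentions neither an indicator nor monitoring or evaluation, which is the behaviour the AND clauses are meant to enforce.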
The search was supplemented by a manual review of the references listed in each paper that met the final inclusion criteria and by a citation search using first author names for all papers which met the final inclusion criteria from both the initial electronic and supplementary manual searches. A further 19 papers which met the inclusion criteria were identified in this way and included in the review.
Relevant grey literature was then sought via the following databases: Google Advanced, BASE, Grey Literature and OpenGrey. The same search terms and inclusion criteria as described above were used. This search was supplemented by a request circulated across the authors' personal networks for relevant research reports pertaining to RCS evaluation which may fit the inclusion criteria. There were seven reports identified this way, resulting in a final sample of 25 publications and seven reports. Figure 1 depicts the overall process and outcomes from this search strategy.

Data extraction
Research capacity strengthening indicator descriptions and definitions were extracted from each publication/report and recorded verbatim in an Excel spreadsheet (see Underlying data) 18 . Other information recorded alongside each indicator included: the type of indicator (output, outcome or impact) (Box 1); the level of measurement (individual research capacity; institutional research capacity; or systemic research capacity); source information (author, year and title of publication/report); and a brief summary of the context in which the indicator was applied. Designation of indicator type (output, outcome or impact) and level of measurement (individual, institutional or systemic) were based on those ascribed by the author/s when reported. Where indicator type and measurement level were not reported, we used our own judgement drawing on the reported context from the respective publication/report.

Box 1
Outcome indicators: defined as measures of change in behaviour or performance, in the short- to mid-term, that could reasonably be attributed to the RCS initiative in full or large part (e.g. number of manuscripts published by infectious disease experts from country X following an academic writing course).
Impact indicators: defined as measures of longer-term change that may not be directly attributable to the RCS initiative but directly relate to the overarching aims of the RCS initiative (e.g. reduction in infectious disease mortality in country X).

Some publications/reports used the same indicators across different levels (i.e. as both an individual and an institutional measure) and in these cases we reported the indicator at a single level only based on apparent best fit. However, if the same publication reported the same indicator as both an output and an outcome measure, then it was reported twice. Where there was variation between the way that one publication or another classified an indicator (e.g. the same indicator being described as an 'output' indicator in one publication and an 'outcome' indicator in another), we remained true to the texts and recorded each separately. Indicators that pertained to the evaluation of course materials or content (e.g. how useful were the PowerPoint slides provided?) were excluded from analysis, although indicators that focused on the outcome of course attendance were retained.

Data analysis
Once all listed indicators from across the 32 publications and reports had been entered into the Excel spreadsheet, the research team coded all outcome and impact indicators according to their respective focus (i.e. the focus of the indicated measure, such as publication count or grant submissions) and quality. Output indicators were excluded from further analysis. Indicators were coded independently by two researchers, checking consistency and resolving discrepancies through discussion and, if necessary, by bringing in a third reviewer. 'Focus' codes were emergent and were based on the stated or implied focal area of each indicator. 'Quality' was coded against four pre-determined criteria: 1) a measure for the stated indicator was at least implied in the indicator description; 2) the measure was clearly defined; 3) the defined measure was sensitive to change; and 4) the defined measure was time-bound (thus, criterion 2 is only applied if criterion 1 is met, and criteria 3 and 4 are only applied if criterion 2 is met).
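The conditional ordering of the four quality criteria can be sketched in a few lines. This is an illustration only, not the authors' coding instrument: the names `QualityCoding`, `code_quality` and `meets_all` are hypothetical, and the boolean inputs stand in for the reviewers' judgements about an indicator description.

```python
from dataclasses import dataclass


@dataclass
class QualityCoding:
    implied: bool      # criterion 1: a measure is at least implied
    defined: bool      # criterion 2: the measure is clearly defined
    sensitive: bool    # criterion 3: the defined measure is sensitive to change
    time_bound: bool   # criterion 4: the defined measure is time-bound


def code_quality(implied: bool, defined: bool,
                 sensitive: bool, time_bound: bool) -> QualityCoding:
    """Apply the gating: criterion 2 counts only if criterion 1 is met;
    criteria 3 and 4 count only if criterion 2 is met."""
    c1 = bool(implied)
    c2 = c1 and bool(defined)
    c3 = c2 and bool(sensitive)
    c4 = c2 and bool(time_bound)
    return QualityCoding(c1, c2, c3, c4)


def meets_all(q: QualityCoding) -> bool:
    return q.implied and q.defined and q.sensitive and q.time_bound


# Hypothetical judgements: a measure is implied but not clearly defined, so
# criteria 3 and 4 cannot be credited even if a reviewer ticked them.
print(code_quality(True, False, True, True))
# Hypothetical judgements for an indicator meeting every criterion.
print(meets_all(code_quality(True, True, True, True)))
```

One consequence of this gating is that an indicator can never be credited with sensitivity to change or time-boundedness unless its measure has first been judged clearly defined.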

Type and level of identified indicators
We identified a total of 668 reported or described indicators of research capacity from across the 32 publications or reports included in the review. Of these, 40% (265/668) were output indicators, 59.5% (400/668) were outcome indicators and 0.5% (3/668) were impact indicators. A total of 34% (225/668) of these indicators were measures of individual research capacity, 38% (265/668) were measures of institutional research capacity and 21% (178/668) were systemic measures of research capacity. Figure 2 illustrates the spread of indicator type across these three categories by level. The full list of 668 indicators, inclusive of source information, is available as Underlying data 18 .

Outcome indicators
The 400 outcome indicators were subsequently coded to nine thematic categories and 36 sub-categories, as described in Box 2. The categories and the total number of indicators in each (across all three levels) were as follows: research management and support (n=97), skills/knowledge (n=62), collaboration activities (n=53), knowledge translation (n=39), bibliometrics (n=31), research funding (n=25), recognition (n=11), infrastructure (n=5) and other (n=77). Figure 3 depicts the number of outcome indicators by category and level. Table 1 to Table 3 present the number of outcome indicators in each sub-category as well as an example indicator for each, by the three respective research capacity levels (individual, institutional and systemic). The category and sub-category designation assigned to all 400 outcome indicators are available as Underlying data 18 .
Table 4 presents the percentage of outcome indicators that met each of the four quality measures as well as the percentage that met all four quality indicators by indicator category. As shown, all outcome indicators implied a measurement focus (e.g. received a national grant or time spent on research activities), 21% presented a defined measure (e.g. had at least one publication), 13% presented a defined measure sensitive to change (e.g. number of publications presented in peer reviewed journals) and 5% presented a defined measure, sensitive to change and time bound (e.g. number of competitive grants won per year). Only 1% (6/400) of outcome indicators met all four quality criteria, namely: 1) Completed research projects written up and submitted to peer reviewed journals within 4 weeks of the course end; 2) Number of competitive grants won per year (independently or as a part of a team); 3) Number and evidence of projects transitioned to and sustained by institutions, organizations or agencies for at least two years; 4) Proportion of females among grantees/contract recipients (over total number and total funding); 5) Proportion of [Tropical Disease Research] grants/contracts awarded to [Disease Endemic Country] (over total number and total funding); and 6) Proportion of [Tropical Disease Research] grants/contracts awarded to low-income countries (over total number and total funding). Indicators pertaining to research funding and bibliometrics scored highest on the quality measures whereas indicators pertaining to research management and support and collaboration activities scored the lowest.

Box 2

Bibliometrics:
Sub-categories: peer reviewed publication; publication (any form of publication other than peer review); reference (e.g. records of citations); quality (e.g. rating by impact factor).

Collaboration Activities:
Indicators relating to networking, collaborating, mentoring type activities.
Sub-categories: engagement (evidence of working collaboratively); establishment (creating new networks, collaborations); experience (e.g. perception of equity in a specific partnership).

Infrastructure:
Indicators relating to research infrastructure including buildings, labs, equipment, libraries and other physical resources.
Sub-categories: suitability (the provision of adequate facilities for research); procurement (e.g. purchase of laboratory equipment).

Knowledge translation:
Indicators relating to the dissemination of research and knowledge, including conferences, media and public education/outreach.
Sub-categories: dissemination (examples of research being communicated to different audiences); influence (using research knowledge to influence policy, the commissioning of new research, etc).

Research funding:
Indicators relating to funding for research.
Sub-categories: funds received (e.g. competitive grants); allocation (e.g. allocate budget to support local research); expenditure (use of research funds); access (access to research funding/competitive awards).

Research Management & Support (RMS):
Indicators relating to the administration of university or research institution systems that make research possible (e.g. finance, ITC and project management).
Sub-categories: career support (e.g. working conditions, salary and career development); organisation capacity (to manage/support research); research investment; resource access (e.g. to IT, libraries etc); sustainability (of RMS); governance (e.g. formation of ethics review committees); national capacity (to support research); national planning (e.g. developing national research priorities).

Skills/training activities:
Indicators relating to training and educational activities relating to research or research subject area knowledge.
Sub-categories: attainment (of new skills); application (of new skills); transfer (of new skills).

Other:
Indicators relating to any area other than the eight described above.
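The category counts reported in this section can be checked with a short tally. The sketch below uses only the counts given in the text; the variable names are illustrative, and the four focal areas are named explicitly because 'other' (n=77) is larger than three of them and would otherwise be picked up by a simple top-four sort.

```python
# Outcome indicator counts by thematic category, as reported in the text.
category_counts = {
    "research management and support": 97,
    "skills/knowledge": 62,
    "collaboration activities": 53,
    "knowledge translation": 39,
    "bibliometrics": 31,
    "research funding": 25,
    "recognition": 11,
    "infrastructure": 5,
    "other": 77,
}

# The four focal areas highlighted in the review.
focal_areas = [
    "research management and support",
    "skills/knowledge",
    "collaboration activities",
    "knowledge translation",
]

total = sum(category_counts.values())
focal_total = sum(category_counts[name] for name in focal_areas)
focal_share = 100 * focal_total / total

print(f"total outcome indicators: {total}")
print(f"focal-area indicators: {focal_total} ({focal_share:.2f}%)")
```

The nine category counts sum to the 400 outcome indicators, and the four focal areas account for 251 of them, which rounds to the 63% reported in the abstract.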

Impact indicators
The three impact indicators were all systemic-level indicators and were all coded to a 'health and wellbeing' theme; two to a sub-category of 'people', one to a sub-category of 'disease'. The three impact indicators were: 1) Contribution to health of populations served; 2) Impact of project on patients' quality of life, including social capital and health gain; and 3) Estimated impact on disease control and prevention. All three met the 'implied measure' quality criterion. None met any of the remaining three quality criteria.

Discussion
This paper sought to inform the development of standardised RCS evaluation metrics through a systematic review of RCS indicators previously described in the published and grey literatures. The review found a spread between individual- (34%), institutional- (38%) and systemic-level (21%) indicators, implying both a need for and an interest in RCS metrics across all levels of the research system. This is consistent with contemporary RCS frameworks 10,19 , although the high proportion of institutional-level indicators is somewhat surprising given the continued predominance of individual-level RCS initiatives and activities such as scholarship provision, individual skills training and research-centred RCS consortia 20 .
Outcome indicators were the most common indicator type identified by the review, accounting for 59.5% (400/668) of the total. However, the large number of outcome indicators were subsequently assigned to a relatively small number of post-coded thematic categories (n=9), suggestive of considerable overlap and duplication among the existing indicator stock. Just under two-thirds of the outcome indicators pertained to four thematic domains (research management and support, skills/knowledge attainment or application, collaboration activities and knowledge translation), suggesting an even narrower focus in practice. It is not possible to determine on the basis of this review whether the relatively narrow focus of the reported indicators is reflective of greater interest in these areas or practical issues pertaining to outcome measurement (e.g. these domains may be inherently easier to measure); however, if standardised indicators in these key focal areas are identified and agreed, then they are likely to hold wide appeal.
The near absence of impact indicators is a finding of significant note, highlighting a lack of long-term evaluation of RCS interventions 8 as well as the inherent complexity in attempting to evaluate a multifaceted, long-term, continuous process subject to a diverse range of influences and assumptions. Theoretical models for evaluating complex interventions have been developed 33 , as have broad guidelines for applied evaluation of complex interventions 34 ; thus, the notion of evaluating 'impact' of RCS investment is not beyond the reach of contemporary evaluation science and evaluation frameworks tailored for RCS interventions have been proposed 11 . Attempting to measure RCS impact by classic, linear evaluation methodologies via precise, quantifiable metrics may not be the best path forward. However, the general dearth of any form of RCS impact indicator (as revealed in this review) or robust evaluative investigation 8,20 suggests an urgent need for investment in RCS evaluation frameworks and methodologies irrespective of typology.

Example indicators (table fragment):
Generating new knowledge on a research problem at a regional level 25
Evidence of brain drain or not 19
Several institutions using/applying common methodology to conduct research towards common goal 25
Equitable access to knowledge & experience across partnerships 24
Proportion of positive satisfaction response from TDR staff 30
Importance of multidisciplinary research over the past 5 years 36

The quality of retrieved indicators, as assessed by four specified criteria (measure for the stated indicator was implied by indicator description; measure clearly defined; defined measure was sensitive to change; and defined measure was time-bound), was uniformly poor. Only 1% (6/400) of outcome indicators and none of the impact indicators met all four criteria. Quality ratings were highest amongst indicators focused on measuring research funding or bibliometrics and lowest amongst research management and support and collaboration activities. This most likely reflects differences in the relative complexity of attempting to measure capacity gain across these different domain types; however, as 'research management and support' and 'collaboration activity' indicators were two of the most common outcome indicator types, this finding suggests that the quality of measurement is poorest in the RCS domains of most apparent interest. The quality data further suggest that RCS indicators retrieved by the review were most commonly (by design or otherwise) 'expressions' of the types of RCS outcomes that would be worthwhile measuring as opposed to well defined RCS metrics. For example, 'links between research activities and national priorities' 19 or 'ease of access to research undertaken locally' 22 are areas in which RCS outcome could be assessed, yet precise metrics to do so remain undescribed.
Despite the quality issues, it is possible to draw potential 'candidate' outcome indicators for each focal area, and at each research capacity level, from the amalgamated list (see Underlying data) 18 . These candidate indicators could then be further developed or refined through remote decision-making processes, such as those applied to develop other indicator sets 37 , or through a dedicated conference or workshop as often used to determine health research priorities 38 . The same processes could also be used to identify potential impact indicators and/or additional focal areas and associated indicators for either outcome or impact assessment. Dedicated, inclusive and broad consultation of this type would appear to be an essential next step towards the development of a comprehensive set of standardised, widely applicable RCS outcome and impact indicators given the review findings.

Limitations
RCS is a broad, multi-disciplinary endeavour without a standardised definition, lexicon or discipline-specific journals 8 . As such, relevant literature may have gone undetected by the search methodology. Similarly, it is quite likely that numerous RCS outcome or impact indicators exist solely in project specific log frames or other forms of project-specific documentation not accessible in the public domain or not readily accessible by conventional literature search methodologies. Furthermore, RCS outcome or impact indicators presented in a language other than English were excluded from review. The review findings, therefore, are unlikely to represent the complete collection of RCS indicators used by programme implementers and/or potentially accessible in the public domain. The quality measurement criteria were limited in scope, not accounting for factors such as relevance or feasibility, and were biased towards quantitative indicators. Qualitative indicators would have scored poorly by default. Nevertheless, the review findings represent the most comprehensive listing of currently available RCS indicators compiled to date (to the best of the authors' knowledge) and the indicators retrieved are highly likely to be reflective of the range, type and quality of indicators in current use, even if not identified by the search methodology.

Conclusion
Numerous RCS outcome indicators are present in the published and grey literature, although across a relatively limited range. This suggests significant overlap and duplication in currently reported outcome indicators as well as common interest in key focal areas. Very few impact indicators were identified by this review and the quality of all indicators, both outcome and impact, was uniformly poor. Thus, on the basis of this review, it is possible to identify priority focal areas in which outcome and impact indicators could be developed, namely: research management and support, the attainment and application of new skills and knowledge, research collaboration and knowledge transfer. However, good examples of indicators in each of these areas now need to be developed. Priority next steps would be to identify and refine standardised outcome indicators in the focal areas of common interest, drawing on the best candidate indicators among those currently in use, and to propose potential impact indicators for subsequent testing and application.

The article is a timely contribution to an urgent question: how do we know if research capacity strengthening is working? The analysis of the problem (a. the lack of a shared reference framework for evaluating research capacity strengthening, which in turn implies that b. the scope for systematic and cumulative learning remains limited) is convincing and valid. The methodology is clearly explained and up to existing standards and expectations for this kind of exercise. The conclusions are straightforward, and the limitations well articulated (the focus on English, and the bias towards quantitative measures being the most important ones).
A few overall comments for the authors, keeping in mind the 'agenda' the article is trying to support (i.e. developing good examples of RCS indicators), and its potential uptake:
RCS lacks definition too, not just indicators. The article does not differentiate between research capacity strengthening done at the national level and at the international level, or in different fields (health sciences vs social sciences, etc.). While this is key to the aim of the paper to 'survey' existing indicators, the lack of solid evaluation of RCS can be also understood as the result not so much of 'underdevelopment' of the field, but of its overdevelopment in the absence of a shared definition of what RCS is. In this sense, putting all RCS (indicators) in the 'same box' might in fact reinforce the confusion around what is there to be measured, and how. International donor-funded, project-based RCS efforts differ (in scope, objectives and means) from the RCS effort of a science council or a local research training institution, despite overlaps. Often, the difference in objectives might make indicators hard to include in the same box. In this sense, the paper should acknowledge the lack of a shared definition of RCS, and the limitation it poses to an analysis of indicators.
For this specific article, it might be useful to define RCS as an international, donor-funded, project-based set of activities. Arguably, the very need for a discussion on RCS evaluation is largely driven by the fact that RCS is part of the evaluation-heavy international donor sector. This might help to further define the relevant timeframe for the search, and to situate RCS historically.
RCS is more than the sum of quality outputs. I wonder about the lack of discussion on 'process indicators' given the nature of RCS as a set of activities. These are notoriously difficult (but not impossible) to use in the donor-funded, project-based, time-bound RCS efforts, but might be very relevant to describe change and ultimately impact.
RCS impacts research systems, policy, or development? When it comes to discussion of impacts and impact indicators, the lack of definition of RCS becomes an insurmountable limitation. The study could acknowledge the need for unpacking the link between output, outcome and impact measurement/definition (particularly in light of lack of shared definition of RCS) in internationally funded programs, as a complementary exercise to the surveying of indicators. The fact that the very few impact indicators identified reveal an expectation for RCS to deliver impact on population health outcomes is a good example of the limitations imposed by lack of clear definitions.
How important is the UK? Given the global audience of the piece, it might be useful to explain why the figures relating to projected RCS funding from the UK are significant to describe larger trends, particularly if figures include both 'direct' and 'indirect' RCS.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others?
impact of training and institutional development programs. The study refers briefly to RCS interventions, taking training as an example, but this only related to training which makes up a small percentage of the overall efforts towards RCS.
It would be very interesting to situate this welcome study in the context of broader discussions and debates on RCS, particularly as a contribution to theory and practice at strengthening research capacity at individual, organizational and system levels. The latter of these is the most complex to conceptualise, to implement, and to measure, and is receiving valuable attention from RCS stakeholders such as the Global Development Network (GDN, 2017) through their Doing Research Program, a growing source of literature for subsequent review.
As the authors of the study note, there is a danger in identifying RCS indicators that are seen as having universal application and attractiveness because they are relatively easy to measure. There is an equal, related danger that, due to relative measurability, a majority of RCS interventions become so streamlined in terms of their approach that they begin to follow recipe or blueprint approaches.
The study is agnostic on different approaches to RCS. Work undertaken by the Think Tank Initiative (TTI), for example (Weyrauch, 2014), has demonstrated a range of useful RCS approaches, including flexible financial support, accompanied learning supported by trusted advisors/program officers, action learning, training and others. In a final evaluation of the Think Tank Initiative (Christoplos et al., 2019), training was viewed as having had the least value amongst several intervention types in terms of RCS outcomes, whilst flexible financial support and accompanied learning processes were viewed as being significantly more effective. It would be interesting to identify indicators of outcomes or even impacts that might relate to different types of RCS interventions which were not included in the publications reviewed by this study.
A key indicator of RCS identified by the TTI evaluation, which interestingly does not appear explicitly in the indicator list of this study, was leadership. As the authors indicate, there are likely to be other valuable indicators not surfaced through this review and this requires more work.
This study offers a very important contribution to a field currently being reinvigorated and is highly welcome. Rather than being valued because it may potentially offer a future blueprint list of indicators (not least since, as the authors observe, the indicator list generated in this study is partial in comparison to a much wider potential range), its value lies particularly in its potential for contribution to further debate and dialogue on the theory and practice of RCS interventions and their evaluation; this dialogue can in turn be further informed by access to a more diverse set of grey literature and by engagement with stakeholders who have experience and interest in strengthening this work. Hopefully the authors of this study, and other researchers, will continue this important line of work and promote ongoing discussion and debate.

funders such as the International Development Research Centre and others that support LMIC research are included.
The limitations section is clear.
It would have been helpful to have the authors elaborate a bit more on the dearth of qualitative indicators, appreciating the fact that they would have 'scored poorly by default' because of the methodology used. Could the authors comment in the conclusion on areas for indicator development, such as qualitative indicators and equity-related indicators? For example, I note that perception of equity in a specific partnership was part of the definition for collaboration and in the 'other' category, but to my knowledge, equity didn't really appear elsewhere.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Public health research; evaluation
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.