Keywords
pathogen data platforms; impact; resources; survey
This article is included in the ELIXIR gateway.
Regional/national SARS-CoV-2 genomic data platforms (DPs) have played a key role during the Covid-19 pandemic in centralizing data, curating and processing them, and re-sharing them in a consistent, pseudonymized/anonymized form with international repositories. In Europe, several countries were able to establish such infrastructures rapidly and put them into production over the course of 2021, and some even earlier.
This survey aimed to estimate the effort that was needed to establish and run these DPs during the sequencing peak of the pandemic in 2021, including activities from data curation to data brokering, and what it would take to expand these DPs to other pathogens and antimicrobial resistance from 2023 onwards.
Overall, each DP used a median of 10 person-months (PM) over 2021, and a median of 18 PM per year would be needed to expand activities from 2023 onwards. This survey shows that short-term funding remains commonplace and a struggle for the majority of DPs. Key supporters and arguments (e.g. centered on efficiency and cost savings) aimed at public health authorities and research funding bodies have also been identified to help individual DPs strengthen their funding proposals.
Ultimately, we propose that DPs connect within a supra-national entity to build a stronger case and gain access to major infrastructure funding grants at the European and global levels.
Discussions within ELIXIR CONVERGE WP9 have highlighted different levels of support from national funding bodies for their regional/national SARS-CoV-2 data platforms (DPs). These DPs are in charge of centralizing SARS-CoV-2 genomic data collection for a region/country and then brokering the resulting curated/analyzed data to e.g. GISAID, the EU Covid-19 Data Portal and/or local public health authorities (cf. Figure 1 for an explanation of data brokering). Data sharing with international data repositories is important because it supports the global surveillance efforts led by international public health authorities (WHO, ECDC, CDC), and allows the research community to access the data and contribute to, e.g., a better understanding of disease and transmission mechanisms or the development of new treatments.
Individual data producers can process the data, store it, and submit it directly to international repositories or public health databases. Alternatively, in the data brokering model, several data producers submit their data to a common data recipient. This recipient may be in charge of curating the data, analyzing it with common pipelines, storing it, and re-sharing parts of it with public health databases and international repositories (as agreed with the data providers). This latter service is often referred to as “data brokering”, i.e. sharing data on behalf of others within a well-defined ethical and legal framework. Note that legal aspects should be considered at every step. Image modified from ref. 1 (CC BY 4.0).
To help resource managers of these regional/national DPs build a stronger case for their local authorities and funding bodies to finance such infrastructures serving the surveillance of SARS-CoV-2 and beyond (e.g. other pathogens or antimicrobial resistance), ELIXIR CONVERGE WP9 agreed to conduct a survey among its members in fall 2022.
The survey focused solely on the resources needed for coordinating and handling data collection at the regional/national level, followed by data curation, analysis and annotation, data brokering (e.g. to international repositories and public health authorities) and results reporting, as well as for the underlying IT, bioinformatics and software development. It did not, however, cover the costs and resources needed to generate the data (e.g. at diagnostic labs and/or sequencing centers), nor those needed to draft and sign all the legal and ethical agreements.
The survey was split into three parts:
1. Understanding the impact of regional/national data brokering platforms.
2. Understanding the resources used during the SARS-CoV-2 pandemic (expressed as person-months (PMs)).
3. Beyond SARS-CoV-2: estimating the resources for maintenance and expansion to other pathogens and antimicrobial resistance.
The survey was implemented by ELIXIR CONVERGE WP9 as a Google Form and advertised within the ELIXIR community, in particular among ELIXIR CONVERGE WP9 members. Participation was voluntary. Responses from 11 institutions were collected from August to September 2022. Participants had the option to submit responses anonymously; 10 out of 11 participants agreed to share their identity with the survey conductors (AN, NPW, EH, IC, CM).
Important: in this report, only aggregated, anonymized data are shared. The results of this questionnaire have been treated confidentially by the survey conductors. Information on participating institutions and DPs is disclosed only with the written agreement of the participants concerned.
Participation in the survey was on a voluntary basis and responses were kept confidential. Only aggregated results are presented. By participating in the survey, participants consented to their data being used in aggregated form in this publication, which they co-author. Ethical approval: not applicable, as no human subjects were enrolled in this study. This article relates to a survey that does not constitute a research project within the meaning of the Swiss Federal Human Research Act. Consequently, it is not an activity subject to authorization by the ethics commission within the meaning of art. 45 of the Federal Human Research Act.
Below, we present the results for each question of the survey. The respondents who disclosed their identity (10/11) represent a diverse panel of European countries (Figure 2). The regional/national DPs that have a public website and agreed to disclose it are listed below in alphabetical order:
• COG-CZ: https://virus.img.cas.cz/lineages
• Danish Covid-19 Genome Consortium: https://www.covid19genomics.dk/home
• Datenplattform COVID-19: https://datenplattform-covid.goeg.at/
• Norway: https://elixir.no/
• Platform of Computational Medicine of Andalusia (ref. 2): https://www.clinbioinfosspa.es/COVID_circuit/
• REGIONAL COVID-HUB/Covid-19 Data Portal PL: https://wlkp.covidhub.pl, https://covidhub.psnc.pl/eng/
• Swiss Pathogen Surveillance Platform (ref. 3): https://spsp.ch
Created with mapchart.net.
The top three benefits identified for having a regional/national DP act as data broker, rather than individual labs submitting to international repositories, were (i) ensuring common best-practice standards, e.g. for data anonymization, (ii) ensuring higher data quality thanks to curation and common standards, and (iii) ensuring that sensitive data are not siloed, e.g. by using pseudonymised identifiers that can be used to link data to other datasets where authorized, such as those of national public health authorities (Table 1). The optimization of resources and processes was rated at least “Very important” by 8/11 respondents. None of the proposed answers was rated “Not important” by respondents.
Answers have been sorted by column “Essential”, then by “Very important”, then by “Important”.
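Benefit (iii) relies on pseudonymised identifiers. As a minimal sketch of one common way such identifiers can be derived (an assumption on our part, not the method used by any surveyed DP), a DP could compute a keyed hash (HMAC-SHA256) of each lab's internal sample ID with a secret key that it never shares. The same ID always yields the same pseudonym, so the DP (or a partner it has authorized to hold the key) can recompute it and link records across datasets, while the published pseudonym cannot be traced back to the lab ID without the key. The key value, the `PSN-` prefix, the 16-character truncation, the sample ID and the record fields below are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data platform; in practice it would be
# kept in a secrets store, never in source code or in any shared dataset.
PLATFORM_KEY = b"hypothetical-secret-key"


def pseudonymize(sample_id: str, key: bytes = PLATFORM_KEY) -> str:
    """Derive a stable pseudonym from an internal sample ID with HMAC-SHA256.

    The same (key, sample_id) pair always gives the same pseudonym, which is
    what makes record linkage possible; without the key, the mapping can be
    neither recomputed nor reversed.
    """
    digest = hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep 16 hex characters (64 bits) for readability; a hypothetical choice.
    return "PSN-" + digest[:16]


# Hypothetical records: a genome record shared with a repository, and a case
# record held by a public health authority to which the DP supplied the same pseudonym.
genome_record = {"sample": pseudonymize("LAB1-000123"), "lineage": "BA.2"}
case_records = {pseudonymize("LAB1-000123"): {"hospitalized": False}}

# Authorized linkage: join the two datasets on the shared pseudonym.
linked = {**genome_record, **case_records.get(genome_record["sample"], {})}
print(linked)
```

Neither dataset needs to contain the lab's internal ID for the join to work, which is one way a broker can avoid siloing data without exposing the original identifiers.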
Additional benefits highlighted by the participants included (a) having a better overview of the data available within a region/country; (b) serving as open educational resources and trusted knowledge bases for public awareness and to show how data are used for public health decisions; (c) fostering global data interoperability by ensuring adherence to common international standards; (d) strengthening the influence of DPs in encouraging the international repositories that ingest DP data to better align with the FAIR principles (e.g. establishment of APIs, removal of barriers to the free re-use of publicly funded data).
Participants were asked to evaluate the maturity of their DP. More than 70% of surveyed DPs were in production for brokering SARS-CoV-2 data to the EU Covid-19 Data Portal. The remaining DPs were still under development (Figure 3).
All regional/national DPs perform data curation (Figure 4). While this task is sometimes overlooked in resource planning, our survey shows that it represents a considerable workload: 1-6 person-months (PM) in more than 50% of DPs, 7-12 PM in about 20% of DPs and more than 20 PM in the remaining 20% of DPs (Figure 5). Data analysis and annotation are performed by more than 75% of DPs with a highly variable workload, likely depending on the level of automation of the analysis pipeline and on whether the DP has the capacity to perform tailored analyses for its end users. Reporting to public health authorities is also performed by more than 75% of DPs and represents a smaller workload of 1-3 PM for the majority of them.
Radar plots show the estimated workload in person-months (1-3, 4-6, 7-9, 10-12, UKN: unknown/hard to evaluate) for the general tasks performed by a DP, including development and maintenance of the service. Note that not every DP performs all of the displayed tasks.
On the data brokering side, more than 50% of surveyed DPs submit consensus genomes to GISAID and to ENA (EU Covid-19 Data Portal), as well as raw datasets to ENA (Figure 4). Of note, the workload for setting up and submitting data to ENA is generally heavier than for submitting to GISAID (Figure 5). The workload for data updates was not assessed separately in this survey; we note, however, that ENA provides an update mechanism through its API, whereas GISAID requires manual intervention.
To understand the resources needed over a year, participants were asked how many PM they used over the course of 2021, assuming this would be representative of the effort needed during a pandemic year. In total, 270 PM were used over 2021 by the 11 DPs, with great variability in effort and a median of 10 PM (Figure 6). The funding for these resources came from Ministries of Public Health (49%), institutional money (18%), Ministries of Research (16%), public funding agencies (15%) and private foundations (2%) (Figure 7; note that these percentages are not based on absolute funding amounts but on the share of funding each source represents for each DP). On whether the funding obtained was sufficient or came with significant hurdles, participants mentioned that:
• The funding received was generally sufficient, although it sometimes required re-allocating funds internally or struggling to obtain more funding from various sources as demand evolved in the dynamic pandemic context.
• Receiving funds from multiple sources came with substantial overhead for writing grant proposals and financial/scientific reports for each funding body, effectively reducing the money available for development.
• Rapidly recruiting people with the desired skills (DevOps, web development, data brokering) was sometimes challenging.
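The note on Figure 7 distinguishes two ways of computing funding shares. One plausible reading, which is our interpretation rather than a documented procedure, is that each DP reported the fraction of its own funding coming from each source and these fractions were then averaged over DPs, so that every DP weighs equally regardless of budget size. The minimal sketch below contrasts that with pooling absolute amounts; the two DPs, their budgets and their funding splits are hypothetical and are not survey data.

```python
from statistics import mean

# Hypothetical DPs (not survey data): budget in arbitrary units and the fraction
# of that budget coming from each funding source.
dps = {
    "DP_A": {"amount": 100.0, "shares": {"public_health": 1.0, "research": 0.0}},
    "DP_B": {"amount": 10.0, "shares": {"public_health": 0.0, "research": 1.0}},
}


def mean_of_shares(dps, source):
    """Average over DPs of the fraction each DP receives from `source` (equal weight per DP)."""
    return mean(dp["shares"].get(source, 0.0) for dp in dps.values())


def pooled_share(dps, source):
    """Fraction of all money combined that comes from `source` (weighted by budget size)."""
    total = sum(dp["amount"] for dp in dps.values())
    from_source = sum(dp["amount"] * dp["shares"].get(source, 0.0) for dp in dps.values())
    return from_source / total


for source in ("public_health", "research"):
    print(source, round(mean_of_shares(dps, source), 2), round(pooled_share(dps, source), 2))
# public_health 0.5 0.91
# research 0.5 0.09
```

With equal weighting, each source appears to provide half of the funding, whereas pooling amounts lets the larger DP dominate; under this reading, the Figure 7 percentages describe the typical DP's funding mix rather than shares of the total money spent.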
At the time of this survey (August-September 2022), 70% of DPs were fully covered in terms of funding for the whole year, while the remaining 30% of DPs had limited resources that ended before the end of the year (Figure 8).
The last part of the survey aimed at determining the short-term funding and features envisioned by regional/national DPs beyond SARS-CoV-2. First, we assessed whether public health authorities envisioned building upon their SARS-CoV-2 surveillance platforms to also monitor other pathogens or antimicrobial resistance (Figure 9). In fall 2022, the interest of public health authorities in expanding DP functionalities beyond SARS-CoV-2 over 2023 remained unclear for more than 45% of DPs, highlighting the difficulty of making even short-term plans for resource maintenance and expansion.
11 respondents.
For countries that expressed the wish to expand their DP, the top needs, mentioned by more than 50% of respondents, were expansion (i) to other pathogens, (ii) to antimicrobial resistance surveillance, (iii) of reporting and monitoring capacity, and (iv) towards a One Health platform (Figure 10).
11 respondents.
These activities represented a total of 327 PM, with varying levels across DPs and a median of 18 PM (Figure 11). However, the funding to cover these resources over 2023 was largely not secured at the time of the survey, in fall 2022 (Figure 12).
10 respondents. Median: 18 PM.
Finally, participants were asked what could help make a stronger case for regional/national DPs with public health authorities and with other funding bodies. For public health authorities, the elements that stood out in more than 50% of respondents' answers were:
• General satisfaction of users at public health authorities in using/interacting with the DP (e.g. through reports they receive or data access) - 64%
• Gain of time and quality for the public health authority since the DP performs data quality checks and curation where needed - 64%
• Possibility for the public health authority to automatically fetch/receive data and process it for their dashboard, since there is a single point of data entry for them - 64%
Within other funding bodies, important elements mentioned by more than 50% of respondents included:
• Visibility of the platform at the European level (e.g. through ELIXIR) - 73%
• Number of sequences shared to international repositories as open data - 64%
• Enabling FAIR data - 55%
• Number of laboratories contributing data to the DP - 55%
• General satisfaction of DP users - 55%
The participants also identified the following supporters of regional/national DPs who could help make a stronger case for mid- and long-term funding (showing answers with more than 50% agreement):
Our survey shows that at least 8 regional/national DPs were in production for SARS-CoV-2 data brokering at the time of the survey (Figure 3). While maturity was evaluated here with a single criterion, we suggest using a pathogen DP Maturity Model in the future to better account for differences in DP maturity and to foster capacity building and quality among DPs.
The tasks performed varied but always included data curation (Figure 4). The median workforce of 10 PM over 2021 (Figure 6) and the envisioned 18 PM over 2023 (Figure 11) reflect the willingness to expand to other pathogens and applications beyond SARS-CoV-2, and to do so under good conditions with sufficient resources, whereas during the pandemic some DPs may have been overworked with limited resources. There is, however, a clear need for more visibility on short- and mid-term funding, as at the time of the survey 60% of DPs were still not covered for most of their activities over 2023 (Figure 12). The wide range of funding bodies also shows that about half of the funding for these surveillance infrastructures still comes from research money (institutional money, public funding agencies and private foundations, Ministries of Research) (Figure 7), a situation that is not sustainable in the long term since infrastructures rarely receive research funding beyond their pilot phase. While the research community is realizing the importance of funding infrastructures through dedicated calls, the present study demonstrates that even established national infrastructures with a proven track record of performance and end-user satisfaction in an urgent context still struggle to find sufficient short-term funding.
This study highlights the need to connect existing DPs so that they can make a stronger case and collaboratively apply to common funding schemes at the European and global levels to establish a major infrastructure for the genomic surveillance of pathogens. This will hopefully also encourage national public health authorities to realize that their country is part of a greater network and that, without adequate funding, membership in this network might be lost, with all the consequences this entails during outbreaks and for global surveillance.
The participants answered the survey confidentially, and it was agreed that only aggregated data would be published (as participants gave details on resources and potentially raised issues with their funders and collaborators). The individual response data, stored in an online spreadsheet, may be made available to reviewers upon request, provided confidentiality can be guaranteed. The aggregated data used to generate the plots are available on Zenodo.
Zenodo: A survey into the contribution and ressources of pathogen data platforms, https://doi.org/10.5281/zenodo.10020999. 4
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
The authors are very grateful to all the participating institutions who filled in the questionnaire, as well as to ELIXIR CONVERGE WP9 members and to the ELIXIR Hub for support in reviewing the questionnaire and disseminating it.