Keywords
Food, nutrition, ELIXIR, data reuse, bioinformatics, interoperability
This article is included in the ELIXIR gateway.
This article is included in the Agriculture, Food and Nutrition gateway.
Europe is faced with a range of food-, nutrition- and health-related challenges, often embedded in socioeconomic inequalities.1 Obesity levels and chronic diseases such as type II diabetes, cardiovascular diseases, dyslipidaemia, allergies, asthma, neurodegenerative diseases and certain forms of cancer are often related to food choices. Moreover, healthy and sustainable diets are a cornerstone of the “Farm to Fork” strategy of the European Union, which is part of the European Green Deal and aims to reduce the risk of life-threatening diseases and the environmental impact of our food system. Therefore, there is growing interest in research on public and personal health, as related to food, nutrition behaviour and well-being of consumers throughout the life cycle.
Resolving the above issues requires an understanding of food choices, food intake, food composition and the effect of nutrition on health. These concepts and their relations are complex and only partially understood, so more data is needed to improve our understanding. The required data include deep geno- and phenotyping data from human nutritional studies covering metabolism and health, but also behavioural and socio-economic data. These big (and many small) data represent an opportunity for the development of products and interventions that help shift current consumption patterns towards more healthy and sustainable diets for each citizen.
The Food and Nutrition (F&N) Community is well organized and has already collated datasets, FAIRified several of its data sources and developed tools (see Table 3) that would be immediately available for the meaningful harmonization, integration, and analysis of these datasets. Several efforts have collected and are collecting datasets in a FAIR way (ELIXIR Position Paper) and many semantic resources, such as ontologies, have been built, for instance in the EU-funded projects ENPADASI, RICHFIELDS, EuroDISH, JPI HDHL INTIMIC knowledge platform, and FNS-Cloud (see Table 2). Furthermore, a dedicated community is active and organized in NuGO, an association of universities and research institutes focusing on the joint development of the research areas of molecular nutrition, personalised nutrition, nutrigenetics, nutrigenomics, nutriepigenomics and nutritional systems biology. A part of the community is also organised in EuroFIR AISBL, an international non-profit association ensuring sustained advocacy for food information in Europe. The community is working towards an ESFRI research infrastructure: Food, Nutrition and Health RI. However, to date, efforts are still fragmented, and the integration requires a wide range of data science [1], better ontologies, improved databases, data harmonization, algorithms and tools for easy sharing and dissemination of the data, while respecting data protection policies. Collaborations between dieticians, nutritionists, bioinformaticians, biostatisticians, systems biologists, consumer scientists, data scientists and knowledge engineers are needed for an interdisciplinary approach to nutrition issues.
To address these challenges, we organized a workshop “ELIXIR Food and Nutrition Community Workshop” with independent food, nutrition, bioinformatics, systems biology, data and computer science experts, and those within ELIXIR. This opinion article summarizes the interactions in the workshop and its outcomes and describes the potential role of the F&N Community in relation to ELIXIR.
ELIXIR coordinates bioinformatics resources across its member states and helps researchers to find, analyse, and exchange biological data. It is a distributed infrastructure with a Hub based in Hinxton, United Kingdom, and an increasing number of Nodes located throughout Europe. As of June 2020, ELIXIR has 22 national Nodes, with European Bioinformatics Institute (EMBL-EBI; co-located with the Hub), working as a separate Node.
The workshop was held on 23 and 24 September 2019 in The Hague (The Netherlands). The invitation to the workshop was widely advertised through the NuGO newsletter, the ENPADASI, RICHFIELDS and FNH-RI mailing lists, and ELIXIR dissemination channels, including the ELIXIR Technical Coordinators and Heads of Node mailing lists and the ELIXIR newsletter.
The workshop included 26 participants from across 14 countries (NL, DE, BE, SI, UK, SE, DK, ES, CH, IT, FR, FI, EE, IE) representing the ELIXIR Nodes, including the ELIXIR Hub, and additional countries. The objective of the meeting was to identify the principal challenges in food and nutrition and prioritise actions, in particular those within the scope and mission of ELIXIR. The workshop showcased flash presentations on the (inter)national F&N activities, ideas, and requirements that an ELIXIR F&N Community could address. An overview of ELIXIR Use Cases and Platforms was also presented by a representative of the ELIXIR Hub. The ELIXIR Training Platform was presented by one of the Training Platform coordinators. The presentations were followed by discussions on the needs and challenges present in the F&N community. The following challenges were identified:
• In order to measure a health effect, the individual health status needs to be defined
• It is clear that every individual has different dietary needs, but which advice is appropriate for a given individual is still hard to define (e.g. what advice is needed for an individual at risk of a non-communicable disease)
• Questionnaires are now the standard to quantify what individuals have eaten. These measures are very unreliable. Complex dietary and food intake biomarkers may make these measures more reliable.
These F&N challenges can only be efficiently resolved if a number of bioinformatic challenges are overcome:
• Standardization/interoperability
○ Terminology: ontology development (not all relevant ontologies for F&N are in place) to make data interoperable
○ Standardization of questionnaires (questionnaires are rarely aligned)
○ Tools for standardization to result in homogeneous data and metadata (tools for standardization are not always available to F&N scientists)
○ Technical interoperability solutions
○ Training/Capacity Building in data stewardship, data annotation/curation, etc.
• Data reuse
○ Findability, Accessibility and Connectivity of datasets (Diet intake, Health, Food composition, Consumer data, and Omics data): all shared data should be (machine) Findable and Accessible
○ Rich meta-data capture (not all datasets have complete meta-data to be Reusable)
○ Ethical compliance and approval in case of person-identifiable data (including consent: authorisation and authentication including permissions registry and confirmation of informed consent and added purpose binding)
• Advocacy and Training
○ Researchers and governmental organisations publish research documents to spread knowledge but hardly publish data (raw or modified). Convincing arguments and incentives must be created so that publication of data is considered of equal importance to publication of documents.
○ Training/Capacity Building on FAIR data and standards (not all F&N scientists know the FAIR principles, are aware of the importance of standards for aligning their data management plans with those principles, or know where to find support)
• Tool and service availability and interoperability
○ Make tools and services available with which data can be analysed, visualised and manipulated
○ Need for alignment of existing and new analysis software with existing ELIXIR efforts (e.g. microbiomics/Galaxy and similar dataflows)
○ Public and private repositories must be integrated in such a way that users can easily transfer data into existing tools for their data processing. This would lead to a landscape of repositories and tools where an arbitrary number of systems can be connected or chained to perform data analysis.
• Networking actions
These challenges are described in more detail in the following sections.
Individual health concept
Health is generally no longer defined merely as the absence of disease. The World Health Organization has accepted that being healthy also includes physical, mental and social well-being. This indicates the need for new ways to quantify health status and to include relevant data on well-being in healthy people. How can we go from general health advice to subgroup or even to personal advice, linking with a human data use case? Can we use dynamic data for this? Can we set up methods to measure health status in relation to nutrition?
In order to measure health we need to operationalize the concept of health and make it measurable. In the F&N community this is often done with challenge tests that probe individual resilience to a specific health challenge. The idea behind challenge tests is that a dynamic response tells more about the status of the underlying processes than baseline values alone. Challenge tests are therefore a way forward in quantifying individual health.
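As an illustration of why a dynamic response can carry more information than a baseline value, the following minimal Python sketch summarises a challenge-test time course as an incremental area under the curve (iAUC) above baseline. This is only one possible summary and is not a method prescribed by the workshop; the function name, the sampling times and the measurement values are hypothetical.

```python
from typing import Sequence


def incremental_auc(times: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal area of the response above the baseline (first) value.

    Simplification: values below baseline are clipped to zero at the
    sampling points, so baseline crossings between two samples are only
    approximated.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    baseline = values[0]
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        left = max(values[i - 1] - baseline, 0.0)
        right = max(values[i] - baseline, 0.0)
        area += 0.5 * (left + right) * dt
    return area


# Hypothetical example: two participants with the same baseline value
# but a different response to the same challenge.
minutes = [0, 30, 60, 120]
participant_a = [5.0, 8.0, 6.0, 5.0]
participant_b = [5.0, 9.0, 8.0, 6.0]

if __name__ == "__main__":
    print(incremental_auc(minutes, participant_a))
    print(incremental_auc(minutes, participant_b))
```

Both hypothetical participants start from an identical baseline, so a baseline-only comparison would not separate them, whereas the response summary does.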
Ideally, we would like to know all relevant aspects of health or at least the most important ones for general metabolic balance but this may seem like a difficult target to reach. Using omics, it is potentially possible to measure multiple responses to a complex health challenge and to span the individual responses across many metabolic pathways covering the human as well as the microbial composition and metabolism. This approach would circumvent the need for pre-defining all aspects of health while allowing actual variability in health to be determined. Exposing individuals to a multitude of different nutritional challenges is therefore a prerequisite for measuring nutritional health. A diet is in fact such a multifaceted health challenge and extracting information on health status is therefore likely embedded in many multi-omics nutritional datasets. For example, the variation of gut microbiome composition, often in response to external cues such as diet,2 and its far-reaching effects on the host in highly prevalent disorders is widely documented in the literature.3–6 In addition, the microbiome is emerging as an important determinant of metabolic responses to food and a driver for inter-individual variability in metabolic health biomarkers.7,8 For a more meaningful interpretation of the mechanisms underlying this two-way interplay and for translating microbiome research into effective benefits for the host, a correlation with other -omics profiles, such as metabolome9 or host transcriptomes,10,11 through physiological and pathological conditions, is required. Relatively few such datasets have been shared and open access to varied nutritional data is therefore an important goal. For this purpose, both data on diseased and healthy citizens are needed. 
This case study is focused on connecting the human nutritional datasets available in the F&N community to the datasets available in ELIXIR (including the BBMRI and ECRIN data) that are more focused on diseased subjects.
Nutrigenomic biomarkers are needed to clarify association between the personal genetic background, phenotype, food, and nutrition. Additionally, interactions between drugs/pharmacotherapy/toxicology and food/nutrition should be investigated.12,13
To accomplish this, datasets from complex nutritional challenge tests and dietary interventions covering multiple foods, nutrients and common food additives and toxicants are needed and need to be integrated in smart ways with the current knowledge. Work in this area was initiated years ago in collaborative projects under NuGO; however, a generalized model for data interpretation has not yet been developed. Additional steps are currently being made in the projects FNS-Cloud and JPI HDHL INTIMIC knowledge platform (JPI KP).
A considerable number of highly controlled meal studies and dietary intervention studies exist (e.g. see Phenotype database and trials registries), and the consortia have in principle decided that data from these studies should be FAIR; however, the data are not yet fully FAIRified, and efforts to provide such datasets, including related rich omics data, are therefore a major priority. Rich metadata for data-intensive omics, such as the microbiome, remain an issue, in particular because comprehensive (rich) diet-related metadata are challenging to obtain routinely in nutritional intervention studies. In the JPI KP, rich meta-data on the nutritional studies are integrated with metabolomics and microbiome data. These types of data (metabolomics and microbiome) are also available within ELIXIR, and solutions are available to share them. Integrating the Phenotype database with these solutions, e.g. MetaboLights, would improve the current solutions and would make the nutritional data available to ELIXIR.
In addition, omics analysis software (and hardware) for optimal data extraction, standardization, integration, and analytics tools to address the individualized response across multiple pathways are needed for the F&N community. Several of these tools exist, but some are fragmented or in an immature form and need to be joined into comprehensive workflows, where ELIXIR may help.
Individual dietary needs
Beyond defining individual health and resilience there is a need to develop methods to deliver dietary advice at an individual level to improve health. Data integration from a multitude of sources and different data types is crucial in understanding the individual human phenotype and what is optimal for his/her health. This challenge requires integration of data on behaviour, dietary intake, health-related endpoints, biometric dynamics, images, and omics data. Combining this information from a rich data source could provide a prediction of the longer-term health effect of a specific food or dietary intake even at the level of an individual.
Understanding and measuring food intake and its dynamics is central for developing effective lifestyle interventions. It is also relevant for the food industry, which has agreed to work with EU member states to make food supply healthier through food reformulation. Several tools and eHealth solutions relevant to a broader community and food industry have been developed. For instance, RICHFIELDS, JPI HDHL FoodBAll, ENPADASI, EIT Food, and FNS-Cloud have delivered the standardized requirements and ontologies14 that were needed to unambiguously describe the determinants of consumers' food choices, the composition of food and dietary intake (including biomarkers of intake) as well as the subsequent metabolic effects of food components in the human body. A diversity of tools is needed to integrate (personal) data that can be linked to dietary, health, and behavioural (consumer) reference data and ontologies. To give personal advice, a combination of user-friendly apps and sensors that capture non-invasive or minimally invasive biometric signals in real time is needed, and it must be able to onboard individuals and keep them connected and motivated. The individual data generated in this way can be used to give personalized advice but can also be used for research. For secondary use of data, consent is needed, unless data are completely anonymized. To adhere to privacy legislation, federated learning solutions may be needed. To support the reuse of these individual data, an infrastructure is being developed in which personal health and consumer study data will be stored safely (FNH-RI). ELIXIR has the knowledge and tooling that can help to develop the solutions mentioned.
Complex intake biomarker
All the above challenges require data on what people have eaten and their responses. Many self-reported questionnaires/diaries exist to enable collection of these data and some solutions are developed to make this data interoperable. However, it is important to recognise the limitations associated with such self-reported data: participants tend to under-report their food intake and the approaches are burdensome on participants thus resulting in issues with incomplete data.15,16 Therefore, there is a great need for molecular biomarkers for food intake (BFIs) to provide stronger tools for intake assessment and understand the relationship to biomarkers of health and disease. A considerable number of such biomarkers have been identified17,18 and in the JPI FoodBAll project a systematic approach was used to define,19 search20 and validate21 BFIs and to introduce elements of an ontology for the area.22 Moreover, several food challenge studies were conducted to find new candidate BFIs for the most common foods in Europe, covering specific meats,23,24 fruits,25,26 vegetables,27 legumes,28 and several others. However, biomarkers for many foods have still not been proposed and many of the candidate biomarkers published are still not fully validated. An example of an important aspect of validation, which is often missing in the work on BFIs is the quantity of the food ingested; the current methodology is largely qualitative and combining data across several studies could be a facile way to improve this aspect of validation. 
To develop and validate this type of biomarker, rich standardized questionnaires on intake (as the not-so-gold standard) and metabolomics data (as a potential source of the biomarkers) are often used,29 but as more and more markers are identified they may also be further refined by biomarker combinations.26,30 Work on biomarker approaches to assess whole diets has also advanced in recent projects and could develop into another set of tools in the intake biomarker toolbox.31–33 However, tools to search information related to biomarker validation and rich databases of nutritional studies with associated metabolomics data are needed to cover the diet with appropriate biomarkers and so allow improved assessment of dietary intakes and compliance. These data also need to be made interoperable, both in terms of the metadata provided with studies and the information needed to identify the compounds proposed as potential BFIs. This requires rich databases of mass spectral and other compound information that can be searched and used for verification. Some of this work has been started by other international players and some of it in the JPI HDHL FoodBAll project, but the resulting resources still need further work on interoperability. The interaction with ELIXIR may make further development possible, especially by connecting to the MetaboLights initiative or similar repositories.
A key issue is to understand the relationships between consumer behaviour, food intake, and nutritional status to help consumers to choose a healthy diet. This information would also be helpful to commercial companies in the development of more healthy foods and dietary solutions. These types of data are collected by dietary monitoring systems in interventions and observational studies conducted across Europe using a standardized methodology. Important in this data collection is storage of relevant metadata on the study design as well as the methods used. Moreover, standards for dietary assessment and for measurements of dietary intake and nutritional status are needed for comparability and quality of data, e.g. to start the discussion about FAIR (Findable, Accessible, Interoperable and Reusable) food data. Guidelines to improve reporting of dietary assessment and nutrition research will help to re-use existing evidence for public health recommendations.34 Relevant ontologies for nutrition knowledge can guide users towards better use and appraisal of research findings.35
In the last decades, a great amount of work has been done in predictive healthcare using Artificial Intelligence methods as a result of the existence of publicly available biomedical vocabularies and standards together with tools for standardization of health-related data. Despite the large number of resources in the health domain, the food and nutrition domain is still low-resourced. There exist only a few food ontologies that are developed for a specific application scenario, with a small number of studies that are focused on exploring relations between different ontologies and standards. For this reason, the workshop “Big Food and Nutrition Data Management and Analysis - BFNDMA” was initiated at the 2019 IEEE International Conference on Big Data (Los Angeles, USA) and is an ongoing initiative, with a focus on methodologies for management and analysis of food and nutrition data.
The project RICHFIELDS started working on standardization of food items that are described and classified using different standards, by presenting a method known as StandFood36 that combines Natural Language Processing and Machine Learning in order to standardize and classify foods with regard to FoodEx2, the classification provided by the European Food Safety Authority (EFSA). The method focuses only on identifying lexical similarity between the English names of the food items. This work continues as a part of the current project FNS-Cloud, where different food semantic standards (i.e. Hansard taxonomy, FoodOn, OntoFood, and SNOMED-CT) have been explored for food data annotations. The results showed that not all ontologies relevant for F&N are in place and that the current ontologies do not have good coverage.37 Based on the results, FoodOntoMap,38 a semantic resource, was created, which provides links between different food ontologies that can be further reused to develop applications for understanding the relation between food systems, human health, and the environment. Additionally, the FoodBase39 corpus is one of the first annotated recipe corpora with food entities standardized using the Hansard taxonomy. To make this information available for subject-matter experts, the FoodViz40 tool has been presented as a visualization tool for automatically annotated food-related textual data, where subject-matter experts can check the automatically annotated results and also make corrections (i.e. manual annotations). In this way, textual data related to food, nutrition, and health, as a typical example of unstructured big data being collected from different data sources, can be handled.
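To make the idea of matching food names by lexical similarity concrete, the sketch below maps a free-text food name to the closest entry in a small reference list using Python's standard `difflib` module. It is not the StandFood method (which combines Natural Language Processing and Machine Learning); the reference terms, the normalisation step and the threshold are all assumptions chosen for illustration, and the terms are not real FoodEx2 entries or codes.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple

# Hypothetical reference terms; these are NOT real FoodEx2 entries or codes.
REFERENCE_TERMS = ["apple juice", "whole milk", "wheat bread", "olive oil"]


def normalise(name: str) -> str:
    """Lower-case, drop commas and sort the words so word order is ignored."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def best_match(
    food_name: str,
    reference_terms: Sequence[str] = REFERENCE_TERMS,
    threshold: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Return the most similar reference term and its similarity score.

    The term is None when the best score stays below the (arbitrary) threshold,
    which signals that a curator should look at the item.
    """
    query = normalise(food_name)
    scored = [
        (SequenceMatcher(None, query, normalise(term)).ratio(), term)
        for term in reference_terms
    ]
    score, term = max(scored)
    return (term, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    for name in ["Milk, whole", "Juice apple", "xyzzy"]:
        print(name, "->", best_match(name))
```

A purely lexical approach like this cannot resolve synonyms or names in other languages, which is one reason why links between ontologies remain necessary.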
Together with ELIXIR, the F&N Community will standardize, connect, and model the data pipeline. Furthermore, standardisation in food behaviour data (why do I eat what I eat), especially food choice motives, will provide tooling to create unique data pipelines based on linkage between food intake and its determinants including contextual variability and will prevent high measurement errors and time-consuming data collection. All tools and services will also become available for the wider ELIXIR community.
One of the aims of the interaction between ELIXIR and the F&N Community may be to unify the RICHFIELDS, ENPADASI and FNS-Cloud requirements and ontologies and link them with other standardised ones (provided through ELIXIR). Standardisation in consumer and human nutrition science is mandatory to exchange data and to monitor and analyse food behaviour. Data gathered by the ELIXIR Implementation Study “A microbial metabolism resource for Systems Biology”, are of special interest in this respect, as they can be exploited to integrate metagenomics data with the human nutritional phenotype. The effect of food in modulating the human gut microbiome is broadly recognized, but standardized tools for exploring big data at the interface between nutrition and the microbiome for self-sustainability of health are still fragmented and lack acceptable levels of standardization and integration.
The F&N Community is now using a number of profiling tools to explore the relationship of nutritional change with biological outcomes. This clearly aligns the area with a number of topics covered in ELIXIR. This includes data processing and analysis tools, databases for proteomics, genomics, mass spectral information and microbiomics. Especially for development of biomarkers related to health, dietary exposures and biological effects there is a good prospect for cross-fertilization of the food and nutrition area with several other topical areas within ELIXIR.
Some of the datasets in the F&N Community are already well structured, but not yet fully FAIR. For example, ENPADASI has developed three templates for data collection covering all types of study design, study content and study objectives. The project has collected 111 intervention studies (of which 27 studies were made open access) and 23 observational studies. Those studies include clinical chemistry data, transcriptomics, genomics, metabolomics and microbiome data. The Phenotype database and the Opal/Mica systems were used as databases. DataSHIELD was used to integrate data from the different systems, and the FAIR principles were considered as much as possible. Minimal requirements for nutritional data sharing and for the information on studies needed for quality appraisal were developed. These requirements will make sure that all information relevant to judging the study quality is shared, where possible. In addition, based on the templates and uploaded studies, nutritional terms were identified and mapped to existing ontologies, and new ontologies are being developed for nutritional terms (ONS) and nutrition epidemiology (Ontology for Nutritional Epidemiology). As indicated, we need to connect to other data sources and knowledge to fulfil the goals in the community. This requires full implementation of FAIR and Omics related standards in ELIXIR.
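The principle behind integrating data held in different systems without moving individual records can be sketched as follows: each site releases only aggregate values, and a coordinator combines them. The Python example below computes a pooled mean this way. It does not use the DataSHIELD API; the class, the function names and the minimum-count rule are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate released by one site; no individual-level values are included."""
    n: int
    total: float


def summarise_locally(values: List[float], min_count: int = 5) -> SiteSummary:
    """Run at the data-holding site. Refuses to release very small groups.

    The minimum count of 5 is an arbitrary illustrative disclosure rule.
    """
    if len(values) < min_count:
        raise ValueError("too few records to release an aggregate")
    return SiteSummary(n=len(values), total=sum(values))


def pooled_mean(summaries: Iterable[SiteSummary]) -> float:
    """Run at the coordinator, using only the site-level aggregates."""
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no records available")
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    # Hypothetical measurements held at two separate sites.
    site_a = summarise_locally([1.0, 2.0, 3.0, 4.0, 5.0])
    site_b = summarise_locally([10.0, 10.0, 10.0, 10.0, 10.0, 10.0])
    print(pooled_mean([site_a, site_b]))
```

The pooled result equals the mean that would be obtained from the combined individual records, although those records never leave their sites.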
RICHFIELDS worked towards solutions to collect data on dietary habits. This includes privacy-sensitive data. Therefore, RICHFIELDS has examined how the General Data Protection Regulation (GDPR) addresses these privacy matters. RICHFIELDS provides a framework for the design of the ethical and legal aspects and includes the following recommendations: (1) use of pseudonymisation with appropriate safeguards against unauthorised reversal of pseudonymisation; (2) use of appropriate technical and organisational measures to ensure GDPR compliance; (3) systems for dealing with queries and requests from data subjects; (4) appointment of a Data Protection Officer (DPO); (5) mechanisms for handling freedom of information (FOI) requests; (6) use of suitable data protection clauses for trans-border data transfer; (7) obtaining insurance to cover liability in the event of data breaches; and (8) the establishment of an independent ethics committee with the remit to monitor the activities, protocols on matters relating to security, transfer of data to third countries, assessing genuineness of requests from data users and procedures for dealing with ethically suspect requests, and procedures for handling requests from data subjects. This framework may be integrated with the solutions on this matter in ELIXIR.
The above-described F&N challenges all require complex integration of diverse data sets. This integration will require knowledge networks, development of specific tools and algorithms, and analysis pipelines, and may require hardware (cloud solutions). Although some Food and Nutrition knowledge networks and analysis pipelines are in place (e.g. Micronutrients WikiPathways, NutriGenomeDB), many are still lacking or require better generalization for reuse. Currently the F&N community has no cloud solution available, which may be needed especially if microbiome data (or other datasets that require large computation volume) are being integrated. Several of the datasets needed in the community will be private data, for which explicit consent is needed if the data is reused. Solutions for requesting secondary data reuse are therefore important and need to be outlined to improve ease of application.
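Recommendation (1) can be illustrated with a keyed hash: the pseudonym is derived from the participant identifier and a secret key, so anyone without the key cannot recompute or verify the mapping. The Python sketch below uses HMAC-SHA256 from the standard library. It is an assumption about how such a safeguard might be implemented, not the approach specified by RICHFIELDS, and the identifiers and key shown are hypothetical. Whether a given scheme is adequate for a particular study remains a question for the responsible Data Protection Officer.

```python
import hashlib
import hmac


def pseudonymise(participant_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the key must be stored separately from the data.
    """
    if not secret_key:
        raise ValueError("a non-empty secret key is required")
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical key; in practice it would come from a protected key store.
    key = b"replace-with-a-randomly-generated-secret"
    print(pseudonymise("participant-0001", key))
```

Keeping the key under separate organisational control is what distinguishes this from plain hashing, where common identifiers could be guessed and re-hashed by anyone.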
Understanding the relation between diet, microbiome, metabolome and health was identified by popular vote as the one area where:
In our first implementation study we will start by compiling a list of available datasets and tools within our community and then connect the currently available F&N databases to MetaboLights.
ELIXIR technical activities are performed by ELIXIR Nodes and supported by the Hub. The Nodes run bioinformatics resources and services focusing on national priorities and contributing to a harmonised strategy across Europe. At the national level, an ELIXIR Node consists of research institutes within a member country, building on their national strengths. At the European level, ELIXIR’s activities are structured around Platforms and Communities, which bring resources and expertise together from several ELIXIR Nodes. The ELIXIR Platforms are responsible for the implementation of the ELIXIR Scientific Programme, which is organised into five key areas: Data, Tools, Compute, Interoperability and Training.
The 11 Communities that are currently recognised are: Federated Human Data, Rare Diseases, Human Copy Number Variation, Microbiome, Metabolomics, Proteomics, Galaxy, 3D-BioInformatics, Intrinsically Disordered Proteins, Microbial Biotechnology and Plant Sciences. The Communities drive the work of the ELIXIR Platforms by describing their bioinformatics requirements. A close collaboration between the ELIXIR Communities and Platforms ensures that services developed by the ELIXIR Platforms are fit for purpose.
F&N activities are well represented within Europe and ELIXIR Nodes. Following the establishment of (inter)national efforts such as ENPADASI, RICHFIELDS, NuGO and the formation of the FNH-RI, an ELIXIR Community on F&N can have a positive impact on the community, strengthening collaboration and delivering a more harmonised strategy amongst service providers. F&N aligns with the ELIXIR Platforms and Communities as well as with other health-related themes represented and proposed in ELIXIR, such as human data and rare diseases, including omics data. Although the use case (understanding the relation between diet, microbiome, metabolome and health) described above is not the only activity of interest within the ELIXIR Nodes, we propose this use case as a starting point of common interest to catalyze the collaboration amongst ELIXIR Nodes and ELIXIR Platforms.
The F&N Community focuses on why people eat what they eat and how that affects their health. Data on food (sustainable, healthy, affordable, reliable, and preferable), behaviour (purchase, preparation and consumption) and dietary intake, together with integration tools, are needed to meet the current challenges. Moreover, the F&N Community needs to connect to other (biomedical) health-related communities and ESFRIs (BBMRI, ELIXIR and ECRIN), as prevention of disease requires understanding the nature of health relative to disease. The community has developed and will continue to develop big and linked open data solutions and tools, for example food purchase apps and e-Health. Models regarding sustainable food and food security (making sure the whole system provides enough food on a sustainable basis) are also essential to reach this goal, and these may be relevant to ELIXIR. The molecular resources and bioinformatics tools available in ELIXIR are relevant to the F&N Community. Moreover, shared data is available in the community, including several omics related datasets (e.g. transcriptomics, metabolomics and microbiome). These resources are highly relevant to ELIXIR. The metabolomics use case is specifically relevant for the Food and Nutrition community as the effects of food are small and diverse, which makes metabolomics an important platform.
The F&N use case requires all the Platforms of ELIXIR (Tools, Interoperability, Data, Compute, and Training) as all these areas are relevant to the F&N Community. Our well-structured and publicly available databases containing data on healthy ‘subjects’ (citizens) and composition of foods should become interoperable and part of the ELIXIR FAIR data resources. These data on lifestyle and health prevention will be a new asset to the FAIR data backbone, as there currently is a predominance of patient data. With these datasets FAIRified and interoperable, the ELIXIR tool set will make it possible to analyse data in the whole range from health prevention to disease. Moreover, the different projects and consortia have delivered several databases, software tools and training materials that are useful for a broader community, which is described below. The F&N Community will help to strongly advocate the ELIXIR services and broaden their user base.
The ELIXIR request for 2021 Data Platform priorities focuses on sustaining Europe’s life science data infrastructures in the long term by working on guidelines and indicators to improve the impact of data resources and their long-term sustainability. Additionally, this platform aims to improve links between curated and non-curated data resources and literature.
The F&N Community realised early on that it needed a study capture tool that can record the full study design, large study outcome data sets (for instance from transcriptomics studies) and rich phenotype descriptions, and that also offers a platform for integrated data analysis across related studies. The Phenotype Database was developed for this purpose after evaluating other study capture environments available at that time. These alternatives, especially ISA-creator, were not able to capture study designs typical for the domain (cross-over studies) and did not have the necessary support for food intake registration and the nutritional phenotype in general. The continued development of the Phenotype Database was supported by the ENPADASI project, which also allowed development of a food and nutrition ontology that can be used to capture most aspects that are specific to the field.14 Further development of that ontology will happen in the new Food Nutrition-Security Cloud project (FNS-Cloud).
Study capture databases like the Phenotype Database are essentially project-level databases. They sit between the capture of individual studies in departments, clinics, metabolic wards and laboratory environments, and the scale of the technology-specific, ELIXIR-recommended data repositories and BioSamples/BioStudies. Other examples are Molgenis (used in the rare disease field), ISA-tools and FAIRDOM. We think there are important challenges in the integration of study-level data capture. Discussions, especially with Molgenis, have indicated that co-development of a template system and software libraries that support ontology term selection, based on combinations of free-text entry and pull-down menus, would be very useful and would in practice lead to the collection of more interoperable data. Increased data interoperability, and the ability to find and access data between instances of study capture databases, would also allow federated analysis across such instances.
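As a rough illustration of the kind of library discussed above, the following minimal Python sketch suggests controlled-vocabulary terms for a free-text entry, which a pull-down menu could then present for confirmation. The vocabulary, the `EX:` identifiers and the `suggest_terms` function are hypothetical placeholders and are not part of the Phenotype Database, Molgenis or any existing ontology; simple string similarity from the Python standard library stands in for a real ontology lookup service.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary: term labels mapped to placeholder
# identifiers. A real system would query an ontology service instead.
VOCABULARY = {
    "cross-over study": "EX:0001",
    "parallel study": "EX:0002",
    "food frequency questionnaire": "EX:0003",
    "24-hour dietary recall": "EX:0004",
}


def suggest_terms(free_text, vocabulary=VOCABULARY, limit=3, cutoff=0.5):
    """Return (label, identifier) pairs whose labels resemble the free text.

    Results are ordered from most to least similar, so they can be shown
    directly as the options of a pull-down menu.
    """
    query = free_text.lower().strip()
    labels = list(vocabulary)
    matches = get_close_matches(query, labels, n=limit, cutoff=cutoff)
    return [(label, vocabulary[label]) for label in matches]


if __name__ == "__main__":
    # A user types a variant spelling; the closest labels are offered.
    for label, identifier in suggest_terms("crossover study"):
        print(identifier, label)
```

Storing the confirmed identifier rather than the typed text is what would make entries comparable across database instances; the similarity cutoff is an arbitrary choice that would need tuning.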
The Tools Platform drives access and utilisation of bioinformatics research software by working closely with services and connectors. Additionally, this Platform aims to facilitate the discovery (bio.tools, EDAM), benchmarking (OpenEBench) and interoperability of bioinformatics software, by focusing on software development best practices (4OSS), and on strategy for workflows and software containers (BioContainers).
The tools provided and used by the F&N Community will be registered in the bio.tools registry, and related concepts will be added to EDAM if needed. Training in software best practices and development of Software Management Plans will help the community with providing high-quality computational tools.
The Interoperability Platform supports the discovery, integration and analysis of biological data based on FAIR principles. A set of recommended tools and services has been selected by the Interoperability Platform; these are named Recommended Interoperability Resources and include persistent identifiers, metadata and data markup (Bioschemas), standards for workflow description, registries for ontologies, controlled vocabularies, and exchange and storage formats. This Platform facilitates work on the description of interoperability services and organises specialised BYOD (Bring Your Own Data) workshops with the aim of improving the FAIRness of data resources.
F&N data must be interoperable so that they can be aligned with other data sources to answer the complex scientific challenges. Extension of the current metadata standards toward food and consumer science is needed to make this possible. Part of this has been addressed in ENPADASI and Richfields, but full integration with other interoperability platforms is lacking and gaps remain.
The Compute Platform is devoted to the compute, transfer, storage, authentication, and authorization related to biological data relying on services provided by ELIXIR Nodes and other e-infrastructures.
Key elements of the Compute Platform are:
1. Identity and access management, including AAI.
2. Making datasets available in relevant cloud providers.
3. Defining and coordinating an ELIXIR hybrid cloud ecosystem.
4. Community containers being deployed and operated at scale.
The Compute Platform does not directly provide resources but can help in brokering access.
The interaction with the Compute Platform will allow the F&N Community to process an increasing amount of relevant experimental data (i.e. omics) using standardized bioinformatics pipelines and statistical analyses. The resulting data structure will therefore be suitable for comparison, integration and modelling. The computing power that the F&N Community needs for the development and application of nutrition health care models will be huge. The Community should have a helpdesk that knows the access points for the Compute Platform and how to use them.
Training is a key component of the sustainability of a community. We will make all training material available to the ELIXIR Community, covering the collection of data on food composition, dietary assessment and food-related behaviour of the highest scientific quality to support food intake studies in surveillance, nutritional interventions and clinical studies. For new food and health web-based and mobile applications, new training material will be developed (based on a structured gap analysis of the training needs). Training on measuring behaviour and dietary intake, on food matching and imaging algorithms and tools, and on their limitations will be made accessible. Moreover, the training packages developed by ENPADASI on nutritional data upload, and by RICHFIELDS and ENPADASI on guidelines in relation to ethics, privacy and IP with a focus on data sharing, will be made part of the ELIXIR training portfolio.
Courses from the F&N Community will be registered in TeSS (the ELIXIR Training portal), and first steps will be taken to define learning paths that consider the different starting points of the users. Conversely, ELIXIR courses, such as those on tools and services related to data management and stewardship, will be necessary to the F&N Community. Special attention, also due to COVID-19 challenges, should be given to e-learning materials, tools and services, and to virtual or hybrid training and capacity-building (CB) events. ELIXIR training platform services are listed on the Platform’s website.
Connection with innovation and SME forum
Omics approaches are becoming a reality at the company level and not only in the academic field. Rapid, cost-effective, on-site technology is the gateway to applying (nutri-)genomics directly in food production and nutritional screening, supporting technology transfer. This has led to an increase in data production and access by private citizens (e.g. microbiome screening), but sometimes also to incorrect use of analytical techniques, with a potential long-term impact on data reliability. Guidelines for correct data and pipeline integration may derive from the connection between the F&N Community and the SME forum, which would help to collect the real needs of companies in terms of training, capacity building and research lines.
Alignment with ELIXIR communities and focus groups
The ELIXIR Platforms are currently complemented by 11 scientific Communities. Here we indicate the Communities and Focus Groups that are most relevant to the F&N Community:
• The Federated Human Data community for long-term strategies for managing and accessing sensitive human data and connecting consumer and patient data.
• The Rare Diseases community for privacy issues on the individual data and describing phenotypes.
• The Marine Metagenomics (Microbiome) community for the solutions in the area of microbiome/metagenome analysis.
• The Biodiversity Focus Group for the accessibility to taxonomic and molecular data (including other metadata) related to the species described so far (biodiversity catalogues).
• The Plant Science Community for the link between plant science in general and plants as food compounds.
• The Metabolomics Community for readouts of intake and health.
• The Toxicology community (not yet an approved Community) for describing phenotypes.
• The newly developing Microbiome community and the Microbial Biotechnology Community for two different microbiome approaches.
• The Machine Learning Focus Group for complex data integration (including omics and personalization), with a specific focus on metabolomics and microbiome data, and for analysis software and pipelines.
• Other Communities are Proteomics, Galaxy, 3D-Bioinformatics and Intrinsically Disordered Proteins.
The F&N Community is in several ways in line with the other Communities defined so far in ELIXIR and requires similar data solutions, but it also adds new data sources and new solutions. For instance, the F&N Community adds a consumer perspective to the current Communities in ELIXIR. Working with consumer data raises privacy and ethical issues similar to those faced by other Communities (especially Rare Diseases, because of the individual data and the possibility of de-anonymization). Interestingly, people also collect health and food-intake related data with smartphones and other personal devices, including sensors that capture non-invasive or minimally invasive biometric signals in real time. Individual digital health is becoming a revolution. Advances in sensor design, smart device connectivity and data acquisition help to keep track of parameters such as food intake, calories burnt or activity levels, complementing self-reported electronic dietary notebooks.41 In addition, smart devices can be used to send meal photographs, notifications and reminders to complete tasks, providing additional data to evaluate diet adherence or consumer experience.7 This is expected to rapidly become more relevant, and especially more widely used, in the F&N field. For ELIXIR this means involvement in a new and modern data field that has not been covered much up to now. The individual data collection needed for this is in line with the work of the Rare Diseases Community.
The current Federated Human Data Community is directed towards patients; aligning its data with the data on healthy free-living individuals collected in the F&N Community could enrich both. To make this possible, alignment of standards is needed.
The relation between dietary intake, microbiome and health is becoming more and more obvious. Standardization of metagenomics research is needed to bring the research to the next level. The Marine Metagenomics (Microbiome) Community is actively working in the area and the F&N Community can benefit from their developments. Links to workflows, sequence reference resources and tools for taxonomic/genetic profiling of microbiomes developed by the Microbiome Community are relevant to the F&N Community. For example, ELIXIR-Italy provided amplicon and shotgun metagenomic data analysis tools, DNA barcoding reference databases and, recently, a virtual research environment targeted in particular at eukaryotic microbial communities. In the context of this connection, specific sections in the described IT resources could be dedicated to nutrition-related microorganisms (such as gut microbes, food production chain microbes and probiotics). In this respect, links to the Biodiversity Focus Group and the Microbiome Community are relevant as well. For example, the increasing connection of nutritional omics datasets will contribute to the identification of unbiased microbial biomarkers. This is difficult, given the intrinsic nature of the microbiome and its genetic pool. In contrast to the inherited, largely static and “non-coding” human genome, the metagenome is acquired and has a highly dynamic composition that responds to a multitude of factors. Although the microbiome is often viewed as hosting only disease-causing pathogens or passive bystanders, it is now observed to have a wide ‘grey zone’ that cannot simply be classified into this dichotomy. Many ‘pathogens’ inhabit disease-free hosts (e.g. Helicobacter pylori), and commensals may also promote pathology onset under certain conditions (e.g. in immunodepressed or highly stressed subjects).
Because of this high variability between individuals, it is very difficult to establish what the normal condition is, what dysbiosis is and whether the latter is associated with an unhealthy phenotype42 or diet. For this, the F&N Community has proposed some general metrics, such as alpha species diversity, the ratio of Firmicutes to Bacteroidetes phyla, and the relative abundance of beneficial genera versus facultative anaerobes or pro-inflammatory microbes.43
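To make such metrics concrete, the minimal Python sketch below computes two of them from taxon counts. It assumes the Shannon index as the alpha diversity measure (one of several in common use; the text above does not specify which), and the counts and function names are made up for illustration only.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def firmicutes_bacteroidetes_ratio(phylum_counts):
    """Ratio of Firmicutes to Bacteroidetes counts in one sample."""
    firmicutes = phylum_counts.get("Firmicutes", 0)
    bacteroidetes = phylum_counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("ratio is undefined when Bacteroidetes count is zero")
    return firmicutes / bacteroidetes


if __name__ == "__main__":
    # Illustrative (invented) read counts for a single sample.
    species_counts = [120, 80, 40, 10]
    phyla = {"Firmicutes": 600, "Bacteroidetes": 300, "Actinobacteria": 100}
    print("Shannon index:", round(shannon_diversity(species_counts), 3))
    print("F/B ratio:", firmicutes_bacteroidetes_ratio(phyla))
```

Both values depend on sequencing depth and taxonomic assignment, so they are only comparable between samples processed with the same pipeline.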
The role of the microbiome extends beyond its composition and requires functional analysis. A central question is whether we can predict and model the metabolic activities in the gut starting from the available data, which are mostly sequence data. In practice, this means evaluating DNA and RNA sequences to predict the presence of enzymatic activity at the protein level, by recognising translated domains, mapping them to microbiome genome-scale metabolic models and combining the result with metabolomics data. This has clear links with the Microbial Biotechnology and Metabolomics Communities. For the complex integration of these omics data and the personalization of nutritional interventions, it also links to the Machine Learning Focus Group and to the currently active Systems Biology Focus Group.
Understanding the effect of food on health requires measures of intake and of health; metabolomics platforms can deliver these markers, as has been shown in the JPI HDHL project FoodBAll. Therefore, a link to the Metabolomics Community is very important to the F&N Community. Linking to other Communities is also important; for example, linking to the Plant Science Community is crucial because of the interrelationship between plant nutrients and their effects on human metabolism and health. Large-scale transcriptomics studies demonstrate that edible plants and their derivatives, such as olive oil, can induce significant changes in the human gene expression profile, including in many ncRNAs (i.e. miRNAs and lncRNAs) that are key regulators of gene expression. Notably, plant nutrients have been demonstrated to have a positive impact on many signalling and metabolic pathways related to diabetes, obesity and neurodegenerative disorders, and more generally on the immune response to stress factors and inflammation. At the level of cell processes, plant nutrients have been demonstrated to have positive effects (i.e. on gene expression) on different DNA repair mechanisms, apoptosis, oxidative phosphorylation and mitochondrial metabolism, to cite only the most relevant. Elucidating the molecular mechanisms, and identifying the plant nutrients able to induce such beneficial effects on human health, requires effective integration of the Plant Science, Metabolomics and F&N Communities at the level of competences, data resources and analysis tools. The F&N Community members have partly implemented the FAIR principles in ENPADASI and have developed ontologies, quality assessment tools, and standardized questionnaires that have been made publicly available as much as possible.
The F&N Community brings data on consumer science, and ways to collect them, to ELIXIR. Moreover, we bring ways to collect rich metadata, e.g. on cross-over designs, healthy phenotypes, challenge studies, protocol sets, dietary data and dietary questionnaires.
The F&N Community is aligned with several other ESFRIs; see Table 1 for these interactions.
A number of projects have previously developed bioinformatics tools for the Food & Nutrition field. These may also become relevant for the ELIXIR Food & Nutrition Community. Some of these projects are producing data and/or tools which can be used by the F&N Community (see Table 2).
Project Name | Project Website | Notes |
---|---|---|
NuGO | http://www.nugo.org/ | |
MPG | https://www.wikipathways.org/index.php/Portal:Micronutrient | |
ENPADASI | http://www.enpadasi.eu/ | |
EuroFIR | https://www.eurofir.org | |
FNS-Cloud | https://www.fns-cloud.eu/ | |
Food4Me | https://www.food4me.org | |
FoodBAll | https://foodmetabolome.org | |
FoodPhyt | https://research.chalmers.se/en/project/9396 | |
QuaLiFY | http://www.eurodish.eu/ | |
Environment task in EJP-RD | https://www.ejprarediseases.org/ | Identifying role of (micro) nutrient in rare disease networks |
CIRCLES | https://circlesproject.eu/ | Controlling mIcRobiomes CircuLations for bEtter food Systems |
MASTER | | Sustainable Food Security |
Table 3 shows an initial overview of the data and tools that the F&N Community currently has available. This will be extended during the upcoming Implementation Study.
Tools/Data | Short explanation and benefit for ELIXIR F&N Community |
---|---|
SOP portal | Procedure sharing and analysis provenance |
NuGOWiki | Annotating molecular food constituents with nutritional information |
NuGO Blackbox | Needs for automatically updating, shared and collaboratively maintained compute infrastructure based on BioLinux. A kind of early cloud. But also an early collection of needed tools and ideas for NuGO single sign-on (now available as ELIXIR AAI) |
Micronutrient Genomics Project | Expert groups for different groups of micronutrients; did collaborative research on that and started to create micronutrient pathways, with its own portal on WikiPathways, and to prepare reviews for individual micronutrients, network (?) evaluation for combinations of effects (e.g. in epigenetics and in antioxidant effects) Possibility to move that to projects, although only funded as an IRSES for exchanges. |
Phenotype Database | DASH-IN, and ENPADASI project |
FoodBAll resources | http://foodmetabolome.org/ |
Nutritional Ontology | ONE and ONS |
NuGOarray | For legacy only, not likely to lead to new needs |
Biomarker development | Including challenge tests for health biomarkers (discussed above) |
Nutritional systems biology | Typically changes in diet act in concert and affect many aspects of the system, so it is less “find the interesting, affected gene/protein or possible drug target” and more “understand the small system changes that act in concert”. That meant that nutritional research was amongst the first to adopt technologies aiming for such system wide effects like pathway and network analysis. |
NutrigenomeDB | Easy-to-use web application that allows exploration of differential gene expression profiles from nutrigenomics experiments through data tables and interactive visualizations |
Food intake | Food4Me Food Frequency Questionnaire (FFQ); validated electronic measurement of food intake, available in 8 languages, with food images for estimating the amount of food |
Food intake | Foodbook24; validated web-based dietary assessment based on 24 h recall |
Food intake | Food Profiler; validated app in multiple EU countries for collection of user data on food consumption patterns and consumer behaviour |
Food intake | Healthy Eating Index calculator; the Healthy Eating Index can be calculated from food groups derived from the Food4Me FFQ and is a measure of diet quality according to EU standards |
Knowledge rules | Food4Me decision trees; a decision support tool that uses a tree-like graph or model of decisions, takes biological user data (intake, genetics, health, anthropometrics) and delivers nutritional recommendations based on these data |
Knowledge rules | Metabotypes decision trees; a decision support tool that uses a tree-like graph or model of decisions based on subgroups of users who are matched on their clinical chemistry and anthropometric data and receive dietary advice tailored to this ‘metabotype’ |
Food composition | EuroFIR FoodEXplorer™ & FoodBasket™; web-based delivery of information on food composition and recipes |
Food composition, food consumption, Total Diet Study and brands | FoodCASE, an information system to manage and generate food composition, food consumption, total diet study and brand data. The tool offers several functionalities to estimate, analyse, link, visualise, and publish data. |
Rules toolset | A cloud-based, efficient tool for delivering personalised healthcare information. The Rules Toolset is a unique product that supports scientists to transform the synergistic input of nutritional, biological, medical and genetic information into a comprehensive report in the simplest way, regardless of the complexity of the logic. |
RICHFIELDS architecture | Platform with food, health, behaviour reference data and ontologies, and knowledge rules |
Food classification & description tools | LanguaL Food Product Indexer; FoodEx2 Browser Tool |
ECRIN Nutrition ontology | Taxonomy and ontology |
EFSA Food ontology | Taxonomy and ontology |
RICHFIELDS Behaviour ontology | Taxonomy and ontology |
Integrated web services | APIs to food reference databases and knowledge rules |
Smart Food Intake | 2 hr dietary recall and underlying food choice motivations |
mobile Food Record | Image-based intake assessment tool (Nestlé) |
Automated Consumer-centred advice generation | knowledge rules based on Bayesian Belief Networks integrating with drivers of food choice & psychosocial factors |
3D printer | Personalised food manufacturing |
Smart food manufacturing hub | Personalised food manufacturing |
Personalised menu planners | Via linear modelling approach (QuaLiFY) |
1 Data Science can have different specialisations, depending on the domain of research; it is used in fields like bioinformatics and biostatistics, and in other areas as well, like computer science, biology, biochemistry, medicine, statistics, mathematics and engineering. Data scientists use existing tools, code them, or even develop new and better data models and algorithms.
Is the topic of the opinion article discussed accurately in the context of the current literature?
Yes
Are all factual statements correct and adequately supported by citations?
Partly
Are arguments sufficiently supported by evidence from the published literature?
Yes
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Yes
References
1. Jan I, Sharma P, Bansal P: Microbial Bioinformatics Approach in Food Science. In: Karnwal A, Mohammad Said Al-Tawaha AR (eds) Food Microbial Sustainability. Springer, Singapore. 2023; 267-288.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biochemistry, Phytochemistry, Computational studies, and Bioinformatics.
Is the topic of the opinion article discussed accurately in the context of the current literature?
Yes
Are all factual statements correct and adequately supported by citations?
Yes
Are arguments sufficiently supported by evidence from the published literature?
Yes
Are the conclusions drawn balanced and justified on the basis of the presented arguments?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Food production, Ag-nutrition variation, Ontology development and data annotation, Bioinformatics.
Version 1: 25 Aug 22 (two invited reviewers).