The future of metabolomics in ELIXIR

Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the “Future of metabolomics in ELIXIR” was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area in which computational metabolomics and data management could have a maximal impact on other fields. In particular, the workshop discussed the four existing ELIXIR Use Cases, where the metabolomics community, both industry and academia, would benefit most, and which could be exhaustively mapped onto the current five ELIXIR Platforms. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.


This article is included in the ELIXIR gateway.
This article is included in the EMBL-EBI gateway.

Introduction
Metabolomics aims to provide novel insights into the biochemistry of organisms by characterising the presence and concentrations of low molecular weight compounds from biological samples. It measures both endogenous (produced within an organism) and exogenous (those introduced from the environment including food components and drugs) metabolites. The primary analytical tools for such high-throughput data collection are mass spectrometry (MS), often preceded by chromatographic or electrophoretic separation technologies, and nuclear magnetic resonance spectroscopy (NMR). These technologies produce relatively large and complex data sets that require bioinformaticians, cheminformaticians, biostatisticians and computer scientists to develop and apply a wide range of algorithms, software tools, repositories and computational resources to process, analyse, report and store the data and metadata.
The field celebrated its coming of age in 2016 1 and has progressed primarily through developments in analytical and computational tools, from which biomedical discoveries followed. As shown in Figure 1, the term 'metabolomics' is still gaining momentum; the global market for metabolomics was valued at $5.9 billion in 2014 and was expected to reach $12.5 billion by 2020, with a compound annual growth rate (CAGR) of 13.0% (https://goo.gl/yXTiJD). The future is bright for the application of metabolomics in academic and industrial laboratories, scientific instrument companies, government laboratories and contract research organisations. Yet several challenges remain. Discussions among independent metabolomics experts and those within ELIXIR (http://elixir-europe.org/) culminated in the recent workshop on the "Future of metabolomics in ELIXIR". This opinion article summarises the interactions in the workshop and its outcomes.
ELIXIR coordinates bioinformatics resources across its member states and helps researchers to find, analyse, and exchange biological data. It is a distributed infrastructure with a single Hub based in Hinxton, United Kingdom, and an increasing
One important consideration in this arena is that different confidence levels can be associated with structural identification of metabolites 17 , and standardized annotation schemes for such evidence descriptions are currently emerging 18 . The highest confidence level can only be achieved when instrumental data for an authentic chemical standard are matched to the data for the biological samples. However, authentic chemical standards are not available for many metabolites, and therefore bioinformatics and chemoinformatics, including ELIXIR-supported resources, are essential to overcome this community-wide hurdle.
Recently, as one strategy to accelerate the identification of metabolites in biological systems, a task group within the International Metabolomics Society has promoted the idea of characterising model organism metabolomes 19,20 . The philosophy of this task group is to leverage the critical mass of knowledge and activity that already exists for model organisms, linking to these other efforts and exploiting resources such as sequenced genomes to predict metabolism (as targets for experimental investigation) using genome-wide metabolic reconstructions or metabolic pathway databases such as KEGG 12 , BioCyc 13 , and WikiPathways 21 . The task group has set a grand challenge for the community: to identify and map all metabolites onto metabolic pathways, to develop quantitative metabolic models for model organisms, and to relate organism metabolic pathways within the context of evolutionary metabolomics 22 . To assist the experimental community in generating and gathering high-quality metabolomics data about biological models, an Implementation Study proposal called "MetabolHomes" has been submitted to ELIXIR as part of the present Metabolomics Use Case. It aims to provide users with a generic data model and the associated software tools for data management, visualisation, and annotation.

Tools and standards registries
Since its inception, the Metabolomics community has developed a plethora of computational tools for data analysis (https://goo.gl/Crf2Ye), as well as data and minimum information standards (https://goo.gl/gouSQY). There is a general feeling that it is difficult for the average researcher to navigate both areas.
The suggestion here was to contribute to and improve long-term supported registries of tools and standards that help researchers decide which mature, well-tested tools and standards to use for which purpose. More metabolomics resources need to be appropriately included in the Tools and Data Service Registry (tools) and the FAIRsharing registry (standards, databases, repositories and data policies), two resources that are part of the ELIXIR Platforms.

Compound identifier mapping
In order to interpret the biological relevance of measured metabolites, their structures must be mapped to existing knowledge, i.e. to pathways, using a multi-omics data integration approach. Many metabolomics experiments measure a metabolite and characterise the structure with a retention time, one or more m/z values, or an NMR spectrum. Sometimes this characterisation can be linked to the identity of a specific chemical compound, but often it can only be linked to a compound class. However, the tagged identity commonly differs from what is found in metabolism knowledge bases. In fact, different knowledge bases may have different focuses and representations of compounds. For example, one knowledge base may focus on the biological role of the metabolite, while another contains precise representations of the metabolite's chemical structure and properties. Furthermore, knowledge bases are typically either too broad or too narrow in scope, and frequently not interoperable. Chemical structure mapping is therefore an important aspect of ensuring interoperability between experimental and biochemical resources.
No generic solution currently exists, and people use either mapping based on expert knowledge 23,24 or on equivalence based on the chemical structure, e.g. with the InChI string or key 25,26 . However, neither approach is well-suited for solving the issues around ambiguities in the characterisations of both the experimental side and the knowledge side. Theoretical solutions exist for linking these facts, such as scientific lenses 27 , but these need to be extended to service the metabolomics research field.
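As a minimal sketch of equivalence based on chemical structure, the following Python fragment joins two identifier tables on the InChIKey. It relies on the standard InChIKey layout, in which the first 14-character block hashes the connectivity and the later blocks cover stereochemistry and other layers. The record identifiers (`exp:feature_17`, `kb:alanine`) and the function names are invented for illustration. Matching on the first block alone is shown only as one possible way of tolerating stereochemical ambiguity, not as a general solution to the mapping problem.

```python
# Hypothetical records: one experimental annotation and one knowledge-base entry.
# The identifiers are invented; the InChIKeys are those of L-alanine and of
# alanine without defined stereochemistry.
experimental = {"exp:feature_17": "QNAYBMKLOCPYGJ-REOHSLJYSA-N"}
knowledge_base = {"kb:alanine": "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"}


def skeleton(inchikey):
    """Return the first InChIKey block (hash of the connectivity)."""
    return inchikey.split("-")[0]


def map_identifiers(source, target, exact=True):
    """Pair source and target identifiers whose InChIKeys match.

    exact=True compares full keys; exact=False compares only the first block,
    which ignores stereochemistry and may therefore merge distinct compounds.
    """
    key = (lambda k: k) if exact else skeleton
    index = {}
    for target_id, inchikey in target.items():
        index.setdefault(key(inchikey), []).append(target_id)
    return {
        source_id: index.get(key(inchikey), [])
        for source_id, inchikey in source.items()
    }


exact_hits = map_identifiers(experimental, knowledge_base, exact=True)
loose_hits = map_identifiers(experimental, knowledge_base, exact=False)
print(exact_hits)
print(loose_hits)
```

In this toy case the full-key comparison finds no partner, while the first-block comparison links the two records. This illustrates how strict structural equivalence can miss entries that differ only in how precisely the structure was specified.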
In addition, pattern recognition analysis, such as Pavlidis template matching (Pavlidis and Noble, 2001) could further assist in identifying the biological role of the metabolite in the metabolic network and add to its chemical identification. Pavlidis template matching clusters a metabolite based on its concentration pattern with other metabolites of known identity or class in an experiment (or multiple experiments that are combined in a meta-analysis).
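The idea can be sketched as correlating the concentration profile of each unidentified feature with the profile of a metabolite of known identity. The fragment below is a simplified illustration using the Pearson correlation, not a reproduction of the published method. All profiles, feature names and the 0.9 threshold are invented assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def match_template(template, profiles, threshold=0.9):
    """Return the names of profiles whose correlation with the template
    is at least the threshold, sorted by decreasing correlation."""
    scored = [(pearson(template, values), name) for name, values in profiles.items()]
    return [name for r, name in sorted(scored, reverse=True) if r >= threshold]


# Invented concentrations across five samples.
known_metabolite = [1.0, 2.0, 3.0, 4.0, 5.0]
unknown_features = {
    "feature_A": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with the template
    "feature_B": [5.0, 4.0, 3.0, 2.0, 1.0],  # anti-correlated
    "feature_C": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related
}

co_varying = match_template(known_metabolite, unknown_features)
print(co_varying)
```

Features that co-vary with a known metabolite become candidates for the same pathway or compound class. Such a match is supporting evidence only and does not replace structural identification.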

Omics data integration
Metabolites function as both reactants and products of metabolic reactions. However, they also serve as regulatory molecules of proteins, affecting the structure and control of protein interaction and gene regulatory networks. This dual role of metabolites ensures that metabolomics is an integral aspect of systems biology research.
There is a great need for standardised integrated multi-omic analyses for a comprehensive understanding of cellular physiology, with significant applications in biomedicine and across the whole spectrum of biotechnology. Thus, establishing standardized protocols for multi-omic (i.e. metabolomic, transcriptomic, proteomic, and interactomic) data representation, integration, visualization and interpretation is of great importance. Currently, metabolomics data can be integratively visualised with transcriptomics data. However, there is a lack of integrated omic databases for most model systems, and it is not self-explanatory how genomicists or proteomicists could integrate their data with metabolomics data that refer to the same biological system, and vice versa. Additionally, there is a lack of harmonization of the experimental design, sample collection, handling and quenching protocols of metabolomics monitoring, which further increases the challenge of integrated omics analysis.
The ELIXIR group proposed that these issues of integrated omic analysis, and the standardization of metabolomic data interpretation in this context, could be tested and explored via comparison and analysis of a controllable reference biological system, such as a well-characterized human cell line.
Genome-scale metabolic networks and metabolic pathway databases contain information both on metabolites and on their reactions with corresponding genes and proteins. Thus, these networks provide valuable context for simultaneous interpretation of metabolomics data and other omics data. However, mapping metabolites in these databases is a heavy workload (see above), since most databases use their own identifiers for small molecules, where ideally chemical structure mapping should be applied. This is a particularly striking issue with genome-scale metabolic models, as these were initially built for constraint-based computational studies (flux balance analysis and related), where the chemical structure of small molecules plays no role. Because of this, no effort was put into using proper small molecule identifiers in those models, and instead only short and ambiguous names were used for small molecules, making mappings very difficult. Hence, omics data integration was not considered at all in their design, and most of these databases (available in SBML format) do not provide standard metabolite identifiers. The MSI recommends a number of identifiers, e.g. HMDB ID, ChEBI ID, CAS ID, and IUPAC Name, but only InChI and InChIKey encode the chemical structure itself.
Recently, the Recon human genome-scale metabolic reconstruction network was enriched to incorporate InChIs, but more comprehensive chemical structure mapping is needed to capture all of the biological details. Moreover, most of the existing genome-scale metabolic reconstructions available for other organisms have still not been enriched at this level. There is thus a strong need to coordinate with this community in order to facilitate the integration of metabolomics data in the context of these networks.
The problem of integrating metabolomics data into genome-scale metabolic models does not end once the community is able to map the small molecules in those models to proper identifiers; it only begins there. Classically, most established methods used to simulate those models (constraint-based analysis methods) are not prepared to use metabolite concentrations or abundances, as they only aim to balance incoming and outgoing reaction fluxes on each metabolite. The use of metabolomics data in the context of these networks will therefore present new challenges to the modelling community as well.
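The flux-balancing condition mentioned above can be illustrated with a small sketch. It checks the usual steady-state requirement that, for every metabolite, the stoichiometry-weighted sum of reaction fluxes is zero. The three-reaction network, the reaction names and the flux values are invented for illustration and do not come from any published reconstruction.

```python
# Toy network (invented): uptake -> A, A -> B, B -> secretion.
# Each row lists the stoichiometric coefficients of one metabolite
# in the reactions, in the order given by `reactions`.
reactions = ["uptake_A", "A_to_B", "secrete_B"]
stoichiometry = {
    "A": [1, -1, 0],
    "B": [0, 1, -1],
}


def net_production(stoichiometry, fluxes):
    """Net production rate of each metabolite (one row of S times v)."""
    return {
        metabolite: sum(c * v for c, v in zip(row, fluxes))
        for metabolite, row in stoichiometry.items()
    }


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True if every metabolite is produced and consumed at equal rates."""
    rates = net_production(stoichiometry, fluxes)
    return all(abs(rate) <= tol for rate in rates.values())


balanced = [2.0, 2.0, 2.0]    # same flux through the whole chain
unbalanced = [2.0, 1.5, 1.5]  # metabolite A would accumulate

balanced_ok = is_steady_state(stoichiometry, balanced)
unbalanced_ok = is_steady_state(stoichiometry, unbalanced)
unbalanced_rates = net_production(stoichiometry, unbalanced)
print(balanced_ok, unbalanced_ok, unbalanced_rates)
```

Only fluxes and stoichiometric coefficients enter this constraint. Metabolite concentrations do not appear anywhere, which is why measured abundances cannot be used directly in this kind of analysis.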

Metabolite identification
Unlike genomics, the analytical Platforms used in metabolomics and lipidomics will not per se deliver a molecular identity, i.e. a specific chemical compound, but only the spectral characteristics.
In untargeted metabolomics, metabolite identification remains the main bottleneck in data analysis and interpretation 28 . The typical output is a spectrum containing (a large number of) features, which are characterised in NMR by the location and intensity of signals on a frequency axis, and in MS by m/z values (and possible additional information like retention time if coupled to a chromatographic system or drift times if coupled to ion mobility). In targeted metabolomics, data acquisition instrument parameters are tuned to detect (a list of) target compounds, thus making it possible to deliver tables of metabolite abundances, ideally with absolute quantification, for downstream biochemical interpretation. In recent years, both approaches have been improved towards each other, resulting in widely targeted metabolomics, covering hundreds of compounds 29 .
Furthermore, computational tools for untargeted metabolomics methods have improved their ability to deliver metabolite annotation, albeit with varying levels of certainty 30 . Concerted efforts in ELIXIR and the community can help to improve the current situation by removing the burden on developers to manually connect different tools into pipelines, and by providing experimentalists with the tools and resources that give access to the knowledge required for biochemical interpretation of the data.
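To make the limits of annotation from MS features concrete, the following sketch matches an observed m/z value against a small list of candidate compounds within a mass tolerance expressed in parts per million. The candidate list, the assumption of a protonated [M+H]+ ion, the 5 ppm tolerance and the function names are illustrative choices, not a description of any particular tool.

```python
PROTON_MASS = 1.007276  # Da; added to the neutral mass to form [M+H]+

# Small illustrative candidate list (monoisotopic neutral masses in Da).
candidates = {
    "alanine": 89.04768,
    "glucose": 180.06339,
    "fructose": 180.06339,  # isomer of glucose, identical mass
}


def ppm_error(observed, theoretical):
    """Relative deviation of the observed value in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(mz, candidates, tolerance_ppm=5.0):
    """Return candidate names whose [M+H]+ m/z lies within the tolerance."""
    hits = []
    for name, mass in candidates.items():
        theoretical_mz = mass + PROTON_MASS
        if abs(ppm_error(mz, theoretical_mz)) <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)


hexose_hits = annotate(181.0705, candidates)
alanine_hits = annotate(90.0550, candidates)
no_hits = annotate(150.0000, candidates)
print(hexose_hits, alanine_hits, no_hits)
```

The first query returns both glucose and fructose: an accurate mass narrows the candidates to a formula but cannot separate isomers. This is one reason why such annotations carry lower confidence than a match against an authentic standard.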

Metabolomics Use Case in ELIXIR
After extensive discussions during the workshop, metabolite identification was selected by popular vote as the one area where:

Alignment with ELIXIR Platforms
The Metabolomics community in ELIXIR has vast expertise in the five areas represented by ELIXIR Platforms (Data, Tools, Interoperability, Compute and Training). Next, we summarise the current Platforms' priorities, how the selected metabolomics Use Case aligns with the Platforms and a general alignment of metabolomics activities within Europe.

Data Platform
The Data Platform focuses on sustaining Europe's life science data infrastructure in the long term by working on guidelines and indicators to improve the impact and long-term sustainability of data resources. Additionally, this Platform aims to improve links between curated and non-curated data resources and the literature.
On the data side, metabolite identification requires a) the availability of high-quality curated resources for compound de-replication (the process of finding known chemical compounds in databases based on their spectroscopic and chromatographic fingerprints) as well as b) the establishment of workflows to push data on newly identified metabolites into the existing repositories. For a), the reference layer of the MetaboLights 32 database plays a crucial role and needs to be extended. The reference layer holds information about individual metabolites, their chemistry, their spectral data (MS, NMR), as well as their role in pathways and biological systems. New metabolites identified in studies deposited into MetaboLights (http://www.ebi.ac.uk/metabolights/) are being curated by the MetaboLights team and added to the reference layer. In particular, characterization of the metabolome of biological models (e.g. organisms, tissues, biofluids, cell lines) is of major importance for the understanding of biochemical mechanisms, for the exploration of phenotype diversity and for the identification of new biomarkers. Due to the variety and the complexity of each biological system, gathering and curating knowledge about metabolomes can be best achieved by expert communities. In genomics, the GMOD project (http://gmod.org/wiki/Main_Page) provides biological research communities with open-source software components for annotating and managing data about model organisms. Developing such data models and software tools in metabolomics to gather, analyse, and curate data will therefore be critical to produce high-quality knowledge about model metabolomes. The curated data (spectra, compounds, networks) and workflows will be of high value as input for the corresponding reference repositories and e-infrastructures (MetaboLights, ChEBI, MetExplore, Workflow4Metabolomics, PhenoMeNal).
Europe is a major provider of massive and high-quality metabolomics data. Large endeavors such as the MRC-NIHR National Phenome Centre (http://www.imperial.ac.uk/phenome-centre), Phenome Centre Birmingham (http://www.birmingham.ac.uk/research/activity/phenome-centre/index.aspx), the Netherlands Metabolomics Center (http://www.metabolomicscentre.nl), and the French MetaboHUB (http://www.metabohub.fr/home.html 31 ) infrastructure are producing data in key scientific and socioeconomic areas, including the ELIXIR Use Cases (https://www.elixir-europe.org/use-cases). Valorisation of this wealth of data requires annotation practices to be refined and new software tools to be developed to assist chemists in formatting, validating, referencing, and curating their annotations. All parties mentioned above have been engaging in the European GO-FAIR (https://www.dtls.nl/go-fair/) initiative through their participation in PhenoMeNal, which has recently been co-organising the launch of a hub for FAIR metabolomics data in goFAIR. The FAIR data movement has gained considerable momentum in Europe, where FAIR 33 stands for data being Findable, Accessible, Interoperable and Reusable.

Tools Platform
The Tools Platform drives access and exploitation of bioinformatics research software by working closely with services and connectors. Additionally, this Platform aims to facilitate the discovery, benchmarking and interoperability of bioinformatics software by focusing on software development, best practices and on strategy for workflows and software containers.
Workflow management e-infrastructures such as Workflow4Metabolomics (http://workflow4metabolomics.org/) 34,35 , PhenoMeNal, and Galaxy-M 36 are key European resources built on the Galaxy environment 37 that simultaneously address the two challenges of 1) high-performance, user-friendly, modular, and reproducible data analysis (needed by the experimental community), and 2) collaborative contributions from the bioinformatics community. Comprehensive workflows for preprocessing, statistical analysis, and annotation of data from liquid chromatography-MS (LC-MS), direct infusion MS (DIMS), gas chromatography-MS (GC-MS), and NMR technologies can be created, tailored, run, saved, shared, and publicly referenced with digital object identifiers (http://workflow4metabolomics.org/referenced_W4M_histories). A recent questionnaire has shown a need to further develop such tools and workflows as part of Galaxy, well supported through community-based training, to further improve the standardisation and automation of data processing and analysis 38 .
Standardization of compound annotation is critical for such Platforms i) to enable individual modules to communicate between each other and with external resources (e.g., repositories for raw data, mass spectra, compounds and metabolic networks) and ii) to deliver useful and FAIR data to the end-user. Conversely, new modules can be developed to integrate and harmonize annotations from complementary resources.

Interoperability Platform
The Interoperability Platform provides support to the discovery, integration and analysis of biological data organised in projects, centred around persistent identifiers, metadata and data standards for exchange and storage formats in addition to controlled vocabularies and linked data. This Platform facilitates work on the description of interoperability services and organises specialised BYOD (Bring Your Own Data) workshops with the aim to improve the FAIRness of data resources.
Experimental metabolomics data must be interoperable in order to facilitate integration with existing knowledge bases and other omics data. The collaborative development of the ISA 39 framework for experimental metadata standards will help achieve such interoperability, as it has already been embedded in the ELIXIR Plant Use Case. ISA will serve as a bridging element between this Metabolomics Use Case and other omics applications and the FAIR sharing movement.
Interoperability can be realized by community-accepted data standards and ontological molecule representations; a relevant list is available on FAIRsharing (https://fairsharing.org/collection/H2020PhenomeandMetabolomeaNalysisPhenoMenalProject), a resource of the ELIXIR Interoperability Platform. Persistent identifiers for metabolites are a central need here, as is the more general chemical structure mapping problem (see above). The latter is a need that this metabolomics Use Case has in common with the Human Data, Rare Diseases, Marine Metagenomics and Plant Science Use Cases. The interoperability needs of metabolomics, however, extend beyond chemical structures: more standardized interoperability of experimental data, such as NMR spectra, is also required. Introduction of the SPLASH 40 for NMR would benefit the other Use Cases too.

Compute Platform
The Compute Platform is devoted to the compute, transfer, storage, authentication and authorization related to biological data relying on services provided by ELIXIR Nodes and other e-infrastructures.
With PhenoMeNal, Europe now has at least one major initiative to support computing with large-scale metabolomics data. The PhenoMeNal e-infrastructure enables researchers to deploy and test metabolomics workflows in public clouds (Amazon EC2, Google Compute Platform) or local, in-house OpenStack environments in cases where sensitive data cannot leave the institution. It also provides a number of commonly used workflows for metabolomics that include the eventual identification of metabolites in metabolomics experiments and the mapping onto biological pathways. PhenoMeNal unites major metabolomics laboratories across Europe and forms an essential component for our next steps to launch a European infrastructure for metabolomics service laboratories.

Training Platform
This Platform aims to increase the professional skills for managing and exploiting data. The training activities focus on researchers, trainers and service providers, but also include e-learning, the discovery of training materials and measuring the impact of training.
The need for further development of training programmes in metabolomics across Europe is well recognised, including training in metabolite identification 41 . Over the last few years multiple training courses have been established, somewhat ad hoc, both in relatively large training centres as well as individual laboratories that specialise in a particular aspect of metabolomics. Currently, there is a critical need to improve the coordination between these training courses and initiatives, and to ensure that all stakeholders across Europe and beyond (e.g. NIH training centres in USA and Metabolomics Australia) are able to readily access courses, from introductory to advanced, including online and face to face. This is one of the objectives of the newly formed European Metabolomics Training Coordination Group (EmTraG, http://www.emtrag.eu), led initially by a team in ELIXIR-UK with support from several other ELIXIR Nodes and the ELIXIR Training Platform.
Specifically in the context of metabolite identification, several introductory training courses teach the basics of metabolite annotation and identification. In addition the Birmingham Metabolomics Training Centre (BMTC), an ELIXIR-UK training resource, runs a course "Metabolite identification with the Q Exactive and LTQ Orbitrap" in partnership with Thermo Scientific. Formalising a Metabolomics Use Case within ELIXIR could enable the expansion of the delivery of such courses, for example through the EmTraG network. Training partnerships with instrument vendors can be extremely valuable, increasing the quality of the training material and facilities, as also has been achieved by Waters Corporation partnering with the Imperial International Phenome Training Centre and the BMTC.
Training in metabolite identification requires materials and case studies related to both data acquisition and bioinformatic analysis of the acquired data, and a multi-disciplinary training team of analytical chemists and bioinformaticians to deliver courses. The provision of courses currently focuses on hands-on training at training centres, as described above for the BMTC, which can typically train 6-12 scientists per course. However, in the growing discipline of metabolomics, there is a requirement to provide training to larger numbers, which is only achievable through online training resources. Matching trainee learning objectives to the type of course provided is key, and recent examples have demonstrated the power of Massive Open Online Courses (MOOCs) and more specialised Small Private Online Courses (SPOCs). At BMTC, the introductory metabolomics MOOC has been used by more than 3000 active learners, and the first SPOC, focussed on data processing and analysis in metabolomics, was completed by more than 50 people. However, courses that require greater levels of hands-on training remain focused on the laboratory; through the use of video media, we can envisage some of these courses operating via online resources.
For training purposes, the Galaxy framework has also shown that it can be an efficient Platform to explain tools, parameters and workflows to life scientists without any skills in scripting (R, Bash, Python). The trainees can focus on their scientific questions regardless of the technical aspects and the programming language barrier. Since 2014, the Workflow4Metabolomics group (ELIXIR-FR, MetaboHUB) has conducted three one-week sessions based on its Galaxy instance. For those who wish to use the command line after the training session, the bridge is easy, since the parameters within Galaxy are mapped exactly onto the native software.
ELIXIR training modules can be classified into three types of trainers:
1) Life Scientist (TrR): "Bring Your Own Data" training addressed to the experimental community optimally promotes good practices for using software and critically interpreting the results, and provides feedback about specific training needs. As an example, during the Workflow4Experimenters (ELIXIR-FR, MetaboHUB) one-week courses (W4E) (http://workflow4metabolomics.org/events), participants learn to analyze their own MS or NMR datasets by using the Workflow4Metabolomics Platform. Morning sessions are dedicated to methodology and tools, and afternoon sessions are devoted to tutoring. Such training offers unique opportunities to discuss the designs, methods, and tools for comprehensive and rigorous data preprocessing, statistical analysis, and annotation.
2) Communities of Developers (TrD): to enrich the tools and compute Platforms based on good practice guidance. As an example, the ELIXIR-EXCELERATE and IFB (ELIXIR-FR) European Galaxy Developer Workshop (EGDW, https://www.elixir-europe.org/events/elixir-excelerate-and-ifb-europeangalaxy-developer-workshop), hosted in Strasbourg, aimed to teach best practices for tool integration and advanced features such as the Galaxy API, visualisation and administration (high performance computing (HPC), Docker).

Alignment with ELIXIR Use Cases
The ELIXIR Platforms are currently complemented by four Use Cases across four scientific communities:
1. The Human Data Use Case for long-term strategies for managing and accessing sensitive human data.
2. The Rare Disease Use Case for development of new therapies for rare diseases.
3. The Marine Metagenomics Use Case works towards a sustainable metagenomics infrastructure to support research and innovation in the marine domain.
4. The Plant Science Use Case for infrastructure development for genotype-phenotype analysis of crop and tree species.
As one of the core omics technologies, metabolomics forms an important component in all of the science-driven Use Cases established so far in ELIXIR. In particular, for the Human Data Use Case, the PhenoMeNal e-infrastructure is very relevant, as it is working to develop cloud-based resources for computing with big clinical metabolomics data. Significant synergies between the privacy and ethics work package in PhenoMeNal and the parties working on the Human Data Use Case are obvious.
As one of the best established molecular phenotypes, metabolomics is already widely used for the genotype-phenotype analysis of crops and tree species, providing an immediate interplay with the Plant Science Use Case, which is also supported by the common use of the ISA experimental metadata framework.
A rich corpus of work exists on the metabolomics of marine organisms where communities of, for example, marine sponges and marine microorganisms produce pharmacologically active polyketides with diverse chemical structures, which are investigated based on genomic and metagenomic data of these communities.
In all those application scenarios, the mapping of spectral features in metabolomics data to identified chemical compounds (metabolites) and thereby to molecular pathways is crucial for the understanding of the biochemistry underlying the Use Case in question.
Following the formation of national infrastructures for metabolomics, for example the French infrastructure in Metabolomics and Fluxomics MetaboHUB and the Netherlands Metabolomics Centre Foundation (NMC), the COSMOS initiative 7 for the coordination of standards in metabolomics provided the first coordination action between all relevant efforts in metabolomics in Europe. Leading to the establishment of the worldwide metabolomeXchange network, COSMOS paved the way for PhenoMeNal. As a core organiser of the launch of the node for FAIR data 33 in metabolomics as part of the goFAIR initiative, the PhenoMeNal consortium made another necessary step to establish itself as an authority for metabolomics data management and processing for the European Open Science Cloud (EOSC). ELIXIR Greece has included a computational metabolomics and protein interactomics Use Case, including the formation of standardized integrated metabolomic and proteomic databases and the development of tools for combined metabolic and protein network analysis, in the strategic planning of its national infrastructure.
Significant synergies can be leveraged with the suggested sister Use Case for proteomics 42 . The metabolomics and proteomics communities have been interacting extensively on data standards and formats, where metabolomics has been able to adapt and adopt work that was started by the proteomics community. Based on these preparatory steps, our proposal to establish metabolomics as a Use Case in ELIXIR is a logical progression.

Competing interests
No competing interests were disclosed.

Grant information
The meeting was funded by PhenoMeNal, European Commission's Horizon2020 programme, grant agreement number 654241.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review
The manuscript presents an opinion paper concerning the outcomes of the workshop that took place within the context of the strategic planning of the ELIXIR and PhenoMeNal consortia. The article describes the current level and diversity of efforts within the European Metabolomics community. It presents the main bottlenecks in reporting QA/QC, metabolite identification, and the lack of comprehensive and coordinated metabolite databases. The authors describe how they plan to address challenges in metabolomics, mainly within the ELIXIR platforms. The paper is undoubtedly informative about the major part of the common efforts of the European Metabolomics Community. It focuses mainly on ELIXIR plans, yet the authors' intention is to inform the community on this issue, and this is clearly reflected in the title.
This reviewer feels that the complexity of the non-human metabolome (plant, microbe, etc.) is not appropriately addressed and that the related databases of secondary metabolites existing worldwide should be more seriously considered.

Comments: Some corrections should be considered before indexing.
On page 5, the statement "For NMR, a few and sparsely populated repositories provide raw data for individual metabolites, such as the metabolomics collection in BioMagResBank" is somewhat contradictory with the fact that HMDB is mentioned just a few sentences later and MetaboLights is also mentioned in a completely different context in another paragraph.
The sentence "ELIXIR Greece has included a computational metabolomics and protein interactomics Use Case" is repeated on page 7 and page 10.
On page 6, the reference Pavlidis and Noble 2001 appears without a reference number and is not listed in the references section.
General comments: I have answered "partly" to some of the questions above, since this article constitutes a condensation of a workshop/meeting and as such contains opinions, not necessarily easily backed with either results or literature. However, it should be noted that the authors have done a remarkable job of justifying opinions and claims as far as possible.
infrastructure, with an emphasis on European research.
The manuscript promotes a phenome-oriented view on metabolomics, with emphasis on bioinformatics of known/identified compounds and permeating a systems biology view at the expense of the more explorative side of untargeted approaches. This is understandable from the perspective of Elixir and PhenoMeNal, but somewhat misrepresentative of the larger metabolomics community, which obviously hasn't yet matured to the stages of other (older) omics disciplines due to e.g. technical heterogeneity in platforms and difficulties in identification. On the other hand, these potentially different perspectives converge in the emphasis on metabolite identification.
The manuscript is furthermore likely difficult to penetrate for those readers that may not be fully up to date on European and international efforts for streamlining bioinformatics infrastructures. A more thorough introduction would help lower that threshold.
More detailed comments: The article presupposes knowledge from the reader of organisations, infrastructures and current research political agendas at the European level, as well as of what similar trajectories can be observed outside of Europe. A presentation of past efforts in streamlining standards and needs for international organisations, as well as a more thorough introduction to Elixir, PhenoMeNal and other initiatives, would have been in order to help the less initiated reader follow the article. These background issues should be more thoroughly explained, broadened and put into relevant context to reduce the feeling of this being an Elixir-internal lobbying paper. E.g.
Consider in relation to this aspect that: "The objective of the meeting was to identify the principal challenges within this field and prioritise actions, in particular those within the scope and mission of ELIXIR." AND the article title: "The future of metabolomics in ELIXIR".
The authors state that: "ELIXIR coordinates bioinformatics resources across its member states and help researchers to find, analyse, and exchange biological data. It is a distributed infrastructure with a single Hub based in Hinxton, United Kingdom, and an increasing number of Nodes located throughout Europe. As of July 2017, ELIXIR has 20 national Nodes, with European Bioinformatics Institute (EMBL-EBI; co-located with the Hub), working as a separate Node." This description, however, hardly describes Elixir from a functional point of view.
Moreover, considering that: "This opinion article is a call for support for a new ELIXIR metabolomics Use Case", there is surprisingly little information on the Elixir use cases in the background section. What are the aims, objectives and functionalities of the use cases and how do they relate to the core challenges in the metabolomics community? Overlaps between use cases and challenges are addressed later in the article, but for something this important, this should have been addressed in the introduction.
The challenges/bottlenecks in metabolomics have been expertly identified. However, these challenges are not exclusive to Elixir, but represent challenges to the entire metabolomics community.
2. Correction: "Genome-based metabolic reconstruction databases: These databases are built based on genome annotation, mostly of enzymes and their association to known reactions, and thus may not cover the entire metabolome since some have no enzymatic function associated" (suggested insertion: "genes that are enzymes").
The section on current standards and techniques for multiOMICs integration is sketchy: "Currently, metabolomics data can be integratively visualised with transcriptomics data." There are several interesting and promising initiatives for data integration extending well beyond visual integration with transcriptomics, as the authors are well aware. There is, however, a lack of validated tools, especially for supervised data analysis of integrated data (data integration isn't limited to systems biology through pathway analysis, but is also highly relevant from a biostatistical point of view). But I also agree that there is "a lack of integrated omic databases", although I believe that there are several crucial aspects of data integration that need to be addressed well before we go into database issues. The systems biology approach will furthermore not be advisable in an untargeted metabolomics setting, where complementarity in biological information from the different omics layers may be utilised for predictive modelling of e.g. disease pathophysiology or risk modelling before compound identification (and may, in fact, even be used as a strategy for compound identification through e.g. Bayesian network modelling).
Regarding metabolite identification, the authors state: "Concerted efforts in ELIXIR and the community…" The order should probably be the inverse.
Correction: "The Data Platform focuses on sustaining life science data infrastructure" (suggested insertions: "long term", "Europe's").
A reflection on the "Data Platform" section is that many researchers seem to have perceived reporting formats for the MetaboLights service as somewhat of a hurdle (as are most "general" added-value tasks not corresponding to the core task at hand for most funded research, e.g. identifying predictive biomarkers of a specific disease). Whereas I have seen several BYOD initiatives for actual analysis of data, I still haven't seen BYOD approaches for metadata handling. Maybe these exist. From a user-friendliness perspective it would be interesting to see how this hurdle is being tackled. It would also be interesting to have survey results on perceived bottlenecks in MetaboLights (and similar repositories) if such exist, which could potentially focus efforts into FAIR reporting.

Are all factual statements correct and adequately supported by citations? Yes
Are arguments sufficiently supported by evidence from the published literature? Partly

Are the conclusions drawn balanced and justified on the basis of the presented arguments? Partly
Competing Interests: No competing interests were disclosed.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
This article is a nice summary of a meeting of European scientists who have a range of interests in metabolomics-based science. The review provides a summary of the main areas that the investigators considered critical to move the technological aspects of the work forward, with securing the identification of 'unknown' metabolites found in biological samples as the primary need. The group also identified various aspects of data science and bioinformatics as other high-priority needs.
I found the review to be detailed, informative and well referenced. The review is unabashedly Eurocentric, appropriately so given that it was generated with the goal of highlighting European efforts. Thus, it sometimes reads as an advert for ELIXIR, but that does not detract measurably from the fine recitation of the opinions of the participants.