MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data

The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates the management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and, eventually, classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and to associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data, from loading through indexing, mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS.
For metagenomic classification and exploration purposes, we show, as a proof of concept, that intuitive exploratory tools, such as Krona for the representation of taxonomic classification, can be integrated very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members.


Background
HTS & metagenomics
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS; or Next-Generation Sequencing, NGS) technologies have proven to be efficient at providing access to comprehensive analyses and unbiased detection of pathogens in complex biological samples.
Most large-scale genomic (re)sequencing projects involve sequencing technology, computational analyses, and genotyping expertise. Current NGS platforms, including Illumina, Ion Torrent/Life Technologies, Pacific Biosciences and Nanopore, can generate reads 100 to 10,000 bases long, allowing better coverage of the genome at lower cost. However, these platforms also generate huge amounts of raw data.
Besides those sequence files, it has become important to also consider and store the associated sample-related metadata (collection date, location, …). In addition, NGS projects usually yield such a large number of relevant sample-specific sequences that efficient data management and visualization resources have become mandatory. The challenges accompanying HTS technologies raise two fundamental questions: (1) How do we best manage the enormous amount of sequencing data? (2) What are the most appropriate choices among the available computational methods and analysis tools?
The growing amount of data can be managed through a dedicated Laboratory Information Management System (LIMS), which serves to organize and plan the analyses and to provide perspective. The lack of integration among the wide spectrum of available tools has been addressed in part by workflow management systems, even though these still require fairly advanced knowledge of the tools at hand.
Indeed, today, hundreds of bioinformatics tools are available, each demanding specific parameterization 1. NGS data analysts have been designing workflows to automate complex processing pipelines, reusing existing code where possible rather than rewriting software, and supporting parallel, distributed computation. These features make workflows relatively simple to construct and easy to reuse, which aids reproducibility. Modern workflow managers, such as Galaxy 2-4, offer the possibility of sharing them with others. As a data analysis and workflow management system, Galaxy provides biologists with a hands-on toolbox to build multi-step computational workflows for data processing, quality control, and aggregation of analytic results, while additionally ensuring analysis reproducibility. A system for composing pipelines for large-scale analyses requires, as a prerequisite, an adapted computational infrastructure scaled to handle the processing and data storage.
We therefore developed MetaGenSense, a bioinformatics web-application framework whose principal purpose is to ease the scientists' work in managing NGS-project related data and analysis results. MetaGenSense is built on three fundamental components, two of which are specific to the project: a dedicated LIMS and a Django-based web user interface. The third component is Galaxy, which is the main bioinformatics workflow management system. In the following paragraphs, we describe the interface's implementation and show how the different parts communicate through a smooth, user-friendly web interface.

Software tool -implementation
MetaGenSense global description
MetaGenSense is a management and analysis bioinformatics framework engineered to run dedicated Galaxy workflows for the detection and, eventually, classification of pathogens. It aims to facilitate large-scale genomic analysis for experts in sequencing among project partners. The web application was built, and can be deployed, to facilitate access to high throughput sequencing analysis tools, acting as an information resource for the project and its interacting research partners. With its user-friendly interface, it was designed to take advantage of bio-IT provider resources (a local Galaxy instance, sufficient storage and grid computing power) for the analysis of input data and its metadata. MetaGenSense also includes a dedicated PostgreSQL-based LIMS, implemented to ensure data coherence. The web interface design is based on the Django web framework (http://www.djangoproject.com). Moreover, communication with Galaxy is handled by the BioBlend library 5, which provides a high-level interface for interacting with the Galaxy application, promoting faster interaction and facilitating the reuse and sharing of scripts.
The use of the available Galaxy tools and workflows is automated and seamless with MetaGenSense. Galaxy, as a pipeline management software, lets you define workflows and "pushes" the data through that pipeline. The pipeline manager ensures that all the tools in the pipeline run successfully, generally spreading the workload over a computational cluster. For example, MetaGenSense is used at the Pasteur Institute to do the bulk of the data processing for a large number of HTS projects, and can be adapted to launch any set of instructions stitched together in a dedicated workflow available in the Galaxy workflow designer interface.
A dedicated LIMS
A LIMS can be described as a software-based laboratory that offers a set of key features supporting modern laboratory operations. Such systems have become mandatory to manage the quantity of metadata related to both raw data and the analysis results obtained through bioinformatics tools. In MetaGenSense, the LIMS is based on a PostgreSQL database. It was designed and structured with expert knowledge from biologists and bioinformaticians. Its main feature is that it stores noteworthy and shareable information resulting from analyses, as well as the workflow-related parametrization that was applied. The database schema is available in Supplementary Figure 1.
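To illustrate the kind of information such a LIMS holds, the following minimal sketch stores projects, samples, and saved workflow results together with the parametrization that produced them. The table and column names are assumptions made for this example only (the actual schema is the one shown in Supplementary Figure 1), and SQLite from the Python standard library stands in for PostgreSQL so that the sketch runs without a database server.

```python
import json
import sqlite3

# Hypothetical, simplified schema; NOT the actual MetaGenSense schema.
SCHEMA = """
CREATE TABLE project (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    name TEXT NOT NULL,
    collection_date TEXT,
    location TEXT,
    raw_data_path TEXT
);
CREATE TABLE workflow_result (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    workflow_name TEXT NOT NULL,
    parameters TEXT NOT NULL,   -- JSON-encoded workflow parametrization
    result_path TEXT NOT NULL,
    shared INTEGER NOT NULL DEFAULT 0
);
"""


def create_lims(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_result(conn, sample_id, workflow_name, parameters, result_path, shared=False):
    """Record a noteworthy result along with the parameters that produced it."""
    cur = conn.execute(
        "INSERT INTO workflow_result "
        "(sample_id, workflow_name, parameters, result_path, shared) "
        "VALUES (?, ?, ?, ?, ?)",
        (sample_id, workflow_name, json.dumps(parameters, sort_keys=True),
         result_path, int(shared)),
    )
    conn.commit()
    return cur.lastrowid


def shared_results(conn, project_name):
    """Return (sample name, workflow name, parameters) for a project's shared results."""
    rows = conn.execute(
        "SELECT s.name, r.workflow_name, r.parameters "
        "FROM workflow_result r "
        "JOIN sample s ON s.id = r.sample_id "
        "JOIN project p ON p.id = s.project_id "
        "WHERE p.name = ? AND r.shared = 1 "
        "ORDER BY r.id",
        (project_name,),
    ).fetchall()
    return [(name, wf, json.loads(params)) for name, wf, params in rows]
```

Keeping the parametrization next to each saved result is what lets a shared result be traced back to the exact workflow settings that generated it.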

Amendments from Version 2
The difference between version 2 and version 3 of this article is Figure 2. It was slightly modified and built in a better resolution.

An intuitive Django-based web user interface
Django is a high-level Python web framework. It enables rapid development and clean, pragmatic design, and serves as the data management and display backbone for a large number of websites where interaction is important. Moreover, the Python language (https://www.python.org/) has become a reference programming language for a huge number of scientific applications.
MetaGenSense's UI is divided in 4 modules: 1) User authentication management, 2) LIMS, 3) Workflow, 4) Analysis. Each has a specific function, and the task-partitioning has been designed to allow independent evolution of each part according to the user's needs.
1. User authentication management: can be done either by communication with an LDAP user authentication database, or through a user management database.
2. LIMS: ensures the organization of the data according to the project at hand. The LIMS can be provided with sample metadata, which can be shared with selected project-members, and ensures sample traceability, which is an important component of any present-day core resource laboratory management system.
3. Workflow: manages (a) the connection with the Galaxy instance, (b) execution tracking (the Galaxy "user histories"), (c) the data from Galaxy "libraries" (the datasets), (d) the association and import of data from a data library into a Galaxy user history, and (e) the execution of the selected Galaxy workflow. This module handles data storage and links the samples to the selected workflow.

4. Analysis: a result file can be saved in order to be shared with other users involved in the project, or can be exported for download using the Galaxy export functionality.
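The two authentication options listed for module 1 correspond to what Django calls authentication backends, which are tried in order until one accepts the credentials. The fragment below is a sketch of how such a chain could be declared in a Django settings module. It assumes the third-party django-auth-ldap package, and the server URI and DN template are placeholder values; the article does not state which LDAP package or settings MetaGenSense actually uses.

```python
# settings.py fragment (illustrative; assumes the django-auth-ldap package).

# Backends are tried in order: LDAP first, then Django's own user database.
AUTHENTICATION_BACKENDS = [
    "django_auth_ldap.backend.LDAPBackend",
    "django.contrib.auth.backends.ModelBackend",
]

# Placeholder LDAP connection settings; replace with site-specific values.
AUTH_LDAP_SERVER_URI = "ldap://ldap.example.org"
AUTH_LDAP_USER_DN_TEMPLATE = "uid=%(user)s,ou=people,dc=example,dc=org"
```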
Communication between MetaGenSense and Galaxy is handled systematically using the BioBlend API 5, a dedicated and specialized Python library that gives access to most Galaxy functionalities. For example, it is possible to fetch the Galaxy users, create the user's data libraries, … As a side note, interaction with the BioBlend development team was necessary for fine-tuning specific core functionalities, which led to concomitant improvements in both the tools and the accompanying API. Specifically, BioBlend functionalities are used to communicate with Galaxy as described in Figure 1.
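The kind of BioBlend calls involved can be sketched as follows. This is a minimal illustration, not MetaGenSense's own code: the helper names, the workflow input label "0", and any URL or API key passed to `connect` are assumptions, whereas the methods called on the client (`get_libraries`, `create_library`, `create_history`, `upload_dataset_from_library`, `invoke_workflow`, `show_history`) belong to BioBlend's Galaxy client.

```python
def connect(url, api_key):
    """Create a BioBlend client. The import is done lazily so that the helpers
    below can be exercised with a stand-in object when no Galaxy is available."""
    from bioblend.galaxy import GalaxyInstance
    return GalaxyInstance(url=url, key=api_key)


def get_or_create_library(gi, name, description=""):
    """Return the id of the data library called `name`, creating it if needed."""
    for lib in gi.libraries.get_libraries():
        if lib["name"] == name and not lib.get("deleted", False):
            return lib["id"]
    return gi.libraries.create_library(name, description=description)["id"]


def run_workflow(gi, workflow_id, library_dataset_id, history_name, input_step="0"):
    """Import one library dataset into a new history and invoke a workflow on it.

    `input_step` is the label of the workflow input to feed; "0" is only a
    placeholder and depends on the workflow being run.
    """
    history = gi.histories.create_history(name=history_name)
    hda = gi.histories.upload_dataset_from_library(history["id"], library_dataset_id)
    inputs = {input_step: {"src": "hda", "id": hda["id"]}}
    invocation = gi.workflows.invoke_workflow(
        workflow_id, inputs=inputs, history_id=history["id"]
    )
    return history["id"], invocation["id"]


def history_state(gi, history_id):
    """Return the state Galaxy reports for a history (used to monitor a run)."""
    return gi.histories.show_history(history_id)["state"]
```

With a reachable Galaxy server, one would first call `connect` with the server URL and a user API key, then pass the returned client to the other helpers.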
A use-case example of MetaGenSense typically involves the following steps (cf Figure 2).
• User authentication with LDAP or any authentication system defined by the system administrator.
• Creation of a new project, with a name, a context, a short description and (most importantly) the other persons involved in the project. Please note that, to be included, a person needs to have logged in to the application at least once.
• Supplying the LIMS database with metadata, including sample information such as the library sequencing protocol, supplementary run details and the path to the raw data files.
• At this point, the user needs to copy their input data files to their Galaxy transfer directory. MetaGenSense detects new files that are copied within the exchange Galaxy project directory. Those data files need to be copied manually. This solution was chosen such that the right files would be deposited deliberately in the right target Galaxy directory, with correct user/owner's file permissions (typically on a UNIX file-system). In the exchange Galaxy directory, a subdirectory can be created, named after the project, and the raw data can be copied in that directory. This way, MetaGenSense detects the files that will be taken into account for use with Galaxy and analysed.
• In the MetaGenSense GUI, the "Workflows" button allows new files to be imported into Galaxy for analysis.
• Using a new Galaxy "history", an appropriately selected workflow, and the set of workflow input data-files, an analysis can be launched. As usual in Galaxy, the workflow status can be monitored.
• Intermediate and final results can be exported for download (using the native Galaxy tools), or they can be saved in the LIMS i.e. be tagged as potentially interesting and shared with project members.
• Further exploration of workflow results can be done using, for example in a metagenomics data analysis context, Krona 6 , that enables taxonomic information to be explored.
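The file-detection step in the use case above (raw data copied into a project-named subdirectory of the Galaxy exchange directory) can be sketched as follows. The function names and the idea of tracking already-imported file names are assumptions made for illustration; the article does not describe how MetaGenSense implements the detection. The import helper relies on BioBlend's `upload_from_galaxy_filesystem`, which typically requires administrator rights and a Galaxy configuration that permits path-based uploads.

```python
from pathlib import Path


def find_new_files(exchange_dir, project_name, already_imported):
    """List files in <exchange_dir>/<project_name> that are not yet imported.

    `already_imported` is an iterable of file names or paths that were
    previously handed to Galaxy; only the file name is compared.
    """
    project_dir = Path(exchange_dir) / project_name
    if not project_dir.is_dir():
        return []
    known = {Path(p).name for p in already_imported}
    return sorted(
        path for path in project_dir.iterdir()
        if path.is_file() and path.name not in known
    )


def import_into_library(gi, library_id, paths):
    """Register server-side files in a Galaxy data library without copying them.

    `gi` is a BioBlend GalaxyInstance (or a compatible stand-in).
    """
    if not paths:
        return []
    return gi.libraries.upload_from_galaxy_filesystem(
        library_id,
        "\n".join(str(p) for p in paths),  # one server-side path per line
        link_data_only="link_to_files",
    )
```

Linking rather than copying is one possible choice here; it keeps the manually deposited files, with the permissions set by the user, as the single copy of the raw data.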

Conclusions/Discussion
The technology evolution in molecular biology, especially in NGS, has moved biology into the big data era (consisting of handling data, computation requirements, efficient workflow design, and knowledge extraction). With this trend, the challenges faced by life scientists have been shifted from data acquisition to data management, processing, and knowledge extraction. While many studies have recognized the big data challenge, few systematically present approaches to tackle it. New findings in biological sciences usually come out of multi-step data pipelines (workflows). Galaxy is a workflow-management tool which can deal with big data. However, it is still necessary to globally optimize the data flow in an overall multi-step workflow in order to eliminate unnecessary data movement and redundant computation. On the other hand, data information traceability has become an inevitable requirement in a present-day laboratory setup, foreseeing that knowledge-embedded data and workflows are expected to be an integral part of future scientific publications.
We therefore engineered MetaGenSense, a Django-based web interface that helps biologists, particularly those unfamiliar with the design of Galaxy workflows, to quickly obtain analysis results from HTS projects. It uses Galaxy as workflow management software and the BioBlend API to remotely manage data upload, workflow execution and the analysis of results. MetaGenSense covers data processing up to the presentation of data and results in a human-readable format. Its main advantages are data handling through its incorporated LIMS, user and project handling in a cooperative context, data sharing without compromising data confidentiality, and automated workflow execution, which together decrease the data and analysis delivery time. MetaGenSense is available as open source from GitHub and can be deployed very easily. For testing, and so that users can evaluate MetaGenSense's modularity, we added to our GitHub repository a special release coupled with a virtual machine image containing the tools needed in our metagenomic example workflow and a Galaxy instance containing those tools as well as a prototyped workflow, mainly focused on metagenomic sample analysis. MetaGenSense is pre-configured on it and directly usable through a web browser. Once tested, it can be easily adapted to a variety of other NGS projects.

Competing interests
No competing interests were disclosed.

The installation instructions were much improved, and I was able to set up MetaGenSense without any major difficulty. The availability of a Virtual Machine was especially convenient for quick testing/assessing of the application, and I was able to run the example pipeline on the provided test data easily.

Grant information
The manuscript itself was also much improved and I am now happy to approve this manuscript. I think this is a nice example of how one can integrate a LIMS/project management system with Galaxy at the back-end for analysis.

Minor (optional) remarks:
1. Unless there is a compelling reason not to, please consider adding the requirements.txt file to the GitHub repo; it was a little bit confusing that I had to create this file myself.
2. Your installation instructions refer to the command "manage.py migrate", but this appears not to be implemented until Django 1.7, yet your documentation lists the requirement "Django==1.6.2". Consider changing the requirement version for Django from 1.6.2 to 1.7 (I had no problem running the rest of the setup using Django 1.7) or changing this part of the installation instructions to fit with Django 1.6.2.
3. Capitalize "galaxy" in the last paragraph of the manuscript ([..] the tools needed in our metagenomic example workflow, a galaxy instance containing those tools [..]).
4. For the VM, it would be useful if you also provided the login credentials for the admin user (mgs_admin) for MGS/Galaxy in your README on GitHub, for those readers who may be interested in seeing how the admin side of the application works.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.

I would have preferred if the response letter had been more specific, quoting the suggestions offered by the reviewers and, directly underneath, the author responses showing how they have addressed them. Currently I have no easy way to check how you have addressed my previous comments. The response letter has some grammatical errors. I paste below the bits that are grammatically incorrect. I would recommend that the authors have their submitted materials proofread before sending them.
"according to reviewers remarks"
"this application is more design"
"Description of the software is also a more precise"

I am unable to run the image http://webext.pasteur.fr/metagensense/metagensense.ova as I do not have the ability to create virtual machines. Perhaps having a set of images that describe exactly the point the authors want to make regarding this virtual machine could help the review process. This seems to be done in Figure 2, although the resolution is not sufficient.
I am unable to read the font inside of Figure 2's window screenshots. I would like to be able to see what each of them shows. This way I would be able to see how the interface works for each of the stages (e.g., Connection, Project creation, etc.).
I went to GitHub and I went to "http://metagensense.test.fr:8000" but the answer I got from the server was "metagensense.test.fr refused to connect".

F1000Research
In the installation notes of the README.md, the first step points to virtualenv, virtualenvwrapper. I would have appreciated it if the authors had explained there how to install virtualenv. It would just need a few instructions taken from https://virtualenv.pypa.io/en/stable/installation/.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

We tried to clarify the following points: The documentation on the main GitHub page was simplified, and we clearly separated the virtual machine test documentation from the MetaGenSense local installation procedure. The documentation available on the Read the Docs platform (http://metagensense.readthedocs.io/en/latest/?badge=latest) was proofread, which enabled us to emphasize each step of the installation procedure. We added documentation for the Virtualenv installation procedure. Figure 2, which was in low resolution, was updated in the GitHub repository and in the publication.
You will find below the answers to your remarks: [Q1] I am unable to run the image http://webext.pasteur.fr/metagensense/metagensense.ova as I do not have the ability to create virtual machines. Perhaps having a set of images that describe exactly the point authors want to make regarding this virtual machine could help the review process. This seems to be done in Figure 2, although the resolution is not sufficient.
[A1] The point of the Virtual Machine is to allow users to test the MetaGenSense application before installing it on their local infrastructure. The online documentation available at the URL http://metagensense.readthedocs.io/en/latest/connection.html now includes a step-by-step user guide.

[Q2] I am unable to read the font inside of Figure 2's window screenshots. I would like to be able to see what each of them shows. This way I would be able to see how the interface works for each of the stages (e.g., Connection, Project creation, etc.).

[A2] The image for each stage has been uploaded at a better resolution so that readers can get a better idea of how MetaGenSense looks without installing the virtual machine. It is available at the URL: https://github.com/pgp-pasteur-fr/MetaGenSense/blob/master/doc/images/metagensense_complete.jpeg

[Q3] I went to GitHub and I went to "http://metagensense.test.fr:8000" but the answer I got from the server was "metagensense.test.fr refused to connect".
[A3] The metagensense.test.fr URL is only available when running the MetaGenSense virtual machine. The documentation available on the github repository has been corrected to clarify this point.

[Q4] In the installation notes of the README.md, the first step points to virtualenv, virtualenvwrapper. I would have appreciated if authors could provide in there how I can install virtualenv. It is just need to take a few instructions from https://virtualenv.pypa.io/en/stable/installation/.

[A4] We changed the documentation and added information regarding the installation of a virtual environment.
Thank you again for your review. We hope the changes we applied to our GitHub documentation will clarify MetaGenSense's description and its installation procedure.

The authors describe their application, MetaGenSense, a web-based application for analysing metagenomic data. It provides a user-friendly interface which combines a LIMS system with a Galaxy backend for computation and workflow management.
The Django framework is very nice, and I think the integration of Galaxy with a LIMS system is very useful and something many readers will be interested in.
However, many aspects of this application are tailored specifically to the authors' local setup. I installed parts of the application, but since no information is provided on how to install the various components (LIMS/Django/KRONA/BioBlend), and since the Galaxy server used in the code was not accessible to me, it was not fully functional, and because documentation was lacking, it was unclear to me how to proceed with the setup.
To make this work more valuable to the readers, the following additions would be helpful:
1. Installation instructions for the code on GitHub. The readme file is empty at the moment.
2. How to install the various components (LIMS, Django UI, KRONA, BioBlend)? And how to connect the different components together? How to configure the webserver correctly (apache/nginx/other)? Which parts of the code are specific to the authors' local setup and need to be adapted when readers install their own MetaGenSense instance?
3. A description of the Galaxy workflows used by the authors would also be very interesting: which tools are used? Are they available from the Galaxy tool shed?
4. Either create a demo server with an example project or add screenshots of the application to the manuscript. The UI looks quite nice, show it to the readers.
5. The case study section is very technical, and would be enhanced by showing the use-case in terms of a real biological example; add screenshots of a real-world analysis to the various steps in this section.

Minor Edits:
Capitalize the word "Galaxy" throughout.
In section "Bioinformatics and HTS projects", BiobBlend --> BioBlend

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

Competing Interests:
Author Response 06 Aug 2015
Olivia Doppelt-Azeroual, Institut Pasteur, France
Thank you very much for your critical review. We posted the answers to your remarks underneath them in bold.
To make this work more valuable to the readers, the following additions would be helpful: Installation instructions for the code on GitHub. The readme file is empty at the moment.
… https://github.com/pgp-pasteur-fr/MetaGenSense/releases/tag/v1.0)
How to install the various components (LIMS, Django UI, KRONA, BioBlend)? And how to connect the different components together? How to configure the webserver correctly (apache/nginx/other)? Which parts of the code are specific to the authors' local setup and need to be adapted when readers install their own MetaGenSense instance?
Concerning the LIMS, Django, BioBlend, everything is in the application itself so the connection between the components is natively implemented.
For the apache, it is directly linked to Django which is deployed on an apache server. It is very well documented on this url: For your last question, please look at the "set the settings" part of the README file.
A description of the Galaxy workflows used by the authors would also be very interesting, which tools are used? are they available from the Galaxy tool shed?

The workflow included on the virtual Machine Galaxy instance contains a light version of our metagenomic analysis workflow. A small fastq file is also included to test it.
Either create a demo server with an example project or add screenshots of the application to the manuscript. The UI looks quite nice, show it to the readers.
Thank you for that. Yes, another reviewer suggested that we include screenshots of the UI. We added a figure that summarizes all the windows and their use at each step. The figure is available at this url: https://github.com/pgp-pasteur-fr/MetaGenSense/blob/master/doc/images/metagensense_complete as well as each of the steps (as larger pictures). This figure has now been added to the publication.
The case study section is very technical, and would be enhanced by showing the use-case in terms of a real biological example, add screenshots of a real-world analysis to the various steps in this section.

No competing interests were disclosed.

MetaGenSense is intended to help find pathogen data in metagenomic data created through next generation sequencing. Measured data including the sequencing reads and metadata are fed into a Laboratory Information Management System (LIMS). The application can fetch that information and pipe it into predefined Galaxy workflows, run them and visualise the output via a framework called KRONA.
The introduction to the article is perhaps too long (almost half of the article). There are sections that are not necessarily related to the research presented here, e.g., the paragraph focusing on the assembly problem of next generation sequencing reads. It would be useful, however, that the authors give a more comprehensive introduction into metagenomics as this topic is only covered very briefly at the beginning of the introduction.
The section on the software tool itself is very technical. I have trouble identifying a clear train of thought. Also it could be shorter and more precise. The case study does not really seem to be a case study on how the application can be used to actually find pathogen information in metagenomic data but is more like a step by step protocol on how to use the application. I suggest that this kind of information should be moved to the documentation and that instead a concrete biological example is demonstrated in the article. Moreover, the title says that MetaGenSense can visualise its results. However, this is not shown in the article. Therefore I would advise the authors to consider replacing the current figure with a figure demonstrating the results of a concrete biological use case.
The discussion and conclusion seem to be a summary rather than a discussion.
MetaGenSense seems to lack many of the standard requirements of a quality software product:
• We could not find any documentation. The README file on GitHub does not contain any information.
• The last update to the code was months ago, suggesting that the program is not being developed and maintained actively.
• We could not find any tests.
• We could not find any examples, demos or even screenshots of the interface.
Therefore we are not convinced that MetaGenSense adheres to the journal's quality standards.
We believe that the article should be revisited and documentation, live examples and tests should be added to the software before the article should be indexed.
We have read this submission. We believe that we have an appropriate level of expertise to state …

… the framework, as well as the results. Moreover, we would also like to change the title of the publication so that it better reflects our approach, replacing "visualisation" with "exploration", which is really the goal of an application like MetaGenSense. The title of the second version of the article is: "MetaGenSense: A web application for analysis and exploration of high throughput sequencing metagenomic data."

Reviewer Comment:
The discussion and conclusion seem to be a summary rather than a discussion.
MetaGenSense seems to lack many of the standard requirements of a quality software product. We could not find any documentation. The README file on GitHub does not contain any information.
The MetaGenSense README file is now complete. We have also written a user guide, available directly from our GitHub repository through the web tool Read the Docs: http://metagensense.readthedocs.io.

Reviewer Comment:
The last update to the code was months ago, suggesting that the program is not being developed and maintained actively.
The code in the GitHub repository was committed just before the submission of the article. A few bug fixes and add-ons have been implemented since the previous release.

Reviewer Comment:
We could not find any tests.
As mentioned earlier, we implemented a Virtual Machine Image containing the infrastructure to test our framework. It is pre-configured so that any user can start using MetaGenSense with a web browser. It is available on the Institut Pasteur server as it was too big to be uploaded on GitHub ( http://webext.pasteur.fr/metagensense/metagensense.ova). Documentation about this image is available on the README file of our GitHub repository. As Metagenomic analyses are time and storage consuming, we made available a very light version of our workflow with a small fastq file. However, it is enough to test the framework and to understand how the database, the Django framework and the related Galaxy instance are working together.

Reviewer Comment:
We could not find any examples, demos or even screenshots of the interface.
For the screenshots, we added a picture of the framework at each step of its use. It is also available at the url: https://github.com/pgp-pasteur-fr/MetaGenSense/blob/master/doc/images/metagensense_complete
Page 4: worflow_remote should be workflow_remote; "like BWA through galaxy" should be "like BWA through Galaxy...".
Page 5, "Case study -use example": Need to be consistent by starting bullet points with capital letters.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

Competing Interests:
Author Response 06 Aug 2015
Olivia Doppelt-Azeroual, Institut Pasteur, France
Thank you for your review. For each of your remarks, we wrote responses in bold:
For example, it would be interesting to learn more about the two prototyped workflows for analysing metagenomics data which are alluded to in the "Pre-designed Galaxy workflow" section on page 5.
This publication aims to present an application combining, as you wrote above, a LIMS, a direct link to any Galaxy instance, and a way to sort and manage Galaxy results. The choice of workflow is entirely arbitrary: in essence, MGS was designed so that any Galaxy workflow can be plugged in.
To facilitate testing, we implemented a Virtual Machine image pre-configured to test MetaGenSense directly on a web browser. Instructions for download and use are available in the GitHub README file. However, as instructions for installation of the framework are now available, any developer can download and link MetaGenSense to his Galaxy.
The "Case study -use example" section could also be improved by providing screenshots of the MetaGenSense GUI which are relevant to each or some of the steps. At the moment, I have no idea what the GUI for MetaGenSense looks like since there is also no example instance of MetaGenSense available on the Web which would have been useful for reviewing this paper.
Yes, we agree with you; we omitted screenshots from the first version of the article. We added a new figure that summarizes all MetaGenSense steps and functionalities. It is available at: https://github.com/pgp-pasteur-fr/MetaGenSense/blob/master/doc/images/metagensense_complete.jpeg . It is now the 2nd figure of the article. Moreover, all the small figures (steps) are also available in the doc directory of our GitHub repository.