BeerDeCoded: the open beer metagenome project

Next-generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a vehicle to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profiles of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures, which can be carried out in a non-professional environment, with state-of-the-art sequencing technologies and targeted metagenomic analyses can lead to the detection and identification of the microbial content of bottled beer.


Introduction
Beer is probably the world's oldest and most widely consumed alcoholic beverage, with a worldwide production of nearly 2 billion hectolitres (2 × 10^11 litres) annually [The Barth Report, Hops 2015/2016]. As DNA sequencing becomes increasingly cheap, whole genome sequencing and metagenomic analyses are being explored as tools to better understand brewing in particular, and food fermentation in general 1. Complex microbial communities influence the wine- and cheese-making processes throughout 2,3. Indeed, microbial communities contribute to the nutritional and aromatic properties, as well as the shelf life, of these products. In the case of wine, microorganisms are present in the soil, on the grapes and in the fermenter, and are carried over from the vine to the must to the wine; there is increasing evidence for an important microbial contribution to the notion of "terroir" (i.e. the regional environmental factors that affect the properties of the final product) 4-7. One question that remains unanswered is whether there is such a thing as a "terroir" for beer.
Of particular interest are sour beers such as lambic and gueuze, which are produced without the controlled addition of known yeast cultures. Instead, the wort is exposed to ambient air, allowing naturally occurring bacteria and yeasts to start the fermentation, which makes production difficult to standardize. To our knowledge, three initiatives are currently exploring the role of the beer microbiome in the brewing process and how it shapes the characteristics of the final product. Using metagenomic analyses, Kevin Verstrepen and colleagues at KU Leuven, Belgium, study the production of lambic, a traditional Belgian beer produced by spontaneous fermentation [VIB project 35]. Similarly, Matthew Bochman and colleagues at Indiana University, USA, have recently published preliminary results showing how the microbial community evolves over the fermentation process, together with the relative abundance of the organic acids that give sour beer its characteristic taste 8,9. Finally, researchers at the University of Washington, USA, have studied open-fermentation beer and discovered a novel interspecific hybrid yeast 10.
To investigate the microbial composition of a collection of commercial beers, we initiated BeerDeCoded in the context of Hackuarium, a Swiss not-for-profit organisation that supports unconventional research projects and promotes the public understanding of science. Members of the Hackuarium community are interested in participatory biology and want to promote interdisciplinary citizen research and innovation outside traditional institutions, using low-cost, simple and accessible technologies. The goal of the BeerDeCoded project is not only to broaden scientific knowledge about beer, but also to improve the public understanding of issues related to personal genomics, food technology, and their role in society. With the release of this first data set, we provide a proof of concept for a targeted metagenome analysis pipeline for beer samples that can be used in high schools, citizen science laboratories, craft breweries or industrial plants.

Beer sample preparation
The content of each beer bottle was mixed to homogeneity by inverting the bottle several times. 50 mL were transferred into a conical tube and centrifuged (5000 rpm, 20 min, 4°C) to collect cells and other precipitable material. Pellets were resuspended in 1 mL TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) and transferred into 1.5 mL tubes. The samples were centrifuged (10,000 rpm, 10 min, 4°C), the supernatant was removed and the pellet was stored frozen (-20°C) until further analysis. DNA was extracted using the ZR Fecal DNA MiniPrep kit (Zymo Research) with minor modifications to the original protocol 11: sludge pellets were used instead of the 50-100 mg of fecal material suggested by the manufacturer.
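The centrifugation speeds above are given in rpm, which only translates into a centrifugal force once the rotor radius is known. The sketch below applies the standard relation RCF = r·ω²/g to show how the two spins could be reported in multiples of g. The rotor radii are hypothetical placeholders; the source does not state which centrifuges or rotors were used.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rpm_to_rcf(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (in multiples of g) at a given speed and radius."""
    omega = 2 * math.pi * rpm / 60       # angular velocity in rad/s
    radius_m = rotor_radius_cm / 100     # convert cm to m
    return radius_m * omega ** 2 / STANDARD_GRAVITY


# Hypothetical rotor radii: the source does not say which rotors were used.
for label, rpm, radius_cm in [("50 mL tubes", 5000, 10.0), ("1.5 mL tubes", 10000, 7.0)]:
    print(f"{label}: {rpm} rpm at r = {radius_cm} cm is about {rpm_to_rcf(rpm, radius_cm):.0f} x g")
```

Reporting g-force rather than rpm would let other community laboratories reproduce the pelleting step on whatever centrifuge they have.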
Quality control for DNA extraction
To ensure the DNA was free from proteins and other contaminants, the absorbance of DNA samples was measured at 230, 260 and 280 nm using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific).
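NanoDrop readings at these three wavelengths are usually interpreted through two absorbance ratios: A260/A280 (protein carry-over) and A260/A230 (salts, phenol, carbohydrates). The following sketch computes both, together with the common approximation that an A260 of 1.0 (1 cm path) corresponds to about 50 ng/µL of double-stranded DNA. The flag thresholds (1.7 and 1.8) are assumed rules of thumb, not acceptance criteria reported by the authors.

```python
def dna_purity(a230: float, a260: float, a280: float) -> dict:
    """Summarise absorbance readings for one DNA extract."""
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    return {
        # ~50 ng/uL of dsDNA per A260 unit (1 cm path length)
        "dsDNA_ng_per_uL": a260 * 50.0,
        "A260/A280": ratio_260_280,
        "A260/A230": ratio_260_230,
        # Assumed rule-of-thumb thresholds, not taken from the source:
        # pure DNA is typically ~1.8 for A260/A280 and ~2.0 or above for A260/A230.
        "protein_flag": ratio_260_280 < 1.7,
        "contaminant_flag": ratio_260_230 < 1.8,
    }


print(dna_purity(a230=0.48, a260=1.0, a280=0.55))
```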

ITS amplification
Yeast genomic DNA was amplified using the fungal hypervariable region ITS1 (internal transcribed spacer 1) as previously described 11, using the following primers: BITS (5'-CTACCTGCGGARGGATCA-3') and B58S3 (5'-GAGATCCRTTGYTRAAAGTT-3'). Typical PCR reactions contained 5-100 ng of DNA template. Amplicon size (500 nt) was verified by gel electrophoresis and with a fragment analyser. ITS amplicons were purified using AMPure XP beads (Beckman Coulter) following the manufacturer's instructions. Dual indices and Illumina sequencing adapters were attached using the Nextera XT Index Kit (Illumina) following the manufacturer's instructions.
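Both primers contain IUPAC degenerate bases (R = A or G, Y = C or T), so each one stands for a small family of sequences. The sketch below expands them and builds a regular expression that checks whether a read begins with a primer variant, which is one way to locate primer sequences before trimming. It treats BITS as the contiguous 18-nt sequence CTACCTGCGGARGGATCA; the helper names are illustrative, and the exact-match rule (no mismatches allowed) is a simplifying assumption.

```python
import itertools
import re

# Subset of the IUPAC nucleotide code needed for these two primers.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

BITS = "CTACCTGCGGARGGATCA"
B58S3 = "GAGATCCRTTGYTRAAAGTT"


def expand(primer):
    """Every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in itertools.product(*(IUPAC[b] for b in primer))]


def primer_regex(primer):
    """Regex in which each degenerate base becomes a character class, e.g. R -> [AG]."""
    pattern = "".join(f"[{IUPAC[b]}]" if len(IUPAC[b]) > 1 else b for b in primer)
    return re.compile(pattern)


def starts_with_primer(read, primer):
    """True if the read begins with any variant of the primer (exact match only)."""
    return primer_regex(primer).match(read) is not None


print(len(expand(BITS)), len(expand(B58S3)))  # prints: 2 8
```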

Amendments from Version 1
This new version is meant to answer the issues raised in the comments received from referees 1-3 (see also our responses to referees' reports). We added Table 1 and two references. Because we changed the sensitivity of our analysis, some fungal species that were initially described are no longer relevant and have been removed from the main text and figures. All information related to version 1 is still available in the raw data, and the original files and figures are on the project GitHub repository.

Sequencing
MiSeq sequencing was performed using the MiSeq v3 reagent kit protocol (Illumina). Briefly, the amplified DNA was quantified using a fluorimetric method based on dsDNA-binding dyes (Qubit). Each DNA sample was diluted to 4 nM using 10 mM Tris pH 8.5, and 5 µL of diluted DNA from each library were pooled. In preparation for cluster generation and sequencing, 5 µL of the pooled final library was denatured with 5 µL of freshly diluted 0.2 N NaOH and combined with a 30% PhiX control library to serve as an internal control for low-diversity libraries. After loading the samples on the MiSeq, paired 2 × 300 bp reads were generated and exported as FASTQ files.
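Diluting each library to 4 nM requires converting the Qubit mass concentration into molarity, which depends on fragment length. A minimal sketch, assuming an average mass of 660 g/mol per base pair of double-stranded DNA and the C1·V1 = C2·V2 dilution rule; the concentration, fragment length and final volume below are hypothetical values. Note that the indexed library will be somewhat longer than the 500 nt amplicon once indices and adapters are attached.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA


def ng_per_ul_to_nm(conc_ng_per_ul, fragment_bp):
    """Convert a dsDNA mass concentration (ng/uL) into molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * fragment_bp)


def dilution_volumes(stock_nm, target_nm=4.0, final_ul=20.0):
    """C1*V1 = C2*V2: microlitres of library and of 10 mM Tris pH 8.5 to mix."""
    if stock_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical library: 10 ng/uL measured by Qubit, 500 bp average fragment length.
stock = ng_per_ul_to_nm(10.0, 500)
library_ul, buffer_ul = dilution_volumes(stock)
print(f"{stock:.1f} nM stock: mix {library_ul:.2f} uL library + {buffer_ul:.2f} uL buffer")
```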

Bioinformatics analysis
The curated set of ITS sequences from the RefSeq database (Targeted Loci) was used to build an ITS index for the Burrows-Wheeler Aligner (BWA, version 0.7.13) 12. BWA was used with standard parameters to map the paired-end reads of each beer from the FASTQ files to our ITS index. The resulting BAM files were sorted and indexed using samtools 13, and their quality was assessed using SAMstat (version 1.5) 14. Only alignments with a mapping quality (MAPQ) score above 3 were retained, in order to remove low-quality and non-uniquely mapping reads. Subsequently, the number of reads assigned to each species was counted for each beer, and only species with more than 10 reads were taken into consideration. Visualization of the results was performed with R (version 3.4.0).
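The post-alignment filtering described above can be sketched in plain Python on SAM text, for example the output of `samtools view`. This is an illustration, not the authors' counting script: it assumes a hypothetical lookup table from reference accession to species name, skips unmapped, secondary and supplementary records, keeps alignments with MAPQ > 3, and reports species supported by more than 10 reads. Each mate of a pair is counted separately here, which may differ from the original script.

```python
from collections import Counter

# SAM FLAG bits (SAM specification)
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_species(sam_lines, ref_to_species, mapq_above=3, reads_above=10):
    """Count filtered alignments per species from SAM text lines.

    ref_to_species is a hypothetical lookup from reference accession to species name.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, ref, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep one primary record per mapped read
        if mapq <= mapq_above:
            continue  # low-quality or non-unique alignment
        counts[ref_to_species.get(ref, ref)] += 1
    # keep only species supported by more than `reads_above` reads
    return {species: n for species, n in counts.items() if n > reads_above}
```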

Results
During June 2015, a total of 124 individuals contributed over 10,000 Euros to a crowdfunding campaign that provided financial resources for the first stage of the BeerDeCoded project. Reaching out to the public through this campaign also enabled us to crowdsource a collection of 120 beer samples from 20 countries. We subsequently demonstrated that it is possible to extract DNA directly from bottled beer using low-cost methods typically available to citizen scientists (see Methods).
The internal transcribed spacer (ITS) regions of fungal species 15 were then amplified and, after quality control, 39 samples were sent for DNA sequencing. These 39 commercial beers originated from five different countries: 30 from Switzerland, five from Belgium, two from Italy, one from France and one from Austria. We obtained an average library size of 600K reads (min 350K, max 2,400K; see Table 1), with more than 99% of reads per sample mapping to the ITS database.
A total of 42 fungal species were identified, 24 of which were present in only a single brew. This high variety of wild yeasts in commercial beers was unexpected (Figure 1A), with some brews containing traces of more than 10 different fungal species (Figure 1B). The beer with the highest ITS diversity (19 fungal species) was Waldbier 2014 Schwarzkiefer, an Austrian beer brewed using pine cones collected in local forests. Two other beers contained more than 12 fungal species: La Nébuleuse Cumbres Rijkrallpa (a sour/wild ale made with cranberries and the fermented corn beverage "Chicha") and Chimay Red Cap, a Belgian Trappist beer. Using hierarchical clustering, we built a proximity tree of the different beers (Figure 2).
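According to the author responses, the tree was built with Ward clustering on Euclidean distances between log10-transformed read counts (the authors worked in R). The Python sketch below illustrates the same idea with a naive agglomerative loop and the Lance-Williams update for Ward linkage, applied to squared distances. The pseudocount of 1 used to avoid log10(0) is an assumption, since the source does not say how zero counts were handled, and the function names are illustrative.

```python
import math
from itertools import combinations


def log10_profiles(count_table, pseudocount=1):
    """{beer: {species: reads}} -> {beer: [log10(reads + pseudocount), ...]}."""
    species = sorted({s for counts in count_table.values() for s in counts})
    return {beer: [math.log10(counts.get(s, 0) + pseudocount) for s in species]
            for beer, counts in count_table.items()}


def ward_merges(profiles):
    """Naive agglomerative Ward clustering on Euclidean distances.

    Returns the merge sequence as (cluster_a, cluster_b, height) tuples.
    """
    names = list(profiles)
    members = {i: [name] for i, name in enumerate(names)}
    d2 = {}  # squared distances between active clusters, keyed by (low_id, high_id)
    for i, j in combinations(range(len(names)), 2):
        d2[(i, j)] = sum((x - y) ** 2 for x, y in zip(profiles[names[i]], profiles[names[j]]))

    def dist2(a, b):
        return d2[(min(a, b), max(a, b))]

    merges, next_id = [], len(names)
    while len(members) > 1:
        # merge the closest pair of active clusters
        a, b = min(combinations(members, 2), key=lambda pair: dist2(*pair))
        na, nb, dab = len(members[a]), len(members[b]), dist2(a, b)
        merges.append((sorted(members[a]), sorted(members[b]), math.sqrt(dab)))
        merged = members.pop(a) + members.pop(b)
        # Lance-Williams update for Ward linkage (on squared distances)
        for k, mk in members.items():
            nk = len(mk)
            d2[(k, next_id)] = ((na + nk) * dist2(k, a) + (nb + nk) * dist2(k, b)
                                - nk * dab) / (na + nb + nk)
        members[next_id] = merged
        next_id += 1
    return merges
```

In practice a library routine (R's `hclust` with a Ward method, or SciPy's `linkage`) would be preferred; the loop above only makes the merge rule explicit.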
Consistent with its widespread use for fermentation, brewer's yeast (Saccharomyces cerevisiae) was detected in all beer samples, accounting for between 11% (Orval, an ale by the Belgian Brasserie d'Orval) and 99% (Tempête, an ale from the Swiss brewery Docteur Gab's) of all sequencing reads. In most samples, S. cerevisiae was present at very high levels (typically 90-97% of reads, Figure 3). More surprisingly, Saccharomyces mikatae, a species used in winemaking 16, was also relatively abundant in all samples (0.5-5%). Interestingly, most brews were found to contain low to medium abundance of multiple other yeast species, including Saccharomyces kudriavzevii, Saccharomyces eubayanus (a probable parent of Saccharomyces pastorianus) and Brettanomyces bruxellensis (typically used for the production of Belgian beer styles). Non-conventional and wild yeasts, such as Saccharomyces cariocanus and Saccharomyces paradoxus, two species closely related to S. cerevisiae, were also found. Another example is Kazachstania sp., a wild yeast commonly found in brines 17. The presence of this species may be of interest, as it was previously reported that adding Kazachstania servazzii to the brewing process 24 hours before the ale yeast contributed to the production of high levels of esters, producing a strong fruity and floral aroma 18.
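The percentages quoted above are relative abundances per beer. A minimal sketch, assuming a hypothetical count table of the form {beer: {species: reads}}; note that it divides by the reads retained in the table, whereas the text refers to all sequencing reads, so the two denominators may differ slightly.

```python
def relative_abundance(counts):
    """Percentage of reads per species within one beer."""
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


def abundance_range(count_table, species):
    """Beers with the lowest and highest share of `species`, with their percentages."""
    shares = {beer: relative_abundance(counts).get(species, 0.0)
              for beer, counts in count_table.items()}
    lowest = min(shares, key=shares.get)
    highest = max(shares, key=shares.get)
    return (lowest, shares[lowest]), (highest, shares[highest])
```

For example, `abundance_range(table, "S. cerevisiae")` would return the beers corresponding to the 11% and 99% extremes if `table` held the study's counts.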

Discussion and future perspectives
While a continuous process of market consolidation has led to five companies controlling more than half of global beer production, there has been an explosion of craft breweries over the past years, especially in Europe and North America. In 1978, there were 89 large industrial breweries in the USA; in 2016, there were 5,301 breweries, among them 3,132 small, independent microbreweries (American Brewers Association). There is a parallel with Hackuarium, an independent "craft" science initiative that has branched out from large institutional research institutes and provides an environment that allows scientists to explore topics that are rarely addressed in academia or industry. What is truly distinctive is the participation of individuals with no formal science training, and therefore the strong focus on citizen science and communication. With the BeerDeCoded project, we explored the potential of crowdfunding and crowdsourcing for engaging members of the general public in the production of scientific knowledge. We demonstrated that it is possible to carry out complex molecular analyses on everyday products with limited resources, with technical support from research institutions and without financial support from traditional funding sources. The resulting dataset contains the ITS profiles of 39 bottled beers from five different countries, revealing the widespread presence, at low abundance, of wild fungal species. It is a proof of concept that beer metagenomic information can be obtained, at least partly, with the help of the public. For the current analysis, we relied on high-throughput sequencing technology available to us through a partnership, a technology that may be out of reach for individuals working in non-traditional research environments. In the future, we would like to overcome this limitation, for example by providing a pipeline based on portable sequencing technologies such as Oxford Nanopore's MinION instrument.
Further analyses could also go as far as shedding light on the so-called biological "dark matter" of the beer ecosystem 19,20 .
With the costs of DNA sequencing falling dramatically, and with the emergence of portable and user-friendly instrumentation, we believe that this is a favorable time to expand the application of DNA analysis to novel fields, including food and beverages. This industry is starting to explore the potential of genome sequencing to understand the contribution of various species to product characteristics; for example, the full genome sequences of 157 brewing yeast strains were recently reported 21. Metagenomic analyses could also have important implications for the optimization and batch-to-batch reproducibility of fermentation processes, as well as for quality control, traceability and authentication of the products. One hypothesis that could be investigated further is whether the presence of a specific fungal species can be diagnostic of a particular geographic area. In our data set, Wickerhamomyces anomalus, a non-Saccharomyces yeast that contributes to wine aroma through the production of volatile compounds, was found exclusively in five of the brews manufactured in Switzerland. The limited sample size, however, does not allow us to draw a statistically significant conclusion, and it remains to be seen whether W. anomalus is also present in beers from other locations. Because sequencing detects DNA rather than metabolically active cells, it is difficult to predict whether the microbes identified have an impact on the fermentation process. However, once the species present in brews with desired characteristics have been identified, controlled experiments in which the microbial composition of the brew is altered could reveal whether specific microorganisms affect flavour 22. The origin of each yeast species could also be investigated, i.e. whether it comes with the ingredients or from the environment at the production site; techniques to sample airborne DNA exist 23.
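The caution about the W. anomalus observation can be made concrete with a one-sided Fisher exact test on the counts given in the text: 5 positive beers, all among the 30 Swiss samples, and none among the 9 others. This framing is an illustrative assumption rather than an analysis reported by the authors. Under it, the probability that all five positives fall in the Swiss group by chance is about 0.25, far from conventional significance thresholds.

```python
from math import comb


def hypergeom_pmf(k, group_size, other_size, positives):
    """Probability that exactly k of the positive beers fall in the group,
    if the positives were allocated to beers at random."""
    total = group_size + other_size
    return comb(group_size, k) * comb(other_size, positives - k) / comb(total, positives)


def fisher_one_sided(group_positive, group_size, other_size, positives):
    """One-sided Fisher exact test: P(at least `group_positive` positives in the group)."""
    upper = min(group_size, positives)
    return sum(hypergeom_pmf(k, group_size, other_size, positives)
               for k in range(group_positive, upper + 1))


# Counts taken from the text: 30 Swiss and 9 non-Swiss beers,
# W. anomalus detected in 5 beers, all of them Swiss.
p = fisher_one_sided(group_positive=5, group_size=30, other_size=9, positives=5)
print(f"one-sided p = {p:.3f}")
```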
Furthermore, other protocols could be used to catalogue plant DNA 24, such as that of malt and hop varieties, and to map bacterial diversity. In order to standardize and simplify our pipeline, and to facilitate the contribution of new data and their analysis by individuals not involved in this initial study, we are in the process of developing a BeerDeCoded repository and a Galaxy instance 25. This tool will enable any citizen scientist to carry out beer metagenomics and reproduce our analysis. In the meantime, we encourage researchers from other laboratories, microbreweries and citizen laboratories to further explore our data set, and invite them to consider contributing additional data in the future.

Data availability
The dataset contains the metagenomic profiles of 39 beers. The data were obtained using a targeted approach based on phylogenetic typing with the internal transcribed spacers (ITS) of ribosomal sequences. All methods, quality control, processed tables, metadata and code are accessible at: https://github.com/beerdecoded/Beer_ITS_analysis. The raw data are stored in the SRA database under BioProject PRJNA388541.

Grant information
This project was crowdfunded thanks to the support of 124 contributors to the BeerDeCoded campaign that took place in June 2015. For a full list of backers, see the Kickstarter project page. Some of these individuals played a role in data collection, as they provided the beer samples of their choice for analysis and participated in DNA extraction workshops.

I thank the authors for addressing all the concerns I brought up. However, while the post-alignment QC has reduced the number of false positives, I'm still concerned about the lack of pre-alignment read trimming (which should be standard practice). Neither the methods section nor the material on the project's GitHub page mentions any trimming of the reads prior to alignment. As demonstrated in my first referee report, trimming potential adaptor and primer sequences and removing poor-quality bases at the ends of reads reduces the fungal diversity of the beers further (i.e. the number of false positives decreases), even with lenient post-alignment filtering (MAPQ < 3). I would recommend that the authors either redo the analysis once more with proper pre-alignment filtering as well, or clearly state in the results and discussion sections that no such step was carried out and that the results may therefore contain a large number of false positives. For example, it is highly unlikely that all beers would contain traces of Saccharomyces mikatae, a yeast that to my knowledge has never before been isolated from fermentation environments, and only ever from forest samples in Asia (1, 2). As demonstrated by the results in my previous referee report, S. mikatae is no longer detected in the samples I analyzed when pre-alignment quality trimming was performed.

Competing Interests: I also participated in a crowdfunding campaign to use next-gen sequencing to analyze beer samples (https://experiment.com/projects/mapping-the-sour-beer-microbiome).
This data note describes the fungal microbiome of 39 (commercial and homebrewed) beers as determined by next generation sequencing of ITS amplicons. The project was crowdfunded and many of the individual funders were also involved in providing beer samples and assistance during DNA extraction. While the results will be of interest, particularly for the brewing industry, I have some concerns with the analysis methods and the results presented in this first version of the manuscript.

Major comments:
To determine the microbiome of the beers, the authors align the raw sequencing reads to a concatenated dataset containing fungal ITS sequences. To my understanding, no quality control or filtering was performed prior to and after the alignment. This will cause a large number of false positives, as many of the intrageneric ITS sequences are very similar. To demonstrate, I repeated the analysis on six samples I retrieved from the NCBI SRA. According to the results presented in the manuscript, each of these samples contained traces of at least 11 different species (see Figure 1 and Figure 3).
What I did to the sequencing reads:
1. Trimmed them using 'cutadapt' as follows (any similar tool would do the same job):
   - Remove the first 20 bases of each read
   - Remove bases from the end of the read when the quality score is less than 15
   - Remove any reads shorter than 200 base pairs
   Approximately 80% of the bases were retained from each set of reads after these steps.
2. Aligned the trimmed reads to the concatenated ITS sequence dataset using bwa mem with default settings, as the authors had done. Reads mapping to the different sequences were then counted with the script used by the authors (obtainable from GitHub). The results with no post-alignment filtering: https://www.dropbox.com/s/llg94fgk23ag264/Beer_results_nofilter.txt
3. Removed all reads that did not map to a unique location (i.e. could be mapped to the ITS sequences of several species) and reads where the two paired reads mapped to different sequences. This was done by removing all alignments with a MAPQ score below 4 and using 'awk': https://www.dropbox.com/s/0iimh5fbb40qeh0/Beer_results_mapq4.txt
   As can be seen, the diversity is reduced considerably, and if all hits with a read count below 10 are also removed (as the authors had done), most samples now contain only S. cerevisiae and/or B. bruxellensis.
4. The number of false positives can be further reduced by filtering on a higher MAPQ score (e.g. 30): https://www.dropbox.com/s/3x4gyylsiykyu7o/Beer_results_mapq30.txt
I therefore suggest that the authors redo the analysis with proper filtering to remove poor-quality alignments and false positives. The results and conclusions will subsequently have to be rewritten accordingly.
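The trimming step in point 1 can be sketched in Python. The 3' quality trimming follows the BWA-style rule that, as we understand it, cutadapt documents for its quality cutoff: subtract the cutoff from each quality value and cut where the partial sum from the 3' end is minimal. This is not cutadapt's own code; it assumes Phred+33 quality encoding and handles single reads, whereas for paired-end data both mates would normally be discarded together.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ quality encoding (assumed)


def three_prime_cut(quals, cutoff):
    """BWA-style 3' quality trimming: index at which to cut the read."""
    running, best, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]   # negative of (quality - cutoff)
        if running < 0:
            break                      # good-quality stretch reached
        if running > best:
            best, cut = running, i
    return cut


def trim_read(seq, qual_string, head=20, cutoff=15, min_len=200):
    """Apply the referee's three steps; return (seq, qual) or None if discarded."""
    seq, qual_string = seq[head:], qual_string[head:]          # drop first 20 bases
    quals = [ord(c) - PHRED_OFFSET for c in qual_string]
    cut = three_prime_cut(quals, cutoff)                       # trim poor 3' end
    seq, qual_string = seq[:cut], qual_string[:cut]
    return (seq, qual_string) if len(seq) >= min_len else None  # length filter
```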

Minor comments:
In the Methods section, under Beer sample preparation: I assume the DNA was extracted from the frozen yeast pellet? Is there any reason why no attempt was made to extract DNA from the beer itself, e.g. using the method described in reference (23)? This would allow analysis of filtered beers as well.
Is Figure 2 necessary, since Figure 3 shows the clustering as well? Also, why does the clustering in Figures 2 and 3 differ? Were these generated using different clustering methodologies?
It is mentioned that some of the beers contained speciality ingredients, such as pine cones. Do the authors know at what point in the production process these were added (i.e. pre- or post-boil)? This would have a large impact on how these ingredients affect the beer microbiome.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Partly

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 10 Oct 2017, Luc Henry, Hackuarium, Switzerland
Regarding the major comments:
-We thank the referee for his thorough analysis of our results and for his comments. In our analysis, we initially chose to favor sensitivity over specificity, and we did not filter the reads based on their quality. We considered that, with 300 bp paired-end reads on ITS amplicons, we had enough specificity. But as referee 3 pointed out, we should take care of multi-mapping reads, for instance to discriminate between Saccharomyces species that are quite close to each other. Using a filter on MAPQ score > 30 is quite stringent, but the referee's point on non-unique reads is critical. We have now re-performed the analysis using a series of MAPQ thresholds. As expected, the higher the threshold, the more ITS detections, and thus sensitivity, we lose. In our update of the article, we report results after using a filter with a MAPQ score of 3. This led to a reduction of the total number of fungal species identified, from 88 to 42, as well as of the unique occurrences, from 52 to 24. In addition, we have performed quality control (QC) of our BAM files with SAMstat in order to see the quality score distribution of the reads aligned to the ITS database. The SAMstat results have been added to our GitHub repository; they show that about 3.2% of reads are not aligned when reads with MAPQ score > 3 are taken into account. We have also checked the distribution of fragment sizes in the BAM files: the average fragment size is 383 bp with a standard deviation of 10 bp, and there is a very low fraction of fragments below 200 bp or of discordant pairs. Thanks to the comments of Referee 3, we improved the specificity of the analysis and may have excluded artifacts. Following this re-analysis, we have updated Figure 1, Figure 2 and Figure 3, the code in our GitHub repository, and the main text. We have also added the QC information.
Regarding the minor comments:
-With the goal of maximising our chances of obtaining some preliminary results with the budget of the Kickstarter campaign, we decided to use the pelleted material, on the educated assumption that pelleted cells may protect DNA better than DNA left in solution. Reference (23) intrigued us, but their DNA preparation method requires additional lyophilization, pulverization and digestion with amylase, and we decided to start with an easier protocol to facilitate participation from the general public.
-We included both Figure 2 and Figure 3 to provide two different representations of the same results. The differences between the trees are due to the different display of the dendrogram; the underlying tree is the same and is based on the same distance (Euclidean distance on the log10 count matrix, with the Ward clustering method).
-The last comment is an interesting working hypothesis. We do not, however, know when these ingredients are added to the brewing process, or whether they were pre-treated before addition. If these ingredients contained living microorganisms at the moment of addition, they may indeed affect the brewing process and the taste of the final product. We believe collecting this metadata is outside the scope of this dataset description.
Competing Interests: No competing interests were disclosed.

The authors present fungal microbiome data from 39 different beers as the culmination of a crowdfunded citizen science campaign. These data will be of interest to citizen scientists and financial backers of the project, as well as to those in the fermentation (especially beer) industry. Overall, the data seem sound, but I have some concerns:

Major comments
Were any controls for contamination used, i.e. are all of the fungi identified actually from the beer samples? The sequencing of a non-beer sample such as water that had been handled in the same way as the beer samples would help to determine whether fungal DNA contamination occurred during sample processing.
In line with the comment above, the manuscript states that "…microorganisms, or their DNA, could be carried over from the ingredients to the final product." Can the authors comment on whether they know if they are detecting the fungi themselves or DNA remnants from fungi that came from the raw ingredients of the beer? Again, one could add purified control DNA to a mash, brew and bottle a beer, and then try to detect that DNA by PCR in the end product (or even at various stages along the brewing and fermentation process). Attempting this with various concentrations of DNA would also yield information on how many cells of a particular species would be necessary on malted barley, for instance, to be detected in the final beer.

Minor comments
In the introduction, the authors state that "…sour beer…[is] produced without the controlled addition of known yeast cultivates." Although this may be true for some types of sour beer like lambic and gueuze, many sour beers made in the U.S. are inoculated with known strains of yeast. In those cases, the souring bacteria are usually the unknowns.
Why was a fecal DNA prep kit used for DNA extraction?
The authors collected 120 beers from 20 countries but only sequenced the fungi from 39 (mostly from Switzerland). Is there an explanation for this attrition?

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Partly

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: I also participated in a crowdfunding campaign to use next-gen sequencing to analyze beer samples (https://experiment.com/projects/mapping-the-sour-beer-microbiome).
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 10 Oct 2017, Luc Henry, Hackuarium, Switzerland
Regarding major comments:
-We did not include a water sample to account for any contaminant during the DNA extraction process. Processing such a control at this point would not reflect the original experimental conditions. We will include this control in the future when we process new beers.
-We did think of experiments to identify DNA remnants from raw ingredients but did not have the means to perform them. Indeed, the beer samples were sent from all over the world (Europe and Switzerland for the 39 reported in this data set) and we had no possibility to collect other related samples (raw ingredients, brewing environment, etc.). We think that this is outside the scope of the current study. In this setup, it is not possible to determine afterwards which ingredient the DNA comes from, and we comment on this in the text.
Regarding minor comments: -We rephrased the sentence about sour beers accordingly.
-A majority of the samples we received were from industrial/filtered beers. Unfortunately the volumes we had at hand (typically a 330 mL bottle or below) did not yield enough material (DNA of good purity) to obtain sequencing results, as judged by QC of PCR products. In order to detect DNA in these beers, we would probably need to process a much larger volume of beer. We therefore did not include these samples in the final data set.
Competing Interests: No competing interests were disclosed.

clustered into OTUs? This would be needed to understand how many species/OTUs are in a given sample but could not be classified due to the lack of a reference database. Without this, the ITS diversity in a sample cannot be correctly estimated.

Minor comments:
"we built the proof of concept for a targeted metagenome analysis pipeline for beer samples that can be used in high schools, citizen science laboratories, craft breweries or industrial plants": It would be good to at least briefly discuss how this is currently limited by the need to have access to a high-throughput sequencer.
It would be great if "terroir" could be defined in the introduction for those not too familiar with oenology.
"a total of 88 fungal species were identified, including 52 unique occurrences": are unique occurrences those species that are only found in a single beer? I'd suggest rephrasing this for better understanding.
"Interestingly, most brews were found to contain low to medium presence of multiple other yeast species, including Saccharomyces bayanus (used in winemaking and cider fermentation), Saccharomyces kudriavzevii and Saccharomyces pastorianus (used in lager manufacturing), Saccharomyces eubayanus (a probable parent of Saccharomyces pastorianus) and Brettanomyces bruxellensis (typically used for the production of the Belgian beer styles)": please include citations for these explanations of the different taxa.
It's a matter of taste, but I recommend rethinking the use of "dark matter"; cf. "microbial dark matter" for an explanation of why: http://merenlab.org/2017/06/22/microbial-dark-matter/

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Partly

Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes

We thank the referee for his constructive remarks, and in particular for his comments on our methods section. We have added the details he requested (see below).

Regarding major comments:
-The minor modifications to the ZR Fecal DNA MiniPrep kit instructions are already described in the methods section of the article, as well as in the methods description available in the GitHub repository. Instead of starting with 50-100 mg of fecal material as suggested by the manufacturer, we used a sludge pellet obtained by centrifuging 50 mL of beer (as described in the methods). Like soil and fecal material, beer samples have been found to contain PCR inhibitors (see Juvonen and Haikara, J. Inst. Brew. 115(3), 167-176, 2009). The ZR Fecal DNA MiniPrep kit can overcome this because it provides filter columns designed to remove PCR inhibitors. This choice was previously described by Bokulich and coworkers (eLife 2015;4:e04634 DOI: 10.7554/eLife.04634).
-For the alignment, we used BWA (version 0.7.13) with standard parameters on the 300 bp paired-end reads.
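To make the mapping statistics concrete, here is a minimal sketch (not the pipeline used in the study) of how the fraction of mapped reads and the per-reference read counts could be tallied from the SAM output an aligner such as BWA writes. It assumes plain-text SAM input and counts only primary records, skipping secondary (flag 0x100) and supplementary (flag 0x800) alignments; the function name `summarize_sam` and the toy records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Tally primary SAM records: (total, mapped, mapped reads per reference).

    Each mate of a pair has its own SAM record, so mates are counted
    separately here (an assumption about how reads are counted).
    """
    total = 0
    mapped = 0
    per_reference: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, through its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
            per_reference[fields[2]] += 1  # RNAME: the ITS reference hit
    return total, mapped, per_reference


# Hypothetical toy alignment: one mapped pair and one unmapped pair.
example = [
    "@HD\tVN:1.6",
    "r1\t99\tITS_A\t1\t60\t4M\t=\t5\t8\tACGT\t*",
    "r1\t147\tITS_A\t5\t60\t4M\t=\t1\t-8\tACGT\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
total, mapped, counts = summarize_sam(example)
print(f"{mapped}/{total} reads mapped ({100 * mapped / total:.1f}%)")
# prints "2/4 reads mapped (50.0%)"
```

For real alignment files, `samtools flagstat` produces a comparable summary of mapped reads without custom code.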
-The ITS reference database contains 5,361 ITS sequences, with a mean length of 585 bp (standard deviation 90 bp).
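Summary statistics of this kind (number of sequences, mean and standard deviation of their lengths) can be computed directly from a FASTA file. The sketch below assumes the database is stored as FASTA; the reply does not say whether the sample or the population standard deviation was reported, so using the sample version (`statistics.stdev`) is an assumption, as are the helper names and the toy records.

```python
import statistics
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each FASTA record; sequences may span several lines."""
    length = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length  # finish the last record


def length_summary(lines: Iterable[str]) -> Tuple[int, float, float]:
    """Return (number of sequences, mean length, sample standard deviation)."""
    lengths = list(fasta_lengths(lines))
    return len(lengths), statistics.mean(lengths), statistics.stdev(lengths)


# Hypothetical three-record FASTA; a real ITS database would be read from a file.
example = [">ITS_1", "ACGTACGT", "ACGT", ">ITS_2", "ACGTAC", ">ITS_3", "ACGTACGTAC"]
n, mean_len, sd_len = length_summary(example)
print(f"{n} sequences, mean {mean_len:.1f} bp, SD {sd_len:.1f} bp")
```

Because `length_summary` accepts any iterable of lines, an open file handle can be passed to it directly.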
-Hierarchical clustering was performed using the standard Ward algorithm, with Euclidean distances computed on the log10-transformed read counts. We modified the legend of Figure 2 accordingly.
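For readers unfamiliar with Ward's method, the following naive sketch applies its criterion to log10-transformed read counts. It is an illustration, not the authors' code: the pseudocount of 1 used to avoid log10(0) is an assumption (the reply does not say how zero counts were handled), the count table is hypothetical, and the exhaustive pair search is only practical for small tables of a few dozen samples. In practice, `scipy.cluster.hierarchy.linkage(X, method="ward")` is the usual library route; it reports merge heights on a different scale from the raw sum-of-squares increase computed here.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple


def log10_profile(counts: Sequence[int], pseudocount: float = 1.0) -> List[float]:
    """log10-transform raw read counts; the pseudocount (an assumption) avoids log10(0)."""
    return [math.log10(c + pseudocount) for c in counts]


def ward_merges(points: Sequence[Sequence[float]]) -> List[Tuple[FrozenSet[int], FrozenSet[int], float]]:
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    Merging clusters a and b increases the within-cluster sum of squares by
    n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2 (c = centroid, Euclidean norm).
    The pair with the smallest increase is merged first.
    Returns the merges in order as (members_a, members_b, increase).
    """
    # cluster id -> (member indices, centroid)
    clusters = {i: (frozenset([i]), [float(v) for v in p]) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
                increase = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        increase, a, b = best
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        na, nb = len(ma), len(mb)
        # the centroid of the merged cluster is the size-weighted mean
        centroid = [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)]
        clusters[next_id] = (ma | mb, centroid)
        next_id += 1
        merges.append((ma, mb, increase))
    return merges


# Hypothetical read counts (rows = beers, columns = ITS references); not study data.
counts = {
    "beer_1": [1000, 10, 0],
    "beer_2": [900, 12, 0],
    "beer_3": [5, 800, 300],
}
names = list(counts)
profiles = [log10_profile(row) for row in counts.values()]
for ma, mb, increase in ward_merges(profiles):
    print(sorted(names[i] for i in ma), "+", sorted(names[i] for i in mb), round(increase, 3))
```

The log transform keeps a few very abundant species (typically the brewing yeast) from dominating the Euclidean distances, so beers with similar sets of low-abundance species can still group together.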
-We acknowledge the relatively large variability in size between the different libraries. The ITS diversity detected can be affected both by the sequencing depth and by the richness of the beer ecosystem. If the sequencing is not deep enough, we will clearly miss some low-abundance species. However, if a beer sample contains only a few fungal species, deeper sequencing will not provide additional information. In our analysis, there does not seem to be any correlation between library size and the variety of ITS detected. The sample producing the largest library (2.4 million reads, "Les Trois Dames") is not the one with the largest detected ITS variety (it yielded 11 species). Conversely, the beer sample containing the largest ITS diversity ("Waldbier 2014 Schwarzkiefer", with 38 fungal species) had a library size of only about 0.35 million reads. A rarefaction analysis is beyond the scope of this dataset description. As suggested, we added a table (Table 1) with information about each library, such as the mapping percentage and the total number of reads. This table shows that more than 99% of reads map to the ITS database. Consequently, there is limited value in trying to find missing species or Operational Taxonomic Units (OTUs), and we can reasonably conclude that the ITS database used is comprehensive enough.
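The observation that library size and ITS richness do not appear to be correlated could be checked quantitatively with a rank correlation. The sketch below computes Spearman's rho in plain Python; the input values are hypothetical and are not the study's libraries, and the function names are illustrative. A rho close to zero would be consistent with the authors' observation, although it does not replace a rarefaction analysis.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for illustration only; they are not the study's libraries.
library_sizes = [100_000, 200_000, 300_000, 400_000]
species_detected = [5, 3, 8, 1]
print(spearman(library_sizes, species_detected))  # prints -0.4
```

With real data, `scipy.stats.spearmanr` returns the same coefficient together with a p-value, which matters when only a few dozen libraries are compared.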
Regarding minor comments:
-The current dataset is a proof of concept showing that beer metagenomic information can be obtained, at least partly, with the help of the public. For the current analysis, we indeed had to rely on high-throughput sequencing technology available to us through a partnership with the genomics facility at the University of Lausanne. In the future, we would like to overcome this limitation, e.g. by using a MinION sequencer. A remark was added to the discussion.
-The text was modified to clarify the notion of "terroir".
-The text was modified to clarify the notion of "unique occurrences".
-The text was modified (species were removed) to reflect changes due to an update of the sensitivity of the analysis (based on another referee's comment), and a reference was added.
-The so-called "microbial dark matter" concept is regularly used by microbiologists, and we think that it points to an interesting hypothesis worth mentioning, although it has nothing to do with dark matter in physics, as explained in the blog post mentioned.
Competing Interests: No competing interests were disclosed.