Associations between chlorophyll a and various microcystin health advisory concentrations

Abstract
Cyanobacteria harmful algal blooms (cHABs) are associated with a wide range of adverse health effects that stem mostly from the presence of cyanotoxins. To help protect against these impacts, several health advisory levels have been set for some toxins. In particular, one of the more common toxins, microcystin, has several advisory levels set for drinking water and recreational use. However, compared to other water quality measures, field measurements of microcystin are not commonly available due to cost and advanced understanding required to interpret results. Addressing these issues will take time and resources. Thus, there is utility in finding indicators of microcystin that are already widely available, can be estimated quickly and in situ, and used as a first defense against high levels of microcystin. Chlorophyll a is commonly measured, can be estimated in situ, and has been shown to be positively associated with microcystin. In this paper, we use this association to provide estimates of chlorophyll a concentrations that are indicative of a higher probability of exceeding select health advisory concentrations for microcystin. Using the 2007 National Lakes Assessment and a conditional probability approach, we identify chlorophyll a concentrations that are more likely than not to be associated with an exceedance of a microcystin health advisory level. We look at the recent US EPA health advisories for drinking water as well as the World Health Organization levels for drinking water and recreational use and identify a range of chlorophyll a thresholds. A 50% chance of exceeding one of the specific advisory microcystin concentrations of 0.3, 1, 1.6, and 2 μg/L is associated with chlorophyll a concentration thresholds of 23, 68, 84, and 104 μg/L, respectively. When managing for these various microcystin levels, exceeding these reported chlorophyll a concentrations should be a trigger for further testing and possible management action.

Introduction
Over the last decade, numerous events and legislative activities have raised public awareness of harmful algal blooms [1][2][3]. In response, the US Environmental Protection Agency (USEPA) has recently released suggested concentrations of microcystin (one of the more common toxins) that would trigger health advisories [4][5][6]. Additionally, the World Health Organization (WHO) has microcystin advisory levels for drinking water and for a range of recreational risk levels 7,8. While these levels and associated advisories are likely to help mitigate the impacts from harmful algal blooms, they are not without complications.
One of these complications is that they rely on available measurements of microcystin. While laboratory testing (e.g., chromatography) remains the gold standard for quantifying microcystin concentrations in water samples, several field test kits have been developed. Even though field tests provide a much needed means for rapid assessment, they are not yet widely used and are moderately expensive (approximately $150-$200 depending on specific kit) with a limited shelf life (typically one year) 9,10 . Additionally, each technique requires nuanced understanding of the detection method (e.g., limit of detection, specific microcystin variants being measured, and sampling protocol).
Fortunately, cyanobacteria and microcystin-LR have been shown to be associated with several other, more commonly measured and well understood components of water quality that are readily assessed in the field 11. For instance, there are small or handheld fluorometers that measure chlorophyll a. Additionally, chlorophyll a is a very commonly measured component of water quality that is also known to be positively associated with microcystin-LR concentrations 12,13. Recently, Yuan et al. 13 explored these associations in detail and controlled for other related variables. In their analysis they find that total nitrogen and chlorophyll a show the strongest association with microcystin. Furthermore, they identify chlorophyll a and total nitrogen concentrations that are associated with exceeding 1 µg/L of microcystin. These findings suggest that chlorophyll a concentrations could also track the new USEPA microcystin health advisory levels for drinking water. Identifying this association would provide an important tool for water resource managers to help manage the threat to public health posed by cHABs and would be especially useful in the absence of measured microcystin concentrations.
In fact, this is a similar tack to that of the World Health Organization, which, in addition to advisory levels for microcystin, has also proposed related advisory levels for cyanobacteria abundance and chlorophyll a 7,8. The chlorophyll a concentrations proposed by the WHO are for low (< 10 µg/L), moderate (between 10 and 50 µg/L), high (between 50 and 5000 µg/L), and very high risk (> 5000 µg/L) 8. While these advisories have proven to be useful tools, they are coarse and broad, and have been found to overestimate actual risk 14.
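To illustrate how coarse these categories are, the WHO chlorophyll a bands quoted above can be written as a simple lookup. This is only a sketch: the function name `who_chla_risk` is hypothetical, and because the text gives the bands as "between 10 and 50" and "between 50 and 5000", the treatment of values falling exactly on a boundary is an assumption.

```python
def who_chla_risk(chla_ug_per_l: float) -> str:
    """Map a chlorophyll a concentration (µg/L) to the WHO risk band quoted in the text.

    Boundary handling is an assumption: 10 is treated as "moderate",
    50 as "high", and 5000 as "high".
    """
    if chla_ug_per_l < 0:
        raise ValueError("chlorophyll a concentration cannot be negative")
    if chla_ug_per_l < 10:
        return "low"
    if chla_ug_per_l < 50:
        return "moderate"
    if chla_ug_per_l <= 5000:
        return "high"
    return "very high"


# A wide span of concentrations falls into the single "high" band.
for value in (5, 23, 68, 104, 6000):
    print(value, who_chla_risk(value))
```

Under these assumptions, concentrations of 68 and 104 µg/L both fall in the same "high" band, which is one way to see why finer thresholds tied to specific microcystin advisory levels may be useful.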
In this paper we build on these past efforts and utilize the National Lakes Assessment (NLA) data and identify chlorophyll a concentrations that are associated with higher probabilities of exceeding several microcystin health advisory concentrations 6,8,15 . We build on past studies by exploring associations with the newly announced advisory levels and by also applying a different method, conditional probability analysis. Utilizing different methods strengthens the evidence for suggested chlorophyll a levels that are associated with increased risk of exceeding the health advisory levels as those levels are not predicated on a single analytical method. So that others may repeat or adjust this analysis, the data, code, and this manuscript are freely available via https://github.com/USEPA/microcystinchla.

Data
We used the 2007 NLA chlorophyll a and microcystin-LR concentration data 15 . These data represent a snapshot of water quality from the summer of 2007 for the conterminous United States and were collected as part of an ongoing probabilistic monitoring program 15 . Water quality data, including chlorophyll a and microcystin-LR were obtained via an integrated sample taken from the surface of the lake down to 2 meters. Samples were taken at the same time from the index site (e.g. near the centroid of the lake) and these provide the source for both chlorophyll a and microcystin-LR 15 .
For our analysis we only used samples that were part of the probability sampling design (i.e. no reference samples) and from the first visit to the lake (e.g. some lakes were sampled multiple times). The detection limit for microcystin-LR was 0.05 µg/L. Approximately 67% of lakes reported microcystin-LR at the detection limit. For this analysis we retained these values as removing them would erroneously reduce the confidence intervals around the conditional probabilities. Data on chlorophyll a and microcystin-LR concentrations are available for 1028 lakes.

Amendments from Version 1
This second version was edited as suggested by the three referees and details are outlined in our response. A summary of these changes is:
-Bootstrapped conditional probabilities were re-run with 10,000 iterations. This resulted in slightly different final values, although conclusions are unchanged
-Several references were added about related studies
-More details about the National Lakes Assessment sampling protocol were added
-A static copy of the dataset is now included with the repository at https://github.com/USEPA/Microcystinchla/raw/master/inst/extdata/nla_dat.csv
-Corrected typos and unclear text
-Added discussion about possible regional relationships
-Clarified how measurements at the detection limit were handled

Analytical methods
We used a conditional probability analysis (CPA) approach to explore associations between chlorophyll a concentrations and World Health Organization (WHO) and USEPA microcystin health advisory levels 17. Many health advisory levels have been suggested (Table 1), but lakes with higher microcystin-LR concentrations in the NLA were rare. Only 1.16% of lakes sampled had a concentration greater than 10 µg/L. Thus, for this analysis we focused on the microcystin concentrations that are better represented in the NLA data. These were the USEPA children's (i.e. bottle fed infants to pre-school age children) drinking water advisory level of 0.3 µg/L (USEPA Child), the WHO drinking water advisory level of 1 µg/L (WHO Drinking), the USEPA adult (i.e. beyond pre-school aged individuals) drinking water advisory level of 1.6 µg/L (USEPA Adult), and the WHO recreational, low probability of effect advisory level of 2 µg/L (WHO Recreational).
Conditional probability analysis provides information about the probability of observing one event given another event has also occurred. For this analysis, we used CPA to examine how the conditional probability of exceeding one of the health advisories changes as chlorophyll a increases in a lake. We expect to find higher chlorophyll a concentrations to be associated with higher probabilities of exceeding the microcystin health advisory levels. We also calculated bootstrapped 95% confidence intervals (CI) using 10,000 bootstrapped samples. Thus, to identify chlorophyll a concentrations of concern we identified the value of the upper 95% CI across a range of conditional probabilities of exceeding each health advisory level. Using the upper confidence limit to identify a threshold is justified as it ensures that a given threshold is unlikely to miss a microcystin exceedance.
As both microcystin-LR and chlorophyll a values were highly right skewed, a log base 10 transformation was used. Additional details of the specific implementation are available at https://github.com/USEPA/microcystinchla. A more detailed discussion of CPA is beyond the scope of this paper, but see Paul et al. 18 and Hollister et al. 19 for greater detail. All analyses were conducted using R version 3.2.3, and code and data from this analysis are freely available as an R package at https://github.com/USEPA/microcystinchla.
Lastly, we assessed the ability of these chlorophyll a thresholds to predict microcystin exceedance. We used error matrices and calculated total accuracy as well as the proportion of false negatives. Total accuracy is the total number of correct predictions divided by total observations. The proportion of false negatives is the total number of lakes that were predicted to not exceed the microcystin guidelines but actually did, divided by the total number of observations.

Results
In [...] Table 2. Specific chlorophyll a concentrations that are associated with greater than even odds of exceeding the advisory levels were 23, 68, 84, and 104 µg/L for 0.3, 1.0, 1.6, and 2.0 µg/L advisory levels, respectively (Table 2 & Figure 1).
The chlorophyll a cutoffs may be used to predict whether or not a lake exceeds the microcystin health advisories. Doing so allows us to compare the accuracy of the prediction as well as evaluate false negatives. Total accuracy of the four cutoffs in predicting microcystin exceedances was 74% for the USEPA children's drinking water advisory, 86% for the WHO drinking water advisory, 89% for the USEPA adult drinking water advisory, and 91% for the WHO recreational advisory (Table 3, Table 4, Table 5, & Table 6). However, total accuracy is only one part of the prediction performance with which we are concerned.
When using the chlorophyll a cutoffs as an indicator of microcystin exceedances, the error that should be avoided is predicting that no exceedance has occurred when in fact it has. In other words, we would like to avoid Type II errors and minimize the proportion of false negatives. For the four chlorophyll a cutoffs we had a proportion of false negatives of 9%, 8%, 6%, and 5% for the USEPA children's, the WHO drinking water, the USEPA adult, and the WHO recreational advisories, respectively. In each case, lakes that exceeded the microcystin advisory but were predicted not to made up less than 10% of all lakes. While this method performs well with regard to the false negative percentage, it is possible that this is an artifact of the NLA dataset, and testing with additional data would allow us to confirm this result.
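The two performance measures used here, total accuracy and the proportion of false negatives (both expressed relative to all observations), can be illustrated with a short sketch. This is not the R code from the repository; the function names and the toy values are hypothetical, and treating an exceedance as strictly greater than the cutoff or advisory level is an assumption.

```python
from typing import Dict, Sequence


def error_matrix(chla: Sequence[float], microcystin: Sequence[float],
                 chla_cutoff: float, advisory: float) -> Dict[str, int]:
    """Count the four cells of a 2x2 error matrix.

    A lake is predicted to exceed when chlorophyll a > chla_cutoff and is
    observed to exceed when microcystin > advisory (strict '>' is an assumption).
    """
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for c, m in zip(chla, microcystin):
        predicted = c > chla_cutoff
        observed = m > advisory
        if predicted and observed:
            counts["true_pos"] += 1
        elif predicted:
            counts["false_pos"] += 1
        elif observed:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts


def total_accuracy(counts: Dict[str, int]) -> float:
    """Correct predictions divided by all observations."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations")
    return (counts["true_pos"] + counts["true_neg"]) / n


def false_negative_proportion(counts: Dict[str, int]) -> float:
    """Missed exceedances divided by all observations (not by exceedances only)."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations")
    return counts["false_neg"] / n


# Toy values for illustration only; these are not NLA data.
toy_chla = [5, 30, 80, 120, 10, 70]
toy_microcystin = [0.05, 0.5, 0.05, 3.0, 0.4, 0.05]
cells = error_matrix(toy_chla, toy_microcystin, chla_cutoff=23, advisory=0.3)
print(cells, total_accuracy(cells), false_negative_proportion(cells))
```

Because the denominator is the total number of lakes, a low proportion of false negatives does not by itself show that few exceeding lakes were missed; dividing `false_neg` by the number of observed exceedances would give that alternative measure.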

Discussion
The log-log association between microcystin-LR and chlorophyll a indicates that, in general, higher concentrations of microcystin-LR almost always co-occur with higher concentrations of chlorophyll a, yet the inverse is not true (Figure 2). Higher chlorophyll a is not necessarily predictive of higher microcystin-LR concentrations; however, chlorophyll a may be predictive of the probability of exceeding a certain threshold.
Indeed, the probability of exceeding each of the four tested health advisory levels increased as a function of chlorophyll a concentration (Figure 1). We used this association to identify chlorophyll a concentrations that were associated with a range of probabilities of exceeding a given health advisory level (Table 2). For the purposes of this discussion we focus on a conditional probability of 50% or greater (i.e., greater than even odds to exceed a health advisory level). The 50% conditional probability chlorophyll a thresholds represent 28.6%, 11%, 8.9%, and 7.2% of sample lakes for the USEPA Child, the WHO Drinking, the USEPA Adult, and the WHO Recreational levels, respectively.
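The general idea of reading a chlorophyll a threshold off the conditional probabilities can be sketched as follows. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, conditioning on lakes at or above a candidate chlorophyll a value is an assumption, and the rule of taking the lowest candidate whose bootstrapped upper 95% limit reaches the target probability is one plausible reading of the procedure.

```python
import random
from typing import List, Optional, Sequence


def conditional_probability(chla: Sequence[float], microcystin: Sequence[float],
                            chla_cutoff: float, advisory: float) -> Optional[float]:
    """Estimate P(microcystin > advisory | chlorophyll a >= chla_cutoff).

    Returns None when no lake meets the chlorophyll a condition.
    """
    flags = [m > advisory for c, m in zip(chla, microcystin) if c >= chla_cutoff]
    if not flags:
        return None
    return sum(flags) / len(flags)


def bootstrap_upper_limit(chla: Sequence[float], microcystin: Sequence[float],
                          chla_cutoff: float, advisory: float,
                          n_boot: int = 1000, seed: int = 1) -> Optional[float]:
    """Upper limit of a percentile 95% bootstrap interval for the conditional probability."""
    rng = random.Random(seed)
    n = len(chla)
    estimates: List[float] = []
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # resample lakes with replacement
        p = conditional_probability([chla[i] for i in rows],
                                    [microcystin[i] for i in rows],
                                    chla_cutoff, advisory)
        if p is not None:
            estimates.append(p)
    if not estimates:
        return None
    estimates.sort()
    return estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]


def lowest_cutoff_reaching(chla: Sequence[float], microcystin: Sequence[float],
                           advisory: float, target: float = 0.5,
                           n_boot: int = 1000, seed: int = 1) -> Optional[float]:
    """Lowest observed chlorophyll a value whose upper limit reaches the target probability."""
    for cutoff in sorted(set(chla)):
        upper = bootstrap_upper_limit(chla, microcystin, cutoff, advisory, n_boot, seed)
        if upper is not None and upper >= target:
            return cutoff
    return None


# Toy values for illustration only; these are not NLA data.
toy_chla = [5, 30, 80, 120, 10, 70]
toy_microcystin = [0.05, 0.5, 0.05, 3.0, 0.4, 0.05]
print(conditional_probability(toy_chla, toy_microcystin, 23, 0.3))
print(lowest_cutoff_reaching(toy_chla, toy_microcystin, advisory=0.3, n_boot=200))
```

Exceedance is determined by comparing values with a cutoff, so a monotone transformation such as log base 10 would not change which lakes are counted in this sketch.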
There are numerous possible uses for the chlorophyll a and microcystin advisory cutoff values. First, in the absence of microcystin-LR measurements, exceedance of the chlorophyll a concentrations could be a trigger for further actions. Given that there is uncertainty around these chlorophyll a cutoffs, the best case scenario would be to monitor for chlorophyll a and, in the event of exceeding a target concentration, take water samples and have those samples tested for microcystin-LR.
A second potential use is to identify past bloom events from historical data. As harmful algal blooms are made up of many species and have various mechanisms responsible for adverse impacts (e.g., toxins, hypoxia, odors), there is no single definition of a bloom. For cHABs, one approach has been to utilize phycocyanin to screen for or identify bloom events 20. This is a useful approach, but phycocyanin is not always available, thus limiting its utility especially for examining historical data. Using our chlorophyll a cutoffs provides a value that is also associated with microcystin-LR and can be used to classify lakes, from past surveys, as having bloomed.
The values we propose are national and may miss regional variation in water quality, including chlorophyll a and microcystin-LR 22. A set of regional conditional probabilities would be interesting; however, limiting the analysis to the data available per region would make interpretation difficult. The sample size for each of the regional conditional probabilities would be reduced and the number of lakes in each region that exceed the microcystin values would also be reduced. Thus, our confidence in the conditional probabilities would be less (i.e. greatly increased confidence intervals) and the relationships less pronounced as we have fewer lakes on which to base the probabilities. Thus, this dataset is best for making national scale recommendations.
There are two other limitations with the 2007 National Lakes Assessment dataset. First, it represents a single sample from a lake and does not capture temporal dynamics. Second, validation of the predictions with the 2007 data alone would be challenging as the data would need to be subset and this would only serve to increase the uncertainty of our conditional probabilities, reducing our ability to validate the presence of microcystin-LR. The 2012 National Lakes Assessment would be ideal for this task. However, as of this writing, the 2012 National Lakes Assessment data are not public. When these data are released, a validation of this approach can be completed.
Lastly, using chlorophyll a is not meant as a replacement for testing of microcystin-LR or other toxins. It should be used when other, direct measurements of cyanotoxins are not available. In those cases, which are likely to be common at least in the near future, using a more ubiquitous measurement such as chlorophyll a will provide a reasonable proxy for the probability of exceeding a microcystin health advisory level and provide better protection against adverse effects in both drinking and recreational use cases.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

General comments
This study provides an elegant framework to predict the severe impairment of U.S. lakes and reservoirs by cyanobacterial harmful algal blooms. I especially appreciated the clever use of conditional probability analysis to identify chlorophyll a thresholds above which MC concentrations exceed WHO drinking water and recreational provisional guidelines. Chlorophyll a is a regularly measured water quality variable, and this study indeed offers a promising approach that can be used in many lakes around the world to identify problematic lakes or regions. In that regard, it would have been interesting to account for the spatial heterogeneity across this landscape; as indicated by Beaver et al. (2014), some regions of the continental U.S. are more likely to be MC hotspots. Accounting for this heterogeneity will likely help explain some of the noise in the biplot shown in Fig. 2. I have also analyzed the MC data from the same dataset and found that accounting for different ecoregions in my model (as presented in Beaver et al., 2014) helped further explain the probability of detecting versus failing to detect MC in lakes. I am curious to know how this would play out with your modeling approach (conditional probability analysis).

P4
-Change: "Thus, to identify chlorophyll a concentrations of concern we identify the value [...]" To: "Thus, to identify chlorophyll a concentrations of concern we identified the value [...]"
-Change: "were highly skewed right," To: "were highly right skewed,"
-Change: "Lastly, we assess the ability of" To: "Lastly, we assessed the ability of"
-Change: "We use error matrices and calculate total accuracy" To: "We used error matrices and calculate total accuracy"
-Change: "For chlorophyll a, the range was" To: "Chlorophyll a ranged from"
-Please specify that this chlorophyll a range corresponds to a range from oligotrophic to hypereutrophic lakes.
-Change: "The associations between chlorophyll a and the upper confidence interval" To: "The association between chlorophyll a and the upper confidence interval"
-Figure 2 should first be presented in the Results section.
-Change: "This is the case as the probability of exceeding each of the four tested health advisory levels increases as a" To: "Indeed, the probability of exceeding each of the four tested health advisory levels increased as a"
-Change: "We used this association to identify chlorophyll a concentrations that are associated" To: "We used this association to identify chlorophyll a concentrations that were associated"

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Competing Interests: No competing interests were disclosed.

Author Response 01 Jun 2016
Jeffrey Hollister, US EPA, USA
Thank you for your review. We have just submitted our revisions and expect the new version to be available in the next few days. Below are our responses to the specific issues you raise. We feel the paper is stronger after this revision. Any questions, feel free to contact us or comment here. Thanks again!

## Spatial/Regional Heterogeneity in Chlorophyll/Microcystin relationship
We agree that there are likely regional differences and would like to account for this; however, sample sizes for each region vary (67 to 155) and are relatively small. The resulting conditional probability analysis would have very wide confidence intervals. Thus, comparison between regions would be difficult and inferring a pattern would not be possible. We have added additional text in the discussion (second to last paragraph) that raises this issue. Additionally, we have added the Beaver et al. reference in this discussion.

## Specific Edits
-Change: "Thus, to identify chlorophyll a concentrations of concern we identify the value [...]", To: "Thus, to identify chlorophyll a concentrations of concern we identified the value [...]"
-**Response:** Changed
-Change: "were highly skewed right,", To: "were highly right skewed,"
-**Response:** Changed
-Change: "Lastly, we assess the ability of", To: "Lastly, we assessed the ability of"
-**Response:** Changed
-Change: "We use error matrices and calculate total accuracy", To: "We used error matrices and calculate total accuracy"
-**Response:** Changed
-Change: "For chlorophyll a, the range was", To: "Chlorophyll a ranged from"
-**Response:** Changed
-Please specify that this chlorophyll a range corresponds to a range from oligotrophic to hypereutrophic lakes.
-**Response:** We did not change this as we feel Figure 2 is not presenting results of our analysis but justifying the approach, and is a better fit to be introduced in the Discussion.
-Change: "This is the case as the probability of exceeding each of the four tested health advisory levels increases as a", To: "Indeed, the probability of exceeding each of the four tested health advisory levels increased as a"
-**Response:** Changed
-Change: "We used this association to identify chlorophyll a concentrations that are associated", To: "We used this association to identify chlorophyll a concentrations that were associated"
-**Response:** Changed

Overview: The manuscript/article addresses a critical question applicable to recreational and drinking water managers: Can we rapidly predict potentially harmful cyanobacteria blooms using traditional water quality methods? This question is likely to become more relevant according to the most current literature as blooms are expected to increase in frequency in the midst of a warming global climate facing more extreme storm and drought events 1-6. Using the rather large and nationally applicable National Lakes Assessment database for the United States, the authors demonstrate some of the strengths and weaknesses of using chlorophyll a as an indicator for at-risk conditions that could warrant management action or follow-up testing for cyanotoxins. The authors are also able to assign action levels or at least share possible action levels for management action using conditional probabilities. The strengths and weaknesses of selected probabilities are described using analyses similar to specificity and sensitivity in the form of accuracy and 'avoiding type II errors'. The authors do not propose using chlorophyll a as a proxy to replace toxin measurement, but as a tool to help facilitate targeted monitoring of toxins during at-risk conditions.
Overall Comments: The article is meritorious in that it does provide a meaningful starting point for lakes with no phycocyanin measures and for providing a meaningful starting point for developing some semblance of an action level that could be employed by recreational water and drinking water managers concerned with cyanotoxins. The article does stand to improve significantly in some key areas, which are as follows:
(1) Additional discussion in methods related to the National Lakes Assessment
(2) Additional discussion needed on how the data were organized for data analysis
(3) Improved discussion needed on alternative indicators for cHABs and cyanotoxins not assessed in the NLA
(4) Consideration of region-specific criteria or limitations of national recommendations for chl a
(5) Greater discussion on limitations of NLA and need for model validation/future studies.
(1) Additional discussion needed in methods related to the National Lakes Assessment: The readership may not be aware of the U.S. NLA performed in 2007. The author(s) should clarify where samples were collected (nearshore or from the surface in the deeper waters). NLA chlorophyll a samples were taken from the profundal zone rather than the littoral zone. The readership may also be interested in how many chl a samples were collected from each lake. Where were the microcystin-LR samples collected?
(2) Additional discussion needed on how the data were organized for data analysis: Were these samples paired (collected at the same time from the same locale) or are these some type of aggregated value over a lake season? Describing this in the methods will really help for understanding the importance of this work. Paired results (MC-LR and Chl a from the same day) are much more impactful for demonstrating the rapid advantage of chl a compared to using results that are a seasonal average indicating that the hypereutrophic and eutrophic lakes (ones with the highest chl a) are also the ones that are most likely to have a cyanoHAB event sometime during the year.
(3) Improved discussion needed on alternative indicators for cHABs and cyanotoxins not assessed in the NLA: Brief mention is given to phycocyanin (one study), and the additional language (about phycocyanin not always being available for measure and when measured, it is for only measuring pigment and not toxins) is equally relevant for chl a. The same in vivo handheld fluorometers and continuous monitoring solutions available for chl a are now widely available for phycocyanin, often at the same cost as a rapid measure for chl a. Phycocyanin, like chl a, does not measure toxin either, but phycocyanin in many studies has outperformed chl a, and in some studies it has not (especially when toxin concentration is low). Historical records on PC are likely not as great as chlorophyll a. Overall, several studies on this topic have been produced in the last two to four years (see Zamyadi and Dorner's work), with one study using phycocyanin to predict non-alcoholic liver disease presuming a relationship with cyanotoxins (Zhang et al. 2015) 1-6.
(4) Consideration desired on region-specific criteria or limitations of national recommendations for chl a: With nearly 30% of the lakes in the temperate plains being coded as poor for chlorophyll a in the 2007 NLA, what impact would these conditional probabilities have on these lakes? Should the lake managers in this region be monitoring continuously all the time? What are the mean/median chlorophyll a levels for this part of the U.S? Regional variability may be really important and did the conditional probability approach take this into consideration or can it take it into consideration? Is there a way to evaluate if there are significant regional effects in the U.S? For nutrient standards in the U.S. and macroinvertebrate assessments, EPA has had to issue region-specific guidelines/criteria, etc. for some parameters.
(5) Greater discussion needed on limitations of NLA and need for model validation/future studies: The paper fails to address the limitations of the NLA - as a reader, I'm not aware of the limitations. I have much respect for the NLA, but I do have questions regarding the number of samples for each lake. Furthermore, a statement or two discussing the need to validate modeled data may be worthwhile. Is there a way to see if the probabilities actually align with the accuracy and type II error rates predicted by the conditional probability approach?
Abstract-Specific Comments: Near the bottom of the abstract, the units seem quite high for microcystins (g/L) rather than micrograms/L. The micro Greek symbol (mu) may have been lost during uploading.
Results Comments: (1) In discussing the lake exceedances of the various recommended levels by EPA, the addition of 'drinking water' is appropriate in my opinion. Although it is mentioned earlier in the methods, further providing the information in the results is helpful to a novice reader or a person just becoming familiar with drinking water regulations and guidelines, as the U.S. EPA child level may be presumed by a reader to be a level for recreation in a lake rather than a level associated with finished drinking water after water treatment. (2) "All lakes had reported chl a concentrations that exceeded detection limits" Does this mean that some were over range? Or does this mean that "All lakes had detectable levels of chl a"?
Discussion Comments: The wedge pattern is not apparent in figure 2; however, the logic makes sense and is supported visually by the conditional probability plots in fig 1. If figure 2 could have two lines of best fit (similar to the way some researchers do for funnel plots on publication bias papers), it may be easier to see the wedge shape.

[...] understanding the importance of this work. Paired results (MC-LR and Chl a from the same day) are much more impactful for demonstrating the rapid advantage of chl a compared to using results that are a seasonal average indicating that the hypereutrophic and eutrophic lakes (ones with the highest chl a) are also the ones that are most likely to have a cyanoHAB event sometime during the year.
-**Response:** We agree and have added some additional wording to the Data section indicating that the samples are taken at the same time.
## Improved discussion needed on alternative indicators for cHABs and cyanotoxins not assessed in the NLA:
Brief mention is given to phycocyanin (one study), and the additional language (about phycocyanin not always being available for measure and when measured, it is for only measuring pigment and not toxins) is equally relevant for chl a. The same in vivo handheld fluorometers and continuous monitoring solutions available for chl a are now widely available for phycocyanin, often at the same cost as a rapid measure for chl a. Phycocyanin, like chl a, does not measure toxin either, but phycocyanin in many studies has outperformed chl a, and in some studies it has not (especially when toxin concentration is low). Historical records on PC are likely not as great as chlorophyll a. Overall, several studies on this topic have been produced in the last two to four years (see Zamyadi and Dorner's work), with one study using phycocyanin to predict non-alcoholic liver disease presuming a relationship with cyanotoxins (Zhang et al. 2015)
-**Response:** We agree that phycocyanin is more closely linked to microcystin than is chl *a*. Our paragraph mentioning phycocyanin was confusing and did suggest that chl *a* had a stronger association. Wording of that paragraph has been changed and the Ahn et al paper was added as reference. We feel that further discussion of phycocyanin, while important, is beyond the scope of our paper with its focus on chlorophyll.
## Consideration desired on region-specific criteria or limitations of national recommendations for chl a: With nearly 30% of the lakes in the temperate plains being coded as poor for chlorophyll a in the 2007 NLA, what impact would these conditional probabilities have on these lakes? Should the lake managers in this region be monitoring continuously all the time? What are the mean/median chlorophyll a levels for this part of the U.S? Regional variability may be really important and did the conditional probability approach take this into consideration or can it take it into consideration? Is there a way to evaluate if there are significant regional effects in the U.S? For nutrient standards in the U.S. and macroinvertebrate assessments, EPA has had to issue region-specific guidelines/criteria, etc. for some parameters.
-**Response:** We agree that there are likely regional differences and would like to account for this; however, sample sizes for each region vary (67 to 155) and are relatively small. The resulting conditional probability analysis would have very wide confidence intervals. Thus, comparison between regions would be difficult and inferring a pattern would not be possible. We have added additional text in the discussion (second to last paragraph) that raises this issue. Additionally, we have added the Beaver et al. reference in this discussion.
## Greater discussion needed on limitations of NLA and need for model validation/future studies: The paper fails to address the limitations of the NLA - as a reader, I'm not aware of the limitations. I have much respect for the NLA, but I do have questions regarding the number of samples for each lake. Furthermore, a statement or two discussing the need to validate modeled data may be worthwhile. Is there a way to see if the probabilities actually align with the accuracy and type II error rates predicted by the conditional probability approach?
-**Response:** We added a paragraph to the discussions about validation and the single sample limitations of the NLA.
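To illustrate the sample-size concern raised in the regional response above, the following is a minimal sketch of how the uncertainty around an estimated proportion grows as the number of lakes shrinks. It uses a simple normal-approximation interval, which is an assumption made here for illustration and not necessarily the interval method used in the paper; the function name `ci_half_width` is hypothetical.

```python
import math


def ci_half_width(p, n, z=1.96):
    """Normal-approximation half-width of a ~95% interval for a proportion.

    This is an illustrative approximation, not the paper's method.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Regional sample sizes quoted in the response (67 to 155) versus the
# full set of 1028 lakes. p = 0.5 is the worst case for interval width.
for n in (67, 155, 1028):
    print(n, round(ci_half_width(0.5, n), 3))
```

Under these assumptions the half-width is roughly 0.12 for 67 lakes versus roughly 0.03 for 1028 lakes. Because a conditional probability at a given chlorophyll *a* value is computed only from the lakes at or above that value, the effective sample size within a region would be smaller still.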

## Specific Edits
### Abstract-Specific Comments:
-Near the bottom of the abstract, the units seem quite high for microcystins (g/L) rather than micrograms/L. The micro Greek symbol (mu) may have been lost during uploading.
-**Response:** Oops! Looks like it did occur during upload. Have double checked the final for the proper units.
### Results Comments:
-In discussing the lake exceedances of the various recommended levels by EPA, the addition of 'drinking water' is appropriate in my opinion. Although it is mentioned earlier in the methods, further providing the information in the results is helpful to a novice reader or a person just becoming familiar with drinking water regulations and guidelines, as the U.S. EPA child level may be presumed by a reader to be a level for recreation in a lake rather than a level associated with finished drinking water after water treatment.
-"All lakes had reported chl a concentrations that exceeded detection limits" Does this mean that some were over range? Or does this mean that "All lakes had detectable levels of chl a"?
-**Response:** Changed wording to: All lakes had detectable levels of ...

### Discussion Comments:
-The wedge pattern in figure 2 is not apparent in figure 2, however, the logic makes sense and is supported visually by the conditional probability plots in fig 1. If figure 2 could have two lines of best fit (similar to the way some researchers do for funnel plots on publication bias papers), it may be easier to see the wedge shape.
-**Response:** Agreed that the wedge is not pronounced. We have changed the wording to better describe the pattern. Also, we disagree with the lines of best fit as those would then imply some sort of linear (presumably) pattern that we are not actually highlighting. Additionally the added lines would then require discussion and would detract from the focus on conditional probability.
No competing interests were disclosed.

Title and Abstract:
For clarity, the authors might consider replacing "various" with "World Health Organization and U.S. Environmental Protection Agency".
Based on increasing error in the conditional probability plots as chlorophyll increases, the reported chlorophyll thresholds should not include significant digits (i.e., ± 0.1) but instead be whole numbers.
I would organize the information in table 1 by either concentration (low to high) or advisory type (drinking or recreational) and concentration (low to high). It might also be useful to include the number of lakes represented in each category based on microcystin. In table 2, I would add the specific microcystin concentration target under each advisory type to avoid having to look back at table 1 for these data.

Conclusions:
The purpose of this study is to use a simple measurement (chlorophyll) to determine the threat that microcystins pose to a waterbody relative to existing microcystin concentration targets. Most waterbodies lacked microcystin and Figure 2 clearly shows that there are a huge number of waterbodies across a large chlorophyll range that apparently had microcystin concentrations at the detection limit of 0.05 ug/L. I am concerned about the microcystin data at the detection limit. They appear to be false positives. I agree with the authors who acknowledged that high chlorophyll is not always a good predictor of high microcystin. What should be done for those waterbodies with high concentrations of chlorophyll but that had no or barely detectable microcystin?

Data:
I am confused about the data collected and available for the 2007 National Lakes Assessment. For example, I organized this dataset in July 2010 and found that 1158 lakes were sampled once (1152 of these lakes included data for both chlorophyll and microcystin) and 95 of the 1158 originally sampled lakes were sampled a second time in 2007. Yuan et al. 2014 (Freshwater Biology) used data for 1077 sampled lakes. The current study (as well as the National Lakes Assessment website and report) describes data for 1028 lakes. Clarity about these discrepancies is not necessarily the authors' job, but it would be good to understand why the differences exist across these datasets. Also, for this study, how were data used for lakes sampled twice in 2007?
Jeffrey Hollister, US EPA, USA
## I think the authors need to more broadly consider the existing literature and describe how their findings relate to and build from past studies. Below, I provide some related studies that the authors might want to consider.
-**Response:** We have added some text to the Data section indicating how we deal with the detection limit. We feel it is important to keep these values in the analysis as removing them would inflate our confidence around the conditional probabilities. We hope this is clearer in our revision.
## Presenting histograms of chlorophyll and microcystin concentrations for the study lakes would be useful.
-**Response:** We have chosen to present the distribution information in text and present for both chlorophyll and microcystin the range, mean, and median. Figure 2 also indicates the distribution of both. Lastly, the data are available via [code from the GitHub repository](https://github.com/USEPA/Microcystinchla/blob/master/R/get_nla.R).
## I am not an expert on conditional probability analysis. Based on the authors' text (second paragraph in Analytical Methods section), it appears that this analysis considers multiple events over time. If their dataset includes single measurements in a waterbody, I don't understand where the temporal component comes into the analysis. Again, I could be totally misunderstanding how this analysis works and should probably read the relevant references the authors provided.
-**Response:** We have added some additional text in the methods about the NLA as well as in the Discussion on NLA limitations. In short, this is not a temporal analysis and is based on a single snap shot.
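As a minimal sketch of the non-temporal reading described above, the conditional probability can be computed from one paired measurement per lake: among lakes whose chlorophyll *a* is at or above a given value, what fraction exceed a microcystin advisory level? The function names and the data values below are hypothetical and invented purely for illustration; this is not the authors' code, and it omits confidence intervals.

```python
def conditional_exceedance(chla, microcystin, chla_cutoff, advisory_level):
    """Estimate P(microcystin > advisory_level | chl a >= chla_cutoff).

    Each position in the two lists is one lake sampled once (a snapshot),
    so no temporal component is involved.
    """
    subset = [mc for c, mc in zip(chla, microcystin) if c >= chla_cutoff]
    if not subset:
        return None  # no lakes at or above this chlorophyll a value
    return sum(mc > advisory_level for mc in subset) / len(subset)


def first_cutoff_reaching(chla, microcystin, advisory_level, target=0.5):
    """Smallest observed chl a value whose conditional probability reaches target."""
    for cutoff in sorted(set(chla)):
        p = conditional_exceedance(chla, microcystin, cutoff, advisory_level)
        if p is not None and p >= target:
            return cutoff
    return None


# Hypothetical single-visit values in ug/L, invented for illustration only.
chla = [2, 5, 8, 12, 20, 30, 45, 60, 90, 120]
microcystin = [0.05, 0.05, 0.05, 0.2, 0.05, 0.4, 0.1, 0.6, 1.2, 0.05]

print(conditional_exceedance(chla, microcystin, 30, 0.3))
print(first_cutoff_reaching(chla, microcystin, 0.3))
```

In this toy data the lake with the highest chlorophyll *a* sits at the microcystin detection limit, which mirrors the point made elsewhere in the responses: such lakes stay in the denominator and pull the conditional probability down.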
## Based on increasing error in the conditional probability plots as chlorophyll increases, the reported chlorophyll thresholds should not include significant digits (i.e., ± 0.1) but instead be whole numbers.
-**Response:** Done. NEED TO DO on table already in overleaf
## I would organize the information in table 1 by either concentration (low to high) or advisory type (drinking or recreational) and concentration (low to high). It might also be useful to include the number of lakes represented in each category based on microcystin.
-**Response:** Table re-ordered based on concentration. Number of lakes (as percentage) included in text. Need to do directly on table in overleaf.
## In table 2, I would add the specific microcystin concentration target under each advisory type to avoid having to look back at table 1 for these data.
-**Response:** Done. Need to transfer to overleaf.
## Most waterbodies lacked microcystin and Figure 2 clearly shows that there are a huge number of waterbodies across a large chlorophyll range that apparently had microcystin concentrations at the detection limit of 0.05 ug/L. I am concerned about the microcystin data at the detection limit. They appear to be false positives. I agree with the authors who acknowledged that high chlorophyll is not always a good predictor of high microcystin. What should be done for those waterbodies with high concentrations of chlorophyll but that had no or barely detectable microcystin?
-**Response:** We added some discussion about this in the last paragraph of the Data section. We feel that these should be left in as removing them would erroneously inflate our confidence intervals and impact the conditional probabilities. Essentially these are lakes with very low microcystin but widely varying chlorophyll values.
## I am confused about the data collected and available for the 2007 National Lakes Assessment. For example, I organized this dataset in July 2010 and found that 1158 lakes were sampled once (1152 of these lakes included data for both chlorophyll and microcystin) and 95 of the 1158 originally sampled lakes were sampled a second time in 2007. Yuan et al. 2014 (Freshwater Biology) used data for 1077 sampled lakes. The current study (as well as the National Lakes Assessment website and report) describes data for 1028 lakes. Clarity about these discrepancies is not necessarily the authors' job, but it would be good to understand why the differences exist across these datasets. Also, for this study, how were data used for lakes sampled twice in 2007?
-**Response:** We share your confusion! There are many "types" of samples included with the raw NLA data. For this analysis, we only used the probability samples (i.e., no reference samples) and only used the first visit to a lake. Additionally, lakes that had no data reported for either chl *a* or microcystin were not included. As noted, this results in 1028 samples.
## Although all of the National Lakes Assessment data are publicly available, the authors should provide the dataset that they used for this study.
-**Response:** Code to access the data is available from [USEPA/microcystinchla](https://github.com/USEPA/microcystinchla). We have also added a static .csv file of the data used for our analysis to this repository. This is listed in the "Data and software availability" section.
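The selection rules described in the responses above (probability samples only, first visit only, and no missing chlorophyll *a* or microcystin values) can be sketched as a simple filter. This is an illustrative sketch and not the code in the repository; the field names (`site_type`, `visit_no`, `chla`, `microcystin`), the `"PROB"`/`"REF"` labels, and the example rows are all assumptions made here.

```python
def select_analysis_rows(rows):
    """Keep first-visit probability samples that have both measurements."""
    kept = []
    for row in rows:
        if row["site_type"] != "PROB":  # drop reference (non-probability) samples
            continue
        if row["visit_no"] != 1:  # use only the first visit to a lake
            continue
        if row["chla"] is None or row["microcystin"] is None:
            continue  # drop lakes missing either measurement
        kept.append(row)
    return kept


# Hypothetical rows, invented for illustration only.
rows = [
    {"lake": "A", "site_type": "PROB", "visit_no": 1, "chla": 12.0, "microcystin": 0.05},
    {"lake": "A", "site_type": "PROB", "visit_no": 2, "chla": 15.0, "microcystin": 0.10},
    {"lake": "B", "site_type": "REF", "visit_no": 1, "chla": 3.0, "microcystin": 0.05},
    {"lake": "C", "site_type": "PROB", "visit_no": 1, "chla": None, "microcystin": 0.30},
    {"lake": "D", "site_type": "PROB", "visit_no": 1, "chla": 80.0, "microcystin": 1.40},
]

selected = select_analysis_rows(rows)
print([row["lake"] for row in selected])
```

Applying each rule as a separate step makes it easier to count how many records each one removes, which is the kind of accounting that would help reconcile the differing lake totals mentioned in the review.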
Thank you for your review. We have just submitted our revisions and expect the new version to be available in the next few days. Below are our responses to the specific issues you raise. We feel the paper is stronger after this revision. Any questions, feel free to contact us or comment here. Thanks again!
## Specific Edits:
### Title and Abstract:
-For clarity, the authors might consider replacing "various" with "World Health Organization and U.S. Environmental Protection Agency".
-**Response:** As this has already been indexed, we thought it best to limit the edits to the title.
No competing interests were disclosed. Competing Interests: