Peptide arrays of three collections of human sera from patients infected with mosquito-borne viruses [version 1; peer review: 1 approved with reservations, 1 not approved]

Background: Global outbreaks caused by emerging or re-emerging arthropod-borne viruses (arboviruses) are becoming increasingly common. These pathogens include the mosquito-borne viruses belonging to the Flavivirus and Alphavirus genera. These viruses often cause non-specific or asymptomatic infection, which can confound viral prevalence studies. In addition, many acute phase diagnostic tests rely on the detection of viral components such as RNA or antigen. Standard serological tests are often not reliable for diagnosis after seroconversion and convalescence due to cross-reactivity among flaviviruses. Methods: In order to contribute to development efforts for serodiagnostics for mosquito-borne viruses, we incubated 137 human sera on individual custom peptide arrays that consisted of over 866 unique peptides in quadruplicate. Our bioinformatics workflow to analyze these data incorporated machine learning, statistics, and B-cell epitope prediction. Results: Here we report the results of our peptide array data analysis, which revealed sets of peptides that have diagnostic potential for detecting past exposure to a subset of the tested human pathogens.


Introduction
Zika virus (ZIKV) is an arbovirus within the Flavivirus genus and the Flaviviridae family. In addition to ZIKV, many other mosquito-borne viruses negatively affect public health, including dengue virus (DENV) and chikungunya virus (CHIKV), among others. ZIKV is primarily transmitted by the bite of infected Aedes spp. mosquitoes, with limited instances of sexual transmission also being reported [1][2][3][4]. The recent worldwide epidemic has demonstrated that ZIKV is a neuropathic virus that is associated with fetal microcephaly and other congenital defects in the fetuses of infected pregnant women, and with Guillain-Barré syndrome in adults 5. Due to the number of ZIKV infections in recent years and the continued threat of ZIKV re-emerging around the world, there is still an urgent need for rapid and accurate surveillance assays to identify new outbreaks quickly. Distinguishing between infection with multiple co-circulating arboviruses that have similar clinical signs and symptoms makes accurate prevalence calculations and diagnosis extremely difficult, especially after convalescence 6-10.
The sequence similarity at the amino acid level in many flavivirus immunogenic protein regions contributes to the observed cross-reactivity in serological assays, which is especially high in the E protein and also present in the NS1 protein 11 . Although reports showing antibodies against other viral proteins are detectable 12 , the E and NS1 proteins are the primary targets of the humoral anti-flavivirus immune response in humans [13][14][15] .
Recent efforts to generate whole-genome sequences for these pathogens enable the application of bioinformatics tools to mine the data for trends and patterns that can be clinically applicable [16][17][18][19][20] . The meta-CATS (metadata-driven Comparative Analysis Tool for Sequences) algorithm is a statistical workflow that rapidly identifies sequence variations that significantly correlate with the associated metadata for two or more groups of sequences 21 . This algorithm has been used previously to identify residues within 15-mer linear peptide regions that have high predicted specificity and sensitivity values and that could therefore be useful for detecting antibodies against a variety of mosquito-borne Flavivirus species 22 . Quantifying the reactivity of this set of peptides using high-throughput custom peptide arrays enables the efficient and simultaneous testing of the set of peptides against a variety of serum samples with higher efficiency than what is possible with manual enzyme-linked immunosorbent assay (ELISA) technology alone 23 .
The data presented here are the product of combining upstream computational methods to predict peptides capable of distinguishing each virus with downstream high-throughput screening of relevant sera using peptide arrays. We have recently completed an analysis of 137 serum samples using custom peptide arrays (each containing 866 experimental viral peptides) to identify 15-mer linear peptides that could be useful as serodiagnostic reagents to detect prior infection with mosquito-borne viruses. Specifically, we tested peptides representing different co-circulating mosquito-borne viruses, including: ZIKV, DENV 1-3, CHIKV and West Nile virus (WNV). Applying machine learning, a weighting scheme, and B-cell epitope prediction algorithms to these data enabled us to identify pools of 8-10 peptides that are predicted to be immunodominant across human sera from previously infected individuals in Central and South America. In addition, we have separately evaluated these peptides using an ELISA method with a set of well-characterized sera. These data could be used by the scientific community to develop improved serological diagnostic methods for detecting past infection with one or more of these viral pathogens.

Methods
Peptide preparation and microarray printing
A subset of the previously predicted diagnostic peptides 21, representing multiple mosquito-borne virus species and subtypes, was synthesized at the Center for Protein and Nucleic Acid Research at The Scripps Research Institute (TSRI) 23,24. This selected collection of peptides consisted of 15-mers with sequences that represented the consensus amino acid sequence among strains belonging to each of our six target taxa: CHIKV, DENV1, DENV2, DENV3, WNV, and ZIKV. Peptides on the array that represented mosquito-borne virus taxa for which there were no serum samples were ignored in downstream quantification and computation. As such, a total of 25, 51, 28, 34, or 70 peptides in the E protein as well as 15, 19, 15, 23, or 70 peptides in the NS1 protein (all derived from DENV1, DENV2, DENV3, WNV, or ZIKV sequences, respectively) were evaluated in these experiments. A set of 25 peptides spanning portions of the CHIKV E2 protein that had previously been reported as relevant for detecting anti-CHIKV antibodies was also included 25. Synthesized peptides were suspended in 12.5 μL DMSO and 12.5 μL of ultra-pure water. Immediately prior to printing, suspended peptides were diluted 1:4 in a custom protein printing buffer [saline sodium citrate (SSC): 300 mM sodium citrate, pH 8.0, containing 1 M sodium chloride and supplemented with 0.1% polyvinyl alcohol (PVA) and 0.05% Tween 20] in a 384-well non-binding polystyrene assay plate. Two positive control peptides, hemagglutinin A (HA) (YPYDVPDYA) and FLAG tag (DYKDDDDK), together with a dye that permanently fluoresces at 488 nm (Alexa Fluor 488), were included in the print to guide proper grid placement and peptide alignment, and to serve both as printing controls and as controls to quantify the maximum fluorescence for the assays.
Quadruplicate sets of all peptides were printed onto N-hydroxysuccinimide ester (NHS-ester) coated NEXTERION Slide H (Applied Microarrays) slides at an approximate density of 1 ng/spot, using a Microgrid II (DigilabGlobal) microarray printing robot equipped with solid steel (SMP4, TeleChem) microarray pins. Humidity was maintained at 50% during the printing process. Immediately prior to interrogating the arrays, slides were blocked for 1 h with ethanolamine buffer to quench any unreacted NHS-ester on the slide. All slides were used within 2 months of printing and were stored at -20°C 23.

Serum sources
Spent diagnostic serum samples were provided by collaborators working under three separate clinical studies in Honduras, the United States, and Nicaragua. These sera were collected from a total of 137 consented human patients under IRB supervision and were characterized as positive for antibodies against at least one of: ZIKV, DENV1, DENV2, DENV3, WNV, and/or CHIKV.
A total of 32 deidentified plasma samples from patients (ages 6-73 years old) with suspected Zika, chikungunya, or dengue in Honduras were obtained at the discretion of health care providers at the Hospital Escuela Universitario. These samples were sent to the Centro de Investigaciones Geneticas at the Universidad Nacional Autonoma de Honduras in Tegucigalpa, Honduras for ZIKV, CHIKV and/or DENV molecular testing. Of these patients, 23 had infection with DENV and nine had infection with ZIKV confirmed by RT-qPCR during the acute phase. Convalescent samples were collected from these patients 10-30 days post-onset of symptoms between June 1 and November 30, 2016 and were tested on the custom arrays.
A total of 73 de-identified human serum samples were obtained from the Vanderbilt Vaccine Center Biorepository. Sera from individuals with previous history of natural infection with DENV, WNV, CHIKV, or ZIKV (confirmed by serology or RT-qPCR) while traveling in the Caribbean, Central or South America, or West Africa were included on arrays. For WNV, sera were from individuals with confirmed previous history of natural infection contracted during an outbreak in 2012 in Dallas, TX. The samples were collected in the convalescent phase, months to years after onset of symptoms.
A total of 32 de-identified human sera were collected from the Pediatric Dengue Cohort Study in Managua, Nicaragua 26,27 . Early convalescent-phase samples were collected 15-17 days post-onset of symptoms from 9 Zika cases that were confirmed as positive for ZIKV infection by real-time RT-qPCR between January and July, 2016. Late convalescent samples were obtained from 21 DENV-positive cohort participants after RT-qPCR confirmed DENV1 (n=7), DENV2 (n=8), or DENV3 (n=6) infection and 2 DENV-negative subjects, all in 2004-2011, prior to the introduction of ZIKV to Nicaragua. Samples were analyzed by inhibition ELISA 28,29 and neutralization assay 30,31 . The PDCS was approved by the IRBs of the University of California, Berkeley, and Nicaraguan Ministry of Health. Parents or legal guardians of all subjects provided written informed consent; subjects 6 years old and older provided assent.
High-throughput screening and quantification of characterized patient sera
Once the peptide microarrays were printed, aliquots from a subset of samples were used to optimize the screening and detection processes. Specifically, dilutions ranging from 1:50 to 1:1000 were evaluated to determine the optimal dilution level for subsequent screening. A 1:200 dilution was selected to achieve an optimal balance between the available aliquot volumes and assay sensitivity.
The 137 characterized sera were separately subjected to high-throughput screening using the synthesized peptide arrays. Sera were tested for IgG reactivity using the custom peptide array at TSRI. For immunolabeling, the incubation area around the printed grids was circumscribed using a peroxidase anti-peroxidase (PAP) hydrophobic marker pen (Research Products International Corp) and the subsequent steps were performed in a humidified chamber at room temperature on a rotator. Control anti-HA (mAb 12CA5, Scripps Research, mouse IgG, RRID:AB_514505) and anti-FLAG monoclonal antibodies (Invitrogen, MA1-142-A488, RRID:AB_2610653) were assayed at a concentration of 10 μg/ml while 10 μl of human sera were diluted 1:200 in PBS buffer containing Tween (PBS-T) and incubated for 1 h followed by three washes in PBS buffer. The arrays were then incubated for 1 h with goat anti-human IgG tagged with Alexa Fluor® 488 (Invitrogen, cat. #: A-11013, RRID: AB_2534080) as a secondary antibody. Arrays were washed three times in PBS-T, two times in PBS, and another two times in deionized water and centrifuged to dry at 200 × g for 5 mins.
The fluorescence of the processed slides was quantified using a ProScanArray HT (Perkin Elmer) microarray scanner at 488 nm and 600 nm, and images were saved as high-resolution TIF files. Imagene® 6.1 microarray analysis software (BioDiscovery; ImageJ could be used as an open-access alternative) was used to calculate the fluorescence intensity of the area within the printed diameter of each peptide as well as the fluorescence of the same diameter directly outside of the area occupied by each peptide. The mean and median fluorescence signal and background pixel intensities, as well as other data for each antigen spot, were calculated, digitized, and exported as individual rows in a comma-delimited file for subsequent analysis.
Data processing to identify immunodominant epitopes
A custom script 32 (available on GitHub) was written to implement a previously described array processing workflow 24, with a minor change to use the median foreground and background values instead of mean values to minimize outlier effects. Negative background values were interpreted as zeroes. Briefly, background correction was calculated by subtracting the median background from the median foreground measurements for each spot on each array. Normalization was performed by dividing the background-corrected values for each spot on each slide by the non-control spot having the largest fluorescence value on each slide, as has been described previously 24. The quadruplicate spots for each peptide on each array were then summarized into a single value by calculating the median value of the quadruplicate spots for each peptide to further reduce the effects of any outliers. The normalized relative fluorescence intensity values for all peptides and all samples were output as a separate file together with summarized quantitative values indicating how well each peptide was recognized by each of the polyclonal serum samples.
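The processing steps described above can be illustrated with a minimal sketch. This is not the authors' script 32; the function name `process_array` and the dictionary keys (`peptide`, `fg_median`, `bg_median`, `is_control`) are hypothetical, and the sketch assumes that the slide maximum is taken from the background-corrected non-control spots.

```python
from collections import defaultdict
from statistics import median


def process_array(spots):
    """Summarize one array into a normalized value per peptide.

    `spots` is a list of dicts with the hypothetical keys 'peptide',
    'fg_median', 'bg_median', and 'is_control' (one dict per printed spot).
    """
    corrected = []
    for spot in spots:
        # Negative background values are interpreted as zero.
        background = max(spot["bg_median"], 0.0)
        # Background correction: median foreground minus median background.
        value = spot["fg_median"] - background
        corrected.append((spot["peptide"], spot["is_control"], value))

    # Assumption: normalize by the largest corrected non-control spot on the slide.
    slide_max = max(v for _, is_control, v in corrected if not is_control)
    if slide_max <= 0:
        raise ValueError("no positive non-control signal on this slide")

    replicates = defaultdict(list)
    for peptide, is_control, value in corrected:
        if not is_control:
            replicates[peptide].append(value / slide_max)

    # Collapse the replicate spots (quadruplicates) to their median.
    return {peptide: median(values) for peptide, values in replicates.items()}
```

Using the median at both the pixel and the replicate level means that a single aberrant spot, for example one affected by a dust particle, does not dominate the summarized value for its peptide.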
A separate script was used to transform all relative fluorescence intensity values for each peptide into Z-scores, and separate tables were constructed to contain the summarized Z-score values for all peptides (as columns) representing each of the viral taxa and all samples (as rows) that were tested with the peptide array. A random forest algorithm (randomForest version 4.6-12 package in R) was applied to each of these tables in order to identify the peptides that were best able to differentiate between each of the viral taxa. In this case, the number of trees generated in the random forest for each species was 100,000, and the number of variables randomly sampled as candidates at each split was equal to the square root of the number of columns present in each table.
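As a rough illustration of the transformation described above, the sketch below converts a table of normalized intensities into per-peptide Z-scores and computes the number of candidate variables per split. It is written in Python rather than R and does not call the randomForest package; the function names are hypothetical, and the use of the sample standard deviation and of rounding down the square root are assumptions, since the text does not specify either.

```python
import math
from statistics import mean, stdev


def zscores_per_peptide(table):
    """Convert {sample: {peptide: intensity}} into Z-scores per peptide.

    Assumes every sample has a value for every peptide and that the
    sample standard deviation is the intended scale (an assumption).
    """
    peptides = sorted({p for row in table.values() for p in row})
    scores = {sample: {} for sample in table}
    for peptide in peptides:
        values = [table[sample][peptide] for sample in table]
        mu, sd = mean(values), stdev(values)
        for sample in table:
            # A peptide with no variation carries no information; report 0.
            scores[sample][peptide] = 0.0 if sd == 0 else (table[sample][peptide] - mu) / sd
    return scores


def candidates_per_split(n_columns):
    """Square root of the number of peptide columns, rounded down (assumption)."""
    return max(1, math.isqrt(n_columns))
```

For a table with 70 peptide columns, `candidates_per_split(70)` gives 8 candidate peptides at each split.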
The values representing the mean decrease in Gini index were calculated separately for samples obtained from each of the three collections as well as all possible combinations of two or more collections. These data were then used to identify the top 30 peptides according to their usefulness in identifying the correct virus taxon. The BepiPred algorithm was then used to predict the number of residues that are frequently present in B-cell epitopes, and would therefore contribute to increased affinity and binding by antibodies in downstream assays 33 . The peptides were then assigned a cumulative rank based on the epitope prediction and Gini values, and the 10 highest-ranking peptides across the E and NS1 proteins for each viral taxon, as well as 8 peptides in the E2 region for CHIKV, were categorized as the most likely to have high immunodominance and therefore be recognized by antibodies in sera collected from previously infected patients in the western hemisphere. Statistical comparisons of quantitative differences between the Gini and normalized fluorescence values for sets of peptides were performed using Student's t-test.
Peptide validation using ELISA
Each peptide was synthesized (LifeTein, LLC) and 2 ng of peptide was diluted in 50 μL of ddH2O. Natural human IgG protein (Abcam, cat. # ab91102), complement component C1q from human serum (Sigma, cat. #: C1740), and labelled secondary antibody (ThermoFisher, cat. #: A18817, RRID: AB_2535594) were used as additional controls. Pools of two peptides were used to coat duplicate wells on a 96-well Immulon 4HBX plate (ThermoFisher, cat. # 3855) and incubated at 4°C overnight. Next, 100 μL of blocking buffer (PBS + 5% BSA) was added to each well and incubated for 2 hours at room temperature prior to three washing steps with washing buffer (PBS + 0.05% Tween 20). Human serum was diluted 1:25 in blocking buffer and 50 μL of this solution was added to each well prior to incubation for 2 hours at room temperature. Each plate was then washed four times with washing buffer and 50 μL of HRP-conjugated anti-human IgG antibody (ThermoFisher, cat. #: A18817, RRID: AB_2535594; 0.1 mg/mL diluted 1:20,000) was added to each well, followed by incubation at room temperature for 2 hours. Each plate was then washed four additional times before incubating at room temperature for 30 minutes with 75 μL of TMB substrate (Abcam, cat. # ab171523). Then, 75 μL of stop solution (Abcam, cat. # ab171529) was added to each well and a BioTek Synergy HT plate reader was used to quantify the absorbance in each well at 450 nm within 15 minutes.
ELISA data processing
A normalization process was implemented that adjusted the signal values in each well based on the control wells included on each ELISA plate. This normalization enabled the downstream comparison between plates 34. A downstream quality control method was also implemented to exclude results from ELISA plates that displayed high levels of background or inconsistent signal from multiple control wells, as well as samples observed to have at least two wells for each taxon with higher than expected signal.

Data records
Overall, we screened 137 unique serum samples for their reactivity against a panel of viral peptides (Figure 1 and Underlying data 35). These samples, together with the clinical diagnosis, were collected from patients with known past exposure to at least one of the viruses targeted by our peptides (Table 1). Also contained within the Underlying data are files describing the metadata of peptides included on the array and each experimental sample 35.
The data from each array are contained in a single tab-delimited text file that holds the quantitative data captured from a single serum sample on a single peptide array 35. The fields in each file include: location of each peptide spot on the array, peptide identifier, raw mean and median foreground fluorescence at 488 nm, raw mean and median background fluorescence at 488 nm, and other data collected from the raw image.
A matrix containing the transformed Z-score values for each peptide was then formatted for input into a random forest (RF) machine learning algorithm to assist with ranking peptides according to virus taxon. To do so, a column was added to the matrix assigning each sample to the virus taxon that was known to have infected the patient (e.g. "Zika" or "Non-Zika"). Z-score values in columns containing the predicted peptides from each taxon were then captured and input into the RF algorithm.
The benefit of the RF algorithm is that it is capable of ranking the importance of features, which are peptides in this case, based on a known classification. The ranking is based on the mean decrease in Gini index, a value that quantifies how much splitting on a feature reduces node impurity. In other words, the higher the mean decrease in Gini index, the more important the feature is in correctly identifying the virus taxon.
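To make the notion of node impurity concrete, the following sketch computes the Gini impurity of a set of class labels and the decrease in impurity produced by a single threshold split on one peptide's Z-scores. It is a simplified illustration under stated assumptions: a random forest accumulates such decreases over every split that uses a feature and averages them across trees, which this sketch does not do. The function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples at `threshold` on one peptide."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    n = len(labels)
    # Child impurities are weighted by the fraction of samples in each child.
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children
```

A peptide whose Z-scores cleanly separate "Zika" from "Non-Zika" samples yields the maximum decrease for a balanced two-class node (0.5), whereas a peptide whose values are unrelated to the class labels yields a decrease near zero.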
In order to account for geographical, genetic, and population-based factors, we computed the mean decrease in Gini index for individual collections (e.g. Nicaragua or Honduras), all relevant pairs of collections (e.g. Nicaragua and Honduras, Honduras and United States), and the combination of all collections from our sera providers. These calculations were accompanied by a class-error rate that quantifies the number of samples characterized as being positive for ZIKV that were predicted to be ZIKV samples.
This class-error rate information for each individual collection or combination of collections was then used to weight the peptide ranking results. Briefly, this involved multiplying the average rank for each peptide in each comparison by the average weight and dividing it by the sum of the weights. This process works to increase the rank of peptides that have consistently high Gini values. We used these rankings to identify the top 25 species-specific peptides for each virus taxon. This process was repeated for non-ZIKV samples, including WNV, DENV1-3, and CHIKV.
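One plausible reading of this weighting scheme is a weighted mean of the per-comparison ranks, sketched below. The exact weights used by the authors are not stated; this sketch assumes, purely for illustration, that each comparison is weighted by one minus its class-error rate, so that comparisons with better classification contribute more. The function name is hypothetical.

```python
def weighted_rank(ranks, class_errors):
    """Weighted mean rank of one peptide across comparisons.

    `ranks[i]` is the peptide's rank in comparison i (1 = best) and
    `class_errors[i]` is the class-error rate of that comparison.
    Assumption: weight = 1 - class-error rate.
    """
    if len(ranks) != len(class_errors):
        raise ValueError("ranks and class_errors must have the same length")
    weights = [1.0 - error for error in class_errors]
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Multiply each rank by its weight, then divide by the sum of the weights.
    return sum(rank * weight for rank, weight in zip(ranks, weights)) / total
```

Under this assumption, a peptide ranked first in a comparison with no class error and third in a comparison with a 50% class error receives a weighted rank of about 1.67, closer to its rank in the more reliable comparison.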
In order to conserve resources for the peptide array and decrease the number of peptides that would be incorporated into the future ELISA assay, we used the existing BepiPred 2.0 algorithm to predict which of our 15-mer peptides contain the highest number of amino acids that are most often recognized by antibodies 36. These B-cell epitope predictions were then used to reduce the 25 best peptides identified from machine learning to the 10 best peptides that are predicted not only to be species-specific, but also to be most likely to contain species-specific epitopes. In the case of ZIKV, we also reviewed the spot size and shape in the peptide array images to ensure that there were no irregularities that could negatively bias our results. The BepiPred 2.0 results enabled us to predict which peptides would be most seroreactive for each selected taxon. The mean maximum score from the BepiPred 2.0 analysis was calculated to be 0.58 (range: 0.55-0.63). These scores are associated with a specificity greater than 81%.

Computational validation
Given the serological cross-reactivity that has been reported among many of our targeted mosquito-borne viruses 37, we recognized the need to validate the results of our high-throughput screen. To do so, we not only ensured that those generating the peptide array data were "blinded" to the phenotype of each sample, but we also computationally evaluated two distinct but complementary comparative and quantitative metrics that are described below.
First, we compared two serum samples from pediatric patients that had not been infected with DENV prior to sample collection. The data from the DENV-specific peptides in these samples were then compared to those from a representative DENV-positive sample to verify the differences in signal between known positive and known negative samples. This comparison would also provide a better understanding of the contribution of cross-reactivity, which has been reported previously 37, on our platform (Table 2). This comparison showed that the DENV-negative samples had less than four percent of the normalized fluorescence values, well below the 10 percent that was observed in the DENV-positive sample. Transforming these raw data into Z-scores further increases the observed differences in fluorescence values and provides additional support for the unbiased nature of the data produced in these experiments.
We next wanted to assess the technical rigor of our approach by performing a statistical analysis of the observed experimental variation in the peptide array experiments. In this case, data were available for six of our target viruses for which sera were evaluated on the arrays. We specifically wanted to quantify the reactivity of the best-performing peptides for each sample in a panel of comparisons (Table 3). The results from this analysis identified noticeable differences in the signals for ZIKV and WNV (Figure 2). However, we observed that the quantified values for the other four virus taxa were lower than the values for all samples combined and did not meet statistical significance when comparing known positive and negative samples (Figure 3). These results show that incorporating Gini scores and immune epitope predictions into our computational pipeline contributed to our ability to identify sets of peptides that were capable of distinguishing between past infection with a subset of our target viruses.
It is also important to recognize that each peptide was printed at non-adjacent sites on each array in quadruplicate to minimize experimental bias due to the location of any given spot on the array. Incorporating technical replicates was an important component of the experimental design. Such an approach enables improved replication of the results and also increases the scientific rigor of the resulting dataset upstream of any data processing workflows.

Experimental validation
The number of samples that were evaluated for prior exposure to each virus was insufficient to allow the use of in silico cross-validation techniques that are generally applied to the classifier predictions. We therefore designed custom 96-well ELISA plates (Figure 4) to validate the ability of the peptides with the highest predicted reactivity to accurately detect prior infection by each of the target viruses.
These custom ELISA plates were incubated with 26 human convalescent sera that had been previously characterized as positive for at least one of our target virus taxa using complementary methods, including plaque reduction neutralization test (PRNT) from convalescent serum, IgM antibody capture enzyme-linked immunosorbent assay (MAC-ELISA) from post-acute phase serum, and/or quantitative real-time PCR (qRT-PCR) from blood collected during acute infection. These samples were obtained from public sources including: BEI Resources (5 samples), the World Reference Center for Emerging Viruses and Arboviruses (7 samples), or the United States Centers for Disease Control and Prevention (16 samples). After processing and correcting the raw ELISA data, we found that the well-characterized samples showing a normalized absorbance ratio greater than 1.5 correlated with cases of previously confirmed Zika infection (Tables 5-30). Consequently, we compiled the normalized absorbance ratio data and classified any peptide pool found to have a normalized ratio value greater than 1.5 as a "borderline" result.
In order to increase specificity, any sample with at least two wells of the ELISA plate having normalized ratios greater than 1.5 was labeled as putative "positive" for prior infection with the target virus.
Given the p-values associated with the peptide array results, we decided to focus especially on samples that were positive for ZIKV. As such, in instances where excessive signal was detected for all viruses, samples having at least 2x stronger signal for ZIKV peptides than for DENV peptides in the same sample were still identified and were labeled with a "Z" to differentiate them from other categories.
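The calling rules described above can be summarized in a short sketch. The function names, the "negative" label for samples with no wells above the cutoff, and the use of the mean ratio to compare ZIKV and DENV signal are assumptions made for illustration; the text does not specify how the 2x comparison was computed.

```python
from statistics import mean

RATIO_CUTOFF = 1.5  # normalized absorbance ratio threshold from the text


def call_sample(wells):
    """Label each taxon from {taxon: [normalized ratios, one per well]}."""
    calls = {}
    for taxon, ratios in wells.items():
        n_above = sum(1 for ratio in ratios if ratio > RATIO_CUTOFF)
        if n_above >= 2:
            calls[taxon] = "positive"    # at least two wells above the cutoff
        elif n_above == 1:
            calls[taxon] = "borderline"  # a single well above the cutoff
        else:
            calls[taxon] = "negative"    # hypothetical label, not in the text
    return calls


def zikv_dominant(zikv_ratios, denv_ratios, fold=2.0):
    """True if ZIKV signal is at least `fold` times the DENV signal.

    Assumption: signal is compared using the mean ratio across wells.
    """
    zikv, denv = mean(zikv_ratios), mean(denv_ratios)
    return zikv > 0 and zikv >= fold * denv
```

A sample flagged by `zikv_dominant` would correspond to the "Z" category, which is intended for samples with high signal across all viruses.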
The summarized results of the ELISA data revealed a fair amount of concordance with the "gold standard" methods and displayed overall sensitivity and specificity of 61.5% and 50%, respectively (Table 4). Interestingly, these values fluctuated depending on the collection that was analyzed and were affected by small sample size from two of the collections.
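For reference, sensitivity and specificity are computed from the counts of concordant and discordant calls relative to the reference method. The sketch below shows the arithmetic only; the function names are hypothetical and any counts passed to them here are illustrative inputs, not values taken from Table 4.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of reference-positive samples that the assay calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of reference-negative samples that the assay calls negative."""
    return true_negatives / (true_negatives + false_positives)
```

With small collections, a change in a single sample's call shifts these proportions substantially, which is consistent with the fluctuation between collections noted above.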

Discussion
The array data reported in this manuscript were used to identify high-scoring peptides that could be used as serodiagnostic reagents in an ELISA format to distinguish between prior infection and seroconversion to a panel of mosquito-borne viruses. Our workflow incorporated both computational and laboratory components to improve identification of regions that were uniquely recognized by virus-specific antibodies to each virus and could therefore be useful as serodiagnostic peptides. Sabalza et al. described a protocol to identify ZIKV-specific diagnostic epitopes through peptide microarrays; however, they only used one human serum sample, did not use any bioinformatics analysis, and did not provide the identified peptide sequences 38.
The integration of Gini values calculated by the random forest machine learning algorithm with the BepiPred B-cell epitope prediction algorithm enabled us to identify the best peptides for each taxon. This approach narrowed our selected peptides to those predicted to have increased affinity and binding to antibodies 33. We purposely chose peptides in both the E and NS1 proteins (E2 protein of CHIKV) to improve our ability to detect epitopes within viral antigens that are known to circulate in the bloodstream 11.
We observed that a few of our selected peptides displayed high reactivity and Gini values, while other selected peptides had lower measured values. We attribute a subset of these unexpected differences to the imposed requirement of being located within a predicted B-cell epitope. Reactivity is an essential measurement for individual samples, while Gini values are useful to rank peptides based on their ability to identify peptides that differentiate one taxon from the others. As such, Gini values are better able to identify linear epitopes that differentiate taxa and that are sufficiently immunodominant across patient populations. We, therefore, are confident in the results from taxa where the Gini values were significantly different between selected peptides when compared to the remaining peptides. By providing the raw data in a publicly-accessible resource, we expect these data to be subject to re-analysis and meta-analysis using alternative methods.
We also noticed cases where the comparisons of our selected peptides yielded non-significant p-values in various comparisons, especially among dengue viruses. The most likely explanation for this observation is the high degree of cross-reactivity that occurs between linear epitopes derived from these viruses. While other existing serological assays are capable of distinguishing between these highly related taxa, they primarily rely on recognition of conformational epitopes by IgG antibodies circulating in the bloodstream. It is, therefore, possible that linear peptides in the selected proteins may be inadequately suited to differentiate between these taxa. Given the incomplete histories and serology that was performed in a subset of our tested samples, additional work is needed to determine whether incomplete metadata contributed to this finding. Additional laboratory experiments are being performed to calculate the specificity and sensitivity for our sets of peptides in a larger number of human serum samples from various clinical cohorts.
Table 4. ELISA data compared with sample characterization data and metadata.

With these publicly accessible peptide array data, it could also be possible to perform the opposite analysis in a way that would search for regions that were recognized with reduced specificity and could therefore be useful to identify peptides that could indicate past infection by at least one of these viruses. Similarly, these data could be mined to identify linear peptides that could be used as antigens to generate an antibody response to such epitopes towards the development of additional "universal" monoclonal antibodies.
The ELISA data indicate that this method could be a more resource- and time-efficient alternative to PRNT. Although results against alternative characterization methods vary widely, additional criteria have been added to PRNT results to account for the high degree of cross-reactivity between ZIKV and DENV 39. The observed sensitivity and specificity values could potentially be improved through additional experimentation and optimization. Screening additional well-characterized samples with our ELISA method could provide a more accurate gauge of ZIKV seroprevalence and could guide public health decisions.
These data help to quantify the human humoral response to multiple mosquito-borne viruses and could be useful to identify, map, and/or design native or synthetic antigens that provide increased protection against natural infection by these viruses. Our data could also be relevant to the design of a mosquito-borne virus vaccine. However, care must be taken in designing such experiments to ensure that antibody-dependent enhancement does not increase the risk of adverse events following administration of the vaccine.

Data availability
This project contains the following underlying data:
• 1 Metadata file for: Information on Peptides Included on Array.
• 1 Metadata file for: Metadata for Experimental Samples.
• 151 Data files containing quantitative data for the peptide arrays.
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
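Because each peptide was printed in quadruplicate, a reader working with the quantitative data files may wish to collapse replicate spots to a single value per peptide and sample. The sketch below shows one way this might be done with a median, which is less affected by a single aberrant spot than a mean. The record layout (sample identifier, peptide identifier, intensity) and all identifiers are assumptions for illustration and do not describe the actual file format.

```python
from collections import defaultdict
from statistics import median


def median_per_peptide(spots):
    """Collapse replicate spots to one median intensity per (sample, peptide).

    spots is an iterable of (sample_id, peptide_id, intensity) tuples.
    """
    grouped = defaultdict(list)
    for sample_id, peptide_id, intensity in spots:
        grouped[(sample_id, peptide_id)].append(intensity)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical quadruplicate readings for one peptide on one array.
example_spots = [
    ("serum_A", "pep_001", 100.0),
    ("serum_A", "pep_001", 120.0),
    ("serum_A", "pep_001", 110.0),
    ("serum_A", "pep_001", 900.0),  # a single outlying spot
]
summary = median_per_peptide(example_spots)
```

A table of such medians, with one row per peptide and one column per sample, is also the natural input for a heat map of all arrays.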
they only have incomplete patient history ("Given the incomplete histories and serology …additional work is needed to determine whether incomplete metadata contributed to this finding."). Yet, in the section "Serum sources", the authors give some ranges on patient age and days after onset of symptoms. In my opinion, a full patient information table (e.g. in supporting information) is essential (even though data might be missing), showing patient age, sex, known previous arbovirus infections (primary vs. secondary infection), days after onset of symptoms, etc. of all patients. From my experience, there can be a large difference in antibody response in patients 10 days vs. 30 days after onset of symptoms, so these patients should be categorized and analyzed as different groups. I believe that patient stratification is one of the keys to more statistically significant data.
The following general comments might be helpful to discuss the statistical significance of the results:
1. I do not fully agree with this general assumption: "Although reports showing antibodies against other viral proteins are detectable 12, the E and NS1 proteins are the primary targets of the humoral anti-flavivirus immune response in humans 13-15." Although discussed before (citation 11), this approach ignores for example the highly specific serum biomarker from the NS2B protein in Zika. This approach might miss other similarly important peptide biomarkers. In addition, adding peptides from the nonstructural proteins might help to boost the overall sensitivity of the assay (yet, losing some specificity).
2. By choosing consensus sequences, the authors limited the peptide selection, which is valid and, with current array technology, necessary. However, this probably limits the sensitivity of the microarray results and the subsequent ELISA. Depending on the selected protein regions and patient origins (-> virus genotypes), there might be several important genotypic variants, which the authors would have neglected.
In addition, the array protocol seems not optimal and there are some inconsistencies in the methods section. Some general remarks, questions, corrections and suggestions regarding the array processing:
1. The raw data states that the scanner Innopsys Innoscan 1100AL and the Mapix software were used. Yet, in the manuscript, it is stated: "…using a ProScanArray HT (Perkin Elmer) microarray scanner at 488 nm and 600 nm, and images were saved as high-resolution TIF files. Imagene® 6.1 microarray analysis software…". Please correct. The scanner wavelengths were supposedly 488 nm and 600 nm. In the raw data files, it says 488 nm and 635 nm. Please correct. If possible, please also give the laser powers and gain (PMT) settings for the scans (according to the raw data: wavelengths 635, 488: "LaserPower=10.0, 5.0", "PMTGain=50, 50").
2. For the microarray experiments, the authors claim to have used the Alexa 488 labeled secondary antibody: Invitrogen, cat. #: A-11013, RRID: AB_2534080. This is an (H+L) chain specific antibody, so it will also bind to IgM antibodies (please include this information!)! The use of the Ab labeled with Alexa 488 is a bit odd, since the original data files state that the scanning of the main interaction was performed at 635 nm. Yet, the data of the 488 nm channel is not included in the raw data and, according to the protocol, it seems to me that this dye is only used for controls (see section "Peptide preparation and microarray printing"). Are you sure that you used this specific secondary antibody? If so, I would highly recommend changing to a 635 nm channel compatible secondary antibody, since the 488 nm channel gives much higher background and generally very strong autofluorescence.
3. In contrast, for the ELISAs, the authors apparently used a different secondary antibody, which is Fc specific, so it should only bind to IgG. Thus, the microarray data cannot be directly compared to the ELISA data.
4. The final steps of the array washing and drying protocol are in my opinion not optimal: "… another two times in deionized water and centrifuged to dry at 200 × g for 5 mins." The pH of distilled water should be checked. Since deionized water is only weakly buffered (very low salt concentration), deionized water may have a low pH (especially observed in ddH2O). By washing the arrays with it, the lower pH may destroy interactions of weakly (or pH sensitive) binding antibodies. Instead, to remove salt residues from the array surface, you might want to use 1 mM Tris buffer in the future.
5. In addition, for future experiments, centrifuging arrays for drying is not optimal, since drying effects may cause artifacts (coffee ring drying effects, etc.). Instead, using a jet of air to quickly remove droplets from the surface causes far fewer artifacts.
Overall, the manuscript is well written and the bioinformatics seems sound. Yet, the authors can take some more care in presenting the data and should correct and/or explain the inconsistencies in the methods section (scanner, wavelengths, antibodies).
Future analyses should focus on patient stratification and possibly more homogenous samples (different cohorts may be difficult to compare). This may significantly improve statistical outcome.
In this manuscript entitled "Peptide arrays of three collections of human sera from patients infected with mosquito-borne viruses" by Pickett et al., the authors report the screening of patient sera using peptide microarrays for antibody interactions. The goal is to identify disease specific biomarkers, since "serological tests are often not reliable for diagnosis after seroconversion and convalescence due to cross-reactivity among flaviviruses".
Overall, the approach is sound and has very good potential and the authors did an impressive job in bioinformatic prediction and analysis. Yet, the results show that there is room for technological and methodological improvement.
I believe that the data presentation can be improved. Thus, I suggest the following minor changes:
1) The title "Peptide arrays of … human sera…" is in my opinion misleading. "Human sera are analyzed/screened on/with peptide arrays" or "Arrays are incubated with human sera".
Response: We thank the reviewer for their input on how to improve the title. We have changed the title to "Peptide arrays incubated with three collections of human sera from patients infected with mosquito-borne viruses".
2) It is somewhat difficult to understand the peptide selection, because there is just a reference to another publication. A brief 2-3 sentences description on the peptide selection process would help.
Response: We agree with the reviewer and have incorporated a more detailed description of the peptide selection process in the last two paragraphs of the Introduction section, as well as the first paragraph of the Methods section in version 2 of the manuscript. We anticipate that these changes sufficiently clarify the peptide selection process that was informed by our previously published analysis.
3) To judge the quality of the microarray data, it would be highly beneficial to have some kind of heat map of all array results together (raw and/or normalized median peptide staining intensity per patient). I found it somewhat unconventional to deposit a tsv result file on github. Only providing raw data makes a quick review and validation rather difficult.
For an example, see Figure 2
The following general comments might be helpful to discuss the statistical significance of the results:
1. I do not fully agree with this general assumption: "Although reports showing antibodies against other viral proteins are detectable 12, the E and NS1 proteins are the primary targets of the humoral anti-flavivirus immune response in humans 13-15." Although discussed before (citation 11), this approach ignores for example the highly specific serum biomarker from the NS2B protein in Zika. This approach might miss other similarly important peptide biomarkers. In addition, adding peptides from the nonstructural proteins might help to boost the overall sensitivity of the assay (yet, losing some specificity).
Response: We thank the reviewer for pointing out this unclear phrasing. We have corrected the text in the Introduction section to address the usefulness of the nonstructural proteins as potential serological markers. We have also edited the second paragraph of the Discussion section to mention potential future experiments that would incorporate peptides from nonstructural regions.
2. By choosing consensus sequences, the authors limited the peptide selection, which is valid and, with current array technology, necessary. However, this probably limits the sensitivity of the microarray results and the subsequent ELISA. Depending on the selected protein regions and patient origins (-> virus genotypes), there might be several important genotypic variants, which the authors would have neglected.
Response: We appreciate the reviewer pointing out that consensus sequences may not capture strain- or type-specific amino acid substitutions. We agree that the usefulness of this approach should be evaluated in future experiments as the capacity and capability of arrays continue to improve. We have added text to inform the readership of this potential weakness of our approach in the fourth paragraph of the Discussion section.
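To make the concern about consensus sequences concrete, the following Python sketch derives a simple majority-rule consensus from a set of aligned sequences and reports the positions at which the input sequences disagree. It is an illustration only: the function name and the short example sequences are hypothetical, the majority rule is an assumption, and this is not a description of the procedure used to design the array peptides.

```python
from collections import Counter


def consensus_with_variability(aligned_sequences):
    """Return (consensus, variable_positions) for equal-length aligned sequences.

    The consensus takes the most common residue in each column; positions
    where more than one residue occurs are reported by 0-based index.
    """
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    variable_positions = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_sequences)
        residue, _ = counts.most_common(1)[0]
        consensus.append(residue)
        if len(counts) > 1:
            variable_positions.append(i)
    return "".join(consensus), variable_positions


# Hypothetical aligned fragments (not real viral sequences).
fragments = ["DKLTG", "DKLSG", "DRLTG"]
consensus, variable = consensus_with_variability(fragments)
```

Peptides that span many variable positions are the ones most likely to lose sensitivity for sera raised against a minority variant.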
In addition, the array protocol seems not optimal and there are some inconsistencies in the methods section. Some general remarks, questions, corrections and suggestions regarding the array processing:
1. The raw data states that the scanner Innopsys Innoscan 1100AL and the Mapix software were used. Yet, in the manuscript, it is stated: "…using a ProScanArray HT (Perkin Elmer) microarray scanner at 488 nm and 600 nm, and images were saved as high-resolution TIF files. Imagene® 6.1 microarray analysis software…". Please correct. The scanner wavelengths were supposedly 488 nm and 600 nm. In the raw data files, it says 488 nm and 635 nm. Please correct. If possible, please also give the laser powers and gain (PMT) settings for the scans (according to the raw data: wavelengths 635, 488: "LaserPower=10.0, 5.0", "PMTGain=50, 50").
Response: We thank the reviewer for finding these inadvertent errors in the text. We have changed the 600 nm wavelength in the manuscript to the correct value of 635 nm, which now matches what is reported in the raw data files. The instrument description in the raw data files is an artifact generated by the scanner software, while the instrument text in the manuscript is correct. We have also included the catalog information for the secondary antibodies, laser power information, and PMT gain in the "High-throughput screening and quantification of the characterized patient sera" subsection of the Methods section.
2. For the microarray experiments, the authors claim to have used the Alexa 488 labeled secondary antibody: Invitrogen, cat. #: A-11013, RRID: AB_2534080. This is an (H+L) chain specific antibody, so it will also bind to IgM antibodies (please include this information!)! The use of the Ab labeled with Alexa 488 is a bit odd, since the original data files state that the scanning of the main interaction was performed at 635 nm. Yet, the data of the 488 nm channel is not included in the raw data and, according to the protocol, it seems to me that this dye is only used for controls (see section "Peptide preparation and microarray printing"). Are you sure that you used this specific secondary antibody? If so, I would highly recommend changing to a 635 nm channel compatible secondary antibody, since the 488 nm channel gives much higher background and generally very strong autofluorescence.
Response: We appreciate the reviewer asking us to clarify the text surrounding the secondary antibodies that were used. Indeed, the secondary antibody used to detect serum antibodies bound to viral peptides was read in the 635 nm channel, while the 488 nm channel was used solely for the secondary antibodies bound to the control peptides. We have added the correct catalog information for the secondary antibody used to detect reactivity to viral peptides in the "High-throughput screening and quantification of the characterized patient sera" subsection of the Methods section. We have added a new sentence in the third paragraph of the Discussion section to inform the readership that the H+L secondary antibodies that were used could also recognize bound IgM.
3. In contrast, for the ELISAs, the authors apparently used a different secondary antibody, which is Fc specific, so it should only bind to IgG. Thus, the microarray data cannot be directly compared to the ELISA data.
Response: The reviewer is correct. The microarray data were generated to narrow down the candidate viral peptides to a much smaller number that would be tractable to test with ELISAs. Changing the secondary antibody between the platforms does affect the ability to directly compare them, and some of the signal loss that was observed could be attributed to that change of reagents.
4. The final steps of the array washing and drying protocol are in my opinion not optimal: "… another two times in deionized water and centrifuged to dry at 200 × g for 5 mins." The pH of distilled water should be checked. Since deionized water is only weakly buffered (very low salt concentration), deionized water may have a low pH (especially observed in ddH2O). By washing the arrays with it, the lower pH may destroy interactions of weakly (or pH sensitive) binding antibodies. Instead, to remove salt residues from the array surface, you might want to use 1 mM Tris buffer in the future.
Response: We appreciate the reviewer sharing this insight and we will take it into account when performing future experiments.
5. In addition, for future experiments, centrifuging arrays for drying is not optimal, since drying effects may cause artifacts (coffee ring drying effects, etc.). Instead, using a jet of air to quickly remove droplets from the surface causes far fewer artifacts.
Response: We thank the reviewer for providing such a useful recommendation to reduce the number of artifacts on the arrays that result from the drying process. We will incorporate this information in future experiments.
Overall, the manuscript is well written and the bioinformatics seems sound. Yet, the authors can take some more care in presenting the data and should correct and/or explain the inconsistencies in the methods section (scanner, wavelengths, antibodies).
Response: We thank the reviewer for taking the time to review this manuscript so thoroughly and believe that the changes incorporated in the most recent version adequately address the inconsistencies that were found.
Future analyses should focus on patient stratification and possibly more homogenous samples (different cohorts may be difficult to compare). This may significantly improve statistical outcome.