Revealing HIV viral load patterns using unsupervised machine learning and cluster summarization [version 1; peer review: 1 approved, 1 approved with reservations]

HIV RNA viral load (VL) is an important outcome variable in studies of HIV infected persons. There exists only a handful of methods which classify patients by VL patterns. Most methods place limits on the use of viral load measurements, are often specific to a particular study design, and do not account for complex, temporal variation. To address this issue, we propose a set of four unambiguous computable characteristics (features) of time-varying HIV viral load patterns, along with a novel centroid-based classification algorithm, which we use to classify a population of 1,576 HIV positive clinic patients into one of five different viral load patterns (clusters) often found in the literature: durably suppressed viral load (DSVL), sustained low viral load (SLVL), sustained high viral load (SHVL), high viral load suppression (HVLS), and rebounding viral load (RVL). The centroid algorithm summarizes these clusters in terms of their centroids and radii.


Introduction
The primary clinical goal of HIV treatment and patient engagement is suppression of the HIV viral load (VL), as measured by low or undetectable circulating HIV RNA levels.However, VL most often fluctuates over repeated measurements, with a range that spans 8 orders of magnitude from 0 (undetectable) -10 7 copies/mL.VL is regularly monitored for signs of progression of HIV infection.Standard HIV treatment protocols are based on VL measurements 1 , especially when monitoring responses to antiretroviral therapy (ART).Monitoring of VL helps to determine whether ART therapy was able to successfully suppress patient VL 2 .Individuals with sustained high viral loads (SHVL) are at greater risk of secondary transmission, clinical progression to AIDS, and death [3][4][5][6] .In contrast, significant reduction in VL or high viral load suppression (HVLS) both lead to immune recovery, as measured by CD4 T cell levels 7 , and can reduce or eliminate the risks of SHVL.Furthermore, patients sustaining low-level viral load (SLVL), or with a rising VL after previous suppression, have a high incidence of treatment failure 8 .Thus, developing an objective measure of VL status, and categorization of patients by time varying patterns of VL, is critical for standardizing both therapy and comparing research protocol efficacy.
Reports in the current literature differ in the definition "high viral load" [9][10][11][12] , and their findings of how long it takes a patient on highly active anti-retroviral therapy (HAART) to suppress their VL 2,9,10,13 .We summarize some of the published approaches here (for greater detail see Supplementary File 1).With respect to VL levels, Terzian et al. defined SHVL as two consecutive viral load measurements (VLM) ≥100,000 copies/mL 9 .Durably suppressed viral load (DSVL) was defined as all VLM <400 copies/mL.In contrast, Greub et al. focused on detecting low level viral rebound (LLVR) by first considering patients with an initial consecutive VLM pair <50 copies/mL, and classified LLVR as having subsequent maximum VLM between 51-500 8 .Alternatively, Rose et al. investigated the use of five different frameworks to categorize suppressed versus notsuppressed VL 10 .Their approach excluded patients with VLM<200 at baseline, and stratified the remainder with regard to VL suppression using an 8 month window centered around 24 months after the start of VLM (18-30 months).Another approach was used by Phillips et al., and characterized VLM responses to ART 13 , utilizing a 24-40 week window and a rule-based method to identify two populations of HIV patients (Viral Failure and Viral Rebound).Despite these studies, no formal standard has been adopted by the field to classify a patient as having DSVL, SHVL, HVLS, SLVL, or rebounding viral load (RVL) patterns.
Classifying patient VL states outside of research studies is further complicated in that real-world VL measurements are taken intermittently over time, and missing data is common due to a variety of factors (e.g.travel, social circumstances, nonadherence).This leads to incomplete and irregularly spaced data points.In addition, differences in the sensitivity of the multiple VL clinical assays available results in multiple cut off points for "undetectable" viral loads analyzed at different facilities, further complicating analyses.Thus, there is a need for analytic techniques that can adjust for these details and classify VL states, both across research studies using different methodologies and to consistently classify patients in clinical practice.
Machine learning methods can provide objective, unsupervised classification of patient clinical status 14 .These methods begin by collecting a set of features from patient data (e.g.demographics, laboratory measurements, therapies) and then performing computational clustering to identify similar patient classes.Some groups have applied machine learning methods to HIV research studies 15 to predict HIV VL responses 16 or CD4 T cell counts 17 , to distinguish between suppressed and viremic patients 18 , and to select therapeutic regimens 19 .None, however, have used machine learning to create a standard classification for VL status with irregularly sampled VL measurements across a cohort of patients.
To address these issues, we propose a set of unambiguous features which, when combined as a feature vector, capture the distinct dynamic patterns present in VL measurements over time.In addition, we have developed a novel centroid algorithm to cluster HIV positive subjects based on these patterns.Here we present the derivation of this method, and demonstrate its application to clustering 1,576 HIV patients with repeated VL measurements over a 5 year period.We found that patient VL measurements can be clustered into five time-varying patterns that correspond well to clinically relevant states.We note that the method and resulting categories can be used to standardize definitions of VL patterns across research studies, and potentially for clinical classification.

Human subjects protection
This proposal was reviewed and approved by the University of Rochester Human Subjects Review Board (protocol number RSRB00068884).Consent was waived by the review board due to de-identification of the data set.The analysis in this paper is presented in compliance with Center for Medicare Services (CMS) current cell size suppression policy 20 .Data were coded such that patients could not be identified directly in compliance with the Department of Health and Human Services Regulations for the Protection of Human Subjects (45 CFR 46.101(b)(4)).

Study data
We obtained medical encounter data from all patients with an HIV diagnosis in the University of Rochester Medical Center's electronic medical record system (EMR) between 2011-2016, including age, gender, race, ethnicity, and VL measurements.There were a total of 1,892 patients with at least one VL measured, with 1,576 of these patients having at least three VL measurements.

Hardware and software specifications
Analyses were performed on a Windows 8 server with Intel(R) Xeon(R) CPUs E5-2620 v2 @ 2.10GHz and 256GB of RAM.Python 2.7 was used for most data mining and machine learning under Spyder v.3 installed from Anaconda2 (64-bit).The default packages available in Anaconda were used for analysis, including, but not limited to: NumPy, scikit-learn, SciPy, datetime, csv, math, Matplotlib, pip, operator, copy, random, and time.Using pip we installed the webcolors and pydotplus packages for rendering a decision tree.SQLite was used to store, query, and clean ~the data.Analytic code is available for download at https://github.com/SamirRCHI/Viral_Load_Data_Categorization.

Viral load analysis methods
Since VL data is asynchronous and noisy, with variable numbers of data points for each subject, we excluded patients with ≤ 2 VL measurements as too few to accurately assess VL patterns.Based on temporal patterns of VL described in the literature, the VL pair distribution of our data (Figure S1), and a further extensive investigation into the data, we hypothesized six potential temporal VL patterns, defined in Table 1 and illustrated in Figure 1.
It is important to note that these definitions are pattern based, and do not explicitly select absolute VL cutoff levels or a specific temporal window, as other reports have done [8][9][10]13 . Thi has the advantage of allowing the absolute VL levels and critical time windows to emerge from the analysis.It also does not preclude incorporation of absolute levels (e.g.VLM>400) at a later stage into the pattern specification.

Feature vector definition
Mathematical notations for this work are described in Table 2.We next designed a feature vector to capture characteristics that would allow us to distinguish between VL patterns.VL values at the lower limit of detection are a function of the specific assay used, and appear in our data set as 0, 20, and 48 copies/mL (Figure S1).Thus, plots of the log 10 transformed data have discretely spaced values at the lower level of detection, capturing the undetectable range of viral load.Additionally, we adjusted the data by log 10 [V L + 10] to avoid log 10 [0].The addition of 10 to VL (instead of 1) is used to minimize the distance between the undetectable values: 0, 20, and 48 (copies/mL).Thus, in our notation, all the values related to viral load are assumed to have been adjusted to this measure.For example, min V L = log 10 [0 + 10] = 1 and max V L = log 10 [10 7 + 10] ≈ 7.
Using the transformed VL data, we next extract several relevant features of the VL measurements over time.These features are used for machine learning classification of individual patient VL time series, and designed to distinguish patterns in VL change while minimizing the effects of noise.We do not limit feature extraction based on the total elapsed time of viral load measurements because the optimal time-point for determining viral load class is not well established.The attributes for feature extraction are: relative area of viral exposure, weighted recency reliability, adjusted maximal difference, and interquartile range.The definitions include: 1. Relative area of viral exposure (Area) -the area under the viral load curve relative to the total viral load area possible, which has a range between [0,1].We choose a normalized, relative score, as the total time span between the first and last viral load measurement, which differs between patients.This feature is similar to finding the

V L 
, where 1 ≤ i ≤ V LM p .

max V L
The maximum viral load for all patients, (10   mean and median, except it is sensitive to the dimension of time, hence yielding more information.The feature is calculated by summing the area of each trapezoid created by each pair of viral load values, followed by dividing by the total possible area (Equation 1).
, , , )( ) 2. Weighted recency reliability (wRR) -Due to viral load noise, the last measurement may not be an accurate reflection of a patient's viral load trend.For example, a patient may have a VL whose average slope is negative, indicating high viral load suppression over time (HVLS).If, however, the last measurement is slightly higher than the trend, heavily weighting this last measurement could lead to misclassifying the patient as rebounding viral load (RVL).
To account for this, we calculate a weighted mean where the weight of the VL measurement increases with time.More specifically, the weight function follows an inverse square root function This has the advantage of avoiding rapid convergence of g(x) to zero when time is measured in units of days (Equation 2).Weighted recency is then calculated as the dot product of the viral loads and weights divided by the sum of the weights (Equation 3).
, 1 We were also interested in how reliable wR is as a representation of the patient's viral load trend.To this end, we calculated the absolute deviations from the viral load measurements to wR (Equation 4).Rather than averaging the deviations, we take the median to reduce the effects of outliers and call this our weighted recency reliability measure (Equation 5).We take the inverse to force the range of the result to be between [0,1]; a property made to use in our next proposed feature, adjusted maximal difference.
3. Adjusted maximal difference (Adj MD) -this is timeindependent the difference between the "peak" and last VL measurements.To distinguish between viral load suppression or emergence, we calculate the "peak" as the maximum of the absolute deviations (Equation 4) and retain the sign of the result.We expected the positive scores to effectively isolate the EVL group, however, we instead found that retaining the positive (emergent) scores lead to mis-categorization of SHVL and RVL groups without clearly identifying EVL patterns.This, along with other investigation into the data, led us to conclude that the EVL pattern may not exist in our data, but we refrain to make generalizations to all healthcare facilities.With this consideration, we force (ground) the positive scores down to zero for proper labeling of SHVL and RVL (Equation 6).
Due to the varying nature in viral load measurements, we are hesitant to use the final viral load measurement as a means of judging suppression.Thus we propose to use wR instead.To reduce the effects of rebounding patients being falsely labeled as suppressed patients, we multiply our result by wRR -as rebounding patients are expected to have a low score in the range [0,1].The maximal difference is necessary in order to ensure that the suppression type of viral load patterns are classified appropriately (Equation 7). ) ,argmax ( grnd( ) max( ) 4. Interquartile range (IQR) -This feature is added to further segregate the rebounding patients and follows the standard interquartile range calculation (Equation 8).

Statistical analysis
Machine learning methods for cluster classification were compared by calculating F 1 scores, the harmonic mean of precision and recall 22 , defined by Equation 9-Equation 11.F∨, and F IQR .Finally, the terms label assignment, VL pattern membership assignment, patient categorization, and prediction, all refer to the same principle: To assign the most appropriate label which characterizes the viral load pattern of a patient.However, while the principle is the same, the method of assigning such an appropriate label differs depending on the categorization or the learning method used.

Feature extraction and normalization
We began by transforming viral load data by min-max normalization 22 to equally weight the temporal features of the VL series (Equation 12).That is, we normalize the features, F, to a range between [0, 1] using Equation 12where Next, we examined each of the four features for all patients with ≥ 3 viral load measurements (N = 1,576 patients), and did not find distinct bi-variate clustering (Figure S2).A feature correlation coefficient analysis (Supplementary Table S1) revealed that the Adj MD feature is linearly independent of Area and wRR.In contrast, there is modest linear dependence between IQR and Adj MD, and between Area and both wRR and IQR.As expected, the largest linear dependency is between wRR and IQR.These results suggest the separation between viral load patterns will be most noticeable between the Area and the Adj MD features -as we designed them to be.Also, although Adj MD is dependent upon wRR, we find that their correlation coefficient is very low (0.033).

Hierarchical clustering
We then performed hierarchical clustering of the individual subject VL patterns using a Euclidean distance metric and Ward's linkage criterion 23 to minimize the total within-cluster variance.Patients showed a clear separation into 5 distinct groups, which had clinical significance (Figure 2 and Figure 3).The cluster with the lowest viral loads and the highest weighted recency reliability (n=442) corresponds to the DSVL patient group.The patients corresponding to the SHVL group (orange; n=46) exhibited the highest relative Area and very low IQR.
Compared to the DSVL cluster, the blue cluster (n=535) has slightly greater area and IQR with a significant difference in the weighted recency reliability.Using this information, along with the general patterns shown by Figure 3, we identify this as the SLVL group.The algorithm also identifies the RVL (black; n=237) and HVLS (purple; n=316) clusters.The RVL cluster has a low weighted recency reliability and high IQR.In contrast, the HVLS cluster has a lower area, higher weighted recency reliability, indicating little variation in the terminal portion of the VL time series, and most importantly very low adjusted maximal differences (Figure 4).
VL patterns are similar within clusters, and dissimilar between clusters Figure 3. Interestingly, there are patients within each cluster whose last VL measurement occurs near 1,827 days.This is equivalent to the full span of five years of VL monitoring data set.This suggests that these clusters don't disappear after some elapsed time, but rather each type of pattern can be found at virtually any time point.
We found large VL spikes within the time series of the HVLS group.We hypothesize that this may be due to the asynchronous timing of measurements between subjects, the natural variation in biological responses, or patient variability in adherence to therapy.This observation also reflects one limitation of asynchronous outcomes data sampling, which lacks a "completion" endpoint characteristic of most prospective, randomized clinical trials.If measurements ended at a spike, the adjusted maximal difference feature may be weighted in the favor of the patient being classified as RVL.This may indicate that some patients classified as suppressing their viral loads should have been classified as having rebounding viral loads.Alternatively, may indicate that these features do not restrict a patient to forever to one category, but allow for dynamic classification as a function of biological or therapeutic responses.

Comparison of categorization methods
Using the same data set, we next compared our VL pattern categorization method to those previously published in the literature.Visually, we find that the SLVL group detected by our method is very similar to the LLVR group defined by Greub et al. (Figure 5).Furthermore, it appears that the methods trying to capture SHVL, viral rebound, and viral failure patients did not succeed as well as the identification of SHVL and RVL patients in our method.RMVL repeat continuous visually appears to have performed very well in identifying patients whom have suppressed their viral loads.However, the results suggest that our analysis performs slightly better in identifying the suppression group (HVLS), as we find that the last VL measurements (black dots in Figure 5) are consistently low using our method.
The other methods may not have performed as well as they rely on a window or a consecutive pair measure, which may be too subjective for assigning VL pattern membership.Furthermore, notice that patients with baseline VL<200 (Figure 5) contain VL patterns which can reach as high as 10 6 copies/ml, which is in contrast to Rose et al.'s assumption that these patients have consistently low viral variation.Lastly we wish to emphasize that while some of these categorization methods are successful in identifying a specific group of patients, our method is unique as it attempts to associate each VL pattern to a specific category, without using categories such as "Not Suppressed", "Unspecified", or "Omitted".

Supervised learning of VL patterns
We next used the classes identified by hierarchical clustering to compare several machine learning models, with the goal of identifying methods that could be trained to prospectively assign HIV patients to VL categories (i.e.SHVL, SVL, SLVL, DSVL, and RVL).Unsupervised learning (e.g.hierarchical clustering) is useful for establishing the data structure of VL categories and their locations in the feature vector space.Once the model is established (e.g.cluster boundaries), supervised learning methods are better suited for prospective cluster assignment, given a robust "ground truth" for model training, as they do not depend on re-analysis of the entire population.
To this end, we compared the predictive power of several supervised learning methods for HIV cluster assignment, including: k-nearest neighbors (kNN), decision tree, support vector machine (SVM), Adaboost, and random forests.Models were trained on the original data set, and we then ranked their prediction power by their average F 1 score derived by leave-one-out cross-validation (LOOCV) on the clustered results (Table 3).We compared the ability of these methods to reconstruct the originally identified clusters, even when allowing for variability in cluster numbers (e.g.kNN with k={7, 9}, or DT without a maximum depth specification).All methods performed comparably well, with the notable exception of Adaboost.This generally high performance was expected because the VL pattern categories are well-separated as a result of the clustering.k-Nearest Neighbors and k=5, was computationally efficient and yielded the best results in Table 3.
We next considered the trade-off of predictive precision versus model interpretability.Critical clinical evaluation of machine learning results is important to protect against mis-categorization and clinical error.For this reason, many have advocated using models that are more clinically interpretable.kNN is dependent upon the entire training set for prediction, as it does not inherently "learn" patterns 24 , hence it does not meet our interpretability criteria.In comparison, SVM offers a simpler model, but it's results could be non-intuitive for clinicians.And although Decision Trees offer the best interpretability, overly complex trees may be generated, as occurred in our study (Figure S3).
We found that pruned decision tree rules, with a maximum depth of 5 levels, met this interpretable criteria, however at a slight cost to the predictive power (Table 3).The extracted decision rules are shown in Figure 6.Each category has a rule with a high proportion of true positive samples following the rule relative to all samples for the category (support).Similarly, a high proportion of the predicted class was found in the rule (precision), indicating that the rules can be summarized into a majority rule.Note that the sum of the support does not necessarily add up to one for each class because some samples belonging to that class may have been otherwise placed into a different rule, making the precision of that rule weaker.As an alternative interpretable model we explored the use of centroid cluster summarization, which is often used in clustering algorithms, and is flexible enough to accommodate different centroid determination methods 22,25 .To determine the effects of different centroid determination algorithms, we compared seven different methods: multidimensional mean, multidimensional median, best representative center, bounding box method, smallest disk method, polyhedral center, and a novel "push and pull" (PnP) method inspired by force-directed graph drawing such as the Fruchterman-Reingold's algorithm 26,27 (see Supplementary File 1).Force directed clustering methods maximize inter-cluster center distances, while minimizing intracluster distance, and are the basis for modularity clustering in graph theory 28 .
We then combined the centroid cluster summarization approach with a radius-based classification prediction algorithm.Let c i be the ith cluster center with corresponding radius r i , where r i is calculated as the distance to the farthest intra-cluster sample from c i , then for a new sample s choose its predicted cluster membership j such that 2 j j s c r − is a minimum.We refer to this method as radial normalization classification.
Comparing the representative F 1 power of the centroid radial normalization methods (italicized in Table 3) to common machine learning algorithms, we find that the centroid interpretation loses some predictive power.However, the centroid summary is highly interpretable because the entire model can be expressed concisely (Supplementary Table S2), and understood clearly.For example, a clinician classifying a patient by VL time series values would compare observed feature values with the ranges given in Figure 6, and find which classification the patient's data fits best within.In the case of the centroid method, if an observed value appears to fall in multiple categories, then they should be assigned to the one closest to the center (this allows a clinician to cross-check model predictions).

Temporal state variation
HIV patient viral load states are often fluid, with class changes (e.g.SHVL → HVLS) occurring due to therapy, viral genetics, social and other factors.To examine this aspect of classes, we use the k-Nearest Neighbors (k=5) model, fit to the original clusters, to predict the class state of each patient with ≥ 3 VL measurements using only partially retained VL data.For example if a patient has 6 viral load measurements, then we predict the class state at 3, 4, 5, and 6 VLM, which may yield SHVL → SHVL → RVL → HVLS as its prediction.We then constructed a state-transfer network using the trace-route method 29 , revealing several interesting relationships: 1. Patients on therapy appear to suppress their viral loads at a positive linear rate throughout the entire 900 day span.This is quite different from the literature which suggests that if a patient is going to suppress their VL, it will be within 32 weeks, or 224 days 13 (Figure 7A).  2. DSVL classification appears unstable for the first 400 days, suggesting that patients in this class should be monitored carefully during this initial period (Figure 7A).
3. The number of patients classified as SHVL drops considerably until ∼500 days after first classification.After this point, those patients who have not yet left the SHVL category, may not do so (Figure 7A).
4. The two sets of classes {DSVL, SLVL} and {SHVL, RVL, HVLS} are well separated (i.e.without much transfer between sets; Figure 7B).This appears to suggest that patients whose viral load is consistently low or durably suppressed tend not to transfer into a high viral load state (i.e.RVL or SHVL), at least in this data cohort.
5. SHVL patients in this cohort tended to transfer out of the class at a much higher rate than the transfer in, suggesting positive patient care (Figure 7B).This observation is consistent with reports in the literature that entry into treatment, with adherence to a HAART regimen, generally results in viral load suppression.
6.The state transfer diagram illustrates that the most frequent state transition over time is remaining within the same cluster (Figure 7B) assignment.

Discussion
Researchers have previously performed HIV population case studies using differing schema to classify VL patterns 2,9,10,13 .We have developed a unique method for standardizing the algorithmic classification of VL patterns using a set of optimally segregating features.These features have been specifically engineered to optimize unsupervised clustering of temporal sequences of VL data that are asynchronous and noisy.Our findings demonstrate their success in identifying five viral load patterns often reported in the literature [7][8][9][10][11][12] .It is possible that additional viral load patterns may emerge in the future, for example due to new HIV variants that are resistant to current therapies.The method reported here is flexible enough to recognize such new temporal patterns of VL responses.It is also general enough that models could be trained on other viral infections that have patterns of natural or treatment related patient responses (e.g.hepatitis B and C, parvovirus B19), although this may require defining new features that capture disease specific pattern variants.
A common practice in data analytics is to calculate the centroid as the average of the points 22,30 .However, Table 3 suggests that the mean is not necessarily the best centroid for HIV viral load data.We note two advantages of the centroid algorithm: First, we can choose the centroid best corresponding to the shape of the data, and second, we can use it to mathematically determine the amount of over-lap between n-dimensional cluster spheres (i.e.viral load categories).This method may facilitate crosscomparison of HIV research studies by providing a standard for VL pattern classification.Such standardization would be immensely useful in meta-analyses [31][32][33][34][35] , potentially revealing the influence of different patient care strategies or new relationships between different patient populations.
Our work also explored the trade-off with respect to predictive accuracy between model interpretability and more complex, "black box" approaches to classification.The interpretability versus predictability problem is well known in the deep learning literature [36][37][38] .Interpretability is a desirable attribute in clinical classification systems, allowing clinicians to integrate causal physiology and diagnostic information with data features in a way promotes clearer bedside clinical reasoning.Using an interpretable model for assigning viral load pattern membership may be advantageous when a clinician wishes to use the assigned pattern membership to aid in making a critical clinical decision (e.g.choosing between treatment options), or when examining features that may be linked to a mechanism (e.g.slope of VL decline and viral genotype).A "black box" or more complex model may make such decisions or interpretations more difficult 39 , and can favor the use of simpler models at the expense of some predictive power.
Along these lines, we have also proposed a novel centroid-based algorithm for summarization of clustering results.This algorithm is not meant to supplant other well defined supervised learning algorithms, but rather to aid in interpretable assignment of VL patterns from other data sets into one of the five categories.The algorithm results are concise, allowing investigators to build the model in their preferred programming language.Hence this method may improve and standardize HIV population research by giving precise definitions to the varying temporal VL patterns, and potentially improving patient care.
Several caveats apply to our work.As noted, this is a single center study, and thus our method should be tested with a much larger data set to cross-validate the categories represented by the clusters.In addition, our feature vector was designed specifically based on observed VL patterns previously reported in the literature rather than objectively clustering the data using a standard time-series based clustering method [40][41][42] .This may limit generalizability to other VL analyses.In addition, some of our features are slightly collinear -with the greatest correlation coefficient being between IQR and wRR (-0.717).However, while HVLS and RVL both have a varied range of IQR, it is clear that the HVLS class has greater wRR than the RVL class due to HVLS patients having a long consistent viral load tail.Furthermore IQR helps distinguish the HVLS and the SLVL or SHVL class, hence both IQR and wRR are necessary despite the slight correlation.Finally, because our method normalizes time into number of days since first VL measurement, we lose the ability to look for seasonal or yearly patterns in the data.
Our data set did not have patients in whom VL was initially suppressed, and then rebounded (EVL).We originally hypothesized the existence of six distinct VL patterns, we found that the emergent VL group was not a pattern identified in our data.Perhaps this is a consequence of a high rate of local patient engagement in therapy in this cohort study, access to care, or the effectiveness of highly active anti-retroviral therapy regimens.We hypothesize that these conditions may not always exist (e.g. in areas where HAART is expensive, when people may lose the ability to pay for therapy), and that in such cases the EVL pattern may indeed be present and significant.
Based on the formulation of the adj MD and wRR features, we hypothesize that a consequence of the grounding function is that any EVL pattern, if exists, will be grouped under RVL.This grouping may be appropriate as one can argue that going from a suppressed state to a high VL state is a form of rebounding.Clinical treatment of these patterns is likely to be similar.Further work with data sets that contain RVL patterns will need to be done to test these hypotheses.Unfortunately, we are not aware of any such data currently in the public domain.
Our method used hierarchical clustering to define groups, with a cutoff for group specification at a high level in the branching tree (i.e. level 5).Such thresholds or tuning parameters are characteristic of most unsupervised clustering algorithms 22,43 .However, identification of important sub-clusters by using a lower threshold is also possible.Clustering results may change depending on the parameter chosen, revealing finer betweencluster differences as the number of clusters increase.The hierarchical clustering algorithm has the advantage that a proper cut-off can be easily visualized.For example, choosing a lower cut-off may reveal that the suppression group splits itself into categories with different rates of HIV viral load suppression during treatment.Researchers wishing to engineer a new feature vector for VL pattern segregation may find useful the Supplementary material on features we considered but subsequently removed due to poor performance.

Conclusions
We have proposed a set of four unambiguous features which have been successfully used in segregating five different types of temporal viral load patterns: durably suppressed viral load (DSVL), sustained low viral load (SLVL), sustained high viral load (SHVL), high viral load suppression (HVLS), and rebounding viral load (RVL).We have also proposed a novel centroid-based cluster summary algorithm.The use of this algorithm may improve meta-analyses or population studies of viral load patterns by standardizing the classification of HIV patient categories.Furthermore, the segregation process used in this paper (i.e.identifying domain specific features, performing unsupervised clustering, interpreting the results with a cluster summary) can be used to model other viral infections and the response of VL levels over time to treatment or natural disease progression.We also found that using a temporal state variation method is important when considering patient viral load classifications, as changes in patient response can continue to occur beyond previously estimated time frames.
Strengths: very well developed models and excellent reporting of the results through figures and documentation.The code and datasets are available in Github.
Weakness: the model is trained and implemented on a suboptimal dataset.Treatment response in HIV infection (and thus the modeling of viral response) is well understood and best modeled with the knowledge of the time of treatment initiation, and a full understanding of variable influencing treatment response.Having a cohort that is described solely by "time from first measured viral load" is to all purposes, suboptimal.An additional issue is the reliance of a limited number of viral load determinations for an unclear number of individuals.Depending on the circumstances of sampling, having three viral load over an undisclosed time period is note devoid of many uncontrolled biases.Lastly, the text is equivocal in the utilization of the last time pointthe reviewer understands that the information contained in the last point may be weighted because of the possibility that it is noisy.Unfortunately, in real life, that is the moment where strong predictive models are needed.It is possible that this was actually the goal of the authors.
Summary: this work is a valuable contribution to the field, and the basic concepts and models will hopefully be deployed in the study of datasets that are more appropriate for this exercise.It is desirable that future modeling includes a more ambitious plan to move from the current train-test approach to one that establishes the generalization of the model.It will also be critical to observe the predictive value of the model on longer term outcomes.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound?This is an extremely well written interesting paper with excellent figures.The proposed methodology has great importance and utility for both research studies and clinical programs.
My only very minor comments are: For the descriptions given in Table 2 for mathematical notation.I suggest changing the description of "refers to a single patient" to "refers to a specific patient". 1.
In the paragraph on page 6, under the heading Analytic terminology, editing is needed on line 8, where it says: clusters.interchangeably.

3.
Is the rationale for developing the new method (or application) clearly explained?Yes Is the description of the method technically sound?Yes

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?Yes Are the conclusions about the method and its performance adequately supported by the findings presented in the article?Yes Competing Interests: No competing interests were disclosed.
Reviewer Expertise: mathematical modeling of HIV I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com

Figure 1 .
Figure 1.Possible HIV viral load patterns.Examples of each type of viral load pattern.Note that actual viral load patterns are noisier and may often be more difficult to distinguish.The magnitudes of viral load values reflect those found in the dataset.

Figure 2 .Figure 3 .
Figure 2. Dendrogram of hierarchically clustered patients.Clustered using the Euclidean distance along with Ward's method.Numbers on the bottom axis show number of patients in each cluster.The corresponding viral load pattern plots can be found in Figure 3. DSVL = Durably Suppressed Viral Load, SHVL = Sustained High Viral Load, SLVL = Sustained Low Viral Load, RVL = Rebounding Viral Load, HVLS = High Viral Load Suppression

Figure 4 .
Figure 4. Feature segregation from hierarchical clustering.Each patient is colored corresponding to the results from the hierarchical clustering in Figure 2. The artificial line of points is a result of the grounding function used in Adj MD.Area = relative area of viral load exposure, wRR= weighted recency reliability, IQR = interquartile range, AdjMD = adjusted maximal viral load difference.

Figure 5 .
Figure 5.Comparison of patient categories with existing methods.A 2D binning of VLM counts for every patient category.Each row uses a different categorization method, and the method name is located to the right of the row, and the title of each subplot is the category assigned by the indicated method.The columns of each 2D bin are normalized based on the maximum number of logged viral load measurement (VLM) counts in the column: log 10 [1 + V L M ].Bin color for a count of 0 is copper, and other bin colors range from white to teal (the maximum of the log 10 [V L M counts] in the column of the bin).The black dots represent the last viral load measurement for the patient (opacity ≥ 0.3; 2D bins have variable opacity for the dots).The bottom row is our analysis is the same as Figure 3, but represented as a 2D bin.DSVL = durably suppressed viral load; LLVR = low level viral rebound; SLVL = sustained low viral load; SHVL = sustained high viral load; HVLS = high viral load suppression, RVL = rebounding viral load.

Figure 6 .
Figure 6.Extracted rules from pruned decision tree and polyhedral CM rule region.Support is the fraction of true positives satisfying the rule relative to all samples of the class.Precision is the proportion of true positives versus all positives in the rule.Rules are sorted in order of application, first by the level of the decision tree depth (Depth), and then by descending precision.The colored regions represent the the values for which the rule holds (rule feature space).For the centroid method (CM; shaded gray) bounds were calculated by the polyhedron method, where the rectangular bar is the center and the radius is the area inside the parentheses.Area = relative area of viral load exposure, wRR= weighted recency reliability, IQR = interquartile range, AdjMD = adjusted maximal viral load difference.

Figure 7 .
Figure 7. Class state variation.A) Classification using kNN, with k=5, trained on the original five clusters to predict on partially retained viral load for patients ≥ 900 days of data.The number of patients in one class between 0-900 days are shown relative to the first state classification (i.e.third viral load measurement).B) A trace-route map of class state transfers (class 1 → class 2 ) as a function of partially retained viral load derived from model.Nodes represent viral load classification and arrows reflect the volume of state transitions between successive VL measurements (e.g.SHVL→DSVL).Self-loops (e.g.RVL→RVL) indicate no change in state reflecting stable classification.

Yes
Are sufficient details provided to allow replication of the method development and its use by others?YesIf any results are presented, are all the source data underlying the results available to ensure full reproducibility?YesAre the conclusions about the method and its performance adequately supported by the findings presented in the article?Partly Competing Interests: No competing interests were disclosed.Reviewer Expertise: Host and pathogen genomicsI confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.Reviewer Report 28 September 2018 https://doi.org/10.5256/f1000research.17007.r37937© 2018 Blower S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Sally Blower Center for Biomedical Modeling, Semel Institute of Neuroscience and Human Behavior, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA, USA This is an extremely interesting study that proposes a novel quantitative methodology for classifying HIV patients by viral load patterns.The authors propose four computable characteristics of time-varying viral load patterns and a novel classification algorithm.They demonstrate their approach by classifying a group of 1,576 HIV positive patients into five categories based on viral load patterns.

Table 1 . Viral load state definitions. Abbrv. Name Definition
*Colors are used throughout the manuscript to identify clusters

Refers to the i th viral load count of patient p in
7 extracted from patient p's viral load pattern.The words sample or point are also used here RVL (black; n= 237) and HVLS (purple; n=316) clusters.interchangeably.The term feature (F) can be thought of as a column vector for all patients in the dataset consisting of the four attributes:  ∨, and IQR from a set of patients (using their viral load patterns) with the formulations given above.Then a feature vector ( p F  ) contains the values p A  , wRR p , D ∨ p , and IQR p A F , F wRR , D