Keywords
Binary classifier, ROC curve, accuracy, optimal threshold, optimal cutoff, class imbalance, game theory, minimax principle.
We have prepared a revised version of the article that aims to answer the concerns raised by the reviewers. We think the comments have improved our work. A point-by-point response follows.
Reviewer Luis Diambra raises two main concerns.
1. This work should put the problem in the context of supervised versus non-supervised learning.
We have modified the first paragraph in the introduction to clarify this point.
2. The problem of generalizing a learning rule/score deduced from a training set.
We have modified the first paragraph of the discussion to take this point into account.
Reviewer Pieter Meysman raises two main concerns.
1. Loss of generality in the specific case a=d=1 and b=c=0.
Following the work of Peter Flach (reference 4), we would argue that the uncertainty in the values of a, b, c and d is equivalent to the uncertainty in the values of qP and qN, so considering only the class ratio is enough for the point made in the article. The second paragraph of the Theory section was modified accordingly.
2. Nature not being a conscious agent as assumed in game theory.
The reviewer is right. We clarified this in the third paragraph of the Theory section.
Many bioinformatics algorithms can be understood as binary classifiers, as they are used to investigate whether a query entity belongs to a certain class1. Supervised learning trains the classifier by looking for the rule that gives the correct answers to a training set of question-answer pairs. On the other hand, score-based binary classifiers are trained in a non-supervised manner. Such classifiers use a scoring function that assigns a number to the query. During training, the scoring function is modified to give different scores to the positives and negatives in the training set. After training, the classifier is used by assigning a query to the class under consideration if its score surpasses a threshold. A minority of users are able to choose a threshold using their understanding of the algorithm, while the majority uses the default threshold.
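As a concrete sketch of this score-and-threshold scheme (the `classify` name, the scoring function and the threshold below are hypothetical placeholders, not part of the original article):

```python
# Minimal sketch of a score-based binary classifier at prediction time.
# `score_fn` and `threshold` are hypothetical placeholders: any trained
# scoring function and any chosen cutoff (default or user-picked) fit here.
def classify(query, score_fn, threshold):
    """Assign the query to the class if its score surpasses the threshold."""
    return score_fn(query) > threshold
```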
Binary classifiers are often trained and compared under a unified framework, the receiver operating characteristic (ROC) curve2. Briefly, classifier output is first compared to a training set at all possible classification thresholds, yielding at each threshold a confusion matrix with the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) (Table 1). The ROC curve plots the true positive rate (TPR = TP/(TP + FN), also called sensitivity) against the false positive rate (FPR = FP/(FP + TN), which equals 1 − specificity) (Figure 1, continuous line). Classifier training often aims at maximizing the area under the ROC curve, which amounts to maximizing the probability that a randomly chosen positive is ranked before a randomly chosen negative2. This summary statistic measures performance without committing to a threshold.
Table 1. TP: Number of true positives. FP: Number of false positives. FN: Number of false negatives. TN: Number of true negatives.
| | Training set: p | Training set: n |
|---|---|---|
| Classifier output: p’ | TP | FP |
| Classifier output: n’ | FN | TN |
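A minimal sketch of this ROC construction in Python, assuming numpy (the helper names `roc_points` and `auc` are ours; a production analysis would typically rely on an established routine such as sklearn.metrics.roc_curve). It sweeps the thresholds in descending score order, accumulates the TP and FP counts of the confusion matrix, and integrates the curve with the trapezoidal rule; ties between scores are not treated specially:

```python
import numpy as np

def roc_points(scores, labels):
    """TPR and FPR at every threshold, sweeping scores in descending order.

    scores: classifier scores for the training set.
    labels: 1 for positives (p), 0 for negatives (n); both classes required.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # strictest cutoff first
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)            # true positives accepted at each cutoff
    fp = np.cumsum(1 - y)        # false positives accepted at each cutoff
    tpr = tp / tp[-1]            # TP / (TP + FN)
    fpr = fp / fp[-1]            # FP / (FP + TN)
    return np.r_[0.0, fpr], np.r_[0.0, tpr]  # prepend the (0, 0) corner

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

For a perfect ranking `auc` returns 1.0, while a random ranking approaches 0.5, in line with the probabilistic interpretation above.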
Figure 1. The descending diagonal TPR = 1 − FPR (dashed line) minimizes classifier performance with respect to qP. The intersection between the receiver operating characteristic (ROC) curve (continuous line) and this diagonal maximizes this minimal, worst-case utility and determines the optimal operating point according to the minimax principle (empty circle).
Practical application of a classifier requires using a threshold-dependent performance measure to choose the operating point1,3. This is in practice a complex task because the application domain may be skewed in two ways4. First, for many relevant bioinformatics problems the prevalence of positives in nature qP = (TP + FN)/(TP + TN + FP + FN) does not necessarily match the training set qP and is hard to estimate2,5. Second, the yields (or costs) for correct and incorrect classification of positives and negatives in the machine learning paradigm (YTP, YTN, YFP, YFN) may be different from each other and highly context-dependent1,3. Points in the ROC plane with equal performance are connected by iso-yield lines whose slope, the skew ratio, is the product of the class skew and the yield skew4:

skew ratio = (qN/qP) × ((YTN − YFP)/(YTP − YFN))
The skew ratio expresses the relative importance of negatives and positives, regardless of the source of the skew4. Multiple threshold-dependent performance measures have been proposed and discussed in terms of skew sensitivity3,4, but they are often not justified from first principles.
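A small numeric sketch of the skew ratio as written above (the function and argument names are ours): with 10% prevalence and yields of 1 for correct calls and 0 otherwise, negatives outweigh positives nine to one:

```python
def skew_ratio(q_p, y_tp, y_tn, y_fp, y_fn):
    """Slope of the iso-yield lines in ROC space: class skew times yield skew."""
    class_skew = (1 - q_p) / q_p                 # qN / qP
    yield_skew = (y_tn - y_fp) / (y_tp - y_fn)   # (YTN - YFP) / (YTP - YFN)
    return class_skew * yield_skew

# 10% prevalence, yield 1 for correct calls and 0 otherwise: the iso-yield
# lines have slope ~9 (up to floating point rounding).
print(skew_ratio(q_p=0.1, y_tp=1, y_tn=1, y_fp=0, y_fn=0))
```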
Game theory allows us to consider a binary classifier as a zero-sum game between nature and the classifier6. In this game, nature is a player that uses a mixed strategy, with probabilities qP and qN = 1 − qP for positives and negatives, respectively. The algorithm is the second player, and each threshold value corresponds to a mixed strategy with probabilities pP and pN for positives and negatives. Two of the four outcomes of the game, TP and TN, favor the classifier, while the remaining two, FP and FN, favor nature. The game payoff matrix (Table 2) displays the four possible outcomes and the corresponding classifier utilities a, b, c and d. Written in terms of the classifier's probabilities of answering positive at a given threshold (TPR against a positive, FPR against a negative), the Utility of the classifier within the game is:

Utility = qP[a·TPR + c·(1 − TPR)] + qN[b·FPR + d·(1 − FPR)]
Table 2. a: Player I utility for a true positive. b: Player I utility for a false positive. c: Player I utility for a false negative. d: Player I utility for a true negative.
| | Player II (Nature): p | Player II (Nature): n |
|---|---|---|
| Player I (Classifier): p’ | a | b |
| Player I (Classifier): n’ | c | d |
The payoff matrix for this zero-sum game corresponds directly to the confusion matrix for the classifier, and the game utilities a, b, c, d correspond to the machine learning yields YTP, YFP, YFN, YTN, respectively (Table 1). In our definition of the skew ratio, the uncertainty in the values of a, b, c and d is equivalent to the uncertainty in the values of qP and qN4. Thus, we can study the case a=d=1 and b=c=0 without loss of generality4. Classifier Utility within the game then reduces to the Accuracy or fraction of correct predictions2–4. In sum, maximizing the Utility of a binary classifier in a zero-sum game against nature is equivalent to maximizing its Accuracy, a common threshold-dependent performance measure.
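The reduction of Utility to Accuracy is easy to verify numerically; a sketch under the Utility expression given above (the `utility` helper is ours):

```python
def utility(q_p, tpr, fpr, a=1.0, b=0.0, c=0.0, d=1.0):
    """Expected classifier utility against nature's mixed strategy (qP, 1 - qP)."""
    q_n = 1 - q_p
    return q_p * (a * tpr + c * (1 - tpr)) + q_n * (b * fpr + d * (1 - fpr))

# With a = d = 1 and b = c = 0, Utility equals Accuracy = qP*TPR + qN*(1 - FPR):
accuracy = 0.3 * 0.8 + 0.7 * (1 - 0.2)
assert abs(utility(q_p=0.3, tpr=0.8, fpr=0.2) - accuracy) < 1e-12
```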
We can now use the minimax principle from game theory6 to choose the operating point for the classifier. This principle maximizes utility for a player within a game using a pessimistic approach. For each possible action a player can take, we calculate a worst-case utility by assuming that the other player will take the action that gives them the highest utility (and the player of interest the lowest). The player of interest should take the action that maximizes this minimal, worst-case utility. Thus, the minimax utility of a player is the largest value that the player can be sure to get regardless of the actions of the other player. In our case, nature is not a conscious player that chooses the action that gives them the highest utility. We instead understand our application of the minimax principle as the consideration of a worst-case scenario for the skew ratio.
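For a payoff matrix over pure strategies, the principle amounts to taking the maximum of the row minima; a toy sketch (the `minimax_action` name and the payoff values are hypothetical):

```python
def minimax_action(payoffs):
    """Worst-case-optimal pure strategy for the row player.

    payoffs[i][j] is the row player's utility when playing row i against
    column j; the row whose worst column value is largest is chosen.
    """
    worst = [min(row) for row in payoffs]                 # worst case per action
    best = max(range(len(worst)), key=worst.__getitem__)  # best worst case
    return best, worst[best]                              # action, guaranteed utility

# Toy payoffs: row 1 guarantees 0.4 no matter what the opponent does.
print(minimax_action([[0.9, 0.1], [0.6, 0.4]]))  # (1, 0.4)
```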
In our classifier versus nature game, Utility/Accuracy of the classifier is skew-sensitive, depending on qP for a given threshold3,4:

Utility = Accuracy = qP·TPR + (1 − qP)·(1 − FPR)
The derivative of the Utility with respect to qP is zero along the TPR = 1 − FPR line in ROC space (Figure 1, dashed line). The derivative is negative below this line and positive above it, indicating that points along this line are minima of the Utility function with respect to the strategy qP of the nature player. According to the minimax principle, the classifier player should operate at the point along the TPR = 1 − FPR line that maximizes Utility. In ROC space, this condition corresponds to the intersection between the ROC curve and the descending diagonal (Figure 1, empty circle) and yields a minimax value of 1 − FPR for the Utility. It is worth noting that this analysis regarding class skew is also valid for yield/cost skew4.
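On a computed ROC curve, the minimax operating point can therefore be found as the point nearest the descending diagonal; a sketch (the helper name is ours, and it composes with the `roc_points` sketch above):

```python
import numpy as np

def minimax_operating_point(fpr, tpr):
    """ROC point closest to the descending diagonal TPR = 1 - FPR.

    On that diagonal the Utility/Accuracy no longer depends on qP, and the
    guaranteed worst-case Utility equals 1 - FPR (= TPR).
    """
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    i = int(np.argmin(np.abs(tpr + fpr - 1)))  # distance to the diagonal
    return fpr[i], tpr[i], 1 - fpr[i]          # operating point, minimax Utility
```

Given scored examples, `fpr, tpr = roc_points(scores, labels)` followed by `minimax_operating_point(fpr, tpr)` returns the recommended operating point together with its guaranteed Utility; the chosen index also identifies the corresponding score threshold if the scores are carried along.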
We showed that binary classifiers may be analyzed in terms of game theory. From the minimax principle, we propose a criterion for choosing an operating point that maximizes the robustness of the classifier against uncertainties in the skew ratio, that is, in the prevalence of positives in nature and in the yields/costs for true positives, true negatives, false positives and false negatives. This can be of practical value, since these uncertainties are widespread in bioinformatics and clinical applications. However, it should be noted that this strategy assumes that a score optimized for a restricted training set is of general validity.
In machine learning theory, TPR = 1 − FPR is the line of skew-indifference for Accuracy as a performance metric4. This is in agreement with the skew-indifference condition imposed by the minimax principle from game theory. However, to our knowledge, skew-indifference has not been exploited for optimal threshold estimation. Furthermore, the operating point of a classifier is often chosen by balancing sensitivity and specificity, without reference to the rationale behind this practice7. Our game theory analysis shows that this empirical practice can be understood as a maximization of classifier robustness.