I have a binary classification problem that I am trying to solve with sklearn's LogisticRegression. I am aware that predict_proba() only returns an approximation of the "real" probability and can be somewhat miscalibrated. However, after reading some threads, e.g. here, I was wondering whether I would violate any assumptions of logistic regression by customizing the decision threshold.
In the end, my classification problem allows mistakes in one class but preferably not in the other, i.e. I want to maximize recall for the "important" class. Shifting the decision boundary in favor of one class, or even using asymmetric decision boundaries, seems like a very intuitive solution. Or, to put it differently: only treat a prediction as "correct" if its probability is > 0.75, and otherwise make no prediction at all (see the sketch below). Also, is there a known keyword for this in the ML world?
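To make this concrete, here is a rough sketch of what I have in mind. The 0.75 threshold, the ABSTAIN label, and the toy data are just placeholders of mine, not anything built into sklearn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data just to have a fitted model; in my case X, y come from my own problem.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = LogisticRegression().fit(X, y)

# Probability of the "important" class (here assumed to be class 1).
proba = clf.predict_proba(X)[:, 1]

THRESHOLD = 0.75   # my custom, asymmetric cut-off instead of the default 0.5
ABSTAIN = -1       # placeholder label meaning "no prediction"

# Predict 1 only when the model is at least 75% confident in class 1,
# predict 0 only when it is at least 75% confident in class 0,
# and abstain otherwise.
pred = np.full_like(y, ABSTAIN)
pred[proba >= THRESHOLD] = 1
pred[proba <= 1 - THRESHOLD] = 0
```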
Edit:
I should add that the naive solution of classifying everything into one class is not applicable (: