
I am currently building two logistic regression models in R (one with forward selection, one with LASSO) to predict whether a breast tumor is malignant or benign, using this dataset: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data . I am using repeated nested cross-validation with stratification, since my dataset is imbalanced, plus Platt calibration. When I finally evaluate the models, I get very high results in terms of accuracy, precision, Brier score, etc., but very strange calibration results:

1. DEVELOPMENT SET RESULTS (Repeated Nested CV):
----------------------------------------------------------------------

FORWARD SELECTION:
Performance Metrics:
AUC: 0.9792 ± 0.0209
Accuracy: 0.9509
Sensitivity: 0.937
Specificity: 0.9589
Brier Score: 0.0414
Calibration Metrics:
Mean Calibration Slope: 1.731
Mean Calibration Intercept: -0.4099
Proportion Well-Calibrated (HL p>0.05): 0.3696

LASSO SELECTION:
Performance Metrics:
AUC: 0.9885 ± 0.0133
Accuracy: 0.9254
Sensitivity: 0.9521
Specificity: 0.9077
Brier Score: 0.06
Calibration Metrics:
Mean Calibration Slope: 45.9989
Mean Calibration Intercept: 18.2002
Proportion Well-Calibrated (HL p>0.05): 0.64

2. HOLDOUT SET RESULTS (Unbiased Estimate):
----------------------------------------------------------------------

=== FORWARD ON HOLDOUT ===
Original Performance:
AUC: 0.997
Brier Score: 0.0217
Recalibrated Performance:
AUC: 0.9866
Brier Score: 0.0265
=== LASSO ON HOLDOUT ===
Original Performance:
AUC: 1
Brier Score: 0.0143
Recalibrated Performance:
AUC: 1
Brier Score: 0.0152

I really don't know what to do to fix the calibration, and the near-perfect accuracy seems really suspicious. Can anyone help me?
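
For reference, I take the calibration slope and intercept to be the usual logistic-recalibration quantities. A minimal R sketch of that convention, where y (the 0/1 outcome) and p_hat (the out-of-fold predicted probabilities) are hypothetical names:

    # Logistic recalibration: regress the outcome on the logit of the
    # predicted probabilities. `y` and `p_hat` are hypothetical names.
    eps <- 1e-6
    p   <- pmin(pmax(p_hat, eps), 1 - eps)   # keep qlogis() finite
    lp  <- qlogis(p)                         # logit of the predictions

    # Calibration slope: coefficient on lp; ideal value is 1.
    cal_slope <- coef(glm(y ~ lp, family = binomial))["lp"]

    # Calibration intercept: refit with lp as an offset; ideal value is 0.
    cal_int <- coef(glm(y ~ offset(lp), family = binomial))["(Intercept)"]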

• Accuracy and precision suffer from the same issues; I would trust the Brier score more if this is a pure prediction task. What exactly are you concerned about in terms of calibration? (Commented Jul 16 at 14:52)
• When you select the stepwise model or the LASSO penalty parameter, you can choose the model that gives you the best Brier score if calibration is what matters most to you (see the glmnet sketch after these comments). (Commented Jul 16 at 14:54)
• @StephanKolassa The calibration slope and intercept values are far too large for the model to be considered well-calibrated: from what I understand, the slope should be around 1 and the intercept around 0, which would mean the model neither under- nor over-estimates risk. (Commented Jul 16 at 15:39)
• Note that the calibration curve is really hard to estimate well, especially for a very accurate and confident model. Calibration is usually measured by grouping samples into predefined intervals of predicted probability (e.g. 0 to 0.1, 0.1 to 0.2, etc.), then comparing the center of each interval to the actual proportion of positives in that group; see the binning sketch after these comments. Hence, for good, confident models that predict very near 0 or 1, all intervals except the two extremes may be very sparsely populated or completely empty. (Commented Jul 16 at 16:12)
• @Leo_Miche (cont.) And since you used cross-validation, you can be reasonably confident that the results reflect reality. If you want, you can re-run cross-validation with a different random seed (assuming you shuffle the data randomly for CV) and see whether that substantially affects the results; if it does not, you can trust the model more, or at least you have ruled out one possible failure mode related to overfitting. (Commented Jul 16 at 19:32)

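And on tuning the LASSO penalty for the Brier score, as suggested in the comments: with glmnet, squared-error loss against a 0/1 outcome is the Brier score, so something along these lines should work (X and y are hypothetical names; pass a custom foldid for stratified folds):

    # Tune lambda on cross-validated Brier score. For family = "binomial",
    # type.measure = "mse" is squared error between the predicted
    # probability and the 0/1 outcome, i.e. the Brier score.
    library(glmnet)
    set.seed(1)
    cv_fit <- cv.glmnet(X, y, family = "binomial",
                        type.measure = "mse", nfolds = 10)
    cv_fit$lambda.min                            # Brier-optimal penalty
    p_hat <- predict(cv_fit, newx = X, s = "lambda.min", type = "response")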
