Newest 'calibration' Questions

2 votes

0 answers

68 views

How can I best evaluate whether uncertainties (confidence/credible) are of appropriate width in simulations?

I would like to evaluate how well two experimental designs perform with the goal of parameter estimation. I'm generating 1000 simulated datasets for each design and fitting the same model to all of ...

mkt

21.2k

asked Oct 22 at 12:26

2 votes

1 answer

34 views

Tune a Model on Calibrated or Uncalibrated Probabilities?

I report model performance using log loss on calibrated probabilities, where calibration is temperature scaling fitted on train-only out-of-fold (OOF) predictions. For hyperparameter tuning, should ...

randomstate42

23

asked Oct 21 at 12:23

5 votes

2 answers

95 views

Paradoxical effect of right censoring on D-calibration in survival analysis

I am examining the effect of (right-)censoring on the D-calibration of survival data sets. The data sets are completely synthetic, generated by the R package coxed. I use my own code (not the package's ...

statistischegent

55

asked Sep 27 at 17:14

3 votes

0 answers

93 views

Running the Breusch-Pagan test manually in R assuming a weighted linear regression

I am trying to run the Breusch-Pagan test manually in RStudio from a weighted linear model (wi = 1/x^2). I need help verifying whether the following rationale is correct: What I did: WLS and residuals ...

finattisaka

31

asked Sep 26 at 1:30

11 votes

3 answers

731 views

How can calibration plots for my model's predictions look good while the standard metrics (ROC AUC, F-score, etc.) look poor?

Background I trained an XGBoost model to predict a dichotomous outcome, which has a base rate of about 55% in the overall sample. This model will not be used to classify, however: It will be used ...

Mark White

11.7k

asked Sep 10 at 14:10

0 votes

0 answers

27 views

Calibration in Cox regression

I am running a Cox regression model to examine the predictive performance of a model using one variable (a risk assessment score) to predict days to crime. As part of the modeling, we are examining ...

Will L Xu

1

asked Aug 27 at 4:13

1 vote

0 answers

94 views

Raking survey weights of specific subgroups

The Question: what are the best practices for raking the survey weights of a specific subsample while leaving the remaining sample (relatively) unchanged for later comparative analyses? The Situation: ...

Peter T

11

asked Jul 20 at 19:53

2 votes

0 answers

114 views

Logistic model giving mixed results

I am currently creating two logistic regression models (one with forward selection and one with LASSO) using R to predict whether a patient has a malignant or benign breast cancer from this dataset: ...

Leo_Miche

43

asked Jul 16 at 14:16

1 vote

1 answer

94 views

A correct approach to validate/correct readings from similar sensors?

I am looking to apply a calibration/correction approach to a set of sensors, and I want to know whether the approach I am going to use is statistically correct and acceptable. I am using a set of ...

Milad

157

asked Jul 14 at 11:41

1 vote

0 answers

76 views

Calibration with all data: data-poor scenarios [closed]

I’m working on species distribution modeling with binary data (presence / absence, 1 / 0). My target species is extremely rare (prevalence ~0.014), so my dataset is almost all zeros and just a handful ...

LolaRT96

19

asked Jul 10 at 9:05

2 votes

1 answer

123 views

Calibration of score derived from model

I am looking to externally validate the calibration of a model someone else developed using a different dataset. The original model was developed by using linear regression, and then the weights were ...

Pink Flamingos

31

asked Apr 30 at 23:05

0 votes

0 answers

81 views

Interpreting Multicollinear Models with SHAP: Challenges with XGBoost and Isotonic Regression

I am familiar with SHAP and often use it when developing or assessing ML models. I want to use SHAP in a new context. I'm working on a project that relies on an XGBoost Classifier, which outputs ...

odd

1

asked Apr 8 at 13:27

1 vote

1 answer

59 views

How to obtain the same AUC using isotonic regression in R?

I am trying to calibrate the predicted probabilities using isotonic regression for a binary outcome model in R. I know that calibrating probabilities should not change the AUC. But the following R ...

Phoebe

163

asked Mar 17 at 16:33

1 vote

0 answers

42 views

Calibrated Classifier on Training Data [closed]

If I am using GridSearchCV to find hyperparameters on a training set and then run CalibratedClassifierCV to tune my probabilities, would it suffice to fit the CalibratedClassifierCV with ...

user54565

89

asked Mar 17 at 4:37

6 votes

1 answer

1k views

Why Isotonic Regression for Model Calibration?

It appears that isotonic regression is a popular method to calibrate models. I understand that isotonic guarantees a monotonically increasing or decreasing fit. However, if you can get a smoother fit, ...

SAS2Python

178

asked Jan 27 at 16:16

4 votes

3 answers

517 views

How to train a model with a small ECE (expected calibration error)?

If we train a deep learning model with cross-entropy loss, we expect the model to have a low cross-entropy loss. Is there any way to train the model so that it gets a small expected calibration error,...

Bayesian Hat

171

asked Jan 20 at 19:19

0 votes

0 answers

90 views

How to tune hyperparameters for low calibration error on a small dataset

I'm studying which variant of variational autoencoders (VAE) gives better expected calibration error (ECE) (see also this doc) on a small dataset. According to Google's tuning playbook, to compare ...

Kaiwen

307

asked Dec 10, 2024 at 10:36

2 votes

1 answer

505 views

Calibration plot in survival analysis

I'm trying to evaluate my parametric proportional hazards and accelerated failure models in terms of calibration. There seem to be many ways to summarise calibration, but they give a single summary ...

Wojty

165

asked Aug 30, 2024 at 15:23

7 votes

2 answers

174 views

What does calibration mean when the outcome is not categorical?

In a situation where a binary variable is of interest and we want to predict the probability of either event (dog vs cat, say), it is common to talk about the calibration of the predictions, if the ...

Dave

72.9k

asked Aug 27, 2024 at 20:54

5 votes

3 answers

617 views

Welch t-test p-values are poorly calibrated for $N=2$ samples

I am performing a large number of Welch's t-tests (t-test with unequal variance) on very small sample sizes, often with only two samples per condition. I am finding the p-values are poorly calibrated: ...

emarti

151

asked Jul 17, 2024 at 18:56

0 votes

0 answers

118 views

rms::val.surv function estimated the same survival probability for all cases

I have fit a Cox regression model and used the val.surv function to draw a calibration plot comparing observed survival probability with predicted survival probability. ...

Xixuan Zhu

1

asked Jul 16, 2024 at 10:21

6 votes

1 answer

251 views

Why would `pROC::roc` calculate $\max\{AUC, 1 - AUC\}$ by default?

There is some interesting behavior in the pROC::roc function in R. ...

Dave

72.9k

asked Jul 1, 2024 at 17:47

1 vote

1 answer

91 views

Rank the sensitivity among multiple calibration curves

Let's say that, with a measurement device, we have a linear relationship between an output measurement I (in mV) and the concentration ...

Basj

632

asked May 27, 2024 at 12:44

7 votes

1 answer

630 views

What is happening behind the scenes when we use CalibratedClassifierCV without prefit?

From what I understood by reading sklearn Probability Calibration, when we run CalibratedClassifierCV we will fit "a regressor (called a calibrator) that maps the output of the classifier (as ...

andy mot

73

asked May 6, 2024 at 12:37

1 vote

0 answers

446 views

Calibrating CatBoostClassifier produces worse results

I'm performing multiclass probability prediction using CatBoostClassifier on a dataset with ~4000 rows, 13 features, 4 target classes. Dataset has outliers, but it is balanced. For this task I'm using ...

primadonna

43

asked Mar 29, 2024 at 16:14

7 votes

1 answer

170 views

What distribution assumptions do Gupta, Podkopaev & Ramdas (NeurIPS 2020) think could be made?

A 2020 NeurIPS paper by Gupta, Podkopaev & Ramdas addresses the calibration of outputs to binary “classification” models, admitting that the raw scores, despite perhaps being on $\left[0, 1\right]...

Dave

72.9k

asked Mar 12, 2024 at 23:08

1 vote

1 answer

81 views

Y axis of calibration plot: incidence per X vs percentage at risk

I am considering showing how mis-calibrated a Cox proportional hazards model is by plotting the 10th percentiles of risk on the x axis vs the incidence per 100,000. For each bin in x I could plot data ...

brucezepplin

219

asked Mar 8, 2024 at 1:18

2 votes

1 answer

445 views

How to evaluate multi-class classifier on probability prediction task?

I have a balanced dataset where each object (song) has one of the four target class labels (mood of a song). Example: ID feature1 feature2 feature3 target_class 0 0.5 0.11 125 upbeat 1 0.23 0.75 136 ...

primadonna

43

asked Mar 7, 2024 at 19:57

3 votes

2 answers

258 views

What is a scoring rule for binary classification that is not dependent on the "difficulty" of classification?

Consider a model that predicts the probability of some binary event $Y$ (potentially given some features $X$). Denote the estimated probability of $Y$ occurring as $\hat{p}$. One possible choice for a ...

ischmidt20

560

asked Feb 6, 2024 at 21:51

0 votes

1 answer

207 views

Calibration in the large for continuous outcomes

I'm a bit confused about calibration in the large. I usually see it discussed in the context of binary outcomes, but am I correct in thinking it can also be valuable as a part of external validation ...

JeffR

1

asked Feb 3, 2024 at 22:01

3 votes

1 answer

268 views

How can model overconfidence coincide with accurate classifications?

Guo et al. (ICML 2017) state the following. During training, after the model is able to correctly classify (almost) all training samples, NLL can be further minimized by increasing the confidence of ...

Dave

72.9k

asked Jan 27, 2024 at 1:44

2 votes

0 answers

85 views

How should I combine “typical” and “yesterday” self-reports?

I have inherited a long-running survey with two measures of individual behavior. Edits: clarifying that this is not about drinking behavior; it’s not, and I only used that to try and illustrate. It ...

dholstius

101

asked Jan 25, 2024 at 2:34

1 vote

1 answer

101 views

Hosmer-Lemeshow Calibration error when predicted probabilities are clustered

I have extracted predicted probabilities (logistic model) from a graph according to the nine classes of a certain variable (I don't own the model). I need to compare the predicted probabilities, that ...

vixxovs

45

asked Jan 21, 2024 at 18:21

1 vote

0 answers

332 views

LightGBM Regressor miscalibrated/underestimating on high fitted values and overestimating on low fitted values

I'm training a pretty standard LightGBM regressor and noticing a strange pattern with the residuals (see images below--I'm bunching the predicted values and taking the observed average for the group). ...

dfried

201

asked Dec 13, 2023 at 22:59

4 votes

0 answers

169 views

Can the calibration-discrimination decomposition of Brier score be viewed as the bias-variance decomposition of mean squared error?

The mean squared error has a famous decomposition into bias and variance. $$ \text{MSE} = \text{bias}^2 + \text{var} $$ Brier score is also a mean squared error calculation, and Brier score has a ...

Dave

72.9k

asked Nov 14, 2023 at 18:35

1 vote

0 answers

83 views

How to quantify the quality of a graphed calibration curve?

In his blog article "Is Medicine Mesmerized by Machine Learning?", Frank Harrell shows a calibration curve (below) and states that it is quite poor. I follow the logic: the claimed probability of $0.20$...

Dave

72.9k

asked Nov 10, 2023 at 3:50

1 vote

0 answers

104 views

Temperature scaling a Bayesian neural network?

I am trying to calibrate a Bayesian neural network. I have already approximated the posterior density for its weights. In order to make predictions the Bayesian way, I am taking samples from the ...

Randomdude

11

asked Nov 7, 2023 at 18:23

1 vote

1 answer

56 views

How to diagonalise when there are fewer parameters to estimate than data in the Levenberg-Marquardt algorithm

I am trying to calibrate a Heston Model with 100 call options using this paper https://arxiv.org/pdf/1511.08718.pdf. In algorithm 4.1 on page 18, they define the dampening factor as: $$\mu_0 = \omega \...

THATS MY QUANT MY QUANTITATIVE

131

asked Nov 1, 2023 at 8:03

0 votes

0 answers

61 views

Comparing proxy metrics to human evaluations

I have two proxy metrics, and I'd like to see which of them correlates more strongly with human ratings. I have ~30 questions, and for each question 3 humans independently give a score on a 1-10 scale....

augray

101

asked Oct 24, 2023 at 1:01

2 votes

1 answer

317 views

XGBoost Calibration for weighted loss function

I am currently using XGBoost (in R) to perform multiclass classification. I am using merror=eval_metric and my objective is <...

HeyCool08

125

asked Oct 16, 2023 at 15:56

2 votes

1 answer

203 views

Do uncalibrated "probability" predictions satisfy Kolmogorov's axioms?

Let's say we have some binary variable of interest and fit a model to predict the probability of the two classes, say a logistic regression or a "classification" neural network. This model ...

Dave

72.9k

asked Sep 12, 2023 at 20:22

1 vote

0 answers

240 views

Is perfect isotonic probability calibration realistic?

I work with a labelled tabular dataset of about 1 million observations, with the target being binary. The dataset is heavily imbalanced - about 0.5% positive class. I have trained a gradient boosting ...

StrLdn

11

asked Sep 6, 2023 at 20:28

5 votes

0 answers

265 views

Understanding a calibration plot for LightGBM binary classifier

I wanted to assess the performance of my LightGBM classifier using a calibration plot. If I understood correctly, a calibration plot visualizes the alignment between the predicted probabilities by the ...

Programming Noob

763

asked Aug 27, 2023 at 19:00

1 vote

0 answers

111 views

Assessing uncertainty calibration in regression using the CDF

I have a labelled data set with $n$ data points $(x_i, y_i)$ with $x_i \in \mathbb{R}^k$ and $y_i \in \mathbb{R}$ and I trained a model $f: \mathbb{R}^k \to \mathbb{R} \times \mathbb{R}^+$ on some of ...

PascalIv

921

asked Aug 11, 2023 at 10:23

1 vote

0 answers

62 views

How to get standard errors from a constrained optimization problem in R?

Can I get standard errors from a constrained optimization problem in R? I have calculated transition probabilities. Now I am trying to calibrate them. Using these transition probabilities I have ...

Md. Zubab Ibne Moid

11

asked Jul 31, 2023 at 19:29

0 votes

1 answer

262 views

Model performance with multiply imputed data

I would like to know how to produce a calibration plot with the Hosmer-Lemeshow test and the AUC of the ROC curve after multiple imputation in R. I built one prediction model and tried to assess model performance but ...

Haruka Hayashi

3

asked Jun 28, 2023 at 13:27

1 vote

0 answers

204 views

Model calibration in overfitted models

In the context of shrinkage, why does an overfitted prediction model tend to overestimate risk for "high risk" subjects and underestimate risk for "low risk" subjects? Intuitively I ...

vixxovs

45

asked Jun 18, 2023 at 13:40

1 vote

1 answer

305 views

Optimizing a threshold value on a dependent metric using a classifier trained to optimize a threshold-independent metric

Is it a reasonable approach to train a probabilistic classifier by optimizing a threshold-independent metric such as AUC, and then use the trained classifier to calibrate the decision threshold ...

Amit S

77

asked Jun 16, 2023 at 0:16

0 votes

0 answers

115 views

Calibration plot without binning predictions

Similar to this question, I would like to know how to create a calibration curve without binning my predictions. What makes my situation different is that I'm using icenReg for my interval-censored ...

Wojty

165

asked Jun 8, 2023 at 15:03

1 vote

1 answer

148 views

How does someone achieve a desired confidence / accuracy when measuring with an uncalibrated instrument?

I have an instrument that measures a value. It is only possible to measure the value once i.e. the experiment can't be repeated (think recording a car's speed as it drives past). The instrument is not ...

Chuck

91

asked Jun 7, 2023 at 19:30
