Questions tagged [calibration]
Calibration can refer to adjusting measurements to agree with the value of some standard, to transforming classifier scores into class membership probabilities, etc. Do not use for predicting an explanatory variable from an observation of the dependent variable; for that, use the tag inverse-prediction.
341 questions
2
votes
0
answers
68
views
How can I best evaluate whether uncertainties (confidence/credible) are of appropriate width, in simulations?
I would like to evaluate how well two experimental designs perform with the goal of parameter estimation. I'm generating 1000 simulated datasets for each design and fitting the same model to all of ...
2
votes
1
answer
34
views
Tune a Model on Calibrated or Uncalibrated Probabilities?
I report model performance using log loss on calibrated probabilities, where calibration is temperature scaling fitted on train-only out-of-fold (OOF) predictions.
For hyperparameter tuning, should ...
5
votes
2
answers
95
views
Paradoxical effect of right censoring on D-calibration in survival analysis
I am examining the effect of (right-)censoring on the D-calibration of survival data sets. The data sets are completely synthetic, generated by R package coxed. I use my own code (not the package's ...
3
votes
0
answers
93
views
Running the Breusch-Pagan test manually in R assuming a weighted linear regression
I am trying to run the Breusch-Pagan test manually in RStudio from a weighted linear model (wi = 1/x^2). I need help verifying whether the following rationale is correct:
What I did:
WLS and residuals
...
11
votes
3
answers
731
views
How can calibration plots for my model's predictions look good while the standard metrics (ROC AUC, F-score, etc.) look poor?
Background
I trained an XGBoost model to predict a dichotomous outcome, which has a base rate of about 55% in the overall sample. This model will not be used to classify, however: It will be used ...
0
votes
0
answers
27
views
Calibration in Cox regression
I am running a Cox regression model to examine the predictive performance of a model using one variable (a risk assessment score) to predict days to crime. As part of the modeling, we are examining ...
1
vote
0
answers
94
views
Raking survey weights of specific subgroups
The Question: what are the best practices for raking the survey weights of a specific subsample while leaving the remaining sample (relatively) unchanged for later comparative analyses?
The Situation: ...
2
votes
0
answers
114
views
Logistic model giving mixed results
I am currently creating two logistic regression models (one with forward selection and one with LASSO) using R to predict whether a patient has a malignant or benign breast cancer from this dataset: ...
1
vote
1
answer
94
views
A correct approach to validate/correct readings from similar sensors?
I am looking to apply a calibration/correction approach to a set of sensors, and I just wanted to know whether the approach I am going to use is statistically correct and acceptable.
I am using a set of ...
1
vote
0
answers
76
views
Calibration with all data: data-poor scenarios [closed]
I’m working on species distribution modeling with binary data (presence / absence, 1 / 0). My target species is extremely rare (prevalence ~0.014), so my dataset is almost all zeros and just a handful ...
2
votes
1
answer
123
views
Calibration of score derived from model
I am looking to externally validate the calibration of a model someone else developed using a different dataset.
The original model was developed by using linear regression, and then the weights were ...
0
votes
0
answers
81
views
Interpreting Multicollinear Models with SHAP: Challenges with XGBoost and Isotonic Regression
I am familiar with SHAP and often use it when developing or assessing ML models. I want to use SHAP in a new context. I'm working on a project that relies on an XGBoost Classifier, which outputs ...
1
vote
1
answer
59
views
How to obtain the same AUC using isotonic regression in R?
I am trying to calibrate the predicted probabilities using isotonic regression for a binary outcome model in R. I know that calibrating probabilities should not change the AUC. But the following R ...
1
vote
0
answers
42
views
Calibrated Classifier on Training Data [closed]
If I am using a GridSearchCV to find hyperparameters on a training set; if I were to run a CalibratedClassifierCV to tune my probabilities, would it suffice to fit the CalibratedClassifierCV with ...
6
votes
1
answer
1k
views
Why Isotonic Regression for Model Calibration?
It appears that isotonic regression is a popular method to calibrate models. I understand that isotonic guarantees a monotonically increasing or decreasing fit.
However, if you can get a smoother fit, ...
4
votes
3
answers
517
views
How to train a model with a small ECE (expected calibration error)?
If we train a deep learning model with cross-entropy loss, we expect the model to have a low cross-entropy loss. Is there any way to train the model so that it gets a small expected calibration error,...
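For readers unfamiliar with the metric named in this question, here is a minimal sketch of one common way to compute an expected calibration error for binary predictions. It is not taken from the question: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration, and this is the binary reliability-diagram variant (mean predicted probability versus observed positive rate per bin) rather than the top-label confidence-versus-accuracy version used for multiclass networks.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions (illustrative sketch).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : observed outcomes, each 0 or 1
    n_bins : number of equal-width bins over [0, 1] (an assumed choice)
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Weight each bin's |predicted - observed| gap by its share of the data.
        ece += (len(members) / n) * abs(mean_pred - frac_pos)
    return ece
```

The result depends on the binning scheme and the number of bins, so values are only comparable between models when both are held fixed.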
0
votes
0
answers
90
views
How to tune hyperparameters for low calibration error with a small dataset
I'm studying which variant of variational autoencoders (VAE) gives better expected calibration error (ECE) (see also this doc) on a small dataset. According to Google's tuning playbook, to compare ...
2
votes
1
answer
505
views
Calibration plot in survival analysis
I'm trying to evaluate my parametric proportional hazards and accelerated failure models in terms of calibration. There seem to be many ways to summarise calibration, but they give a single summary ...
7
votes
2
answers
174
views
What does calibration mean when the outcome is not categorical?
In a situation where a binary variable is of interest and we want to predict the probability of either event (dog vs cat, say), it is common to talk about the calibration of the predictions, if the ...
5
votes
3
answers
617
views
Welch t-test p-values are poorly calibrated for $N=2$ samples
I am performing a large number of Welch's t-tests (t-test with unequal variance) on very small sample sizes, often with only two samples per condition. I am finding the p-values are poorly calibrated: ...
0
votes
0
answers
118
views
rms::val.surv function estimated the same survival probability for all cases
I have fit a Cox regression model and used the val.surv function to draw a calibration plot comparing observed survival probability with predicted survival probability.
...
6
votes
1
answer
251
views
Why would `pROC::roc` calculate $\max\{AUC, 1 - AUC\}$ by default?
There is some interesting behavior in the pROC::roc function in R.
...
1
vote
1
answer
91
views
Rank the sensitivity among multiple calibration curves
Let's say that, with a measurement device, we have a linear relationship between an output measurement I (in mV) and the concentration ...
7
votes
1
answer
630
views
What is happening behind the scenes when we use CalibratedClassifierCV without prefit?
From what I understood by reading sklearn Probability Calibration, when we run CalibratedClassifierCV we will fit "a regressor (called a calibrator) that maps the output of the classifier (as ...
1
vote
0
answers
446
views
Calibrating CatBoostClassifier produces worse results
I'm performing multiclass probability prediction using CatBoostClassifier on a dataset with ~4000 rows, 13 features, 4 target classes. Dataset has outliers, but it is balanced.
For this task I'm using ...
7
votes
1
answer
170
views
What distribution assumptions do Gupta, Podkopaev & Ramdas (NeurIPS 2020) think could be made?
A 2020 NeurIPS paper by Gupta, Podkopaev & Ramdas addresses the calibration of outputs to binary “classification” models, admitting that the raw scores, despite perhaps being on $\left[0, 1\right]...
1
vote
1
answer
81
views
Y axis of calibration plot: incidence per X vs percentage at risk
I am considering showing how miscalibrated a Cox proportional hazards model is by plotting the 10th percentiles of risk on the x axis vs the incidence per 100,000. For each bin in x I could plot data ...
2
votes
1
answer
445
views
How to evaluate multi-class classifier on probability prediction task?
I have a balanced dataset where each object (song) has one of the four target class labels (mood of a song). Example:
| ID | feature1 | feature2 | feature3 | target_class |
|----|----------|----------|----------|--------------|
| 0 | 0.5 | 0.11 | 125 | upbeat |
| 1 | 0.23 | 0.75 | 136 | ... |
3
votes
2
answers
258
views
What is a scoring rule for binary classification that is not dependent on the "difficulty" of classification?
Consider a model that predicts the probability of some binary event $Y$ (potentially given some features $X$). Denote the estimated probability of $Y$ occurring as $\hat{p}$. One possible choice for a ...
0
votes
1
answer
207
views
Calibration in the large for continuous outcomes
I'm a bit confused about calibration in the large. I usually see it discussed in the context of binary outcomes, but am I correct in thinking it can also be valuable as a part of external validation ...
3
votes
1
answer
268
views
How can model overconfidence coincide with accurate classifications?
Guo et al. (ICML 2017) state the following.
During training, after the model is able to correctly classify (almost) all training samples, NLL can be further minimized by increasing the confidence of ...
2
votes
0
answers
85
views
How should I combine “typical” and “yesterday” self-reports?
I have inherited a long-running survey with two measures of individual behavior.
Edits: clarifying that this is not about drinking behavior; it’s not, and I only used that to try and illustrate. It ...
1
vote
1
answer
101
views
Hosmer-Lemeshow Calibration error when predicted probabilities are clustered
I have extracted predicted probabilities (logistic model) from a graph according to the nine classes of a certain variable (I don't own the model).
I need to compare the predicted probabilities, that ...
1
vote
0
answers
332
views
LightGBM Regressor miscalibrated/underestimating on high fitted values and overestimating on low fitted values
I'm training a pretty standard LightGBM regressor and noticing a strange pattern with the residuals (see images below--I'm bunching the predicted values and taking the observed average for the group). ...
4
votes
0
answers
169
views
Can the calibration-discrimination decomposition of Brier score be viewed as the bias-variance decomposition of mean squared error?
The mean squared error has a famous decomposition into bias and variance.
$$
\text{MSE} = \text{bias}^2 + \text{var}
$$
Brier score is also a mean squared error calculation, and Brier score has a ...
1
vote
0
answers
83
views
How to quantify the quality of a graphed calibration curve?
In his Is Medicine Mesmerized by Machine Learning? blog article, Frank Harrell shows a calibration curve (below) and states that it is quite poor.
I follow the logic: the claimed probability of $0.20$...
1
vote
0
answers
104
views
Temperature scaling a Bayesian neural network?
I am trying to calibrate a Bayesian neural network. I have already approximated the posterior density for its weights. In order to make predictions the Bayesian way, I am taking samples from the ...
1
vote
1
answer
56
views
How to diagonalise when there are fewer parameters to estimate than data points in the Levenberg-Marquardt algorithm
I am trying to calibrate a Heston Model with 100 call options using this paper https://arxiv.org/pdf/1511.08718.pdf.
In algorithm 4.1 on page 18, they define the dampening factor as: $$\mu_0 = \omega \...
0
votes
0
answers
61
views
Comparing proxy metrics to human evaluations
I have two proxy metrics, and I'd like to see which of them correlates more strongly with human ratings. I have ~30 questions, and for each question 3 humans independently give a score on a 1-10 scale....
2
votes
1
answer
317
views
XGBoost Calibration for weighted loss function
I am currently using XGBoost (in R) to perform multiclass classification. I am using eval_metric=merror and my objective is <...
2
votes
1
answer
203
views
Do uncalibrated "probability" predictions satisfy Kolmogorov's axioms?
Let's say we have some binary variable of interest and fit a model to predict the probability of the two classes, say a logistic regression or a "classification" neural network. This model ...
1
vote
0
answers
240
views
Is perfect isotonic probability calibration realistic?
I work with a labelled tabular dataset of about 1 million observations, with the target being binary. The dataset is heavily imbalanced - about 0.5% positive class.
I have trained a gradient boosting ...
5
votes
0
answers
265
views
Understanding a calibration plot for lightGBM binary classifier
I wanted to assess the performance of my lightGBM classifier using a calibration plot. If I understood correctly, a calibration plot visualizes the alignment between the predicted probabilities by the ...
1
vote
0
answers
111
views
Assessing uncertainty calibration in regression using the CDF
I have a labelled data set with $n$ data points $(x_i, y_i)$ with $x_i \in \mathbb{R}^k$ and $y_i \in \mathbb{R}$ and I trained a model $f: \mathbb{R}^k \to \mathbb{R} \times \mathbb{R}^+$ on some of ...
1
vote
0
answers
62
views
How to get standard error from constrained optimization problem in R?
Can I get standard error from a constrained optimization problem in R? I have calculated transition probabilities. Now I am trying to calibrate it. Using these transition probabilities I have ...
0
votes
1
answer
262
views
Model performance with multiply imputed data
I would like to know how to produce a calibration plot with the Hosmer-Lemeshow test and the AUC for the ROC curve after multiple imputation in R. I built one prediction model and tried to assess model performance but ...
1
vote
0
answers
204
views
Model calibration in overfitted models
Why, in shrinkage, due to an overfitted prediction model, do we tend to overestimate risk for "high risk" subjects and underestimate risk for "low risk" subjects?
Intuitively I ...
1
vote
1
answer
305
views
Optimizing a threshold value on a dependent metric using a classifier trained to optimize a threshold-independent metric
Is it a reasonable approach to train a probabilistic classifier by optimizing a threshold-independent metric such as AUC, and then using the trained classifier to calibrate the decision threshold ...
0
votes
0
answers
115
views
Calibration plot without binning predictions
Similar to this question, I would like to know how to create a calibration curve without binning my predictions.
What makes my situation different is that I'm using icenReg for my interval-censored ...
1
vote
1
answer
148
views
How does someone achieve a desired confidence / accuracy when measuring using an uncalibrated instrument?
I have an instrument that measures a value.
It is only possible to measure the value once, i.e., the experiment can't be repeated (think recording a car's speed as it drives past).
The instrument is not ...