Questions tagged [predictive-models]
Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.
67 questions from the last 365 days
0
votes
0
answers
26
views
What data mining freeware is available that replicates SAS EMiner's interactive Decision Tree node?
It's 2025, and yes, I'm still using SAS EMiner's Decision Tree. If anyone knows a modern freeware version that replicates the Interactive mode effectively (with controlling split cutoff values, a ...
0
votes
0
answers
47
views
Interpreting the predicted values from family = poisson(link="log"), binary outcome
I am fitting a simple model for a dataset where the outcome is binary (1 or 0).
...
1
vote
0
answers
34
views
Confidence threshold for random forest type = "prob" new data
I have a nice multiclass random forest model in R (using the packages ranger and caret) but I think this question applies to any random forest logic.
When I use my RF to label unknown data I want to ...
4
votes
2
answers
374
views
Is it okay in prediction problems to put post-outcome features in the model?
I am relatively new to machine learning. I see many examples of practices where people include variables that are only available after the outcome variable (Y) to make predictions.
An example of this ...
1
vote
0
answers
18
views
How to use a hierarchical Bayesian model to combine regional and country-level data for TPES projections?
I’m trying to project TPES (Total Primary Energy Supply) by country in Africa up to the year 2100 under different SSP (Shared Socioeconomic Pathways) scenarios, the same framework used in the latest ...
2
votes
1
answer
101
views
Validating a new metric using two-period panel data
Suppose I have two metrics, x and y. I have measures for a few dozen units on both metrics, at time 1 and at time 2.
I want to validate metric y, so that future users can use it as a substitute for ...
0
votes
1
answer
118
views
How does the math of predict work for `lm` with `poly`?
I understand orthogonal polynomials (perhaps not the discrete ones?) but I don't understand how predict exactly handles polynomials with different number of data points i.e. different x-values and ...
7
votes
1
answer
167
views
Statistical modeling with only a single data point
The vast majority of statistical literature involves having a dataset which can be partitioned into $n$ data points, $\mathbf{x} = \{x_1,...,x_n\}$ constructing a model for the individual data ...
11
votes
3
answers
731
views
How can calibration plots for my model's predictions look good while the standard metrics (ROC AUC, F-score, etc.) look poor?
Background
I trained an XGBoost model to predict a dichotomous outcome, which has a base rate of about 55% in the overall sample. This model will not be used to classify, however: It will be used ...
0
votes
0
answers
42
views
Predicting occurrence of event following observation of string
I want to model the probability of an event occurring, given that a string has occurred. Or, in other words, predict which event is more likely to happen, given that the string was observed.
These are ...
1
vote
1
answer
68
views
Predicting global outcomes with logistic model
I have a database of many employees, and I want to estimate how many are going to retire next year, based on how many retired last year. So I thought about a logistic model like
glm(retire ~ age2025 + ...
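One way to read the aggregate question: under a logistic model, the expected number of retirements is the sum of the individual predicted probabilities (linearity of expectation). The following is a minimal Python sketch of that arithmetic, not the asker's R code; the function name `expected_count` and the log-odds values are hypothetical and stand in for whatever a fitted model would produce.

```python
import math

def expected_count(linear_predictors):
    """Sum the inverse-logit probabilities to get the expected number of events."""
    probabilities = [1.0 / (1.0 + math.exp(-eta)) for eta in linear_predictors]
    return sum(probabilities)

# Hypothetical log-odds for four employees (illustration only, not real data).
# They correspond to probabilities 0.5, 0.5, 0.75 and 0.25.
log_odds = [0.0, 0.0, math.log(3), -math.log(3)]
expected_retirements = expected_count(log_odds)
```

Summing probabilities gives an expected total without thresholding each employee into "retires" or "stays", which would discard information.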
1
vote
1
answer
82
views
Recall in AncestryDNA white paper
I was reading through the company white paper for AncestryDNA, which gives DNA ancestry estimates to individuals who are willing to send them a saliva sample.
In their 2024 white paper they list the ...
7
votes
2
answers
456
views
Confidence intervals for predictions in ggeffects are outside the possible range of probabilities
I ran this lognormal hurdle GLMM using the R package glmmTMB:
...
0
votes
0
answers
89
views
Can I use confusion matrix for prediction?
TL;DR: a confusion matrix is used to validate a model. But I also want to make predictions using my models. Can I use the confusion matrix to make predictions? I don't see any other way to do it, but I ...
2
votes
0
answers
137
views
Prediction for GLMM (correcting for bias due to Jensen's inequality?)
I am trying to decide on the best method for producing model predictions (for graphing) from my generalized linear mixed effects model. I am interested in getting marginal predictions (i.e., what the ...
2
votes
1
answer
83
views
Quantile-Based Analysis for Predictive Power Study
I’m currently conducting a statistical study to evaluate whether a given factor has predictive power over another variable—such as future returns. As part of this, I’ve been analyzing the mean and ...
1
vote
1
answer
73
views
Suitable metric to compare between two counts for H3 data
I have a set of H3 hexagons (spatial clustering of data) with counts for each hex over 2024 and 2025. I want to plot the relative change in counts for each hex, but my current method is unacceptable: ...
1
vote
1
answer
102
views
Predicting cyclical time series with non uniform sampled data
I want to predict with R the next month's consumption (methane gas) with fair confidence (let's say 80%), based on:
the historical data
on the last month consumption
...
2
votes
1
answer
139
views
Forecasting supermarket prices using survival analysis
I need some help/feedback on an approach for my bachelor’s thesis.
I'm pretty new to this field, so I'm keen to learn!
The general topic is that I want to forecast discounts in the supermarket to help ...
0
votes
0
answers
45
views
Prediction of optimum variables through XGBoost
I have a large dataset of soil moisture data (satellite) and water table depths (measurements).
I would like to derive the optimum soil moisture levels to predict the water table depths most ...
0
votes
0
answers
65
views
A simple-ish way of estimating the number of modes, and the 'pronounced'-ness of said modes of a discrete, finite distribution
Intuitively, let's say we're given a price $p$ for some product, and we want to compare the prices with what's available on the market (ex: to determine if we're being ripped off or not).
We come back ...
0
votes
0
answers
73
views
Multivariable linear regression model with continuous predictors with a spike at 0
I want to build a prediction model of a continuous outcome Y. I have ~50 predictors that are count variables (number of hospitalizations by cause, number of drugs dispensed by type of drug). I was ...
3
votes
1
answer
119
views
How can I estimate individual-level linear model predictions from a latent class mixed model using the lcmm package in R?
I am trying to complete the following statistical analyses using lcmm package in R, using longitudinal data with repeated survey question responses from the same people over time:
Model the repeated ...
0
votes
0
answers
46
views
Reclassifying transport mode choices to binary in random forest
I am building a model to predict mode choice, with a primary focus on cycling. Multinomial logistic regression fails to predict cycling well, so I choose to use random forest instead, with promising ...
0
votes
0
answers
79
views
Variable selection: Explanatory model with very low sample size
I am currently doing research to find the relationship between the quality of wastewater (e.g. biochemical oxygen demand, amount of nitrogen...) and regional characteristics of that ...
0
votes
0
answers
49
views
One-Step Ahead Forecasting with TensorFlow Structural Time Series
I have the following situation: I’m given a univariate time-series dataset $y$ that I wish to model using feature variables $X$, which are provided alongside $y$. Naturally, I split the data into a ...
5
votes
1
answer
191
views
Combining vs. Separating Predictors: What’s Better for Prediction
I'm using two independent predictors, A and B (Pearson correlation = 0), both standardized to the same scale, to predict a binary disease outcome using logistic regression.
I'm comparing two modeling ...
2
votes
2
answers
275
views
Standard error of the root mean squared prediction error (RMSE) and its use in simulation studies of prediction models
In the context of statistical prediction models, one is often interested in the predictive accuracy of the model. A common model choice is the root mean squared error (RMSE), which is also called ...
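For reference, the RMSE named in this excerpt is the square root of the mean squared difference between observed and predicted values. A minimal Python sketch follows; the function name `rmse` and the example numbers are hypothetical, and it does not address the standard-error question itself.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical observed and predicted values (illustration only).
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
example_rmse = rmse(observed, predicted)  # sqrt((1 + 0 + 4) / 3)
```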
0
votes
0
answers
59
views
Standard Error of fitted value at breakpoint (segmented regression)
I am currently using the "segmented.lm" function to detect a change point in my data. At this stage I am trying to figure out how to derive the SE of the y value of the corresponding change ...
1
vote
1
answer
105
views
mgcv gam prediction model for deployment - what to do with terms shrunk out of the model?
I am developing a gam prediction model in the mgcv R package and turned on extra shrinkage using the select = TRUE argument. As I understand it, smooths that shrink "very small" are ...
4
votes
1
answer
199
views
Shrinkage in logistic regression prediction model: can we "remove" a predictor whose coefficient has shrunk to almost zero?
Suppose one is fitting a logistic regression to develop a clinical prediction model. In an effort to avoid overfitting, regularization is used (e.g. ridge, penalized maximum likelihood) where ...
1
vote
1
answer
98
views
Interpreting odds ratios greater than 1, predicted odds less than 1
I am fitting an interrupted time series model to analyze a binary outcome: whether a woman reported feeding the child solid food within the first six months of birth (Yes/No).
The main exposure is ...
1
vote
0
answers
53
views
How do I go about refining my ARX model in R
I face a few issues when I'm trying to predict my dependent variable Y. I have 6 different independent external variables with one of them being lag(1) of the dependent variable Y. I differenced all ...
1
vote
1
answer
292
views
SHAP values across different groups
I developed and compared four ML models via Random Forest, Support Vector Machine, Logistic Regression, and XGBoost (tidymodels R package) algorithms using data without stratification by age groups. ...
1
vote
0
answers
76
views
Why minimise Calibration Error rather than MSE? Context: LLM Hallucination [closed]
In the discussion of Large Language Model hallucination phenomenon, people are interested in measuring and reducing the calibration error of the model predictions. However, what makes this situation ...
1
vote
0
answers
97
views
How would you show that $\text{cov}(ε, \hat{y})=0$? [closed]
I’m working on proving the distribution of the prediction error in the OLS model, but I get stuck when trying to compute the variance because after having calculated the variance of $\hat{y}$, I get ...
10
votes
2
answers
380
views
Misgivings about the notion that AUC is an incoherent model comparison method
An influential 2009 paper, Measuring classifier performance: A coherent alternative to the area under the ROC curve, argues that the Area Under the Curve (AUC) "is fundamentally incoherent in ...
0
votes
0
answers
29
views
smoothing parameters
I am doing a case study and I need to forecast the sales. I am using multiple linear regression and also Winters' method and the decomposition approach with Holt's method. I am using these methods as ...
0
votes
0
answers
61
views
Model prediction is more accurate with substituted left-censored data than with imputed
I have a set of environmental variables that are left-censored (measurements of elements in my samples). I have two datasets, one dataset with samples with known origins and one dataset with samples ...
1
vote
0
answers
45
views
Distribution of response in simple linear regression with normal errors
Suppose I estimate
$$ Y_t = \alpha + \beta \times X_t + \varepsilon_t$$
via OLS, where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is independent across observations. It is a standard result that, ...
0
votes
0
answers
64
views
Low fitted values in BAM models causing peculiar predict plots
I have run three models, one per season, on a dataset of animal points, using soap smoothers on a lake.
...
0
votes
0
answers
47
views
Pipeline for a Bayesian network algorithm
I'd like to build a Bayesian network that allows me to predict the most effective treatment sequence for a given treatment.
In the simplest scenario I would have 2 treatments across 2 ...
2
votes
0
answers
83
views
Calibration of prediction model
I would be grateful for advice, hints, or reference on a question about predicting population level rates of in-hospital complications in older people.
I have routine hospital data and a high-quality ...
3
votes
2
answers
171
views
Purely theoretical measure of predictability
(Let's set aside how we might estimate this.)
I envision a setup where we have some space $\mathcal X$ of features and $\mathcal Y$ of outcomes, with each random variable $X_i\in\mathcal X$ ...
0
votes
0
answers
74
views
Prediction from GAMM-ZINB model does not match the original dependent variable
I have a dataset that includes one dependent variable, of which 47.2% of the values are zero, and 14 independent variables (1 numeric and 13 categorical). After testing for zero inflation using ...
5
votes
1
answer
125
views
Error of product of predictions
Suppose that we have two regression models $A$ and $B$ that predict values $\hat{a}$ and $\hat{b}$ but that we ultimately are interested in their product $\hat{a}\hat{b}$ (for instance, we may be ...
0
votes
1
answer
98
views
Does it make sense that a predictive model shows better discrimination ability and worse calibration than a less flexible one?
I have a data set with several thousand observations for both training set and test set, and I have defined two models (with the same covariates):
A Cox model
A Cox model with natural splines
which ...
1
vote
0
answers
78
views
Survival model predictions with counting process data
I am trying to get my head around the utility of prediction - either with a Cox or a parametric survival model - when your dataset contains more than one row/person (i.e. when a time-varying covariate is ...
0
votes
0
answers
95
views
How to understand unexpected predictions from random forest in caret?
Using random forest in R to classify a small data set, 152 rows with 17 predictors. I can get through most of the steps I've seen in different tutorials without much trouble, but when I use ...
0
votes
0
answers
51
views
Predictive modeling on biased features
Some features I want to use for modeling have distributions like below:
There are high values of the features occurring frequently in my data. I can identify a subset of my data points that cause ...