Questions tagged [predictive-models]

Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.

0 votes
0 answers
26 views

It's 2025, and yes, I'm still using SAS EMiner's Decision Tree... If anyone knows a modern freeware version that replicates the Interactive mode effectively (with controlling split cutoff values, a ...
Anthony Galka's user avatar
0 votes
0 answers
47 views

I am fitting a simple model for a dataset where the outcome is binary (1 or 0). ...
Eagle Hawk's user avatar
1 vote
0 answers
34 views

I have a nice multiclass random forest model in R (using the packages ranger and caret) but I think this question applies to any random forest logic. When I use my RF to label unknown data I want to ...
Dr Egg's user avatar
  • 11
4 votes
2 answers
374 views

I am relatively new to machine learning. I see many examples of practices where people include variables that are only available after the outcome variable (Y) to make predictions. An example of this ...
Abdullah Abdelaziz's user avatar
1 vote
0 answers
18 views

I’m trying to project TPES (Total Primary Energy Supply) by country in Africa up to the year 2100 under different SSP (Shared Socioeconomic Pathways) scenarios, the same framework used in the latest ...
grégoire david's user avatar
2 votes
1 answer
101 views

Suppose I have two metrics, x and y. I have measures for a few dozen units on both metrics, at time 1 and at time 2. I want to validate metric y, so that future users can use it as a substitute for ...
Clara's user avatar
  • 123
0 votes
1 answer
118 views

I understand orthogonal polynomials (perhaps not the discrete ones?) but I don't understand how predict exactly handles polynomials with different number of data points i.e. different x-values and ...
Christoph's user avatar
  • 435
7 votes
1 answer
167 views

The vast majority of statistical literature involves having a dataset which can be partitioned into $n$ data points, $\mathbf{x} = \{x_1,...,x_n\}$ constructing a model for the individual data ...
jms's user avatar
  • 121
11 votes
3 answers
731 views

Background I trained an XGBoost model to predict a dichotomous outcome, which has a base rate of about 55% in the overall sample. This model will not be used to classify, however: It will be used ...
Mark White's user avatar
  • 11.7k
0 votes
0 answers
42 views

I want to model the probability of an event occurring, given that a string has occurred. Or, in other words, predict which event is more likely to happen, given that the string was observed. These are ...
Ricardo Antunes's user avatar
1 vote
1 answer
68 views

I have a database of many employees, and I want to estimate how many are going to retire next year, based on how many retired last year. So I thought about a logistic model like glm(retire ~ age2025 + ...
FloLe's user avatar
  • 33
1 vote
1 answer
82 views

I was reading through the company white paper for AncestryDNA, which gives DNA ancestry estimates to individuals who are willing to send them a saliva sample. In their 2024 white paper they list the ...
H_1317's user avatar
  • 141
7 votes
2 answers
456 views

I ran this lognormal hurdle GLMM using the R package glmmTMB: ...
Michaela's user avatar
  • 229
0 votes
0 answers
89 views

TLDR : confusion matrix is used to validate a model. But I also want to make predictions using my models. Can I use the confusion matrix to make predictions? I don't see any other way to do it, but I ...
Siva Kg's user avatar
  • 23
2 votes
0 answers
137 views

I am trying to decide on the best method for producing model predictions (for graphing) from my generalized linear mixed effects model. I am interested in getting marginal predictions (i.e., what the ...
Stephanie Rivest's user avatar
2 votes
1 answer
83 views

I’m currently conducting a statistical study to evaluate whether a given factor has predictive power over another variable—such as future returns. As part of this, I’ve been analyzing the mean and ...
user73016's user avatar
1 vote
1 answer
73 views

I have a set of H3 hexagons (spatial clustering of data) with counts for each hex over 2024 and 2025. I want to plot the relative change in counts for each hex, but my current method is unacceptable: ...
tariqalr's user avatar
1 vote
1 answer
102 views

I want to predict with R the next month's consumption (methane gas) with fair confidence (let's say 80%), based on: the historical data on the last month's consumption ...
alex's user avatar
  • 163
2 votes
1 answer
139 views

I need some help/feedback on an approach for my bachelor’s thesis. I'm pretty new to this field, so I'm keen to learn! The general topic is that I want to forecast discounts in the supermarket to help ...
Pascal's user avatar
  • 21
0 votes
0 answers
45 views

I have a large dataset of soil moisture data (satellite) and water table depths (measurements). I would like to derive the optimum soil moisture levels to predict the water table depths most ...
Thomas's user avatar
  • 538
0 votes
0 answers
65 views

Intuitively, let's say we're given a price $p$ for some product, and we want to compare the prices with what's available on the market (ex: to determine if we're being ripped off or not). We come back ...
MergeMonster's user avatar
0 votes
0 answers
73 views

I want to build a prediction model of a continuous outcome Y. I have ~50 predictors that are count variables (number of hospitalizations by cause, number of drugs dispensed by type of drug). I was ...
Alex's user avatar
  • 301
3 votes
1 answer
119 views

I am trying to complete the following statistical analyses using lcmm package in R, using longitudinal data with repeated survey question responses from the same people over time: Model the repeated ...
Carly's user avatar
  • 33
0 votes
0 answers
46 views

I am building a model to predict mode choice, with a primary focus on cycling. Multinomial logistic regression fails to predict cycling well, so I choose to use random forest instead, with promising ...
SPet's user avatar
  • 33
0 votes
0 answers
79 views

I am currently doing research to find the relationship between the quality of wastewater (e.g. biochemical oxygen demand, amount of nitrogen...) and regional characteristics of that ...
Osuke Miyamaru's user avatar
0 votes
0 answers
49 views

I have the following situation: I’m given a univariate time-series dataset $y$ that I wish to model using feature variables $X$, which are provided alongside $y$. Naturally, I split the data into a ...
testing_dummy's user avatar
5 votes
1 answer
191 views

I'm using two independent predictors, A and B (Pearson correlation = 0), both standardized to the same scale, to predict a binary disease outcome using logistic regression. I'm comparing two modeling ...
zjppdozen's user avatar
  • 543
2 votes
2 answers
275 views

In the context of statistical prediction models, one is often interested in the predictive accuracy of the model. A common model choice is the root mean squared error (RMSE), which is also called ...
Lukas D. Sauer's user avatar
0 votes
0 answers
59 views

I am currently using the "segmented.lm" function to detect a change point in my data. At this stage I am trying to figure out how to derive the SE of the y value of the corresponding change ...
a.henrietty's user avatar
1 vote
1 answer
105 views

I am developing a gam prediction model in the mgcv R package and turned on extra shrinkage using the select = TRUE argument. As I understand it, smooths that shrink "very small" are ...
user167591's user avatar
  • 1,173
4 votes
1 answer
199 views

Suppose one is fitting a logistic regression to develop a clinical prediction model. In an effort to avoid overfitting, regularization is used (e.g. ridge, penalized maximum likelihood) where ...
user167591's user avatar
  • 1,173
1 vote
1 answer
98 views

I am fitting an interrupted time series model to analyze a binary outcome: whether a woman reported feeding the child solid food within the first six months of birth (Yes/No). The main exposure is ...
Eagle Hawk's user avatar
1 vote
0 answers
53 views

I face a few issues where I'm trying to predict my dependent variable Y. I have 6 different independent external variables with one of them being lag(1) of the dependent variable Y. I differenced all ...
Hornet's user avatar
  • 11
1 vote
1 answer
292 views

I developed and compared four ML models via Random Forest, Support Vector Machine, Logistic Regression, and Xgboost (tidymodels R package) algorithms using data without stratification by age groups. ...
Data and data's user avatar
1 vote
0 answers
76 views

In the discussion of Large Language Model hallucination phenomenon, people are interested in measuring and reducing the calibration error of the model predictions. However, what makes this situation ...
Sasha Queequeg's user avatar
1 vote
0 answers
97 views

I’m working on proving the distribution of the prediction error in the OLS model, but I get stuck when trying to compute the variance because after having calculated the variance of $\hat{y}$, I get ...
wtr8m12's user avatar
  • 11
10 votes
2 answers
380 views

An influential 2009 paper, Measuring classifier performance: A coherent alternative to the area under the ROC curve, argues that the Area Under the Curve (AUC) "is fundamentally incoherent in ...
demim00nde's user avatar
0 votes
0 answers
29 views

I am doing a case study and I need to forecast the sales. I am using multiple linear regression and also Winters' method and the decomposition approach with Holt's method. I am using these methods as ...
Forecast's user avatar
0 votes
0 answers
61 views

I have a set of environmental variables that are left-censored (measurements of elements in my samples). I have two datasets, one dataset with samples with known origins and one dataset with samples ...
AnneA's user avatar
  • 11
1 vote
0 answers
45 views

Suppose I estimate $$ Y_t = \alpha + \beta \times X_t + \varepsilon_t$$ via OLS, where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is independent across observations. It is a standard result that, ...
bodhi's user avatar
  • 31
0 votes
0 answers
64 views

I have run three models, one per season, on a dataset of animal points, using soap smoothers on a lake. ...
mikejwilliamson's user avatar
0 votes
0 answers
47 views

I'd like to build a Bayesian network that allows me to predict the most effective treatment sequence for a given treatment. In the simplest case scenario I would have 2 treatments across 2 ...
roybatty's user avatar
2 votes
0 answers
83 views

I would be grateful for advice, hints, or reference on a question about predicting population level rates of in-hospital complications in older people. I have routine hospital data and a high-quality ...
astaines's user avatar
  • 411
3 votes
2 answers
171 views

(Let's set aside how we might estimate this.) I envision a setup where we have some space $\mathcal X$ of features and $\mathcal Y$ of outcomes, with each random variable $X_i\in\mathcal X$ ...
Dave's user avatar
  • 72.9k
0 votes
0 answers
74 views

I have a dataset that includes one dependent variable, of which 47.2% of the values are zero, and 14 independent variables (1 numeric and 13 categorical). After testing for zero inflation using ...
Chao's user avatar
  • 333
5 votes
1 answer
125 views

Suppose that we have two regression models $A$ and $B$ that predict values $\hat{a}$ and $\hat{b}$ but that we ultimately are interested in their product $\hat{a}\hat{b}$ (for instance, we may be ...
49isprime's user avatar
0 votes
1 answer
98 views

I have a data set with several thousands observations for both training set and test set, and I have defined two models (with the same covariates): A Cox model A Cox model with natural splines which ...
Luigi's user avatar
  • 103
1 vote
0 answers
78 views

I am trying to get my head around the utility of prediction - either with a Cox or a parametric survival model - when your dataset contains more than one row/person (i.e. when a time-varying covariate is ...
LucaS's user avatar
  • 1,099
0 votes
0 answers
95 views

Using random forest in R to classify a small data set, 152 rows with 17 predictors. I can get through most of the steps I've seen in different tutorials without much trouble, but when I use ...
John Polo's user avatar
  • 101
0 votes
0 answers
51 views

Some features I want to use for modeling have distributions like below: There are high values of the features occurring frequently in my data. I can identify a subset of my data points that cause ...
Jakub Małecki's user avatar