Newest 'predictive-models' Questions

0 votes, 0 answers, 26 views

What data mining freeware is available that replicates SAS EMiner's interactive Decision Tree node?

It's 2025, and yes, I'm still using SAS EMiner's Decision Tree... If anyone knows of a modern freeware version that replicates the interactive mode effectively (with control over split cutoff values, a ...

Anthony Galka (reputation 1), asked Nov 26 at 0:07

0 votes, 0 answers, 47 views

Interpreting the predicted values from family = poisson(link="log") with a binary outcome

I am fitting a simple model to a dataset where the outcome is binary (1 or 0). ...

Eagle Hawk (reputation 37), asked Nov 13 at 2:42

1 vote, 0 answers, 34 views

Confidence threshold for random forest type = "prob" predictions on new data

I have a nice multiclass random forest model in R (using the packages ranger and caret), but I think this question applies to any random forest logic. When I use my RF to label unknown data, I want to ...

Dr Egg (reputation 11), asked Nov 12 at 16:32

4 votes, 2 answers, 374 views

Is it okay in prediction problems to put post-outcome features in the model?

I am relatively new to machine learning. I see many examples where people make predictions using variables that only become available after the outcome variable (Y). An example of this ...

Abdullah Abdelaziz (reputation 41), asked Oct 31 at 18:22

1 vote, 0 answers, 18 views

How to use a hierarchical Bayesian model to combine regional and country-level data for TPES projections?

I’m trying to project TPES (Total Primary Energy Supply) by country in Africa up to the year 2100 under different SSP (Shared Socioeconomic Pathways) scenarios, the same framework used in the latest ...

grégoire david (reputation 11), asked Oct 22 at 14:46

2 votes, 1 answer, 101 views

Validating a new metric using two-period panel data

Suppose I have two metrics, x and y. I have measures for a few dozen units on both metrics, at time 1 and at time 2. I want to validate metric y, so that future users can use it as a substitute for ...

Clara (reputation 123), asked Oct 13 at 15:36

0 votes, 1 answer, 118 views

How does the math of `predict` work for `lm` with `poly`?

I understand orthogonal polynomials (perhaps not the discrete ones?), but I don't understand exactly how `predict` handles polynomials with a different number of data points, i.e. different x-values and ...

Christoph (reputation 435), asked Oct 2 at 7:26

7 votes, 1 answer, 167 views

Statistical modeling with only a single data point

The vast majority of statistical literature involves having a dataset which can be partitioned into $n$ data points, $\mathbf{x} = \{x_1, \dots, x_n\}$, and constructing a model for the individual data ...

jms (reputation 121), asked Sep 16 at 17:55

11 votes, 3 answers, 731 views

How can calibration plots for my model's predictions look good while the standard metrics (ROC AUC, F-score, etc.) look poor?

Background: I trained an XGBoost model to predict a dichotomous outcome, which has a base rate of about 55% in the overall sample. This model will not be used to classify, however: it will be used ...

Mark White (reputation 11.7k), asked Sep 10 at 14:10

0 votes, 0 answers, 42 views

Predicting occurrence of event following observation of string

I want to model the probability of an event occurring, given that a string has occurred. In other words, I want to predict which event is more likely to happen, given that the string was observed. These are ...

Ricardo Antunes (reputation 1), asked Sep 5 at 14:39

1 vote, 1 answer, 68 views

Predicting global outcomes with a logistic model

I have a database of many employees, and I want to estimate how many will retire next year, based on how many retired last year. So I thought about a logistic model like glm(retire ~ age2025 + ...

FloLe (reputation 33), asked Aug 27 at 15:38

1 vote, 1 answer, 82 views

Recall in AncestryDNA white paper

I was reading through the company white paper for AncestryDNA, which gives DNA ancestry estimates to individuals who are willing to send them a saliva sample. In their 2024 white paper they list the ...

H_1317 (reputation 141), asked Aug 25 at 1:33

7 votes, 2 answers, 456 views

Confidence intervals for predictions in ggeffects are outside the possible range of probabilities

I ran this lognormal hurdle GLMM using the R package glmmTMB: ...

Michaela (reputation 229), asked Aug 14 at 14:50

0 votes, 0 answers, 89 views

Can I use a confusion matrix for prediction?

TL;DR: a confusion matrix is used to validate a model, but I also want to make predictions using my models. Can I use the confusion matrix to make predictions? I don't see any other way to do it, but I ...

Siva Kg (reputation 23), asked Aug 13 at 9:36

2 votes, 0 answers, 137 views

Prediction for GLMM (correcting for bias due to Jensen's inequality?)

I am trying to decide on the best method for producing model predictions (for graphing) from my generalized linear mixed effects model. I am interested in getting marginal predictions (i.e., what the ...

Stephanie Rivest (reputation 137), asked Aug 6 at 15:49

2 votes, 1 answer, 83 views

Quantile-Based Analysis for Predictive Power Study

I’m currently conducting a statistical study to evaluate whether a given factor has predictive power over another variable, such as future returns. As part of this, I’ve been analyzing the mean and ...

user73016 (reputation 21), asked Aug 6 at 3:11

1 vote, 1 answer, 73 views

Suitable metric to compare two counts for H3 data

I have a set of H3 hexagons (spatial clustering of data) with counts for each hex for 2024 and 2025. I want to plot the relative change in counts for each hex, but my current method is unacceptable: ...

tariqalr (reputation 11), asked Jul 17 at 11:15

1 vote, 1 answer, 102 views

Predicting a cyclical time series with non-uniformly sampled data

I want to use R to predict next month's consumption (methane gas) with fair confidence (let's say 80%), based on: the historical data on last month's consumption ...

alex (reputation 163), asked Jul 6 at 21:00

2 votes, 1 answer, 139 views

Forecasting supermarket prices using survival analysis

I need some help/feedback on an approach for my bachelor’s thesis. I'm pretty new to this field, so I'm keen to learn! The general topic is that I want to forecast discounts in the supermarket to help ...

Pascal (reputation 21), asked Jun 29 at 9:52

0 votes, 0 answers, 45 views

Prediction of optimum variables through XGBoost

I have a large dataset of soil moisture data (satellite) and water table depths (measurements). I would like to derive the optimum soil moisture levels to predict the water table depths most ...

Thomas (reputation 538), asked Jun 26 at 9:59

0 votes, 0 answers, 65 views

A simple-ish way of estimating the number of modes of a discrete, finite distribution, and how 'pronounced' those modes are

Intuitively, let's say we're given a price $p$ for some product, and we want to compare it with the prices available on the market (e.g., to determine whether we're being ripped off). We come back ...

MergeMonster (reputation 21), asked Jun 19 at 15:25

0 votes, 0 answers, 73 views

Multivariable linear regression model with continuous predictors with a spike at 0

I want to build a prediction model of a continuous outcome Y. I have ~50 predictors that are count variables (number of hospitalizations by cause, number of drugs dispensed by type of drug). I was ...

Alex (reputation 301), asked Jun 18 at 9:15

3 votes, 1 answer, 119 views

How can I estimate individual-level linear model predictions from a latent class mixed model using the lcmm package in R?

I am trying to complete the following statistical analyses using the lcmm package in R, on longitudinal data with repeated survey question responses from the same people over time: Model the repeated ...

Carly (reputation 33), asked Jun 16 at 8:40

0 votes, 0 answers, 46 views

Reclassifying transport mode choices to binary in random forest

I am building a model to predict mode choice, with a primary focus on cycling. Multinomial logistic regression fails to predict cycling well, so I chose to use a random forest instead, with promising ...

SPet (reputation 33), asked May 25 at 1:03

0 votes, 0 answers, 79 views

Variable selection: Explanatory model with very low sample size

I am currently doing research on the relationship between the quality of wastewater (e.g. biochemical oxygen demand, amount of nitrogen...) and regional characteristics of that ...

Osuke Miyamaru (reputation 35), asked May 19 at 6:53

0 votes, 0 answers, 49 views

One-Step Ahead Forecasting with TensorFlow Structural Time Series

I have the following situation: I’m given a univariate time-series dataset $y$ that I wish to model using feature variables $X$, which are provided alongside $y$. Naturally, I split the data into a ...

testing_dummy (reputation 1), asked May 16 at 12:13

5 votes, 1 answer, 191 views

Combining vs. Separating Predictors: What’s Better for Prediction?

I'm using two independent predictors, A and B (Pearson correlation = 0), both standardized to the same scale, to predict a binary disease outcome using logistic regression. I'm comparing two modeling ...

zjppdozen (reputation 543), asked Apr 30 at 20:20

2 votes, 2 answers, 275 views

Standard error of the root mean squared prediction error (RMSE) and its use in simulation studies of prediction models

In the context of statistical prediction models, one is often interested in the predictive accuracy of the model. A common choice of accuracy measure is the root mean squared error (RMSE), which is also called ...

Lukas D. Sauer (reputation 255), asked Apr 18 at 12:33

0 votes, 0 answers, 59 views

Standard error of the fitted value at the breakpoint (segmented regression)

I am currently using the "segmented.lm" function to detect a change point in my data. At this stage, I am trying to figure out how to derive the SE of the y value of the corresponding change ...

a.henrietty (reputation 433), asked Apr 14 at 9:08

1 vote, 1 answer, 105 views

mgcv gam prediction model for deployment - what to do with terms shrunk out of the model?

I am developing a gam prediction model in the mgcv R package and turned on extra shrinkage using the select = TRUE argument. As I understand it, smooths that shrink "very small" are ...

user167591 (reputation 1,173), asked Apr 10 at 10:22

4 votes, 1 answer, 199 views

Shrinkage in logistic regression prediction model: can we "remove" a predictor whose coefficient has shrunk to almost zero?

Suppose one is fitting a logistic regression to develop a clinical prediction model. In an effort to avoid overfitting, regularization is used (e.g. ridge, penalized maximum likelihood) where ...

user167591 (reputation 1,173), asked Apr 9 at 9:29

1 vote, 1 answer, 98 views

Interpreting odds ratios greater than 1 with predicted odds less than 1

I am fitting an interrupted time series model to analyze a binary outcome: whether a woman reported feeding the child solid food within the first six months after birth (Yes/No). The main exposure is ...

Eagle Hawk (reputation 37), asked Apr 2 at 7:00

1 vote, 0 answers, 53 views

How do I go about refining my ARX model in R?

I am facing a few issues while trying to predict my dependent variable Y. I have 6 different external independent variables, one of which is lag(1) of the dependent variable Y. I differenced all ...

Hornet (reputation 11), asked Mar 29 at 14:02

1 vote, 1 answer, 292 views

SHAP values across different groups

I developed and compared four ML models using the Random Forest, Support Vector Machine, Logistic Regression, and XGBoost algorithms (tidymodels R package) on data without stratification by age group. ...

Data and data (reputation 33), asked Mar 29 at 1:30

1 vote, 0 answers, 76 views

Why minimise calibration error rather than MSE? Context: LLM hallucination [closed]

In discussions of the large language model hallucination phenomenon, people are interested in measuring and reducing the calibration error of model predictions. However, what makes this situation ...

Sasha Queequeg (reputation 111), asked Mar 20 at 2:41

1 vote, 0 answers, 97 views

How would you show that $\text{cov}(\varepsilon, \hat{y}) = 0$? [closed]

I’m working on deriving the distribution of the prediction error in the OLS model, but I get stuck when trying to compute the variance because after having calculated the variance of $\hat{y}$, I get ...

wtr8m12 (reputation 11), asked Mar 16 at 12:47

10 votes, 2 answers, 380 views

Misgivings about the notion that AUC is an incoherent model comparison method

An influential 2009 paper, Measuring classifier performance: A coherent alternative to the area under the ROC curve, argues that the Area Under the Curve (AUC) "is fundamentally incoherent in ...

demim00nde (reputation 476), asked Mar 10 at 20:51

0 votes, 0 answers, 29 views

Smoothing parameters

I am doing a case study and need to forecast sales. I am using multiple linear regression, Winters' method, and the decomposition approach with Holt's method. I am using these methods as ...

Forecast (reputation 1), asked Mar 6 at 13:41

0 votes, 0 answers, 61 views

Model predictions are more accurate with substituted left-censored data than with imputed data

I have a set of environmental variables that are left-censored (measurements of elements in my samples). I have two datasets, one dataset with samples with known origins and one dataset with samples ...

AnneA (reputation 11), asked Mar 6 at 13:33

1 vote, 0 answers, 45 views

Distribution of response in simple linear regression with normal errors

Suppose I estimate $$ Y_t = \alpha + \beta \times X_t + \varepsilon_t$$ via OLS, where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is independent across observations. It is a standard result that, ...

bodhi (reputation 31), asked Feb 27 at 12:11

0 votes, 0 answers, 64 views

Low fitted values in BAM models causing peculiar predict plots

I have run three models, one per season, on a dataset of animal points, using soap smoothers on a lake. ...

mikejwilliamson (reputation 141), asked Feb 25 at 14:51

0 votes, 0 answers, 47 views

Pipeline for a Bayesian network algorithm

I'd like to build a Bayesian network that allows me to predict the most effective treatment sequence for a given treatment. In the simplest scenario, I would have 2 treatments across 2 ...

roybatty (reputation 11), asked Feb 24 at 15:15

2 votes, 0 answers, 83 views

Calibration of prediction model

I would be grateful for advice, hints, or references on a question about predicting population-level rates of in-hospital complications in older people. I have routine hospital data and a high-quality ...

astaines (reputation 411), asked Feb 24 at 12:16

3 votes, 2 answers, 171 views

Purely theoretical measure of predictability

(Let's set aside how we might estimate this.) I envision a setup where we have some space $\mathcal X$ of features and $\mathcal Y$ of outcomes, with each random variable $X_i\in\mathcal X$ ...

Dave (reputation 72.9k), asked Feb 21 at 11:57

0 votes, 0 answers, 74 views

Prediction from GAMM-ZINB model does not match the original dependent variable

I have a dataset that includes one dependent variable, of which 47.2% of the values are zero, and 14 independent variables (1 numeric and 13 categorical). After testing for zero inflation using ...

Chao (reputation 333), asked Feb 19 at 15:46

5 votes, 1 answer, 125 views

Error of a product of predictions

Suppose that we have two regression models $A$ and $B$ that predict values $\hat{a}$ and $\hat{b}$ but that we ultimately are interested in their product $\hat{a}\hat{b}$ (for instance, we may be ...

49isprime (reputation 51), asked Feb 18 at 10:26

0 votes, 1 answer, 98 views

Does it make sense that a predictive model shows better discrimination ability and worse calibration than a less flexible one?

I have a data set with several thousand observations in both the training set and the test set, and I have defined two models (with the same covariates): a Cox model, and a Cox model with natural splines which ...

Luigi (reputation 103), asked Feb 14 at 15:46

1 vote, 0 answers, 78 views

Survival model predictions with counting process data

I am trying to get my head around the utility of prediction, either with a Cox or a parametric survival model, when the dataset contains more than one row per person (i.e. when a time-varying covariate is ...

LucaS (reputation 1,099), asked Feb 12 at 1:22

0 votes, 0 answers, 95 views

How to understand unexpected predictions from random forest in caret?

I am using a random forest in R to classify a small data set: 152 rows with 17 predictors. I can get through most of the steps I've seen in different tutorials without much trouble, but when I use ...

John Polo (reputation 101), asked Feb 11 at 21:31

0 votes, 0 answers, 51 views

Predictive modeling on biased features

Some features I want to use for modeling have distributions like the one below: high values of the features occur frequently in my data. I can identify a subset of my data points that cause ...

Jakub Małecki (reputation 378), asked Feb 11 at 15:19
