Questions tagged [model-selection]
Model selection is a problem of judging which model from some set performs best. Popular methods include $R^2$, AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.
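As a rough illustration of three of the criteria named above, here is a minimal sketch that assumes a Gaussian linear model fitted by ordinary least squares. It compares polynomial degrees on simulated data using AIC, BIC and 5-fold cross-validated mean squared error. The helper names (`fit_ols`, `gaussian_ic`, `design`, `kfold_mse`, `compare_degrees`) and the simulated quadratic are hypothetical choices made for this example. They do not come from any of the questions below, and only NumPy is required.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return beta, rss


def gaussian_ic(rss, n, k):
    """AIC and BIC for a Gaussian linear model, error variance set to its MLE rss/n.

    k counts the regression coefficients plus one for the error variance.
    """
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return float(aic), float(bic)


def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta, _ = fit_ols(design(x[train_idx], degree), y[train_idx])
        pred = design(x[test_idx], degree) @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))


def compare_degrees(x, y, max_degree=5):
    """Return (degree, AIC, BIC, CV-MSE) for each candidate polynomial degree."""
    rows = []
    for d in range(max_degree + 1):
        _, rss = fit_ols(design(x, d), y)
        aic, bic = gaussian_ic(rss, len(y), k=d + 2)
        rows.append((d, aic, bic, kfold_mse(x, y, d)))
    return rows


if __name__ == "__main__":
    # Simulated data from a quadratic, so the "true" degree is 2.
    rng = np.random.default_rng(42)
    x = rng.uniform(-2, 2, size=200)
    y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=200)
    rows = compare_degrees(x, y)
    for d, aic, bic, cv in rows:
        print(f"degree={d}  AIC={aic:8.2f}  BIC={bic:8.2f}  CV-MSE={cv:.4f}")
    print("AIC picks degree", min(rows, key=lambda r: r[1])[0])
    print("BIC picks degree", min(rows, key=lambda r: r[2])[0])
    print("CV picks degree", min(rows, key=lambda r: r[3])[0])
```

BIC's penalty, $k \ln n$, exceeds AIC's $2k$ once $n \ge 8$, so BIC tends to favour smaller models. Cross-validation instead estimates out-of-sample error directly, at the cost of refitting each candidate $k$ times.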
2,037 questions
0 votes, 0 answers, 44 views
How to plot AIC, BIC of all possible models?
Suppose I was given a data set, say, golf, in the form of an MLR model. Given that best subset selection is choosing the top 5 best models of each size, how would ...
1 vote, 0 answers, 22 views
3-way holdout for performance evaluation but 2-way for model selection
The paper https://arxiv.org/pdf/1811.12808 by Sebastian Raschka explains how to perform the 3-way holdout method, and also how to compute the final model (used in production).
During computation of the ...
2 votes, 0 answers, 61 views
Do k-folds risk sampling bias and, if so, how do we avoid it?
In cross-validation, $k$-folds are a common way to train, compare and validate models. Often we want to find an optimal set of hyperparameters for our models. There are many ways to probe the ...
0 votes, 0 answers, 52 views
Dealing with high concurvity and variable selection in GAMMs with imbalanced data (mgcv::bam)
I am using GAMMs to model the probability of occurrence of a species, applying logistic regressions with mgcv::bam() to presence-pseudoabsence data. The dataset ...
0 votes, 0 answers, 41 views
How do I conduct backward selection on my OLS regression with Newey-West standard errors?
I have run an OLS regression and detected that it contains autocorrelation and heteroskedasticity. To deal with this I intend to use Newey-West standard errors.
But I am not sure what is the proper ...
0 votes, 0 answers, 55 views
LASSO and cross validation when dealing with missing data
I want to simulate data with missing values and use them to compare the predictive performance of several machine learning algorithms, including LASSO. All analyses will be performed in R, using the ...
0 votes, 1 answer, 76 views
How to model feeder choice in bees while ignoring unbalanced feeding events per bout?
I'm analyzing an experiment I ran with bumblebees, and really struggling with choosing the appropriate model.
In the experiment, each bee made feeder choices across two temperature conditions:
...
1 vote, 0 answers, 64 views
How to justify the number of background points in MaxEnt species distribution modeling?
I'm building a species distribution model using MaxEnt with 260 presence points, collected opportunistically within a relatively small study area (a single administrative department in France).
I'm ...
0 votes, 0 answers, 41 views
How to interpret AIC model selection and uninformative parameters
I have a model set with 36 candidate models and 4 models with an AIC less than or equal to 2.0. I do not want to model average because I don't think my candidate set really fits in with the caveats ...
1 vote, 1 answer, 43 views
DCC-GARCH: Valid to have different GARCH models for each series?
Most DCC-GARCH tutorials and guides I found online often use "replicate" in creating their DCC specification, i.e. ...
0 votes, 1 answer, 93 views
DCC-GARCH: Correct way of choosing between the normal distribution and t-distribution
DCC-GARCH consists of two stages: (1) estimating the univariate GARCH models and (2) estimating the correlations through DCC.
My time series (bond yields) are not normally distributed, as they rejected ...
1 vote, 1 answer, 65 views
DCC GARCH - Is there any merit in setting omega to zero?
I estimated the univariate GARCH models for each series, and all coefficients are statistically significant. However, upon putting them into one DCC-GARCH model with a DCC(1,1) spec, the individual ...
1 vote, 1 answer, 79 views
Can Goodness-of-Fit Test be used for Model Selection?
I would like to know whether goodness-of-fit tests (like Pearson's chi-squared test or the Kolmogorov-Smirnov test) can be used to select which probability distribution model certain empirical observation ...
0 votes, 1 answer, 52 views
Why do overfitted models in finite mixture regression sometimes have the smallest BIC despite the true number of components being selected frequently?
I'm learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
0 votes, 0 answers, 76 views
Linear regression after multiple imputation: Should assumptions be checked before or after AIC-based model selection?
I’m currently working on multiple regression analyses with a small sample (n = 36), using multiple imputation via the mice package in R (5 imputed datasets). The ...
1 vote, 0 answers, 42 views
Parsing maritime location ranges
I'm attempting to train a model to parse maritime location ranges. These are strings that can be resolved into a geographical area or a list of shipping ports.
An example could be ...
6 votes, 1 answer, 280 views
Automatic ARIMA model selection
There are many resources explaining why automatic variable selection is bad (e.g. here).
Regarding the selection of $p$, $d$, $q$ parameters in ARIMA models, the Hyndman-Khandakar algorithm combines ...
0 votes, 0 answers, 46 views
Beta-binomial mixed model: spline of time as fixed effect, keep random slope for time if variance is very small but LRT significant?
I’m modeling longitudinal substance use (number of days consumed over 30 days) for ~930 patients with repeated measures. The outcome is modeled with a beta-binomial distribution (logit link, glmmTMB ...
0 votes, 0 answers, 51 views
Variable selection methods
I am currently trying to build a model to link water quality metrics (e.g. biochemical oxygen demand, chemical oxygen demand) with regional characteristics data (e.g. population, GDP) through multiple ...
6 votes, 2 answers, 521 views
Variable selection strategy for *descriptive* modeling
From Shmueli's paper "To Explain or to Predict?", which also has a section about descriptive modeling (section 1.3): (see also this page)
[Descriptive modeling] is aimed at summarizing or ...
1 vote, 1 answer, 133 views
Why is the step size $\hat \gamma$ in Least Angle Regression (LARS) smaller than $\bar \gamma=\frac{\hat C}{A}$?
I'm currently studying the Least Angle Regression algorithm by Efron et al. (https://arxiv.org/abs/math/0406456).
After equation (2.22) in Efron et al., the authors claim the following:
It is easy to ...
2 votes, 0 answers, 80 views
Number of features selection using AUC
Can AUC be used for model selection, and how can the excessive number of features/parameters be penalized in this case?
In the frequentist framework we have various model selection criteria, like AIC, BIC, ...
5 votes, 0 answers, 144 views
How do I handle this very non-normal response variable?
In R, I want to use a repeated measures analysis with a mixed regression model to analyze how the mean of my response variable (mean bee pollination score) varies based on 1) week, 2) number of bee ...
0 votes, 0 answers, 48 views
Model selection for fixed effect and crossed random effect structure in glmer
I'm new to (generalized) linear mixed effects models. Any help would be appreciated!
Below is my study design with dummy data. I'm exploring the effects of the parameters I manipulated in game 1 on ...
2 votes, 1 answer, 117 views
Interaction Effect on the dependent variable?
I would like to run a model in R with two binary dependent variables. I know how to model an interaction on the independent variable, but is it possible to do this on the dependent variable too?
If my ...
0 votes, 0 answers, 31 views
Linear regression [duplicate]
If I have a single model, say $y = ax^2 + bx + c$, can I use three linear regressions, $y = ax^2$, $y = ax$ and $y = a$, to learn the original function if I use the same data set? Please help me out here.
0 votes, 0 answers, 53 views
Choosing ARIMA order from ACF PACF plot
I'm doing a project using ARIMA and I face a problem: I cannot choose the order for the ARIMA model. I know that I have to choose the order by identifying the significant lags, but the PACF plot showing ...
4 votes, 1 answer, 114 views
Smooth AIC selection
Suppose I have a family of $N$ models for the same data, indexed by $n\in\{1,\dots,N\}$.
And suppose that model $n\in\{1,\dots,N\}$ has log-likelihood given by:
$$L(X_n \theta_n),$$
where $L:\mathbb{R}...
1 vote, 1 answer, 71 views
Is it okay to select any of the surrogate models in nested cv?
Let's say I pick any of the winning surrogate models in my nested CV (in theory, with $k$ outer folds you could have $k$ surrogate models). To simplify things, let's say I pick the first model and just ...
2 votes, 0 answers, 87 views
Why is a holdout test set an unbiased estimator of the selected model’s generalization error?
Let $\mathcal{D}_{\text{train}}$ be a training dataset, and let $\mathcal{D}_{\text{test}} = \{(x_{\text{test}}, y_{\text{test}})\}$ be a single holdout test point drawn independently from the same distribution ...
0 votes, 0 answers, 78 views
Interpreting Nested CV Results When Selected Model Didn't Win All Outer Folds
In nested cross validation, I'm seeing an interesting scenario that I'd like to understand better:
Using 4-fold outer CV, my model selection process chose Model A overall (it performed best on average ...
1 vote, 1 answer, 141 views
Feature selection and outlier detection in panel regression with fixed effects
I am trying to fit the following panel regression with fixed entity effects
$$Y_{it} = \alpha_i + \sum_j \beta_jX^{(j)}_{it} + \epsilon_{it},$$
where the index $j$ labels the different features. Some ...
0 votes, 0 answers, 45 views
ARMA estimation before GJR-GARCH. How to proceed with multiple time-series?
I want to study the conditional variance of various cryptocurrency return series (13 in total: 5 meme coins and 8 "serious" ones). Since my main focus is the asymmetric response of the variance ...
1 vote, 1 answer, 107 views
How to calculate the BIC for each mixture component
I want to fit a mixture of Gaussians to simulated data. Then, I need to calculate the Bayesian information criterion for each mixture component. My point is that, after the model convergence, I ...
6 votes, 1 answer, 476 views
Preventing data leakage in time-series data splitting
I am working on a fault detection problem for a mechanical system where the goal is to determine the fault type. I use a dataset that for each type of fault (target label) has three sizes and each ...
1 vote, 1 answer, 104 views
Lasso and cross validation: model selection
Apologies for cross-posting
I am starting to use Lasso and cross-validation for model selection to explain a dependent variable using linear models, but I cannot understand why all p-values ...
1 vote, 1 answer, 93 views
Two questions about the VC theory (on the generalization error bound)
In Andrew Ng's machine learning notes (https://cs229.stanford.edu/main_notes.pdf), he introduced the following bound for the difference between generalization error and training error (see the ...
0 votes, 0 answers, 90 views
How to tune hyperparameters for low calibration error under small dataset
I'm studying which variant of variational autoencoders (VAE) gives better expected calibration error (ECE) (see also this doc) with a small dataset. According to Google's tuning playbook, to compare ...
5 votes, 1 answer, 320 views
Why is AIC useful for comparing GAMs? Only for prediction?
I have a follow-up question to this OP. I hope to understand the difference between comparing 2 models with AIC, and interpreting the summary output of the full model - specifically for GAMs. Gavin ...
1 vote, 1 answer, 92 views
Interaction terms in logistic regression model of patient mortality
Admittedly, I am a bit inexperienced in the world of statistics and data modeling but am trying my best to learn on the job. As a first time user, I apologize if there are any formatting errors here!
...
23 votes, 4 answers, 3k views
Is it (always) better to build a model prior to viewing the data?
When it comes to data exploration, aside from checking for outliers (human error), correlated covariates, and missing values, is there a downside to viewing relationships between a response variable ...
16 votes, 2 answers, 841 views
Advantages of information criteria over cross-validation
I understand AIC is asymptotically equivalent to leave-one-out cross-validation and that BIC has a similar asymptotic equivalence to leave-k-out cross-validation. My question is, other than ...
0 votes, 1 answer, 111 views
Problems with using ACF and PACF for ARMA modelling
This is the ACF and PACF for the first difference of my variable, $\Delta y_t$. I used the ADF test, the PP test, the Schmidt-Phillips test and the DF-GLS test, and got the same result that my ...
3 votes, 2 answers, 309 views
Why use nested validation when doing both hyper-parameter tuning and model selection?
The monograph Cross Validation contains a section on nested cross-validation for hyper-parameter optimisation (page 6). The author refers to this paper for a reason why it is better to decouple hp-...
2 votes, 0 answers, 59 views
weird results from Bai-Ng PCs selection criteria implementation of "dfms" on R
I am trying to select the number of principal components for this data following the optimality criteria of Bai and Ng (2002), in R.
The function ICr from the ...
1 vote, 0 answers, 56 views
Statistical Tests for Model Selection in Nested Cross Validation?
I’m using nested cross-validation to evaluate multiple models and hyperparameter configurations. After running trials with different random seeds (outer: 3-fold with 10 seeds, inner: 5-fold with 50 ...
0 votes, 0 answers, 32 views
Variable selection for checking causal relationship of regression model: should or should not? [duplicate]
I am looking for documents and online sources to understand whether or not I should exclude variables from my model through model selection (variable selection).
I also tried to use methods of Least ...
6 votes, 2 answers, 330 views
Is hierarchical regression with aggressive p-deletion really much 'better' than stepwise?
In many medical science fields "hierarchical regression" is a popular method.
The approach is to break variables into categories, add one category of variables at a time and then remove ...
2 votes, 1 answer, 153 views
linear mixed effects models (lme): model comparison via AIC() or anova() function
I have a quick question concerning model selection for linear mixed effects models: When directly comparing AICs of two models (either including or excluding an additional fixed effect) versus ...
3 votes, 1 answer, 135 views
Why is stepwise selection of variables still taught in university statistics classes? [closed]
I have on more than one occasion come across both recently-published textbooks and classes that teach the use of stepwise methods for model construction. Why is this still done, given the problems ...