Questions tagged [overfitting]
Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics, but reduces parsimony, and worsens explanatory and predictive validity.
1,002 questions
1
vote
0
answers
47
views
Potential CNN Overfitting Due to Limited Training Data
Neural network beginner here. I am currently implementing a CNN in PyTorch to recognize handwritten Japanese letters, with 46 output classes.
I found a dataset on Kaggle https://www.kaggle....
0
votes
0
answers
51
views
Generalization Error PCA (with closed formula) versus Ridge
I have an intuition that my numerical toy examples do not confirm, and I really want to understand where my mistake is.
I suppose that I have a random vector $X = (X_1, \cdots, ...
3
votes
3
answers
294
views
How might softmax cause overfit in a neural model, even treated from a Bayesian perspective?
The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
1
vote
0
answers
28
views
Inference validity of an ordered logit model with only 50 observations
How accurate are the estimates of an ordered logit model with only 50 observations? Here is my Stata output from the model:
0
votes
1
answer
52
views
Why do overfitted models in finite mixture regression sometimes have the smallest BIC despite the true number of components being selected frequently?
I'm learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
1
vote
0
answers
60
views
Overfitting problem in classification CNN
I have a school project in which we must train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82.
I have been trying a lot of things and different way ...
2
votes
0
answers
80
views
Number of features selection using AUC
Can AUC be used for model selection, and how can the excessive number of features/parameters be penalized in this case?
In the frequentist framework we have various model selection criteria, like AIC, BIC,...
1
vote
1
answer
78
views
Gridsearch results vs learning curve
I am using GridSearchCV to optimize some hyperparameters of an XGBoost model. However, although the log loss (the metric I am optimizing) seems alright according to domain knowledge, the learning ...
1
vote
1
answer
118
views
How to reduce overfitting for a random forest model even when cross-validation is implemented?
I'm working on fitting a random forest model using the caret library in R with a repeated cross-validation design to select hyperparameters. I've also experimented with adjusting the number of trees (...
1
vote
0
answers
54
views
Is there a one-to-one relationship between high bias and underfitting, and between high variance and overfitting?
Assume you have training data $(x_1,y_1), \ldots, (x_n,y_n)$ and a relationship $y_i=f(x_i)+\epsilon_i$, where $\epsilon$ is a random variable. Assume you approximate $f$ with $\hat{f}$ using the ...
2
votes
1
answer
203
views
How to identify problems with mgcv::gam(y ~ s(x) + s(x, fac, bs="sz"))? [closed]
This is sort-of a follow-up from my last question, except purely based on curiosity. I found different versions of similar bs="sz" models in ...
1
vote
0
answers
57
views
The use of cross-validation and a hold-out set
I've been thinking about the use of cross-validation and hold-out sets and I don't really see the use of a randomly selected hold-out test set. I have to say, though, that when the hold-out is not ...
4
votes
1
answer
114
views
Smooth AIC selection
Suppose I have a family of $N$ models for the same data, indexed by $n\in\{1,\dots,N\}$.
And suppose that model $n\in\{1,\dots,N\}$ has log-likelihood given by:
$$L(X_n \theta_n),$$
where $L:\mathbb{R}...
0
votes
0
answers
90
views
Reducing MLP overfitting for feature importance
I am training an MLP on a dataset with the number of features >> number of samples. For certain reasons, MLPs with at least one hidden layer are the only architecture I am considering. ...
1
vote
0
answers
50
views
Model Performance Varying Greatly
I have built an XGBoost model that performs rather weirdly across months...
I trained the model on a heavily imbalanced dataset (1:40 000), which I undersampled to (1:500).
The model performance (...
3
votes
1
answer
372
views
What should the objective be when tuning hyperparameters to minimize overfitting?
I'm working on a classification problem with ~90k data rows and 12 features. I'm trying to tune the hyperparameters of an XGBoost model to minimize overfitting. I use ROC_AUC as the metric to ...
26
votes
2
answers
4k
views
Why doesn't ML suffer from curse of dimensionality?
Disclaimer: I asked this question on Data Science Stack Exchange 3 days ago, and got no response so far. Maybe it is not the right site. I am hoping for more positive engagement here.
This is a ...
6
votes
1
answer
643
views
Model performs well on train and cross-validation sets but inaccurate in the test set. How to solve? [duplicate]
I've been working on a CNN binary classification model, and the model performs well on both the training set and the cross-validation set (both practically 1.0 accuracy). However, I also ...
1
vote
0
answers
70
views
Is my XGBoost Model Still Overfitting (Binary Classification)?
I am trying to build a binary classification model with XGBoost. I made sure to split my data into training, validation and test sets. I performed feature selection, early stopping and ...
0
votes
0
answers
60
views
Advice on fine-tuning an email classifier for a Pharma company
I'm an intern working on implementing a binary email classifier for a client (Pharmaceutical company) and I need some advice on fine-tuning the model.
The model I'm using is Longformer (because it has ...
1
vote
0
answers
57
views
Overfitting Time Series
I have only one time series $(y_0, t_0), (y_1,t_1), \ldots, (y_n, t_n)$, with $y_i \in \mathbb{R}$ and $t_0 < \cdots < t_n$. The belief is that these are points on a function $f(t; \mu)$ with $\...
5
votes
1
answer
176
views
What's the historical precedent in statistics for generalisation beyond overfitting?
A recent work shows generalisation beyond overfitting for overparametrized systems [*]. Is there any precedent in the statistics literature, or is this a new phenomenon for deep learning?
[*] Grokking: ...
0
votes
0
answers
86
views
Training accuracy increases up to 99% but validation accuracy stops much earlier
I am attempting to perform classification on the CIFAR-100 dataset using a ResNet model that I implemented.
I have been trying multiple different hyperparameter configurations, changing learning ...
1
vote
0
answers
76
views
What are the appropriate data splitting techniques for time-dependent sequential datasets, such as breakdown records over time?
I am working with a time-dependent sequential dataset, specifically a record of machine breakdowns over a period of time. My dataset includes data from the sensors of several machines until they fail ...
5
votes
1
answer
687
views
Are epochs the same as data duplication?
Epochs, the number of times training is repeated on the original data, are absolutely necessary for neural networks where there are often many more parameters than original instances.
What is the ...
1
vote
0
answers
49
views
Train model with labels generated by similar model: overfit?
I train models to predict some linear features from aerial imagery. Because the reference data are just lines, I made a simple buffer so that the labels roughly match the width of the target ...
0
votes
0
answers
86
views
Augmenting data for LSTM
The problem:
I have a dataset with monthly economic indicators alongside the monthly stock price, containing 434 total observations.
I have attempted to fit an LSTM to the data, but it seems to ...
1
vote
0
answers
136
views
AUC > 0.5 under null model following feature selection
I've been going over the output of a Monte Carlo model that simulates disease risk as a function of genotype. Under a null model of no disease risk, we have 1000 case and 1000 control individuals. ...
0
votes
1
answer
119
views
Manual selection of parameters and features and bad results from grid search
For a very small dataset that I have, when I set the parameters with the help of grid search, the test and training results are not acceptable at all and differ hugely. I have to manually ...
1
vote
0
answers
69
views
Significant performance drop between train and validation set
I am trying both LightGBM and random forest for a classification task, and I observe the same problem. I am using various hyperparameters to prevent overfitting, such as max_depth, num_trees (keeping it small for ...
0
votes
0
answers
116
views
Path analysis with perfect fit
I'm trying to determine if I can display two regression models and the covariance between the dependent variables in one unified model using path analysis with lavaan in R. In the following (scaled) ...
2
votes
0
answers
151
views
Regression with small sample size - LASSO or remove variables?
I'm trying to run a regression, but I only have 14 observations, each being a different city in the US. My dependent variable is the total number of trips per capita, and my explanatory variables are ...
11
votes
1
answer
3k
views
Getting 99-100% accuracy on my training/validation data but performing badly on completely new data
I have a large ASL (American Sign Language) dataset. I split this data 70:15:15 into train, validation and test sets.
I then trained a CNN model on it, where I trained using the 70%, and evaluated ...
2
votes
1
answer
111
views
Estimate number of covariates in Cox regression model
My question about overfitting is fairly general, but in this particular case it concerns survival models. I am working on a case-cohort study, estimating the HR in a cohort where heart attack correspond ...
1
vote
0
answers
33
views
Image classification metrics
I have been working on an image classification task using CNNs and getting some puzzling results.
My training, validation and test loss keep going down with epochs and are comparable. So this might ...
0
votes
1
answer
70
views
Does the intuitive sense of overfitting in this mechanism design context exemplify bias-variance tradeoff?
Suppose the (we can say unanimous) preference of each individual in a society is to select roads for travel by placing 95% weight on the objective of minimizing travel time, and the remaining 5% ...
1
vote
1
answer
84
views
Accuracy "overfits" but loss doesn't?
I'm perplexed as to why my loss doesn't go up when the accuracy goes down (after about 40 epochs). Isn't it possible to tell overfitting from the loss curve alone? (I'm of course referring to the ...
1
vote
1
answer
242
views
Is my model overfitting or is my training process wrong?
I'm predicting multiclass probabilities using CatBoost Classifier.
I have a balanced dataset with roughly 4000 rows, 13 features, 4 target class labels. Dataset has some outliers which I decided not ...
0
votes
1
answer
246
views
Learning Curve to Know Underfitting or Overfitting
I want to know whether the model I am using tends to overfit or underfit. I am using SVM and random forest algorithms. How can I figure this out?
4
votes
2
answers
185
views
Scaling laws for neural network memorization
I would like to ask a generalization of this question: How to perfectly overfit neural network to memorize input?
Are there any scaling laws for neural network memorization? In other words, if I have ...
0
votes
1
answer
216
views
Random Forest Regressor gives negative test score in GridSearchCV
I built a random forest regressor and used GridSearchCV to tune hyperparameters.
...
2
votes
2
answers
591
views
Can I skip test set and train on 100% of data?
Is it a viable solution to train on the whole dataset without splitting the data into 'train' and 'test' sets? In other words, is it okay to skip offline evaluation and only perform online evaluation (...
1
vote
0
answers
153
views
Predicted R squared - when is it good enough?
In order to assess whether I am overfitting a multilinear model, I have calculated the predicted $R^2$, based on the info found here.
My question is, when is a predicted $R^2$ "good enough", ...
0
votes
0
answers
68
views
Ensemble Random Forest Overfitting
I am running an ensemble random forest model (a newer method published in 2020). The model works by using a double bootstrapping step to balance imbalanced training data. Then you grow multiple ...
1
vote
0
answers
89
views
BERT eval loss increase while performance metrics also increase
I want to fine-tune BERT for Named Entity Recognition (NER). However, when fine-tuning over several epochs on different datasets I get a weird behaviour where the training loss decreases, eval loss ...
4
votes
1
answer
323
views
Implications of keeping a "low" basis dimension in GAMM
Some of the smooths in my generalized additive mixed model (GAMM) indicate in mgcv::k.check(m) they want to be more wiggly, but I don't think I have enough data to ...
4
votes
1
answer
237
views
Overfitting GBM by simultaneously adding trees and lowering learning rate?
I understand that you can overfit a Gradient Boosting Machine (GBM) by using too many trees (unlike random forest), and also that you can overfit a GBM by using too high a learning rate. My ...
1
vote
0
answers
105
views
Fitting a Gaussian function to Poisson noisy data
Let $A$, $\mu$, $\sigma$ be some positive, a priori unknown parameters. Define a Gaussian function $f$ as
$$f(x) = A \exp\left(-\frac{1}{2} \left( \frac{x-\mu}{\sigma}\right)^2\right).$$
One ...
3
votes
1
answer
165
views
Is my regularized logistic regression model overfit?
I have a dataset with the following characteristics:
moderate sample size (~300 samples)
moderate class imbalance (~20% positives)
high-dimensional (the number of independent variables, again ~300, ...
0
votes
1
answer
466
views
CFA: chi-square value is 0 but with degrees of freedom [closed]
I want to do an SEM analysis with an actor-partner interdependence model in Mplus. I managed to calculate it and everything seems right if I look at the means, SDs, ...