Questions tagged [overfitting]

Modeling error (especially sampling error) instead of replicable and informative relationships among variables improves model fit statistics, but reduces parsimony, and worsens explanatory and predictive validity.
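
As a rough illustration of the description above, here is a minimal sketch comparing a model that memorises sampling noise with a more parsimonious one. Everything in it is an assumption made for illustration: the synthetic data (`y = 2x` plus Gaussian noise), the helper names (`make_data`, `fit_line`, `fit_1nn`, `mse`), and the choice of a 1-nearest-neighbour predictor as the overfitting model.

```python
import random

def make_data(n, rng, noise=0.5):
    """Hypothetical data: y = 2x + Gaussian noise, with x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    """1-nearest-neighbour: reproduces every training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(predict, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
x_train, y_train = make_data(30, rng)
x_test, y_test = make_data(1000, rng)

line = fit_line(x_train, y_train)
memo = fit_1nn(x_train, y_train)

for name, model in [("line", line), ("1-NN", memo)]:
    print(f"{name}: train MSE = {mse(model, x_train, y_train):.3f}, "
          f"test MSE = {mse(model, x_test, y_test):.3f}")
```

The 1-nearest-neighbour model fits the training sample perfectly because it returns each training point's own response, so its training fit statistic looks better than the line's. On fresh data from the same process its error is typically worse, since the noise it memorised does not replicate.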

1 vote
0 answers
47 views

Neural network beginner here. I am currently implementing a CNN in PyTorch to recognize handwritten Japanese letters, with 46 output classes. I found a dataset on Kaggle https://www.kaggle....
Krish Thyagarajan's user avatar
0 votes
0 answers
51 views

I have an intuition that my numerical toy examples do not confirm, and I really want to understand where my mistake is. I suppose that I have a random vector $X = (X_1, \cdots, ...
arthur_elbrdn's user avatar
3 votes
3 answers
294 views

The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
Chris's user avatar
  • 322
1 vote
0 answers
28 views

How accurate are the estimates of an ordered logit model with only 51 observations? Here is my Stata output from the model:
Oindrila Roy's user avatar
0 votes
1 answer
52 views

I'm learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
dancing_monkeys's user avatar
1 vote
0 answers
60 views

I have a school project, which is to train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82. I have been trying a lot of things and different way ...
erodrigu's user avatar
2 votes
0 answers
80 views

Can AUC be used for model selection, and how can an excessive number of features/parameters be penalized in this case? In the frequentist framework we have various model selection criteria, like AIC, BIC,...
Roger V.'s user avatar
  • 5,091
1 vote
1 answer
78 views

I am using GridSearchCV to optimize some hyperparameters of an XGBoost model. However, although the log loss (the metric I am optimizing) seems alright according to domain knowledge, the learning ...
user54565's user avatar
1 vote
1 answer
118 views

I'm working on fitting a random forest model using the caret library in R with a repeated cross-validation design to select hyperparameters. I've also experimented with adjusting the number of trees (...
Mdhale's user avatar
  • 133
1 vote
0 answers
54 views

Assume you have training data $(x_1,y_1), \ldots, (x_n,y_n)$ and a relationship $y_i=f(x_i)+\epsilon_i$, where $\epsilon$ is a random variable. Assume you approximate $f$ with $\hat{f}$ using the ...
user394334's user avatar
2 votes
1 answer
203 views

This is sort of a follow-up to my last question, except purely based on curiosity. I found different versions of similar bs="sz" models in ...
Nate's user avatar
  • 2,537
1 vote
0 answers
57 views

I've been thinking about the use of cross-validation and hold-out sets and I don't really see the use of a randomly selected hold-out test set. I have to say, though, that when the hold-out is not ...
adriavc00's user avatar
4 votes
1 answer
114 views

Suppose I have a family of $N$ models for the same data, indexed by $n\in\{1,\dots,N\}$. And suppose that model $n\in\{1,\dots,N\}$ has log-likelihood given by: $$L(X_n \theta_n),$$ where $L:\mathbb{R}...
cfp's user avatar
  • 565
0 votes
0 answers
90 views

I am training an MLP on a dataset where the number of features >> the number of samples. For certain reasons, MLPs with at least one hidden layer are the only architecture I am considering. ...
dkolobok's user avatar
1 vote
0 answers
50 views

I have built an XGBoost model that performs rather weirdly across months... I trained the model on a heavily imbalanced dataset (1:40 000), which I undersampled to (1:500). The model performance (...
user24758287's user avatar
3 votes
1 answer
372 views

I'm working on a classification problem with ~90k data rows and 12 features. I'm trying to tune the hyperparameters of an XGBoost model to minimize overfitting. I use ROC_AUC as the metric to ...
WatermelonBunny's user avatar
26 votes
2 answers
4k views

Disclaimer: I asked this question on Data Science Stack Exchange 3 days ago, and got no response so far. Maybe it is not the right site. I am hoping for more positive engagement here. This is a ...
Landon Carter's user avatar
6 votes
1 answer
643 views

I've been working on a CNN binary classification model, and the model performs pretty well on both the training set and the cross-validation set (both practically 1.0 acc). However, I also ...
Efe FRK's user avatar
  • 71
1 vote
0 answers
70 views

I am trying to build a binary classification model with XGBoost. I made sure to split my data into training, validation and test sets. I performed feature selection, early stopping and ...
Shak Jivraj's user avatar
0 votes
0 answers
60 views

I'm an intern working on implementing a binary email classifier for a client (a pharmaceutical company) and I need some advice on fine-tuning the model. The model I'm using is Longformer (because it has ...
Bhashwar Sengupta's user avatar
1 vote
0 answers
57 views

I have only one time series $(y_0, t_0), (y_1,t_1), \ldots, (y_n, t_n)$, with $y_i \in \mathbb{R}$ and $t_0 < \cdots < t_n$. The belief is that these are points on a function $f(t; \mu)$ with $\...
温泽海's user avatar
  • 808
5 votes
1 answer
176 views

A recent work shows generalisation beyond overfitting for overparametrized systems [*]. Is there any precedent in the statistics literature, or is this a new phenomenon for deep learning? [*] Grokking: ...
patagonicus's user avatar
  • 2,789
0 votes
0 answers
86 views

I am attempting to perform classification on the CIFAR-100 dataset using a ResNet model that I implemented. I have been trying multiple different hyperparameter configurations, changing learning ...
codinator's user avatar
  • 123
1 vote
0 answers
76 views

I am working with a time-dependent sequential dataset, specifically a record of machine breakdowns over a period of time. My dataset includes data from the sensors of several machines until they fail ...
user386164's user avatar
5 votes
1 answer
687 views

Epochs, the number of times training is repeated on the original data, are absolutely necessary for neural networks where there are often many more parameters than original instances. What is the ...
Mitch's user avatar
  • 2,099
1 vote
0 answers
49 views

I train models to predict some linear features from aerial imagery. Because the reference data are just lines, I made a simple buffer so that labels resemble very approximately the width of the target ...
Pythonisa's user avatar
0 votes
0 answers
86 views

The problem: I have a dataset with monthly economic indicators alongside monthly stock price, containing 434 total observations. I have attempted to fit an LSTM to the data, but it seems to ...
altayir1's user avatar
1 vote
0 answers
136 views

I've been going over the output of a Monte Carlo model that simulates disease risk as a function of genotype. Under a null model of no disease risk, we have 1000 case and 1000 control individuals. ...
Max's user avatar
  • 145
0 votes
1 answer
119 views

I have a very small dataset. When I set the parameters using grid search, the training and test results are not acceptable at all and differ hugely. I have to manually ...
Erfan Mollai's user avatar
1 vote
0 answers
69 views

I am trying both LightGBM and RandomForest for a classification task, and I observe the same problem. I am using various metaparams to prevent overfitting, such as max_depth, num_trees (keeping it small for ...
Baron Yugovich's user avatar
0 votes
0 answers
116 views

I'm trying to determine if I can display two regression models and the covariance between the dependent variables in one unified model using path analysis with lavaan in R. In the following (scaled) ...
BlueMarlin's user avatar
2 votes
0 answers
151 views

I'm trying to run a regression, but I only have 14 observations, each being a different city in the US. My dependent variable is the total number of trips per capita, and my explanatory variables are ...
BeyondConfused's user avatar
11 votes
1 answer
3k views

I have a large American Sign Language (ASL) dataset. I split this data 70:15:15 into train, validation and test sets. I then trained a CNN model on it, where I trained using the 70%, and evaluated ...
codinator's user avatar
  • 123
2 votes
1 answer
111 views

My question about overfitting is fairly general, but this particular case concerns survival models. I am working on a case-cohort study, estimating the HR in a cohort where heart attack correspond ...
Javier Hernando's user avatar
1 vote
0 answers
33 views

I have been working on an image classification task using CNNs and getting some puzzling results. My training, validation and test loss keep going down with epochs and are comparable. So this might ...
Nithin's user avatar
  • 11
0 votes
1 answer
70 views

Suppose the (we can say unanimous) preference of each individual in a society is to select roads for travel by placing 95% weight on the objective of minimizing travel time, and the remaining 5% ...
user10478's user avatar
  • 133
1 vote
1 answer
84 views

I'm perplexed as to why my loss doesn't go up when the accuracy goes down (after about 40 epochs). Isn't it possible to tell overfitting from the loss curve alone? (I'm of course referring to the ...
Tfovid's user avatar
  • 815
1 vote
1 answer
242 views

I'm predicting multiclass probabilities using CatBoost Classifier. I have a balanced dataset with roughly 4000 rows, 13 features and 4 target class labels. The dataset has some outliers which I decided not ...
primadonna's user avatar
0 votes
1 answer
246 views

I want to know whether the model I am using is overfitting or underfitting. I am using SVM and Random Forest algorithms. How can I figure this out?
Anna's user avatar
  • 3
4 votes
2 answers
185 views

I would like to ask a generalization of this question: How to perfectly overfit neural network to memorize input? Are there any scaling laws for neural network memorization? In other words, if I have ...
zfj3ub94rf576hc4eegm's user avatar
0 votes
1 answer
216 views

I built a random forest regressor and used GridSearchCV to tune hyperparameters. ...
Nino640's user avatar
  • 11
2 votes
2 answers
591 views

Is it a viable solution to train on the whole dataset without splitting the data into 'train' and 'test' sets? In other words, is it okay to skip offline evaluation and only perform online evaluation (...
asparagus's user avatar
1 vote
0 answers
153 views

To assess whether I am overfitting a multilinear model, I have calculated the predicted $R^2$, based on the info found here. My question is, when is a predicted $R^2$ "good enough", ...
Bettina's user avatar
  • 11
0 votes
0 answers
68 views

I am running an ensemble random forest model (a newer method published in 2020). The model works by using a double bootstrapping step to balance imbalanced training data. Then you grow multiple ...
Greatwhite4's user avatar
1 vote
0 answers
89 views

I want to fine-tune BERT for Named Entity Recognition (NER). However, when fine-tuning over several epochs on different datasets I get a weird behaviour where the training loss decreases, eval loss ...
CodingSquirrel's user avatar
4 votes
1 answer
323 views

Some of the smooths in my generalized additive mixed model (GAMM) indicate in mgcv::k.check(m) they want to be more wiggly, but I don't think I have enough data to ...
Nate's user avatar
  • 2,537
4 votes
1 answer
237 views

I understand that you can overfit a Gradient Boosting Machine (GBM) by using too many trees (unlike random forest), and also that you can overfit a GBM by using too high of a learning rate. My ...
David's user avatar
  • 1,276
1 vote
0 answers
105 views

Let $A$, $\mu$, $\sigma$ be some positive, a priori unknown parameters. Define a Gaussian function $f$ as $$f(x) = A \exp\left(-\frac{1}{2} \left( \frac{x-\mu}{\sigma}\right)^2\right).$$ One ...
mathslover's user avatar
3 votes
1 answer
165 views

I have a dataset with the following characteristics: moderate sample size (~300 samples); moderate class imbalance (~20% positives); high-dimensional (the number of independent variables, again ~300, ...
ladislaw94's user avatar
0 votes
1 answer
466 views

I want to do an SEM analysis with an actor-partner interdependence model in Mplus. I managed to calculate it and everything seems right if I look at the means, SDs, ...
Axenox's user avatar
  • 1
