Newest 'overfitting' Questions - Page 3

1 vote

1 answer

312 views

Is XGBoost too much to apply on my data?

My data has 1530 samples and about 50 features. Not all features are used, some are removed after a feature selection process. Now I'm facing overfitting, and one solution to overfitting is ...

CORy

563

asked Dec 8, 2022 at 10:20

3 votes

2 answers

569 views

Do I want to overfit when doing outlier detection based on regression?

Imagine we have speed data for a car and would like to detect whether the car speeds up or slows down more than it should. Do I want to just overfit my model, so the outlier (higher or lower speed) would lead me ...

Mr. Panda

325

asked Dec 6, 2022 at 9:42

1 vote

0 answers

93 views

Transfer Learning "from scratch"

I've recently started to work in machine learning and this is my first post here. Excuse me in advance for duplicates and/or slang mistakes. My question is about transfer learning (although in this ...

leapofFaith

13

asked Nov 25, 2022 at 12:58

1 vote

0 answers

80 views

What are the benefits of using gradient boosting machines in terms of variance and bias?

Consider two datasets with the same number of samples, about 1500, but with different features. The first dataset has 15 predictive features and the second has 40. Now for someone who is ...

Programming Noob

763

asked Nov 23, 2022 at 10:01

2 votes

1 answer

1k views

How to fix overfitting in XGBoost?

I am trying to build an XGBoost classification model at work, and I'm facing an overfitting issue that I have never seen before. My training sample size is 320,000 X 718 and testing sample is 80,000 X 78 ...

Piyush

215

asked Nov 17, 2022 at 8:28

2 votes

1 answer

334 views

Random forest with small number of samples (10)

I have a computer science background but I am trying to learn how to apply ML by solving small problems. I have been working on this problem for the last couple of days and I cannot find a solution. I ...

pingu87

21

asked Oct 28, 2022 at 16:16

0 votes

0 answers

159 views

Cross validation methods in scikit-learn using an SVC classifier

The dataset we are using consists of ~3000 images split into a 60/40 partition for training/testing. We have used sklearn's GridSearchCV and ...

Colton Seegmiller

1

asked Oct 25, 2022 at 18:44

0 votes

0 answers

50 views

Does this look like overfitting or something else?

The input is a time series of 1x41x41 geospatial images (so, 5x1x41x41 for example). I managed to achieve an MSE of 0.53 with PCA and Random Forest. But I thought to use ConvLSTM since my input size ...

Doomski

1

asked Oct 13, 2022 at 21:05

0 votes

0 answers

57 views

Why does my network not learn a single image perfectly?

I have a convolutional neural network that uses ResNet (18, 34, or 50, it doesn't matter) as the backbone and pretrained weights from ImageNet. When I try training it with a single image for 50 or so epochs, ...

K dai

1

asked Oct 8, 2022 at 16:02

0 votes

1 answer

235 views

Repeated k-fold cross-validation, train/test split, and overfitting

I've recently gotten into ML and I'm a bit confused about repeated k-fold cross-validation, train/test split, and overfitting. I have already read some of the posts in this forum, but none of them could ...

Domi

1

asked Oct 8, 2022 at 9:52

2 votes

1 answer

1k views

Curse of dimensionality using trees

The curse of dimensionality refers to what happens when a model tries to fit data in a very high-dimensional space (and there is not enough training data). In my mind, I believe that this curse ...

lalaland

247

asked Sep 15, 2022 at 0:55

3 votes

0 answers

89 views

Does selecting confounder variables for a model with multiple correlation tests risk biasing results (similar to forward selection)?

My team is conducting a counterfactual difference-in-differences (DiD) healthcare analysis to estimate the benefits of home nursing visits compared with a control group. We've "pre-selected" ...

RobertF

6,644

asked Sep 14, 2022 at 20:08

2 votes

1 answer

255 views

Why does validation accuracy start to increase after overfitting?

I'm training a model on a small dataset of images. The following are the curves of accuracy, F1 score, and AUC score. It's clear that the model is overfitting; however, I don't understand why after some time ...

Ines

31

asked Sep 4, 2022 at 14:53

2 votes

0 answers

211 views

Why does the test loss decrease even when the training loss and the validation loss increase?

I was trying out different regression models to fit a time series. Models include a multiple linear regression model, ReLU regression models (with varying numbers of ReLU functions) and sigmoid ...

Jack

71

asked Aug 30, 2022 at 16:51

2 votes

1 answer

275 views

Example of KNN overfitting with k=1

I know that with k=1, KNN leads to overfitting. This is because it follows the noisy data of the training sample and does not generalize well to new input samples. But I am confused about how this happens, I ...

DYLAN NICO AMBROSI

21

asked Aug 28, 2022 at 14:14

1 vote

2 answers

336 views

Can SVM overfit even with cross-validation?

I am using SVM regressor models to fit some chemical data related to spectroscopy (I cannot say exactly what data because it is ongoing research in my group). To combat overfitting, I have used 5-...

S R Maiti

163

asked Aug 25, 2022 at 21:16

1 vote

0 answers

48 views

Feature selection based on production data

I have a classifier (one/zero labels) that was trained and hypertuned by the book. When the model was ready, I applied it to the production data: real-time and unlabeled. After a short period (a few ...

Amit S

77

asked Aug 25, 2022 at 11:12

1 vote

0 answers

79 views

Overfitting is reduced but loss is worsened?

Consider the two pairs of learning curves below. The red and green lines are the training and validation curves of some model 1, and the gray and orange lines are the training and validation curves of ...

Tfovid

815

asked Aug 23, 2022 at 13:25

1 vote

1 answer

289 views

Fluctuations in both the accuracies and losses in training and validation of Deep learning MLP

I have a binary classification problem with a dataset of N=430 samples and 146 predictors. Both the validation and training accuracies, along with the losses, fluctuate. What could be the reason, and can you suggest a solution?

Asif Munir

11

asked Aug 19, 2022 at 18:07

2 votes

1 answer

187 views

Overfitting using Random Forest - classification [duplicate]

I have a dataframe which is made of many datasets combined together (many datasets with the same predictive features but with different samples combined together). This dataframe, called ...

Programming Noob

763

asked Aug 14, 2022 at 13:59

3 votes

2 answers

3k views

Can regularization harm more than help in the situation of a huge overfit?

I fit a regression model on a data set and get some in-sample RMSE. I wanted to know how likely it is that I get an RMSE this good (or even better) under the assumption that there are no patterns in the data. ...

Roman

774

asked Aug 8, 2022 at 10:32

6 votes

1 answer

786 views

XGBoost when P>>N

Someone built an XGBoost classification model using each pixel in an image (256*256) as a separate feature, plus a few other features. However they only have 500 data points. The target classes were ...

Alex

185

asked Aug 7, 2022 at 17:51

0 votes

2 answers

285 views

Overfitting of random forest in R

I am running a random forest classifier in R and during 10-fold cross-validation, I discovered that the model is overfitting. I am using a grid search to find the best hyperparameters and used the ...

cassandra star

1

asked Jul 28, 2022 at 21:41

1 vote

3 answers

166 views

What is overfitting when building a model?

What exactly is overfitting when building models?

SR1

31

asked Jul 26, 2022 at 23:54

1 vote

0 answers

93 views

Overfitting with Non Negative Least Squares

I'm trying to reconstruct a function, $A(x)$, from the results of some detectors. Essentially, I have a set of $n$ points which are $V_{i} = \int_{-\infty}^{\infty} A(x) e^{-(x - v_{i})^{2}} \, dx$ ...

user1150512

111

asked Jul 22, 2022 at 14:53

4 votes

1 answer

3k views

Is it possible to have a higher train error than a test error in machine learning?

Usually it is called over-fitting when the test error is higher than the training error. Does that imply that it is called under-fitting when the training error is higher than the test error? Also ...

Just a stat student

73

asked Jul 20, 2022 at 18:12

0 votes

0 answers

18 views

Emotion classifier: overfitting the training dataset [duplicate]

I'm working on a binary classification model over the RAVDESS dataset with a CNN model. These are the performances on the train and validation set and these are the performance on the test set for ...

Damiano Imola

66

asked Jul 20, 2022 at 13:45

2 votes

0 answers

96 views

An aggressive overfitting situation

I gather RNA-seq transcriptomic data from multiple cancer datasets. The datasets are about a treatment of cancer, we check Response vs NoResponse samples. The RNA-seq data I gather is before the ...

Programming Noob

763

asked Jul 20, 2022 at 9:09

1 vote

1 answer

358 views

How to judge the neural network training stage with double descent?

In https://arxiv.org/pdf/1908.05355.pdf, double descent is mentioned, in which the training loss decreases, increases, and then decreases again. And the important point ...

Mark

171

asked Jul 17, 2022 at 2:57

3 votes

2 answers

658 views

Statistical approaches to detect overfitting in simple models

I read here that there are statistical approaches to assess whether a tractable machine learning model (e.g., a linear regression model) overfits a dataset: Simpler models that have originated in ...

Rafs

453

asked Jul 8, 2022 at 15:45

0 votes

0 answers

292 views

What do flattening learning curves indicate and when to stop training of a ML model in that case?

I am training CNNs for image segmentation on a limited dataset and apply some on-the-fly data augmentation. I measure mean intersection over union (mean IoU) to evaluate the training and select models....

Manuel Popp

183

asked Jul 5, 2022 at 17:57

1 vote

1 answer

148 views

How to calculate the total number of inputs in CNN?

I have searched for this kind of question for a while and found many discussions about counting the number of parameters of a Convolutional Neural Network, but not the inputs. Using the Fashion MNIST ...

rodericktung

51

asked Jun 30, 2022 at 23:18

2 votes

1 answer

180 views

Is there a relationship between the number of mixture components and the overfitting of the model?

I read the following: "To prevent overfitting we would like to work with as few components as possible". How does the number of mixture components affect the fit of the model? Is that because ...

Maryam

1,720

asked Jun 30, 2022 at 10:33

1 vote

0 answers

91 views

Why isn't RandomSearchCV returning the optimum parameters for the XGBoost Model, and how can I avoid Overfitting?

I have a dataset for energy consumer customers and binary target variables with which I want to predict the churn for the customers. Counts of target values Not Churn 0: 14153 Churn 1: 1520 I have ...

Paul

31

asked Jun 19, 2022 at 3:07

1 vote

0 answers

49 views

State-of-the-art techniques for regularizing Neural Networks?

For regularizing neural networks, I'm familiar with dropout and L2/L1 regularization, which were the biggest players in the late 2010s. Have any significant/strong competitors risen up since then?

chausies

561

asked Jun 18, 2022 at 20:38

1 vote

0 answers

129 views

Underfitting and Overfitting at the same time?

I am using a Logistic Regression Classifier on the Airline Cancellation dataset. Please note that the training set was undersampled (in order to balance classes) while the test set was left as it was. ...

vincenzoconv99

11

asked Jun 17, 2022 at 17:16

2 votes

1 answer

177 views

Can a slightly overfitted model be useful for exploratory (i.e. hypotheses generating) modelling?

Let's say you have a set of potential explanatory variables (e.g. p = 8) that you think are important to explain your response variable ($Y$) but your sample is too small to include them all in the ...

Fanfoué

661

asked Jun 16, 2022 at 12:55

2 votes

0 answers

355 views

Is deep double descent important in practical contemporary CNNs?

Deep double descent is an empirically observed phenomenon that happens with contemporary neural networks. Its essence is that often, increasing the model complexity first leads to the test loss ...

CrabMan

172

asked Jun 7, 2022 at 16:57

2 votes

0 answers

86 views

How to tell model (Multiclass Classification using Logistic Regression) is overfitting?

I'm training a logistic classifier to classify 5 classes using scikit-learn. The data isn't extremely imbalanced (class 1: 27.7%, class 2: 19.4%, class 3: 17%, class 4: 19.6%, class 5: 16.2%). I'm ...

Zoe

21

asked Jun 2, 2022 at 0:11

1 vote

1 answer

120 views

Do I need to normalize data before applying L1, L2 norm in ANN?

I wish to train the ANN and use regularizers to avoid overfitting. I need some suggestions: is it mandatory to normalize the data before using L1 or L2 regularizers? I would highly appreciate it if you can ...

SiH

141

asked May 18, 2022 at 15:35

3 votes

1 answer

399 views

Matrix Factorization and Overfitting

I recently came across the Matrix Factorization algorithm for recommender systems. One of the tutorials I followed can be found here. According to it, given the initial matrix $R$ and the ...

RookieCookie

131

asked May 14, 2022 at 1:44

0 votes

1 answer

561 views

Low classification accuracy

I want to do multi-class classification with 6 classes. The whole dataset has 12750 samples and 56 features, so every class has 2125 samples. Before prediction I reduced the number of outliers by ...

jared

31

asked Apr 25, 2022 at 20:38

0 votes

0 answers

135 views

Effect of duplicate/redundant labels on performance of model

I am training a CNN to predict age, mass, and tone from images. The structure of my dataset is as follows ...

Sparsh Garg

1

asked Apr 14, 2022 at 19:11

13 votes

3 answers

3k views

If I use a regularization (e.g. L2) can I not apply early stopping?

I've seen that early stopping is a form of regularization that limits the movement of the parameters $\theta$ in a similar way that L2 Regularization penalizes the movement of $\theta$ to be closer to ...

wd violet

787

asked Apr 5, 2022 at 5:05

1 vote

2 answers

194 views

Is it possible to evaluate a given model without having access to its fit method?

I have a data set with one real-valued feature and a real-valued target. Someone has used this data set to fit a model (a regression). I get the result of this fit, which is a single function mapping ...

Roman

774

asked Mar 30, 2022 at 9:55

0 votes

0 answers

18 views

Regressor-based L2 penalty [duplicate]

I'm working on a multiple regression problem where I have reasons to believe some (if not all) of the regressors have been cherry picked/data mined to a varying degree. My hypotheses are that there's ...

stevew

841

asked Mar 29, 2022 at 4:58

8 votes

2 answers

792 views

PCA as a Cure for the Curse of Dimensionality

I would like some clarification as to how principal component analysis mitigates the Curse of Dimensionality problem. My particular interest is in curbing overfitting in my modelling, or more ...

Andrew Beaven

81

asked Mar 28, 2022 at 13:25

2 votes

2 answers

1k views

Why use regularization instead of feature selection for logistic regression? [duplicate]

For a non-linearly separable problem, when there are enough features, we can make the data linearly separable. It seems to me that for logistic regression, the reason for overfitting is always ...

Santi Du

23

asked Mar 28, 2022 at 0:32

4 votes

1 answer

505 views

Does higher variance in predictions result in higher variance error estimation?

Motivation: Everyone knows that fitting high variance models requires more data. A "yes" answer to the question would suggest that more data is also needed to evaluate these models. ...

chicxulub

1,645

asked Mar 21, 2022 at 1:46

1 vote

1 answer

674 views

Almost duplicate samples between train/test: overfitting?

I have been thinking about this for a few so I would like to hear some opinions. It could be complicated to explain, so I will update the question if there is something that is not clear. Imagine I ...

Sergiodiaz53

153

asked Mar 17, 2022 at 11:31
