Manual selection of parameters and features and bad results by gridsearch

Question

I have a very small dataset. When I tune the hyperparameters with grid search, the training and test results are unacceptable and differ hugely from each other, so I end up selecting the parameters manually. I also run feature selection separately for each model with RFE, and I choose the number of features manually as well. Is there a faster way to find both suitable hyperparameters and the number of features?

The dataset has 65 samples and 20 features, which I selected with RFE. It is a regression problem.

Please edit the question to say more about your data: the type of outcome, the number of observations, and the number of parameters. There are limits on how far you can push a small data set.
– EdM, Commented Jun 11, 2024 at 11:33
I would look at Boruta for feature importance. You could also eliminate one feature at a time and watch how variable importance evolves as the reduction progresses. github.com/scikit-learn-contrib/boruta_py
– EngrStudent, Commented Jun 11, 2024 at 15:28

EdM · Accepted Answer · 2024-06-11 15:01:55Z

Intelligent application of your understanding of the subject matter is usually superior to hoping that some automated system will give you the best model. With only 65 samples, you can probably fit only 4 or 5 unpenalized features in a linear regression model without overfitting (a common rule of thumb is roughly 15 observations per estimated coefficient). Based on your understanding of the subject matter, you might select those features individually, or you might find a way to combine related features into a smaller number. Or you might use a method like principal component regression to turn your 20 individual features into a smaller number of linear combinations. Or you might use a penalized method like ridge regression or lasso, which shrinks the coefficients to limit the overfitting that comes from having too few observations for the number of features that you want to use.
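As a rough illustration of the last two options, here is a minimal scikit-learn sketch comparing ridge, lasso, and principal component regression by repeated cross-validation. It is not tuned to your data: the synthetic `make_regression` data set merely stands in for 65 samples with 20 features, and the penalty grid, the choice of 4 components, and the 5x5 repeated folds are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 65 samples, 20 features.
X, y = make_regression(n_samples=65, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Scaling lives inside each pipeline so it is re-fit on every training fold.
models = {
    # Ridge: penalty strength chosen by RidgeCV's internal cross-validation.
    "ridge": make_pipeline(StandardScaler(),
                           RidgeCV(alphas=np.logspace(-3, 3, 25))),
    # Lasso: penalty chosen by an inner 5-fold CV; can zero out coefficients.
    "lasso": make_pipeline(StandardScaler(),
                           LassoCV(cv=5, max_iter=10000, random_state=0)),
    # Principal component regression: 4 components (illustrative choice),
    # followed by ordinary least squares.
    "pcr_4": make_pipeline(StandardScaler(), PCA(n_components=4),
                           LinearRegression()),
}

# Outer resampling: every modelling step above is repeated within each fold.
outer_cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=outer_cv,
                             scoring="neg_root_mean_squared_error")
    results[name] = (-scores.mean(), scores.std())
    print(f"{name}: RMSE {results[name][0]:.2f} (sd {results[name][1]:.2f})")
```

Because the penalty is chosen inside each training fold, the outer scores are not flattered by the tuning step. The same idea applies to any feature selection you do: it should sit inside the pipeline rather than be run once on the full data beforehand.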

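For a guide to the issues involved in building regression models, consult a reference like Frank Harrell's Regression Modeling Strategies. Chapters 4 and 5 are particularly relevant to your question. Also, note that splitting a small data set into separate training and test sets isn't wise: it leaves too few cases for fitting and too few for a stable test estimate. Resampling methods such as the bootstrap or repeated cross-validation use all of the data for both.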
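To see why a single split is fragile at this sample size, the sketch below scores the same model on 50 different random 80/20 splits and compares the spread with repeated 5-fold cross-validation averaged per repeat. The synthetic data, the fixed ridge penalty `alpha=1.0`, and the split counts are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import (RepeatedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 65 samples, 20 features.
X, y = make_regression(n_samples=65, n_features=20, n_informative=5,
                       noise=30.0, random_state=0)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# One train/test split per seed: each test set holds only 13 samples.
split_r2 = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed)
    model.fit(X_tr, y_tr)
    split_r2.append(r2_score(y_te, model.predict(X_te)))
split_r2 = np.array(split_r2)

# Repeated 5-fold CV: every sample is tested once per repeat.
n_repeats = 20
cv = RepeatedKFold(n_splits=5, n_repeats=n_repeats, random_state=0)
cv_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
# Folds are generated repeat by repeat, so each row here is one full repeat.
per_repeat = cv_r2.reshape(n_repeats, 5).mean(axis=1)

print(f"single splits : R2 from {split_r2.min():.2f} to {split_r2.max():.2f}")
print(f"repeated 5-fold: R2 from {per_repeat.min():.2f} to {per_repeat.max():.2f}")
```

With so few test cases, the single-split scores typically swing far more than the per-repeat averages, which is one way a "huge difference" between training and test results can arise from the split alone.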
