
I have a very small dataset. When I tune the hyperparameters with grid search, the training and test results are unacceptable and differ hugely, so I have to select the parameters manually! In addition, I do feature selection separately for each model with RFE, and I also choose the number of features manually. Is there a quicker way to find both the best hyperparameters and the number of features?

The dataset has 65 samples and 20 features, which I selected with RFE. The task is regression.
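Rather than tuning the two things separately, RFE and the model can be placed in a single scikit-learn `Pipeline`, and `GridSearchCV` can then search the number of retained features and the model's hyperparameters together. Scoring uses repeated k-fold cross-validation instead of one train/test split. This is a minimal sketch under stated assumptions: the data are synthetic (`make_regression` with the question's shape of 65 samples by 20 features), the model is `Ridge` because the question does not name one, and the grid values are arbitrary. The `best_score_` it reports is optimistically biased because the same folds picked the settings; wrapping the search in an outer cross-validation loop (nested CV) gives a fairer estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the question's data (assumption).
X, y = make_regression(n_samples=65, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # RFE ranks features by the coefficients of its own Ridge estimator.
    ("rfe", RFE(estimator=Ridge(alpha=1.0))),
    ("model", Ridge()),  # Ridge is an assumed example model
])

# Search the number of retained features and the penalty at the same time.
param_grid = {
    "rfe__n_features_to_select": [2, 3, 4, 5, 8, 12, 20],
    "model__alpha": np.logspace(-2, 2, 5),
}

# Repeated k-fold CV uses every sample for validation several times.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print("CV RMSE of best setting:", np.sqrt(-search.best_score_))
```

Because feature selection sits inside the pipeline, RFE is re-run on each training fold, so the held-out fold never influences which features are kept.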

  • Please edit the question to say more about your data: the type of outcome, the number of observations, and the number of parameters. There are limits on how far you can push a small data set. Commented Jun 11, 2024 at 11:33
  • I just did that @EDM Commented Jun 11, 2024 at 13:04
  • I would look at Boruta for feature importance. You could also eliminate features recursively, one at a time, and look at how variable importance evolves as the reduction progresses. github.com/scikit-learn-contrib/boruta_py Commented Jun 11, 2024 at 15:28

1 Answer


Intelligent application of your understanding of the subject matter is usually superior to hoping that some automated system will give you the best model. With only 65 samples, you can probably fit only 4 or 5 unpenalized features in a linear regression model without overfitting (a common rule of thumb is roughly 15 observations per estimated coefficient, and 65/15 ≈ 4). Based on your understanding of the subject matter, you might select those features individually, or you might find a way to combine related features into a smaller number. Or you might use a method like principal component regression to turn your 20 individual features into a smaller number of linear combinations. Or you might use a penalized method like ridge regression or lasso to avoid the overfitting that comes from having too few observations for the number of features you want to use.
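The alternatives above can be compared on an equal footing with nested cross-validation: each candidate tunes its own setting (number of components for principal component regression, penalty strength for ridge and lasso) in an inner loop, and an outer loop estimates prediction error on folds the tuning never saw. This is a minimal sketch assuming scikit-learn, synthetic data of the question's shape, and arbitrary grids; principal component regression is built here as standardization, then PCA, then ordinary least squares.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import (GridSearchCV, KFold, RepeatedKFold,
                                     cross_val_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the question's shape (assumption).
X, y = make_regression(n_samples=65, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

inner = KFold(n_splits=5, shuffle=True, random_state=1)          # tuning
outer = RepeatedKFold(n_splits=5, n_repeats=3, random_state=2)   # evaluation

# Principal component regression: tune how many components to keep.
pcr = GridSearchCV(
    make_pipeline(StandardScaler(), PCA(), LinearRegression()),
    {"pca__n_components": list(range(1, 11))},
    cv=inner, scoring="neg_mean_squared_error")

models = {
    "PCR": pcr,
    # RidgeCV picks alpha by efficient leave-one-out CV on each training fold.
    "ridge": make_pipeline(StandardScaler(),
                           RidgeCV(alphas=np.logspace(-3, 3, 13))),
    # LassoCV picks alpha by the inner k-fold CV on each training fold.
    "lasso": make_pipeline(StandardScaler(),
                           LassoCV(cv=inner, random_state=0)),
}

rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=outer,
                             scoring="neg_mean_squared_error")
    rmse[name] = np.sqrt(-scores.mean())
    print(f"{name}: nested CV RMSE = {rmse[name]:.2f}")
```

Penalized methods keep all 20 features but shrink their coefficients, so they sidestep the separate "how many features" search entirely.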

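For a guide to the issues involved in building regression models, consult a reference like Frank Harrell's Regression Modeling Strategies; Chapters 4 and 5 are particularly relevant to your question. Also, note that splitting a small data set into separate training and test sets isn't wise: with so few cases, performance on a single held-out split is very noisy, and resampling validation on the full data set (for example, bootstrapping or repeated cross-validation) makes better use of the data.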
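One resampling alternative to a train/test split is the optimism-corrected (Efron-Gong) bootstrap, the kind of validation Harrell's book discusses. Fit the full procedure to each bootstrap sample, measure how much better it scores on that sample than on the original data, and subtract the average of that "optimism" from the apparent performance. This is a minimal sketch assuming scikit-learn, synthetic data, a standardized `RidgeCV` as the example procedure, and R² as the metric. The key point is that every modeling step, including penalty tuning (and any feature selection, if used), is repeated inside each bootstrap sample.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the question's shape (assumption).
X, y = make_regression(n_samples=65, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# The whole modeling procedure, including the choice of penalty.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))


def optimism_corrected_r2(model, X, y, n_boot=200, seed=0):
    """Return (apparent R^2, optimism-corrected R^2) via the Efron-Gong bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = r2_score(y, clone(model).fit(X, y).predict(X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample rows with replacement
        fit = clone(model).fit(X[idx], y[idx])           # repeat every modeling step
        boot_r2 = r2_score(y[idx], fit.predict(X[idx]))  # score on the bootstrap sample
        orig_r2 = r2_score(y, fit.predict(X))            # score on the original data
        optimism.append(boot_r2 - orig_r2)
    return apparent, apparent - np.mean(optimism)


apparent, corrected = optimism_corrected_r2(model, X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

All 65 observations are used both to fit the final model and to estimate how well it will generalize, which is the advantage over holding out a fraction of an already small sample.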

