The dataset we are using consists of ~3000 images, split 60/40 into training/testing sets. For hyperparameter tuning we have used sklearn's GridSearchCV and RandomizedSearchCV, Bayesian optimization, and a Hyperband implementation. With all of these methods we get around 96% accuracy on training but only around 78% on testing, which suggests overfitting. We are using an SVC for binary classification (images containing a wind turbine vs. images without one), with sklearn's StratifiedKFold (n_splits=10) for cross-validation. We want to increase test accuracy as much as possible without changing the dataset size, the train/test split, or augmenting the data in any way.
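For reference, a minimal sketch of the setup described above: an SVC tuned with GridSearchCV over a 10-fold StratifiedKFold. The feature matrix, parameter grid, and dataset size here are placeholders, not our actual pipeline.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Placeholder features standing in for the image feature vectors (2 classes).
rng = np.random.RandomState(0)
X = rng.rand(300, 64)
y = rng.randint(0, 2, size=300)

# 60/40 stratified train/test split, as described.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# StratifiedKFold preserves the class ratio within each of the 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Illustrative grid only; our real search space is larger.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```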
Is there a better cross-validation method to use, one that still preserves the class ratios in each fold? Or any other suggestions for preventing overfitting?