Our dataset consists of ~3000 images, split 60/40 into training and testing sets. We have tried sklearn's GridSearchCV and RandomizedSearchCV, Bayesian optimization, and a Hyperband implementation for hyperparameter tuning. With all of these methods we get around 96% accuracy on training and around 78% accuracy on testing. We want to increase test accuracy as much as possible without changing the dataset size, changing the partition split, or augmenting the data in any way. Overfitting is most likely occurring. For cross-validation we use sklearn's StratifiedKFold with n_splits=10. The classifier is an SVC, and there are two classes (pictures of wind turbines and pictures with no wind turbine).

Is there a better cross-validation method to use, ideally one that still preserves the class ratios in each fold? Or are there any other suggestions for preventing overfitting?
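For reference, the setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (not the actual images), the class ratio and the SVC settings are made up, and a `StandardScaler` step is added as a common default rather than something the question mentions. It checks that each `StratifiedKFold` fold keeps the overall class ratio and reports the gap between training and validation accuracy, which is one way to see how much of the overfitting is already visible inside cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for flattened image features (assumption, not the real data).
# The class weights are hypothetical and only mimic an imbalanced two-class problem.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10,
    weights=[0.25, 0.75], random_state=0,
)

# Same splitter as in the question; shuffling with a fixed seed is an added choice.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Fraction of class 1 in each validation fold; stratification keeps these
# close to the fraction in the full set.
fold_ratios = [y[val_idx].mean() for _, val_idx in cv.split(X, y)]

# Hypothetical hyperparameters; in practice these come from the tuning step.
model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="scale"))

scores = cross_validate(
    model, X, y, cv=cv, scoring="accuracy", return_train_score=True
)

train_mean = scores["train_score"].mean()
val_mean = scores["test_score"].mean()
print(f"class-1 ratio per fold: min={min(fold_ratios):.3f}, max={max(fold_ratios):.3f}")
print(f"mean train accuracy: {train_mean:.3f}")
print(f"mean validation accuracy: {val_mean:.3f}")
print(f"gap: {train_mean - val_mean:.3f}")
```

If the gap printed here is small while the gap to the held-out test set stays large, the difference is more likely to come from how the test set differs from the training set than from the choice of cross-validation splitter.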
  • Approximately how many of the images are of wind turbines vs. not? Commented Oct 28, 2022 at 21:41
  • You may consider using nested CV with stratified CV. scikit-learn.org/stable/auto_examples/model_selection/…, stats.stackexchange.com/questions/357926/… Commented Oct 28, 2022 at 21:44
  • Training set consists of 1820 WT/572 NWT. Testing set is split in half, 780 WT/780 NWT. Commented Oct 28, 2022 at 21:57
  • Ok, thanks. Are you attached to the idea of using SVC? Not related to the stratification, but you might consider using a FastAI classification model (see for example, docs.fast.ai/tutorial.medical_imaging.html). Or you could use skorch medium.datadriveninvestor.com/…. My guess is that you'll only get so much mileage with SVC, even with a ton of hyperparameter tuning. Commented Oct 28, 2022 at 22:23
  • Ya I'm required to stick with an SVC. I think I'm getting as good as I can get as well, especially since the images I'm given are at 5% resolution. Thank you! Commented Oct 30, 2022 at 0:39
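The nested CV with stratified folds suggested in the comments could look roughly like the sketch below. It is a minimal illustration, not the asker's pipeline: the data is synthetic, the parameter grid is hypothetical, and `balanced_accuracy` is chosen as the metric on the assumption that it suits a training set that is imbalanced while the test set is balanced. The inner loop tunes the SVC, and the outer loop estimates how well the whole tuning procedure generalizes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption, not the real data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

# Both loops are stratified, so every fold keeps the class ratio.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# make_pipeline names the SVC step "svc", hence the "svc__" prefixes below.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid, kept small so the sketch runs quickly.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

# Inner loop: hyperparameter search on each outer training split.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="balanced_accuracy")

# Outer loop: score the tuned model on data the search never saw.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")

print(f"nested CV balanced accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

Nested CV does not reduce overfitting by itself. What it gives is a less optimistic estimate of performance than the best score reported by a single grid search, which makes it easier to tell whether further tuning is actually helping.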
