The dataset we are using consists of ~3000 images, split 60/40 into training/testing sets. For hyperparameter tuning we have used sklearn's GridSearchCV and RandomizedSearchCV, Bayesian optimization, and a Hyperband implementation. With all of these methods we get around 96% accuracy on training but only around 78% on testing, which suggests overfitting. We are using an SVC for binary classification (images containing a wind turbine vs. images without one), with sklearn's StratifiedKFold (n_splits=10) for cross-validation. We want to increase test accuracy as much as possible without changing the dataset size, the train/test split, or augmenting the data in any way.
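For reference, a minimal sketch of the setup described above: an SVC tuned with GridSearchCV over a 10-fold StratifiedKFold. The feature matrix, parameter grid, and dataset size here are placeholders, not our actual pipeline.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Placeholder features standing in for the image feature vectors (2 classes).
rng = np.random.RandomState(0)
X = rng.rand(300, 64)
y = rng.randint(0, 2, size=300)

# 60/40 stratified train/test split, as described.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# StratifiedKFold preserves the class ratio within each of the 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Illustrative grid only; our real search space is larger.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```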
Is there a better cross-validation method to use, one that still preserves the class ratios in each fold? Or any other suggestions for preventing overfitting?