I have a school project where we must train a CNN with our own architecture to classify marine mammals with a minimum accuracy of 0.82.
I have been trying a lot of things and different ways to optimize it.
To train the CNN I am following this "recipe" step by step: https://karpathy.github.io/2019/04/25/recipe/
As I said, it's schoolwork, so the dataset is given and cannot be tweaked; we only choose the training/validation split. I started with 80/20 and then changed to 75/25, but both seem to give the same results overall.
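The split is set up something like this (a minimal sketch; the directory layout, path, and seed are just placeholders):

```python
import tensorflow as tf

# Assumes one folder per class under "data/" -- path and seed are placeholders.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/",
    validation_split=0.2,   # 0.25 for the 75/25 variant
    subset="training",
    seed=42,                # same seed for both subsets keeps the split consistent
    image_size=(128, 128),
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(128, 128),
    batch_size=32,
)
```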
Here is the best model I could train in a few days:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ input_layer_9 (InputLayer) │ (None, 128, 128, 3) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_64 (Conv2D) │ (None, 126, 126, 32) │ 896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_65 (Conv2D) │ (None, 126, 126, 32) │ 9,248 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_64 │ (None, 126, 126, 32) │ 128 │
│ (BatchNormalization) │ │ │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_36 (MaxPooling2D) │ (None, 63, 63, 32) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_41 (Dropout) │ (None, 63, 63, 32) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_66 (Conv2D) │ (None, 63, 63, 64) │ 18,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_67 (Conv2D) │ (None, 63, 63, 64) │ 36,928 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_65 │ (None, 63, 63, 64) │ 256 │
│ (BatchNormalization) │ │ │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_37 (MaxPooling2D) │ (None, 31, 31, 64) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_42 (Dropout) │ (None, 31, 31, 64) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_68 (Conv2D) │ (None, 31, 31, 128) │ 73,856 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_66 │ (None, 31, 31, 128) │ 512 │
│ (BatchNormalization) │ │ │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_69 (Conv2D) │ (None, 31, 31, 128) │ 147,584 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_67 │ (None, 31, 31, 128) │ 512 │
│ (BatchNormalization) │ │ │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ activation_16 (Activation) │ (None, 31, 31, 128) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_38 (MaxPooling2D) │ (None, 15, 15, 128) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_43 (Dropout) │ (None, 15, 15, 128) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_70 (Conv2D) │ (None, 15, 15, 256) │ 295,168 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_68 │ (None, 15, 15, 256) │ 1,024 │
│ (BatchNormalization) │ │ │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_39 (MaxPooling2D) │ (None, 7, 7, 256) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_44 (Dropout) │ (None, 7, 7, 256) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten_9 (Flatten) │ (None, 12544) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_27 (Dense) │ (None, 256) │ 3,211,520 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_45 (Dropout) │ (None, 256) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_28 (Dense) │ (None, 256) │ 65,792 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_46 (Dropout) │ (None, 256) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_29 (Dense) │ (None, 6) │ 1,542 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ activation_17 (Activation) │ (None, 6) │ 0 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 11,587,956 (44.20 MB)
Trainable params: 3,862,246 (14.73 MB)
Non-trainable params: 1,216 (4.75 KB)
Optimizer params: 7,724,494 (29.47 MB)
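For reference, a Keras sketch that roughly reproduces this summary (kernel sizes, padding, activations, dropout rates, and the L2 strength are assumptions; the summary only fixes shapes and parameter counts):

```python
import tensorflow as tf
from tensorflow.keras import layers

reg = tf.keras.regularizers.l2(1e-4)  # placeholder L2 strength

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    # Block 1 -- first conv uses "valid" padding (128 -> 126)
    layers.Conv2D(32, 3, padding="valid", activation="relu", kernel_regularizer=reg),
    layers.Conv2D(32, 3, padding="same", activation="relu", kernel_regularizer=reg),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Block 2
    layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=reg),
    layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=reg),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Block 3 -- conv/BN pairs with one separate Activation, as in the summary
    layers.Conv2D(128, 3, padding="same", kernel_regularizer=reg),
    layers.BatchNormalization(),
    layers.Conv2D(128, 3, padding="same", kernel_regularizer=reg),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Block 4
    layers.Conv2D(256, 3, padding="same", activation="relu", kernel_regularizer=reg),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    # Classifier head -- Flatten gives 7 * 7 * 256 = 12544 features
    layers.Flatten(),
    layers.Dense(256, activation="relu", kernel_regularizer=reg),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu", kernel_regularizer=reg),
    layers.Dropout(0.5),
    layers.Dense(6),
    layers.Activation("softmax"),
])
```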
I use data augmentation, batch normalization, dropout, and L2 regularization, but I still have an overfitting problem. When I remove the last block of my CNN, it can't learn enough (I use early stopping on accuracy with patience=5).
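The training setup is roughly this (a sketch; the monitored metric and the exact augmentation ops are assumptions, only patience=5 is fixed; `model`, `train_ds`, and `val_ds` come from the sketches above):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stop when val accuracy stops improving for 5 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=5,
    restore_best_weights=True,
)

# Typical augmentation ops -- the real set may differ.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

history = model.fit(
    train_ds.map(lambda x, y: (data_augmentation(x, training=True), y)),
    validation_data=val_ds,
    epochs=100,
    callbacks=[early_stop],
)
```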
My train accuracy goes up to 0.88, but my val accuracy stays around 0.60 with a lot of noise, varying by about ±10%.
My train loss goes steadily down to 0.5, while my val loss starts climbing after about 20 epochs, with the same variation as my accuracy.
My question is: which hyperparameters should I tweak to correct this? Should I start a new CNN architecture with a maximum of 3 blocks and smaller layers? Should I change my image dimensions? Should I give up?
PS: I train with a batch size of 32, and I use half of my dataset for tuning before training on it fully once I find a good enough model.
(6 classes; 3,200 images for training and 800 for validation)
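The half-dataset tweak loop looks roughly like this (a sketch, assuming the batched tf.data pipeline from the split above):

```python
# train_ds is batched, so cardinality counts batches; half the batches
# is half the data for a shuffled dataset.
half = train_ds.cardinality().numpy() // 2
small_train_ds = train_ds.take(half)  # used for quick hyperparameter sweeps
# ...then refit the best candidate on the full train_ds.
```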