
I have searched for this kind of question for a while, and I find many discussions about counting the number of parameters of a Convolutional Neural Network, but not about the inputs. Using the Fashion MNIST dataset as an example, each black-and-white image has $28 \times 28 \times 1$ pixels and there are 60,000 images in the training dataset. Does that mean we have a total of $28 \times 28 \times 60,000 = 47,040,000$ inputs for the input layer of the CNN?

My partner criticizes my baseline/simplest CNN model (for demo purposes), which has just one convolutional layer with 10 filters/kernels (kernel size $3 \times 3$), with padding and a stride of 1. The Keras model information is listed below. He says the training set only has a sample size of 60,000, but the model has 78,510 parameters. He is concerned about overfitting because I have more parameters than inputs.

[Keras model summary image: Total params: 78,510]
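For reference, here is a minimal Keras sketch of the model (reconstructed under the assumption of a single same-padded Conv2D layer followed by Flatten and a 10-class softmax Dense output, which reproduces the 78,510 parameter total):

```python
# Minimal sketch of the model described above (assumed architecture).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),          # one 28x28 grayscale image
    layers.Conv2D(10, (3, 3), padding="same",
                  strides=1, activation="relu"),
    layers.Flatten(),                        # 28 * 28 * 10 = 7,840 values per image
    layers.Dense(10, activation="softmax"),  # 10 Fashion MNIST classes
])
model.summary()                              # reports Total params: 78,510
```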

I really don't know how to explain to him clearly that the inputs of a CNN are the pixels of a single image. Could anyone help? A more detailed explanation would be very helpful, and I am also happy to learn!

A similar question can be found here: How many parameters can your model possibly have?


1 Answer


Assuming you are classifying images (and not pixels or image segments), your samples are the images and your features are the individual pixels. So the Fashion MNIST dataset has $60,000$ samples and each sample has $28 \times 28 \times 1 = 784$ features. It's the number of training samples compared to the number of parameters (which is partly determined by the number of features) that is relevant when considering the potential of a model to overfit.
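To make this concrete, here is a rough breakdown of where the 78,510 parameters come from, assuming (as the summary in the question suggests) a single same-padded Conv2D layer with 10 $3 \times 3$ filters followed by Flatten and a 10-class Dense output:

$$
\underbrace{(3 \times 3 \times 1) \times 10 + 10}_{\text{Conv2D: } 100} \;+\; \underbrace{(28 \times 28 \times 10) \times 10 + 10}_{\text{Dense: } 78{,}410} \;=\; 78{,}510
$$

None of these terms involves the 60,000 training samples: the parameter count depends only on the layer shapes (and hence on the 784 features per sample), which is why comparing it to the total number of pixels across the whole dataset is not meaningful.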

It is quite common for DL models to have more parameters than training samples, and DL models often do overfit. However, there are at least a couple of reasons why this is not necessarily a problem.

  1. There are several ways of controlling overfitting in DL models, such as L1/L2 regularisation of the weights and dropout. Using these means you can have more parameters than training samples without overfitting (a minimal Keras sketch follows this list).

  2. There's a phenomenon that has been observed in DL models called "deep double descent" (see Nakkiran et al.'s Deep Double Descent: Where Bigger Models and More Data Hurt, and their accompanying OpenAI blog post). When plotting test loss against model size, we first see the usual descent in the loss curve as increasing the number of parameters reduces bias, followed by an increase in test loss as the models start to overfit. So far this is as expected. But as the model size is increased even further, the test loss starts decreasing again.
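As a minimal, illustrative sketch of point 1 (the hyperparameter values here are hypothetical and not tuned for Fashion MNIST), L2 weight penalties and dropout can be added in Keras like this:

```python
# Sketch: adding L2 weight regularisation and dropout to the assumed model.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(10, (3, 3), padding="same", activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),  # penalise large weights
    layers.Flatten(),
    layers.Dropout(0.5),      # randomly zero activations during training
    layers.Dense(10, activation="softmax",
                 kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Both techniques constrain the effective capacity of the model without changing the raw parameter count.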

It's probably also worth noting that overfitting in itself is not a problem. It's common for a model to perform slightly better on the training data than on the test data, and we want the model with the lowest test loss, not the model with the smallest difference between training and test loss. Overfitting is only a problem if we haven't allowed for it (e.g. if we make decisions based on the training-data results, or if the model overfits the test data) and so select a sub-optimal model or incorrectly assess its ability to generalise.
