This question is about the CamVid dataset and the SegNet tutorial that uses it: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html
I'm confused about how the model is supposed to be trained on 11 classes when the ground-truth segmentation contains 12 classes, counting the void class, which is labelled 0. How can the network predict the 11 classes correctly if the labels it is trained on contain 12?
Also, how does the network know that it shouldn't predict the void class 0, and how is it ensured that void pixels neither increase nor decrease the loss?
I am guessing that the loss calculation involves a per-class weighting, i.e. the void class has a weight of 0 and every other class has a weight of something like 1.0. Could anyone confirm this?
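To make the guess concrete, here is a minimal sketch of the kind of weighting I have in mind. It is not taken from the tutorial: the function name `weighted_cross_entropy`, the toy 3-class setup, and the choice to normalise by the sum of the weights are all my own assumptions, written in plain Python for illustration.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted cross-entropy averaged over the non-void pixels.

    probs: one list of class probabilities per pixel.
    labels: one integer ground-truth class id per pixel.
    class_weights: one weight per class; a weight of 0 drops that class.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        if w == 0.0:
            # Void pixel: skipped, so it adds nothing to the loss.
            continue
        total += -w * math.log(p[y])
        weight_sum += w
    # Assumption: normalise by the total weight of the pixels that counted.
    return total / weight_sum if weight_sum > 0 else 0.0

# Hypothetical toy setup: class 0 is void, classes 1 and 2 are real classes.
weights = [0.0, 1.0, 1.0]
labels = [0, 1, 2]

# The two predictions differ only on the first (void) pixel.
probs_a = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]
probs_b = [[0.1, 0.1, 0.8], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]

loss_a = weighted_cross_entropy(probs_a, labels, weights)
loss_b = weighted_cross_entropy(probs_b, labels, weights)
print(loss_a == loss_b)  # the void pixel does not affect the loss
```

If this is roughly what happens, then whatever the network outputs on a void pixel would leave the loss unchanged, and no gradient would come from that pixel. Is that what the tutorial's setup does?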