$\begingroup$

This is with reference to the CamVid dataset and one of its tutorials, here: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

I'm quite confused by how the model is supposed to be trained on 11 classes when the ground truth segmentation has 12 classes, including the void class labelled 0. How is the network able to predict the 11 classes correctly if the labels it is trained on contain 12 classes?

Also, how does the network know it shouldn't predict the void class 0, and how is it ensured that pixels labelled void neither increase nor decrease the loss?

I am guessing there could be a weighting function involved in the loss calculation, i.e. the void class has a weight of 0 and the remaining classes have a weight of something like 1.0. Could anyone confirm this?

$\endgroup$

1 Answer

$\begingroup$

You should predict 11 channels and compute the loss over the 11 valid classes only. The void class (technically $1-void\_class$) is used to mask the loss, so pixels labelled void contribute nothing to it. I.e., you should split the 12-channel label you have into an 11-channel target and a 1-channel mask.
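
To make the split concrete, here is a minimal NumPy sketch of the idea. It is not the tutorial's actual implementation. It assumes the labels are one-hot encoded with the void class at channel 0 (as described in the question). The helper names `split_label` and `masked_cross_entropy` are made up for illustration.

```python
import numpy as np

def split_label(label_onehot, void_index=0):
    """Split an (H, W, 12) one-hot label into an (H, W, 11) target and an (H, W) mask.

    Assumes the void class sits at channel `void_index` (0 here, per the question).
    """
    void = label_onehot[..., void_index]
    target = np.delete(label_onehot, void_index, axis=-1)  # drop the void channel
    mask = 1.0 - void                                       # 1 for valid pixels, 0 for void
    return target, mask

def masked_cross_entropy(logits, target, mask):
    """Mean cross-entropy over valid pixels only; logits has shape (H, W, 11)."""
    # Numerically stable log-softmax over the 11 class channels.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    per_pixel = -(target * log_probs).sum(axis=-1)
    # Void pixels are multiplied by 0, so they add nothing to the loss.
    return (per_pixel * mask).sum() / max(mask.sum(), 1.0)

# Toy 2x3 label map: 0 is void, 1..11 are the valid classes.
labels = np.array([[0, 1, 2], [3, 0, 11]])
onehot = np.eye(12)[labels]                  # shape (2, 3, 12)
target, mask = split_label(onehot)           # shapes (2, 3, 11) and (2, 3)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 11))         # stand-in for the network output
loss = masked_cross_entropy(logits, target, mask)
```

Because the mask zeroes out the void pixels, whatever the network outputs at those locations has no effect on the loss, and so no gradient flows from them. Many frameworks offer the same behaviour through an "ignore label" option on the loss, which is equivalent to giving void pixels a weight of 0.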

$\endgroup$
