Suppose I have a dataset of garbage items. Will a model perform better if I only label each item as biodegradable or non-biodegradable?
Or would it be better to label them as plastic, paper, cardboard, glass, or organic?
In fact, I also plan to increase the number of labels further; for example, the plastic label would be split into a large number of plastic wrapper brands, etc.
I think that having a large number of labels hurts computational performance in both training and evaluation. For example, Linear Discriminant Analysis will not reduce the dimensionality much, because its projection lives in the subspace spanned by the class centroids, which has up to K − 1 dimensions for K labels.
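To make the LDA point concrete: LDA's discriminant directions lie in the span of the between-class scatter matrix S_B, whose rank is at most K − 1 for K labels. The minimal NumPy sketch below uses synthetic Gaussian data (the feature dimension of 64, the 10 samples per class, and the label counts 2, 5 and 40 are arbitrary assumptions) and computes rank(S_B) as the number of labels grows.

```python
import numpy as np


def between_class_scatter_rank(X: np.ndarray, y: np.ndarray) -> int:
    """Rank of the LDA between-class scatter matrix S_B for data X with labels y."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    s_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        # Deviation of this class centroid from the overall mean, as a column vector.
        diff = (X_c.mean(axis=0) - overall_mean).reshape(-1, 1)
        # Each class adds a rank-1 term weighted by its sample count.
        s_b += X_c.shape[0] * (diff @ diff.T)
    return int(np.linalg.matrix_rank(s_b))


rng = np.random.default_rng(0)
N_FEATURES = 64         # assumed feature dimension
SAMPLES_PER_CLASS = 10  # assumed, balanced classes

ranks = {}
for n_classes in (2, 5, 40):
    y = np.repeat(np.arange(n_classes), SAMPLES_PER_CLASS)
    # Random class centers plus Gaussian noise around each center.
    centers = rng.normal(size=(n_classes, N_FEATURES))
    X = centers[y] + rng.normal(size=(y.size, N_FEATURES))
    ranks[n_classes] = between_class_scatter_rank(X, y)
    print(f"{n_classes} labels -> rank(S_B) = {ranks[n_classes]}")
```

The weighted centroid deviations always sum to zero, which is where the "− 1" comes from. scikit-learn's `LinearDiscriminantAnalysis` applies the same cap: `n_components` cannot exceed `min(n_features, n_classes - 1)`.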
A neural network would need a very wide softmax output layer (one unit per label), and I am pretty sure that would also require a wider architecture.
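For scale, a dense layer feeding a softmax over K labels from a d-dimensional hidden layer has d·K weights plus K biases, so its size grows linearly with the number of labels. A small sketch of that arithmetic, assuming a hypothetical 512-unit penultimate layer:

```python
def softmax_head_params(d_hidden: int, n_classes: int) -> int:
    """Weights plus biases of a dense layer mapping d_hidden features to n_classes logits."""
    return d_hidden * n_classes + n_classes


D_HIDDEN = 512  # hypothetical width of the layer feeding the softmax

for k in (2, 5, 1000):
    print(f"{k} labels -> {softmax_head_params(D_HIDDEN, k):,} output-layer parameters")
```

With d = 512 this gives 1,026 parameters for the binary labeling, 2,565 for five materials, and 513,000 for a hypothetical 1,000 fine-grained labels.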
Aside from model issues, I could easily suffer from class imbalance. Is there any merit to having many labels?
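One way to see how label granularity interacts with class imbalance is to count items per label at each level and compare the largest-to-smallest class ratio. In the sketch below the counts and the brand names are made up purely for illustration, and treating paper, cardboard and organic waste as the biodegradable group is an assumption of the example.

```python
from collections import Counter

# Hypothetical item counts per fine-grained label, for illustration only.
FINE_COUNTS = Counter({
    "plastic/brand_a_wrapper": 400,
    "plastic/brand_b_wrapper": 25,
    "paper": 300,
    "cardboard": 150,
    "glass": 80,
    "organic": 600,
})

# Map each fine label to its material (the part before "/").
TO_MATERIAL = {label: label.split("/")[0] for label in FINE_COUNTS}
BIODEGRADABLE = {"paper", "cardboard", "organic"}  # assumed grouping


def merge(counts, mapping):
    """Re-count items after mapping each label to a coarser label."""
    merged = Counter()
    for label, n in counts.items():
        merged[mapping[label]] += n
    return merged


def imbalance_ratio(counts):
    """Largest class count divided by smallest class count."""
    return max(counts.values()) / min(counts.values())


material_counts = merge(FINE_COUNTS, TO_MATERIAL)
binary_counts = merge(
    material_counts,
    {m: ("biodegradable" if m in BIODEGRADABLE else "non-biodegradable")
     for m in material_counts},
)

for name, counts in [("fine", FINE_COUNTS),
                     ("material", material_counts),
                     ("binary", binary_counts)]:
    print(f"{name}: {len(counts)} labels, imbalance ratio {imbalance_ratio(counts):.1f}")
```

With these made-up counts the ratio drops from 24 (fine labels) to 7.5 (materials) to about 2.1 (biodegradable vs. non-biodegradable), because splitting a label into brands creates small classes while merging labels pools them.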