$\begingroup$

What is the best cost function for training a neural network to perform ordinal regression, i.e. to predict a result whose value lies on an arbitrary scale where only the relative ordering of the values is significant? For example, predicting which product size a customer will order: 'small' (coded as 0), 'medium' (coded as 1), 'large' (coded as 2) or 'extra-large' (coded as 3). I'm trying to figure out whether there are better alternatives to quadratic loss (modeling the problem as a 'vanilla' regression) or cross-entropy loss (modeling the problem as classification).

$\endgroup$

1 Answer

$\begingroup$

Another approach was suggested for face age estimation in the paper "Ordinal Regression with Multiple Output CNN for Age Estimation".

The authors use a set of binary classifiers, one per threshold, each predicting whether a data point is larger than its threshold. In your case, the network would have three binary outputs corresponding to:

  • larger than 0
  • larger than 1
  • larger than 2.

For example, for 'large' (2) the ground truth would be [1 1 0]. The final cost function is a weighted sum of the individual cross-entropy cost functions, one for each binary classifier.
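Written out, the cost described above might look as follows. The notation is my own shorthand rather than the paper's: $K$ is the number of ordered classes (4 here), $y$ is the integer label, $t_k$ is the binary target for threshold $k$, $p_k$ is the network's output for that threshold, and $w_k$ is a per-threshold weight (all equal to 1 if no weighting is wanted).

$$
t_k = \mathbb{1}[\,y > k\,], \qquad k = 0, \dots, K-2
$$

$$
L = -\sum_{k=0}^{K-2} w_k \Big[\, t_k \log p_k + (1 - t_k) \log (1 - p_k) \,\Big]
$$

For $y = 2$ and $K = 4$ this gives $t = [1, 1, 0]$, matching the example above.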

This has the advantage of inherently weighting larger errors more, because more of the individual cross-entropy terms will be violated. Plain categorical classification of the ordered outcomes doesn't inherently have this feature.
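To make the "larger errors cost more" point concrete, here is a minimal NumPy sketch of the encoding and the loss. The function names (`encode_ordinal`, `ordinal_bce`, `confident_probs`) and the 0.9/0.1 "confident prediction" values are illustrative assumptions of mine, not something taken from the paper.

```python
import numpy as np

def encode_ordinal(labels, num_classes):
    """Map integer labels 0..K-1 to K-1 binary targets: target[k] = 1 if label > k."""
    labels = np.asarray(labels).reshape(-1, 1)
    thresholds = np.arange(num_classes - 1).reshape(1, -1)
    return (labels > thresholds).astype(np.float64)

def ordinal_bce(probs, targets, weights=None, eps=1e-12):
    """Weighted sum of per-threshold binary cross-entropies, averaged over the batch."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    if weights is None:
        weights = np.ones(probs.shape[1])  # assumption: equal weights by default
    per_term = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(per_term @ weights))

def confident_probs(label, num_classes=4):
    """Hypothetical network output: 0.9 where the target would be 1, 0.1 elsewhere."""
    return 0.1 + 0.8 * encode_ordinal([label], num_classes)

# True size is 'large' (2), so the target is [1, 1, 0].
target = encode_ordinal([2], num_classes=4)

loss_correct = ordinal_bce(confident_probs(2), target)    # predicts 'large'
loss_off_by_1 = ordinal_bce(confident_probs(3), target)   # predicts 'extra-large'
loss_off_by_2 = ordinal_bce(confident_probs(0), target)   # predicts 'small'
```

With these inputs, predicting 'extra-large' violates one threshold term and predicting 'small' violates two, so `loss_off_by_2 > loss_off_by_1 > loss_correct`.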

$\endgroup$
  • As an opportunity for further improvement, note that an output of [1,0,1] is possible but undesirable. See e.g. stats.stackexchange.com/a/494965/232706 for a proposed solution. Commented Sep 18, 2024 at 12:48
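For illustration only, here is one simple way to detect such inconsistent outputs and still decode a class from them, by counting how many thresholds are exceeded. This is a sketch under my own assumptions (the 0.5 cut-off and the names `decode_by_count` and `is_monotone` are hypothetical), and it is not necessarily the solution proposed in the linked answer.

```python
import numpy as np

def decode_by_count(probs, threshold=0.5):
    """Predicted class = number of 'larger than k' outputs above the cut-off."""
    probs = np.asarray(probs, dtype=np.float64)
    return (probs > threshold).sum(axis=-1)

def is_monotone(probs, threshold=0.5):
    """True if the thresholded bits never go from 0 back to 1 (e.g. [1, 1, 0])."""
    bits = np.asarray(probs, dtype=np.float64) > threshold
    return bool(np.all(bits[..., :-1] >= bits[..., 1:]))

consistent = [0.9, 0.8, 0.1]     # thresholds to [1, 1, 0]
inconsistent = [0.9, 0.2, 0.8]   # thresholds to [1, 0, 1]

print(decode_by_count(consistent), is_monotone(consistent))
print(decode_by_count(inconsistent), is_monotone(inconsistent))
```

Counting always yields a valid class index, but it only papers over the inconsistency at decoding time; it does not stop the network from producing non-monotone outputs in the first place.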
