Let's say we have a regression problem in which we would like to predict some score $y_i$ for person $i$.
As predictors we have two categorical variables: country $c_i$ and gender $g_i$. Now suppose gender has three classes (male, female, miscellaneous), but we only have historical score data for males and females, and only for some of the existing countries. How do we make predictions for people whose gender is miscellaneous?
I have a (rather philosophical) idea that I have personally never seen applied before:
What if we replace the values male/female/miscellaneous with the average historical score of the corresponding class and, where we have no historical values for a class (here, miscellaneous), replace it with the average score over all instances? We could do the same for the country variable.
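To state the idea a bit more precisely, here is how I would write it in symbols. The notation is my own and only meant as a sketch: $n$ is the number of historical observations, $n_g$ is the number of those with gender $g$, and $\tilde g_i$ is the numerical value that replaces $g_i$.

$$
\tilde g_i =
\begin{cases}
\dfrac{1}{n_{g_i}} \sum\limits_{j \,:\, g_j = g_i} y_j & \text{if } n_{g_i} > 0, \\[2ex]
\dfrac{1}{n} \sum\limits_{j=1}^{n} y_j & \text{if } n_{g_i} = 0,
\end{cases}
$$

where both sums run over the historical observations only. The replacement $\tilde c_i$ for country would be defined in the same way.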
In other words, what is the effect of replacing the categorical variable itself with a numerical variable whose values are the historical average scores per category of the discarded categorical variable? It seems to me that this is an efficient trick for converting a categorical variable into a numerical one and for making better future predictions (since, intuitively speaking, future scores will depend more on the historical scores for that type of person/country than on the gender and country labels themselves).
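To make the question concrete, here is a minimal Python sketch of the replacement I have in mind. The toy data, the country codes, and the function names `fit_mean_encoding` and `mean_encode` are all made up for illustration; this is only one way the idea could be written down, not an established implementation.

```python
from collections import defaultdict
from statistics import mean


def fit_mean_encoding(categories, scores):
    """Return (per-category mean score, overall mean score) from historical data."""
    grouped = defaultdict(list)
    for category, score in zip(categories, scores):
        grouped[category].append(score)
    category_means = {c: mean(values) for c, values in grouped.items()}
    return category_means, mean(scores)


def mean_encode(categories, category_means, overall_mean):
    """Replace each category by its historical mean score.

    Categories with no historical data fall back to the overall mean.
    """
    return [category_means.get(c, overall_mean) for c in categories]


# Hypothetical history: no "miscellaneous" gender and only two countries observed.
genders = ["male", "female", "male", "female"]
countries = ["NL", "NL", "BE", "BE"]
scores = [1.0, 3.0, 5.0, 7.0]

gender_means, overall = fit_mean_encoding(genders, scores)
country_means, _ = fit_mean_encoding(countries, scores)

# Two new people: one with an unseen gender, one from an unseen country.
gender_enc = mean_encode(["miscellaneous", "male"], gender_means, overall)
country_enc = mean_encode(["NL", "DE"], country_means, overall)

print(gender_enc)   # [4.0, 3.0]
print(country_enc)  # [2.0, 4.0]
```

Here the unseen gender and the unseen country both receive the overall mean of 4.0, while the observed categories receive their own historical averages. The two encoded columns would then be fed to the regression in place of the original categorical variables.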