Basically, the question above: in RL, people typically encode the state as a tensor of planes with "channels", as in the original AlphaZero paper. These channels are typically one-hot encoded rather than binary encoded. By this I mean that if there are, say, 6 piece types in chess, instead of encoding each one as a 3-bit vector (since 2^3 = 8 > 6), people use a 6-bit vector with exactly one bit set at a time. This seems wasteful, so there must be a deeper reason why it is done.
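To make the two encodings concrete, here is a minimal sketch in NumPy of the contrast described above. The board size, piece ids, and function names are all illustrative assumptions, not taken from any particular AlphaZero implementation:

```python
import numpy as np

# Hypothetical toy setup: a 4x4 board whose cells hold a piece id in 0..5
# (6 piece types), or -1 for an empty square.
rng = np.random.default_rng(0)
board = rng.integers(-1, 6, size=(4, 4))

def one_hot_planes(board, n_types=6):
    """One plane per piece type: plane k is 1 exactly where piece k sits."""
    planes = np.zeros((n_types, *board.shape), dtype=np.float32)
    for k in range(n_types):
        planes[k] = (board == k)
    return planes

def binary_planes(board, n_bits=3):
    """Compact encoding: 3 planes hold the bits of the piece id
    (2^3 = 8 >= 7 values once 'empty' gets its own code).
    Here ids are shifted by 1 so that all-zero bits mean 'empty'."""
    ids = np.where(board >= 0, board + 1, 0)
    planes = np.zeros((n_bits, *board.shape), dtype=np.float32)
    for b in range(n_bits):
        planes[b] = (ids >> b) & 1
    return planes

oh = one_hot_planes(board)  # shape (6, 4, 4) -- one channel per piece type
bi = binary_planes(board)   # shape (3, 4, 4) -- bit-packed, fewer channels
```

The binary version does use fewer channels, which is the "wasteful" intuition in the question; the answers below address why the one-hot form is still preferred.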
Mechanical sympathy. Using one-hot encoding, you can optimize the model to predict probabilities directly (log-odds, etc.), and you can take the argmax for a discrete prediction. It's much more meaningful and easier to train a model for. Consider what you'd have to do if you encoded pieces as discrete integers. If there's no inherent ordering (i.e., confusing index 2 with index 3 is not a smaller error than confusing index 1 with index 5), then it's wrong to learn integers as if their magnitude had some inherent meaning. – arpad, Sep 4, 2025
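The comment's point can be sketched with made-up numbers. With one-hot targets, the model emits one score per class, a softmax turns the scores into probabilities, and argmax gives the discrete prediction; with integer-label regression, squared error imposes an ordering the piece ids do not have:

```python
import numpy as np

# One-hot targets: one logit per class, softmax -> probabilities, argmax -> class.
logits = np.array([0.2, 2.5, -1.0, 0.3, 0.1, -0.5])  # hypothetical model output
probs = np.exp(logits) / np.exp(logits).sum()
pred = int(np.argmax(probs))  # index of the most probable class

# Integer-label regression instead implies a spurious ordering: under squared
# error, predicting 3 for a true label 2 looks "closer" than predicting 5,
# even though piece ids carry no such order.
true_label = 2
err_near = (3 - true_label) ** 2  # small penalty
err_far = (5 - true_label) ** 2   # large penalty, for no principled reason
```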