In RL, the state is typically encoded as a tensor of board-shaped "channels" (planes), as in the original AlphaZero paper. These channels are usually one-hot encoded rather than binary encoded: for example, chess has 6 piece types per side, so instead of packing the type into a 3-bit code (since 2^3 = 8 > 6), people use a 6-bit vector with exactly one bit set. This seems wasteful, so there must be a deeper reason why it is done.
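For concreteness, here is a minimal NumPy sketch of the two encodings being compared. The board representation (an 8x8 integer array with -1 for empty squares) and the choice of code 7 for empty squares in the binary variant are my own illustrative assumptions, not from any particular paper:

```python
import numpy as np

NUM_TYPES = 6  # six piece types for one side (assumption for the sketch)

def one_hot_planes(board):
    """board: 8x8 int array, -1 = empty, 0..5 = piece type.
    Returns a (6, 8, 8) tensor with one binary plane per piece type."""
    planes = np.zeros((NUM_TYPES, 8, 8), dtype=np.float32)
    for t in range(NUM_TYPES):
        planes[t] = (board == t)  # 1 where a piece of type t sits
    return planes

def binary_planes(board):
    """Packs the piece index into 3 bit-planes (2^3 = 8 >= 6 types).
    Empty squares are mapped to code 7 here -- an arbitrary choice."""
    code = np.where(board < 0, 7, board)
    planes = np.stack([(code >> b) & 1 for b in range(3)])
    return planes.astype(np.float32)
```

The one-hot version uses 6 planes but keeps every piece type in its own channel; the binary version uses only 3 planes, at the cost of smearing each type across several shared bit-planes.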
  • Mechanical sympathy. With one-hot encoding you can optimize the model to predict probabilities directly (log-odds, etc.), take the argmax for a discrete prediction, and so on. It is much more meaningful and easier to train a model on. Consider what you would have to do if you encoded pieces as discrete integers: if there is no inherent ordering (i.e., confusing index 2 with index 3 is not a smaller error than confusing index 1 with index 5), then it is wrong to learn integers as if their magnitude carried meaning. Commented Sep 4 at 9:16
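The comment's point about false shared structure can be made concrete: with a one-hot input, a single linear layer gives every piece type its own independent feature vector (it is exactly an embedding lookup), whereas with a 3-bit binary code the types are forced to share bit-level features. A small NumPy sketch, where the weight shapes and feature width are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hot input: the weight matrix W gives each of the 6 piece types
# an independent 4-dim feature vector -- x @ W just selects a row.
W = rng.normal(size=(6, 4))

def features_one_hot(x):      # x: (6,) one-hot vector
    return x @ W

# Binary-coded input: only 3 bit-features exist, so a linear layer
# cannot assign the 6 types independent representations.
B = rng.normal(size=(3, 4))

def features_binary(t):       # t: integer piece index 0..5
    bits = np.array([(t >> b) & 1 for b in range(3)], dtype=float)
    return bits @ B
```

With the binary code, piece 3 (binary 011) necessarily gets the sum of the feature contributions of pieces 1 (001) and 2 (010); no choice of `B` can break that linear tie, which is exactly the spurious structure the comment warns about.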
