
I am trying to understand the paper by Chapelle and Li, "An Empirical Evaluation of Thompson Sampling" (2011). In particular, I am unable to derive the equations in Algorithm 3 (page 6). The first equation looks like an NLL of $p(x|w) \, p(w)$, where the latter is the prior shown in the second line of the algorithm, modulo a constant term that does not depend on $w$. The question is: what is the likelihood? It is clearly not the canonical cross-entropy with $y_j \in \{0, 1\}$, but it looks almost like the cross-entropy reparametrized with $y_j \in \{-1, +1\}$. Is that right?
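For concreteness, here is a quick numerical sketch of the identity I suspect (the names `w`, `x`, `z` are just illustrative, not from the paper): the per-example loss $\log(1 + \mathrm{e}^{-y\,\boldsymbol{w}^\top\boldsymbol{x}})$ with $y \in \{-1, +1\}$ coincides with the canonical cross-entropy after mapping the labels to $\{0, 1\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
z = w @ x
p = 1.0 / (1.0 + np.exp(-z))   # sigmoid: P(y = +1 | x, w)

for y in (-1.0, +1.0):
    t = (1.0 + y) / 2.0        # map {-1, +1} -> {0, 1}
    loss_pm = np.log1p(np.exp(-y * z))                          # {-1, +1} form
    loss_ce = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))    # cross-entropy
    assert np.isclose(loss_pm, loss_ce)
```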

Furthermore, I don't understand the update step for $q_j$ in the last line. Using the Laplace approximation, $$q_j \leftarrow -\frac{\partial^2}{\partial w_j^2} \ln p(x, w),$$ and discarding correlations, I can derive something that comes close, but it is not the same: there are still some $y_j$ and other terms floating around.

Can someone tell me how to derive these equations?

Thanks a lot!


1 Answer


I found the solution (with the help of a friend: kudos!). The posterior is $$\begin{align*} -\log p(\boldsymbol{w}|\boldsymbol{x}) &= -\log p(\boldsymbol{x}|\boldsymbol{w}) - \log p(\boldsymbol{w}) + \text{const.} \\ &= \sum\limits_j \log \left( 1 + \exp(-y_j \boldsymbol{w}^\top \boldsymbol{x}_j) \right) + \sum\limits_i \frac{q_i (w_i - m_i)^2}{2} + \text{const.}' \end{align*}$$ where the constant terms do not depend on $\boldsymbol{w}$, and where the NLL is $$\begin{align*} -\log p(\boldsymbol{x}|\boldsymbol{w}) &= \sum_j \log \left[ \frac{1 + y_j}{2} \left( 1 + \mathrm{e}^{-\boldsymbol{w}^\top \boldsymbol{x}_j} \right) + \frac{1 - y_j}{2} \left( 1 + \mathrm{e}^{+\boldsymbol{w}^\top \boldsymbol{x}_j} \right) \right] \\ &= \sum\limits_j \log \left( 1 + \mathrm{e}^{-y_j\boldsymbol{w}^\top \boldsymbol{x}_j} \right) \end{align*}$$ for $y_j \in \{-1, +1\}$ (!): exactly one of the two prefactors $\frac{1 \pm y_j}{2}$ equals one and the other equals zero, so the sum inside the logarithm collapses to the corresponding factor.

Ignoring correlations, the Laplace approximation then yields $$\begin{align*} q_i &\leftarrow - \left. \frac{1}{p(\boldsymbol{\hat w} | \boldsymbol{x})} \frac{\partial^2 p(\boldsymbol{w} | \boldsymbol{x})}{\partial w_i^2} \right|_{\boldsymbol{w} = \boldsymbol{\hat w}} \\ &= \left. -\frac{\partial^2 \log p(\boldsymbol{w} | \boldsymbol{x})}{\partial w_i^2} \right|_{\boldsymbol{w} = \boldsymbol{\hat w}} \\ &= q_i + \sum\limits_j x_{ij}^2 \frac{1}{1 + \mathrm{e}^{-y_j \boldsymbol{\hat w}^\top \boldsymbol{x}_j}} \frac{1}{1 + \mathrm{e}^{+y_j \boldsymbol{\hat w}^\top \boldsymbol{x}_j}} \\ &= q_i + \sum\limits_j x_{ij}^2 \frac{1}{1 + \mathrm{e}^{-\boldsymbol{\hat w}^\top \boldsymbol{x}_j}} \frac{1}{1 + \mathrm{e}^{+\boldsymbol{\hat w}^\top \boldsymbol{x}_j}} \\ &= q_i + \sum\limits_j x_{ij}^2 \, p_j (1-p_j) \end{align*}$$ with $x_{ij} = (\boldsymbol{x}_j)_i$ and the MAP estimate $$\boldsymbol{\hat w} = \underset{\boldsymbol{w}}{\operatorname{argmax}} \, p(\boldsymbol{w} | \boldsymbol{x}) \,,$$ i.e. the maximizer of the *posterior*, not of the plain likelihood; the prior term must be included, which is what produces the leading $q_i$ in the update. Two remarks on the derivation: the step from the first to the second line holds because the gradient of $p(\boldsymbol{w}|\boldsymbol{x})$ vanishes at the mode $\boldsymbol{\hat w}$, and the $y_j$ drop out between the third and fourth lines because the product $\left(1 + \mathrm{e}^{-z}\right)^{-1}\left(1 + \mathrm{e}^{+z}\right)^{-1}$ is unchanged when $z$ flips sign (and $y_j^2 = 1$).
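If you want to check the result numerically, here is a minimal sketch of the resulting update (Python with NumPy/SciPy; not the paper's reference code, and `X`, `y`, `m`, `q` are synthetic stand-ins for the quantities above). It computes the MAP estimate $\boldsymbol{\hat w}$, applies the closed-form $q_i$ update, and compares it against a finite-difference estimate of the Hessian diagonal of the negative log posterior:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.normal(size=(n, d))          # rows are the x_j
y = rng.choice([-1.0, 1.0], size=n)  # labels in {-1, +1}
m = np.zeros(d)                      # prior means m_i
q = np.ones(d)                       # prior precisions q_i

def neg_log_posterior(w):
    # sum_j log(1 + exp(-y_j w.x_j)) + sum_i q_i (w_i - m_i)^2 / 2
    return (np.sum(np.logaddexp(0.0, -y * (X @ w)))
            + 0.5 * np.sum(q * (w - m) ** 2))

# MAP estimate: w_hat = argmax_w p(w | x)
w_hat = minimize(neg_log_posterior, np.zeros(d), method="BFGS").x

# q_i <- q_i + sum_j x_ij^2 p_j (1 - p_j) with p_j = sigmoid(w_hat.x_j)
p = 1.0 / (1.0 + np.exp(-(X @ w_hat)))
q_new = q + (X ** 2).T @ (p * (1 - p))

# Sanity check: q_new should match the diagonal of the Hessian of the
# negative log posterior, estimated here by central finite differences.
eps = 1e-4
for i in range(d):
    e = np.zeros(d)
    e[i] = eps
    h_ii = (neg_log_posterior(w_hat + e) - 2 * neg_log_posterior(w_hat)
            + neg_log_posterior(w_hat - e)) / eps ** 2
    assert np.isclose(h_ii, q_new[i], rtol=1e-3)
```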
