Suppose user $i$ has a purchase history over $J$ products, $q_i^1,\ldots,q_i^J$. In addition, the values of $K$ user characteristics (gender, for example), $x_i^1,\ldots,x_i^K$, are known. I want to build a Bayesian regression model and use it to infer a probability distribution across products for any user.
I would like to use it in the following manner: if no purchase history is known (only user characteristics), the model should return some average distribution. As more purchase history accumulates, the distribution should become more and more user-specific.
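To make the desired behavior concrete, here is a minimal toy sketch in Python. It assumes a plain conjugate Dirichlet update, which is not necessarily what a regression model would give me. The function name `predictive_distribution` and all numbers (`prior_mean`, `concentration`, the counts) are made up for illustration.

```python
import numpy as np

def predictive_distribution(prior_mean, concentration, counts):
    """Predictive product probabilities after a conjugate Dirichlet update.

    prior_mean:    average distribution over products (sums to 1), hypothetical
    concentration: prior strength in pseudo-counts, hypothetical
    counts:        observed purchase counts per product for one user
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    counts = np.asarray(counts, dtype=float)
    alpha = concentration * prior_mean           # Dirichlet prior parameters
    return (alpha + counts) / (alpha.sum() + counts.sum())

prior_mean = np.array([0.5, 0.3, 0.2])  # hypothetical average distribution
concentration = 4.0                     # hypothetical prior strength

# Same prior, increasing amounts of history for the third product.
no_history = predictive_distribution(prior_mean, concentration, [0, 0, 0])
one_purchase = predictive_distribution(prior_mean, concentration, [0, 0, 1])
hundred_purchases = predictive_distribution(prior_mean, concentration, [0, 0, 100])

for name, dist in [("no history", no_history),
                   ("one purchase", one_purchase),
                   ("hundred purchases", hundred_purchases)]:
    print(name, np.round(dist, 3))
```

With no history the result equals the average distribution. A single purchase shifts it only slightly, and one hundred purchases of the same product dominate the prior almost completely. This is the kind of gradual shift from average to user-specific that I am after.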
I have started with a Dirichlet-multinomial framework:
$$ \begin{aligned} \text{concentration}&\sim\Gamma(2, 0.5)\\ \beta_j^k&\sim N(0, 1)\\ \text{purchase}_i&\sim\text{DirichletMultinomial}(q_i, p_i\cdot\text{concentration}) \end{aligned} $$
where the first two lines define the prior distributions and the last line defines the likelihood, with $q_i$ the total purchase quantity of user $i$ and $p_i=(p_i^1,\ldots,p_i^J)$ the sampling probability vector. The user-specific characteristics control these probabilities through $p_i^j=\text{softmax}(\sum_kx_i^k\beta_j^k)$, where the softmax is taken across the $J$ products.
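For concreteness, the generative process above can be sketched as a forward simulation in NumPy. This is only an illustration under stated assumptions, not my actual estimation code. I assume $\Gamma(2, 0.5)$ means shape 2 and rate 0.5. The sizes `I`, `J`, `K` and the simulated `x` and `q` are hypothetical. The Dirichlet-multinomial draw is written as its usual two-step form: a Dirichlet draw followed by a multinomial draw.

```python
import numpy as np

rng = np.random.default_rng(0)

I, J, K = 5, 3, 2  # hypothetical numbers of users, products, characteristics

def softmax(z):
    """Row-wise softmax across products, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(I, K))       # hypothetical user characteristics x_i^k
q = rng.integers(1, 20, size=I)   # hypothetical total purchase quantities q_i

# Priors. Assumption: Gamma(2, 0.5) is shape 2 and rate 0.5,
# so NumPy's scale parameter is 1 / 0.5.
concentration = rng.gamma(shape=2.0, scale=1.0 / 0.5)
beta = rng.normal(0.0, 1.0, size=(K, J))   # beta[k, j] plays the role of beta_j^k

# p[i, j] = softmax over j of sum_k x[i, k] * beta[k, j]
p = softmax(x @ beta)

# Dirichlet-multinomial draw per user: user-level probabilities, then counts.
theta = np.vstack([rng.dirichlet(p[i] * concentration) for i in range(I)])
purchases = np.vstack([rng.multinomial(q[i], theta[i]) for i in range(I)])

print(purchases)
```

Each row of `purchases` is one user's simulated count vector and sums to that user's `q[i]`. Note that `theta[i]` differs between users even when their `p[i]` are identical. The `concentration` parameter controls how far `theta[i]` can stray from `p[i]`.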
After estimating the model coefficients, I could sample from the posterior to get a predicted purchase distribution given some user information $x_i$. But that would give an average purchase distribution for a specific set of $x_i$. What I don't understand is how to use the model to get a distribution that also depends on the user-specific purchase history. I would expect different distributions for a customer with one purchase and a customer with one hundred purchases, even though they might have the same values of $x_i$.
Is this a conceptual misunderstanding on my side, or do I need a different model setup?