I am confused about the following at a very fundamental level. Here is the problem:

I take observations $\vec{x}$ and create a histogram $\mathbf{n} = (n_1,\ldots,n_N)$ from them with $N$ bins. The number of entries in bin $i$ can be modelled as a Poisson random variable, i.e. $n_i \sim \mathrm{Pois}(\lambda_i)$ with parameter $\lambda_i = \mathbb{E}[n_i] = \mu s_i + b_i$, where $s_i$ and $b_i$ are the expected numbers of entries in bin $i$ coming from the distributions $f_s(x;\boldsymbol{\theta}_s)$ and $f_b(x;\boldsymbol{\theta}_b)$, respectively.

Now, I write the likelihood function $L(\mu,\boldsymbol{\theta}_s,\boldsymbol{\theta}_b;\mathbf{n})$ of this histogram as: $$ \begin{align*} L(\mu,\boldsymbol{\theta}_s,\boldsymbol{\theta}_b;\mathbf{n}) &= f(n_1|\lambda_1) \cdots f(n_N|\lambda_N) \\ &= \prod_{i=1}^{N} f(n_i|\lambda_i)\\ &= \prod_{i=1}^{N} \frac{\lambda_i^{n_i}}{n_i!}e^{-\lambda_i} \\ &= \prod_{i=1}^{N} \frac{(\mu s_i + b_i)^{n_i}}{n_i!}e^{-(\mu s_i + b_i)} \end{align*} $$

Does this mean that my data are $N$-dimensional (which is most probably not the case, but I still want to ask), or am I calculating the likelihood of $N$ i.i.d. samples?
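A minimal sketch of evaluating the log of the likelihood above, assuming the expected bin contents $s_i$ and $b_i$ have already been computed from $f_s$ and $f_b$ (so $\boldsymbol{\theta}_s$ and $\boldsymbol{\theta}_b$ are held fixed). The function name, the bin contents and the observed counts are hypothetical. Note that in this model each factor has its own rate $\lambda_i$, so the $n_i$ are treated as independent but not identically distributed, and the product is the probability of one observed vector $\mathbf{n}$.

```python
import math

def poisson_binned_loglik(mu, s, b, n):
    """log L(mu; n) = sum_i log Pois(n_i | mu*s_i + b_i) for a binned histogram.

    s and b are the expected signal and background contents of each bin, assumed
    to have been computed already from f_s and f_b (theta_s, theta_b held fixed).
    """
    ll = 0.0
    for s_i, b_i, n_i in zip(s, b, n):
        lam = mu * s_i + b_i  # expected count in bin i; must be > 0
        # log Pois(n | lam) = n log(lam) - lam - log(n!), with log(n!) = lgamma(n + 1)
        ll += n_i * math.log(lam) - lam - math.lgamma(n_i + 1)
    return ll

# Hypothetical bin contents and observed counts, N = 3 bins
s = [1.0, 4.0, 1.0]
b = [10.0, 10.0, 10.0]
n = [11, 14, 9]

# Crude maximum-likelihood estimate of mu by a grid scan over [0, 5]
grid = [k / 100 for k in range(501)]
mu_hat = max(grid, key=lambda mu: poisson_binned_loglik(mu, s, b, n))
print(mu_hat)
```

Working with the log-likelihood and `math.lgamma` avoids overflow in $\lambda_i^{n_i}$ and $n_i!$ when the counts are large.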
  • Because the total number of observations $M$ is fixed, no bin can contain more than $M$ values, whence a Poisson distribution is incorrect. It could approximately work, but why not use the correct multinomial probability? Commented May 24, 2024 at 13:49
  • @whuber Because $p_i$ (the probability of an entry falling into bin $i$) is small whereas $M$ is large. So can we not somehow apply the Poisson limit theorem here? The entries in each bin $i$ can be of only two types, coming from $f_s$ or from $f_b$, with probabilities adding to 1. So, applying the Poisson limit theorem to each bin, do we not get the same likelihood as above? Commented May 24, 2024 at 14:04
  • As I wrote, you can sometimes use this approximation. But why? In many respects it is more complicated to analyze than the fully correct multinomial likelihood. Commented May 24, 2024 at 14:20
  • Because $M$ is large and it would help with the computation. The data come in two flavours, s and b, and we know that the number of b entries is much larger than the number of s entries. So can we also somehow use a combination of both: multinomial for s (which is low) and Poisson for b (which is high)? Commented May 24, 2024 at 14:37
  • It's not at all apparent how the computation would be assisted by using a Poisson likelihood. The multinomial is no problem at all (one uses the log Gamma function, of course, to compute a log likelihood). Commented May 24, 2024 at 15:17
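The Poisson limit theorem mentioned in these comments applies to each bin's marginal count: with $M$ fixed and i.i.d. entries, $n_i \sim \mathrm{Binomial}(M, p_i)$ exactly, and Le Cam's inequality bounds the total variation distance between this and $\mathrm{Pois}(Mp_i)$ by $Mp_i^2$. The sketch below (standard library only; the function names are hypothetical) computes that distance numerically for two illustrative settings. It says nothing about the joint distribution: the $n_i$ stay dependent because they must sum to $M$.

```python
import math

def binom_logpmf(k, m, p):
    """log P(K = k) for K ~ Binomial(m, p), computed with lgamma for stability."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))

def pois_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def total_variation(m, p):
    """Total variation distance between Binomial(m, p) and Poisson(m * p)."""
    lam = m * p
    l1 = 0.0
    pois_mass = 0.0
    for k in range(m + 1):
        q = math.exp(pois_logpmf(k, lam))
        pois_mass += q
        l1 += abs(math.exp(binom_logpmf(k, m, p)) - q)
    # The binomial puts no mass above m, so the Poisson tail counts in full
    tail = max(0.0, 1.0 - pois_mass)
    return 0.5 * (l1 + tail)

# Large M with small p_i versus small M with a large p_i
print(total_variation(10000, 0.001))
print(total_variation(20, 0.25))
```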

1 Answer

I do not believe you intend to create a histogram out of a single observation $x$; rather, you need a sample $X_1, X_2, \ldots, X_M$ (reserve $N$ for the number of bins in your histogram). But for the sake of discussion you can set $M=1$.

Your sample space is indeed $M$-dimensional, such as $\mathbb{R}^M$ or some subset of it. You can define the histogram as a statistic taking values in $\mathbb{N}^N$, so it is $N$-dimensional. Even with $M=1$, the possible values of $\mathbf{n}$ are $(1,0,\ldots,0), (0,1,\ldots,0), \ldots, (0,0,\ldots,1)$, the $N$ standard basis vectors, which span all $N$ dimensions.

That said, I do not believe your likelihood is well defined: the distribution of $n_i$ is clearly not Poisson. Because the histogram is a statistic, its model has to be derived from the model for the original sample. First, each $n_i$ is bounded by $M$; second, the $n_i$ are not mutually independent, since they must sum to $M$. I believe the correct probability model for the histogram is multinomial, with bin probabilities given by integrating the density of the $X_j$ over each binning interval.
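The multinomial model raised in the comments can be sketched as follows: the histogram is computed as a statistic of the sample, and each bin probability $p_i$ is the integral of the density over that bin. Purely as a hypothetical stand-in for the question's signal-plus-background density, the sketch uses an exponential density with rate `rate`; the function names, bin edges and sample are also illustrative assumptions.

```python
import bisect
import math

def histogram(sample, edges):
    """Map a sample (a point in R^M) to its vector of bin counts (a point in N^N).

    edges must be sorted; bin i is the half-open interval [edges[i], edges[i+1]).
    """
    counts = [0] * (len(edges) - 1)
    for x in sample:
        i = bisect.bisect_right(edges, x) - 1
        if not 0 <= i < len(counts):
            raise ValueError(f"value {x} lies outside the binning range")
        counts[i] += 1
    return counts

def exponential_bin_probs(rate, edges):
    """p_i = integral of a hypothetical Exponential(rate) density over bin i."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t)
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]

def multinomial_loglik(counts, probs):
    """log P(n | M, p) = log M! - sum_i log n_i! + sum_i n_i log p_i."""
    m = sum(counts)
    ll = math.lgamma(m + 1)
    for n_i, p_i in zip(counts, probs):
        ll -= math.lgamma(n_i + 1)
        if n_i > 0:  # empty bins contribute p_i**0 = 1
            ll += n_i * math.log(p_i)
    return ll

edges = [0.0, 1.0, 2.0, 3.0, math.inf]  # hypothetical edges; last bin is open-ended
sample = [0.2, 1.5, 1.7, 2.9, 7.0]      # hypothetical sample, M = 5
counts = histogram(sample, edges)       # [1, 2, 1, 1]
print(multinomial_loglik(counts, exponential_bin_probs(0.7, edges)))
```

With a one-element sample the histogram is a single unit vector, and the counts always sum to $M$. Using `math.lgamma` for the factorials keeps the computation stable for large $M$.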
  • I intended to write $\vec{x}$ instead of $x$. You're right, the number of observations of course cannot be 1. Commented May 24, 2024 at 13:31
  • About the distribution of $n_i$, I am following this paper (page 4). Why would $n_i$ not be Poisson distributed? The bins are filled independently of each other and at a constant rate. Commented May 24, 2024 at 13:36
  • @Siddhartha do you not agree that the count in a bin is bounded by the total sample size, as I stated? Commented May 24, 2024 at 13:38
  • So you are saying that a multinomial distribution should be used because the finite number of samples breaks the 'independence' we have assumed for the $n_i$? What if $M$ is large? And how large should it be before we can model $n_i$ as Poisson? Commented May 24, 2024 at 13:49
  • @Sid There are approximations, but one would expect an accurate expression of the likelihood before defaulting to them. Commented May 31, 2024 at 13:23
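One way to connect the two likelihoods discussed in this thread: writing $\Lambda = \sum_i \lambda_i$ and $M = \sum_i n_i$, the product of independent Poisson terms factorizes exactly as
$$ \prod_{i=1}^{N} \frac{\lambda_i^{n_i}}{n_i!}e^{-\lambda_i} = \underbrace{\frac{\Lambda^{M}}{M!}e^{-\Lambda}}_{\text{Poisson for } M} \times \underbrace{\frac{M!}{n_1!\cdots n_N!}\prod_{i=1}^{N}\left(\frac{\lambda_i}{\Lambda}\right)^{n_i}}_{\text{multinomial given } M}. $$
So the Poisson likelihood is exact when the total number of entries is itself Poisson distributed (this is often called an extended likelihood), whereas if $M$ is fixed by design only the multinomial factor, with $p_i = \lambda_i/\Lambda$, is the likelihood. The sketch below checks the identity numerically; the function names and numbers are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def multinomial_logpmf(counts, probs):
    """log P(n | M, p) for a multinomial with M = sum(counts) trials."""
    m = sum(counts)
    return math.lgamma(m + 1) + sum(
        n * math.log(p) - math.lgamma(n + 1) for n, p in zip(counts, probs))

def split_poisson_loglik(mu, s, b, n):
    """Split the binned Poisson log-likelihood into a Poisson term for the total
    count M and a multinomial term for how the M entries fall into the bins."""
    lams = [mu * s_i + b_i for s_i, b_i in zip(s, b)]
    total_lam = sum(lams)
    total_term = poisson_logpmf(sum(n), total_lam)
    shape_term = multinomial_logpmf(n, [lam / total_lam for lam in lams])
    return total_term, shape_term

# Hypothetical bin contents and counts
s, b, n = [1.0, 4.0, 1.0], [10.0, 10.0, 10.0], [11, 14, 9]
total_term, shape_term = split_poisson_loglik(1.5, s, b, n)
print(total_term, shape_term)
```

Since $\Lambda = \mu\sum_i s_i + \sum_i b_i$ depends on $\mu$, dropping the Poisson factor for $M$ discards whatever information the total count carries about $\mu$.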
