Confused between Multiple Random Variables and Likelihood Function [closed]

Question

I am confused about the difference between the two at a very fundamental level. Here is the problem:

I take observations $\vec{x}$ and create a histogram $\mathbf{n} = (n_1,\ldots,n_N)$ out of them with $N$ bins. The number of elements in bin $i$ can be modelled as a Poisson random variable, i.e. $n_i \sim \mathrm{Pois}(\lambda_i)$ with parameter $\lambda_i = \mathbb{E}[n_i] = \mu s_i + b_i$, where $s_i$ and $b_i$ are the expected numbers of entries coming from the distributions $f_s(x;\mathbf{\theta}_s)$ and $f_b(x;\mathbf{\theta}_b)$ respectively.

Now we write the likelihood function $L(\mu,\theta_s,\theta_b;\mathbf{n})$ of this histogram as $$ \begin{aligned} L(\mu,\theta_s,\theta_b;\mathbf{n}) &= f(n_1 \mid \lambda_1) \cdots f(n_N \mid \lambda_N) \\ &= \prod_{i=1}^{N} f(n_i \mid \lambda_i)\\ &= \prod_{i=1}^{N} \frac{\lambda_i^{n_i}}{n_i!}e^{-\lambda_i} \\ &= \prod_{i=1}^{N} \frac{(\mu s_i + b_i)^{n_i}}{n_i!}e^{-(\mu s_i + b_i)} \end{aligned} $$

Does this mean that my data is $N$-dimensional (which it most probably is not, but I still want to ask), or is it the case that I am calculating the likelihood of $N$ i.i.d. samples?
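The product above is usually evaluated in log form. The following is a minimal sketch, assuming the independent-Poisson model written in the question; the function name and the three-bin numbers are invented for illustration. Each bin contributes one term with its own $\lambda_i$, so the terms are independent but not identically distributed.

```python
import math

def poisson_binned_log_likelihood(mu, s, b, n):
    """Log of prod_i Pois(n_i; mu*s_i + b_i), assuming independent bins."""
    if not (len(s) == len(b) == len(n)):
        raise ValueError("s, b and n must have one entry per bin")
    total = 0.0
    for s_i, b_i, n_i in zip(s, b, n):
        lam = mu * s_i + b_i  # expected count lambda_i in this bin
        if lam <= 0.0:
            raise ValueError("each mu*s_i + b_i must be positive")
        # log(lam^n / n! * exp(-lam)), using lgamma(n + 1) = log(n!)
        total += n_i * math.log(lam) - lam - math.lgamma(n_i + 1)
    return total

# Hypothetical three-bin histogram (numbers invented for illustration).
s = [1.0, 4.0, 2.0]
b = [10.0, 12.0, 9.0]
n = [11, 17, 10]
print(poisson_binned_log_likelihood(1.0, s, b, n))
```

Working with the sum of logs avoids overflow in $n_i!$ and underflow in the product when there are many bins.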

Because the total number of observations is fixed (call it $M$), no bin can contain more than $M$ values, whence a Poisson distribution is incorrect. It could work approximately, but why not use the correct multinomial probability?
– whuber ♦, Commented May 24, 2024 at 13:49
@whuber Because $p_i$ (probability of an entry falling into bin $i$) is small whereas $M$ is large. So, can we not somehow apply Poisson Limit Theorem here? The entries to each bin $i$ can be only of two types - coming from $f_s$ or $f_b$ with probabilities adding to 1. So, applying Poisson Limit Theorem to each bin, do we not get the same likelihood as above? — Sid
– Sid, Commented May 24, 2024 at 14:04
As I wrote, you can sometimes use this approximation. But why? It's more complicated to analyze in many respects than the fully correct multinomial likelihood. — whuber
– whuber ♦, Commented May 24, 2024 at 14:20
Because $M$ is large and it would help in computation. The data comes in two flavours, $s$ and $b$. We know that the number of $b$ entries is much larger than the number of $s$ entries. So, can we also somehow use a combination of both: multinomial for $s$ (which is low) and Poisson for $b$ (which is high)?
– Sid, Commented May 24, 2024 at 14:37
It's not at all apparent how the computation would be assisted by using a Poisson likelihood. The multinomial is no problem at all (one uses the log Gamma function, of course, to compute a log likelihood). — whuber
– whuber ♦, Commented May 24, 2024 at 15:17
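The log-Gamma route mentioned in the last comment can be sketched as follows. This is an illustrative sketch, not code from the thread: the function name is hypothetical, and turning the question's $\lambda_i$ into bin probabilities via $p_i = \lambda_i / \sum_j \lambda_j$ is an assumption made here for the example.

```python
import math

def multinomial_log_likelihood(p, n):
    """Log of M! / prod_i n_i! * prod_i p_i^n_i, with M = sum(n)."""
    if len(p) != len(n):
        raise ValueError("p and n must have one entry per bin")
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    m = sum(n)
    total = math.lgamma(m + 1)  # log(M!)
    for p_i, n_i in zip(p, n):
        total -= math.lgamma(n_i + 1)  # subtract log(n_i!)
        if n_i > 0:
            if p_i <= 0.0:
                return float("-inf")  # observed count in an impossible bin
            total += n_i * math.log(p_i)
    return total

# Hypothetical numbers: expected counts per bin, normalised to probabilities.
lam = [11.0, 16.0, 11.0]
p = [l / sum(lam) for l in lam]
n = [11, 17, 10]
print(multinomial_log_likelihood(p, n))
```

The cost is one `lgamma` call per bin plus one for the total, so it is no more expensive to compute than the Poisson product.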

AdamO · Accepted Answer · 2024-05-24 12:10:56Z

I do not believe you intend to create a histogram out of an observation $x$, but rather you require a sample, $X_1, X_2, \ldots, X_M$ (reserve $N$ for your histogram treatment). But for the sake of discussion you can set $M=1$.

Your sample space is indeed $M$-dimensional, such as $\mathbb{R}^M$ or some subset. You can define the histogram as a statistic taking values in $\mathbb{N}^N$, so it is $N$-dimensional. Even with $M=1$, the possible values of $\vec{n}$ are $[1,0,\ldots, 0], [0,1,\ldots,0], \ldots, [0,0,\ldots, 1]$, which are the $N$ standard basis vectors.
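To make the idea of the histogram as a statistic concrete, here is a small sketch that maps a sample of size $M$ to a count vector of length $N$. The function name and the bin edges are hypothetical, chosen only for illustration.

```python
import bisect

def histogram_counts(sample, edges):
    """Count the sample values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        # Values outside [edges[0], edges[-1]) are dropped, not counted.
        if edges[0] <= x < edges[-1]:
            counts[bisect.bisect_right(edges, x) - 1] += 1
    return counts

edges = [0.0, 1.0, 2.0, 3.0]           # N = 3 bins (hypothetical edges)
print(histogram_counts([1.5], edges))  # M = 1 gives a one-hot vector: [0, 1, 0]

sample = [0.2, 1.5, 1.7, 2.9, 0.4]     # M = 5, all inside the binned range
counts = histogram_counts(sample, edges)
print(counts, sum(counts) == len(sample))
```

When every observation lands inside the binned range, the counts always sum to $M$, whatever the sample looks like.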

That said, I do not believe your likelihood is well defined! The distribution of $n_i$ is clearly not Poisson! Since the histogram is a statistic, the conditions under which the initial sample was drawn must feed into the model. First, every $n_i$ is bounded by $M$; second, the $n_i$ are not mutually independent, because they must sum to $M$. I believe the correct probability model for the histogram is multinomial, with bin probabilities given by the density of the $x_i$ integrated over each bin.
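For $M$ i.i.d. observations sorted into $N$ bins, the joint distribution of the counts can be written out explicitly. In the sketch below, $f$ denotes the density of a single observation and $\nu$ a Poisson mean for the sample size; both symbols are notation introduced here, not taken from the post. $$ P(\mathbf{n} \mid M) = \frac{M!}{\prod_{i=1}^{N} n_i!} \prod_{i=1}^{N} p_i^{n_i}, \qquad p_i = \int_{\text{bin } i} f(x)\,dx, \qquad \sum_{i=1}^{N} n_i = M $$

Under the additional assumption, not made above, that the sample size itself is random with $M \sim \mathrm{Pois}(\nu)$, setting $\lambda_i = \nu p_i$ gives the standard factorization $$ \prod_{i=1}^{N} \frac{\lambda_i^{n_i}}{n_i!} e^{-\lambda_i} = \frac{\nu^{M}}{M!} e^{-\nu} \times \frac{M!}{\prod_{i=1}^{N} n_i!} \prod_{i=1}^{N} p_i^{n_i} $$

So the product of independent Poisson terms equals a Poisson probability for the total count times the multinomial probability of the bin counts given that total. With $M$ fixed, only the multinomial factor remains.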

I intended to write $\vec{x}$ instead of $x$. You're right, the number of observations can of course not be 1.
– Sid, Commented May 24, 2024 at 13:31
About the distribution of $n_i$, I am following this paper (Page 4). Why would $n_i$ not be Poisson distributed? The bins are being filled independently of each other and at a constant rate.
– Sid, Commented May 24, 2024 at 13:36
@Siddhartha Do you not agree that a bin is bounded by the total sample size, as I stated?
– AdamO, Commented May 24, 2024 at 13:38
So you mean that a multinomial distribution should be used because we have a finite number of samples, which affects the 'independence' condition that we have assumed for the $n_i$? What if $M$ is large? And how large should it be before we can model $n_i$ as Poisson?
– Sid, Commented May 24, 2024 at 13:49
@Sid There are approximations, but one would expect an accurate expression of the likelihood before defaulting to them.
– AdamO, Commented May 31, 2024 at 13:23
