Questions tagged [likelihood]
Given a random variable $X$ which arises from a parameterized distribution $F(X; \theta)$, the likelihood is defined as proportional to the probability of the observed data, viewed as a function of $\theta$: $\operatorname{L}\left(\theta \mid x \right)=\operatorname{P} \left(X=x \mid \theta \right)$.
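For continuous $X$ the same definition is used with the density in place of the probability mass function:
$$
\operatorname{L}\left(\theta \mid x\right) = f\left(x \mid \theta\right).
$$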
1,587 questions
2
votes
0
answers
49
views
How can negative log likelihood be properly compared between two sets with different sample sizes?
I have a dataset that I have divided into training and testing data, with approximately 160 samples in the training set and 40 in the testing set. I fitted a probability distribution to each dataset ...
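Since a summed negative log-likelihood grows with the sample size, one common device is to compare the per-observation average instead. A minimal sketch, assuming a normal fit and using stand-in data rather than the asker's:
```r
# Compare fits on sets of different sizes via the per-observation
# mean negative log-likelihood rather than the sum (normal fit
# assumed; train/test are stand-ins for the asker's data).
set.seed(1)
train <- rnorm(160, mean = 5, sd = 2)
test  <- rnorm(40,  mean = 5, sd = 2)
mean_nll <- function(x, mu, sigma) {
  -mean(dnorm(x, mu, sigma, log = TRUE))  # average, not sum
}
mu <- mean(train); sigma <- sd(train)     # fit on training data only
mean_nll(train, mu, sigma)
mean_nll(test,  mu, sigma)                # comparable per-sample scale
```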
1
vote
0
answers
42
views
How to create a likelihood when one model depends on the other model [closed]
There are two models (one model uses the predicted response from the other model as a predictor); each is a linear mixed-effects model, and together they have MVN correlations through their random ...
3
votes
1
answer
304
views
Trouble understanding notation in Fisher Information
Consider a random variable $X$ with probability density function $f(x,\theta)$, where $\theta$ is the true parameter. Then, the Fisher information is defined as $\mathbb{E}\big[\big(\frac{\partial}{\...
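For reference, the standard definition that the excerpt truncates is, under the usual regularity conditions (where the second equality also holds):
$$
I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right] = -\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\log f(X;\theta)\right].
$$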
0
votes
0
answers
60
views
Dealing with an overconfident likelihood
Background
Consider the following recursive Bayesian classifier
\begin{equation}
p_{t}(c)=\frac{\ell(y_t\mid c)p_{t-1}(c)}{\sum_{\nu=1}^C \ell(y_t\mid \nu)p_{t-1}(\nu)},
\qquad c=1,\dots,C
\tag{1}
\...
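One common remedy for an overconfident $\ell(y_t \mid c)$ is to temper it with an exponent $\beta < 1$ before the update in (1). A minimal sketch with illustrative values; this is a generic device, not necessarily the asker's setup:
```r
# Recursive update from (1), with an optional tempering exponent
# beta < 1 to soften an overconfident likelihood (beta, lik, and
# prior here are illustrative values).
recursive_update <- function(prior, lik, beta = 1) {
  post <- lik^beta * prior
  post / sum(post)                  # normalize over the C classes
}
prior <- rep(1/3, 3)                # uniform p_{t-1}(c), C = 3
lik   <- c(0.90, 0.05, 0.05)        # l(y_t | c)
recursive_update(prior, lik, beta = 0.5)
```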
3
votes
1
answer
195
views
Interpretation of the likelihood ratio from a refitted Cox model
In a Cox proportional hazards model, one can compute a likelihood ratio from a refitted model as a measure of global discrimination.
Consider the initial Cox model of interest:
$$h\left(t \vert X\...
5
votes
3
answers
193
views
Comparison of candidate models obtained with possibly "non-comparable AIC": use RSS, MSE, or adjusted MSE as an alternative instead... or?
My question relates to the comparison of candidate models whose parameter estimates have been produced with different methods and R packages.
As a fictitious example, the MASS (continuous ...
2
votes
1
answer
72
views
Substituting per-trial missing data for all missing data in the two-coin expectation-maximization example
A common example that I have found for explaining expectation-maximization is that of two biased coins. The problem statement is:
You have two biased coins, which you select with equal ...
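A minimal sketch of the EM iteration for this classic example, with illustrative head counts rather than the question's data:
```r
# EM for the two-coin example: each of 5 trials is 10 tosses of one
# randomly chosen coin, and the coin identities are the missing data.
heads <- c(5, 9, 8, 4, 7); n <- 10            # toy data
theta <- c(0.4, 0.6)                          # initial coin biases
for (iter in 1:100) {
  # E-step: posterior probability that each trial used coin 1
  l1 <- dbinom(heads, n, theta[1])
  l2 <- dbinom(heads, n, theta[2])
  w1 <- l1 / (l1 + l2)
  # M-step: bias = weighted fraction of heads attributed to each coin
  theta <- c(sum(w1 * heads) / sum(w1 * n),
             sum((1 - w1) * heads) / sum((1 - w1) * n))
}
theta
```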
0
votes
0
answers
41
views
Likelihood of the data given drift and random noise terms
I am interested in finding the likelihood for the location of a given number of particles at time = 1 in a process that resembles (or is) an Ornstein-Uhlenbeck (OU) process.
In particular, I am ...
0
votes
1
answer
166
views
Why does the likelihood have such poor precision?
The same data matched against two different normal distributions yields very similar likelihood numbers.
The data - annual returns for AMD stock, with historical and recent volatility and risk free ...
7
votes
0
answers
93
views
Name for $-2 \log L$?
While differences between models in deviance ($2(\log L_s - \log L)$, where $L$ is the likelihood of a model/set of parameters and $L_s$ is the likelihood of the saturated model) are the same as ...
2
votes
1
answer
220
views
Maximum likelihood for binomial variables
Let's say we have two binomial variables $Y_1$, $Y_2$ such that for $Y_1$ number of trials $n = 103$ and number of successes $k = 51$ and for $Y_2$ number of trials $n = 53$ and number of successes $k ...
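If the goal is, for instance, to test whether $Y_1$ and $Y_2$ share a common success probability, a likelihood-ratio sketch could look like this. The second success count is truncated in the excerpt, so the `k2` below is a placeholder, not the question's value:
```r
# Likelihood-ratio test that two binomials share a common p.
n1 <- 103; k1 <- 51
n2 <- 53;  k2 <- 26                           # placeholder value
ll <- function(k, n, p) dbinom(k, n, p, log = TRUE)
p_pool <- (k1 + k2) / (n1 + n2)               # MLE under H0: p1 = p2
lr <- 2 * (ll(k1, n1, k1 / n1) + ll(k2, n2, k2 / n2) -
           ll(k1, n1, p_pool)  - ll(k2, n2, p_pool))
pchisq(lr, df = 1, lower.tail = FALSE)        # asymptotic p-value
```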
0
votes
0
answers
70
views
Comparing the second moments of different priors with a normal likelihood
Suppose you have two prior densities $f_1(x)$ and $f_2(x)$ on $\mathbb{R}$, and a Gaussian likelihood $l(x)$. Let $\pi_1(x)$ and $\pi_2(x)$ be the associated posterior distributions,...
1
vote
1
answer
99
views
Reparameterization of the Fisher Information [duplicate]
Hey guys,
I don't understand why we have $l_{\phi}(\phi) = l_{\theta}(h^{-1}(\phi))$. Shouldn't it be that $l_{\phi}(\phi) = l_{\phi}(h(\theta))$? Or how is it that $l_{\theta}(h^{-1}(\phi)) = l_{\...
4
votes
1
answer
233
views
Expected Fisher Information equal to expected score function squared?
I understand the proof shown in the picture, but the last step is unclear to me.
Shouldn't the last step simply be $=S(\theta)^2$? Or is that because $J(\theta) = E[- \frac{d s(\theta)}{d\theta}]$ ...
2
votes
0
answers
87
views
Why not just use the likelihood of a sample as a test statistic?
This is probably a stupid question, but I've gotten myself a bit confused.
Suppose we have an i.i.d. sample $\{x_1, x_2, \ldots, x_n\}$ of a random variable, and this variable is discrete, so the ...
0
votes
0
answers
88
views
Using cross-validation to determine the number of factors in factor analysis: why isn't it simply the case that more factors yield a larger likelihood?
Consider a factor analysis model
\begin{equation*}
\begin{array}{cccccccccc}
X &=& \mu&+& L&\cdot& f & + &u \\
p\times 1 & & p\times 1 &&p\times k& ...
1
vote
0
answers
73
views
Why is the log additive model considered more accurate than other models in parameter estimation using maximum likelihood?
I am trying to get a better grasp of the theory of parameter estimation using different error models used in Phoenix NLME. The log additive model seems to perform better for my use case and I am ...
6
votes
3
answers
453
views
Does the following maximum likelihood mean and variance result hold for all distributions?
Given a probability distribution $p(x \,|\, \mu, \sigma^{2})$, and $n$ independent and identically distributed draws $x_{1}, \ldots, x_{n}$ from the distribution $p(x \,|\, \mu, \sigma^{2})$, we may ...
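For the Gaussian case this question generalizes from, maximizing the log-likelihood gives the familiar closed forms:
$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^{2}.
$$
For other families the maximizers need not coincide with the sample mean and the (biased) sample variance.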
1
vote
0
answers
50
views
Spatial Lag Model Log-Likelihood Calculation
I am working with a Spatial Lag Model, which can be expressed as:
$$
y = \rho W y + X \beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I),
$$
where:
$y$ is the $n \times 1$ vector of ...
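For reference, the standard log-likelihood of this spatial lag (SAR) model includes a Jacobian term arising from the reduced form $y = (I_n - \rho W)^{-1}(X\beta + \varepsilon)$:
$$
\ell(\rho, \beta, \sigma^{2}) = -\frac{n}{2}\log\left(2\pi\sigma^{2}\right) + \log\left|I_n - \rho W\right| - \frac{1}{2\sigma^{2}}\left(y - \rho W y - X\beta\right)^{\top}\left(y - \rho W y - X\beta\right).
$$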
2
votes
0
answers
102
views
What do Efron & Tibshirani mean by the 'empirical exponential family' in the textbook 'Introduction to the Bootstrap'?
I would like to understand what Efron & Tibshirani mean by 'empirical exponential family' in the textbook 'Introduction to the Bootstrap' (formula 21.84 for instance)?
The explanation there is too ...
0
votes
0
answers
47
views
Handling Skewed Importance Sampling Weights for High-Dimensional Log-Likelihoods
Question:
I am performing importance sampling (IS) for a Bayesian inference problem with the following setup:
1. Data and Model
My data has $D = 1300$ dimensions.
The log-likelihood, $ \log p(x \...
0
votes
0
answers
55
views
Convolution with a pathological distribution part 2
This post is a follow-up to this previous one, based on what I learned from this second one.
Problem Definition
Consider a polygon with vertices $V_1,\dots,V_n \in \mathbb{R}^2$ and let
\begin{aligned}...
1
vote
0
answers
64
views
What makes a curve a good fit in the context of logistic regression
As I wanted to gain better intuition for why separation is a problem in the context of logistic regression, I created two models in R: one where y is perfectly separated at $x=5$, and one ...
2
votes
1
answer
144
views
Is there any question as to what the likelihood function for a geometric distribution is? [closed]
Because I've read it is either
$g(x) = \prod_{i=1}^{n} p(1-p)^{x_i}$
or
$g(x) = \prod_{i=1}^{n} p(1-p)^{x_i-1}.$
So I'm really confused.
Reference: https://math.stackexchange.com/questions/4429910/...
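Both forms are correct; they are the two standard parameterizations of the geometric distribution, counting failures before the first success versus trials up to and including the first success:
$$
P(X = x) = p(1-p)^{x}, \quad x = 0, 1, 2, \ldots \qquad \text{vs.} \qquad P(X = x) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots
$$
The likelihood for an i.i.d. sample of size $n$ is the corresponding product over the $n$ observations, not an infinite product.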
0
votes
0
answers
86
views
Outer product approximation of derivatives of likelihood
Lately, I have been reading Muthén's paper, "Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika,...
0
votes
0
answers
19
views
Proof of the simplification of the likelihood function [duplicate]
There are many references that quote that under the assumption that $x_1,x_2,\ldots x_n$ are i.i.d., the likelihood function can be simplified as follows:
$$P(x_1,x_2,\ldots ,x_n|\theta)=P(x_1|\theta)...
1
vote
0
answers
70
views
Can we use fisher scoring on a restricted likelihood function?
I have a question on how to optimize the restricted maximum likelihood (REML) criterion in mixed-effects regression models.
Starting with a mixed effects model:
$$y = X\beta + Zu + e$$
$$u \sim N(0, G), \quad e \sim N(0, R)$$
Where:
$y$ is ...
5
votes
1
answer
409
views
How to prevent negative variance estimates in likelihood optimization?
I have this regression model in which 100 people have 20 measurements taken:
$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 t_{ij} + u_i + \epsilon_{ij} $$
Where:
$ i = 1, 2, ..., 100 \text{ (patient ...
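One standard device is to reparameterize so the variance estimate cannot go negative, e.g. optimizing over $\log\sigma^2$. A minimal sketch with a plain normal likelihood and toy data, not the asker's mixed model:
```r
# Keep a variance estimate positive by optimizing over log(sigma^2).
set.seed(2)
y <- rnorm(50, mean = 1, sd = 2)
negll <- function(par) {
  mu     <- par[1]
  sigma2 <- exp(par[2])             # exp() keeps sigma2 > 0
  -sum(dnorm(y, mu, sqrt(sigma2), log = TRUE))
}
fit <- optim(c(0, 0), negll)
c(mu = fit$par[1], sigma2 = exp(fit$par[2]))
```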
0
votes
1
answer
127
views
Bayes Factor for two exact hypotheses on normally distributed data with unknown variance
Let's assume that the observations $x_i$ are normally distributed
$$
x_i \sim N(\mu, \sigma^2)
$$
and that the variance $\sigma^2$ is unknown. The Bayes Factor to compare two point hypotheses on the ...
1
vote
0
answers
151
views
Hessian for log likelihood of regression with respect to covariance matrix
I am interested in the residual covariance of a multivariate regression model. The regression is
$$
Y_t = X_t \beta + \varepsilon_t
$$
and I have a log likelihood as follows
$$
\mathcal{L}(\Sigma) = \...
1
vote
0
answers
34
views
Analysis of the relative likelihood with parametric bootstrapping
Using parametric bootstrapping, I find that the relative likelihood of model A is $l_A$ and of model B is $l_B$. Repeating the analysis several times yields a distribution of likelihood values for ...
3
votes
1
answer
77
views
Maximum Likelihood Estimation for Pairs of Observations
I have $n$ pairs of observations $(x_i,y_i)$, where each $y_i$ is distributed according to $\text{Pois}(\theta x_i)$, and I wish to do a maximum likelihood estimation for $\theta$ only based on this ...
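For this setup the MLE has a closed form, since the log-likelihood is concave in $\theta$:
$$
\ell(\theta) = \sum_{i=1}^{n}\left(y_i \log(\theta x_i) - \theta x_i - \log y_i!\right), \qquad \ell'(\theta) = \frac{\sum_i y_i}{\theta} - \sum_i x_i = 0 \;\Longrightarrow\; \hat{\theta} = \frac{\sum_i y_i}{\sum_i x_i}.
$$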
4
votes
2
answers
207
views
Confidence regions of optimized parameters in maximum log likelihood fits
I am using a numerical optimization algorithm to maximize a log-likelihood function, $\mathcal{L}$. The log-likelihood function has a fixed number of parameters, $\{\theta_i\}$. These parameters are ...
0
votes
0
answers
66
views
Flattening a likelihood
Background
Let $y_1,y_2,\dots,y_K$ be a sequence of measurements.
I've derived a likelihood $\mathcal{L}(y|i)$ to solve a classification problem via the Bayesian classifier
\begin{equation}
p_k(i)=\...
1
vote
0
answers
61
views
Comparing GLMM with LMM with -2*log-likelihood
Is it possible/recommended to compare the -2*Log-Likelihood (-2LL) value of a Generalized Linear Mixed Model (GLMM) against the -2LL value (and/or AIC/AICC/BIC) of a Linear Mixed Model (LMM) with the ...
0
votes
0
answers
181
views
Manually calculate the log-likelihood of a fitted model
My goal is to calculate the log-likelihood of a fitted model on some unseen data.
To this end I defined a function that calculates the log-likelihood by hand on some new data. However, as a sanity check ...
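A minimal sketch of such a sanity check, assuming a Gaussian linear model fitted with lm() on toy data rather than the asker's model:
```r
# Check a hand-rolled log-likelihood against logLik(). Note that
# logLik.lm() uses the MLE of sigma^2 (RSS over n, not n - p).
set.seed(3)
x <- runif(30); y <- 1 + 2 * x + rnorm(30)
fit <- lm(y ~ x)
sigma_mle <- sqrt(sum(residuals(fit)^2) / length(y))
manual <- sum(dnorm(y, fitted(fit), sigma_mle, log = TRUE))
c(manual = manual, logLik = as.numeric(logLik(fit)))   # should agree
```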
3
votes
2
answers
364
views
Regression and independent random vectors
Let's consider data samples generated from random vectors $(X_1, Y_1), \ldots, (X_N, Y_N)$ of cross-sectional data. For regression one usually assumes that the error distribution is i.i.d. normally ...
1
vote
0
answers
115
views
Gaussian linear model marginal likelihood under g-prior
Consider a Gaussian linear model with an $ n \times 1 $ outcome vector $ y $ and an $ n \times p $ matrix of centered predictors $ X $:
$ y = \iota\alpha + X\beta + \varepsilon \quad \quad \varepsilon ...
0
votes
1
answer
100
views
Monte Carlo method for likelihood-ratio density estimation
I recently started reading Steven Kay's Fundamentals of Statistical Signal Processing: Detection Theory (Volume II) and there is something I do not fully understand about likelihoods and hypothesis ...
2
votes
1
answer
192
views
Birnbaum's Theorem: Strong belief in a model $\implies$ the likelihood function must be used as a data reduction device?
Working through understanding section 6.3.2 (pg. 292-294) in Casella and Berger's Statistical Inference (2nd-ed).
The following definitions and principles are given:
Definition (Experiment): An ...
0
votes
1
answer
83
views
Why is my rho-squared for a multinomial logit model (McFadden) so small?
When I'm using MNL and try to find my rho-squared, it turns out to be very small: $0.0139$. For a good-fitting model, the rho-squared is supposed to be between $0.2$ and $0.4$. Is there any reason why it's so ...
3
votes
1
answer
293
views
Basic question about deriving MAP estimator
Say we have a random process $X(t, u)$ parametrized by $t$ and $u$ that generates data $x$. We also have a prior on $u$, $p(u)$.
Am I correct in stating that the expression to find the maximum a ...
4
votes
2
answers
267
views
Confusion over Fisher-scoring algorithm
Given a probability model $f(X;\theta)$ and a set of i.i.d. observations $x_1,\ldots,x_n$ which we assume to be drawn from $f(X; \theta_0)$ for some true parameter $\theta_0$, we can perform maximum-likelihood ...
0
votes
0
answers
51
views
Correlation coefficient is higher when the likelihood of an event is lower; how does this occur?
I have different variables, and I am interested in whether they influence pass/fail rates.
To see which variables I might use as a leading indicator, I've pulled different variables such as "tutoring" ...
0
votes
2
answers
134
views
Negative log-likelihood, high BIC, high R-squared, low error, using a difference-in-differences (DiD) methodology [closed]
I am trying to see the impact of Brexit on UK imports. My dependent variable is EU exports to the rest of the world. I have monthly data from 2013 to 2023, and the data are in billions of GBP.
When I do ...
0
votes
0
answers
70
views
How to obtain the likelihood $P(B \mid R)$ given the prior $P(R)$ and the posterior $P(R \mid B)$
I am working on a topic related to multiple-choice response. I would like to measure the efficiency of the information source (or a student’s information search) and I believe Bayesian statistics is ...
1
vote
0
answers
61
views
Closed-Form Lambda for Yeo-Johnson-Transformed Normal-Inverse-Gaussian-Distributed Random Variables
I would like to know whether there exists a closed-form solution for the $\lambda$-parameter that maximizes the log-likelihood function of Yeo-Johnson transformed random variables that (before the ...
2
votes
0
answers
72
views
Likelihood from posterior [closed]
This question is strange and perhaps silly, but it would be very useful for my research. Is there any method to find the likelihood given a prior distribution and its corresponding posterior ...
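Up to a normalizing constant that does not depend on $\theta$, Bayes' theorem makes this a pointwise division:
$$
\mathcal{L}(\theta \mid x) \;\propto\; \frac{\pi(\theta \mid x)}{\pi(\theta)},
$$
so the likelihood can be recovered wherever the prior is positive, though only up to that constant.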
2
votes
0
answers
138
views
Can an outcome variable be used twice in the same model?
When is it appropriate to use the same outcome variable in two likelihoods in the same model framework?
Here is a specific example:
...
2
votes
1
answer
110
views
Confused between Multiple Random Variables and Likelihood Function [closed]
I am confused about the distinction between the two at a very fundamental level. Following is the problem:
I take observations $\vec{x}$ and create a histogram $\mathbf{n} = (n_1,\ldots,n_N)$ out of it with $N$ bins. ...