In the Box-Cox transformation, the parameter $\lambda$ is chosen by maximising a likelihood function. But I cannot understand what exactly is maximised in this case. What is the purpose of maximum likelihood here?
2 Answers
This family of transformations combines power and log transformations and is parametrised by $\lambda$:
$$ Y^{(\lambda)} = \begin{cases} \dfrac{Y^\lambda - 1}{\lambda}, & \lambda \neq 0,\\[4pt] \log Y, & \lambda = 0. \end{cases} $$
Note that the family is continuous in $\lambda$: the $\lambda = 0$ case is the limit of the power case as $\lambda \to 0$. The aim is to use likelihood methods to find the “best” $\lambda$.
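For concreteness, here is a minimal sketch of the transformation itself in Python (the function name `boxcox_transform` is mine, not from any library):

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform: (y^lam - 1)/lam for lam != 0, log(y) for lam == 0.

    Requires y > 0. The lam == 0 branch is the limit of the power form
    as lam -> 0, which is why the family is continuous in lam.
    """
    y = np.asarray(y, dtype=float)
    if np.isclose(lam, 0.0):
        return np.log(y)
    return (y**lam - 1.0) / lam
```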
Maybe it is best to provide an example, so let's assume that, for some $\lambda$, we have $E(Y^{(\lambda)}) = X\beta$ together with the normality assumption. Then, given data $Y_1, \dots, Y_n$ (i.e. the untransformed data), the likelihood is
$$ (2\pi \sigma^2)^{-n/2}\exp\left(-\frac1{2\sigma^2}(Y^{(\lambda)}-X\beta)^T(Y^{(\lambda)}-X\beta)\right)\prod_{i=1}^nY_i^{\lambda -1}$$
where the product at the end is the relevant Jacobian of the transformation. It clearly differs in size for different values of $\lambda$, and including it is what makes the likelihoods for different $\lambda$ comparable, so that the optimal one is consistent with our data. For each $\lambda$, fitting the linear model gives $\hat{\beta}(\lambda) = (X^TX)^{-1}X^TY^{(\lambda)}$, $RSS(\lambda) = (Y^{(\lambda)})^T(I - X(X^TX)^{-1}X^T)Y^{(\lambda)}$, and $\hat{\sigma}^2(\lambda) = RSS(\lambda)/n$ (the maximum likelihood estimate).
The profile log-likelihood for $\lambda$, obtained by maximising the log-likelihood over $\beta$ and $\sigma^2$, is therefore
$$ L_{max}(\lambda)= c - \frac{n}{2}\log(RSS(\lambda)/n)+ (\lambda-1)\sum_{i=1}^n \log(Y_i)$$
And so we treat this as we usually treat log-likelihood functions: values of $\lambda$ close to the maximising value $\hat{\lambda}$ are consistent with the data.
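A short numerical sketch of this profile-likelihood maximisation over a grid, reusing `boxcox_transform` from above (the helper name `profile_loglik` and the toy data are my own illustration, assuming `X` contains an intercept column):

```python
def profile_loglik(lam, y, X):
    """L_max(lambda) up to the constant c: fit beta by least squares,
    plug in sigma^2-hat = RSS/n, and add the Jacobian term."""
    z = boxcox_transform(y, lam)                       # Y^(lambda)
    beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)   # beta-hat(lambda)
    rss = np.sum((z - X @ beta_hat) ** 2)              # RSS(lambda)
    n = len(y)
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

# Toy data generated on the log scale, so lambda-hat should land near 0.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=100)
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.2, size=100))
X = np.column_stack([np.ones_like(x), x])

grid = np.linspace(-2, 2, 401)
lls = np.array([profile_loglik(lam, y, X) for lam in grid])
lam_hat = grid[np.argmax(lls)]
```

In the usual likelihood-ratio fashion, an approximate 95% interval for $\lambda$ is the set of grid values with $L_{max}(\lambda) \ge L_{max}(\hat{\lambda}) - \tfrac{1}{2}\chi^2_{1,\,0.95}$, which makes “values of $\lambda$ close to $\hat{\lambda}$ are consistent with the data” precise.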
- What is X? What is beta? – railgun, Aug 18, 2023 at 22:16
- @railgun X is the input/predictor variables, and beta is the regression coefficients. – Closed Limelike Curves, Oct 30, 2023 at 2:46
This is a good question. One can argue that the model used to estimate the Box-Cox transformation is something like $$ y_i^{(\lambda)} = \beta_0 + x_i^T \beta + \epsilon_i, \quad i = 1, \dotsc, n $$ with the error terms $\epsilon_i$ independent and identically distributed normal with zero mean and some variance. This is problematic as a statistical model (Peter McCullagh wrote a paper about that: https://projecteuclid.org/euclid.aos/1035844977), and I will come back and try to write about that, but I have no time now.
For one thing, the $\beta$ parameters and the variance will depend on the transformation parameter $\lambda$, but more importantly, the meaning of the model changes with $\lambda$. Still, "estimating" $\lambda$ can be a meaningful thing to do, as an aid in modeling. It may not be estimation in a scientific sense, since the $\lambda$ parameter does not reflect or represent anything in the reality we are modeling; it just indexes a family of models.
But the most obvious thing that happens when varying $\lambda$ is that the scale of $y^{(\lambda)}$ changes. That must be accounted for, and the Jacobian is introduced for that reason. A post with the details is "How do I get the Box-Cox log likelihood using the Jacobian?"
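As a sanity check (my addition, not part of the original answer): scipy exposes this Jacobian-corrected log-likelihood for the no-regressor case (just an overall mean) as `scipy.stats.boxcox_llf`, and `scipy.stats.boxcox` maximises it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # positive data

# boxcox_llf(lmb, data) includes the (lambda - 1) * sum(log y) Jacobian term,
# so values across lambda are directly comparable.
grid = np.linspace(-2, 2, 401)
lls = [stats.boxcox_llf(lam, y) for lam in grid]

# With lmbda=None, boxcox returns the transformed data and the maximising lambda.
_, lam_hat = stats.boxcox(y)
```

Dropping the Jacobian term would amount to comparing likelihoods on different scales, which is exactly the problem described above.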
(When I have time (after Easter or later) I will come back and (try to) explain my maybe somewhat cryptic comments above.)