Questions tagged [linear-model]
Refers to any model where a random variable is related to one or more random variables by a function that is linear in a finite number of parameters.
2,628 questions
0
votes
0
answers
40
views
Maximum likelihood estimation for linear regression [duplicate]
When conducting maximum likelihood estimation for simple linear regression whilst considering the regressors as random, the joint distribution of $f_{X,Y}(x,y;\theta) = f_{Y|X}(y|x;\theta) * f_{X}(x;\...
4
votes
3
answers
145
views
Homoscedasticity for a linear model
I have a linear model with two continuous variables and three categorical variables. Do I need to check homoscedasticity within each level of my categorical variables, or is it sufficient to check ...
2
votes
1
answer
242
views
Calculating standard errors in least squares and the normality assumption
The question titled “How are the standard errors of coefficients calculated in a regression?” is asking how the standard errors of regression coefficient estimates are computed (for example, the ...
1
vote
0
answers
22
views
What to do if your residuals over time are not independent?
Design: 2 groups (treat vs control), 4 time points (baseline/time 0, time 1, time 2, and follow-up/time 3), time 0 to 2 are equally spaced in time (2 weeks apart) while follow up occurs 4 weeks after ...
0
votes
1
answer
85
views
Does endogeneity in a linear model imply a non-linear conditional mean?
Given the model:
$y = a + bx + u$,
and that $x$ is endogenous,
this implies that $E(u|x)\neq 0$.
I believe this means that no values of $a$ and $b$ exist that make $E(u|x)=0$?
So ...
3
votes
1
answer
68
views
Efficient minimization of minimax objective function involving piecewise linear functions
Given an empirical cdf $\hat{F}$ with support on $[0,1]$, I am interested in finding the histogram with $B$ (unequal) bins with cdf $F_B$ that minimizes the maximum absolute deviation between the cdfs....
1
vote
0
answers
38
views
Choosing a Reference Value when Releveling Factors to Calculate Change Over Time
Context: I have a data set based around 16 different locations. Each location has a contaminant value measured once per year, from 2012 to 2023. The data looks something like this:
Location
Type
Year
...
6
votes
2
answers
262
views
Expectation and Kronecker product
Let $\mathbf{u} \sim \mathcal{U}(S_{\mathbb{R}^m})$ be a uniformly distributed random vector on the unit sphere $S_{\mathbb{R}^m} \triangleq \{\mathbf{u}\in \mathbb{R}^m\mid\|\mathbf{u}\|=1\}$ and let ...
4
votes
1
answer
77
views
Standardized coefficients vs Permutation-based variable importance
I recently read a post detailing the issues with using standardized coefficients as a measure of variable importance, and while looking for alternatives, I found several posts here discussing the use ...
0
votes
0
answers
77
views
Durbin-Watson test for weighted linear regression
My question concerns the use of the Durbin-Watson test for a weighted linear model in the context of calibration curves (a simple model y = ax + b in my case). I saw that there is a similar question ...
3
votes
0
answers
93
views
Running the Breusch-Pagan test manually in R assuming a weighted linear regression
I am trying to run the Breusch-Pagan test manually in RStudio from a weighted linear model (wi = 1/x^2). I need help verifying whether the following rationale is correct:
What I did:
WLS and residuals
...
6
votes
2
answers
301
views
Question on simple causal modeling
My causal graph looks like this: $A\to B$, $B \to C$ and $A \to C$. I want to model the direct influence of $B$ on $C$, i.e. changing $B$ by one unit, how much does $C$ change?
I think the correct ...
0
votes
0
answers
27
views
Partial Least Square Regression with Oracle on Variance matrix
I consider a centered random vector $(X_1,\cdots,X_d)$ and a real-valued random variable $Y$ such that the following model holds :
\begin{align*}
Y = \beta^{*}X^{\top} + \varepsilon
\end{align*}
with $...
0
votes
0
answers
61
views
Research method selection
I used a robust linear regression to evaluate the impact of some variables on a dependent variable, their linear correlation being tested and proven. Now, I want to compute an importance score of ...
3
votes
0
answers
120
views
Poles of rational basis functions as nonlinear features
Suppose I want to fit a linear model to non-linear rational features. Something like RationalTransformer instead of ...
0
votes
0
answers
93
views
Meaning of zero autocorrelation when performing linear regression on unstructured data
I have a seemingly very simple question that I cannot find the answer to.
When performing linear regression, we assume that the correlation between residuals is zero. This makes sense to me ...
2
votes
1
answer
75
views
Test for Pleiotropy vs close Linkage
Especially experts in fitting linear models.
I'm currently investigating pleiotropic associations in oats, and I found a paper by Schulthess et al., 2017 that proposes a method to distinguish ...
6
votes
1
answer
168
views
Should I use lme4::lmer or nlme::lme for a repeated measures frog phonotaxis experiment with low between-subject variance?
I'm analyzing data from a frog phonotaxis experiment where I tested 17 females, each undergoing two trials. In each trial, a female was placed in a choice arena and exposed to two different acoustic ...
3
votes
2
answers
136
views
Impact of selection of features before Ridge regression : adaptation of regularization
I consider $X=(X_1,\cdots,X_d)$ a centered random vector such that its covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ is well defined. I suppose that for all $i= 1,\cdots,d$ we have $\text{Var}...
3
votes
1
answer
123
views
How to deal with unbalanced data in a within-subjects design using linear mixed effects model?
I conducted an experiment in which n=29 subjects participated. Each subject was measured under 5 different conditions, with 3-5 measurements per subject in conditions 1-4 and a maximum of 2 ...
8
votes
2
answers
308
views
Testing Hypotheses with Limited Data in an Ecological Experiment. How do I approach my data?
For my bachelor's thesis, I’m investigating the effect of voles and mulch on soil infiltration and saturated hydraulic conductivity (Ksat). I want to test the following three hypotheses:
Vole ...
4
votes
1
answer
300
views
Intercept in design matrix
Consider the design matrix:
1 0
1 0
1 0
1 0
1 1
1 1
1 1
1 1
when fitted to a linear model as y ~ design,...
2
votes
0
answers
52
views
Is it problematic to use a covariate derived from the dependent variable in linear regression?
I'm performing a simple linear regression with one dependent and one independent variable: dependent variable (y): Nighttime lights raster, Independent variable (x): Population raster
The issue is ...
0
votes
0
answers
68
views
Selecting number of PCs (principal components) to include in PCR (principal component regression)
How do you decide the number of principal components (PC) to include in principal component regression (PCR)?
I have seen these methods:
choosing the lowest RMSEP with the pls() package
Choosing PC's ...
0
votes
0
answers
60
views
Singular fit warning for LMM: Is removing random effects problematic for model comparisons?
I have 33 plots measured in 2020 and remeasured in 2025, with three response variables. I'm using linear mixed models with "stand" and "age" as random effects. However, for some ...
0
votes
0
answers
52
views
Using random effects in a Linear Mixed Model and I think I am doing something wrong
I am performing an analysis on the correlation between the density of predators and the density of prey on plants, with exposure as an additional environmental/explanatory variable. Sampled five ...
1
vote
0
answers
32
views
When dealing with correlated slopes and intercept, does it make sense to include only certain levels of the random slope variable (by subject)?
I am fitting a mixed effect model where some levels of the categorical variable are correlated with the intercept for the following formula, resulting in a singular fit:
...
0
votes
0
answers
50
views
Linear Mixed Model: Dealing with Predictors Collected Only During the Intervention (once)
We have conducted a study and are currently uncertain about the appropriate statistical analysis. We believe that a linear mixed model with random effects is required.
In the pre-test (time = 0), we ...
1
vote
0
answers
71
views
Linear classifier confusing two classes in both directions
I'm training a linear classifier (converging fine), i.e. multi-class logistic regression, on 169 data points using 13 features. It's doing only slightly above chance, which is expected; it's a hard ...
5
votes
2
answers
277
views
Closed form for two-way ANOVA
Consider $Y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}$, where $\sum_i \alpha_i = 0$ for identifiability and $\varepsilon_{ij}$ is noise. The data is not balanced. What is the closed form for the ...
0
votes
0
answers
230
views
What does the Grenander condition imply about the data-generating process of $(y_i, x_i)$?
Consider a correctly specified linear model
$$
y_i = x_i^\top \beta + \varepsilon_i,\quad i=1,\dots,n,
$$
where the errors $\varepsilon_i$ are independent with zero mean and finite variance. ...
9
votes
2
answers
609
views
Good texts on Bayesian approach to ANOVA and beyond, specifically with replacement/comparison with frequentist methods in mind
I am pretty much at wit's end following a year of frequentist instruction on linear methods and models.
I tend to "think Bayesian" and find, for whatever reason, that Bayesian methods feel ...
1
vote
2
answers
145
views
Linear Mixed Model on correlated deltas for repeated measurements
There are already numerous threads related to Linear Mixed models, but they always deal with the raw dataset. However, I would like to use LMM on the deltas between the raw measurements, as using ...
6
votes
1
answer
326
views
Why isn't Frisch–Waugh–Lovell theorem (FWL) equivalent to fitting to residuals without orthogonalization?
In the context of regression by iteratively fitting each predictor, why isn't FWL equivalent to fitting each predictor to the residuals of the previous predictor without orthogonalizing the predictors ...
0
votes
0
answers
35
views
Obtain centered Regressors from `lm` object in R or via transformation
This is to some degree a software and to some degree a purely stats question. I have a design matrix $X$ with categorical and continuous variables. The first column contains only ones. For a given ...
0
votes
0
answers
48
views
Reporting unequal variance among groups in linear model
I have a linear model that predicts root mass as a function of root volume in 2 plant species. Code in R:
...
4
votes
1
answer
122
views
Consequence of useless regressor: Proving $\operatorname{cov}(\hat{\beta}) \succeq \operatorname{cov}(\tilde{\beta_1})$
$\newcommand{\cov}{\operatorname{cov}}$I am reading this note Linear Model and Extensions by Peng Ding and came across the following problem on page 27 (Problem 4.4). Can someone help me figure out ...
1
vote
0
answers
69
views
Computation of R squared in weighted linear regression [duplicate]
This question is based on the formulas as presented by the documentation of the fitting software Origin. Particularly this page. I'm working from the conceptualisation of the weights as inverse ...
4
votes
2
answers
232
views
Scale at which circular data approach linearity
We have a data set for hue, which is a circular variable. However, the data range only over 10 degrees of the possible 360. Can we use a linear mixed model to analyze the data, or do we have to use ...
10
votes
3
answers
392
views
Covariance between $\hat{\beta}_1$ and $\hat{\beta}_0$ for a simple linear model with correlated errors
$\newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}}$I've found this assignment, given to undergrad students in a university in Cyprus, in 2022, where a simple linear model is ...
1
vote
0
answers
107
views
Sampling events to predict low-frequency events in linear regression
I am working on a project in which I am using two different data sources to predict a country's change in population as a percentage. The frequency at which I receive data from these different ...
0
votes
0
answers
81
views
Covariance of observed and fitted values
I am confused about several computations I've seen for the covariance between the response and fitted values in linear regression.
For instance, it is a standard step to derive the bias-variance trade-...
0
votes
1
answer
57
views
Equivalence of two ways to obtain indirect effect in mediation analysis
In simple mediation analysis based on the usual linear regression, we have 3 fitted regression models:
Y = aX
Y = bX + cZ
Z = dX
Here Y is outcome, X is explanatory variable and Z is a mediator.
...
0
votes
1
answer
117
views
Finding Weights for WLS Regression Using OLS
I am using statsmodels to run linear regressions on heteroscedastic data stored in DataFrame df_temp.
Currently, I am trying to find the variance of the model by ...
1
vote
0
answers
57
views
In time-series data with autocorrelation, how should I filter observations?
I'm looking at the simulation accuracy of a model that predicts forest carbon. I'm comparing these simulated values against measurements of forest carbon at specific sites. Each site has had forest ...
0
votes
1
answer
89
views
Contextualizing Mediation Analysis Results
Overview
I have no experience with mediation analysis, but I've run into a situation where it may be relevant. Since I lack experience, I'm not sure how much weight I should put into a significant ...
0
votes
0
answers
59
views
Can I use Standard error of prediction to execute a t-test for a new observation?
I have a linear model fitted to literature data that relates beak size to beak length.
I have a new observation and would like to test if, given its beak length, the beak size is inside the ...
8
votes
1
answer
299
views
Regression when $y$ has been calculated from $x$
I'm reading this paper:
Catalán, N., Marcé, R., Kothawala, D. N., & Tranvik, L. J. (2016). Organic carbon decomposition rates controlled by water retention time across inland waters. Nature ...
1
vote
0
answers
82
views
Problem with Adjusted $R^2$ as Criterion for Variable Selection
I came across a problem while studying linear regression. In the book Plane Answers to Complex Questions, Christensen (2020) mentions that:
If $F$ statistic is greater than 1, then ...
3
votes
3
answers
196
views
Why do highly correlated features cause the corresponding coefficients to be large and with opposite signs?
I've conducted the following experiment. Suppose we want to build a linear regression with 3 features: $$y = w_1 * x_1 + w_2 * x_2 + w_3 * x_3$$ and we have a dataset with a certain number of samples. ...