Questions tagged [variance]
The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.
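Below is a minimal sketch of the definition above, assuming the data are a plain Python list of numbers. It computes the average squared deviation about the mean, which divides by $n$. For comparison it also computes the $n-1$ version that is commonly used as an unbiased estimator of a population variance. Both are cross-checked against Python's `statistics.pvariance` and `statistics.variance`. The function names are illustrative.

```python
from statistics import fmean, pvariance, variance

def average_squared_deviation(data):
    """Average squared deviation of the data about their mean (divides by n)."""
    m = fmean(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Unbiased sample variance (divides by n - 1); needs at least two values."""
    m = fmean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(average_squared_deviation(data))  # 4.0 (squared deviations sum to 32, n = 8)
print(sample_variance(data))            # 32 / 7
# Cross-check against the standard library's population and sample variance
print(pvariance(data), variance(data))
```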
4,351 questions
3 votes, 0 answers, 39 views
Is the variance the only well behaved functional of its kind?
It seems to be known to everybody except students who take introductory statistics courses that the reason why standard deviation rather than mean absolute deviation is used is that the variance of the ...
2 votes, 2 answers, 87 views
Interpreting Shapley values for variance decomposition?
I trained an SVM multiple regression model and want to know how much each feature contributes to the prediction variance (quantified by the RMSE). I got the Shapley values for each feature on data from ...
1 vote, 1 answer, 50 views
Vector direction of individual clusters after PCA
Suppose I have two multi-dimensional population samples - $A$ and $B$.
I hypothesise that $\mathbb{E}[A]$ and $\mathbb{E}[B]$ are orthogonal in this high-dimensional space.
To test this hypothesis, I ...
0 votes, 1 answer, 39 views
In linear regression, what changes when you use robust standard errors to overcome non-constant variance?
In my first course on linear regression, I learned the 4 basic assumptions that every textbook teaches: linearity, independence, homoscedasticity, and normality. However, I recently learned about ...
1 vote, 0 answers, 25 views
Spatial and temporal variance partitioning with missing values
I have a gridded dataset indexed by time and space, represented as an $m \times n$ array. I'm following along with Eq. 10 in this paper to partition the variance in this data over space and time. ...
0 votes, 0 answers, 20 views
Does a fully deterministic Pocock & Simon minimization affect variance estimation and inference validity (ignoring selection bias)?
I have a question about covariate-adaptive allocation in clinical trials.
Suppose we use a Pocock and Simon minimization procedure without any random component: that is, a fully deterministic ...
1 vote, 0 answers, 46 views
Deriving effective sample size and effective residual error from random-effects model in meta-analysis
I am performing a meta-analysis between two cohorts. I want to aggregate the estimates I obtained across a series of variables for each cohort.
I know that two main models are used in meta-analysis: ...
1 vote, 0 answers, 64 views
Is there a way to simplify this online calculation of the adjusted Fisher-Pearson standardized moment coefficient?
I am working with an accelerometer on a project where I am calculating the angles between the vertical line and the accelerometer and the horizontal line and the accelerometer (something similar to ...
0 votes, 0 answers, 77 views
Sum of Squares and partial $R^2$ in robust multiple regression
I would like to obtain estimates of the variance explained by each predictor in multiple regression using robust linear regression (for instance with the R function ...
2 votes, 0 answers, 90 views
Variance of the average treatment effect using the unit-level variances of the potential outcomes
I'm reading Chapter 19 of Imbens and Rubin (2015), which is on the estimation of variance for estimators of treatment effects. They discuss using the variance of each sample/unit's potential outcomes ...
3 votes, 1 answer, 241 views
Standard deviation - which formula?
I teach maths and statistics at a secondary school in Glasgow, and am wondering what variance formula users think applies best to Q4 of this year's National 5 Maths exam (see https://www.sqa.org.uk/...
3 votes, 1 answer, 84 views
Lower bound for MSE, based on sample mean and variance
Short question: For two unknown samples $A$ and $B$ of size $n$, if only their sample mean and sample variances are known, what can be said about $MSE(A,B)$ ?
Long version: To be more precise, I ...
3 votes, 3 answers, 487 views
Is it possible to identify this residual pattern as heteroscedastic or homoscedastic?
Plotting data onto a scatterplot from the U.S. Department of Transportation shows that there is a clear positive linear relationship between % of drivers under age 21 and fatal incidents per 1000 ...
8 votes, 1 answer, 1k views
If I remove the point in a dataset which is furthest from the mean, does the sample variance automatically decrease, or at least not increase?
I guess the question fits in the title. It seems to me that it should be the case, but I don't see the proof.
It also seems to me to possibly depend on which definition of variance we use, in other ...
1 vote, 0 answers, 40 views
How to calculate the variance proportion explained by fixed and random variables in model?
I aim to determine the relative percentage of variance explained by each fixed and random variable in a linear mixed-effects model, such as: lmer(Y ~ A + B + C + (1|D)) (R syntax).
I've reviewed ...
4 votes, 0 answers, 112 views
Unbiased estimator for 1/Var(X)
Suppose I have iid observations $(X_1, \dots, X_N)$. Is there an unbiased estimator for $1 / \text{Var}(X)$?
Clearly, we can't just take the reciprocal of an unbiased estimator for $\text{Var}(X)$; by ...
3 votes, 1 answer, 147 views
Showing that the OLS variance is the same as the variance of difference in means (average treatment effect)
Please bear with me as the preamble might be a bit long. I'm currently reading Imbens and Rubin's Causal Inference book, and unfortunately there's no freely available online copy, so below are some ...
1 vote, 0 answers, 55 views
Should one include prognostic variables in outcome regression estimator for ATE?
I am interested in knowing whether adding prognostic variables will improve the asymptotic variance of outcome regression estimator for ATE. I have long heard that I should include prognostic ...
0 votes, 0 answers, 35 views
Implications of Time Series Exhibiting Slowly-Converging/"Practically"(?)-Infinite Unconditional Variance on Usual Time Series Methods
Let's say that we have observations, $x_t$, of a stock price over some period of time ($t = 0, 1, 2,\dots$) and want to model future behavior of the stock price using stochastic processes/time series ...
1 vote, 1 answer, 63 views
For reducing aggregate standard error, better to sample n items measured one time or n/k items measured k times?
I have a measurement system where we need to measure an item using an instrument with unknown noise.
Our metric of interest is average quality per item.
We have a lot of items that have been produced. ...
0 votes, 0 answers, 129 views
DGP for X_t and Y_t with specified correlation over time?
Edited for clarity.
I am trying to set up a DGP for two vectors of normally distributed random variables $\mathbf{X}=(X_1, ..., X_{50})$ and $\mathbf{Y}=(Y_1, ..., Y_{50})$, with the following two ...
2 votes, 1 answer, 87 views
Does CUPED violate the SUTVA principle?
CUPED is a variance reduction method based on regression adjustment:
$$\hat{Y}_{cv} = \bar{Y} - \theta\,(\bar{X} - \mathbb{E}(X)).$$
My concern is that we compute $\theta$ from the pooled population ...
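Below is a minimal sketch of the adjustment above. It assumes the commonly used choice $\theta = \operatorname{Cov}(X, Y)/\operatorname{Var}(X)$, estimated from the pooled data, and replaces $\mathbb{E}(X)$ with the pooled sample mean of the pre-experiment covariate. The data are simulated, and the function name `cuped_adjust` is hypothetical. With these choices, the adjusted outcomes keep the raw mean. Their sample variance equals the raw variance times $1 - \hat\rho^2$, where $\hat\rho$ is the sample correlation of $X$ and $Y$.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return per-unit adjusted outcomes y - theta * (x - mean(x)) and theta.

    Assumption: theta = Cov(x, y) / Var(x), estimated from the pooled data.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean()), theta

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=5_000)           # hypothetical pre-period metric
y = 0.8 * x + rng.normal(0.0, 1.0, size=5_000)  # hypothetical in-experiment metric

y_cv, theta = cuped_adjust(y, x)
print(f"theta = {theta:.3f}")
# Means agree up to rounding, because x - mean(x) averages to zero
print(f"mean:     raw {y.mean():.3f}  adjusted {y_cv.mean():.3f}")
print(f"variance: raw {y.var(ddof=1):.3f}  adjusted {y_cv.var(ddof=1):.3f}")
```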
0 votes, 0 answers, 48 views
R-squared in a multiple regression with nested slopes
Consider a multi-linear regression:
\begin{equation} \tag{Eq. 1}
Y=(a + b)X + (a + 6b)Z + \epsilon
\end{equation}
you can see that the slopes of variables $X$ and $Z$ are related by the term $a$. I ...
0 votes, 0 answers, 41 views
Concentration of measure and the variance of sums
I'm very new to measure theory & I'm trying to understand concentration of measure better and its implications. The internet tells me it applies to sums (correct me if I'm wrong). But this makes ...
0 votes, 0 answers, 68 views
Appropriate R Package for Mixed Models
I have designed a mixed model that has four basic components:
a simple linear fixed effects component
a random intercept component
a random effects variance component that is a nonlinear function of ...
0 votes, 0 answers, 83 views
Is there a correct method to understand probability distributions of a subset while keeping the distribution similar to the whole set's distribution?
I am analyzing a dataset showcasing a number of (mostly) independent events occurring at varying times (it is a list of car crash events, their locations and their times).
My task is to identify the ...
1 vote, 0 answers, 85 views
How do variances combine in samples with different average? [duplicate]
If you have two samples each with known mean, variance and sample size, how do you work out the variance of the combined sample?
Here by variance I mean the square root of the average of the squares ...
8 votes, 1 answer, 547 views
Is the "Jackknife estimator of variance" the variance or the squared standard error?
The Jackknife estimate for the variance is
$$\text{var}_\text{jack}(\theta) = \frac{n-1}{n}\sum(S_{(i)} - S_{(\cdot)})^2$$
well known, e.g. from Efron & Stein, "The Jackknife estimate of ...
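Below is a minimal sketch of the formula above. It assumes $S_{(i)}$ is the statistic computed with observation $i$ left out and $S_{(\cdot)}$ is the average of the $n$ leave-one-out values. For the sample mean, the jackknife estimate coincides exactly with $s^2/n$, the usual squared standard error of the mean, which makes a convenient check. The helper name `jackknife_var` and the data are hypothetical.

```python
import numpy as np

def jackknife_var(data, stat):
    """Jackknife variance: (n - 1)/n * sum_i (S_(i) - S_(.))^2."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Leave-one-out statistics S_(i), i = 1..n
    loo = np.array([stat(np.delete(data, i)) for i in range(n)])
    # S_(.) is the mean of the leave-one-out statistics
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

x = np.array([3.1, 4.7, 2.2, 5.9, 4.0, 3.3, 6.1, 2.8])
print(jackknife_var(x, np.mean))    # jackknife estimate for the mean
print(np.var(x, ddof=1) / x.size)   # s^2 / n, equal to the line above up to rounding
print(jackknife_var(x, np.median))  # the same helper works for other statistics
```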
0 votes, 0 answers, 90 views
Error propagation when taking the mean of uncertain variables
I have a set of N observations X and their corresponding standard deviations S. I calculate the mean of these observations, but I need the associated error as well. My current approach, which I'm ...
4 votes, 2 answers, 197 views
Coefficient of determination ($R^2$) for complex-valued models
For a model $f:\mathbb{R}\rightarrow\mathbb{R}$, the coefficient of determination is unambiguously defined by:
$$
R^2=1-\frac{\text{Unexplained variance}}{\text{Total variance}}=1-\frac{\sum_{k=1}^n\...
0 votes, 1 answer, 98 views
Mean and variance of partitioned sample multiplied by a scalar
Suppose I have a sample ${x_1,x_2,...,x_n}$ with mean $\mu$ and SD $\sigma$. The sample is normally distributed. If I take a fraction of the sample (say, the first $m$ values) and multiply them by a ...
1 vote, 0 answers, 65 views
Why is the residual standard error a measure of the standard deviation of $\epsilon$? [duplicate]
I have been working through the book "Introduction to Statistical Learning". Here is how I have come to understand a regression problem is set up:
We choose to do a simple linear regression....
1 vote, 0 answers, 155 views
Why is there a difference between the calculated variance of all sample means and the theoretical $\sigma^2/n$ formula?
This question came from the example:
The five associates and the number of cars sold last week are:
\begin{array}{|c|c|}
\hline
\text{Sales Associate} & \text{Cars Sold} \\
\hline
\{ \text{Peter ...
3 votes, 1 answer, 235 views
Asymmetric Bayes error $\mathcal{N}\left(0,\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}\right)$ vs $\mathcal{N}(0,I)$ classification
Consider the problem of classifying $x \in \mathbb{R}^2$ into one of two classes, $c1$ and $c2$, with known distributions \begin{align} & p(x\mid c1) \sim \mathcal{N}\left(\begin{bmatrix}
0 \\
0
\...
1 vote, 0 answers, 84 views
Detecting Adversaries in Robust Mean Estimation Without Using Variance?
In robust mean estimation under strong contamination models (e.g., Huber's model or adversarial corruption), variance is often used to assign small weights to suspicious data sources Kane, D. M., ...
1 vote, 0 answers, 48 views
Portfolio optimisation for 2 shares - What are some recommended metrics to use?
I want to maximize the total number of shares of either A or B, by reallocating shares daily. For simplicity, the trades occur at each day’s closing prices. I'm basically determining the "optimal ...
4 votes, 2 answers, 192 views
How to calculate number of trials needed to observe an event in a Poisson distribution?
Suppose one has a Poisson distributed random variable $\lambda$ with mean
$$\mu(\lambda) = 7$$ and variance
$$\sigma^2(\lambda) = 7$$
Is there a direct formula to calculate the expected number of ...
3 votes, 1 answer, 143 views
Expected Value and Variance of Skewness/Kurtosis Estimator for the Difference of Normal Random Variables
As is well known, the sample skewness ($g_1$) and kurtosis ($g_2$) can be calculated as follows:
$$
g_1 = \frac{m_3}{m_2^{3/2}} = \frac{\tfrac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^3}{\left[\tfrac{1}{n} \sum_{i=1}^n \...
0 votes, 0 answers, 68 views
Should varIdent be used in a linear model with outliers in nlme in R?
I am unsure whether/how to use varIdent from the nlme package to allow different variances across factor levels when analysing a dataset which has outliers.
I am specifically interested in mixed ...
0 votes, 0 answers, 60 views
Parameterized variance? (Of a single sample or distribution.)
Trying to get my head around how to calculate the Anderson-Darling test statistic. I came across this page: https://twosampletest.com/reference/ad_test.html
The AD test compares two ECDFs by looking ...
0 votes, 0 answers, 34 views
How do I calculate repeatability (or variance explained) for each random slope in a mixed-effects model with multiple random slopes?
I'm working with a generalized linear mixed model and want to calculate the repeatability (or variance explained) of individual (ID) responses to each of my environmental variables (sst_scaled, ...
0 votes, 0 answers, 36 views
How do I find the variance explained by a fixed effect in a MCMCglmm threshold model?
I have run a threshold model using MCMCglmm (binary response variable) and obtained the proportion of variance explained by the random effects, but how do I do this for my fixed effect?
0 votes, 0 answers, 48 views
Reporting unequal variance among groups in linear model
I have a linear model that predicts root mass as a function of root volume in 2 plant species. Code in R:
...
1 vote, 0 answers, 40 views
Estimating variances from observations of linear combinations
This might be a silly question, as my statistics knowledge is quite limited, so bear with me.
Suppose $X_1,\dotsc,X_n$ are independent normal random variables with known mean and unknown variances, $...
7 votes, 3 answers, 492 views
Proof that variance of a regression parameter is negatively related to the coefficient of determination
According to Gujarati (5th edition, p.328, equation 7.4.12), the variance of $\hat \beta_2$ in a Multiple Linear Regression Model with a constant and two regressors is
$$
\frac{\sigma^2}{\sum_ix_{2i}^...
1 vote, 0 answers, 57 views
Determine sample size based on a pilot to estimate the variability of ratios within a given level [closed]
After thinking back and forth for a long time, I just can't get any further with a problem. Basically, the question is how representative a number of samples is for a population. The word “...
2 votes, 1 answer, 90 views
Variance calculated from bootstrap vs variance calculated otherwise?
For a parameter of interest $\theta$ and its estimator $\hat{\theta}$ based on a sample $X = (X_1, X_2, ..., X_n)$ from a population with distribution $F$, the theoretical variance of $\hat{\theta}$ ...
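Below is a minimal sketch of the nonparametric bootstrap variance mentioned in the title. It assumes $\hat{\theta}$ is any function of the sample. The procedure resamples $X$ with replacement $B$ times, recomputes $\hat{\theta}$ on each resample, and takes the sample variance of the replicates. The values of `B` and the seed, the function name, and the data are illustrative choices. For the sample mean, the ideal bootstrap variance (the limit as $B \to \infty$) is the plug-in value $\frac{1}{n}\cdot\frac{1}{n}\sum_i (X_i - \bar X)^2$, and the Monte Carlo estimate should approach it.

```python
import numpy as np

def bootstrap_var(data, stat, B=5_000, seed=0):
    """Monte Carlo bootstrap estimate of Var(stat(X)) from B resamples."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = data.size
    # Each row holds the indices of one resample of size n, drawn with replacement
    idx = rng.integers(0, n, size=(B, n))
    reps = np.array([stat(data[row]) for row in idx])
    return reps.var(ddof=1)

x = np.array([3.1, 4.7, 2.2, 5.9, 4.0, 3.3, 6.1, 2.8, 5.2, 3.8])
print(bootstrap_var(x, np.mean))
print(np.var(x, ddof=0) / x.size)  # ideal bootstrap variance of the mean (B -> infinity)
```

Substituting another statistic, such as `np.median`, for `np.mean` gives a bootstrap variance when no simple closed form is available.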
0 votes, 0 answers, 60 views
Normalizing datasets to make their variance comparable?
I'm working on processing a large variety of different data series, each being a list of numbers. Some of my series range from -1 to 1, some from 0.0001 to 0.0002, and some from 2 million to 3 ...
6 votes, 3 answers, 647 views
Is it ever preferable to have an estimator with a larger variance?
This is a question that has stumped me for some time.
In statistics, a common way to judge the quality of an estimator is by its variance: an estimator is said to be better if the variance of the ...
1 vote, 1 answer, 105 views
Best way to compare variance between two treatments with repeated measures and uneven samples?
I am trying to determine whether the variation in a response differs between two treatment groups. I am curious what others think of my current strategy, and I have some outstanding questions that I ...