Questions tagged [robust]
Robustness in general refers to a statistic's insensitivity to deviations from its underlying assumptions (Huber and Ronchetti, 2009).
593 questions
89
votes
14
answers
7k
views
Why haven't robust (and resistant) statistics replaced classical techniques?
When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions so ...
56
votes
4
answers
70k
views
Replicating Stata's "robust" option in R
I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package ...
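A minimal sketch of what the question above is after, assuming that Stata's `robust` option for `regress` reports heteroskedasticity-consistent (HC1) standard errors rather than a robust regression fit such as `rlm`. The function name `ols_slope_hc1` is hypothetical, and the code covers only simple regression with an intercept. In R, the same correction is usually obtained with `sandwich::vcovHC(fit, type = "HC1")`.

```python
import math

def ols_slope_hc1(x, y):
    """OLS slope for y = a + b*x and its HC1 (sandwich) standard error.

    Illustrative sketch only: one predictor plus an intercept.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # "Meat" of the sandwich: squared residuals weighted by centred x.
    meat = sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, resid))
    var_hc0 = meat / sxx ** 2
    # HC1 applies the small-sample factor n / (n - k), with k = 2 coefficients.
    var_hc1 = var_hc0 * n / (n - 2)
    return slope, math.sqrt(var_hc1)
```

The point estimate is the ordinary least squares slope; only the standard error changes, which is why a robust regression routine does not reproduce these results.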
55
votes
4
answers
15k
views
Fast linear regression robust to outliers
I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the ...
55
votes
3
answers
16k
views
Why do we care so much about normally distributed error terms (and homoskedasticity) in linear regression when we don't have to?
I suppose I get frustrated every time I hear someone say that non-normality of residuals and/or heteroskedasticity violates OLS assumptions. To estimate parameters in an OLS model neither of these ...
47
votes
2
answers
7k
views
Why should we use t errors instead of normal errors?
In this blog post by Andrew Gelman, there is the following passage:
The Bayesian models of 50 years ago seem hopelessly simple (except, of
course, for simple problems), and I expect the Bayesian ...
43
votes
2
answers
204k
views
Error "system is computationally singular" when running a glm
I'm using the robustbase package to run a glm estimation. However when I do it, I get the following error:
...
36
votes
4
answers
12k
views
Why isn't RANSAC most widely used in statistics?
Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers.
However, I've never seen it used by ...
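For readers unfamiliar with the method named above, here is a minimal sketch of RANSAC for fitting a straight line. The function names, the inlier threshold, the iteration count and the data are all hypothetical choices for illustration. Practical implementations usually refit the model on the final consensus set.

```python
import random

def fit_line(p, q):
    """Return (slope, intercept) of the line through two points, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Keep the two-point line that has the largest consensus set."""
    rng = random.Random(seed)
    best_model, best_count = None, -1
    for _ in range(n_iter):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        # Count points whose vertical residual is within the threshold.
        count = sum(abs(y - (slope * x + intercept)) <= threshold
                    for x, y in points)
        if count > best_count:
            best_model, best_count = model, count
    return best_model, best_count

# Hypothetical data: 20 points on y = 2x + 1 plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(20)] + [(3, 50), (7, -40), (12, 90)]
model, n_inliers = ransac_line(points)
```

The result depends on the threshold and on the number of random draws, both of which must be chosen by the user.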
34
votes
8
answers
43k
views
Replacing outliers with mean
This question was asked by my friend who is not internet savvy. I've no statistics background and I've been searching around the internet for this question.
The question is: is it possible to replace ...
33
votes
2
answers
8k
views
Are 50% confidence intervals more robustly estimated than 95% confidence intervals?
My question flows out of this comment on an Andrew Gelman blog post in which he advocates the use of 50% confidence intervals instead of 95% confidence intervals, although not on the grounds that ...
33
votes
6
answers
4k
views
What would a robust Bayesian model for estimating the scale of a roughly normal distribution be?
There exist a number of robust estimators of scale. A notable example is the median absolute deviation which relates to the standard deviation as $\sigma = \mathrm{MAD}\cdot1.4826$. In a Bayesian ...
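The relation quoted in the excerpt above can be illustrated with a short sketch. The function name `mad_sigma` and the sample values are hypothetical; the factor 1.4826 is the one given in the excerpt and makes the estimate consistent for $\sigma$ under normality.

```python
import statistics

def mad_sigma(xs, scale=1.4826):
    """Estimate sigma as scale * MAD, where MAD is the median absolute
    deviation from the median."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return scale * mad

# Hypothetical sample with one gross outlier.
sample = [1, 2, 3, 4, 100]
robust_sd = mad_sigma(sample)            # driven by the bulk of the data
classical_sd = statistics.stdev(sample)  # inflated by the outlier
```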
30
votes
5
answers
23k
views
How robust is the independent samples t-test when the distributions of the samples are non-normal?
I've read that the t-test is "reasonably robust" when the distributions of the samples depart from normality. Of course, it's the sampling distribution of the differences that is important. I have ...
28
votes
1
answer
11k
views
What are the multidimensional versions of median [duplicate]
What are the multidimensional versions of the median and what are their pros and cons? I confess this doesn't have a single answer, but I think it is a useful question to ask and will be a benefit to ...
26
votes
4
answers
28k
views
Mean and Median properties
Can somebody clearly explain to me the mathematical logic that would link two statements (a) and (b) together? Let us have a set of values (some distribution). Now,
a) Median does not depend on every ...
26
votes
2
answers
20k
views
Is a weighted $R^2$ in robust linear model meaningful for goodness of fit analysis?
I estimated a robust linear model in R with MM weights using rlm() in the MASS package. `R` does not provide an $R^2$ value ...
22
votes
6
answers
35k
views
Fitting t-distribution in R: scaling parameter
How do I fit the parameters of a t-distribution, i.e. the parameters corresponding to the 'mean' and 'standard deviation' of a normal distribution. I assume they are called 'mean' and 'scaling/degrees ...
21
votes
3
answers
12k
views
Crash course in robust mean estimation
I have a bunch (around 1000) of estimates and they are all supposed to be estimates of long-run elasticity. A little more than half of these are estimated using method A and the rest using a method B. ...
21
votes
5
answers
12k
views
Robust t-test for mean
I am trying to test the null $E[X] = 0$, against the local alternative $E[X] > 0$, for a random variable $X$, subject to mild to medium skew and kurtosis of the random variable. Following ...
21
votes
2
answers
5k
views
Definition and Convergence of Iteratively Reweighted Least Squares
I've been using iteratively reweighted least squares (IRLS) to minimize functions of the following form,
$J(m) = \sum_{i=1}^{N} \rho \left(\left| x_i - m \right|\right)$
where $N$ is the number of ...
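A minimal sketch of IRLS for an objective of the form shown above, under assumptions the excerpt does not state: $\rho$ is taken to be the Huber function with tuning constant $c = 1.345$, and the residual scale is fixed at 1. Each iteration computes weights $w_i = \rho'(|r_i|)/|r_i|$ with $r_i = x_i - m$, then solves the weighted least squares problem, which for a location parameter is a weighted mean. The function names are hypothetical.

```python
import statistics

def huber_weight(r, c):
    """IRLS weight rho'(|r|) / |r| for the Huber rho function."""
    a = abs(r)
    return 1.0 if a <= c else c / a

def irls_location(xs, c=1.345, tol=1e-10, max_iter=100):
    """Minimise sum(rho(|x_i - m|)) over m by iteratively reweighted least squares."""
    m = statistics.median(xs)  # robust starting value
    for _ in range(max_iter):
        w = [huber_weight(x - m, c) for x in xs]
        m_new = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

For a convex $\rho$ such as Huber's, the iteration does not increase the objective. Non-convex choices of $\rho$ can have several local minima, so the starting value matters.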
20
votes
5
answers
16k
views
Which robust correlation methods are actually used?
I plan to do a simulation study where I compare the performance of several robust correlation techniques with different distributions (skewed, with outliers, etc.). With robust, I mean the ideal case ...
19
votes
1
answer
3k
views
Robust PCA vs. robust Mahalanobis distance for outlier detection
Robust PCA (as developed by Candes et al. 2009 or better yet Netrapalli et al. 2014) is a popular method for multivariate outlier detection, but Mahalanobis distance can also be used for outlier ...
18
votes
3
answers
7k
views
Estimating parameters of a normal distribution: median instead of mean?
The common approach for estimating the parameters of a normal distribution is to use the mean and the sample standard deviation / variance.
However, if there are some outliers, the median and the ...
18
votes
1
answer
4k
views
Are robust methods really any better?
I have two groups of subjects, A, and B, each with a size of approximately 400, and about 300 predictors. My goal is to build a prediction model for a binary response variable. My customer wants to ...
18
votes
1
answer
20k
views
Why are rlm() regression coefficient estimates different than lm() in R?
I am using rlm in the R MASS package to regress a multivariate linear model. It works well for a number of samples but I am getting quasi-null coefficients for a particular model:
...
16
votes
2
answers
19k
views
What is a robust statistical test? What is a powerful statistical test?
Some statistical tests are robust and some are not. What exactly does robustness mean? Surprisingly, I couldn't find such a question on this site.
Moreover, sometimes, robustness and powerfulness of ...
15
votes
3
answers
3k
views
What does it mean for a statistical test to be "robust"?
Is there an intuitive way of understanding what these two sentences mean and why they're true?:
"ANOVA is 'robust' to deviations from normality with large samples", and...
"ANOVA is '...
14
votes
3
answers
2k
views
Can CART models be made robust?
A colleague in my office said to me today "Tree models aren't good because they get caught by extreme observations".
A search here resulted in this thread that basically supports the claim.
Which ...
14
votes
1
answer
7k
views
Investigating robustness of logistic regression against violation of linearity of logit
I am conducting a logistic regression with a binary outcome (start and not start). My mix of predictors are all either continuous or dichotomous variables.
Using the Box-Tidwell approach, one of my ...
14
votes
3
answers
6k
views
How to calculate Rousseeuw’s and Croux’ (1993) Qn scale estimator for large samples?
Let $Q_n = C_n\cdot\{|X_i-X_j|;i < j\}_{(k)}$ so for a very short sample like $\{1,3,6,2,7,5\}$ it can be calculated by finding the $k$th order statistic of the pairwise differences:
...
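A naive sketch of the definition above, for illustration on small samples only. It assumes the usual convention $k = \binom{h}{2}$ with $h = \lfloor n/2 \rfloor + 1$ and a consistency constant of about 2.2219. Neither value appears in the excerpt, so both should be checked against the 1993 paper. Finite-sample correction factors are omitted, and the function name `qn_naive` is hypothetical.

```python
from itertools import combinations
from math import comb

def qn_naive(xs, c=2.2219):
    """Qn scale estimate from all pairwise differences.

    Uses O(n^2) time and memory, so it is unsuitable for large samples.
    c is an assumed consistency constant; pass c=1.0 for the raw
    k-th order statistic.
    """
    n = len(xs)
    h = n // 2 + 1
    k = comb(h, 2)  # assumed rank of the order statistic
    diffs = sorted(abs(a - b) for a, b in combinations(xs, 2))
    return c * diffs[k - 1]

# The short sample from the question: n = 6, so h = 4 and k = 6.
raw = qn_naive([1, 3, 6, 2, 7, 5], c=1.0)
```

Sorting all $\binom{n}{2}$ differences is what makes this approach impractical for large $n$, which is the difficulty the question raises.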
13
votes
3
answers
1k
views
Robust estimators for count data
I am looking for robust estimators for parameters of a process producing count data:
$$
(n_1,...,n_K), n_i\in\mathbb{N}
$$
that is the underlying distribution is something like Poisson or Negative ...
13
votes
1
answer
2k
views
Robust estimation of kurtosis?
I am using the usual estimator for kurtosis, $$\hat{K}=\frac{\hat{\mu}_4}{\hat{\sigma}^4}$$, but I notice that even small 'outliers' in my empirical distribution, i.e. small peaks far from the center, ...
13
votes
1
answer
2k
views
Why not robust regression every time?
Examples of this page show that simple regression is markedly affected by outliers and this can be overcome by techniques of robust regression: http://www.alastairsanderson.com/R/tutorials/robust-...
13
votes
1
answer
4k
views
Downweight outliers in mean
I have a bunch of points $x_i$ and would like to calculate a kind of weighted mean that deemphasizes outliers. My first idea was to weight each point by $1/ (x_i - \mu)^2$. However, the problem is ...
12
votes
4
answers
2k
views
Good form to remove outliers?
I'm working on statistics for software builds. I have data for each build on pass/fail and elapsed time and we generate ~200 of these/week.
The success rate is easy to aggregate, I can say that 45% ...
12
votes
3
answers
20k
views
When to use robust standard errors in Poisson regression?
I am using a Poisson regression model for count data and am wondering whether there are reasons not to use the robust standard error for the parameter estimates? I am particularly concerned as some ...
12
votes
4
answers
2k
views
Relationship between overfitting and robustness to outliers
What's the relationship between overfitting and sensitivity to outliers? For example:
Does robustness to outliers necessarily make models less prone to overfitting?
What about the other way around? ...
12
votes
2
answers
2k
views
What does Gaussian efficiency mean?
In the case of robust estimators, what does Gaussian efficiency mean? For example, $Q_n$ has 82% Gaussian efficiency and a 50% breakdown point.
The reference is: Rousseeuw P.J., and Croux, C. (1993). “...
11
votes
2
answers
7k
views
Iglewicz and Hoaglin outlier test with modified z-scores - What should I do if the MAD becomes 0?
I'm a programmer with a small statistics background and I need to find outliers in a small list of integers and floats.
After some searching on Google I found the Iglewicz and Hoaglin outlier test which ...
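A minimal sketch of the test mentioned above, assuming the commonly cited form $M_i = 0.6745\,(x_i - \tilde{x})/\mathrm{MAD}$ with observations flagged when $|M_i| > 3.5$. The function names are hypothetical. The sketch raises an error when the MAD is 0 instead of choosing a fallback, because that choice is exactly what the question asks about.

```python
import statistics

def modified_z_scores(xs):
    """Modified z-scores based on the median and the MAD."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        # Happens when more than half of the values are identical.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in xs]

def flag_outliers(xs, cutoff=3.5):
    """Return the values whose absolute modified z-score exceeds cutoff."""
    return [x for x, z in zip(xs, modified_z_scores(xs)) if abs(z) > cutoff]
```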
11
votes
1
answer
3k
views
Are regressions with student-t errors useless?
Please see edit.
When you have data with heavy tails, doing a regression with student-t errors seems like an intuitive thing to do. While exploring this possibility, I ran into this paper:
Breusch, ...
11
votes
2
answers
3k
views
Robust multivariate Gaussian fit in R
I need to fit a generalized Gaussian distribution to a 7-dim cloud of points containing quite a significant number of outliers with high leverage. Do you know any good R package for this job?
10
votes
2
answers
2k
views
Why is the maximum likelihood estimator susceptible to outliers?
I'm new to statistics and currently learning about MLE.
Some of the papers I read, such as Robust Graph Embedding with Noisy Link Weights, mention that MLEs are susceptible to contamination in data, but didn't ...
10
votes
2
answers
91k
views
How to calculate the truncated or trimmed mean?
How can I calculate the truncated or trimmed mean? Let's say truncated by 10%?
I can imagine how to do it if you have 10 entries or so, but how can I do it for a lot of entries?
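A minimal sketch of the calculation asked about above, assuming that "truncated by 10%" means dropping 10% of the observations from each tail. That is the convention used by R's `mean(x, trim = 0.1)`. The function name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(xs, proportion=0.10):
    """Mean after dropping a fraction `proportion` of the sorted values
    from each end (rounded down to a whole number of observations)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(xs)
    k = int(len(data) * proportion)  # number dropped per tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical sample: the value 100 is discarded along with the value 1.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
result = trimmed_mean(values, 0.10)
```

The sort dominates the cost, so the same code works unchanged for a long list of entries.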
10
votes
1
answer
3k
views
Robust estimation of Poisson distribution
I have a set of numbers which are assumed to come from a Poisson distribution. The set also has some outliers, and because of that, maximum likelihood estimates are badly affected. I heard that ...
10
votes
2
answers
1k
views
Robustness of correlation test to non-normality
I'm trying to reconcile two seemingly opposite statements about robustness to non-normality of the Pearson's correlation test statistic (where the null means "no correlation").
This CV answer says:
...
10
votes
1
answer
14k
views
How to calculate the standard error of the marginal effects in interactions (robust regression)?
What I am interested in learning is how to calculate the standard error of the marginal effects of an X variable when it is part of an interaction, especially in robust regression.
There are typically two ...
10
votes
2
answers
6k
views
What is the difference between the MCD and the MVE estimators?
As far as I understand,
the Minimum Covariance Determinant (MCD) estimator looks for the subset of h data points whose covariance matrix has the smallest determinant.
the Minimum Volume Ellipsoid (...
10
votes
2
answers
4k
views
Looking for a robust, distribution-free/nonparametric distance between multivariate samples
There are many distance functions for distributions out there, but I'm having a hard time wading through them all to find one that
is "distribution-free", or "nonparametric", by which I mean only that ...
10
votes
2
answers
2k
views
Robust mean estimation with O(1) update efficiency
I am looking for a robust estimation of the mean that has a specific property. I have a set of elements for which I want to calculate this statistic. Then, I add new elements one at a time, and for ...
10
votes
1
answer
6k
views
Why is lasso more robust to outliers compared to ridge?
In my attempt to reason about it intuitively I am concluding that ridge might be more robust to outliers.
Following is my intuitive/loose reasoning:
If there is an outlier then to match my ...
10
votes
1
answer
382
views
Solution to exercise 2.2a.16 of "Robust Statistics: The Approach Based on Influence Functions"
On page 180 of Robust Statistics: The Approach Based on Influence Functions one finds the following question:
16: Show that for location-invariant estimators always $\varepsilon^*\leq\frac{1}{...
10
votes
2
answers
705
views
Can we estimate the mean of an asymmetric distribution in an unbiased and robust manner?
Suppose I have i.i.d. samples $X_1, \cdots, X_n$ from some unknown distribution $F$ and I wish to estimate the mean $\mu=\mu(F)$ of that distribution and I insist that the estimator be unbiased - i.e.,...