
Questions tagged [robust]

Robustness in general refers to a statistic's insensitivity to deviations from its underlying assumptions (Huber and Ronchetti, 2009).

89 votes
14 answers
7k views

When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions so ...
doug's user avatar
  • 10.7k
56 votes
4 answers
70k views

I have been trying to replicate the results of the Stata option robust in R. I have used the rlm command from the MASS package ...
user56579's user avatar
  • 561
55 votes
4 answers
15k views

I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the ...
Matteo Fasiolo's user avatar
55 votes
3 answers
16k views

I suppose I get frustrated every time I hear someone say that non-normality of residuals and/or heteroskedasticity violates OLS assumptions. To estimate parameters in an OLS model neither of these ...
Zachary Blumenfeld's user avatar
47 votes
2 answers
7k views

In this blog post by Andrew Gelman, there is the following passage: The Bayesian models of 50 years ago seem hopelessly simple (except, of course, for simple problems), and I expect the Bayesian ...
Potato's user avatar
  • 1,135
43 votes
2 answers
204k views

I'm using the robustbase package to run a glm estimation. However when I do it, I get the following error: ...
NK1's user avatar
  • 613
36 votes
4 answers
12k views

Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers. However, I've never seen it used by ...
Bossykena's user avatar
  • 687
34 votes
8 answers
43k views

This question was asked by my friend who is not internet savvy. I've no statistics background and I've been searching around the internet for this question. The question is: is it possible to replace ...
Alun's user avatar
  • 443
33 votes
2 answers
8k views

My question flows out of this comment on an Andrew Gelman blog post in which he advocates the use of 50% confidence intervals instead of 95% confidence intervals, although not on the grounds that ...
user1205901 - Слава Україні's user avatar
33 votes
6 answers
4k views

There exist a number of robust estimators of scale. A notable example is the median absolute deviation, which relates to the standard deviation as $\sigma = \mathrm{MAD}\cdot1.4826$. In a Bayesian ...
Rasmus Bååth's user avatar
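
The relation quoted in the excerpt above, $\sigma = \mathrm{MAD}\cdot1.4826$, can be illustrated with a minimal Python sketch. The function name `mad_scale` and the sample values are hypothetical, chosen only for illustration; the constant 1.4826 assumes approximately normal data apart from the outliers.

```python
import statistics

def mad_scale(values, constant=1.4826):
    """Median absolute deviation, scaled to estimate sigma under normality.

    `constant` is the consistency factor quoted in the excerpt; it is only
    appropriate if the bulk of the data is roughly normal (an assumption).
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return constant * statistics.median(abs_dev)

# Hypothetical sample: six well-behaved points and one gross outlier.
sample = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 50.0]

robust_sigma = mad_scale(sample)            # barely affected by the outlier
classical_sigma = statistics.stdev(sample)  # inflated by the outlier
print(robust_sigma, classical_sigma)
```

The single outlier dominates the sample standard deviation, while the scaled MAD stays close to the spread of the remaining points.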
30 votes
5 answers
23k views

I've read that the t-test is "reasonably robust" when the distributions of the samples depart from normality. Of course, it's the sampling distribution of the differences that is important. I have ...
Archaeopteryx's user avatar
28 votes
1 answer
11k views

What are the multidimensional versions of the median and what are their pros and cons? I confess this doesn't have a single answer, but I think it is a useful question to ask and will be a benefit to ...
John Robertson's user avatar
26 votes
4 answers
28k views

Can somebody clearly explain to me the mathematical logic that would link two statements (a) and (b) together? Let us have a set of values (some distribution). Now, a) Median does not depend on every ...
ttnphns's user avatar
  • 60.2k
26 votes
2 answers
20k views

I estimated a robust linear model in R with MM weights using the rlm() in the MASS package. `R` does not provide an $R^2$ value ...
CraigMilligan's user avatar
22 votes
6 answers
35k views

How do I fit the parameters of a t-distribution, i.e. the parameters corresponding to the 'mean' and 'standard deviation' of a normal distribution. I assume they are called 'mean' and 'scaling/degrees ...
user12719's user avatar
  • 1,149
21 votes
3 answers
12k views

I have a bunch (around 1000) of estimates and they are all supposed to be estimates of long-run elasticity. A little more than half of these are estimated using method A and the rest using a method B. ...
Ondrej's user avatar
  • 567
21 votes
5 answers
12k views

I am trying to test the null $E[X] = 0$, against the local alternative $E[X] > 0$, for a random variable $X$, subject to mild to medium skew and kurtosis of the random variable. Following ...
shabbychef's user avatar
  • 15.2k
21 votes
2 answers
5k views

I've been using iteratively reweighted least squares (IRLS) to minimize functions of the following form, $J(m) = \sum_{i=1}^{N} \rho \left(\left| x_i - m \right|\right)$ where $N$ is the number of ...
Chris A.'s user avatar
  • 255
20 votes
5 answers
16k views

I plan to do a simulation study where I compare the performance of several robust correlation techniques with different distributions (skewed, with outliers, etc.). With robust, I mean the ideal case ...
19 votes
1 answer
3k views

Robust PCA (as developed by Candes et al 2009 or better yet Netrepalli et al 2014) is a popular method for multivariate outlier detection, but Mahalanobis distance can also be used for outlier ...
Mustafa Eisa's user avatar
  • 1,332
18 votes
3 answers
7k views

The common approach for estimating the parameters of a normal distribution is to use the mean and the sample standard deviation / variance. However, if there are some outliers, the median and the ...
SO is dead's user avatar
  • 3,448
18 votes
1 answer
4k views

I have two groups of subjects, A, and B, each with a size of approximately 400, and about 300 predictors. My goal is to build a prediction model for a binary response variable. My customer wants to ...
user765195's user avatar
  • 2,235
18 votes
1 answer
20k views

I am using rlm in the R MASS package to regress a multivariate linear model. It works well for a number of samples but I am getting quasi-null coefficients for a particular model: ...
Robert Kubrick's user avatar
16 votes
2 answers
19k views

Some statistical tests are robust and some are not. What exactly does robustness mean? Surprisingly, I couldn't find such a question on this site. Moreover, sometimes, robustness and powerfulness of ...
JetLag's user avatar
  • 1,165
15 votes
3 answers
3k views

Is there an intuitive way of understanding what these two sentences mean and why they're true?: "ANOVA is 'robust' to deviations from normality with large samples", and... "ANOVA is '...
Nate's user avatar
  • 2,537
14 votes
3 answers
2k views

A colleague in my office said to me today "Tree models aren't good because they get caught by extreme observations". A search here resulted in this thread that basically supports the claim. Which ...
Tal Galili's user avatar
  • 22.1k
14 votes
1 answer
7k views

I am conducting a logistic regression with a binary outcome (start and not start). My mix of predictors are all either continuous or dichotomous variables. Using the Box-Tidwell approach, one of my ...
Short Elizabeth's user avatar
14 votes
3 answers
6k views

Let $Q_n = C_n\cdot\{|X_i-X_j|;i < j\}_{(k)}$ so for a very short sample like $\{1,3,6,2,7,5\}$ it can be calculated from finding the $k$th order statistic of pairwise differences: ...
K-1's user avatar
  • 257
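
The calculation described in the excerpt above (the $k$th order statistic of the pairwise absolute differences) can be sketched in Python as follows. This is an illustration under stated assumptions, not the asker's code: the choice $k = \binom{h}{2}$ with $h = \lfloor n/2 \rfloor + 1$ and the default constant 2.2219 follow the usual definition from Rousseeuw and Croux (1993), and the finite-sample correction that would turn this constant into $C_n$ is omitted. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def qn_order_statistic(values):
    """k-th smallest pairwise absolute difference |x_i - x_j|, i < j.

    Assumes k = C(h, 2) with h = floor(n/2) + 1 (the usual Q_n definition).
    """
    n = len(values)
    h = n // 2 + 1
    k = comb(h, 2)
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return diffs[k - 1]  # order statistics are 1-indexed

def qn_scale(values, constant=2.2219):
    """Unadjusted Q_n: `constant` is an assumed asymptotic consistency
    factor for normal data; no finite-sample correction is applied."""
    return constant * qn_order_statistic(values)

sample = [1, 3, 6, 2, 7, 5]
print(qn_order_statistic(sample), qn_scale(sample))
```

This brute-force version sorts all $n(n-1)/2$ differences, which is fine for short samples like the one in the excerpt but too slow for large $n$.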
13 votes
3 answers
1k views

I am looking for robust estimators for parameters of a process producing count data: $$ (n_1,...,n_K), n_i\in\mathbb{N} $$ that is the underlying distribution is something like Poisson or Negative ...
Roger V.'s user avatar
  • 5,091
13 votes
1 answer
2k views

I am using the usual estimator for kurtosis, $$\hat{K}=\frac{\hat{\mu}_4}{\hat{\sigma}^4}$$, but I notice that even small 'outliers' in my empirical distribution, i.e. small peaks far from the center, ...
yoki's user avatar
  • 1,526
13 votes
1 answer
2k views

Examples of this page show that simple regression is markedly affected by outliers and this can be overcome by techniques of robust regression: http://www.alastairsanderson.com/R/tutorials/robust-...
rnso's user avatar
  • 10.4k
13 votes
1 answer
4k views

I have a bunch of points $x_i$ and would like to calculate a kind of weighted mean that deemphasizes outliers. My first idea was to weight each point by $1/ (x_i - \mu)^2$. However, the problem is ...
jdm's user avatar
  • 301
12 votes
4 answers
2k views

I'm working on statistics for software builds. I have data for each build on pass/fail and elapsed time and we generate ~200 of these/week. The success rate is easy to aggregate, I can say that 45% ...
Kim Gräsman's user avatar
12 votes
3 answers
20k views

I am using a Poisson regression model for count data and am wondering whether there are reasons not to use the robust standard error for the parameter estimates? I am particularly concerned as some ...
kara's user avatar
  • 121
12 votes
4 answers
2k views

What's the relationship between overfitting and sensitivity to outliers? For example: Does robustness to outliers necessarily make models less prone to overfitting? What about the other way around? ...
Josh's user avatar
  • 4,668
12 votes
2 answers
2k views

In the case of robust estimators, what does Gaussian efficiency mean? For example, $Q_n$ has 82% Gaussian efficiency and 50% breakdown point. The reference is: Rousseeuw P.J., and Croux, C. (1993). “...
K-1's user avatar
  • 505
11 votes
2 answers
7k views

I'm a programmer with a small statistics background and I need to find outliers in a small list of integers and floats. After some searching on Google I found the Iglewicz and Hoaglin outlier test which ...
szuuuken's user avatar
  • 213
11 votes
1 answer
3k views

Please see edit. When you have data with heavy tails, doing a regression with student-t errors seems like an intuitive thing to do. While exploring this possibility, I ran into this paper: Breusch, ...
John Salvatier's user avatar
11 votes
2 answers
3k views

I need to fit a generalized Gaussian distribution to a 7-dim cloud of points containing quite a significant number of outliers with high leverage. Do you know any good R package for this job?
10 votes
2 answers
2k views

I'm new to statistics and currently learning about MLE. Some of the papers I read: Robust Graph Embedding with Noisy Link Weights mentioned MLEs are susceptible to contamination in data, but didn't ...
port trum's user avatar
  • 103
10 votes
2 answers
91k views

How can I calculate the truncated or trimmed mean? Let's say truncated by 10%? I can imagine how to do it if you have 10 entries or so, but how can I do it for a lot of entries?
Queops's user avatar
  • 471
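
One way to compute the trimmed mean asked about in the excerpt above is sketched below in Python. It assumes "truncated by 10%" means dropping the lowest 10% and the highest 10% of the sorted values; the function name `trimmed_mean` and the sample data are hypothetical.

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after discarding `proportion` of the observations from EACH tail.

    Assumption: the number of values cut per tail is rounded down, so
    small samples may be trimmed less than the nominal proportion.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # values dropped per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data: nine ordinary values and one extreme one.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(trimmed_mean(data, 0.10))   # drops 1 and 1000, averages 2..9
print(sum(data) / len(data))      # ordinary mean, pulled up by 1000
```

The same approach works for any number of entries, since it only needs one sort. If SciPy is available, `scipy.stats.trim_mean` offers a similar calculation.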
10 votes
1 answer
3k views

I have a set of numbers which are assumed to be coming from a Poisson distribution. The set has some outliers also and because of that, maximum likelihood estimates are badly affected. I heard that ...
suresh's user avatar
  • 255
10 votes
2 answers
1k views

I'm trying to reconcile two seemingly opposite statements about robustness to non-normality of the Pearson's correlation test statistic (where the null means "no correlation"). This CV answer says: ...
max's user avatar
  • 1,704
10 votes
1 answer
14k views

What I am interested in learning is how to calculate the std error of the marginal effects of an X variable when it is part of an interaction, especially in robust regression. There are typically two ...
R.Astur's user avatar
  • 1,147
10 votes
2 answers
6k views

As far as I understand, the Minimum Covariance Determinant (MCD) estimator looks for the subset of h data points whose covariance matrix has the smallest determinant. The Minimum Volume Ellipsoid (...
user7064's user avatar
  • 2,277
10 votes
2 answers
4k views

There are many distance functions for distributions out there, but I'm having a hard time wading through them all to find one that is "distribution-free", or "nonparametric", by which I mean only that ...
kjo's user avatar
  • 1,997
10 votes
2 answers
2k views

I am looking for a robust estimation of the mean that has a specific property. I have a set of elements for which I want to calculate this statistic. Then, I add new elements one at a time, and for ...
Bitwise's user avatar
  • 6,684
10 votes
1 answer
6k views

In my attempt to reason about it intuitively I am concluding that ridge might be more robust to outliers. The following is my intuitive/loose reasoning: If there is an outlier then to match my ...
Siddharth Shakya's user avatar
10 votes
1 answer
382 views

On page 180 of Robust Statistics: The Approach Based on Influence Functions one finds the following question: 16: Show that for location-invariant estimators always $\varepsilon^*\leq\frac{1}{...
user603's user avatar
  • 24k
10 votes
2 answers
705 views

Suppose I have i.i.d. samples $X_1, \cdots, X_n$ from some unknown distribution $F$ and I wish to estimate the mean $\mu=\mu(F)$ of that distribution and I insist that the estimator be unbiased - i.e.,...
Thomas Steinke's user avatar
