Questions tagged [winsorizing]
Winsorizing is a data transformation used in robust/resistant statistics. Extreme values in the sample are replaced by chosen data quantile(s). See https://en.wikipedia.org/wiki/Winsorizing
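As a rough illustration of the definition above, here is a minimal sketch of symmetric, count-based winsorizing. The function name `winsorize`, the parameter `k`, and the sample data are all hypothetical, chosen only for this example. In practice the cut-offs are often given as percentiles rather than counts.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values with the nearest retained value.

    Hypothetical helper for illustration; k is a count per tail, not a percentile.
    """
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    ordered = sorted(values)
    low = ordered[k]                      # smallest value that is kept as-is
    high = ordered[len(ordered) - 1 - k]  # largest value that is kept as-is
    # Clamp every observation into [low, high]; order and sample size are preserved.
    return [min(max(v, low), high) for v in values]


sample = [1, 2, 1, 3, 1, 2, 87, 3, 2, 1]  # made-up data with one extreme value
winsorized = winsorize(sample, k=1)
print(winsorized)
```

Unlike trimming, this keeps the sample size unchanged: the extreme value is pulled in to the nearest retained value rather than dropped.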
63 questions
0
votes
0
answers
42
views
Winsorizing outliers across multiple analyses: once or multiple times? (SPSS)
I have a 2×2 experimental design with four conditions and eight outcome variables. I’m supposed to winsorize outliers, but I’m confused about how many times this needs to be done because I’m ...
1
vote
0
answers
157
views
Valid approach: Winsorizing data for main analysis and then doing sensitivity analysis without winsorizing?
I've got a variable with psychological data (N=75) that is distributed fairly symmetrically but has a few cases with very extreme values, mostly in the left tail. Nevertheless, this data ...
1
vote
1
answer
163
views
Removing outliers in several groups and for several features
I'm unsure on how to remove or winsorize outliers. Let's say I have 2 groups, treated and control. And I measure feature1 and feature2 for both.
How should I handle outliers? For each group and each ...
0
votes
0
answers
51
views
Identify ARMA model with no autocorrelation in residuals [duplicate]
I have a set of log-return data for a commodity and am unable to identify an appropriate ARMA model. I used the auto.arima() function, and the optimized model is (4,0,4) with zero mean. However, when I ...
0
votes
1
answer
237
views
Can I apply both winsorization and CUPED to my experiment results?
Our experimentation platform currently has winsorization implemented to reduce "whale effects" on metrics like revenue and volume. We are also interested in applying CUPED to further ...
2
votes
1
answer
624
views
Is winsorizing limited to the usage of a certain percentile cutoff?
The Context
My dataset consists of 68 groups, each with 4 data points inside it.
As a robustness test, I am looking to see how the type of average/mean I use impacts the analysis that I will ...
1
vote
1
answer
223
views
Should I trim/winsorize raw data or computed metric used in models?
Question: Should I rather winsorise (or trim, where relevant) my raw data, or the intermediary metric I use in my models?
Context: My analysis consists of 3 steps:
Collect raw data,
Compute ...
0
votes
0
answers
1k
views
Winsorizing or taking the logarithm first?
I am testing whether I can describe StockPrice with EPS (= earnings per share), BookValuePS and ESGscore.
Before I started, I winsorized all my variables. Now I want to take the logarithm of e.g. BookValuePS ...
1
vote
0
answers
482
views
How to optimally choose winsorization thresholds for different metrics in large scale A/B testing platform
I work on our A/B testing platform where we have implemented one-sided winsorization broadly across all continuous variables (capped at 95th percentile). While that's a common cut-off, some of our ...
2
votes
0
answers
297
views
Winsorizing and ratios [closed]
Say I have a ratio c = a/b.
Should I winsorize both a and b and then ...
2
votes
0
answers
2k
views
Dealing with outliers: Interquartile range normalization vs. Winsorization
According to this page -- "When a data set has outliers, variability is often summarized by a statistic called the interquartile range, which is the difference between the first and third ...
4
votes
1
answer
2k
views
Removing outliers renders a new distribution that has its own outliers
I'm trying to remove all the outliers from a data set. However, after removing them, data points that weren't outliers before are now outliers due to the new distribution. What is the correct ...
2
votes
1
answer
1k
views
Winsorizing propensity scores
Is it kosher? Inverse propensity weighting (IPW) has been shown to perform poorly when selection probabilities are small (Kang and Schafer, 2007).
Are there any standard solutions to this issue?
1
vote
0
answers
120
views
Name for the opposite of Winsorizing?
For some regressions we find it useful to focus on extreme values, and so we discard middling dependent values (which we might call "noise") from data in order to find relationships that hold at data ...
0
votes
0
answers
701
views
How to choose cut-off for winsorization/flooring-capping? What is the impact of variable distribution on the decision?
To perform logistic regression, I wish to winsorize outliers in the independent/explanatory variables by flooring and capping them.
Can you suggest how I should choose cut-off for ...
0
votes
0
answers
901
views
Winsorizing data in small sample
I have a relatively small sample of panel data (quarterly data for 68 firms over 7 years). My dependent variable is positively skewed. In order to limit the influence of observations with large values,...
8
votes
1
answer
5k
views
Use and misuse of Winsorization
I am doing research on Winsorization (and trimming), which has been broadly applied in many fields, but I think many researchers didn't do it in a "rigorous" way. Or maybe even worse, they misuse it. ...
1
vote
1
answer
1k
views
Greater than 30% outliers in small dataset - what to do? Standard test? Test with outliers removed? Robust statistics?
I have a small-sample dataset representing observations from a longitudinal study. My principal interest is in 'change scores' across three parameters (A, B, C). This requires simple paired t-tests. ...
0
votes
1
answer
5k
views
Winsorizing data
I am currently working on my bachelor thesis in finance and I faced some problems regarding my dataset. I wanted to analyze the effect of leverage on the performance of companies and as many ...
1
vote
2
answers
2k
views
Is Winsorization performed on test data as well?
I know what Winsorization is and why it is applied. My understanding was that it is applied only to the training data to reduce the effect of outliers.
But! Recently I came across a kernel where Min, ...
4
votes
0
answers
2k
views
Treating outliers for time series forecasting in Python
What is the best way to treat outliers in a time series forecasting model? In particular, for product demand modeling?
Based on what I've read so far, the following methods can be applied:
...
0
votes
0
answers
722
views
Winsorization to remove spikes in time series
In product demand forecasting, is it valid to use winsorization to remove large outliers (spikes) in the data? I understand that the spikes may be due to holiday effects (e.g. people will buy more ...
3
votes
1
answer
2k
views
Treatment of outliers in financial data
I have a data set with financial panel data from 150 companies. I want to analyse the data using linear repeated measures ANOVA and OLS Regression (so far). For this, I want to use the absolute values ...
2
votes
0
answers
424
views
Does pre winsorising of a variable help for a logistic regression?
I am wondering if winsorising makes a difference in a logistic regression.
In a situation where I am looking at the individual contribution, looking at their individual discriminatory power (...
0
votes
1
answer
71
views
Winsorizing a forecasting dataset
I have performed a logistic regression to estimate the default probability of a dataset of firms based on some basic balance-sheet ratios. I have winsorized all the ratios at the 1st and 99th ...
1
vote
1
answer
815
views
functional differences between using huber loss and winsorizing/trimming
Curious what the functional differences are between using a Huber loss function/ regression and Winsorizing data and then running a classic least squares regression.
Will the resulting outputs be ...
1
vote
0
answers
323
views
Winsorization when we run regressions by size group
I have a sample that consists of large, medium, and small firms, and I want to run a separate regression for each size group. When I winsorize a variable, should I do it for the whole sample (i.e. ...
13
votes
1
answer
4k
views
Downweight outliers in mean
I have a bunch of points $x_i$ and would like to calculate a kind of weighted mean that deemphasizes outliers. My first idea was to weight each point by $1/ (x_i - \mu)^2$. However, the problem is ...
3
votes
1
answer
996
views
Does the Hodges-Lehmann estimator perform better than trimmed/winsorized means?
I've been reading about the HL estimator, and a question came to mind. I could fairly easily create a mean-estimator where I trim or clip 29% of the data on either side and have a statistic with a ...
2
votes
0
answers
157
views
Optimizing Robust Statistics
A robust paired t-test is a better choice for skewed distributions than the conventional paired t-test (e.g., Fradette, Keselman, Lix, & Wilcox, 2003). One version of the robust test uses a trimmed ...
2
votes
2
answers
2k
views
What is the difference between GAS ( Generalized Autoregressive Score) model and a GARCH?
I am trying to analyze some data about Brent Oil volatility. So far I have managed to fit a GARCH(1,1) model and an EGARCH. However, someone has recommended to use a GAS model, Generalized ...
2
votes
0
answers
551
views
Transformation and/or Winsorizing?
I want to compare two groups of 24 and 28 people with a t-test on type of activity (5 different types of activity and a total); later on, the same values will be used in a logistic regression.
If you ...
1
vote
0
answers
110
views
Winsorizing, just the outlier or all the value?
I have an outlier in my data set. I want to winsorize it (i.e., change the outlier to the 5th and/or 95th percentile).
Looking at the quartiles, sometimes I have more values than just my ...
0
votes
0
answers
549
views
Using trimmed means and Winsorized variances to compute standardisation of data
I am looking into the pros and cons of each normalisation technique for work and it got me thinking. What if I used trimmed means and the sqrt of Winsorized variances to compute the standardised data? ...
6
votes
3
answers
8k
views
Extreme values in the data
I have a very general statistical question. If a variable has some extreme values, then for the purpose of statistical inferences for example OLS regression, is it better to detect these extreme ...
4
votes
3
answers
193
views
Limiting the range of numbers
Suppose that I have the following data set:
{0.1, 0.2, 0.5, -0.1, 0.5, 1.1, 0.8}
I would like to limit the range of these data to be within the range of [0,1].
...
2
votes
0
answers
211
views
Robust Estimators - Winsorized Variance degree of freedom (df)
This is my first question on this site.
So, I'm currently working on my final-year thesis, which is on robust statistics. In my work, I will use Trimmed Mean, Winsorized Mean and Winsorized ...
2
votes
2
answers
349
views
Multilevel modeling for limited dependent variable
I am doing research using multilevel modeling with a limited dependent variable, number of days, which is bounded below (0) and above (30). Is it necessary to use a multilevel logit model? Or is it ...
2
votes
1
answer
793
views
Ensemble time series prediction from two separate models
I have two different forecasts that are produced by ARMA models using two different data samples. The difference between the two data sets is their size: one used data from 2013-2014 and another used ...
4
votes
1
answer
2k
views
Linear regression with violated assumptions
I am trying to find out the determinants of cognitive function. The outcome variable is the mini–mental state examination which is a 30 point questionnaire response that has score values from 0 to 30(...
4
votes
3
answers
453
views
In a "bursty" dataset, how do you filter for the few important values that make up the bulk of the information?
Not sure if there is an existing stats concept for this, but I have a dataset that consists of mostly small data points with a few large ones.
e.g. 1 2 1 3 1 2 87 3 2 1 1 1 1 3 1 2 1 1 1 99
How can ...
5
votes
1
answer
819
views
Why are Winsorized random variables independent?
While studying trimmed mean I understood that if I have some random variables $X_1, X_2, .., X_n$ by ordering them and trimming, the variables are no longer independent.
However it is said that "by ...
24
votes
4
answers
70k
views
Should the mean be used when data are skewed?
Often introductory applied statistics texts distinguish the mean from the median (often in the context of descriptive statistics and motivating the summarization of central tendency using the mean,...
42
votes
5
answers
37k
views
What are the relative merits of Winsorizing vs. Trimming data?
Winsorizing data means replacing the extreme values of a data set with a certain percentile value from each end, while trimming or truncating involves removing those extreme values.
I always see ...
2
votes
1
answer
1k
views
Scale independent forecast error metric that works with changing signs
I am trying to analyze a quite large (~25,000 rows) dataset of cash flow forecasts. Receipts and expenses are aggregated, thus I may end up with the following data:
...
34
votes
8
answers
43k
views
Replacing outliers with mean
This question was asked by my friend, who is not internet savvy. I have no statistics background and have been searching around the internet for an answer.
The question is : is it possible to replace ...
3
votes
1
answer
724
views
Combining similarity scores
I have a list of m x n similarity score matrices, something like
...
15
votes
5
answers
45k
views
How to correct outliers once detected for time series data forecasting?
I'm trying to find a way of correcting outliers once I find/detect them in time series data. Some methods, like nnetar in R, give some errors for time series with big/large outliers. I already managed ...
3
votes
3
answers
3k
views
Robust standardization of data
I have some data where I want to determine whether the shape of the probability distribution has changed compared to 10 years ago.
One example is that I have for various automobiles multiple measures ...
2
votes
2
answers
713
views
Removing outliers and calculating a "lowest" attainable price from a pre-determined/fixed time series of prices
Just a foreword, I'm not a mathematician or otherwise statistically skilled. I know my way around calculating standard deviations, but it's all self taught. I'm a programmer with limited stats ...