Newest 'outliers' Questions

5 votes

2 answers

500 views

Extreme outlier in real data

I'm looking at the amount of carbon in seven forest pools. For dead trees left on the landscape across many locations and over several harvest retention (logging) treatments, there is an extreme value ...

Declan

51

asked Nov 27 at 23:33

0 votes

0 answers

42 views

Winsorizing outliers across multiple analyses: once or multiple times? (SPSS)

I have a 2×2 experimental design with four conditions and eight outcome variables. I’m supposed to winsorize outliers, but I’m confused about how many times this needs to be done because I’m ...

mk0

21

asked Nov 14 at 19:55

1 vote

2 answers

275 views

outlier detection in classification

I am curious if there are any methods of outlier detection [read: NOT high leverage point detection] that can be used in classification problems without fitting a model. As I understand it, some commonly ...

plotmaster473

255

asked Oct 19 at 12:04

1 vote

0 answers

34 views

How to assign an observation to a group but include an out-group option?

I have collected data from a number of known groups, and from individuals that I would like to assign to a group but may be from an unknown group. For simplicity's sake, I have created an example with ...

AnneA

11

asked Oct 9 at 11:18

5 votes

3 answers

533 views

How to handle outliers when some predictors perform better with them and others without

I’m working on a project where I need to build a predictive model for wine quality based on its chemical properties. The goal is to find which features best explain or predict the quality score. I’ve ...

QualityX

51

asked Oct 8 at 19:23

8 votes

4 answers

1k views

Should I transform my data before or after removing outliers? (Highly skewed cortisol example)

I am analyzing cortisol data collected over multiple days, with three samples per day (Cortisol_1, Cortisol_2, Cortisol_3). My data are extremely skewed: Skewness of Cortisol_1: 26.3 Skewness of ...

Aaliya Ahamed

101

asked Jul 8 at 15:31

2 votes

0 answers

30 views

Hypothesis testing for a weekly seasonal effect in the presence of outliers

Suppose that I have a time series where the mean usually changes smoothly over time, and I want a hypothesis test for whether there is a weekly seasonal pattern to the data. The time series also ...

Alex

817

asked Jul 2 at 10:27

0 votes

0 answers

65 views

A simple-ish way of estimating the number of modes, and the 'pronounced'-ness of said modes of a discrete, finite distribution

Intuitively, let's say we're given a price $p$ for some product, and we want to compare the prices with what's available on the market (ex: to determine if we're being ripped off or not). We come back ...

MergeMonster

21

asked Jun 19 at 15:25

0 votes

0 answers

66 views

What does iteration in sigma clipping do

If I only want the high-SNR data, I apply sigma clipping to an array. As this link says: Suppose you have a set of data. Compute its median m and its standard deviation ...

Firestar-Reimu

1

asked Jun 10 at 19:52
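
The excerpt above is cut off after the median and standard deviation step, so the following is only a minimal sketch of what iteration in sigma clipping usually does. It assumes the common rule: drop points farther than `n_sigma` standard deviations from the median, then recompute and repeat until nothing more is removed. The names `sigma_clip`, `n_sigma`, and `max_iter` are hypothetical, not taken from the question.

```python
from statistics import median, pstdev


def sigma_clip(values, n_sigma=3.0, max_iter=5):
    """Iteratively drop points farther than n_sigma * std from the median.

    Assumed rule (the quoted description is truncated): each pass recomputes
    the median and standard deviation from the points that survived so far.
    """
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 2:
            break
        m = median(kept)
        s = pstdev(kept)
        survivors = [v for v in kept if abs(v - m) <= n_sigma * s]
        if len(survivors) == len(kept):
            break  # nothing removed in this pass: converged
        kept = survivors
    return kept


data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 50.0]
clipped = sigma_clip(data, n_sigma=2.0)
```

Each pass matters because an extreme value inflates the standard deviation that is used to judge it. Once it is removed, the recomputed standard deviation shrinks, so a later pass can reject points that the first pass let through.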

8 votes

1 answer

378 views

Does the presence of outliers always mean that robust regression analysis should be used?

I revised my question to be more specific, as suggested by the community. Since my knowledge of statistics is limited, I'm not entirely sure what it means to specialize in this subject—but I'll give ...

Ertan

141

asked May 30 at 20:23

3 votes

2 answers

124 views

How to test if a single value in a set of values is higher than the remaining values

I have a set of $8$ participants $P_1, \ldots P_8$. Each participant takes two tasks $A$ and $B$, and each task results in an ordered vector of $6$ positive values. I'll denote the vector recorded ...

chesslad

241

asked May 23 at 14:57

0 votes

0 answers

68 views

Should varIdent be used in a linear model with outliers in nlme in R

I am unsure whether/how to use varIdent from the nlme package to allow different variances across factor levels when analysing a dataset which has outliers. I am specifically interested in mixed ...

Pratorum

65

asked Apr 11 at 16:39

3 votes

1 answer

167 views

What is the difference between Theil-sen estimator and Repeated median regression?

I am currently learning about robust regression and came across two variants: the Theil–Sen estimator and Repeated Median Regression. However, I got confused when comparing these two algorithms. Both ...

Olivia

191

asked Apr 8 at 1:46
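
As a hedged illustration of the distinction asked about above, the sketch below computes only the slope under the textbook definitions: Theil–Sen takes one median over all pairwise slopes, while repeated median regression takes, for each point, the median slope to every other point and then the median of those per-point medians. The function names are hypothetical, and intercepts and tie handling are left out.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the slopes over all pairs of points (single median)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]  # skip vertical pairs
    ]
    return median(slopes)


def repeated_median_slope(x, y):
    """Median over points of the per-point median slope (nested medians)."""
    per_point = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


x = [0, 1, 2, 3, 4]
y = [0, 2, 4, 6, 100]  # last point is a gross outlier; the rest lie on y = 2x
ts = theil_sen_slope(x, y)
rm = repeated_median_slope(x, y)
```

On this toy data both estimators recover the slope of the clean points. They differ in how much contamination they tolerate, because the nested medians of the repeated median limit the influence of any single point more strictly.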

6 votes

1 answer

214 views

What regression method should I use for non-normal, outlier-heavy biomedical data with a continuous outcome?

I'm working with a large dataset of about 50,000 patients and trying to understand how protein expression levels influence erythrocyte (red blood cell) counts. The outcome variable — erythrocyte count ...

Nikimiskata

61

asked Mar 22 at 15:02

5 votes

1 answer

282 views

Moderation analysis assumption: univariate outliers after centering

I am conducting a moderation analysis for my thesis and am performing assumption testing. I found a few univariate outliers and transformed any scores with a z-score beyond ±3.29. I then ...

Emily

51

asked Feb 22 at 3:02

0 votes

1 answer

129 views

dataset with outliers: Kendall Tau or Spearman's Rho?

I am analyzing some data and in particular I want to test for the presence of a monotonic relationship between two random variables whose values don't appear normally distributed. I know about the ...

Jamilo

1

asked Feb 14 at 16:58

0 votes

1 answer

98 views

Outlier Removal from only One Class in a binary classification problem

Can outlier removal be done only on one class in a binary classification problem? When facing class imbalance, for example, can it be done only on the majority class? If so, is there any paper on this ...

vhd

25

asked Feb 14 at 9:54

5 votes

2 answers

602 views

How can I use unsupervised methods to recommend an “ideal” number of managers for companies when no labels exist?

I have a dataset of around 100,000 companies. For each company, I have a bunch of features such as: Number of employees, Number of customers, Number of complaints, other additional company attributes ...

B_fig

63

asked Feb 13 at 12:57

3 votes

2 answers

299 views

DFBETA in regression model diagnostics of influential points

Belsley (1980) mentioned how DFBETA are calculated for linear regression models "DFBETA values are usually calculated via equations that relate the least-squares fit of a model calculated with $n$...

user27842288

101

asked Feb 4 at 0:27

6 votes

2 answers

652 views

Can you "dummy-out" an outlier on the independent variable?

I want to run a regression where one of the regressors has a single outlier. I wonder if I can include a dummy variable to rule out this outlier without losing information from other regressors, as ...

Victor Hugo Schieck Terziani

63

asked Jan 28 at 14:10

0 votes

1 answer

116 views

Outlier detection, is it appropriate to take the mean of Z scores? [closed]

Simple backstory: I have a few crypto tokens that I want to look at. I want to do some outlier detection and look for which token could be susceptible to a rugpull or scam. Let's say we get 10 tokens. I ...

myts999

13

asked Jan 28 at 6:38

0 votes

0 answers

54 views

How to handle an extreme outlier (clinical setting)

I am currently analyzing data from cancer patients and plan on running cox regression and assessing survival times. I also want to correlate certain tumor-related data to different markers. One of ...

Maria Nieves Arredondo Lasso

1

asked Jan 22 at 11:16

1 vote

1 answer

141 views

Feature selection and outlier detection in panel regression with fixed effects

I am trying to fit the following panel regression with fixed entity effects $$Y_{it} = \alpha_i + \sum_j \beta_jX^{(j)}_{it} + \epsilon_{it},$$ where the index $j$ labels the different features. Some ...

Mark Dubin

11

asked Jan 11 at 17:54

0 votes

0 answers

37 views

How do you identify "important" changes between 2 or 3 time periods?

I am comparing sales by Customer for a company for 2 years in a row (sometimes for 3 years) and would like to highlight to my sales team the customers they should be looking into: customers who have ...

Adriana

1

asked Jan 9 at 14:54

2 votes

0 answers

36 views

What do I do with a time series that has a large, strong, trend-violating glitch? [duplicate]

I have data (a few hundred thousand points) from 1 January 2017 up to a few days ago. I can create a time series by day (or even by time to the minute) if I so wish. However, this data is of public ...

Bryan

1,541

asked Jan 2 at 14:09

2 votes

2 answers

316 views

Checking for an increase in outliers over time

I've been asked to test if there has been an increase in the number and size specifically of the high outliers over the years. The purpose is to show that there are more and higher extreme cases as ...

Woolynik

51

asked Dec 13, 2024 at 6:09

7 votes

4 answers

880 views

Can you remove outliers if they are less than 10% of the datapoints? [duplicate]

I am currently attending my first data analysis class and we do some simple hypothesis tests like the t-test etc. Our teacher told us that we can remove outliers, as long as they are not more than the 10% ...

Maria

71

asked Dec 7, 2024 at 10:22

2 votes

3 answers

153 views

Testing forecasting accuracy - outliers [with example]

I have a simple model that produces forecast values. The model works on hourly data. Now, I am only interested in observations with flags. I would like to identify where the forecasts are ...

Lohengrin

79

asked Dec 5, 2024 at 23:32

0 votes

1 answer

59 views

How can I filter outliers in data that is manually recorded?

Different people have to write down values for a certain type of parameter in order to fill out a table, and people obviously tend to write them down wrong, sometimes by a factor of 1000. This creates a lot of ...

Huragok

1

asked Nov 29, 2024 at 16:40

5 votes

3 answers

401 views

Understanding heuristic-based outlier detection: concerns about scoring, weighting, and validity

I am trying to understand the mathematics and methodology behind a newly published outlier detection algorithm in the journal Computers & Security. This algorithm uses heuristic-based approaches, ...

Mario

579

asked Nov 27, 2024 at 3:48

2 votes

1 answer

236 views

Finding outliers in mostly zero data

Background: I'm working on an algorithm to find short pieces of DNA sequence in a long DNA sequence. I won't go into detail about how it actually works, but let me state it more formally to provide ...

CodeNoob

231

asked Nov 15, 2024 at 12:03

1 vote

0 answers

69 views

How can I identify the distribution of a series of Mahalanobis distances?

If my dataset follows a multivariate t-distribution, what is the cdf of the Mahalanobis distance of a datapoint outside the sample? In other words, if I want to calculate the probability that a ...

Andreas Ierodiaconou

11

asked Nov 8, 2024 at 20:53

1 vote

1 answer

272 views

Local Outlier Factor for time series

I hope this makes sense. I have discovered LOF and tried it in R. However, since I am dealing with time series, the neighbors cannot be "future" neighbors of the current observation(s). I am ...

umbe1987

307

asked Oct 23, 2024 at 14:47

0 votes

0 answers

47 views

Latent variable demonstration with only 3 variables

I collected data for anxiety (ANX), depression (DEP), and posttraumatic stress syndrome (PTSD) symptoms. Spearman's correlation results are the following (...

pdeli

161

asked Sep 26, 2024 at 18:51

1 vote

0 answers

41 views

How to know which features contribute the most to the outlier score after applying GMM detector?

I have a dataset with 100+ features, upon which I test GMM to detect anomalies. For example, I add some Gaussian noise to 5-6 features of 100 points. GMM detects the points easily, but the next ...

AlisherAliev

11

asked Sep 26, 2024 at 9:13

2 votes

1 answer

163 views

MSE gets better but $R^2$ gets worse

Consider the following small dataset (around 569 data points), where Uptake is the regression target: As you can see, most of the variables are skewed, with some of them having only 2 or 3 data ...

AnotherSherlock

23

asked Sep 4, 2024 at 6:27

1 vote

1 answer

70 views

Determining the multiplier in limits for spotting Outliers

I want to determine the chance of having above-the-expected sales orders for products, then I could use this (my gut feeling and other business analysis) to determine if I should (or not) keep safety ...

Simonates

11

asked Aug 29, 2024 at 19:54

0 votes

0 answers

49 views

Bayesian model missing outliers at cutoff in data

I am having trouble getting the model to fit. I have ED50 values of chlorophyll in corals during a heating experiment. I have 4 reef sites and 4 species of coral with ~14 corals per site-species group....

Michael

11

asked Aug 21, 2024 at 5:51

3 votes

1 answer

343 views

Utilising Paired T-test but data is not normally distributed and there are outliers

I have a data sample of 190 but I have a few outliers and my data is not normally distributed. I intend to use paired T-test to evaluate the pre-post treatment over time. What should I do? In addition,...

Aurelia

69

asked Aug 11, 2024 at 6:49

5 votes

1 answer

1k views

Why does modified z-score not pick up an obvious outlier?

Looking to draw on some of your wisdom around modified z-scores as used for detecting outliers. As far as I can tell from my research, when a distribution might not be normal (e.g. skewed), a modified ...

gecko

53

asked Aug 2, 2024 at 17:17
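
For readers unfamiliar with the statistic in the question above, here is a minimal sketch of the modified z-score in its commonly cited Iglewicz–Hoaglin form, $M_i = 0.6745\,(x_i - \tilde{x}) / \mathrm{MAD}$, where $\tilde{x}$ is the median and MAD is the median absolute deviation from it. The excerpt is truncated, so this is not necessarily the exact variant the asker used. The function name and the 3.5 cutoff (a frequently quoted rule of thumb) are assumptions for illustration.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-scores based on the median and the MAD.

    Uses 0.6745 * (x - median) / MAD; raises if the MAD is zero, which
    happens whenever at least half of the values equal the median.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


scores = modified_z_scores([1, 2, 3, 4, 100])
flagged = [abs(s) > 3.5 for s in scores]  # 3.5 is an assumed rule-of-thumb cutoff
```

Because the MAD is itself a median, a large spread in the bulk of the data inflates it and can keep an apparently obvious outlier below the cutoff. Heavily tied data have the opposite problem: the MAD collapses to zero and the score is undefined.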

0 votes

1 answer

64 views

Outlier detection for data set comparison

I have two data sets with similar columns, one numerical and the rest categorical. col_1= categorical: city_name, col_2= categorical: company_name, col_3 = categorical: product_name, col_4 = numerical ...

Jens123

1

asked Jul 29, 2024 at 10:39

-1 votes

1 answer

81 views

Usefulness of p-value to flag outliers in a data set [closed]

Suppose I have a set of data such that $$y= a\times x + b + \varepsilon $$ I am trying to find $a$ and $b$, but some $y$'s are outliers and up to 80% of the data is missing, so I don't have access to $...

Anatole

1

asked Jul 24, 2024 at 9:40

2 votes

1 answer

124 views

How can I show statistically that one of my replicates is likely contaminated?

I have a dataset that looks like the below: five replicate samples, each of which is composed of 4 different fractions that sum to 100%. The fifth sample clearly looks visually distinct from the other ...

Dubukay

298

asked Jul 5, 2024 at 18:07

0 votes

0 answers

68 views

Determining the p-value of a test statistic, which is not distributed according to a commonly known distribution under the null hypothesis

Currently I am working in R on a project that aims to identify Dragon King events (massive outliers) in large datasets. These outliers appear for example in the city sizes in England, where London is ...

user25936873

1

asked Jul 2, 2024 at 12:59

1 vote

1 answer

93 views

What is the interpretation of outlier-robust principal component analysis?

There's a set of methods called "robust" principal component analysis (here, "robust" means resistant to influence from outliers). One example is Hubert et al., "ROBPCA: A new ...

cgmil

1,633

asked Jun 17, 2024 at 20:17

1 vote

1 answer

882 views

to determine the appropriate threshold of the z-score for the non-normally distributed data

I am interested in CPI, and I need to identify outliers in the series. For that, my instructor mentioned the number of standard deviations that a data point lies from the mean, i.e. the z-score. I ...

1190

1,160

asked Jun 17, 2024 at 10:06

2 votes

2 answers

613 views

Methods for Detecting outliers in a time series

I have a question on detecting outliers in a time series (like PPI, CPI, inflation, etc.). Which method should I use? How can I precisely detect these outliers with a test or a method? Please ...

Community wiki

1190

2 votes

1 answer

103 views

Calculate the confidence that the data point is NOT explained by the regression

I have $n$ independent variables $x_i$ and dependent variables $y_i$ with uncertainties for both $x$ and $y$. I did a linear regression to get a model $\hat y = \beta x$. Now I want to use this ...

Tibor

155

asked May 29, 2024 at 15:10

1 vote

0 answers

198 views

How to deal with outliers in panel data? [closed]

When we have cross-sectional data, we can easily detect and remove outliers. But how should one approach outliers when we are dealing with panel data? Since we have $i$ entities and $t$ time periods, ...

TFT

345

asked May 18, 2024 at 11:37

4 votes

1 answer

215 views

Interpreting Mass-Volume as an evaluation criterion for unsupervised anomaly detection

I have found this paper How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms? by Nicolas Goix that talks about evaluation of unsupervised anomaly scoring functions by the use of ...

deblue

399

asked May 16, 2024 at 9:45
