Newest 'outliers' Questions - Page 5

1 vote

0 answers

149 views

Standardized Euclidean Distance over variables distributed as $\chi^2$

I sample $n$-dimensional vectors (each sample is a vector). My objective is to detect outliers. If the elements were normally distributed, I could use the Standardized ...

Gideon Kogan

390

asked Feb 15, 2022 at 7:04
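For reference, the standardized Euclidean distance mentioned in this question divides each coordinate's deviation by that coordinate's standard deviation before taking the Euclidean norm. With known means and standard deviations and independent normal coordinates, the squared distance follows a $\chi^2_n$ distribution, which is what makes thresholding it convenient. The sketch below is minimal and rests on a few assumptions. Each sample is a list of floats. Distances are measured from the column means. Each column has at least two non-constant values. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_distances(samples):
    """Standardized Euclidean distance of each sample from the column means:
    every coordinate is divided by its column's sample standard deviation."""
    columns = list(zip(*samples))
    centers = [mean(col) for col in columns]
    scales = [stdev(col) for col in columns]
    return [
        sqrt(sum(((x - c) / s) ** 2 for x, c, s in zip(row, centers, scales)))
        for row in samples
    ]

# Illustrative 2-dimensional data; the last sample is far out in coordinate 2.
data = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 40.0]]
print(standardized_distances(data))
```

Note that the means and standard deviations are estimated from the same data, outlier included, so the $\chi^2_n$ reference is only approximate here.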

3 votes

0 answers

874 views

Measuring unusual death [closed]

Given the Prussian Horse Data here: https://www.randomservices.org/random/data/HorseKicks.html, is there a way to find out which corps has an unusually high number of deaths? (Note that Prussian horse ...

william007

1,097

asked Feb 14, 2022 at 13:36

3 votes

2 answers

555 views

Is it possible for 9% of a dataset to be outliers?

I have a dataset about solar panels' output power. After visually inspecting the data distribution, I found it is not normally distributed but right-skewed, with many zeroes. I used ...

graphicart86

31

asked Jan 30, 2022 at 13:38

0 votes

0 answers

57 views

Is it appropriate to fit a linear model to my data?

I have a bunch of outcome/exposure relationships I am trying to fit models to: From these graphs, I am not sure if a simple lm is appropriate. Some of them look ...

Hank Lin

529

asked Jan 29, 2022 at 17:11

3 votes

4 answers

954 views

Why don't we automatically have outliers when mean and median differ strongly?

Assume you have a data set with information on income of all students in the lecture. The mean value is 1500\$. The median value is however only 800\$. Which of the following conclusions is wrong? The ...

StatisticsNoobie

111

asked Jan 28, 2022 at 21:39

1 vote

1 answer

152 views

Outlier management

I apologize in advance for my novice question. I am a part of an interview committee of eight people. We interview 70 applicants for just six positions. All of the applicants are very accomplished. We ...

Joe Davey

11

asked Jan 28, 2022 at 19:07

4 votes

3 answers

2k views

Which are outliers?

I am in the process of solving a Machine Learning challenge, and I want to do it the right way. I did some exploratory data analysis and I wanted to check the distribution of the data. As displayed in ...

Spicy strike

51

asked Jan 26, 2022 at 14:38

0 votes

0 answers

227 views

How to clean dataset in order to fit to a curve? [duplicate]

I've been trying to fit a dataset to a curve for a while, without success. The goal is to obtain a curve with an equation that fits the data, so I can get the parameter x for any value of y. The blue dataset ...

JCV

153

asked Jan 24, 2022 at 9:37

0 votes

0 answers

487 views

Detecting and dealing with outliers in a sales prediction dataset of "Rossmann"

I have been working on a dataset for which the task is to forecast the sales of the 1115 drug stores of the Rossmann chain. The dataset is fairly large, with over 1M records and as many as ...

Ritik P. Nayak

333

asked Jan 20, 2022 at 17:19

0 votes

1 answer

178 views

Standard deviation estimator without outliers

I have samples that are distributed like this: [image omitted]. I want to calculate the standard deviation (or similar) of the main peak without the outliers. Of course I can do this by simply applying a cut at, say, -5µ. ...

user171780

229

asked Jan 19, 2022 at 9:13
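One common alternative to a hard cut is a robust scale estimate such as the median absolute deviation (MAD). This technique is swapped in for illustration; the question does not propose it. The sketch below is minimal and assumes the main peak is roughly normal, because the 1.4826 consistency factor relies on that. The function name and numbers are hypothetical.

```python
from statistics import median

def mad_sd(values, scale=1.4826):
    """Robust spread estimate: scale * median(|x - median(x)|).
    The factor 1.4826 makes it match the standard deviation when the
    data are normal; other shapes would need a different constant."""
    m = median(values)
    return scale * median([abs(v - m) for v in values])

peak = [1, 2, 3, 4, 100]   # illustrative: one gross outlier
print(mad_sd(peak))        # 1.4826; the outlier barely matters
```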

0 votes

1 answer

339 views

Why is 50% the best breakdown point for an estimator?

As stated in Wikipedia: Intuitively, we can understand that a breakdown point cannot exceed 50% because if more than half of the observations are contaminated, it is not possible to distinguish ...

JustBlaze

57

asked Jan 14, 2022 at 4:49
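The 50% figure can be seen numerically with 11 points:

- Replacing a single point with a huge value drags the mean arbitrarily far.
- The median stays within the clean data until 6 of the 11 points (more than half) are replaced.

The sketch below is minimal. The `contaminate` helper and the contamination value are hypothetical.

```python
from statistics import mean, median

def contaminate(data, k, bad=1e9):
    """Replace the k largest values with an arbitrarily large value."""
    s = sorted(data)
    return s[: len(s) - k] + [bad] * k

clean = list(range(1, 12))  # 11 clean points: 1, 2, ..., 11
for k in range(7):
    x = contaminate(clean, k)
    print(f"k={k}: mean={mean(x):.3g}, median={median(x):.3g}")
```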

0 votes

0 answers

1k views

Winsorizing or taking the logarithm first?

I am testing whether I can describe StockPrice with EPS (earnings per share), BookValuePS and ESGscore. Before starting, I winsorized all my variables. Now I want to take the logarithm of e.g. BookValuePS ...

wrangjangler

1

asked Jan 12, 2022 at 16:33

9 votes

10 answers

6k views

Why is the Median Less Sensitive to Extreme Values Compared to the Mean?

I am sure we have all heard the following argument stated in some way or the other: For a given set of measurements (e.g. heights of students), the mean of these measurements is more "prone"...

stats_noob

1

asked Jan 8, 2022 at 6:45

2 votes

1 answer

524 views

Comparing outliers in two distributions

I apologize in advance as I am not well-versed in statistics, but I hope that this question makes sense. I have 2 populations which are normally distributed and have a near-identical mean. I would ...

octopuslegs11

23

asked Jan 6, 2022 at 20:14

1 vote

1 answer

169 views

General Question: Should Legitimate Outliers in the Data be Included or Excluded from Statistical Models? [duplicate]

I have the following (general) question (I know there is no definite answer to this question and it largely depends on the specific data and choice of model): Should Legitimate Outliers in the Data be ...

stats_noob

1

asked Jan 2, 2022 at 3:49

0 votes

1 answer

650 views

How to detect outliers in skewed data?

I have a dataset I need to use to predict the probability of conversion based on the number of days an individual has spent using my app. I got a list of historical users and the number of session ...

Andrei Budaes

121

asked Dec 21, 2021 at 12:44

0 votes

1 answer

3k views

Outliers in Logistic Regression

I want to know how to find and remove outliers in my logistic regression. I have tried using a formula from Faraway, but I don't know whether it is applicable to logistic regression. For example, my ...

Jasmine Helen

33

asked Dec 20, 2021 at 6:10

0 votes

2 answers

4k views

Running ANOVA - must I remove outliers?

Some people seem to frown on removing outliers. But I've also read many times elsewhere that ANOVAs are sensitive to outliers and you must remove them. I'm running a 2 x 2 repeated measures within ...

Statsquestionboy

31

asked Dec 20, 2021 at 4:10

1 vote

2 answers

182 views

Should I remove this outlier?

I am running a multiple variable regression predicting GDP per capita for U.S. states with a bunch of independent variables. Currently I have included the District of Columbia in the data set which ...

Jeremy

13

asked Dec 10, 2021 at 19:06

-1 votes

1 answer

267 views

Independent Sample T-test or Mann-Whitney U test?

I am a very young stats learner, and I need help understanding the justification of a test choice. I have a sample of 39 participants (20 females and 19 males) who were measured on task performance, and I ...

marth

1

asked Dec 7, 2021 at 14:48

2 votes

1 answer

3k views

Ridge regression for multicollinearity and outliers

I'm wondering about techniques like ridge regression with regard to both multicollinearity and outliers. My understanding is that ridge regression is primarily used for multicollinearity, but that ...

fmtcs

575

asked Dec 7, 2021 at 12:09

4 votes

2 answers

3k views

Does classic outlier detection assume normality?

My classmate told me he was presenting some statistics-based work, and at one point he showed a boxplot and used it for outlier detection; then his professor said 'it's not even correct, the ...

Davi Américo

1,270

asked Dec 3, 2021 at 4:35

0 votes

0 answers

223 views

Using the IQR method to filter outliers in experimental research, by group or as a whole?

In my current dataset (results from a factorial ANOVA), I know I have outliers (due to qualitative comments participants wrote during an online experiment), thus I'd like to do a filtering process to ...

JoeyyyFunk

88

asked Nov 30, 2021 at 20:27
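The choice matters because the quartiles, and hence the fences, depend on which values are pooled. The sketch below is minimal and uses made-up illustrative numbers (not the asker's data). It applies Tukey's 1.5 × IQR rule with Python's `statistics.quantiles`. Other quartile conventions can shift the fences slightly. The function name is hypothetical.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; quartiles use the
    'inclusive' interpolation method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

groups = {"A": [10, 11, 12, 13, 14, 30], "B": [100, 101, 102, 103, 104, 105]}
pooled = [v for vals in groups.values() for v in vals]
print("pooled:", tukey_outliers(pooled))  # pooled: []
for name, vals in groups.items():
    print(name, tukey_outliers(vals))     # A [30], then B []
```

Here 30 is flagged within group A but not in the pooled data, because the difference between groups inflates the pooled IQR.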

1 vote

0 answers

191 views

Legitimacy of transforming data before statistical tests

I have two groups of samples (N=4 for each) and found that there is one outlier for each group (both are higher than the rest of the respective samples within the groups). I have no resources to repeat ...

William Wong

207

asked Nov 29, 2021 at 18:20

0 votes

1 answer

192 views

Setting the observation likelihood threshold for outlier detection if you know the percentage of outliers

Let's assume I have a sensor that gives me measurements $z$ and I know that $50\%$ of the measurements I read are outliers (more than 3 standard deviations away from the real measurement distribution)....

MattSt

350

asked Nov 27, 2021 at 12:19

0 votes

0 answers

26 views

PCA: does outlier detection make sense with low linear correlation? [duplicate]

I am experimenting with PCA to detect outliers based on the reconstruction error. What I do: I start with a 6-dimensional dataset and reduce it to 5 dimensions. Then, I reconstruct the initial dataset and ...

savoga

16

asked Nov 22, 2021 at 15:09
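The reconstruction-error procedure described in this question can be sketched with NumPy, taking the principal directions from an SVD of the centered data. This is a minimal sketch that assumes rows are observations. The function name, the synthetic data and the per-row squared error are choices made for illustration.

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """Project centered data onto the top-k principal components,
    map back, and return each row's squared reconstruction error."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal directions (right singular vectors).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                   # d x k basis of the kept components
    X_hat = Xc @ W @ W.T + mu      # reconstruction in the original space
    return ((X - X_hat) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # 6 nearly uncorrelated synthetic features
errors = pca_reconstruction_error(X, k=5)
print(errors[:5])
```

With nearly uncorrelated features, as in this synthetic example, the variance is spread roughly evenly across components. The single dropped component is then about as informative as any kept one. This is one way to frame the question's concern about low linear correlation.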

1 vote

0 answers

48 views

Outlier Detection in Meta-Analysis Models for Observational Studies of Adverse Drug Outcomes using Distributed Networks

I hope you are in good health. My thesis is on outlier detection in meta-analysis models. I will be using a case study from the Canadian Network for Observational Drug Effects Studies (CNODES) to detect ...

HRH

11

asked Nov 21, 2021 at 23:21

0 votes

0 answers

281 views

How to optimize K-means to eliminate outliers and unrelated clusters?

I clustered document embeddings with K-Means. Embeddings have 2048 dimensions. Now, I am trying to optimize the clustering. There are two problems. 1- Some clusters may have outlier samples. 2- Sometimes,...

Alper M.

1

asked Nov 15, 2021 at 7:12

1 vote

1 answer

318 views

Removing outliers at the start when there are multiple ANOVA and correlational analyses in a single results section [duplicate]

I would be grateful for opinion on which of the two options below (or an alternative) is best: Summary of study: In a single results section, different ANOVAs are run on the different metrics – raw ...

Pop

13

asked Nov 5, 2021 at 17:43

1 vote

0 answers

127 views

Outliers for N=4

If you have 4 observations, can you have an outlier? Consider any value outside $(\text{lower fourth} - 1.5 \times \text{fourth spread},\ \text{upper fourth} + 1.5 \times \text{fourth spread})$ to be an outlier.

Aleph

11

asked Oct 30, 2021 at 1:56
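The rule in this question can be written out directly. This minimal sketch assumes the fourths are the medians of the lower and upper halves of the sorted sample. When $n$ is odd, the middle value is counted in both halves. This is one common textbook convention; others exist.

Under this convention with $n = 4$, the lower fourth is $(x_{(1)}+x_{(2)})/2$ and the upper fourth is $(x_{(3)}+x_{(4)})/2$. A short calculation shows $x_{(4)} - x_{(3)} \le 1.5\,(x_{(3)}+x_{(4)}-x_{(1)}-x_{(2)})$. So the largest value never exceeds the upper fence, and symmetrically the smallest never falls below the lower fence. The function names are hypothetical.

```python
from statistics import median

def fourth_fences(sample, k=1.5):
    """Fences from the lower/upper fourths, taken as the medians of the
    lower and upper halves of the sorted sample (for odd n the middle
    value is included in both halves)."""
    x = sorted(sample)
    n = len(x)
    half = (n + 1) // 2
    lower_fourth = median(x[:half])
    upper_fourth = median(x[n - half:])
    fs = upper_fourth - lower_fourth          # fourth spread
    return lower_fourth - k * fs, upper_fourth + k * fs

def outliers(sample, k=1.5):
    lo, hi = fourth_fences(sample, k)
    return [v for v in sample if v < lo or v > hi]

print(outliers([1, 2, 3, 1000]))              # [] : nothing flagged at n = 4
print(outliers([1, 2, 3, 4, 5, 6, 7, 1000]))  # [1000]
```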

0 votes

1 answer

286 views

Heteroskedastic time series outlier analysis using machine learning

Is anyone aware of machine learning models that are able to deal with heteroskedasticity in time series, when trying to detect outliers? There are a lot of anomaly detection tools out there (like k-...

SimonDude

75

asked Oct 29, 2021 at 7:13

1 vote

1 answer

412 views

Detecting outliers in weight measurements

I have weight data for users, collected over a period of time. My goal is to find incorrect weight readings. The definition of incorrect readings is purely based on logical reasons (or in other words ...

monte

121

asked Oct 27, 2021 at 6:25

3 votes

1 answer

3k views

What to do with outliers? Should you use capping, remove outliers, or use non-parametric tests?

This will be my first question on Cross Validated, and besides, no one has ever taught me statistics. I am completely self-taught in this regard. So please forgive me if my question seems trivial. I ...

Marek Fiołka

160

asked Oct 22, 2021 at 19:24

1 vote

1 answer

335 views

Methodology for removing instances that decrease the accuracy of a machine learning algorithm

Is it bad practice to run a Machine learning algorithm on an experimental dataset, check the MAE, and remove the instances that have a value of MAE above a certain limit? If we run the algorithm ...

RandML000

13

asked Oct 22, 2021 at 13:56

2 votes

2 answers

174 views

Does KNN fail if the test data have no epsilon-close nearest neighbors in the training data?

If I have binary-classification data and a Euclidean metric, and I know the best number of nearest neighbors, then I draw circles on my training data based on my K-value which tell me which regions ...

user318514

asked Oct 20, 2021 at 21:08

2 votes

2 answers

133 views

How to test assumptions for a large number of statistical tests?

I am running a logistic regression. The outcome is a clinical variable, and there are two predictors: gene expression (continuous), hormone levels (continuous), and the interaction term between them. ...

Sam

679

asked Oct 19, 2021 at 7:36

5 votes

2 answers

342 views

Applications of "Regression Towards the Mean" in Real Life

I was reading about "regression towards the mean". Over here, an explanation of this concept is provided: "Consider a simple example: a class of students takes a 100-item true/false ...

Community wiki

4 revs, 4 users 65%
stats_noob
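The true/false example quoted in this question is easy to simulate. This minimal sketch assumes, as in that example, that every answer is a pure guess, so scores contain no skill. The function name, class size and the 10% cutoff for "top scorers" are illustrative choices. On a retest, the first-round top scorers fall back toward the overall average of about 50.

```python
import random
from statistics import mean

def simulate(n_students=1000, n_items=100, top_frac=0.1, seed=0):
    """Every student guesses each true/false item at random (p = 0.5).
    Return the top scorers' mean on the first test and the same
    students' mean on a retest."""
    rng = random.Random(seed)

    def take_test():
        return sum(rng.random() < 0.5 for _ in range(n_items))

    first = [take_test() for _ in range(n_students)]
    ranked = sorted(range(n_students), key=lambda i: first[i], reverse=True)
    top = ranked[: int(n_students * top_frac)]
    retest = [take_test() for _ in top]
    return mean(first[i] for i in top), mean(retest)

print(simulate())
```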

2 votes

0 answers

80 views

Define outliers in correlation with right-skewed data (log-log plot)

I have a dataset of counts of occurrences of variables in different classes. For each class, I have an equivalent control created by shuffling the dataset. For instance, this could be words from ...

mm523

85

asked Oct 12, 2021 at 16:27

7 votes

1 answer

7k views

Tukey's fences for outlier removal

I'm in a biomedical research field, and I see a lot of researchers conducting low-N studies that use Tukey's fences for outlier removal. For anyone who doesn't know, Tukey's fences work as follows: ...

torpedo_cantankerous_softener

93

asked Oct 11, 2021 at 3:20
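For reference, Tukey's fences flag values below $Q_1 - 1.5\,\mathrm{IQR}$ or above $Q_3 + 1.5\,\mathrm{IQR}$. One way to probe the low-N concern is to simulate how often a perfectly clean normal sample of size $n$ contains at least one flagged point. This is a minimal sketch. The sample size, the number of replicates and the 'inclusive' quartile method are arbitrary assumptions, and the resulting rate depends on them. The function names are hypothetical.

```python
import random
from statistics import quantiles

def tukey_flags(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] ('inclusive' quartiles)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def clean_sample_flag_rate(n=10, reps=2000, seed=0):
    """Share of purely normal samples of size n with at least one flagged value."""
    rng = random.Random(seed)
    hits = sum(
        bool(tukey_flags([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(reps)
    )
    return hits / reps

print(tukey_flags([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
print(clean_sample_flag_rate())
```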

0 votes

0 answers

981 views

Is it valid to remove trials as outliers using IQR?

I have a repeated measures experiment where all participants completed several trials for each condition. My dependent variables are response time and accuracy. I am using the Interquartile Range as ...

john connor

113

asked Oct 10, 2021 at 18:30

0 votes

0 answers

52 views

Remove outliers from mostly linear data

I have a cumulative sum of battery charges that is mostly extremely linear, apart from some faulty data at the beginning. See this image as an example: [image omitted]. In order to get the most accurate linear ...

mneumann

101

asked Oct 8, 2021 at 21:12

10 votes

2 answers

2k views

Why is the maximum likelihood estimator susceptible to outliers?

I'm new to statistics and currently learning about MLE. One of the papers I read, Robust Graph Embedding with Noisy Link Weights, mentioned that MLEs are susceptible to contamination in data, but didn't ...

port trum

103

asked Oct 5, 2021 at 6:10

2 votes

1 answer

336 views

Method for outlier detection in noisy seasonal time series data?

I have around 1000 time series of around 1000 samples each, where samples are 5 minutes apart. An example of a time series after performing seasonal decomposition: [image omitted]. As we can see the data is very ...

kspr

231

asked Oct 4, 2021 at 19:54

0 votes

1 answer

176 views

How to measure if a point of data is a deviation from other data points?

I have a data set that consists of many single data points. They are measurements of network traffic, so they include e.g. '1403021', '1402341', '1399312'... values that are labeled as 'label1' and ...

norivotset

1

asked Sep 29, 2021 at 8:00

1 vote

2 answers

693 views

AUC measure for Local Outlier Factor detection in Python?

I'm using the Local Outlier Factor algorithm provided by scikit-learn for outlier detection. For the evaluation I want to use the AUC measure. ...

Imen F

11

asked Sep 27, 2021 at 11:10

4 votes

1 answer

425 views

In anomaly detection of time series, should global outliers and contextual outliers be separated?

I am trying to create a pipeline in Python which automatically identifies global and contextual anomalies of a time series. Which one of these approaches do you believe is more correct? Method 1) ...

kspr

231

asked Sep 24, 2021 at 9:47

0 votes

0 answers

503 views

Handling Many Outliers in Logistic Regression

I am working on telecom data for churn modelling. I have 18 categorical and 2 numeric variables (total charges and monthly charges) in my data set. After handling the missing values, I checked the ...

newbie-data-student

1

asked Sep 9, 2021 at 11:36

0 votes

0 answers

80 views

Relative anomalies in multiple multivariate time series with different lengths?

I have a set of highly correlated time series (similar peaks and trend). I want to find relative anomalies; e.g., say there are 20 time series. At a snapshot date, all values increase, but one ...

Soom

11

asked Sep 9, 2021 at 3:44

1 vote

2 answers

81 views

When is a realization from a bivariate distribution surprising?

I really need a hint here. Suppose I want to be able to detect unusual events and express the likelihood of it occurring. Suppose that I know that two events usually move in a given association such ...

Eugene

111

asked Sep 7, 2021 at 22:50

0 votes

2 answers

2k views

Is it wrong to remove outliers from the dependent variable when fitting a model?

I'm beginning to study Generalized Linear Models and I was trying to fit a model to the dataset NMES1988. More specifically, my goal is to fit a Poisson regression to this dataset considering ...

mathguy_666

3

asked Aug 31, 2021 at 20:33
