Newest 'outliers' Questions - Page 3

4 votes

2 answers

2k views

Is calculating skewness necessary before using the z-score to find outliers?

For example, if I specify a z-value of 3, then I would look at both sides and know its position in the distribution (99.73%). Would this change if I have a left or right skewed distribution? Would I ...

JAdel

125

asked Jul 13, 2023 at 7:23
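The 99.73% figure in the z-score question above holds only for a normal distribution. A minimal sketch of that point, using only the Python standard library; the exponential distribution is an illustrative choice of skewed distribution, not something taken from the question:

```python
import math
from statistics import NormalDist

# Share of a standard normal distribution with |z| <= 3.
std_normal = NormalDist(mu=0.0, sigma=1.0)
normal_coverage = std_normal.cdf(3.0) - std_normal.cdf(-3.0)

# Illustrative right-skewed case (an assumption for this sketch):
# an exponential distribution with rate 1 has mean 1 and standard deviation 1,
# so |z| <= 3 means -2 <= x <= 4. Values below 0 cannot occur, which means
# every point flagged by |z| > 3 sits in the right tail.
exp_coverage = 1.0 - math.exp(-4.0)

print(f"normal:      {normal_coverage:.4f}")
print(f"exponential: {exp_coverage:.4f}")
```

In this sketch the normal case covers about 99.73% and the exponential case about 98.17%, so the same cutoff flags a different share of points, and all of them on one side.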

1 vote

1 answer

254 views

Can I trust the results of a t test on 4 point Likert scale data which hides outliers?

I want to use an unpaired two-sample t-test of random samples of $n=40$ each. The sample data is from 4-point Likert scale assessments. I understand the t-test is not very robust to outliers, which I ...

Ranfurley

11

asked Jun 29, 2023 at 3:11

3 votes

1 answer

453 views

Why should I split the data when searching for outliers? (pyod)

I am using pyod to detect outliers in data, and I came across this official example: https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py I have a question regarding the need to split ...

JAdel

125

asked Jun 23, 2023 at 23:58

0 votes

0 answers

41 views

How can I detect univariate outliers multivariately?

Chose hopefully a catchy title :-) I am looking for a simple algorithm to detect outliers caused by measurement errors. So assume I am given a multivariate sample (30 dimensions) and I want to detect ...

Johannes

1

asked Jun 16, 2023 at 20:44

0 votes

0 answers

171 views

Why do residuals cluster in two groups?

I am running a logistic regression in a sample with ~150,000 observations. I am predicting three different outcomes, x, y, and z, that occur in ~10,000, ~4,000, and ~2,000 cases respectively (for each ...

JuM24

21

asked Jun 12, 2023 at 8:03

0 votes

1 answer

112 views

How to identify outliers and drop rows in train splits of all folds, when using StratifiedKFold in GridSearchCV?

For predicting whether a subject has liver disease or not, I'm using StratifiedKFold CV in GridSearch for AdaBoost and RandomForest Classifiers. For Outlier analysis, I've identified all feature ...

hanpat99

1

asked Jun 10, 2023 at 17:01

0 votes

0 answers

125 views

What did Grubbs mean when he "cautioned against interpreting probabilities too literally when normality of the data is not assured"?

In his 1969 paper, Grubbs mentioned that "Until such time as criteria not sensitive to the normality assumption are developed, the experimenter is cautioned against interpreting probabilities too ...

WacKaDoodle

1

asked Jun 6, 2023 at 14:53

3 votes

1 answer

249 views

Modeling outliers in maximum likelihood estimation with gradient descent

Consider a set of 3D points $X = \{x_1, x_2, ...x_n\} $ with $ x_i\in\mathbb{R}^3$ on which we want to fit an arbitrary probability distribution. The distribution we want to fit models some ...

Daniel López

5,726

asked May 23, 2023 at 19:42

1 vote

0 answers

44 views

Shuffling data changes OPTICS outlier results

I am trying to use sklearn.cluster.OPTICS to identify outliers, but found an issue: I use 2 examples with exactly the same data but different orders. They give different results: 1st example /////////...

Ya Gao

11

asked Apr 30, 2023 at 1:33

0 votes

0 answers

62 views

What is it called when an outlier falls out of a rolling window statistical calculation?

I have a time series $X_t \sim N(0, 1)$. There is a single outlier at index 347, at 8.5 standard deviations from the mean. If I now compute a rolling window standard deviation of $X_t$ with window ...

PyRsquared

1,364

asked Apr 26, 2023 at 15:05
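The rolling-window effect described in the question above can be reproduced with a short simulation. This is only a sketch: the window length is cut off in the excerpt, so `window = 50` is a hypothetical value, and the seed and series length are likewise assumptions.

```python
import random
from statistics import stdev

random.seed(0)

n = 500              # series length (assumption)
outlier_index = 347  # from the question
window = 50          # hypothetical; the question's window length is truncated

# Standard normal series with a single planted outlier.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
x[outlier_index] = 8.5

# rolling_sd[t] is the sample standard deviation of x[t - window + 1 .. t].
rolling_sd = [None] * (window - 1) + [
    stdev(x[t - window + 1 : t + 1]) for t in range(window - 1, n)
]

entry = outlier_index            # first window that contains the outlier
exit_ = outlier_index + window   # first window that no longer contains it

print("before/after entry:", rolling_sd[entry - 1], rolling_sd[entry])
print("before/after exit: ", rolling_sd[exit_ - 1], rolling_sd[exit_])
```

The rolling standard deviation jumps when the outlier enters the window and drops again exactly `window` steps later, even though nothing happens in the data at that later time.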

2 votes

1 answer

235 views

Adjust the "Threshold" in a robust regression

I am trying to perform a robust regression using the lmrob function in R. I am getting this error message: ...

induktivist

23

asked Apr 19, 2023 at 22:49

0 votes

1 answer

135 views

Do outliers begin from or above the whisker-limit? [duplicate]

Do outliers begin on the whisker limit or above it? In the (Python) example below the calculated upper whisker limit is 64.8125. Is a value of ...

buhtz

282

asked Apr 18, 2023 at 8:45
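For the whisker-limit question above, a small sketch under the common Tukey convention, where a point is flagged only if it lies strictly beyond Q3 + 1.5 × IQR. The data are hypothetical, chosen so that one value sits exactly on the limit; the question's own data (with the limit 64.8125) are not shown in the excerpt.

```python
from statistics import quantiles

# Hypothetical data: 13 lands exactly on the upper limit, 20 lies beyond it.
data = [1, 2, 3, 4, 5, 6, 7, 13, 20]

# method="inclusive" interpolates linearly between order statistics.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# The whisker ends at the largest observation that is not beyond the fence;
# only observations strictly greater than the fence are flagged.
upper_whisker = max(v for v in data if v <= upper_fence)
outliers = [v for v in data if v > upper_fence]

print(upper_fence, upper_whisker, outliers)
```

Under this convention a value equal to the limit is still covered by the whisker. Other software may use a different quartile definition, so the limit itself can differ between tools.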

4 votes

1 answer

837 views

Should I be concerned about outliers in NB GLMM with an offset term?

I'm working on a negative binomial model for count data. Unfortunately I can't provide a more detailed description because I wasn't explicitly allowed to. All I can say now is that the data is about ...

Eva Šragová

93

asked Apr 12, 2023 at 3:45

1 vote

1 answer

184 views

Suggestions on dealing with outliers when sample size is very small AND you must order the results

I run competitive events. In our normal event, we have 8 adjudicators split between two categories: Skill and Artistry. For each category we throw out the high and low scores and average the remaining ...

Omar Paloma

11

asked Apr 6, 2023 at 4:51
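The scoring rule described above (drop the highest and lowest score in a category, average the rest) can be sketched as follows. The function name and the example scores are hypothetical, and four adjudicators per category is an assumption based on "8 adjudicators split between two categories".

```python
def trimmed_score(scores):
    """Average the scores after discarding one highest and one lowest value."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop high and low")
    kept = sorted(scores)[1:-1]  # drop exactly one minimum and one maximum
    return sum(kept) / len(kept)


# Hypothetical scores from four adjudicators in one category.
skill_scores = [8.5, 9.0, 7.0, 8.0]
skill_result = trimmed_score(skill_scores)
print(skill_result)
```

With four adjudicators only two scores remain after trimming, so each result rests on very little data, which is the small-sample concern the question raises.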

0 votes

1 answer

567 views

Can high standard deviations explain my non-significant & low effect size results? (please read description)

I'm trying to analyse bullying experiences across three age groups. The DV is scored on a 5-point Likert, and the IV is categorical (ages 11, 13, and 15). Initially I ran an ANOVA to see if there was ...

Hannah

1

asked Apr 4, 2023 at 9:14

1 vote

0 answers

129 views

Standardization of out-of-sample data

I have a panel (N firms across 10 years) dataset on which I want to estimate and test a prediction model $f$: \begin{equation} y = f(x). \end{equation} Following common practice, I split my data into ...

shenflow

1,149

asked Mar 30, 2023 at 12:40

0 votes

1 answer

276 views

Outlier Detection using OutlierTest

I found an outlier using the outlierTest function in the car package. However, I can see from the results the Externally Studentized Residual and p-values. This is a result for the full model. <...

Dome

21

asked Mar 21, 2023 at 17:38

1 vote

0 answers

75 views

How to find anomalies for a non-normal distribution with seasonality?

I have a time series broken down by day, and there are gaps in it that I have marked in red: the distribution there is not normal. How do we approach modeling a system that will look for anomalies ...

Roman Stasiuk

31

asked Mar 16, 2023 at 9:36

0 votes

1 answer

1k views

How to deal with Covid outlier in time series/machine learning forecasting?

Disclaimer: I checked some similar questions but I could not find anything in particular that would work for my case. I am dealing with a time series going from 2015 to 2023. The data points are the ...

duecci

11

asked Mar 15, 2023 at 11:15

1 vote

0 answers

41 views

How to deal with outliers after heterogeneity test in microarray expression datasets?

I have performed a meta-analysis using five micro-array datasets. After performing meta analysis I visualized the heterogeneity using funnel plot and forest plot (using two up-regulated and two down-...

Aditi Agnihotri

11

asked Mar 9, 2023 at 18:23

1 vote

1 answer

595 views

Robustification in lavaan: Difference between M, MV and MVS?

In lavaan, I am running a two-factor CFA on a questionnaire with 28 items, all of which are scored on a 6-point Likert scale. In total I have ~350 participants who completed the questionnaire. Because ...

LJFlameling

11

asked Mar 6, 2023 at 8:52

2 votes

1 answer

597 views

Replacing outliers with the median value of the preceding 5 observations

In the paper Implications of dynamic factor models for VAR analysis the authors propose a technique for removing outliers in variables used for dynamic factor analysis: "The outlier ...

Bertrand87

71

asked Feb 23, 2023 at 15:07
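The replacement step named in the title above can be sketched like this. The paper's outlier definition is cut off in the excerpt, so detection is left to a caller-supplied `is_outlier` function, which is hypothetical, as are the function name and the example series. Using already-cleaned preceding values rather than the raw ones is also a choice made for this sketch.

```python
from statistics import median


def replace_outliers(series, is_outlier, lookback=5):
    """Replace flagged values with the median of the preceding observations."""
    cleaned = list(series)
    for t, value in enumerate(series):
        if not is_outlier(value):
            continue
        # Up to `lookback` preceding values, taken from the cleaned series.
        preceding = cleaned[max(0, t - lookback) : t]
        if preceding:  # the first observation has nothing before it
            cleaned[t] = median(preceding)
    return cleaned


# Hypothetical series and threshold rule, for illustration only.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0, 1.05]
cleaned = replace_outliers(series, is_outlier=lambda v: abs(v) > 10)
print(cleaned)
```

Because only past values are used, the replacement does not look ahead in time, which matters when the cleaned series feeds a forecasting model.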

2 votes

1 answer

322 views

R Tukey Anova: Can non-overlapping boxplots share the same letter of significance in Anova / Tukey Test?

I conducted a one-way ANOVA followed by a Tukey test in RStudio and used a compact letter display to add letters of significance to a ggplot. After a positive Grubbs-outlier-test I removed an outlier ...

runald

21

asked Feb 20, 2023 at 11:49

0 votes

0 answers

341 views

Outliers for right heavy-tailed distributions

There is plenty of information on how to detect outliers in a sample when assuming that this sample was derived from a normal distribution. Sometimes it seems to me as if when we talk about outliers ...

Alex Il

45

asked Feb 12, 2023 at 12:38

0 votes

1 answer

71 views

Evaluate outliers of strictly non-decreasing sequences

Say I have the following sequence: Is there a way to get a probability for each point indicating whether it is an outlier or not of the underlying strictly non-decreasing sequence? I suppose the ...

Tom Huntington

141

asked Feb 9, 2023 at 6:49

1 vote

0 answers

53 views

Which metric for neural network should I try for time series data with sudden peaks?

I am doing time series forecasting with a neural network (feedforward for now, but I will also test RNNs) and my problem is that, even though the network learned general patterns, it doesn't forecast ...

SlimakSlimak

115

asked Feb 6, 2023 at 20:11

0 votes

0 answers

43 views

How to impute additive outliers in time series data

I need to forecast daily electricity demand. It seems that the outliers in my dataset are additive as they are affected by an anomalous behavior and are not induced by a random process that also ...

ebrahimi

291

asked Jan 30, 2023 at 4:06

5 votes

1 answer

368 views

Boxplot | 5-Number-Summary

I have a question regarding the boxplot. On some web pages, the Minimum and the Maximum of the 5-Number-Summary correspond to the whiskers. However, regarding this definition, my question is: how is ...

Made

121

asked Jan 28, 2023 at 13:00

3 votes

0 answers

43 views

How to identify individuals that don't belong to a training class?

The frequency of 8 cell types is measured in 100 patients (the frequencies do not sum up to 1). The patients form 4 pathologies established by the physicians. As there might be better markers (cell ...

SamGG

51

asked Jan 23, 2023 at 12:08

2 votes

2 answers

2k views

Should I treat these data points as outliers?

Currently, I am building my analytics portfolio as part of the Google Data Analytics course. I chose the option to analyze Divvy Bike Sharing data for the year 2021. But now I'm currently stuck in the ...

Atthoriq Pangestu

23

asked Jan 22, 2023 at 11:55

5 votes

1 answer

4k views

How to define the line to fit in Q-Q plot?

I'm trying to figure out if my data follows a normal distribution and if it contains outliers. I have plotted the histogram and now I would like to plot the quantile-quantile (Q-Q) plot. My point is, ...

JCV

153

asked Jan 12, 2023 at 11:17

0 votes

1 answer

196 views

What is a suitable technique for detecting anomalies in time series data?

I have a problem where I try to identify if a machine performs an activity when it is not supposed to, or performs it an unusual number of times. I am attempting to do this using an anomaly detection ...

Nht_e0

33

asked Dec 27, 2022 at 5:18

1 vote

0 answers

96 views

F1 Score vs PR Curve

If I understood correctly, the PR curve is just the mean of the F1 score computed multiple times with different thresholds. In the task of outlier detection those are two suggested metrics given the fact ...

Loris

23

asked Dec 24, 2022 at 16:38

1 vote

2 answers

988 views

Heavy vs light tail distributions when modelling with outliers

I am reading these lecture notes on using the MLEs from other distributions (such as Laplace) rather than a Gaussian when dealing with outliers. The lecture notes came from Oxford University: https://www.cs....

cgo

9,507

asked Dec 23, 2022 at 6:13

1 vote

1 answer

314 views

Do I remove outliers within training set or duplicate of original?

I want to predict on a test set. I have created a binary logistic regression using my current training set and have predicted on the test set. The dataset I used to split has 299 observations. What if ...

Antonio

673

asked Dec 14, 2022 at 17:03

0 votes

0 answers

67 views

Cluster a set of files by the number of points

I have a large set of aerial images with herds of elephants in it. The number of elephants in a single image can range from ~ 20 elephants to 1. I have created a dataset of ~ 2,000 png image files ...

user3200293

31

asked Dec 13, 2022 at 21:34

3 votes

2 answers

569 views

Do I want to overfit, when doing outlier detection based on regression?

Imagine we have speed data for a car and we would like to detect if the car speeds up or slows down more than it should. Do I want to just overfit my model, so the outlier (higher or lower speed) would lead me ...

Mr. Panda

325

asked Dec 6, 2022 at 9:42

0 votes

0 answers

766 views

Can I use normalization and standardization on the same dataset?

I'm working on an ML project to predict wine quality from a wine's physical characteristics. The features of my data are on vastly different scales so I've been experimenting with different ...

Sylith

101

asked Dec 1, 2022 at 13:04

1 vote

0 answers

125 views

Inverse-variance weighting non related to meta-analysis?

I've been reading about inverse-variance weighting and every reference I find to it is related to meta-analysis. However, I wonder if inverse-variance weighting can be used to reduce the bias produced ...

lafinur

235

asked Nov 24, 2022 at 15:54

2 votes

3 answers

359 views

Should variables be dropped according to their skewness?

I am creating a classification model to predict the credit score of a person based on lots of factors. I got the dataset from kaggle. When I started doing the EDA part, I noticed that the skewness ...

Sounak Sarkar

29

asked Nov 16, 2022 at 14:05

4 votes

1 answer

1k views

What's the best method to identify outliers and influential cases for linear mixed models?

I've seen many many many different questions on how to extract Leverage and Cook's distance for Lmers. I'm able to do that with different packages and functions by now, but how should I interpret them ...

Larissa Cury

853

asked Nov 9, 2022 at 17:01

1 vote

1 answer

2k views

DHARMa outlier test is significant, what are my next steps?

I'm looking for information and guidance to help me understand the outlier test in DHARMa for negative binomial regression in R. Here is the diagnostic plot from DHARMa using the function ...

Enialoj

13

asked Oct 31, 2022 at 13:04

1 vote

0 answers

178 views

Outliers in a PCA score plot [closed]

I have this dataset of 104 tissue samples from two different types of tumors (B and C) along with 182 observations (gene expression profile). I do not need to understand the underlying biological ...

wantingtoimprove

13

asked Oct 28, 2022 at 10:23

3 votes

1 answer

989 views

Outliers and possible dispersion in neg. binomial glmm residuals (DHARMa package)

I need help fixing the model I landed on through backwards step-wise elimination. I chose a negative binomial model because my variance seems much larger than the mean, with random intercepts from the ...

Nate

2,537

asked Oct 27, 2022 at 18:47

0 votes

0 answers

53 views

Scaling outliers in a dataset and reverse scaling

I have a data set with lots of small integer values and occasional large integers. For instance 1,1,1,3,2,1,320,2,3,4. I would like to scale my outlier values such that I can perform regression on my ...

murage kibicho

101

asked Oct 17, 2022 at 21:07
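One common way to compress occasional large values and later undo the scaling is a log transform. This is a sketch of that general idea, not the approach chosen in the question above; only the example values come from the excerpt.

```python
import math

# Example values from the question.
values = [1, 1, 1, 3, 2, 1, 320, 2, 3, 4]

# log1p(v) = log(1 + v) compresses large values and is defined at v = 0.
scaled = [math.log1p(v) for v in values]

# expm1 is the exact inverse of log1p, so the scaling is reversible.
restored = [math.expm1(s) for s in scaled]

print([round(s, 3) for s in scaled])
print([round(r, 6) for r in restored])
```

The transform is monotone, so the ordering of the values is preserved. Predictions made on the log scale need the same inverse applied before they are compared with the original data.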

4 votes

2 answers

2k views

Should outliers be removed for goodness-of-fit tests?

If you allow a bit of digression about the context: I am on a journey to better understand the power and usefulness of parametric distributions; I am a bit scared of them. Maybe due to the fact that I've ...

rusiano

606

asked Oct 16, 2022 at 9:18

0 votes

1 answer

3k views

Interquartile range finding more than 10 times as many outliers as z-score

I'm learning about outlier detection and I wrote these two methods to get the row indexes of the instances that have outliers so I can drop them later. The problem is I'm getting two numbers very far ...

Antonio Caipora

61

asked Oct 11, 2022 at 0:45
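A gap like the one described above is expected on skewed data, because the mean and standard deviation are inflated by the tail while the quartiles are not. A minimal sketch on simulated data; the lognormal sample, the seed, and the cutoffs (|z| > 3 and 1.5 × IQR) are assumptions, since the question's own two methods are not shown in the excerpt.

```python
import random
from statistics import mean, pstdev, quantiles

random.seed(42)

# Hypothetical right-skewed sample.
values = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

# Rule 1: |z| > 3, using the sample mean and standard deviation.
mu, sigma = mean(values), pstdev(values)
z_idx = [i for i, v in enumerate(values) if abs(v - mu) / sigma > 3.0]

# Rule 2: outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_idx = [i for i, v in enumerate(values) if v < low or v > high]

print("flagged by z-score:", len(z_idx))
print("flagged by IQR:    ", len(iqr_idx))
```

On a sample like this the IQR rule flags several times as many rows as the z-score rule, so very different counts do not by themselves indicate a bug in either method.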

0 votes

1 answer

117 views

Detecting outliers in financial time series taking into account related time series

I would like to seek advice on how to build an efficient approach to identify outliers in a financial series taking into account also related series. For example, let's assume there is a very ...

user3548751

1

asked Oct 7, 2022 at 18:09

3 votes

3 answers

254 views

Problem with a single outlier, non-normal data, and unequal sample distributions

I want to compare two independent groups on a Likert-like item. To explain, the dependent variable is structured so that a 1 = <1 units, 2 = 1-<2, 3 = 2-<3, all the way up to option 7 = ...

Amy

31

asked Oct 6, 2022 at 19:09

0 votes

0 answers

106 views

Help Needed for Outliers detection post paired T-test statistical test

I don't know if this is a standard way of doing things, so I am open to any suggestions. Basically, I have done random sampling from my population to create 2 groups, Treatment & Control. I also have a few ...

av abhishiek

163

asked Oct 4, 2022 at 12:08

Questions tagged [outliers]