Newest 'outliers' Questions - Page 4

0 votes

0 answers

23 views

Detection of Multivariate Outliers (in a multiple linear regression problem) [duplicate]

In a multiple regression problem, suppose we have responses $Y_1, Y_2, \cdots , Y_n$ corresponding to data $\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_n$ where each $\mathbf{X}_i$ is a $d$-...

JRC

619

asked Oct 2, 2022 at 8:14

0 votes

0 answers

70 views

Should you use mean difference between measurements or min-max difference to detect outliers?

I have a dataset which has temperature measurements for every minute in a certain time period. I want to focus on 10-minute intervals and determine whether two adjacent 10-minute intervals differ ...

Jamess11

87

asked Sep 30, 2022 at 6:59

1 vote

0 answers

45 views

A theorem or result relating the probability of occurrence of outliers to population size

I was wondering if there is a theorem or a result that relates the size of the population to the probability of the occurrence of outliers of various degrees, relating the z-score to the size of the ...

Tekko

11

asked Sep 27, 2022 at 19:44

0 votes

0 answers

34 views

Identify outliers of river levels that change continuously over time

This is a time-dependent measure of the water level of a river measured by an instrument that measures the water level every five minutes. However, due to some interference and other factors, there ...

yongchuang

1

asked Sep 21, 2022 at 15:28

3 votes

1 answer

380 views

Detecting multivariate outliers with Minimum Covariance Determinant and Mahalanobis distance

I've read in some papers (such as this) and CrossValidated questions (such as this) that people are using Mahalanobis distance based on robust estimates of location and scatter using minimum ...

ira

461

asked Sep 5, 2022 at 14:32

0 votes

0 answers

125 views

The literature on the impact of outliers on ordinary least squares (OLS) regression

I remember encountering a paper from the 1960s or 1970s that explores the impact of outliers on ordinary least squares (OLS) regression. The paper shows that adding just one outlier will ...

Alex Cicco

1

asked Sep 4, 2022 at 17:49

2 votes

0 answers

407 views

How to detect low and high flow outliers with seasonal time series data in R?

I have a dataset recording daily river flow from 1976 to 2017. I want to identify unusually high (potential flood) or low (potential drought) flow values in that dataset. What's the best way to ...

CyG

181

asked Aug 31, 2022 at 21:57

0 votes

0 answers

241 views

What is the right order in dealing with outliers, missing values and log transformation?

I am currently working on a project involving banking stock price data. I have around 3000 observations, some columns have a lot of missing values (null value); they can account for 5 to 50% of the ...

MINH NHỰT NGUYỄN TRẦN

1

asked Aug 29, 2022 at 5:00

0 votes

0 answers

337 views

Dealing with outliers for Multimodal distribution

Say the distribution of underlying data points is multi-modal and we have an extremely large data point that has been confirmed to be an outlier. If it is not acceptable to simply remove the outlier ...

NMA

19

asked Aug 23, 2022 at 16:21

0 votes

0 answers

32 views

How do we account best for outliers with applied statistics? [duplicate]

If we have a set of data on how long one watches YouTube, these data points only include the raw number of minutes watched. If it is known that some of those data points include situations where you ...

NMA

19

asked Aug 22, 2022 at 20:26

2 votes

2 answers

231 views

Algorithm for detecting collective outliers

What algorithm should I go for if I want to determine collective outliers within a dataset? By collective outliers, I mean a series of data points that differ significantly from the trends in the rest of ...

Iamtrying

33

asked Aug 15, 2022 at 13:01

3 votes

2 answers

478 views

Fixing outliers and normalizing a vector using R

I am trying to do factor analysis on a few variables and one particular variable (given in the example below) is covering/ explaining all the variance due to some outliers. I am not sure what else I ...

Saurabh

175

asked Aug 3, 2022 at 20:08

3 votes

1 answer

634 views

Why am I getting strange upper & lower limits on a gamma distribution?

I am working on a time series dataset. I understand it has a gamma distribution. I want to use a 99% probability threshold to establish upper & lower limits/cut-offs and find anomalies. However, I ...

S2DEN8

31

asked Jul 31, 2022 at 20:03

3 votes

2 answers

213 views

Standardize dataset with high outliers

Is there a better way to standardize a dataset with outliers than computing a normalized value (z-score) based on the mean and standard deviation? I am using the Excel STANDARDIZE function. I have two datasets ...

Patricia Nunes

107

asked Jul 28, 2022 at 19:05

-1 votes

0 answers

15 views

Centroid and Outlier calculation [duplicate]

I have this question, but to be honest I am stuck. 1. Considering a set of 60 users and a maximum number of objects that a user can own equal to 4000, which approach would you choose to calculate the ...

De Une

1

asked Jul 27, 2022 at 7:53

7 votes

1 answer

1k views

Outlier/anomaly detection on histograms

So, the idea is that I have many histograms, each one representing results for something. So, I have histogram_1 for object_1, histogram_2 for object_2,...,histogram_20 for object_20. How can throw ...

nowhere

157

asked Jul 22, 2022 at 14:37

5 votes

2 answers

6k views

MAE vs MSE for Linear regression

Several articles say that MAE is robust to outliers but MSE is not, and that MSE can hamper the model if errors are too large. My question is that MSE and MAE are both error metrics, our priority is to ...

Parth Sharma

51

asked Jul 17, 2022 at 6:00

2 votes

1 answer

1k views

How to use box plots to detect outliers?

Suppose for simplicity that we have Gaussian distributed data with some outliers, whose typical characteristic is getting values that are far from the mean. Suppose my sample size is ...

Thomas

1,137

asked Jul 13, 2022 at 14:27

1 vote

0 answers

97 views

Understanding an outlier detection technique for fraud detection

I came across this article: http://projetoaprendizagemgrupo4.pbworks.com/f/03.03%20-%20Unsupervised%20Profiling%20Methods%20Fraud%20Detection.pdf since I am interested in detecting abnormal behavior (...

Thomas

1,137

asked Jul 13, 2022 at 11:05

3 votes

1 answer

329 views

Is applying dimension reduction to mixed type data valid for outlier detection after that?

I'm facing an anomaly detection (outlier detection) task with a mixed (numerical and categorical) multi-feature data set. I understand that many of the possible multivariate outlier detection methods ...

Hendrik

253

asked Jul 5, 2022 at 7:03

2 votes

2 answers

5k views

Does IQR method for outliers work for non-normal data?

Any observations that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers. However, does this rule still hold when a data set is not normally distributed? Outlier ...

maximus

131

asked Jun 30, 2022 at 17:14
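
The 1.5 IQR rule quoted in the excerpt above can be sketched in a few lines. This is only a minimal illustration, not the asker's code: the function name `iqr_outliers`, the sample data, and the choice of the "inclusive" quartile convention are all assumptions, and other quartile conventions can give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3.

    Hypothetical helper for illustration; k=1.5 is the conventional factor.
    """
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample with one obviously extreme value.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 30]
flagged = iqr_outliers(sample)
print(flagged)
```

The rule itself makes no distributional assumption; only the interpretation of how many points it is expected to flag depends on the shape of the data, which is what the question is asking about.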

7 votes

1 answer

1k views

Why does univariate Mahalanobis distance not match z-score?

I am using Mahalanobis distance for outlier detection. Sometimes my dataset only has 1 feature, sometimes many more. I believe the univariate Mahalanobis distance should be equal to the z-score of the ...

kwinkunks

369

asked Jun 25, 2022 at 17:24

2 votes

1 answer

624 views

Is winsorizing limited to the usage of a certain percentile cutoff?

The Context: My dataset consists of 68 groups, each with 4 data points inside it. As a robustness test, I am looking to see how the type of average/mean I use impacts the analysis that I will ...

Son18

21

asked Jun 21, 2022 at 13:22

2 votes

1 answer

79 views

Statistical method to detect possible electoral frauds

In Colombia there are 12.000 voting centers that consist of one or more electoral tables (the number of electoral tables depends on the number of registered voters in the voting center, and voting ...

user2246905

223

asked Jun 15, 2022 at 22:53

2 votes

0 answers

69 views

If my dataset is standardized but has outliers, should I remove them and re-standardize? [closed]

I have a data set named Geographical Original of Music Data Set from the UCI repository. The data is given standardized but I think it has outliers and I do not know the best way to handle them. ...

Dazckel

81

asked Jun 10, 2022 at 18:48

0 votes

0 answers

39 views

Outliers in regression, selection of a specific region of the samples

I have a set of points/samples like the ones in blue in the image below: there is a bunch of wiggly nonsense here and there, and somewhere in the middle there is a region of almost perfect linear fit (...

user1384636

217

asked Jun 10, 2022 at 15:47

1 vote

1 answer

813 views

How to detect outliers in linear regression

I am studying the relationship between the concentration of metals in organisms (Y axis in the image) and the environment (X axis). The regressions are not very good due to some outliers, and I want ...

Antón

43

asked Jun 1, 2022 at 12:12

0 votes

0 answers

48 views

Outlier Treatment and Forecasting

I have come across multiple methods regarding outlier treatment: (features = my input/regressor/... matrix) Treat outliers in the entire sample (both features and the variable to be forecasted). ...

shenflow

1,149

asked May 17, 2022 at 14:42

8 votes

2 answers

5k views

Can I remove sample outliers using standard deviation?

I am looking to find clinical and other measurements to predict a blood metabolite with Elastic-Net Regression models. Can I remove samples with values greater than 1.96 SD from the mean as ...

Molly_K

203

asked May 16, 2022 at 13:28

1 vote

1 answer

223 views

Should I trim/winsorize raw data or computed metric used in models?

Question: Should I winsorise (or trim, where relevant) my raw data, or the intermediary metric I use in my models? Context: My analysis consists of 3 steps: Collect raw data, Compute ...

ebosi

138

asked May 16, 2022 at 9:00

1 vote

0 answers

220 views

Bayesian approach to removing outliers from a normal distribution

A lot of what I've seen for Bayesian approaches to removing outliers is for a linear model, not a normal distribution. Is there a way we can take a Bayesian approach to remove outliers from a normal ...

bme-programmer

11

asked May 13, 2022 at 20:59

0 votes

0 answers

56 views

Modification of Outliers

I have a practical / applied statistics question. I'm dealing with a specialized dataset with a very small sample (i.e. n < 10). In the sequence of observations, it is possible that a new ...

logisticregress

177

asked May 11, 2022 at 2:13

0 votes

0 answers

86 views

How to conduct EM algorithm when there are some outliers in GMM Models?

I'm just confused about the problem of adding an outlier component directly to the primary form of GMM models: Suppose that the observed data contains several outliers. The mixture model could be: $$ ...

Iris88

1

asked May 8, 2022 at 14:33

0 votes

0 answers

2k views

Log Transformation to treat outliers [duplicate]

I am trying to replicate a research paper as part of my Applied Econometrics course, and I came across a particularly vague statement in the reference paper. "Following Malmendier and Tate (2005),...

Madhav Bajaj

1

asked Apr 28, 2022 at 21:45

0 votes

0 answers

71 views

Smoothing time series with Adjusted R2-weighted averages

I have two parameters (a,b) resulting from an exponential estimation of a curve. I have estimated this curve every hour for one month. In other words, I have a total of 720 parameters a and b, and I ...

angelavtc

11

asked Apr 27, 2022 at 8:45

1 vote

1 answer

532 views

Detecting outliers in a multiple time-series

I have some broker prices incoming in real-time for several products. Sometimes a broker makes a typo and sends a wrong price, or my parsing engine assigns the price to the wrong product - these are ...

MilTom

369

asked Apr 25, 2022 at 9:34

0 votes

0 answers

222 views

Detect and remove outliers from unknown distribution

I have completed a range of steady-state CFD simulations on building roofs. A contour map of the resulting variable is displayed in the Figure below with the corresponding values on the left side. ...

JimiChango

1

asked Apr 22, 2022 at 19:36

2 votes

2 answers

2k views

Do we need to split the data for Unsupervised Anomaly Detection?

I'm struggling with understanding the concept of splitting data for unsupervised anomaly/outlier detection. You can find all approaches here. I found some papers and implementations that didn't split ...

Mario

579

asked Apr 20, 2022 at 19:28

1 vote

1 answer

73 views

Which raw data to include for heterogeneous autoregressive (HAR) model

I constructed the realized variance of bitcoin returns per day from 8-10-2015 to today. The realized variance is calculated by taking the cumulative squared intra-day returns. 5-minute high frequency ...

Elise

51

asked Apr 19, 2022 at 10:53

0 votes

0 answers

61 views

Sample of Runners - Can the Group Run 2.5 Miles in 20 mins?

I have a dataset where there are 6 runners. Each runner runs as far as they can for 20 mins, and a watcher records their distance (to the nearest 0.1 miles) at certain times, precisely on the minute ...

user267587

1

asked Apr 14, 2022 at 13:34

1 vote

1 answer

768 views

Identify outliers in chi-squared goodness of fit test

I am performing a chi-square goodness of fit test to compare an observed value with an expected value. The expected value is calculated from theory. p-value suggests statistical significance. How do I ...

Jalan

11

asked Apr 14, 2022 at 11:44

1 vote

1 answer

196 views

Do I need to transform/standardise my dependent variable?

Attached are the results and the residual plot for my regression of control variables on CEO compensation (TDC1). When I look at the plot my main concerns are the outliers (which I checked to be ...

user3129800

13

asked Apr 14, 2022 at 9:15

1 vote

1 answer

267 views

Detecting Spikes in a 1-D discrete time series data with unknown underlying distribution

I have a discrete 1-D data set with a value range of 0-100. The underlying distribution is unknown (although we have enough data to fit a model). To summarize, it is a highly right-skewed data set, ...

Ninja Bug

111

asked Apr 3, 2022 at 1:36

0 votes

0 answers

114 views

How to decide which "outliers" to get rid of?

I have been thinking about this problem for a while but couldn't quite formulate a proper solution myself. I am also not even sure if it is appropriate to speak of "outliers" or if the term "...

rememberhthename94

11

asked Mar 28, 2022 at 16:39

-1 votes

1 answer

148 views

Is it normal for 6% of your dataset to be outliers?

My dataset has 80,886 obs and 16 variables. I am using Mahalanobis distance to detect outliers, with a p-value less than 0.001 as the cut-off. I am getting 5,423 obs as outliers, which is 6% of total ...

surfffffffff

11

asked Mar 21, 2022 at 5:01

2 votes

1 answer

472 views

Flagging bad time series behavior (Pattern Recognition and Outlier Detection)

I want to get some opinions on how to approach the following problem to do with detecting "unhealthy" behavior in time series data (either using a statistical/analytical model or ML/DL, I do ...

User_13

49

asked Mar 20, 2022 at 20:35

0 votes

0 answers

207 views

Outlier in grouped data

Existential crisis here xD. When you want to determine outliers with the IQR and plot a box plot, what do you plot if your data is structured in the following manner: n dependent variables (n=6) (...

Leonardo Mendes-Silva

11

asked Mar 4, 2022 at 14:39

0 votes

0 answers

271 views

Non-parametric outlier estimation

Are there ways to automatically detect outliers (we can fix uni-dimensional datasets) when the underlying distribution is difficult to model? Intuitively, resampling techniques could help. (1) You ...

Thomas

1,137

asked Mar 1, 2022 at 8:35

1 vote

0 answers

540 views

Should I have more trees than dimensions for the Isolation Forest?

I have a dataset which has 200 dimensions after pre-processing. I read multiple times that 100 is the recommended number of trees for the Isolation Forest. Since each tree chooses one feature randomly,...

2much2code

31

asked Feb 23, 2022 at 10:36

0 votes

1 answer

278 views

How to deal with a large number of outliers in biological data?

I'm working on a marine species dataset with R. I would like to compare the biomass and abundance between different sites but I'm not sure how to deal with the large number of outliers. I am aware ...

Florian B.

97

asked Feb 22, 2022 at 9:27
