
Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.

4 votes
2 answers
2k views

For example, if I specify a z-value of 3, then I would look at both sides and know its position in the distribution (99.73%). Would this change if I have a left or right skewed distribution? Would I ...
JAdel's user avatar
  • 125
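
The 99.73% figure quoted in this question can be checked with a short sketch, assuming a standard normal distribution. The function name is illustrative and not part of the question. For a left- or right-skewed distribution, the mass within 3 standard deviations of the mean will generally differ from this value.

```python
import math


def two_sided_normal_coverage(z: float) -> float:
    """Return P(-z < Z < z) for a standard normal Z, via the error function."""
    return math.erf(z / math.sqrt(2))


# Assumption: the data are exactly normal; skewed data need their own CDF.
coverage = two_sided_normal_coverage(3.0)
print(f"Coverage within 3 standard deviations: {coverage:.4%}")
```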
1 vote
1 answer
254 views

I want to use an unpaired two-sample t-test of random samples of $n=40$ each. The sample data is from 4-point Likert scale assessments. I understand the t-test is not very robust to outliers, which I ...
Ranfurley's user avatar
3 votes
1 answer
453 views

I am using pyod to detect outliers in data, and I came across this official example: https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py I have a question regarding the need to split ...
JAdel's user avatar
  • 125
0 votes
0 answers
41 views

I hopefully chose a catchy title :-) I am looking for a simple algorithm to detect outliers caused by measurement errors. So assume I am given a multivariate sample (30 dimensions) and I want to detect ...
Johannes's user avatar
0 votes
0 answers
171 views

I am running a logistic regression in a sample with ~150,000 observations. I am predicting three different outcomes, x, y, and z, that occur in ~10,000, ~4,000, and ~2,000 cases respectively (for each ...
JuM24's user avatar
  • 21
0 votes
1 answer
112 views

For predicting whether a subject has liver disease or not, I'm using StratifiedKFold CV in GridSearch for AdaBoost and RandomForest classifiers. For outlier analysis, I've identified all feature ...
hanpat99's user avatar
0 votes
0 answers
125 views

In his 1969 paper, Grubbs mentioned that "Until such time as criteria not sensitive to the normality assumption are developed, the experimenter is cautioned against interpreting probabilities too ...
WacKaDoodle's user avatar
3 votes
1 answer
249 views

Consider a set of 3D points $X = \{x_1, x_2, ...x_n\} $ with $ x_i\in\mathbb{R}^3$ on which we want to fit an arbitrary probability distribution. The distribution we want to fit models some ...
Daniel López's user avatar
1 vote
0 answers
44 views

I am trying to use sklearn.cluster.OPTICS to identify outliers, but found an issue: I use 2 examples with exactly the same data but different orders. They give different results: 1st example /////////...
Ya Gao's user avatar
  • 11
0 votes
0 answers
62 views

I have a time series $X_t \sim N(0, 1)$. There is a single outlier at index 347, at 8.5 standard deviations from the mean. If I now compute a rolling window standard deviation of $X_t$ with window ...
PyRsquared's user avatar
  • 1,364
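
A minimal simulation of the setup described in this question. The window length is cut off in the excerpt, so `window = 50`, the series length, and the seed below are assumptions chosen only for illustration. The sketch shows that every window containing index 347 has an inflated standard deviation, which falls back abruptly once the outlier leaves the window.

```python
import random
import statistics

random.seed(0)  # assumption: fixed seed so the run is reproducible

n, window, outlier_index = 500, 50, 347  # n and window are hypothetical
x = [random.gauss(0, 1) for _ in range(n)]
x[outlier_index] = 8.5  # single outlier, 8.5 standard deviations from the mean


def rolling_std(series, window):
    """Sample standard deviation over each trailing window of the given length."""
    return [
        statistics.stdev(series[end - window + 1 : end + 1])
        for end in range(window - 1, len(series))
    ]


stds = rolling_std(x, window)

# stds[j] covers indices j .. j + window - 1, so the outlier sits inside
# exactly `window` consecutive windows.
affected = stds[outlier_index - window + 1 : outlier_index + 1]
print(f"windows containing the outlier: {len(affected)}")
print(f"smallest rolling std while the outlier is in the window: {min(affected):.3f}")
```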
2 votes
1 answer
235 views

I am trying to perform a robust regression using the lmrob function in R. I am getting this error message: ...
induktivist's user avatar
0 votes
1 answer
135 views

Do outliers begin at the whisker limit or above it? In the (Python) example below the calculated upper whisker limit is 64.8125. Is a value of ...
buhtz's user avatar
  • 282
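
A small sketch of the usual 1.5 × IQR rule this question refers to. The data and function names are hypothetical and do not reproduce the 64.8125 limit from the question. It assumes the common convention that a point is flagged only when it lies strictly above the upper fence, so a value exactly on the fence is not an outlier. Quartile definitions vary between tools, and the fence can shift slightly with the method used.

```python
import statistics


def upper_fence(values, k=1.5):
    """Upper fence Q3 + k * IQR, using linearly interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_high_outliers(values, k=1.5):
    """Return the values strictly above the upper fence."""
    fence = upper_fence(values, k)
    return [v for v in values if v > fence]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical sample
print(upper_fence(data))         # fence for this sample
print(flag_high_outliers(data))  # values beyond the fence
```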
4 votes
1 answer
837 views

I'm working on a negative binomial model for count data. Unfortunately I can't provide a more detailed description because I wasn't explicitly allowed to. All I can say now is that the data is about ...
Eva Šragová's user avatar
1 vote
1 answer
184 views

I run competitive events. In our normal event, we have 8 adjudicators split between two categories: Skill and Artistry. For each category we throw out the high and low scores and average the remaining ...
Omar Paloma's user avatar
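
The scoring rule described in this question (drop the highest and lowest score, then average the rest) can be sketched as follows. The scores are invented, and an even split of 4 adjudicators per category is an assumption.

```python
def trimmed_mean(scores):
    """Average the scores after discarding one highest and one lowest value."""
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to drop the high and the low")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # drop the single lowest and single highest score
    return sum(kept) / len(kept)


# Hypothetical Skill scores from 4 adjudicators.
skill_scores = [7.0, 8.5, 9.0, 6.0]
print(trimmed_mean(skill_scores))
```

With only 4 adjudicators per category, this rule averages just 2 scores, so each remaining score carries a lot of weight.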
0 votes
1 answer
567 views

I'm trying to analyse bullying experiences across three age groups. The DV is scored on a 5-point Likert, and the IV is categorical (ages 11, 13, and 15). Initially I ran an ANOVA to see if there was ...
Hannah's user avatar
  • 1
1 vote
0 answers
129 views

I have a panel (N firms across 10 years) dataset on which I want to estimate and test a prediction model $f$: \begin{equation} y = f(x). \end{equation} Following common practice, I split my data into ...
shenflow's user avatar
  • 1,149
0 votes
1 answer
276 views

I found an outlier using the outlierTest function in the car package. In the results, I can see the externally Studentized residual and p-values. This is a result for the full model. <...
Dome's user avatar
  • 21
1 vote
0 answers
75 views

I have a time series broken down by day, and there are gaps in it that I have marked in red: the distribution there is not normal. How do we approach modeling a system that will look for anomalies ...
Roman Stasiuk's user avatar
0 votes
1 answer
1k views

Disclaimer: I checked some similar questions but I could not find anything in particular that would work for my case. I am dealing with a time series going from 2015 to 2023. The data points are the ...
duecci's user avatar
  • 11
1 vote
0 answers
41 views

I have performed a meta-analysis using five micro-array datasets. After performing meta analysis I visualized the heterogeneity using funnel plot and forest plot (using two up-regulated and two down-...
Aditi Agnihotri's user avatar
1 vote
1 answer
595 views

In lavaan, I am running a two-factor CFA on a questionnaire with 28 items, all of which are scored on a 6-point Likert scale. In total I have ~350 participants who completed the questionnaire. Because ...
LJFlameling's user avatar
2 votes
1 answer
597 views

In the paper Implications of dynamic factor models for VAR analysis the authors propose a technique for removing outliers in variables used for dynamic factor analysis: "The outlier ...
Bertrand87's user avatar
2 votes
1 answer
322 views

I conducted a one-way ANOVA followed by a Tukey test in RStudio and used a compact letter display to add letters of significance to a ggplot. After a positive Grubbs outlier test I removed an outlier ...
runald's user avatar
  • 21
0 votes
0 answers
341 views

There is plenty of information on how to detect outliers in a sample when assuming that this sample was derived from a normal distribution. Sometimes it seems to me as if when we talk about outliers ...
Alex Il's user avatar
  • 45
0 votes
1 answer
71 views

Say I have the following sequence: Is there a way to get a probability for each point indicating whether it is an outlier or not of the underlying strictly non-decreasing sequence? I suppose the ...
Tom Huntington's user avatar
1 vote
0 answers
53 views

I am doing time series forecasting with a neural network (feedforward for now, but I will also test RNNs) and my problem is that, even though the network learned general patterns, it doesn't forecast ...
SlimakSlimak's user avatar
0 votes
0 answers
43 views

I need to forecast daily electricity demand. It seems that the outliers in my dataset are additive as they are affected by an anomalous behavior and are not induced by a random process that also ...
ebrahimi's user avatar
  • 291
5 votes
1 answer
368 views

I have a question regarding the boxplot. On some web pages, the Minimum and the Maximum of the 5-Number-Summary correspond to the whiskers. However, regarding this definition, my question is: how is ...
Made's user avatar
  • 121
3 votes
0 answers
43 views

The frequency of 8 cell types is measured in 100 patients (the frequencies do not sum up to 1). The patients form 4 pathologies established by the physicians. As there might be better markers (cell ...
SamGG's user avatar
  • 51
2 votes
2 answers
2k views

Currently, I am building my analytics portfolio as part of the Google Data Analytics course. I chose the option to analyze Divvy Bike Sharing data for the year 2021. But now I'm currently stuck in the ...
Atthoriq Pangestu's user avatar
5 votes
1 answer
4k views

I'm trying to figure out if my data follows a normal distribution and if it contains outliers. I have plotted the histogram and now I would like to plot the quantile-quantile (Q-Q) plot. My point is, ...
JCV's user avatar
  • 153
0 votes
1 answer
196 views

I have a problem where I try to identify if a machine performs an activity when it is not supposed to, or performs it an unusual number of times. I am attempting to do this using an anomaly detection ...
Nht_e0's user avatar
  • 33
1 vote
0 answers
96 views

If I understood correctly, the PR curve is just the mean of the F1 score computed multiple times with different thresholds. In the task of outlier detection those are two suggested metrics given the fact ...
Loris's user avatar
  • 23
1 vote
2 answers
988 views

I am reading these lecture notes on using the MLEs from other distributions (such as Laplace) rather than a Gaussian when dealing with outliers. The lecture notes came from Oxford University: https://www.cs....
cgo's user avatar
  • 9,507
1 vote
1 answer
314 views

I want to predict on a test set. I have created a binary logistic regression using my current training set and have predicted on the test set. The dataset I used to split has 299 observations. What if ...
Antonio's user avatar
  • 673
0 votes
0 answers
67 views

I have a large set of aerial images with herds of elephants in it. The number of elephants in a single image can range from ~ 20 elephants to 1. I have created a dataset of ~ 2,000 png image files ...
user3200293's user avatar
3 votes
2 answers
569 views

Imagine we have speed data from a car and we would like to detect whether the car speeds up or slows down more than it should. Do I want to just overfit my model, so the outlier (higher or lower speed) would lead me ...
Mr. Panda's user avatar
  • 325
0 votes
0 answers
766 views

I'm working on an ML project to predict wine quality from a wine's physical characteristics. The features of my data are on vastly different scales so I've been experimenting with different ...
Sylith's user avatar
  • 101
1 vote
0 answers
125 views

I've been reading about inverse-variance weighting and every reference I find to it is related to meta-analysis. However, I wonder if inverse-variance weighting can be used to reduce the bias produced ...
lafinur's user avatar
  • 235
2 votes
3 answers
359 views

I am creating a classification model to predict the credit score of a person based on lots of factors. I got the dataset from Kaggle. When I started doing the EDA part, I noticed that the skewness ...
Sounak Sarkar's user avatar
4 votes
1 answer
1k views

I've seen many many many different questions on how to extract Leverage and Cook's distance for Lmers. I'm able to do that with different packages and functions by now, but how should I interpret them ...
Larissa Cury's user avatar
1 vote
1 answer
2k views

I'm looking for information and guidance to help me understand the outlier test in DHARMa for negative binomial regression in R. Here is the diagnostic plot from DHARMa using the function ...
Enialoj's user avatar
  • 13
1 vote
0 answers
178 views

I have this dataset of 104 tissue samples from two different types of tumors (B and C) along with 182 observations (gene expression profile). I do not need to understand the underlying biological ...
wantingtoimprove's user avatar
3 votes
1 answer
989 views

I need help fixing the model I landed on through backwards step-wise elimination. I chose a negative binomial model because my variance seems much larger than the mean, with random intercepts from the ...
Nate's user avatar
  • 2,537
0 votes
0 answers
53 views

I have a data set with lots of small integer values and occasional large integers. For instance 1,1,1,3,2,1,320,2,3,4. I would like to scale my outlier values such that I can perform regression on my ...
murage kibicho's user avatar
4 votes
2 answers
2k views

If you allow a bit of digression about the context: I am on a journey to better understand the power and usefulness of parametric distributions; I am a bit scared of them. Maybe due to the fact that I've ...
rusiano's user avatar
  • 606
0 votes
1 answer
3k views

I'm learning about outlier detection and I wrote these two methods to get the row indexes of the instances that have outliers so I can drop them later. The problem is I'm getting two numbers very far ...
Antonio Caipora's user avatar
0 votes
1 answer
117 views

I would like to seek advice on how to build an efficient approach to identify outliers in a financial series, taking into account related series as well. For example, let's assume there is a very ...
user3548751's user avatar
3 votes
3 answers
254 views

I want to compare two independent groups on a Likert-like item. To explain, the dependent variable is structured so that a 1 = <1 units, 2 = 1-<2, 3 = 2-<3, all the way up to option 7 = ...
Amy's user avatar
  • 31
0 votes
0 answers
106 views

I don't know if this is a standard way of doing things, so I am open to any suggestions. Basically, I have done random sampling from my population to create 2 groups, Treatment & Control. I also have a few ...
av abhishiek's user avatar
