First, I should state that I have searched this site for the answer. Either I didn't find a question that answered mine, or my knowledge level is so low that I didn't recognize the answer when I read it.
I am studying for the AP Statistics Exam. I have to learn linear regression, and one of the topics is residuals. I have a copy of Introduction to Statistics and Data Analysis, and on page 253 it states:
Unusual points in a bivariate data set are those that fall away from most of the other points in the scatterplot in either the $x$ direction or the $y$ direction.
An observation is potentially an influential observation if it has an $x$ value that is far away from the rest of the data (separated from the rest of the data in the $x$ direction). To determine if the observation is in fact influential, we assess whether removal of this observation has a large impact on the value of the slope or intercept of the least-squares line.
An observation is an outlier if it has a large residual. Outlier observations fall far away from the least-squares line in the $y$ direction.
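The two definitions in the quoted passage can be tried on a small example. The minimal Python sketch below uses made-up data and illustrative function names (`least_squares`, `residuals`, `refit_without` are not from the textbook). It computes each point's residual (observed $y$ minus predicted $y$), then refits the line without the point that is far out in the $x$ direction to see how much the slope and intercept move. Deciding what counts as a "large" residual or a "large impact" is still left as a judgment call.

```python
from statistics import mean

def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(xs, ys):
    """Observed y minus predicted y for each point (outliers have large residuals)."""
    slope, intercept = least_squares(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def refit_without(xs, ys, i):
    """Refit the least-squares line with observation i removed (influence check)."""
    return least_squares(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])

# Hypothetical data: the point at x = 20 is far from the rest in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0]

print("residuals:", [round(e, 2) for e in residuals(xs, ys)])
s_all, b_all = least_squares(xs, ys)
s_wo, b_wo = refit_without(xs, ys, 5)
print(f"all points:      slope={s_all:.3f} intercept={b_all:.3f}")
print(f"x = 20 removed:  slope={s_wo:.3f} intercept={b_wo:.3f}")
```

On this made-up data the slope is about 1.99 without the point at $x = 20$ and about 0.59 with it, so that point is influential, yet its own residual (about -0.83) is smaller in size than most of the others because it pulls the line toward itself. This is one way to see why the book judges influence by refitting rather than by residual size.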
Stattrek.com states four ways a data point might be considered an outlier:
Data points that diverge in a big way from the overall pattern are called outliers. There are four ways that a data point might be considered an outlier.
- It could have an extreme X value compared to other data points.
- It could have an extreme Y value compared to other data points.
- It could have extreme X and Y values.
- It might be distant from the rest of the data, even without extreme X or Y values.
These two sources seem to conflict with each other. Could anyone help clear up my confusion? Also, how does one define "extreme"? AP Statistics uses the rule that a data point outside $(Q_1 - 1.5\,\text{IQR},\ Q_3 + 1.5\,\text{IQR})$ is an outlier, but I don't know how to apply that from just a graph of the residuals.
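For concreteness, the fence rule itself can be applied to a list of residual values. This is a minimal Python sketch, not an authoritative AP procedure: the residuals are hypothetical numbers standing in for values read off a residual plot, and `quartiles` and `iqr_outliers` are illustrative names. Quartiles here are computed as medians of the lower and upper halves (excluding the overall median when $n$ is odd), which is one common textbook convention; other software, such as Python's `statistics.quantiles`, can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    The middle value is excluded when n is odd. Assumes at least 4 values.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return the values outside the fence (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical residuals, e.g. read off a residual plot.
resids = [0.5, -0.3, -1.4, -2.1, 7.0, -3.7, 0.2, 1.1]
print(iqr_outliers(resids))
```

With these values $Q_1 = -1.75$ and $Q_3 = 0.8$, so the fence runs from about -5.58 to about 4.63 and only the residual 7.0 falls outside it.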