Not sure if there's an existing stats concept for this, but I have a dataset that consists mostly of small data points with a few large ones.
e.g. 1 2 1 3 1 2 87 3 2 1 1 1 1 3 1 2 1 1 1 99
How can I filter this dataset down to only the values that disproportionately make up the bulk of the total? I am currently keeping data points that lie a few standard deviations above the mean, but this doesn't tell me what % of the total I am getting (e.g. if I go 2 standard deviations out, am I capturing 70% of the total? If I go 5, is it 95%?). I only know what % of the number of data points I'm keeping, not what % of the total sum. A rough sketch of what I'm doing now is below.
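To make my current approach concrete, here's roughly what I'm doing (Python sketch using the example data above; the threshold `k` is arbitrary). I can compute the share of the total after the fact, but the cutoff itself is chosen in standard-deviation terms, not in terms of the total:

```python
import numpy as np

values = np.array([1, 2, 1, 3, 1, 2, 87, 3, 2, 1, 1, 1, 1, 3, 1, 2, 1, 1, 1, 99])

# Current approach: keep points more than k standard deviations above the mean.
k = 2
cutoff = values.mean() + k * values.std()
kept = values[values > cutoff]

# What the rule tells me: the fraction of *points* I kept.
print("fraction of points kept:", len(kept) / len(values))

# What I actually care about: the fraction of the *total sum* those points represent.
print("fraction of total kept: ", kept.sum() / values.sum())
```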
EDIT: I want to remove as many data points as possible without removing the important ones. So if I have a mean of 5 and a standard deviation of 20, I filter out data points below 45 (5 + 20 + 20, i.e. two standard deviations above the mean). This removes, say, 95% of the data points, but the remaining dataset can then look like: 50 46 90 80 44 99999 57 87 88. The Pareto principle applies recursively here because of the 99999. In this scenario I'd like to keep only the 99999, since it alone accounts for over 99% of the total, but I can't tell that from a standard-deviation rule of thumb alone.
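In other words, I think what I'm after is something like the sketch below: sort descending, accumulate the running share of the total, and keep the smallest set of points that reaches a target share (the 99% threshold here is just an example I made up):

```python
import numpy as np

def top_contributors(values, share=0.99):
    """Return the smallest set of values whose sum is at least `share` of the total."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # largest first
    cum_share = np.cumsum(v) / v.sum()                  # running fraction of the total
    n = np.searchsorted(cum_share, share) + 1           # first prefix reaching the target
    return v[:n]

print(top_contributors([50, 46, 90, 80, 44, 99999, 57, 87, 88]))  # -> [99999.]
```

But this just moves the arbitrariness from "how many standard deviations" to "what share of the total", so I'm wondering whether there's a principled way to pick that cut.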
For example, many people will agree that 1% of people can hold 99% of the wealth. If you slice into that 1% further, you find that 1% of that 1% holds 99% of that wealth, meaning that 0.01% of people hold roughly 98% of the total (99% of 99%). This second piece of information is the surprising one, since it identifies the "big guys" among the "big guys". It might even go further, to the "big guys" of the "big guys" of the "big guys" (big-guys^3): maybe one person holds 95% of all the wealth. How can I analyze my data for this? With a pie or bar chart it would be obvious at a glance.
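The kind of summary I'm imagining (instead of eyeballing a chart) is something like this sketch, which reports what share of the total the top 10%, 1%, 0.1%, ... of points hold; the fractions listed are just placeholders:

```python
import numpy as np

def concentration_profile(values, fractions=(0.10, 0.01, 0.001)):
    """For each fraction f, report the share of the total held by the top f of points."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    total = v.sum()
    for f in fractions:
        n = max(1, int(round(f * len(v))))  # always look at at least one point
        print(f"top {f:.1%} of points hold {v[:n].sum() / total:.1%} of the total")
```

Is there a standard statistic or procedure that formalizes this (i.e. detects this kind of nested concentration automatically)?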