Say the underlying data are multimodal and contain one extremely large value that has been confirmed to be an outlier. If simply removing the outlier is not acceptable, what techniques are there for reducing it to an acceptable value, apart from Winsorising (unless that is the only adequate technique for the task)?

Thank you.
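
For reference, here is a minimal sketch of what Winsorising does to a single extreme value, assuming the classical form in which the `k` largest observations are replaced by the next-largest remaining one. The function name `winsorize_top`, the parameter `k`, and the sample values are made up for illustration and do not come from any real data set.

```python
import numpy as np

def winsorize_top(values, k=1):
    """Replace the k largest observations with the (k+1)-th largest one.

    Hypothetical helper for illustration only; it caps the upper tail
    and leaves the lower tail untouched.
    """
    x = np.asarray(values, dtype=float)
    if not 0 < k < x.size:
        raise ValueError("k must be between 1 and len(values) - 1")
    # The (k+1)-th largest value becomes the cap for everything above it.
    cap = np.sort(x)[-(k + 1)]
    return np.minimum(x, cap)

# Made-up bimodal sample with one extreme value at the end.
sample = [1.0, 2.0, 2.5, 3.0, 10.0, 11.0, 12.0, 500.0]
capped = winsorize_top(sample, k=1)
print(capped)
```

The extreme value is pulled down to the largest remaining observation and every other point is left unchanged, which is why the result depends entirely on how many points are treated as extreme.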
  • Welcome to Cross Validated! How did you determine that point to be an outlier? Commented Aug 23, 2022 at 16:26
  • And can you help us understand what about the answer should be specific to multimodal distributions? Commented Aug 23, 2022 at 16:34
  • @JohnMadden The underlying distribution is unknown, so I'm just trying to think about an all-round example case. Commented Aug 23, 2022 at 19:33
  • @Dave Using hypothesis testing based on the mean, as well as the IQR method. Commented Aug 23, 2022 at 19:33
  • @NMA Outlier identification by hypothesis testing requires a model assumption on which we usually can't rely. In any case, the only reliable way to establish that a point is an outlier that should be removed is information about the data collection showing that the observation is erroneous. That cannot be concluded from the data alone. Commented Aug 23, 2022 at 21:26
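
The "IQR method" mentioned in the comments is not spelled out, so the following is only a sketch under the assumption that it means the usual rule of flagging points more than 1.5 × IQR outside the quartiles. The function name `iqr_flags`, the multiplier `k`, and the sample values are illustrative assumptions. As the last comment points out, a flag of this kind only marks a point as unusual relative to the rest of the data; it does not show that the observation is erroneous.

```python
import numpy as np

def iqr_flags(values, k=1.5):
    """Return a boolean mask of points lying outside the IQR fences.

    Hypothetical helper: the fences are Q1 - k*IQR and Q3 + k*IQR,
    with k = 1.5 assumed as the conventional default.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (x < lower) | (x > upper)

# Made-up bimodal sample with one extreme value at the end.
observations = [1.0, 2.0, 2.5, 3.0, 10.0, 11.0, 12.0, 500.0]
flags = iqr_flags(observations)
print(flags)
```

With multimodal data the quartiles can fall in different modes, so the fences may be very wide or may cut through a legitimate mode; the rule is a screening device rather than a test.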
