$\begingroup$

I am looking into the pros and cons of different normalisation techniques for work, and it got me thinking. What if I used trimmed means and the square roots of Winsorized variances to compute the standardised data? Instead of

x <- rnorm(100, 1, 2)
y <- (x - mean(x)) / sd(x)  # parentheses needed: centre first, then scale

it becomes

# winvar() is not in base R; it is available in, e.g., the WRS2 package
y <- (x - mean(x, trim = 0.2)) / sqrt(winvar(x))

My thinking is that this won't make much difference for normally distributed data, but for non-normal data it might put the mean at a more accurate position in the distribution, such that the true number of points below or above the mean will be known.

This might all be rubbish but let me know what you think.

$\endgroup$
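For readers without a `winvar` function to hand, here is a minimal base R sketch of the proposal. The helper name `win_var` and its argument `tr` are assumptions made for illustration, not part of the question, and the amount of Winsorizing (20% per tail) is simply chosen to match the trimming. It clamps the `floor(tr * n)` smallest and largest values to their nearest retained neighbours and then takes the ordinary variance.

```r
# Hypothetical helper (name and signature are illustrative assumptions):
# Winsorized variance with proportion `tr` Winsorized in each tail.
win_var <- function(x, tr = 0.2) {
  n <- length(x)
  k <- floor(tr * n)      # number of values pulled in on each side
  s <- sort(x)
  lo <- s[k + 1]          # smallest retained value
  hi <- s[n - k]          # largest retained value
  var(pmin(pmax(x, lo), hi))
}

set.seed(1)
x <- rnorm(100, 1, 2)

# Classical standardisation: centre first, then scale
y_classic <- (x - mean(x)) / sd(x)

# Proposed variant: trimmed mean and square root of the Winsorized variance
y_robust <- (x - mean(x, trim = 0.2)) / sqrt(win_var(x, tr = 0.2))
```

Because clamping can only shrink distances between observations, `win_var(x)` never exceeds `var(x)`, so `y_robust` is at least as spread out as `y_classic`, even for normal data.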
  • $\begingroup$ Using a different estimate of mean and/or a different estimate of SD will just produce a linear rescaling of what you would have got otherwise. And you lose something that might be important, namely that the normalised variables are guaranteed to have mean 0 and SD 1. The gain in what you propose is mostly that the zero on your new scale may seem better placed, but I would not use or recommend wording such as the "true number of points below or above the mean". I think that there can be some point in using trimmed means as summary measures, but less point in using Winsorized variances. $\endgroup$ Commented Apr 15, 2016 at 12:19
  • $\begingroup$ Thanks for your reply. Why is there less point in using winsorised variances? $\endgroup$ Commented Apr 15, 2016 at 12:24
  • $\begingroup$ Fair comment! What would you do with them? How do you think about them? How do you justify, explain, and defend the degree of Winsorizing you did in a report? How do you compare them with results from other studies unless those other studies used the same method? You have some of the same problems with trimmed means, but trimmed means are fairly easy to compare with other trimmed means (including the mean itself), while Winsorized variances are not, so far as I can see, easy to compare with other Winsorized variances (including the variance). $\endgroup$ Commented Apr 15, 2016 at 12:29
  • $\begingroup$ Your R or R-like code is transparent, but in general it's not a good idea to assume that everyone uses the same software. $\endgroup$ Commented Apr 15, 2016 at 12:31
  • $\begingroup$ Much depends on the detail downstream of this. In some fields there is enormous emphasis on standardisation as a way of getting variables on similar scales and that can be important. But there is a fallacy abroad that standardisation will somehow deliver robustness or resistance too: that can't be true of any linear scaling. I don't think you're saying that, but I underline the point. (I think that the different meanings of "normalise" don't help in this territory.) $\endgroup$ Commented Apr 15, 2016 at 12:36
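The points made in the comments can be checked numerically. The sketch below is illustrative only: `win_var` is a hypothetical base R helper for the Winsorized variance, and the data (99 normal draws plus one gross outlier at 50) are made up. It shows that the two sets of scores are an exact linear function of each other, that the robust scores no longer have mean 0 and SD 1, and that the outlier stays the most extreme point either way.

```r
# Hypothetical helper: Winsorized variance, proportion `tr` per tail
win_var <- function(x, tr = 0.2) {
  n <- length(x)
  k <- floor(tr * n)
  s <- sort(x)
  var(pmin(pmax(x, s[k + 1]), s[n - k]))
}

set.seed(42)
x <- c(rnorm(99, 1, 2), 50)   # made-up data with one gross outlier

z1 <- (x - mean(x)) / sd(x)                            # classical scores
z2 <- (x - mean(x, trim = 0.2)) / sqrt(win_var(x))     # trimmed/Winsorized scores

# z2 = a + b * z1 follows from substituting x = mean(x) + sd(x) * z1
b <- sd(x) / sqrt(win_var(x))
a <- (mean(x) - mean(x, trim = 0.2)) / sqrt(win_var(x))

print(all.equal(z2, a + b * z1))    # the rescaling is exactly linear
print(cor(z1, z2))                  # so the correlation is 1 up to rounding
print(c(mean(z2), sd(z2)))          # no longer guaranteed to be 0 and 1
print(which.max(z1) == which.max(z2))  # the outlier is still the extreme point
```

The ordering of the observations is identical under both scalings, which is why neither version can by itself make downstream analysis resistant to the outlier.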
