Can I use robust estimators (e.g., lavaan's "MLM" and "MLR" estimator options) to handle outliers in my sample, or should I remove the outliers?
For context, I am modelling the trajectories of scores on four cognitive tests, each measured at four time points. To do this, I am fitting a separate latent growth curve model for each cognitive test, using the growth function from lavaan. I plan to investigate whether a measured biomarker (n = ~850) and a polygenic score for the biomarker (n = ~7,000) predict the trajectories.
There are outliers (scores more than 3.29 SDs above or below the mean) on both the cognitive (outcome/indicator) variables and the biomarker (predictor) variables.
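To make the outlier criterion concrete, here is a minimal Python sketch of how scores beyond the 3.29 SD cutoff could be flagged. The function name `flag_outliers` and the example scores are hypothetical and purely illustrative; this is not my actual data or pipeline, and it assumes the sample SD (n - 1 denominator) is used for standardising.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.29):
    """Return the indices of values whose absolute z-score exceeds `threshold`."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption of this sketch
    if s == 0:
        return []  # all values identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs((v - m) / s) > threshold]


# Made-up scores: 29 ordinary values plus one extreme value at the end
scores = [10.0 + 0.1 * (i % 5) for i in range(29)] + [25.0]
print(flag_outliers(scores))
```

Note that the extreme values themselves inflate the mean and SD used here, so in small samples a z-score cutoff can mask outliers.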
I am unsure whether I should remove outliers from the outcome variables, the predictor variables, or both, or whether I can instead handle them by using robust estimators (which I already plan to use because my data are non-normal).
All scores on the cognitive tests fall within the range of possible scores (though most scores occupy only a narrow part of that range). For the biomarkers, it is not possible to determine whether all observed values are valid. It might be worth noting that the measured biomarker data are fairly "noisy" (i.e., more prone to measurement error) and show some very extreme values (> 6 SDs from the mean); see the plot below for the histogram of the standardised scores.
NOTE: "MLM": maximum likelihood estimation with robust standard errors and a Satorra-Bentler scaled test statistic. For complete data only. "MLR": maximum likelihood estimation with robust (Huber-White) standard errors and a scaled test statistic that is (asymptotically) equal to the Yuan-Bentler test statistic. For both complete and incomplete data.
Thank you in advance! :)
