$\begingroup$

I have a dataset with 100+ features on which I am testing a Gaussian mixture model (GMM) for anomaly detection. As a test, I add Gaussian noise to 5-6 features of 100 points. The GMM detects those points easily, but the next suggested step is to develop an algorithm that identifies which features carry the noise. This is where I am stuck.
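To make the setup concrete, here is a minimal sketch of the experiment described above. The synthetic two-cluster data, the noise scale, the number of components, the affected feature indices, and the choice to fit on the clean data are all assumptions for illustration, not details of the actual dataset.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumed sizes): 2000 points,
# 100 features, two clusters separated by a constant shift.
n_samples, n_features = 2000, 100
X = rng.normal(size=(n_samples, n_features))
X[:1000] += 4.0

# Inject Gaussian noise into 5 features of 100 randomly chosen points.
noisy_rows = rng.choice(n_samples, size=100, replace=False)
noisy_cols = np.array([3, 17, 42, 58, 91])  # hypothetical feature indices
X_noisy = X.copy()
X_noisy[np.ix_(noisy_rows, noisy_cols)] += rng.normal(scale=10.0, size=(100, 5))

# Assumption: fit on the clean data, then score the contaminated copy.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)
log_lik = gmm.score_samples(X_noisy)  # one log-likelihood per point

# Flag the 100 lowest-likelihood points and check how many were perturbed.
flagged = np.argsort(log_lik)[:100]
recall = np.intersect1d(flagged, noisy_rows).size / noisy_rows.size
print(f"fraction of perturbed points among the flagged ones: {recall:.2f}")
```

Note that `score_samples` returns a single number per point, so at this stage nothing indicates which features were perturbed.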

The outlier score returned by sklearn is a single value per data point that aggregates all dimensions. I tried to retrieve the internal variables of the Gaussian log-likelihood calculation that underlies the outlier score, hoping to separate out the features with unusual values, but without success. I suspect this has something to do with the way the covariance matrices enter the calculation.
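As an illustration of what a per-feature breakdown could look like, here is a sketch under assumptions (it is not sklearn's own scoring code). For a single Gaussian component, the only part of the log-likelihood that depends on the point is the squared Mahalanobis distance $(x-\mu)^\top \Sigma^{-1} (x-\mu)$, which can be written as a sum over features of $(x-\mu)_j \, [\Sigma^{-1}(x-\mu)]_j$. The helper `feature_contributions` is a hypothetical name, the data is synthetic, and the sketch assumes `covariance_type="full"` so that `precisions_` holds one full matrix per component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def feature_contributions(gmm, X):
    """Split each point's squared Mahalanobis distance to its most likely
    component into one term per feature (assumes covariance_type='full')."""
    labels = gmm.predict(X)
    diff = X - gmm.means_[labels]  # shape (n_points, n_features)
    # z[i] = Sigma_k^{-1} (x_i - mu_k) for the component k assigned to point i
    z = np.einsum("nij,nj->ni", gmm.precisions_[labels], diff)
    # Each row sums to that point's squared Mahalanobis distance;
    # individual terms can be negative when features are correlated.
    return diff * z

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100))
X[:1000] += 4.0  # two synthetic clusters
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Perturb 5 hypothetical features of 100 points, as in the experiment.
noisy_cols = np.array([3, 17, 42, 58, 91])
X_bad = X[:100].copy()
X_bad[:, noisy_cols] += rng.normal(scale=10.0, size=(100, 5))

contrib = feature_contributions(gmm, X_bad)
# Average the per-feature terms over the anomalous points and rank features.
top5 = np.argsort(contrib.mean(axis=0))[::-1][:5]
print(sorted(top5.tolist()))
```

This is only a heuristic: with strongly correlated features the off-diagonal entries of $\Sigma^{-1}$ spread a deviation in one feature across several terms, which may be related to the covariance issue mentioned above.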

I would appreciate hints on where to look inside the GMM algorithm, or suggestions for post-detection analysis methods.

$\endgroup$
