
I've been going over the output of a Monte Carlo model that simulates disease risk as a function of genotype. Under a null model in which genotype has no effect on disease risk, we have 1000 case and 1000 control individuals. Each individual has 500 loci of interest, and at each locus a genotype is randomly assigned based on that locus's allele frequency. Under this scenario, any association between genotype and disease can only be due to random error.
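
A minimal Python sketch of the null simulation described above, for anyone who wants to reproduce the setup outside Mathematica. The allele-frequency range (Uniform(0.05, 0.5)) and the Binomial(2, p) genotype coding (Hardy-Weinberg equilibrium, independent loci) are assumptions, not details taken from the original code, and `simulate_null_genotypes` is a hypothetical name.

```python
import numpy as np

def simulate_null_genotypes(rng, n_cases=1000, n_controls=1000, n_loci=500):
    """Genotypes drawn independently of case/control status (the null model).

    Assumptions (not from the original code): allele frequencies ~ Uniform(0.05, 0.5),
    genotype = number of minor alleles ~ Binomial(2, p), loci independent.
    """
    p = rng.uniform(0.05, 0.5, size=n_loci)  # per-locus allele frequency
    n = n_cases + n_controls
    # p broadcasts across rows, giving an (n, n_loci) matrix coded 0/1/2.
    genotypes = rng.binomial(2, p, size=(n, n_loci))
    # Labels are fixed by design and never look at the genotypes.
    labels = np.r_[np.ones(n_cases, dtype=int), np.zeros(n_controls, dtype=int)]
    return genotypes, labels

rng = np.random.default_rng(42)
X, y = simulate_null_genotypes(rng)
print(X.shape, y.sum())  # → (2000, 500) 1000
```

Because `labels` is built without reference to `genotypes`, any genotype-disease association in a sample like this is chance alone.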

Nevertheless, when we run logistic regression of disease status on genotype, as well as several other machine learning classifiers (naive Bayes, neural networks, random forests), we consistently find AUC > 0.5 on the test set under this null model. If we simulate a smaller data set with fewer individuals (200) and 50 sites, the AUC under the null model is even larger.
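
For calibration: a single test-set AUC under the null is itself a random quantity centred on 0.5, and its spread grows as the test set shrinks. The sketch below (NumPy only; the helper names are hypothetical) computes AUC as the Mann-Whitney probability that a random case outscores a random control, then compares the empirical spread for pure-noise scores with the theoretical null SD $\sqrt{(n_1+n_2+1)/(12\,n_1 n_2)}$. Chance variation of this kind would not by itself explain an AUC that is *consistently* above 0.5 across replicates, so this is mainly useful for judging how large the observed excess is.

```python
import numpy as np

def auc(scores, is_case):
    """AUC = P(score of a random case > score of a random control), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    pos, neg = scores[is_case], scores[~is_case]
    # Compare every case-control pair; O(n_cases * n_controls) memory is fine here.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def null_auc_spread(n_per_class, reps=500, seed=0):
    """Empirical SD of AUC when the scores carry no signal at all."""
    rng = np.random.default_rng(seed)
    is_case = np.r_[np.ones(n_per_class, bool), np.zeros(n_per_class, bool)]
    aucs = [auc(rng.random(2 * n_per_class), is_case) for _ in range(reps)]
    return float(np.std(aucs))

for n in (20, 100, 500):
    theory = np.sqrt((2 * n + 1) / (12 * n * n))  # Mann-Whitney null SD with n1 = n2 = n
    print(n, round(null_auc_spread(n), 3), round(theory, 3))
```

The smaller the test set, the wider the range of AUC values that chance alone produces, which is worth keeping in mind when comparing the 200-individual runs with the 2000-individual ones.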

I can understand how overfitting may give AUC < 0.5, but I can't think of a plausible scenario that would generate AUC > 0.5 under this null model.

Another strange anomaly: without feature selection, AUC is approximately 0.5 under the null model for all classifiers. However, once feature selection (using LASSO) is introduced, I get AUC > 0.5 for all of them.
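
One thing worth checking, since the question doesn't say when the LASSO step runs relative to the train/test split: if loci are selected using all individuals, the test set has already influenced which loci the classifier sees, and test AUC can then exceed 0.5 even under the null. The sketch below compares the two orderings on simulated null data. It is not the original pipeline: a univariate correlation screen stands in for LASSO, a least-squares linear probability model stands in for logistic regression, and the allele-frequency range, `k=50`, the 50/50 split, and all function names are assumptions.

```python
import numpy as np

def simulate_null(rng, n_cases=1000, n_controls=1000, n_loci=500):
    # Hypothetical allele-frequency range; genotypes ~ Binomial(2, p), independent of status.
    p = rng.uniform(0.05, 0.5, size=n_loci)
    X = rng.binomial(2, p, size=(n_cases + n_controls, n_loci)).astype(float)
    y = np.r_[np.ones(n_cases), np.zeros(n_controls)]
    return X, y

def auc(scores, is_case):
    # Mann-Whitney AUC: fraction of case-control pairs where the case scores higher.
    pos, neg = scores[is_case], scores[~is_case]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def screen(X, y, k):
    """Indices of the k loci with the largest |correlation| with y.
    A univariate stand-in for LASSO selection, not LASSO itself."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = np.abs(Xc.T @ yc) / (np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum()) + 1e-12)
    return np.argsort(r)[-k:]

def holdout_auc(X, y, split_seed, k=50, select_on_all=True):
    idx = np.random.default_rng(split_seed).permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2 :]
    # Leaky order: choose loci using every individual, test set included.
    # Honest order: choose loci using the training half only.
    rows = idx if select_on_all else train
    sel = screen(X[rows], y[rows], k)
    # Linear probability model fitted by least squares (stand-in for logistic regression).
    A_train = np.column_stack([np.ones(train.size), X[np.ix_(train, sel)]])
    beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    A_test = np.column_stack([np.ones(test.size), X[np.ix_(test, sel)]])
    return auc(A_test @ beta, y[test] == 1)

X, y = simulate_null(np.random.default_rng(0))
# Same split for both runs; only the rows used for selection differ.
leaky = holdout_auc(X, y, split_seed=1, select_on_all=True)
honest = holdout_auc(X, y, split_seed=1, select_on_all=False)
print(f"selection before split: AUC = {leaky:.3f}")
print(f"selection on training half only: AUC = {honest:.3f}")
```

If the leaky ordering reproduces the pattern described here, the selection step needs to be moved inside the training data (for cross-validation, inside each training fold) so the held-out individuals never influence which loci are kept.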

Is there some phenomenon that's a "mirror image" of overfitting that could generate these results, and if so, why am I only seeing it for classifiers following feature selection?

If it matters, the random assignment of genotypes and the machine learning classifiers were implemented in Mathematica (a colleague's old code), while the LASSO was performed in R (interfacing with Mathematica), but this shouldn't matter.

  • Comments have been moved to chat; please do not continue the discussion here. Before posting a comment below this one, please review the purposes of comments. Comments that do not request clarification or suggest improvements usually belong as an answer, on Cross Validated Meta, or in Cross Validated Chat. Comments continuing discussion may be removed. Commented Jun 29, 2024 at 17:25
  • Some of the comments should have been kept here. Commented Jun 29, 2024 at 19:47
