Can AUC be used for model selection, and how can the excessive number of features/parameters be penalized in this case?
In the frequentist framework we have various model selection criteria, such as AIC and BIC, which penalize an excessive number of features, balancing the penalty against the increase in likelihood (in this sense, BIC is a frequentist criterion, despite its name).
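For reference, with $\hat L$ the maximized likelihood, $k$ the number of parameters, and $n$ the sample size, these criteria are

$$\mathrm{AIC} = 2k - 2\ln\hat L, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat L,$$

and the model minimizing the criterion is selected.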
In the Bayesian framework we select the model with the maximum evidence, where an excessive number of features is penalized via the prior.
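That is, the evidence (marginal likelihood) of model $M$ integrates the likelihood over the prior, which automatically disfavours models that spread their prior mass over many parameters:

$$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta.$$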
In classification tasks, yet another approach is to use n-fold cross-validation, where we can compare the AUC (or some other score) across different models. This approach is appealing since the AUC (or a relevant score) is easier to interpret than information criteria or Bayesian evidence, and it seems (superficially) independent of out-of-the-blue choices (like the Bayesian prior or the specific information criterion).
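To make the comparison concrete, here is a minimal sketch (not from the question; the data and model are purely illustrative) of comparing cross-validated AUC between a small and a large feature set with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: only 5 of the 20 features are informative.
# shuffle=False keeps the informative features in the first columns.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, shuffle=False, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validated AUC: first 5 features vs. all 20.
auc_small = cross_val_score(model, X[:, :5], y, scoring="roc_auc", cv=5)
auc_large = cross_val_score(model, X, y, scoring="roc_auc", cv=5)

print(f"5 features:  AUC = {auc_small.mean():.3f} +/- {auc_small.std():.3f}")
print(f"20 features: AUC = {auc_large.mean():.3f} +/- {auc_large.std():.3f}")
```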
However, it seems that adding extra features might increase the AUC on its own, so the number of features must be penalized (which would require the introduction of an ad-hoc penalty). The question is how such a penalty can be introduced.
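For concreteness, one ad-hoc form such a penalty could take (purely illustrative, with $\lambda$ an arbitrary tuning constant and $k_M$ the number of features of model $M$) is

$$\mathrm{score}(M) = \widehat{\mathrm{AUC}}_{\mathrm{CV}}(M) - \lambda\, k_M.$$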
Comment: Note that scikit-learn's `LogisticRegression` applies an L2 penalty by default; in current versions you can switch it off (`penalty=None` does it). Earlier versions did not allow for the penalty to be ditched, however; you had to hack around it, and there should be old Stack Overflow posts about that. Anyway, the penalty probably influences the behavior you're seeing. I think you should be able to drive the AUC pretty low if you ditch the penalty.
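A minimal sketch of the comment's point, assuming scikit-learn >= 1.2 (where `LogisticRegression` accepts `penalty=None` to disable regularization); the data here are again purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Many mostly-noisy features relative to the sample size, to invite overfitting.
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

# Default: L2 penalty with C=1.0.
penalized = LogisticRegression(max_iter=5000)
# Regularization switched off entirely (scikit-learn >= 1.2).
unpenalized = LogisticRegression(penalty=None, max_iter=5000)

for name, clf in [("L2 (default)", penalized), ("no penalty", unpenalized)]:
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5)
    print(f"{name}: CV AUC = {auc.mean():.3f}")
```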