$\begingroup$

Some features I want to use for modeling have distributions like the ones below:

[image: histograms of the features]

High values of these features occur frequently in my data, and I can easily identify the subset of data points that causes this polarization. There is no unusual phenomenon here; these are simply samples associated with big cities. The question is how I should tackle the problem. Should I build a separate model for big cities? Or would you recommend a transformation that reduces the polarity? I know there is no general recipe in predictive modeling, but perhaps you have some experience and good practices with datasets like this. What would be your suggestion on how to incorporate these features in a model?

$\endgroup$
  • $\begingroup$ What is the "problem" you believe needs tackling? If big cities have high values of predictors, then that may simply mean that your target also has a different distribution in big cities, and that your predictor is useful in predicting for a specific city. $\endgroup$ Commented Feb 11 at 15:33
  • $\begingroup$ My question is how distributions like these should be handled in the modeling process in general. I see there is a distinct subset of samples. In my case it is related to big cities. But in general, when you see a distribution like that, what would be your next step? Would you transform the features somehow to make the distribution less polarized? Or would you extract the subset and create a separate model only for these samples? $\endgroup$ Commented Feb 11 at 16:13
  • $\begingroup$ I would first try to understand whether there is anything to be concerned about. Are the six features actually correlated, i.e., is it the same instances that score high on all six features? If so, separate your data into these instances and "the rest" and run separate diagnostics. If there is an issue, address the issue. It may be helpful to add a "city size" predictor or similar. Or not, because these features may already be carrying all the information. But I would certainly not start transforming anything just because of these histograms. $\endgroup$ Commented Feb 11 at 16:16
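
The check suggested in the last comment (is it the same instances that score high on every feature?) can be sketched roughly as follows. This is only an illustration on synthetic data, not the asker's dataset: the helper name `high_value_overlap`, the 0.9 quantile cut-off, and the simulated "big city" group are all assumptions made for the example. On real data the cut-off would more sensibly be read off the gap in each histogram.

```python
import numpy as np

def high_value_overlap(X, quantile=0.9):
    """Flag rows above each feature's quantile and measure how much the flags agree.

    Hypothetical helper for illustration; the quantile is an assumed cut-off.
    """
    X = np.asarray(X, dtype=float)
    thresholds = np.quantile(X, quantile, axis=0)  # one threshold per feature
    high = X > thresholds                          # boolean, shape (n_samples, n_features)
    n_high = high.sum(axis=1)                      # number of features that are high per row
    all_high = n_high == X.shape[1]                # rows that are high on every feature
    any_high = n_high > 0                          # rows that are high on at least one feature
    # Share of "high somewhere" rows that are high everywhere (1.0 = same instances)
    overlap = all_high.sum() / max(any_high.sum(), 1)
    return all_high, overlap

# Synthetic stand-in: 1000 rows, 6 features, the first 100 rows play the role of big cities
rng = np.random.default_rng(0)
n, k = 1000, 6
is_big_city = np.arange(n) < 100
X = rng.normal(1.0, 0.2, size=(n, k))
X[is_big_city] += 5.0  # shift the assumed big-city rows to create the second mode

all_high, overlap = high_value_overlap(X, quantile=0.9)

# If the overlap is high, the flag itself can serve as a crude "city size" predictor,
# and the two groups can be summarized separately before deciding on anything else.
X_with_flag = np.column_stack([X, all_high.astype(float)])
group_means = {
    "high": X[all_high].mean(axis=0),
    "rest": X[~all_high].mean(axis=0),
}
```

An overlap close to 1 would indicate that a single group of instances drives the second mode in all features, which is the situation where running separate diagnostics on "these instances" and "the rest" makes sense. A low overlap would suggest the features are polarized for different reasons.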