Some of the features I want to use for modeling have distributions like the one below:

High values of these features occur frequently in my data, and I can easily identify the subset of data points that causes this polarization. There is no special phenomenon behind it; these are simply samples associated with big cities. The question is how I should tackle the problem. Should I build a separate model for big cities, or would you recommend a transformation that reduces the polarity? I know there is no general recipe in predictive modeling, but perhaps you have some experience or good practices with datasets like this. How would you suggest incorporating these features into a model?
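
To make the two options concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration only: the `is_big_city` flag, the lognormal parameters, the 20% share of big-city samples, and the choice of `log1p` as the transformation. None of it comes from my real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Hypothetical flag marking the samples that belong to big cities.
is_big_city = rng.random(n_samples) < 0.2

# Synthetic stand-in for one polarized feature: most values are small,
# while the big-city subset takes much larger values.
feature = np.where(
    is_big_city,
    rng.lognormal(mean=8.0, sigma=0.5, size=n_samples),
    rng.lognormal(mean=3.0, sigma=0.5, size=n_samples),
)

# Option 1: a monotone transformation that compresses the upper range.
# log1p is only one possible choice.
feature_log = np.log1p(feature)

# Option 2: split the data so that each group could get its own model.
feature_big = feature[is_big_city]
feature_rest = feature[~is_big_city]

print("raw range:", feature.min(), feature.max())
print("log range:", feature_log.min(), feature_log.max())
print("group sizes:", feature_big.size, feature_rest.size)
```

In this sketch the transformation preserves the ordering of the samples and only narrows the gap between the two groups, while the split keeps the raw values and handles each group separately.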