I have a dataset containing features and a target variable, all of which are numeric values. I wanted to see which variables influence the target variable in what way, if at all, and thought a multiple linear regression model would suit. However, I get very bad results. What model is better suited for this problem?
1 Answer
From what I know, Lasso is commonly used for this purpose. It fits a linear model with an L1 penalty that shrinks the coefficients of uninformative features toward zero, and you can compare the resulting coefficients using a bar chart.
Here is sample code from DataCamp (Supervised Learning with scikit-learn course) that demonstrates this:
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

# Separate features and target (here the target is the "glucose" column)
X = diabetes_df.drop("glucose", axis=1).values
y = diabetes_df["glucose"].values
names = diabetes_df.drop("glucose", axis=1).columns

# Fit Lasso; alpha controls the strength of the L1 penalty
lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_

# One bar per feature coefficient; features shrunk to zero are uninformative
plt.bar(names, lasso_coef)
plt.xticks(rotation=45)
plt.show()
Of course, Lasso has its own advantages and disadvantages, and whether they suit you depends on the dataset and the analysis you want to perform. Another commonly used method is stepwise regression, and the choice between Lasso and stepwise regression is frequently debated. You can check discussions of the advantages and disadvantages of both here and here.
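If you want to try stepwise regression in the same scikit-learn setting, here is a minimal sketch of forward stepwise (sequential) selection using scikit-learn's SequentialFeatureSelector. The synthetic data from make_regression is just a stand-in for your own X and y, and the choice of 5 features is arbitrary for illustration:

from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 10 features, only 5 of which are informative
# (replace with your own feature matrix and target)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=0.1, random_state=0)

# Forward stepwise selection: greedily add the feature that most improves
# the cross-validated score, stopping once 5 features are selected
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=5,
                                direction="forward")
sfs.fit(X, y)

# Boolean mask marking which features were kept
print(sfs.get_support())

Unlike Lasso, which shrinks all coefficients jointly, this adds features one at a time based on cross-validated score, which is closer to classic stepwise regression.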
Also, here and here are papers that compare different regression methods for feature selection.
Hope this helps!

You can also fit a random forest and look at its feature_importances_ attribute. This gets you a pretty good idea of how relevant a feature might be. Linear regression won't account for non-linear and interaction effects, but RF will.
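As a quick sketch of that random-forest approach, again on synthetic make_regression data standing in for your dataset:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 6 features, 3 informative (replace with your own X and y)
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# Impurity-based importances: one non-negative value per feature,
# normalized to sum to 1; higher means the feature was more useful in splits
importances = rf.feature_importances_
print(importances)

These importances capture non-linear and interaction effects, but note they indicate predictive relevance, not the sign or direction of a feature's influence the way Lasso coefficients do.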