I'm learning the very basics of data science and started with regression analysis, so I decided to build a linear regression model to examine the linear relationship between two variables (chemical_1 and chemical_2) from this dataset.

I made chemical_1 the predictor (independent variable) and chemical_2 the target (dependent variable). Then I used scipy.stats.linregress to calculate a regression line.

from scipy import stats

X = df['chemical_1']
Y = df['chemical_2']

slope, intercept, r_value, p_value, slope_std_error = stats.linregress(X,Y)
predict_y = slope * X + intercept

I figured out how to plot the regression line with matplotlib.

import matplotlib.pyplot as plt

plt.plot(X, Y, 'o')        # raw data points
plt.plot(X, predict_y)     # fitted regression line
plt.show()

However, I want to plot the regression with Seaborn. The only option I have found so far is the following:

import seaborn as sns

sns.set(color_codes=True)
sns.set(rc={'figure.figsize':(7, 7)})
sns.regplot(x=X, y=Y)

Is there a way to provide Seaborn with the regression line predict_y = slope * X + intercept in order to build a regression plot?

UPD: When using the following solution, proposed by RPyStats, the Y-axis is labeled chemical_1, although it should be chemical_2.

fig, ax = plt.subplots()
sns.set(color_codes=True)
sns.set(rc={'figure.figsize':(8, 8)})
ax = sns.regplot(x=X, y=Y, line_kws={'label':'$y=%3.7s*x+%3.7s$'%(slope, intercept)});
ax.legend()
sns.regplot(x=X, y=Y, fit_reg=False, ax=ax);
sns.regplot(x=X, y=predict_y,scatter=False, ax=ax);

1 Answer

Using subplots and setting the axes will allow you to overlay your predicted Y values. Does this answer your question?

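print(predict_y.name)  # 'chemical_1': predict_y inherits its name from X
predict_y = predict_y.rename('chemical_2')  # so the Y-axis label stays chemical_2
# Apply the style and figure size before the figure is created
sns.set(color_codes=True, rc={'figure.figsize':(7, 7)})
fig, ax = plt.subplots()
# Scatter only, no fitted line
sns.regplot(x=X, y=Y, fit_reg=False, ax=ax, scatter_kws={"color": "green"})
# Line only; with scatter=False the color has to go through line_kws
sns.regplot(x=X, y=predict_y, scatter=False, ax=ax, line_kws={"color": "green"})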

4 Comments

Updated my question, could you take a look at it? Also could you tell why the plot has different colors, not just blue as usual? Is it because of overlaying?
You're correct that overlaying the plots produces different colors; passing additional arguments to sns.regplot can set them to the same color. The Y-axis name is overwritten by the second call to sns.regplot. It says 'chemical_1' because that is the name of the series predict_y. I can update my solution above.
Got it, thanks! One more question: do you think I've added the legend correctly? It looks as expected but I wonder - is it the most optimal way or could it be improved?
Your last question is where it gets subjective: if you're happy with it and it is working as expected, then it's probably safe to keep it as is. Also, I haven't done much research on adding legends to seaborn plots.
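The axis-label behaviour discussed in the comments can be reproduced without any plotting. This is a minimal sketch with made-up numbers: the slope and intercept values are hypothetical, not results from the question's data.

```python
import pandas as pd

X = pd.Series([1.0, 2.0, 3.0], name="chemical_1")
slope, intercept = 2.0, 0.5  # hypothetical coefficients for illustration

# Arithmetic with scalars keeps the name of the original Series
predict_y = slope * X + intercept
print(predict_y.name)  # chemical_1

# Renaming gives Seaborn the label it should use for the Y-axis
predict_y = predict_y.rename("chemical_2")
print(predict_y.name)  # chemical_2
```

Calling ax.set_ylabel('chemical_2') after plotting would be another way to get the same label without renaming the series.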
