8

I have the following dataframe that I wish to perform some regression on. I am using Seaborn but can't quite seem to find a non-linear function that fits. Below is my code and it's output, and below that is the dataframe I am using, df. Note I have truncated the axis in this plot.

I would like to fit either a Poisson or Gaussian distribution style of function.

import pandas 
import seaborn


graph = seaborn.lmplot('$R$', 'Equilibrium Value', data = df, fit_reg=True, order=2, ci=None)
graph.set(xlim = (-0.25,10))

However this produces the following figure.

enter image description here

df

     R          Equilibrium Value
0   5.102041    7.849315e-03
1   4.081633    2.593005e-02
2   0.000000    9.990000e-01
3   30.612245   4.197446e-14
4   14.285714   6.730133e-07
5   12.244898   5.268202e-06
6   15.306122   2.403316e-07
7   39.795918   3.292955e-18
8   19.387755   3.875505e-09
9   45.918367   5.731842e-21
10  1.020408    9.936863e-01
11  50.000000   8.102142e-23
12  2.040816    7.647420e-01
13  48.979592   2.353931e-22
14  43.877551   4.787156e-20
15  34.693878   6.357120e-16
16  27.551020   9.610208e-13
17  29.591837   1.193193e-13
18  31.632653   1.474959e-14
19  3.061224    1.200807e-01
20  23.469388   6.153965e-11
21  33.673469   1.815181e-15
22  42.857143   1.381050e-19
23  25.510204   7.706746e-12
24  13.265306   1.883431e-06
25  9.183673    1.154141e-04
26  41.836735   3.979575e-19
27  36.734694   7.770915e-17
28  18.367347   1.089037e-08
29  44.897959   1.657448e-20
30  16.326531   8.575577e-08
31  28.571429   3.388120e-13
32  40.816327   1.145412e-18
33  11.224490   1.473268e-05
34  24.489796   2.178927e-11
35  21.428571   4.893541e-10
36  32.653061   5.177167e-15
37  8.163265    3.241799e-04
38  22.448980   1.736254e-10
39  46.938776   1.979881e-21
40  47.959184   6.830820e-22
41  26.530612   2.722925e-12
42  38.775510   9.456077e-18
43  6.122449    2.632851e-03
44  37.755102   2.712309e-17
45  10.204082   4.121137e-05
46  35.714286   2.223883e-16
47  20.408163   1.377819e-09
48  17.346939   3.057373e-08
49  7.142857    9.167507e-04

EDIT

Attached are two graphs produced from both this and another data set when increasing the order parameter beyond 20.

enter image description here

enter image description here

Order = 3 enter image description here

7
  • try changing your order to 3 Commented Sep 29, 2017 at 23:29
  • @BenT I have tried multiple values of the order parameter :/ Commented Sep 29, 2017 at 23:32
  • Does it not modify the shape of your curve? It uses np.polyfit which should reduce the cost function with the higher order functions. Can you post what that does to your graph? Commented Sep 29, 2017 at 23:50
  • @BenT Hmm actually you are right, it does look better now that I am pushing the value quite high (25 + ). It is a bit jumpy however, not a continuous curve. Commented Sep 30, 2017 at 0:01
  • Im surprised it doesn't look better with just a 3rd order function. It would be expected to have such a complicated curve with a 25th order polynomial function fitted to the line. If you want to compare the curve to gaussian you may just want to plot an actual gaussian curve instead of trying to fit one. You could also just make this a line plot which would clearly show that the data points look gaussian. Commented Sep 30, 2017 at 0:09

1 Answer 1

10

I have problems understanding why a lmplot is needed here. Usually you want to perform a fit by taking a model function and fit it to the data. Assume you want a gaussian function

model = lambda x, A, x0, sigma, offset:  offset+A*np.exp(-((x-x0)/sigma)**2)

you can fit it to your data with scipy.optimize.curve_fit:

popt, pcov = curve_fit(model, df["R"].values, 
                              df["EquilibriumValue"].values, p0=[1,0,2,0])

Complete code:

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

df = ... # your dataframe

# plot data
plt.scatter(df["R"].values,df["EquilibriumValue"].values, label="data")

# Fitting
model = lambda x, A, x0, sigma, offset:  offset+A*np.exp(-((x-x0)/sigma)**2)
popt, pcov = curve_fit(model, df["R"].values, 
                              df["EquilibriumValue"].values, p0=[1,0,2,0])
#plot fit
x = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
plt.plot(x,model(x,*popt), label="fit")

# Fitting
model2 = lambda x, sigma:  model(x,1,0,sigma,0)
popt2, pcov2 = curve_fit(model2, df["R"].values, 
                              df["EquilibriumValue"].values, p0=[2])
#plot fit2
x2 = np.linspace(df["R"].values.min(),df["R"].values.max(),250)
plt.plot(x2,model2(x2,*popt2), label="fit2")

plt.xlim(None,10)
plt.legend()
plt.show()

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

No way to do similar with Seaborn?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.