1
$\begingroup$

Corrected equation and one more update.

There was a major mistake in the equation. The corrected equation should be in the form: $$f(x) = 1 - c1*exp(-3*x/a1) - c2*exp(-3*x/a2).$$

I would like to find a least-squares fit to a set of points; the parameters $c1$, $c2$, $a1$, and $a2$ should be optimized. I am particularly interested in an analytical solution that needs neither iterations nor initial guesses.

Now, $c1$, and $c2$ should not necessarily sum up to 1 (instead, the sum should be less than or equal to one and positive); $a1$ and $a2$ should also be positive.

I am not sure if this is realistic...but any help would be appreciated!

If anybody finds it worth their time, here are some points: $$xi=0, 12.08, 24.276, 36.368, 48.21, 59.998,$$ $$yi=0, 0.735, 0.894, 0.999, 1.074, 0.84.$$

I am trying to fit a variogram model. Therefore, I need to use one of the models that are positive-definite; the exponential model happens to be one of them. Variogram models are usually fitted either manually or through some iterative procedure.

Using a semi-automatic fitting algorithm, I obtained the following result (figure: fitted model).

The fitted parameters: $c1=0.975$; $c2=0.025$; $a1=25.741$; $a2=150.0$
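For anyone who wants to reproduce the numbers, here is a minimal sketch that evaluates the corrected model at the six points above with the fitted parameters and reports the sum of squared residuals. The function names are only illustrative, and the constraint check assumes one reading of the conditions stated above (each of $c1$, $c2$, $a1$, $a2$ positive and $c1+c2\le 1$).

```python
import numpy as np

def variogram_model(x, c1, c2, a1, a2):
    """f(x) = 1 - c1*exp(-3x/a1) - c2*exp(-3x/a2), the corrected equation."""
    x = np.asarray(x, dtype=float)
    return 1.0 - c1 * np.exp(-3.0 * x / a1) - c2 * np.exp(-3.0 * x / a2)

def sum_squared_residuals(x, y, c1, c2, a1, a2):
    """Least-squares objective for a given parameter set."""
    r = np.asarray(y, dtype=float) - variogram_model(x, c1, c2, a1, a2)
    return float(np.sum(r ** 2))

def satisfies_constraints(c1, c2, a1, a2, tol=1e-12):
    # Assumed reading of the constraints: all four parameters positive
    # and c1 + c2 <= 1 (tol absorbs floating-point rounding).
    return min(c1, c2, a1, a2) > 0 and c1 + c2 <= 1.0 + tol

# Data and fitted parameters as posted in the question.
xi = [0, 12.08, 24.276, 36.368, 48.21, 59.998]
yi = [0, 0.735, 0.894, 0.999, 1.074, 0.84]
params = dict(c1=0.975, c2=0.025, a1=25.741, a2=150.0)

print(variogram_model(xi, **params))
print(sum_squared_residuals(xi, yi, **params))
print(satisfies_constraints(**params))
```

Since $c1+c2=1$ for these parameters, the model passes through $(0,0)$ up to rounding.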

I hope everything is correct now. Let me know if you see any inconsistency.

$\endgroup$
11
  • $\begingroup$ Sorry, the method of regression with an integral equation cannot work with a small number of points. The method relies on numerical integration, whose accuracy is too low when there are not enough points. $\endgroup$ Commented Apr 24, 2017 at 13:31
  • $\begingroup$ Thank you very much for your response! It actually worked with sufficient precision for this particular set of points, but the coefficients are unconstrained. I thought that maybe there is a way to constrain them to sum up to one, since your method is so easy to implement. I don't have the necessary background in non-linear optimization and am therefore trying to get some outside help. I am able to get a satisfactory result using an iterative approach, but an analytical solution would be much better as I need to run this algorithm many times. $\endgroup$ Commented Apr 24, 2017 at 18:40
  • 1
    $\begingroup$ OK. But it is impossible with your numerical example because for $x=0$ the equation $y=c_0+c_1e^{-3x/a_1}+c_2e^{-3x/a_2}$ gives $y(0)=c_0+c_1+c_2=1$ which is in contradiction with the data $y(0)=0$ $\endgroup$ Commented Apr 25, 2017 at 9:20
  • $\begingroup$ I calculated c1 to be around 0, c2 = -1.0549, and c0=1.0645. So, for the (x=0,y=0) point it gives me around 0.0095. The other points are fitted as follows (y(x)): 2) 0.692, 3) 0.934, 4) 1.018, 5) 1.046, 6) 0.84. A graph shows a pretty close visual fit. Anyway, I am just blindly following one of your tutorials and am not 100 % sure that I used everything correctly. $\endgroup$ Commented Apr 26, 2017 at 3:51
  • $\begingroup$ So, I don't understand the wording of your question. I agree that $c_0+c_1+c_2=1.0645-0.0095-1.0549=0.0001$ which is close to $0$. Also, you wrote: I need to have c0, c1, and c2 to sum up to 1. This seems contradictory because in the example they don't sum up to 1, but to 0. What am I missing? $\endgroup$ Commented Apr 26, 2017 at 7:26

2 Answers 2

2
$\begingroup$

The regression for the four parameters $p,q,b,c$ of the function $$y=be^{px}+ce^{qx}$$ is treated on pp. 71-74 of the paper https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . It involves a 4x4 matrix.

If we add a parameter for the function : $$y=a+be^{px}+ce^{qx}$$ the five parameters $p,q,a,b,c$ regression involves a 5x5 matrix, as shown below.

The first part of the calculation is common to all variants, whether or not there is a relationship between $a,b,c$.

The second part of the calculation below is valid only if there is no additional condition. Do not apply it if an additional condition in the form of a linear relationship between $a,b,c$ is required.

For the condition $c_0+c_1+c_2=1$, see the corresponding calculation method below.

enter image description here

enter image description here

UPDATED ANSWER AFTER THE CHANGE OF WORDING OF THE PROBLEM :

Now, the function considered is : (1-f(x)) = c1*exp(-3x/a1) + c2*exp(-3*x/a2) as specified in R.Chuck's comment. Then : $$f(x)=1-c_1e^{-3x/a_1}-c_2e^{-3x/a_2}$$ This corresponds to : $$y(x)=be^{px}+ce^{qx}\qquad \begin{cases} y(x)=f(x)-1\\ p=-3/a_1\\ q=-3/a_2\\ b=-c_1 \\ c=-c_2 \end{cases}$$ So, the method of regression with four parameters $(p,q,b,c)$ can directly be applied. There is no need for the above method with five parameters.
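To make the four-parameter route concrete, here is a rough Python sketch of how I read the integral-equation idea. It is not a transcription of the formulas in the paper. It relies on the fact that $y=be^{px}+ce^{qx}$ satisfies $y''=(p+q)y'-pq\,y$, so that integrating twice gives $y=A\,SS+B\,S+Cx+D$ with $A=-pq$ and $B=p+q$. The helper names (`cumtrapz0`, `fit_two_exponentials`, `to_variogram_parameters`) and the synthetic test curve are my own assumptions for illustration.

```python
import numpy as np

def cumtrapz0(x, y):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

def fit_two_exponentials(x, y):
    """Non-iterative fit of y = b*exp(p*x) + c*exp(q*x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = cumtrapz0(x, y)    # approximates the integral of y
    SS = cumtrapz0(x, S)   # approximates the integral of S
    # First linear regression: y = A*SS + B*S + C*x + D.
    M = np.column_stack([SS, S, x, np.ones_like(x)])
    A, B, _, _ = np.linalg.lstsq(M, y, rcond=None)[0]
    # p and q are the roots of z**2 - B*z - A = 0.
    disc = B * B + 4.0 * A
    if disc < 0:
        raise ValueError("data not compatible with two real exponents")
    p = 0.5 * (B + np.sqrt(disc))
    q = 0.5 * (B - np.sqrt(disc))
    # Second linear regression: with p and q fixed, b and c enter linearly.
    E = np.column_stack([np.exp(p * x), np.exp(q * x)])
    b, c = np.linalg.lstsq(E, y, rcond=None)[0]
    return p, q, b, c

def to_variogram_parameters(p, q, b, c):
    """Invert the substitution p=-3/a1, q=-3/a2, b=-c1, c=-c2."""
    return -b, -c, -3.0 / p, -3.0 / q

# Synthetic, noise-free check on a dense grid (not the data of the question).
x = np.linspace(0.0, 60.0, 601)
f = 1.0 - 0.7 * np.exp(-3.0 * x / 20.0) - 0.3 * np.exp(-3.0 * x / 60.0)
p, q, b, c = fit_two_exponentials(x, f - 1.0)
c1, c2, a1, a2 = to_variogram_parameters(p, q, b, c)
print(c1, c2, a1, a2)
```

Which exponential ends up labelled 1 or 2 depends only on the ordering of the two roots, so the pairs $(c_1,a_1)$ and $(c_2,a_2)$ may come out swapped relative to the generating values. Nothing in this sketch enforces positivity of the parameters; that has to be checked afterwards.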

NUMERICAL EXAMPLE (From R.Chuck's original data)

As already pointed out, the number of points is too small, which leads to errors in the numerical integration (inaccurate values of $S_k$ and $SS_k$).

As a consequence, the numerical results below are far from accurate.
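The effect of the number of points on $S_k$ and $SS_k$ can be illustrated with a small sketch. I assume here that $S_k$ and $SS_k$ are running trapezoidal sums, and I use an arbitrary single-exponential test curve (not the data of the question) whose integral is known in closed form.

```python
import numpy as np

def cumulative_integrals(x, y):
    """Return (S, SS): running trapezoidal integrals of y and of S."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.zeros_like(x)
    SS = np.zeros_like(x)
    for k in range(1, len(x)):
        dx = x[k] - x[k - 1]
        S[k] = S[k - 1] + 0.5 * (y[k] + y[k - 1]) * dx
        SS[k] = SS[k - 1] + 0.5 * (S[k] + S[k - 1]) * dx
    return S, SS

def final_integral_error(n_points, a=25.0, x_max=60.0):
    """Absolute error of S at x_max for the test curve y = 1 - exp(-3x/a)."""
    x = np.linspace(0.0, x_max, n_points)
    y = 1.0 - np.exp(-3.0 * x / a)
    S, _ = cumulative_integrals(x, y)
    # Closed-form integral of the test curve from 0 to x_max.
    exact = x_max - (a / 3.0) * (1.0 - np.exp(-3.0 * x_max / a))
    return abs(S[-1] - exact)

err_coarse = final_integral_error(6)    # six points, as in the original data
err_fine = final_integral_error(601)    # a dense grid for comparison
print(err_coarse, err_fine)
```

With only six points the integral is off by a visible amount, and that error propagates into the regression coefficients.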

The fitting of the equation of the form (1) : $\quad f(x)=a+be^{px}+ce^{qx}\quad$ leads to the black curve.

The fitting of the equation of the form (2) : $\quad f(x)=1-c_1e^{px}-c_2e^{qx}\quad$ leads to the blue curve.

Obviously, this data set is not compatible with a good fit of an equation of the form (2).

enter image description here

enter image description here

COMMENTS ABOUT THE SECOND DATA SET :

Second data set (given by R. Chuck in the comments section) :

x: 0, 6.798406, 10.924855, 15.152776, 19.715873, 25.229183, 29.650875, 34.891332, 40.29349, 44.933608, 50.335293;

y: 0, 0.301604, 0.573718, 0.627697, 0.687598, 0.802262, 0.742347, 0.857322, 0.947088, 0.966117, 1.093539

This is a very useful example to understand where the difficulty arises.

In the next figure, the results of two regression calculations are shown :

Black curve : Five-parameter regression. The fit is satisfactory.

Blue curve : Four-parameter regression, with an imposed condition ($a=1$ instead of free $a$). The fit is very bad. So, what is the snag ?

enter image description here

The function to fit is : $\quad f(x)=1-c_1e^{-3x/a_1}-c_2e^{-3x/a_2}\quad$ with condition $a_1,a_2,c_1,c_2$ all positive.

This is the same as $\quad f(x)=a+be^{px}+ce^{qx}\quad$ with condition $p,q,b,c$ all negative and $a=1$.

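$f'(x)=pbe^{px}+qce^{qx}>0$ for any $x$ (the products $pb$ and $qc$ are positive, since $p,q,b,c$ are all negative), so $f(x)$ is an increasing function.

$f''(x)=p^2be^{px}+q^2ce^{qx}<0$ for any $x$ (since $b$ and $c$ are negative), so $f'(x)$ is a decreasing function.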

Thus, the function $f(x)$ is increasing more and more slowly.

If the overall shape of the "cloud" of points is of the same kind (increasing more and more slowly), all is for the best : the fit will probably be good and the condition fulfilled.

If not, the computed coefficients will probably not fulfill the condition.

That is what arises in the case of the given data : we observe that, for large $x$, $y$ tends to increase more quickly rather than more slowly. The shape of the cloud of points isn't compatible with the chosen function and/or the associated condition.

Then, there are two possibilities :

  • Choosing another kind of function and/or conditions. This is probably not what is wanted.

  • Treating the points that make the shape of the cloud incompatible as outliers and eliminating them. For example, after eliminating the last three points, the result is shown below. The result of the four-parameter fit is close to that of the five-parameter fit. With the four-parameter fit, all sign conditions on the parameters are fulfilled.

enter image description here

NOTE : It is possible that the points appearing as outliers are not really outliers but result from a large scatter. This can be overcome with a larger number of points. With a large scatter, a large number of points is necessary so that the overall shape of the cloud of points is representative.

Possibly, if there were many more points, the cloud of points might appear to increase more slowly on the right side. Then the condition would be satisfied. The present appearance can be an artefact of too large a scatter, and the difficulty encountered can be a consequence of it.

If a number of simulations confirm that a large scatter combined with a small number of points is the true cause of the problem, the results will not be reproducible from one data set to another. There is no miraculous solution, except reducing the scatter and/or increasing the number of points, if that is possible in practice.

$\endgroup$
10
  • $\begingroup$ Are you able to apply this to the user data set? $\endgroup$ Commented Apr 26, 2017 at 18:53
  • $\begingroup$ Definitely not. As pointed out in my comments, the method of regression with an integral equation is based on numerical integrations, which require more points to be accurate enough. Moreover, the specified relationship $c_0+c_1+c_2=1$ is in full contradiction with the data set. In fact, the method is not expected to solve the case of this particular data set. I suggest trying the method with other data, made of more points and compatible with the particular relationship. $\endgroup$ Commented Apr 26, 2017 at 20:07
  • $\begingroup$ As a solution to the updated problem I used a variant of the aforementioned method from the page 72 of fr.scribd.com/doc/14674814/Regressions-et-equations-integrales. Before doing that, we can bring the equation to the form: (1-f(x)) = c1*exp(-3*x/a1) + c2*exp(-3*x/a2). This implicitly accounts for the constraint for c1,c2, and c0 (that is removed from the equation) to sum up to one. The only problem that is left is a1 and a2 as well as c1 and c2 are not necessarily positive. $\endgroup$ Commented Apr 27, 2017 at 1:43
  • $\begingroup$ See my updated answer, taking account of the updated equation (1-f(x)) = c1*exp(-3*x/a1) + c2*exp(-3*x/a2). Moreover, you wrote : "The only problem that is left is a1 and a2 as well as c1 and c2 are not necessarily positive". For me, this wording of an additional condition is still ambiguous. In order to make it clear, instead of a verbal explanation, better edit a numerical example of data set (with at least 10 points) for which the mentioned problem of signs appears. $\endgroup$ Commented Apr 27, 2017 at 4:50
  • $\begingroup$ In fact, the fit (2) is almost exactly what I need even though it is not an exact fit. The function is supposed to be constantly growing and the data number 6 should be mostly neglected. I would like to confirm that you used the equations from the page 72 of fr.scribd.com/doc/14674814/Regressions-et-equations-integrales. When I am using these equations (p. 72) accounting for all the last corrections, I cannot get p = -0.016945 and q=-0.040794. I have figured out that the problem is in my first matrix. $\endgroup$ Commented Apr 27, 2017 at 9:15
0
$\begingroup$

To add to the comments of @JJacquelin, your data (below) is essentially increasing and you are trying to find a fit with exponentially decreasing functions. This will not end well.

data


As you had the foresight to post your data, we can bandy about some options.

Here is a quartic polynomial fit, and it describes the data quite well.

quartic fit


If you insist on an exponential because of your insight into the physics, consider a polynomial times an exponential. The simplest case, $$ f(x) = x e^{-x}, $$ follows.

poly exp

$\endgroup$
1
  • 1
    $\begingroup$ In fact, I do not contest the form of the chosen function. The sum of a constant and the two exponentials can be convenient for a good fitting insofar the parameters c1 and c2 can be negative and/or positive, leading to an increasing or decreasing function. My trouble comes from the additional request " c0, c1, and c2 to sum up to 1." Probably this will be clarified soon. By the way, your fit with a quartic polynomial appears rather good in the case of the given example. But I doubt that it will be convenient in other examples with more extended data on wider range. $\endgroup$ Commented Apr 26, 2017 at 7:51
