
I have a Gaussian process regression model of the cost of a certain process. Once the model is trained, I want to find the point $x$ at which the regression predicts the lowest cost.

Simply choosing the point with the lowest expected value $\mu(x)$ does not seem right, because if that point has high uncertainty (i.e. a large predictive standard deviation $\sigma(x)$), we might be better off choosing a point that is slightly suboptimal but has lower uncertainty.

Edit: This is different from a Bayesian optimization routine because I am not looking to acquire new points to update the model; I am looking to draw inference from the GP in its current state.

I want to improve my chances of choosing the optimal point, so that when I run my process with those parameters, the cost really is the minimum.

A simple solution might be to consider a linear combination of the predicted mean ($\mu$) and standard deviation ($\sigma$) instead of the mean alone:

$\alpha(x) = \mu(x) + \lambda \sigma(x)$

So now I am finding the optimal point $x$ with respect to $\alpha$ rather than $\mu$.
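For what it's worth, here is a minimal sketch of this selection rule (my own illustration; the toy data, RBF kernel, and 1-D candidate grid are placeholder assumptions, and scikit-learn's `alpha` argument is its observation-noise term, unrelated to the $\alpha$ above):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in for the real cost data (assumption: 1-D input on [0, 10]).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(30, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(30)

# Note: sklearn's `alpha` is the observation-noise level, not the alpha above.
gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(X_train, y_train)

# Risk-adjusted selection: minimize mu(x) + lambda * sigma(x) over a grid.
lam = 1.0  # lambda > 0 penalizes uncertain candidates
X_cand = np.linspace(0, 10, 1000).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)
x_star = X_cand[np.argmin(mu + lam * sigma)]
print(x_star)
```

For higher-dimensional $x$ one would replace the grid with a continuous optimizer (e.g. multi-start L-BFGS on the predicted surface).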

I am trying to see if anybody else has been able to solve this problem in a different way. I was unable to find any similar work online.

  • What’s the “problem” that you’re trying to solve? What information are you trying to learn from your GP? Commented Jan 10, 2024 at 1:37
  • I am sorry, @Sycorax, for putting together such an uninformative question. I wrote it at a time when my thinking was clouded by my own frustration. I have edited my question to be more insightful. To answer your question, the "problem" I am trying to solve and the information I am trying to learn from the GP are the same: what point $x$ yields the smallest predicted value. Commented Jan 10, 2024 at 5:58
  • The terms of art you are looking for are “acquisition function” and “bayesian-optimization”. Commented Jan 10, 2024 at 8:45
  • The acquisition function helps me acquire more points that I can run to make my surrogate model more accurate. That is not what I am trying to do. I am trying to draw an inference from whatever state the model is currently in. Commented Jan 10, 2024 at 18:08
  • I don't think you appreciate how an acquisition function relates to the problem you've described. If you select a new point (perhaps using $\alpha(x)$), it could be better or worse than the best point you've visited so far. Likewise, if you select the best point you've visited so far, it could be worse than some point you haven't visited. There's no escape from this dilemma. If we knew how to pick the $x$ that corresponded to the global minimum of our function, we'd do that instead of BO. If you write down a utility function for selecting a point to visit, an acquisition function falls out. Commented Jan 10, 2024 at 18:45

4 Answers


In general I agree with Sycorax. A different point of view: you are trying to optimize a two-objective problem (minimum mean, minimum variance), which inherently calls for multi-objective optimization to find the set of Pareto-optimal (mutually incomparable) solutions. Scalarization (your $\alpha$) is one route, but varying $\lambda$ will only trace out the entire Pareto front if the front is convex. Otherwise you may need, e.g., evolutionary algorithms to sample from the Pareto front; see the Wikipedia article on multi-objective optimization.

https://en.m.wikipedia.org/wiki/Multi-objective_optimization
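To make the Pareto view concrete, here is a brute-force sketch (my own illustration, not part of the original answer) that keeps the non-dominated (mean, standard deviation) candidates; the `mu` and `sigma` arrays are assumed to come from something like `gp.predict(X_cand, return_std=True)`:

```python
import numpy as np

def pareto_front(mu, sigma):
    """Return indices of candidates not dominated in (mu, sigma):
    no other candidate has both a lower mean and a lower std. dev."""
    keep = []
    for i in range(len(mu)):
        dominated = np.any(
            (mu <= mu[i]) & (sigma <= sigma[i])
            & ((mu < mu[i]) | (sigma < sigma[i]))
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Example with dummy predictions; in practice use the GP's output:
mu = np.array([1.0, 1.2, 0.8, 1.5])
sigma = np.array([0.3, 0.4, 0.5, 0.05])
print(pareto_front(mu, sigma))  # -> [0 2 3]; candidate 1 is dominated by candidate 0
```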

  • And the purpose of an acquisition function is to create a principled way to govern the trade-off between $\mu(x)$ and $\sigma(x)$: a low $\mu(x)$ with a small $\sigma(x)$ could improve on our current best point, or a large $\sigma(x)$ with a (somewhat) higher $\mu(x)$ could improve upon it. Commented Jan 11, 2024 at 16:55
  • Exactly. I just want to be explicit: there are also ways other than scalarization to find Pareto-optimal points (especially when we know that the set of optimal solutions is non-convex); see Wikipedia. Commented Jan 11, 2024 at 17:35
  • I see - yes, I agree. Commented Jan 11, 2024 at 17:41

The "linear combination" approach that you mention is equivalent to choosing the point $x$ which minimizes a lower quantile of the response distribution. This is known as the upper confidence bound (UCB) acquisition function (@Sycorax). This is a sensible approach in many applications. In particular, the solution

$$x_\star = \arg\max_x \mu(x) + \left(-\Phi^{-1}(q)\right)\sigma(x)$$

minimizes the $q$-quantile of the response, where $\Phi$ is the standard normal CDF.
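A minimal sketch of this quantile rule (the grid and dummy predictions below are placeholders; in practice `mu` and `sigma` come from the fitted GP):

```python
import numpy as np
from scipy.stats import norm

# Placeholder predictions; in practice: mu, sigma = gp.predict(X_cand, return_std=True)
X_cand = np.linspace(0, 10, 1000).reshape(-1, 1)
mu = np.sin(X_cand).ravel()
sigma = 0.1 + 0.05 * X_cand.ravel()

q = 0.9                 # risk-averse: minimize the 0.9-quantile of the cost
lam = norm.ppf(q)       # Phi^{-1}(q), positive for q > 1/2
x_star = X_cand[np.argmin(mu + lam * sigma)]
print(x_star)
```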

  • Indeed, this is the UCB acquisition function. Commented Jan 11, 2024 at 5:44

You are still doing Bayesian optimization, just with a single additional sample. Acquisition functions such as expected improvement, probability of improvement, and the lower confidence bound are still the right way to go.

If you can verify the accuracy of your GP model, for example by calculating the leave-one-out error, you can go straight for the minimizer of $\mu(x)$ when the model is very accurate. Otherwise, use an acquisition function.
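For instance, a leave-one-out check along these lines could be run with scikit-learn (a sketch on toy data; the kernel and noise level are assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2)

# Each y[i] is predicted from a GP fit on the other n-1 points.
y_loo = cross_val_predict(gp, X, y, cv=LeaveOneOut())
rmse = np.sqrt(np.mean((y - y_loo) ** 2))
print(f"LOO RMSE: {rmse:.3f}")  # small vs. the cost scale -> trust mu(x) directly
```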


Consider estimation with a standard loss function

Rather than reinventing the wheel here, you might wish to consider undertaking estimation based on a standard loss function such as squared-error loss. By minimising expected loss under a loss function of this kind, your method would implicitly take account of both the expected value and the variance of the estimator.
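As one concrete reading of this (my illustration, assuming a non-negative cost and a Gaussian predictive distribution $Y(x) \sim \mathcal{N}(\mu(x), \sigma^2(x))$): if the loss of realizing cost $Y(x)$ is squared, the expected loss is

$$\mathbb{E}\left[Y(x)^2\right] = \mu(x)^2 + \sigma(x)^2,$$

so minimizing it trades off the predicted mean against its uncertainty automatically.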
