Suppose $m, n, p, n_c\in\mathbb{N}$ and $Y\in\mathbb{R}^{m\times p}$. Let $\mathcal{P}_{m, n, p} := \mathbb{R}^{m\times n}\times\mathbb{R}^{n\times p}\times\mathbb{R}^{m\times p}$ and let $g:\mathcal{P}_{m, n, p}\to\mathbb{R}^{n_c}$ be such that every component $g_j$ of $g$ is affine in its arguments. Suppose we want to solve the following optimization problem.
$$
\begin{aligned}
\min_{(U, V, Z)\in\mathcal{P}_{m, n, p}}\quad & \frac12\Vert UV - Y\Vert_\mathrm{F}^2\\
\mathrm{s.t.}\quad & Z = UV,\\
& g(U, V, Z) = 0.
\end{aligned}
$$
It is a simple exercise to write down the first-order conditions for a constrained local minimum, which form a system of polynomial equations. Are there any numerical methods to compute a constrained critical point with certainty? I have looked at Newton's method, but it is not guaranteed to converge. Any analytical insight is also welcome.
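For concreteness, here is one way those conditions can be written, at least formally (modulo constraint-qualification caveats). Write each affine component as $g_j(U, V, Z) = \langle A_j, U\rangle + \langle B_j, V\rangle + \langle C_j, Z\rangle + d_j$ with $\langle A, B\rangle = \operatorname{tr}(A^\top B)$, and introduce multipliers $\Lambda\in\mathbb{R}^{m\times p}$ for $Z = UV$ and $\mu\in\mathbb{R}^{n_c}$ for $g$; this notation is only for the sketch. Stationarity of the Lagrangian then reads
$$
\begin{aligned}
(UV - Y)V^\top - \Lambda V^\top + \textstyle\sum_j \mu_j A_j &= 0,\\
U^\top(UV - Y) - U^\top\Lambda + \textstyle\sum_j \mu_j B_j &= 0,\\
\Lambda + \textstyle\sum_j \mu_j C_j &= 0,\\
Z - UV &= 0,\\
g(U, V, Z) &= 0,
\end{aligned}
$$
which is polynomial in the unknowns $(U, V, Z, \Lambda, \mu)$.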
1 Answer
Let's forget for a moment that this is an optimization problem at all and instead assume that you're solving a big nonlinear system of equations $F(x) = 0$. If you introduce Lagrange multipliers for the constraints, set $x = (U, V, Z, \lambda)$, and take $F$ to be the derivative of the Lagrangian, then you get the desired form. As you said, the regular old Newton-Raphson algorithm $$x_{n + 1} = x_n - dF(x_n)^{-1}F(x_n)$$ converges quadratically to a true solution of $F(x) = 0$ provided that the initial guess is sufficiently close, but it can diverge if you start outside of the quadratic convergence basin.
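To make that concrete, here is a minimal sketch of the iteration in Python/NumPy. The names `F` and `jac` are placeholders for whatever residual and Jacobian you assemble from your Lagrangian; this is a sketch, not a drop-in solver for your specific problem.

```python
import numpy as np

def newton(F, jac, x0, tol=1e-10, max_iter=50):
    """Plain Newton-Raphson for F(x) = 0.  Quadratically convergent near a
    root, but may diverge from a poor initial guess x0."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        # Solve dF(x) v = -F(x) rather than forming the inverse explicitly.
        v = np.linalg.solve(jac(x), -Fx)
        x = x + v
    raise RuntimeError("Newton did not converge in max_iter iterations")
```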
There are several ways to fix up Newton's method to achieve global convergence. Arguably the simplest is the damped Newton method. First, compute the search direction $v_n$ by solving the linear system $$dF(x_n)v_n = -F(x_n).$$ Then compute the scalar $\alpha_n$ that minimizes the quantity $$f(\alpha) = \|F(x_n + \alpha\cdot v_n)\|^2.$$ There are other choices of norm or merit function you can use, but you get the idea. Provided that you can solve the 1D minimization problem accurately enough, the Newton line search method is guaranteed to converge from any initial guess (strictly speaking, to a stationary point of the merit function, which in practice is almost always an actual root). The Armijo-Wolfe conditions tell you how accurate is accurate enough.
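Here is a sketch of the damped iteration with a simple backtracking (Armijo) line search on the merit function $\tfrac12\|F(x)\|^2$, again using the placeholder `F` and `jac` from above. A real implementation would also guard against singular or ill-conditioned Jacobians; this is only meant to show the structure.

```python
import numpy as np

def damped_newton(F, jac, x0, tol=1e-10, max_iter=200, c=1e-4):
    """Newton's method globalized by a backtracking (Armijo) line search
    on the merit function f(x) = 0.5 * ||F(x)||^2.  Sketch only."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        J = jac(x)
        v = np.linalg.solve(J, -Fx)        # Newton search direction
        f0 = 0.5 * Fx @ Fx
        slope = (J.T @ Fx) @ v             # grad f . v, equals -||F(x)||^2 here
        alpha = 1.0
        while True:                        # backtrack until Armijo sufficient decrease
            F_trial = F(x + alpha * v)
            if 0.5 * F_trial @ F_trial <= f0 + c * alpha * slope or alpha < 1e-12:
                break
            alpha *= 0.5
        x = x + alpha * v
    raise RuntimeError("damped Newton did not converge in max_iter iterations")
```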
I mention line search methods because they achieve global convergence and they're easy to explain. I don't think they're "the best" and there are loads of other approaches worth knowing about. I've had the best luck with trust region methods. Nocedal and Wright is worth a read (or two).
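If you would rather not roll your own, you can hand the same residual to an off-the-shelf solver. A minimal sketch with SciPy's `least_squares`, whose default `method="trf"` is a trust-region algorithm for minimizing $\tfrac12\|F(x)\|^2$ (here `F`, `jac`, and `x0` are the same placeholders as above, with the residual flattened into a 1-D array):

```python
from scipy.optimize import least_squares

# F returns the stacked residual vector, jac its Jacobian; x0 is the initial guess.
sol = least_squares(F, x0, jac=jac, method="trf", xtol=1e-14, gtol=1e-14)
print(sol.x, sol.cost)   # sol.cost = 0.5 * ||F(sol.x)||^2
```

If the solver reports success with `sol.cost` near zero, `sol.x` is numerically a root of $F$; a nonzero cost means it only found a local minimizer of the merit function.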
Specifically for polynomial systems, there is a beautiful theory due to Smale (often called $\alpha$-theory) which gives sufficient conditions for knowing when you're in the convergence basin. The slides from this talk are a great intro.
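Roughly, and glossing over technicalities, one common form of the test uses only data at a single point: define
$$
\beta(F, x) = \left\|dF(x)^{-1}F(x)\right\|, \qquad
\gamma(F, x) = \sup_{k\ge 2}\left\|\frac{dF(x)^{-1}D^kF(x)}{k!}\right\|^{\frac{1}{k-1}}, \qquad
\alpha(F, x) = \beta(F, x)\,\gamma(F, x),
$$
and if $\alpha(F, x) < \frac{13 - 3\sqrt{17}}{4} \approx 0.1577$, then Newton's method started at $x$ converges quadratically to a true zero of $F$.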
You might also get more responses to optimization questions on the Computational Science Stack Exchange.