The nonlinear program we wish to solve is of the form:
Define the convex sets and as
These rectangular boxes can be decomposed as Cartesian products of 1-dimensional closed intervals:
Using these definitions, problem can equivalently be expressed as
After introduction of a slack variable , problem can be stated as
The Lagrangian function of problem is given by:
The vector is called the vector of Lagrange multipliers.
The augmented Lagrangian function with penalty factor of the problem is defined as the sum of the Lagrangian function and a quadratic term that penalizes the constraint violation:
where is a symmetric positive definite matrix that defines a norm on , .
The augmented Lagrangian method algorithm
The augmented Lagrangian method for solving Problem consists of the successive minimization of with respect to the decision variables and the slack variables (1), after which the Lagrange multipliers are updated (2), and the penalty factors corresponding to constraints with high violation are increased (3).
The augmented Lagrangian function is used as an exact penalty function for problem , it is equivalent to the shifted quadratic penalty method with shift .
1. Minimization of the augmented Lagrangian
Using some algebraic manipulations, the augmented Lagrangian defined in can be expressed as
At each iteration of the ALM algorithm, the following minimization problem is solved:
2. Update of the Lagrange multipliers
The update of the Lagrange multipliers corrects the shift in : if the constraint violation is positive, the shift is increased, in an attempt to drive the next iterate towards a smaller constraint violation . The following update rule formalizes that idea:
When the constraint violation becomes zero, the Lagrange multipliers are no longer updated.
As the penalty factors tend towards infinity, the shift has to vanish, because in that case, the quadratic penalty method without shifts solves the problem exactly. For to vanish, the Lagrange multipliers must be bounded, which is achieved by the following projection:
Let be some large but finite bound.
The result of is therefore clamped as follows:
3. Update of the penalty factors
When the penalty factor for the -th constraint, is increased, minimizing the violation of this constraint becomes more important in . Therefore, if the constraint violation cannot be reduced by updating the shifts alone, the penalty factors are increased.
Selecting when and by how much each penalty factor should be increased is more of a heuristic. The strategy used here is to compare the violation at the current iterate with the violation at the previous iterate, it is the same strategy as used in QPALM. Denote the vector of constraint violations as . Let .
If , meaning that the constraint violation has decreased by at least a factor compared to the previous iteration, then the penalty factor is not updated.
If the constraint violation did not decrease sufficiently, then the penalty factor is increased by a factor
where is a tuning parameter. The violation of each individual constraint is scaled by the maximum violation of all constraints, such that the penalty factors of constraints with a large violation are increased more aggressively. If the factor in is less than one, the penalty factor is not updated (otherwise it would result in a reduction of the penalty).
PANOC
PANOC is an algorithm that solves optimization problems of the form:
where has Lipschitz gradient, and allows efficient computation of the proximal operator.
Recall the inner minimization problem in the first step of the ALM algorithm. It can be simplified to:
Within the PANOC algorithm, the parameters and remain constant, and will be omitted from the function names to ease notation:
The inner problem in has the same minimizers as the following problem that will be solved using the PANOC algorithm:
This problem is an instance of problem where the nonsmooth term is the indicator of the set , .
Evaluation
The following is a list of symbols and formulas that are used in the implementation of the PANOC algorithm.
Note that many of the intermediate values depend on the value of , it is sometimes easiest to define them as functions:
The result of the PANOC algorithm is the triple .
The following graph visualizes the dependencies between the different values used in a PANOC iteration.