alpaqa 0.0.1
Nonconvex constrained optimization
Math

Augmented Lagrangian method

Definitions

The nonlinear program we wish to solve is of the form:

\[
\begin{aligned}
& \underset{x}{\text{minimize}} && f(x) && f : \mathbb{R}^n \to \mathbb{R} \\
& \text{subject to} && \underline{x} \le x \le \overline{x} \\
& && \underline{z} \le g(x) \le \overline{z} && g : \mathbb{R}^n \to \mathbb{R}^m
\end{aligned} \tag{P}
\]

Define the convex sets C and D as

\[
C \triangleq \{\, x \in \mathbb{R}^n \mid \underline{x} \le x \le \overline{x} \,\}, \qquad
D \triangleq \{\, z \in \mathbb{R}^m \mid \underline{z} \le z \le \overline{z} \,\}. \tag{1}
\]

These rectangular boxes can be decomposed as Cartesian products of 1-dimensional closed intervals:

\[
\begin{aligned}
C &= C_1 \times C_2 \times \dots \times C_n, && \text{where } C_i = [\underline{x}_i, \overline{x}_i], \\
D &= D_1 \times D_2 \times \dots \times D_m, && \text{where } D_i = [\underline{z}_i, \overline{z}_i]. 
\end{aligned} \tag{2}
\]
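Because of this Cartesian product structure, the Euclidean projection onto \(C\) or \(D\) decomposes into independent one-dimensional clamps. The following is a minimal sketch of this (the function name is illustrative, not part of alpaqa):

```python
import numpy as np

def project_box(v, lb, ub):
    """Euclidean projection onto a box [lb, ub]: an elementwise clamp,
    valid because the box is a Cartesian product of closed intervals."""
    return np.minimum(np.maximum(v, lb), ub)

# Project onto C = [-1, 1] x [0, 2]:
x = np.array([2.0, -0.5])
print(project_box(x, np.array([-1.0, 0.0]), np.array([1.0, 2.0])))  # [1. 0.]
```

Infinite bounds pose no difficulty: an interval side of \(\pm\infty\) simply leaves that coordinate unclamped.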

Using these definitions, problem (P) can equivalently be expressed as

\[
\begin{aligned}
& \underset{x \in C}{\text{minimize}} && f(x) \\
& \text{subject to} && g(x) \in D.
\end{aligned} \tag{3}
\]

After introducing a slack variable z, problem (3) can be stated as

\[
\begin{aligned}
& \underset{x \in C,\, z \in D}{\text{minimize}} && f(x) \\
& \text{subject to} && g(x) - z = 0.
\end{aligned} \tag{P-ALM}
\]

The Lagrangian function of problem (P-ALM) is given by:

\[
\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R} :
(x, z, y) \mapsto \mathcal{L}(x, z, y) \triangleq f(x) + \langle g(x) - z,\, y \rangle. \tag{4}
\]

The vector \(y \in \mathbb{R}^m\) is called the vector of Lagrange multipliers.

The augmented Lagrangian function with penalty factor Σ of the problem (P-ALM) is defined as the sum of the Lagrangian function and a quadratic term that penalizes the constraint violation:

\[
\mathcal{L}_\Sigma : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R} :
(x, z, y) \mapsto \mathcal{L}_\Sigma(x, z, y) \triangleq \mathcal{L}(x, z, y) + \tfrac{1}{2} \left\| g(x) - z \right\|_\Sigma^2, \tag{5}
\]

where \(\Sigma\) is a symmetric positive definite \(m \times m\) matrix that defines a norm on \(\mathbb{R}^m\), \(\| z \|_\Sigma^2 \triangleq z^\top \Sigma z\).
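To make definitions (4) and (5) concrete, here is a minimal sketch on a hypothetical toy problem (the quadratic \(f\), the linear \(g\), and all names are illustrative, not alpaqa code):

```python
import numpy as np

# Hypothetical toy problem: f(x) = 1/2 ||x||^2 and g(x) = A x.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
f = lambda x: 0.5 * x @ x
g = lambda x: A @ x

def lagrangian(x, z, y):
    # (4): L(x, z, y) = f(x) + <g(x) - z, y>
    return f(x) + (g(x) - z) @ y

def aug_lagrangian(x, z, y, Sigma):
    # (5): L_Sigma(x, z, y) = L(x, z, y) + 1/2 ||g(x) - z||_Sigma^2
    r = g(x) - z
    return lagrangian(x, z, y) + 0.5 * r @ Sigma @ r

print(aug_lagrangian(np.array([1.0, 1.0]), np.array([1.0, 1.0]),
                     np.array([1.0, 1.0]), 2.0 * np.eye(2)))  # 7.0
```

With \(\Sigma = \rho I\) this reduces to the classical augmented Lagrangian with a single scalar penalty parameter \(\rho\).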

The augmented Lagrangian method algorithm

The augmented Lagrangian method for solving problem (P-ALM) consists of the successive minimization of \(\mathcal{L}_\Sigma\) with respect to the decision variables \(x\) and the slack variables \(z\) (1), after which the Lagrange multipliers \(y\) are updated (2), and the penalty factors \(\Sigma_{ii}\) corresponding to constraints with a high violation are increased (3).

The augmented Lagrangian function is used as an exact penalty function for problem (P-ALM); the resulting method is equivalent to the shifted quadratic penalty method with shift \(\Sigma^{-1} y\).

1. Minimization of the augmented Lagrangian

Using some algebraic manipulations, the augmented Lagrangian defined in (5) can be expressed as

\[
\mathcal{L}_\Sigma(x, z, y) = f(x) + \tfrac{1}{2} \left\| g(x) - z + \Sigma^{-1} y \right\|_\Sigma^2 - \tfrac{1}{2} \left\| y \right\|_{\Sigma^{-1}}^2. \tag{6}
\]
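The equivalence of (5) and (6) follows from completing the square; it can be checked numerically at a random point (the toy \(f\), \(g\), and all names below are illustrative, not alpaqa code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((m, n))
f = lambda x: 0.5 * x @ x        # toy objective
g = lambda x: A @ x              # toy constraint function
Sigma = np.diag([2.0, 5.0])      # diagonal positive definite penalty matrix
Sigma_inv = np.diag(1.0 / np.diag(Sigma))

x = rng.standard_normal(n)
z = rng.standard_normal(m)
y = rng.standard_normal(m)

# (5): L(x, z, y) + 1/2 ||g(x) - z||_Sigma^2
r = g(x) - z
lhs = f(x) + r @ y + 0.5 * r @ Sigma @ r
# (6): f(x) + 1/2 ||g(x) - z + Sigma^{-1} y||_Sigma^2 - 1/2 ||y||_{Sigma^{-1}}^2
s = r + Sigma_inv @ y
rhs = f(x) + 0.5 * s @ Sigma @ s - 0.5 * y @ Sigma_inv @ y
print(np.isclose(lhs, rhs))  # True
```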

At each iteration ν of the ALM algorithm, the following minimization problem is solved:

\[
(x^\nu, z^\nu) = \underset{x \in C,\, z \in D}{\operatorname{argmin}}\ \mathcal{L}_{\Sigma^{\nu-1}}(x, z;\, y^{\nu-1}) \tag{7}
\]

2. Update of the Lagrange multipliers

The update of the Lagrange multipliers corrects the shift \(\Sigma^{-1} y\) in (6): if the constraint violation \(g(x^\nu) - z^\nu\) is positive, the shift is increased, in an attempt to drive the next iterate towards a smaller constraint violation. The following update rule formalizes that idea:

\[
y^\nu \leftarrow y^{\nu-1} + \Sigma^{\nu-1} \left( g(x^\nu) - z^\nu \right) \tag{8}
\]

When the constraint violation becomes zero, the Lagrange multipliers are no longer updated.

As the penalty factors \(\Sigma\) tend towards infinity, the shift \(\Sigma^{-1} y\) has to vanish, because in that case, the quadratic penalty method without shifts solves the problem exactly. For \(\Sigma^{-1} y\) to vanish, the Lagrange multipliers must be bounded, which is achieved by the following projection:

Let M>0 be some large but finite bound.

\[
\underline{y}_i \triangleq \begin{cases} 0 & \underline{z}_i = -\infty \\ -M & \text{otherwise,} \end{cases} \qquad
\overline{y}_i \triangleq \begin{cases} 0 & \overline{z}_i = +\infty \\ +M & \text{otherwise} \end{cases} \tag{9}
\]
\[
Y \triangleq [\underline{y}_1, \overline{y}_1] \times \dots \times [\underline{y}_m, \overline{y}_m] \tag{10}
\]

The result of (8) is therefore clamped as follows:

\[
y^\nu \leftarrow \Pi_Y\left( y^{\nu-1} + \Sigma^{\nu-1} \left( g(x^\nu) - z^\nu \right) \right) \tag{11}
\]
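A minimal sketch of the clamped update (11), assuming a diagonal \(\Sigma\) and constraints with finite lower and upper bounds (so both multiplier bounds equal \(\pm M\) by (9); the function name is illustrative):

```python
import numpy as np

def update_multipliers(y, Sigma, g_x, z, y_lb, y_ub):
    # (8): y <- y + Sigma (g(x) - z), then (11): project onto the box Y
    return np.clip(y + Sigma @ (g_x - z), y_lb, y_ub)

# Box Y from (9)-(10) with a large but finite bound M:
M = 1e9
y_lb, y_ub = np.array([-M, -M]), np.array([M, M])
y = update_multipliers(np.zeros(2), np.diag([10.0, 10.0]),
                       np.array([0.5, -0.2]), np.array([0.0, 0.0]),
                       y_lb, y_ub)
print(y)  # [ 5. -2.]
```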

3. Update of the penalty factors

When the penalty factor for the \(i\)-th constraint, \(\Sigma_{ii}\), is increased, minimizing the violation of this constraint becomes more important in (7). Therefore, if the constraint violation cannot be reduced by updating the shifts alone, the penalty factors are increased.

Selecting when and by how much each penalty factor should be increased is more of a heuristic. The strategy used here compares the violation at the current iterate with the violation at the previous iterate; it is the same strategy as used in QPALM. Denote the vector of constraint violations as \(e^\nu \triangleq g(x^\nu) - z^\nu\), and let \(\theta \in (0, 1)\).
If \(|e_i^\nu| \le \theta\, |e_i^{\nu-1}|\), meaning that the constraint violation has decreased by at least a factor \(\theta\) compared to the previous iteration, then the penalty factor is not updated.
If the constraint violation did not decrease sufficiently, then the penalty factor \(\Sigma_{ii}\) is increased by a factor

\[
\Delta \cdot \frac{|e_i^\nu|}{\left\| e^\nu \right\|_\infty}, \tag{12}
\]

where \(\Delta > 1\) is a tuning parameter. The violation of each individual constraint is scaled by the maximum violation of all constraints, such that the penalty factors of constraints with a large violation are increased more aggressively. If the factor in (12) is less than one, the penalty factor is not updated (otherwise it would result in a reduction of the penalty).
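The rule above can be sketched as follows, assuming a diagonal \(\Sigma\) stored as a vector; the function name and default values of \(\theta\) and \(\Delta\) are illustrative, not alpaqa's defaults:

```python
import numpy as np

def update_penalty(Sigma_diag, e, e_prev, theta=0.25, Delta=10.0):
    # (12): candidate factor Delta * |e_i| / ||e||_inf for each constraint
    candidate = Delta * np.abs(e) / np.linalg.norm(e, np.inf)
    # Increase only where the violation did not shrink by a factor theta,
    # and never decrease a penalty (factors below one are ignored).
    insufficient = np.abs(e) > theta * np.abs(e_prev)
    factor = np.where(insufficient, np.maximum(candidate, 1.0), 1.0)
    return Sigma_diag * factor

S = update_penalty(np.array([1.0, 1.0]),
                   e=np.array([1.0, 0.01]),
                   e_prev=np.array([1.0, 1.0]))
print(S)  # [10.  1.]
```

The first constraint's violation stagnated, so its penalty grows by the full factor \(\Delta\); the second decreased by more than the factor \(\theta\) and is left alone.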

PANOC

PANOC is an algorithm that solves optimization problems of the form:

\[
\underset{x}{\text{minimize}}\ \psi(x) + h(x), \tag{P-PANOC}
\]

where \(\psi : \mathbb{R}^n \to \mathbb{R}\) has a Lipschitz-continuous gradient, and \(h : \mathbb{R}^n \to \overline{\mathbb{R}}\) allows efficient computation of its proximal operator.

Recall the inner minimization problem (7) in the first step of the ALM algorithm. It can be simplified to:

\[
\begin{aligned}
\min_{x \in C,\, z \in D} \mathcal{L}_\Sigma(x, z, y)
&= -\tfrac{1}{2} \left\| y \right\|_{\Sigma^{-1}}^2 + \min_{x \in C} \left\{ f(x) + \min_{z \in D} \left\{ \tfrac{1}{2} \left\| z - \left( g(x) + \Sigma^{-1} y \right) \right\|_\Sigma^2 \right\} \right\} \\
&= -\tfrac{1}{2} \left\| y \right\|_{\Sigma^{-1}}^2 + \min_{x \in C} \underbrace{\left\{ f(x) + \tfrac{1}{2} \operatorname{dist}_\Sigma^2\left( g(x) + \Sigma^{-1} y,\ D \right) \right\}}_{\psi_\Sigma(x;\, y)}
\end{aligned} \tag{13}
\]

Within the PANOC algorithm, the parameters y and Σ remain constant, and will be omitted from the function names to ease notation:

\[
\psi(x) = f(x) + \tfrac{1}{2} \operatorname{dist}_\Sigma^2\left( g(x) + \Sigma^{-1} y,\ D \right) \tag{14}
\]

The inner problem in (13) has the same minimizers as the following problem that will be solved using the PANOC algorithm:

\[
\underset{x \in C}{\text{minimize}}\ \psi(x). \tag{15}
\]

This problem is an instance of problem (P-PANOC) where the nonsmooth term \(h\) is the indicator of the set \(C\), \(h(x) = \delta_C(x)\).
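Evaluating \(\psi\) from (14) is cheap when \(D\) is a box and \(\Sigma\) is diagonal, since the \(\Sigma\)-norm projection onto a box then coincides with an elementwise clamp. A minimal sketch under those assumptions (all names are illustrative):

```python
import numpy as np

def psi(x, f, g, y, Sigma_diag, z_lb, z_ub):
    """psi(x) = f(x) + 1/2 dist_Sigma^2(g(x) + Sigma^{-1} y, D), see (14).
    D is the box [z_lb, z_ub]; Sigma_diag holds the diagonal of Sigma."""
    zeta = g(x) + y / Sigma_diag          # shifted constraint value
    zhat = np.clip(zeta, z_lb, z_ub)      # Pi_D(zeta): projection onto D
    d = zeta - zhat                       # distance vector to D
    return f(x) + 0.5 * np.sum(Sigma_diag * d * d)

# Toy instance: f(x) = 1/2 x^2, g(x) = x, D = [-1, 1], y = 0, Sigma = I
val = psi(np.array([2.0]), lambda x: 0.5 * x @ x, lambda x: x,
          np.zeros(1), np.ones(1), -np.ones(1), np.ones(1))
print(val)  # 2.5
```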

Evaluation

The following is a list of symbols and formulas that are used in the implementation of the PANOC algorithm.

\[
\begin{aligned}
y &\in \mathbb{R}^m && \text{Current Lagrange multipliers} \\
\Sigma &\in \operatorname{diag}(\mathbb{R}^m_{>0}) && \text{Current penalty factors} \\
x^k &\in \mathbb{R}^n && \text{Current PANOC iterate} \\
\gamma^k &\in \mathbb{R}_{>0} && \text{Current proximal gradient step size} \\
\zeta^k &\triangleq g(x^k) + \Sigma^{-1} y && \text{Shifted constraint value} \\
\hat{z}^k &\triangleq \Pi_D\left( g(x^k) + \Sigma^{-1} y \right) = \Pi_D(\zeta^k) && \text{Closest feasible value for slack variable } z \\
d^k &\triangleq \zeta^k - \Pi_D(\zeta^k) = \zeta^k - \hat{z}^k && \text{How far the shifted constraint value } \zeta^k \text{ is from the feasible set} \\
e^k &\triangleq g(x^k) - \hat{z}^k && \text{Constraint violation} \\
\hat{y}^k &\triangleq \Sigma\, d^k = \Sigma\left( g(x^k) + \Sigma^{-1} y - \Pi_D\left( g(x^k) + \Sigma^{-1} y \right) \right) && \text{Candidate Lagrange multipliers, see (8)} \\
&= y + \Sigma\left( g(x^k) - \hat{z}^k \right) = y + \Sigma\, e^k \\
\psi(x^k) &= \mathcal{L}_\Sigma(x^k, \hat{z}^k, y) + \tfrac{1}{2} \left\| y \right\|_{\Sigma^{-1}}^2 && \text{PANOC objective function} \\
&= f(x^k) + \tfrac{1}{2} \operatorname{dist}_\Sigma^2\left( g(x^k) + \Sigma^{-1} y,\ D \right) \\
&= f(x^k) + \tfrac{1}{2} \left\| \left( g(x^k) + \Sigma^{-1} y \right) - \Pi_D\left( g(x^k) + \Sigma^{-1} y \right) \right\|_\Sigma^2 \\
&= f(x^k) + \tfrac{1}{2} \left\| \zeta^k - \hat{z}^k \right\|_\Sigma^2 \\
&= f(x^k) + \tfrac{1}{2} \langle d^k, \hat{y}^k \rangle \\
\nabla \psi(x^k) &= \nabla f(x^k) + \nabla g(x^k)^\top\, \Sigma\left( g(x^k) + \Sigma^{-1} y - \Pi_D\left( g(x^k) + \Sigma^{-1} y \right) \right) && \text{Gradient of the objective} \\
&= \nabla f(x^k) + \nabla g(x^k)^\top\, \Sigma\left( \zeta^k - \hat{z}^k \right) \\
&= \nabla f(x^k) + \nabla g(x^k)^\top\, \hat{y}^k \\
\hat{x}^k &\triangleq T_{\gamma^k}(x^k) = \Pi_C\left( x^k - \gamma^k \nabla \psi(x^k) \right) && \text{Next proximal gradient iterate} \\
p^k &\triangleq \hat{x}^k - x^k && \text{Proximal gradient step} \\
r^k &\triangleq \tfrac{1}{\gamma^k}\, p^k && \text{Fixed-point residual (FPR)} \\
\varphi_{\gamma^k}(x^k) &= \psi(x^k) + h(\hat{x}^k) + \tfrac{1}{2 \gamma^k} \left\| \hat{x}^k - x^k \right\|^2 + \nabla \psi(x^k)^\top \left( \hat{x}^k - x^k \right) && \text{Forward-backward envelope (FBE)} \\
&= \psi(x^k) + \tfrac{1}{2 \gamma^k} \left\| p^k \right\|^2 + \nabla \psi(x^k)^\top p^k \\
q^k &\triangleq H^k\, r^k && \text{Quasi-Newton step} \\
x^{k+1} &= x^k + (1 - \tau)\, p^k + \tau\, q^k && \text{Next PANOC iterate}
\end{aligned}
\]

Note that many of the intermediate values depend on the value of \(x^k\); it is sometimes easiest to define them as functions:

\[
\begin{aligned}
\zeta(x) &\triangleq g(x) + \Sigma^{-1} y \\
\hat{z}(x) &\triangleq \Pi_D\left( g(x) + \Sigma^{-1} y \right) = \Pi_D(\zeta(x)) \\
d(x) &\triangleq \zeta(x) - \Pi_D(\zeta(x)) = \zeta(x) - \hat{z}(x) \\
\hat{y}(x) &\triangleq \Sigma\, d(x) = \Sigma\left( g(x) + \Sigma^{-1} y - \Pi_D\left( g(x) + \Sigma^{-1} y \right) \right) \\
e(x) &\triangleq g(x) - \hat{z}(x)
\end{aligned}
\]
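These functions, and the gradient formula above, can be checked on a hypothetical toy problem (a quadratic \(f\) and linear \(g\), so \(\nabla f(x) = x\) and \(\nabla g(x) = A\); all names are illustrative, not alpaqa code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((m, n))
Sigma = np.diag([1.0, 4.0, 9.0])
Sigma_inv = np.diag(1.0 / np.diag(Sigma))
y = rng.standard_normal(m)
z_lb, z_ub = -np.ones(m), np.ones(m)           # the box D

g = lambda x: A @ x
zeta = lambda x: g(x) + Sigma_inv @ y          # shifted constraint value
zhat = lambda x: np.clip(zeta(x), z_lb, z_ub)  # Pi_D(zeta(x))
d = lambda x: zeta(x) - zhat(x)
yhat = lambda x: Sigma @ d(x)                  # candidate multipliers
e = lambda x: g(x) - zhat(x)                   # constraint violation

psi = lambda x: 0.5 * x @ x + 0.5 * d(x) @ Sigma @ d(x)

def grad_psi(x):
    # grad psi(x) = grad f(x) + grad g(x)^T yhat(x)
    return x + A.T @ yhat(x)

# Central finite differences reproduce grad_psi away from the kinks of Pi_D:
x0 = rng.standard_normal(n)
h = 1e-6
I = np.eye(n)
fd = np.array([(psi(x0 + h * I[i]) - psi(x0 - h * I[i])) / (2 * h)
               for i in range(n)])
print(np.allclose(fd, grad_psi(x0), atol=1e-5))  # True
```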

The result of the PANOC algorithm is the triple \(\left( \hat{x}^k,\ \hat{y}(\hat{x}^k),\ \hat{z}(\hat{x}^k) \right)\).

The following graph visualizes the dependencies between the different values used in a PANOC iteration.