Compact quasi-Newton representation
The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation of the direct and/or inverse Hessian, or of the Jacobian of a nonlinear system. Because of this, the compact representation is often used in large-scale problems and constrained optimization.
Definition
The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f(x)$
expresses a sequence of recursive rank-1
or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix.
Because it is derived from quasi-Newton updates,
it uses the differences of iterates and gradients

$s_i = x_{i+1} - x_i, \quad y_i = \nabla f(x_{i+1}) - \nabla f(x_i), \quad 0 \le i \le k-1,$

in its definition. In particular, for $r = k$ or $r = 2k$, rectangular matrices $U_k, J_k \in \mathbb{R}^{n \times r}$ and small square symmetric systems $M_k, N_k \in \mathbb{R}^{r \times r}$ that depend on the $s_i$'s and $y_i$'s define the quasi-Newton representations

$H_k = H_0 + U_k M_k U_k^T, \qquad B_k = B_0 + J_k N_k J_k^T.$
Applications
Because of this special matrix decomposition, the compact representation is implemented in state-of-the-art optimization software.
When combined with limited-memory techniques, it is a popular tool for constrained optimization with gradients.
Linear algebra operations such as matrix-vector products, solves, or eigendecompositions can be done efficiently. The representation can be combined
with line-search and trust-region techniques, and has been developed for many quasi-Newton
updates. For instance, the matrix-vector product with the direct quasi-Newton Hessian and an arbitrary
vector $g \in \mathbb{R}^n$ is

$B_k g = B_0 g + J_k \left( N_k \left( J_k^T g \right) \right),$

where $J_k$ and $N_k$ are the rectangular factor and small square matrix of the representation $B_k = B_0 + J_k N_k J_k^T$; the $n \times n$ matrix $B_k$ is never formed explicitly.
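As an illustration, such a matrix-vector product can be sketched in Python with NumPy. The data below are stand-ins for stored iterate and gradient differences, the initialization is $B_0 = \gamma I$, and the factors follow the direct BFGS compact form (the variable names are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 1000, 5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)        # SPD model Hessian, so s_i^T y_i > 0
S = rng.standard_normal((n, k))    # stored iterate differences s_i
Y = A @ S                          # stored gradient differences y_i
g = rng.standard_normal(n)
gamma = 1.0                        # B0 = gamma * I

# Small 2k x 2k middle matrix and n x 2k factor of the direct BFGS representation.
SY = S.T @ Y
M = np.block([[gamma * (S.T @ S), np.tril(SY, -1)],
              [np.tril(SY, -1).T, -np.diag(np.diag(SY))]])
J = np.hstack([gamma * S, Y])

# B_k g = B0 g - J (M^{-1} (J^T g)): O(nk) work; B_k itself is never formed.
Bg = gamma * g - J @ np.linalg.solve(M, J.T @ g)
```

The dominant cost is two products with the tall skinny factor $J$ plus one small $2k \times 2k$ solve.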
Background
In the context of the GMRES method, Walker showed that a product of Householder transformations can be expressed as
a compact matrix formula. This led to the derivation
of an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices.
Specifically, for

$S_k = \begin{bmatrix} s_0 & \cdots & s_{k-1} \end{bmatrix}, \quad Y_k = \begin{bmatrix} y_0 & \cdots & y_{k-1} \end{bmatrix}$

and

$V_i = I - \frac{1}{y_i^T s_i} y_i s_i^T,$

when $R_k = \mathrm{triu}\left( S_k^T Y_k \right)$ is invertible,
the product of $k$ rank-1 updates to the identity is

$V_0 V_1 \cdots V_{k-1} = I - Y_k R_k^{-1} S_k^T.$
The BFGS update can be expressed in terms of products of the $V_i$'s, which
have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix
representations.
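This identity can be checked numerically. The sketch below, assuming the Byrd–Nocedal–Schnabel form of the lemma with $R_k$ the upper triangular part of $S_k^T Y_k$, compares the explicit product of rank-1 factors with the compact expression:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4

# Quadratic model: y_i = A s_i guarantees positive curvature s_i^T y_i > 0.
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = A @ S

# Left side: the product V_0 V_1 ... V_{k-1}, with V_i = I - y_i s_i^T / (s_i^T y_i).
prod = np.eye(n)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    prod = prod @ (np.eye(n) - np.outer(y, s) / (s @ y))

# Right side: the compact formula I - Y R^{-1} S^T with R = triu(S^T Y).
R = np.triu(S.T @ Y)
compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)

print(np.allclose(prod, compact))  # True
```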
Recursive quasi-Newton updates
A parametric family of quasi-Newton updates includes many of the best-known formulas. For arbitrary vectors $v_k$ and $w_k$ such that $v_k^T y_k \ne 0$ and $w_k^T s_k \ne 0$,
general recursive update formulas for the inverse and direct Hessian
estimates are

$H_{k+1} = H_k + \frac{\left(s_k - H_k y_k\right) v_k^T + v_k \left(s_k - H_k y_k\right)^T}{v_k^T y_k} - \frac{y_k^T \left(s_k - H_k y_k\right)}{\left(v_k^T y_k\right)^2} v_k v_k^T,$

$B_{k+1} = B_k + \frac{\left(y_k - B_k s_k\right) w_k^T + w_k \left(y_k - B_k s_k\right)^T}{w_k^T s_k} - \frac{s_k^T \left(y_k - B_k s_k\right)}{\left(w_k^T s_k\right)^2} w_k w_k^T.$

By making specific choices for the parameter vectors $v_k$ and $w_k$, well-known
methods are recovered:
| Parameter choice | Inverse update | Parameter choice | Direct update |
|---|---|---|---|
| $v_k = s_k$ | BFGS | $w_k = s_k$ | PSB |
| | | $w_k = y_k$ | DFP |
| $v_k = s_k - H_k y_k$ | SR1 | $w_k = y_k - B_k s_k$ | SR1 |
| | | | MSS |
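As a consistency check on the parametric family, the sketch below (assuming the symmetric rank-2 secant form given above; the helper name is illustrative) verifies that the choice $v_k = s_k$ reproduces the inverse BFGS update:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s                            # positive curvature: s^T y > 0
H = np.linalg.inv(A + np.eye(n))     # any symmetric starting estimate

def family_update(H, s, y, v):
    """General symmetric rank-2 secant update with parameter vector v (v^T y != 0)."""
    r = s - H @ y                    # residual of the secant condition H y = s
    vy = v @ y
    return (H + (np.outer(r, v) + np.outer(v, r)) / vy
              - (y @ r) / vy**2 * np.outer(v, v))

# Choosing v = s recovers the inverse BFGS update.
rho = 1.0 / (s @ y)
V = np.eye(n) - rho * np.outer(s, y)
H_bfgs = V @ H @ V.T + rho * np.outer(s, s)

print(np.allclose(family_update(H, s, y, s), H_bfgs))  # True
```

Every member of the family satisfies the secant condition $H_{k+1} y_k = s_k$ by construction, as can be confirmed by multiplying the updated matrix with $y$.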
Compact Representations
Collecting the updating vectors of the recursive formulas into matrices $S_k = \begin{bmatrix} s_0 & \cdots & s_{k-1} \end{bmatrix}$ and $Y_k = \begin{bmatrix} y_0 & \cdots & y_{k-1} \end{bmatrix}$, define the upper triangular matrix

$R_k = \mathrm{triu}\left( S_k^T Y_k \right), \quad \left( R_k \right)_{ij} = s_{i-1}^T y_{j-1} \text{ for } i \le j,$

the strictly lower triangular matrix

$L_k = S_k^T Y_k - R_k,$

and the diagonal matrix

$D_k = \mathrm{diag}\left( s_0^T y_0, \ldots, s_{k-1}^T y_{k-1} \right).$

With these definitions, compact representations of the general rank-2 updates parametrized by $v_k$ and $w_k$
have been developed in Brust, expressing the inverse Hessian estimate in the form $H_k = H_0 + U_k M_k U_k^T$, and the formula for the direct Hessian in the form $B_k = B_0 + J_k N_k J_k^T$, with the factors determined by the parameter vectors.
and the formula for the direct Hessian is
For instance, when $v_k = s_k$ the representation becomes

$H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-T} \left( D_k + Y_k^T H_0 Y_k \right) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix},$

which is the compact formula for the BFGS recursion.
Specific Representations
Prior to the development of the general compact representations above, equivalent representations had been discovered for most known updates.
BFGS
Along with the SR1 representation, the BFGS compact representation was the first compact formula known. In particular, the inverse representation is given by

$H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-T} \left( D_k + Y_k^T H_0 Y_k \right) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}.$

The direct Hessian approximation can be found by applying the Sherman–Morrison–Woodbury identity to the inverse Hessian:

$B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix} \begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}^{-1} \begin{bmatrix} S_k^T B_0 \\ Y_k^T \end{bmatrix}.$
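The equivalence of the recursion and the compact formula can be illustrated numerically. The sketch below builds the direct Hessian both ways on a convex quadratic, where positive curvature $s_i^T y_i > 0$ is guaranteed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 4
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)    # SPD model Hessian
S = rng.standard_normal((n, k))
Y = A @ S                      # s_i^T y_i > 0 for all i
B0 = np.eye(n)

# Recursive direct BFGS updates.
B = B0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Compact representation (Byrd, Nocedal and Schnabel form).
SY = S.T @ Y
D = np.diag(np.diag(SY))       # diagonal of S^T Y
L = np.tril(SY, -1)            # strictly lower triangular part
M = np.block([[S.T @ B0 @ S, L], [L.T, -D]])
J = np.hstack([B0 @ S, Y])
B_compact = B0 - J @ np.linalg.solve(M, J.T)

print(np.allclose(B, B_compact))  # True
```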
SR1
The SR1 compact representation was first proposed alongside the BFGS representation. Using the definitions of $S_k$, $Y_k$, $R_k$ and $D_k$ from above, the inverse Hessian formula is given by

$H_k = H_0 + \left( S_k - H_0 Y_k \right) \left( R_k + R_k^T - D_k - Y_k^T H_0 Y_k \right)^{-1} \left( S_k - H_0 Y_k \right)^T.$

The direct Hessian is obtained by the Sherman–Morrison–Woodbury identity and has the form

$B_k = B_0 + \left( Y_k - B_0 S_k \right) \left( D_k + L_k + L_k^T - S_k^T B_0 S_k \right)^{-1} \left( Y_k - B_0 S_k \right)^T.$
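A similar check applies to SR1. The sketch below compares the recursive direct updates with the compact formula, assuming data for which all update denominators are nonzero:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 4
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = A @ S
B0 = np.eye(n)

# Recursive SR1 updates on the direct Hessian.
B = B0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    r = y - B @ s                  # rank-1 direction; requires r^T s != 0
    B = B + np.outer(r, r) / (r @ s)

# Compact representation with the middle matrix D + L + L^T - S^T B0 S.
SY = S.T @ Y
D = np.diag(np.diag(SY))
L = np.tril(SY, -1)
Psi = Y - B0 @ S
middle = D + L + L.T - S.T @ B0 @ S
B_compact = B0 + Psi @ np.linalg.solve(middle, Psi.T)

print(np.allclose(B, B_compact))  # True
```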
MSS
The multipoint symmetric secant (MSS) method is a method that aims to satisfy multiple secant equations simultaneously. The recursive update formula was originally developed by Burdakov, and a compact representation for the direct Hessian was subsequently derived.
Another equivalent compact representation for the MSS matrix is obtained by rewriting the update
in terms of the matrices $S_k$ and $Y_k$.
The inverse representation can be obtained by application of the Sherman–Morrison–Woodbury identity.
DFP
Since the DFP update is the dual of the BFGS formula, obtained by interchanging the roles of the $s_i$'s and $y_i$'s, the compact representation for DFP can be immediately obtained from the one for BFGS.
PSB
The PSB compact representation was developed for the direct Hessian approximation. It is equivalent to substituting $w_k = s_k$ in the general representation of the direct Hessian.
Structured BFGS
For structured optimization problems, in which the objective function can be decomposed into two parts, where the gradient and Hessian of
one part are known but only the gradient of the other part is known, structured BFGS formulas
exist. The compact representation of these methods has the same general form as the compact BFGS representation,
with specific choices of the rectangular factor and the small square middle matrix.
Reduced BFGS
The reduced compact representation (RCR) of BFGS applies to linear equality constrained optimization, in which the constraint system $Ax = b$ is underdetermined. In addition to the matrices $S_k$ and $Y_k$,
the RCR also stores the projections of the update vectors onto the nullspace of the constraint matrix $A$.
For the compact representation of the BFGS matrix, the corresponding block of the inverse KKT matrix also admits a compact representation.
Limited Memory
The most common use of the compact representations is for the limited-memory setting, where $m \ll n$ denotes the memory parameter, with typical values around $5 \le m \le 10$. Then, instead
of storing the history of all vectors, one limits this to the $m$ most recent vectors

$\left\{ s_i \right\}_{i=k-m}^{k-1} \quad \text{and} \quad \left\{ y_i \right\}_{i=k-m}^{k-1},$

and possibly small inner-product matrices such as $S_k^T Y_k$ or $Y_k^T Y_k$.
Further, the initialization is typically chosen as an adaptive multiple of the identity,

$H_0^{(k)} = \gamma_k I, \quad \gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}},$

with $\gamma_k > 0$ and $B_0^{(k)} = \frac{1}{\gamma_k} I$. Limited-memory methods are frequently used for
large-scale problems with many variables, in which the limited-memory matrices
$S_k$ and $Y_k$ are tall and very skinny:
$S_k, Y_k \in \mathbb{R}^{n \times m}$ with $n \gg m$.
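A limited-memory implementation then keeps only a sliding window of the most recent pairs and applies the compact inverse formula with the adaptive scaling $\gamma_k$. The following sketch (the function name and structure are illustrative, not from a specific library) computes $H_k g$ from the $m$ most recent pairs:

```python
import numpy as np
from collections import deque

def lbfgs_inverse_matvec(s_hist, y_hist, g):
    """Compute H_k g via the compact inverse BFGS representation with H0 = gamma*I."""
    S = np.column_stack(s_hist)
    Y = np.column_stack(y_hist)
    s, y = s_hist[-1], y_hist[-1]
    gamma = (s @ y) / (y @ y)          # adaptive scaling of the initial matrix
    SY = S.T @ Y
    R = np.triu(SY)                    # upper triangular part of S^T Y
    D = np.diag(np.diag(SY))
    Rinv_Stg = np.linalg.solve(R, S.T @ g)
    # H_k g = gamma*g + S R^{-T}[(D + gamma Y^T Y) R^{-1} S^T g - gamma Y^T g]
    #         - gamma Y R^{-1} S^T g
    middle = np.linalg.solve(R.T, (D + gamma * (Y.T @ Y)) @ Rinv_Stg - gamma * (Y.T @ g))
    return gamma * g + S @ middle - gamma * (Y @ Rinv_Stg)

# Sliding window keeping at most m = 5 of the most recent pairs.
rng = np.random.default_rng(5)
n, m = 50, 5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)            # SPD model Hessian
s_hist, y_hist = deque(maxlen=m), deque(maxlen=m)
for _ in range(8):                     # older pairs are discarded automatically
    s = rng.standard_normal(n)
    s_hist.append(s)
    y_hist.append(A @ s)
g = rng.standard_normal(n)
Hg = lbfgs_inverse_matvec(list(s_hist), list(y_hist), g)
```

Only the two $n \times m$ histories and a few $m \times m$ matrices are stored, so the cost per product is $O(nm + m^2)$ rather than $O(n^2)$.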
Implementations
Implementations in optimization software include:
- ACM TOMS Algorithm 1030, which implements an L-SR1 solver
- R's general-purpose optimizer routine optim, which uses the L-BFGS-B method
- SciPy's optimization module, whose minimize method includes an option to use L-BFGS-B
- IPOPT, when used with first-order (gradient) information
- The Artelys Knitro nonlinear programming solvers, which use compact quasi-Newton matrices
- L-BFGS-B