CyqloneSolver#
-
template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
struct CyqloneSolver Linear solver for systems with optimal control structure.
- Template Parameters:
VL – Vector length.
T – Scalar type.
DefaultOrder – Storage order for the matrix workspaces (row/column major).
Ctx – Parallel execution context type, see parallel::Context.
Parallelization and vectorization
-
using tricyqle_t = TricyqleSolver<VL, T, DefaultOrder, Ctx>
Tricyqle solver type for solving block-tridiagonal systems in parallel.
-
using Context = tricyqle_t::Context
-
using SharedContext = tricyqle_t::SharedContext
-
using simd = tricyqle_t::simd
-
const index_t p
Number of processors/threads.
-
const index_t n = (N_horiz + p * v - 1) / (p * v)
Number of stages per thread per vector lane (rounded up).
-
static index_t v = VL
Vector length.
-
inline index_t lv() const
log₂(v), logarithm of the vector length.
-
inline index_t lp() const
log₂(p), logarithm of the number of processors/threads, rounded up.
-
inline index_t ceil_p() const
The number of processors p rounded up to the next power of two.
-
inline index_t ceil_P() const
The number of parallel execution units P rounded up to the next power of two.
-
inline std::unique_ptr<SharedContext> create_parallel_context() const
-
inline void foreach_stage(Context &ctx, auto &&func, auto&&... xs) const
Call a function for each stage in the horizon, passing the stage index, the data batch index, and optionally the corresponding batches of the given arrays.
Iterates backwards in time (decreasing stage index j).
-
inline void foreach_stage_fwd(Context &ctx, auto &&func, auto&&... xs) const
Call a function for each stage in the horizon, passing the stage index, the data batch index, and optionally the corresponding batches of the given arrays.
Iterates forward in time (increasing stage index j).
Matrix data structures
-
template<StorageOrder O = column_major>
using matrix = typename tricyqle_t::template matrix<O> Owning type for a batch of matrices (with batch size v).
-
template<StorageOrder O = column_major>
using view = typename tricyqle_t::template view<O> Non-owning immutable view type for matrix.
-
template<StorageOrder O = column_major>
using mut_view = typename tricyqle_t::template mut_view<O> Non-owning mutable view type for matrix.
-
using layer_stride = typename tricyqle_t::layer_stride
-
template<StorageOrder O = column_major>
using batch_view = typename tricyqle_t::template batch_view<O> Non-owning immutable view type for a single batch of v matrices.
-
template<StorageOrder O = column_major>
using mut_batch_view = typename tricyqle_t::template mut_batch_view<O> Non-owning mutable view type for a single batch of v matrices.
-
static auto default_order = tricyqle_t::default_order
Default storage order for most matrices.
-
static auto column_major = tricyqle_t::column_major
Column-major storage order for column vectors and update matrices.
Problem dimensions
-
const index_t N_horiz
Horizon length of the optimal control problem.
-
const index_t nx
Number of states of the OCP.
-
const index_t nu
Number of controls of the OCP.
-
const index_t ny
Number of general constraints of the OCP per stage.
-
const index_t ny_0
Number of general constraints at stage 0, D(0) u(0).
-
const index_t ny_N
Number of general constraints at the final stage, C(N) x(N).
-
inline index_t num_variables() const
Get the total number of primal variables in the OCP.
Note
The actual number of variable stored in Cyqlone’s internal data structures may be larger.
-
inline index_t num_dynamics_constraints() const
Get the total number of dynamics constraints in the OCP.
Note
The actual number of constraints stored in Cyqlone’s internal data structures may be larger.
-
inline index_t num_general_constraints() const
Get the total number of general constraints in the OCP.
Note
The actual number of constraints stored in Cyqlone’s internal data structures may be larger.
Solver parameters
-
CyqloneParams<value_type> params = {}
Solver parameters and settings.
-
inline CyqloneParams<value_type> get_params() const
Get the current Cyqlone solver parameters.
-
inline void update_params(const CyqloneParams<value_type> &new_params)
Update the Cyqlone solver parameters.
-
inline TricyqleParams<value_type> get_tricyqle_params() const
Get the current Tricyqle solver parameters.
-
inline void update_tricyqle_params(const TricyqleParams<value_type> &new_params)
Update the Tricyqle solver parameters.
-
inline std::string get_params_string() const
Get a string representation of the main solver parameters. Used mainly for file names.
Tricyqle solver for block-tridiagonal systems
- tricyqle_t tricyqle = {.block_size =nx,.max_rank = std::max(ny, ny_0 + ny_N) * N_horiz,.p = p,}
Block-tridiagonal solver (CR/PCR/PCG).
OCP data (reordered for use during the Cyqlone algorithm)
-
matrix<default_order> data_H =
[this] {returnmatrix<default_order>{{.depth = ceil_N(), .rows = nu + nx, .cols = nu + nx}};}() Stage-wise Hessian blocks H(j) = [ R(j) S(j); S(j)ᵀ Q(j) ] of the OCP cost function.
-
matrix<default_order> data_F =
[this] {returnmatrix<default_order>{{.depth = ceil_N(), .rows = nx, .cols = nu + nx}};}() Stage-wise dynamics matrices F(j) = [ B(j) A(j) ] of the OCP.
- matrix< default_order > data_Gᵀ = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<default_order>{{.depth = ceil_N(), .rows = nu + nx, .cols = nyM}};}()
Stage-wise constraint Jacobians G(j)ᵀ = [ D(j) C(j) ]ᵀ of the OCP.
Modified Riccati data structures
-
matrix<default_order> riccati_LH =
[this] {returnmatrix<default_order>{{.depth = p * v, .rows = nu + nx, .cols = n * (nu + nx)}};}() Cholesky factors of the Hessian blocks for the Riccati recursion.
LH(j) = [ LR(j) 0; LS(j) LQ(j) ]
-
matrix<default_order> riccati_LAB =
[this] {returnmatrix<default_order>{{.depth = p * v, .rows = nx, .cols = n * nx + n * nu}};}() Storage for the matrices LB(j), Acl(j) and LA(j₁) for the Riccati recursion.
Grouped per thread, with layout [ Acl(jₙ) … Acl(j₂) LA(j₁) | LB(jₙ) … LB(j₁) ], so that LA(j₁) and LB(j) are contiguous (useful when evaluating the Schur complement).
-
matrix<default_order> riccati_V =
[this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<default_order>{{.depth = p * v, .rows = nu + nx, .cols = nx+ nyM}};}() Temporary storage for the V(j) = [ B(j)ᵀ LQ(j); A(j)ᵀ LQ(j) ] matrices during the Riccati recursion.
The workspace is wider than just V to also accommodate the active constraint Jacobians, since both are used to update the Hessian blocks during the Riccati recursion.
-
matrix<column_major> riccati_work =
[this] {returnmatrix<column_major>{{.depth = p * v, .rows = nx, .cols = 1}};}() Temporary workspace for the Riccati solve phase.
Riccati factorization update data structures
- matrix< column_major > work_Σ = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = n* nyM, .cols = 1}};}()
Compressed representation of the nonzero diagonal elements of the matrix Σ, populated for each thread separately during the factorization update of the Riccati recursion.
- matrix< column_major > riccati_Υ1 = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = nu + nx + nx, .cols = n* nyM}};}()
Workspace to store the update matrices Υu, Υx, Υλ, Φu, Φx and Φλ during the factorization update of the Riccati recursion.
Both riccati_Υ1 and riccati_Υ2 are used alternately.
- matrix< column_major > riccati_Υ2 = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = nu + nx + nx, .cols = n* nyM}};}()
Alternate workspace to riccati_Υ1.
Indexing utilities
-
inline index_t ceil_N() const
Horizon length, rounded up to a multiple of the number of parallel execution units.
- index_t ν2 (index_t i) const
2-adic valuation ν₂.
- index_t ν2p (index_t i) const
2-adic valuation modulo p, i.e.
ν2p(0) = ν2p(p) = lp().
-
index_t add_wrap_ceil_N(index_t a, index_t b) const
Add
btoamodulo N_horiz.
-
index_t sub_wrap_ceil_N(index_t a, index_t b) const
Subtract
bfromamodulo N_horiz.
-
index_t add_wrap_p(index_t a, index_t b) const
Add
btoamodulo p.
-
index_t sub_wrap_p(index_t a, index_t b) const
Subtract
bfromamodulo p.
-
index_t add_wrap_ceil_p(index_t a, index_t b) const
Add
btoamodulo ceil_p().
-
index_t sub_wrap_ceil_p(index_t a, index_t b) const
Subtract
bfromamodulo ceil_p().
-
index_t sub_wrap_ceil_P(index_t a, index_t b) const
-
index_t add_wrap_ceil_P(index_t a, index_t b) const
-
index_t get_linear_batch_offset(index_t biA) const
Packing and unpacking of OCP data to Cyqlone storage format
-
void update_data(const CyqloneStorage<value_type> &ocp)
Update the internal data structures to reflect changes in the OCP data (without changing the problem size).
-
void initialize_rhs(const CyqloneStorage<value_type> &ocp, mut_view<> rhs) const
Initialize the right-hand side vector for the dynamics constraints of the OCP, using the custom Cyqlone storage format.
-
matrix initialize_rhs(const CyqloneStorage<value_type> &ocp) const
Initialize the right-hand side vector for the dynamics constraints of the OCP, using the custom Cyqlone storage format.
-
void initialize_gradient(const CyqloneStorage<value_type> &ocp, mut_view<> grad) const
Initialize the gradient vector for the OCP cost function, using the custom Cyqlone storage format.
-
matrix initialize_gradient(const CyqloneStorage<value_type> &ocp) const
Initialize the gradient vector for the OCP cost function, using the custom Cyqlone storage format.
-
void initialize_bounds(const CyqloneStorage<value_type> &ocp, mut_view<> b_min, mut_view<> b_max) const
Initialize the lower and upper bounds for the general constraints of the OCP, using the custom Cyqlone storage format.
-
std::pair<matrix<>, matrix<>> initialize_bounds(const CyqloneStorage<value_type> &ocp) const
Initialize the lower and upper bounds for the general constraints of the OCP, using the custom Cyqlone storage format.
-
void pack_variables(std::span<const value_type> ux_lin, mut_view<> ux) const
-
matrix pack_variables(std::span<const value_type> ux_lin) const
-
void unpack_variables(view<> ux, std::span<value_type> ux_lin) const
-
std::vector<value_type> unpack_variables(view<> ux) const
- void pack_dynamics (std::span< const value_type > λ_lin, mut_view<> λ) const
- matrix pack_dynamics (std::span< const value_type > λ_lin) const
- void unpack_dynamics (view<> λ, std::span< value_type > λ_lin) const
- std::vector< value_type > unpack_dynamics (view<> λ) const
-
void pack_constraints(std::span<const value_type> y_lin, mut_view<> y, value_type fill = 0) const
-
matrix pack_constraints(std::span<const value_type> y_lin, value_type fill = 0) const
-
void unpack_constraints(view<> y, std::span<value_type> y_lin) const
-
std::vector<value_type> unpack_constraints(view<> y) const
-
matrix initialize_variables() const
Get a zero-initialized matrix for the primal variables u and x.
-
matrix initialize_dynamics_constraints() const
Get a zero-initialized matrix for the dynamics constraints (or their multipliers).
-
matrix initialize_general_constraints() const
Get a zero-initialized matrix for the general constraints (or their multipliers).
-
static CyqloneSolver build(const CyqloneStorage<value_type> &ocp, index_t p)
Initialize a Cyqlone solver for the given OCP.
Note: constraints on u(0) and x(N) should be independent.
nx nu ocp.CD(0) = [ 0 | D ] ny₀ [ 0 | 0 ] ny - ny₀Since ocp.D(0) and ocp.C(N) will be merged, the top ny₀ rows of ocp.C(N) should be zero.
OCP cost gradient and constraints evaluation
-
void residual_dynamics_constr(Context &ctx, view<> x, view<> b, mut_view<> Mxb) const
Compute Mx + b, where M is the dynamics constraint Jacobian matrix of the OCP.
In other words, evaluate the residuals for all stages, i.e., \( A_j x^j + B_j u^j - x^{j+1} + b^j \).
- void transposed_dynamics_constr (Context &ctx, view<> λ, mut_view<> Mᵀλ, bool accum=false) const
Compute Mᵀλ, where M is the dynamics constraint Jacobian matrix of the OCP.
Optionally add the result to the existing contents of Mᵀλ by setting
accumto true.
-
void general_constr(Context &ctx, view<> ux, mut_view<> DCux) const
Compute the general constraints Gx, where G is the general constraint Jacobian matrix of the OCP.
In other words, evaluate the constraints for all stages, i.e., \( C_j x^j + D_j u^j \).
- void transposed_general_constr (Context &ctx, view<> y, mut_view<> DCᵀy) const
Compute Gᵀy, where G is the general constraint Jacobian matrix of the OCP.
- void transposed_general_constr (view<> y, mut_view<> DCᵀy) const
Compute Gᵀy, where G is the general constraint Jacobian matrix of the OCP.
- void cost_gradient (Context &ctx, view<> ux, value_type α, view<> q, value_type β, mut_view<> grad_f) const
Compute the cost gradient, with optional scaling factors.
grad_f ← Q ux + α q + β grad_f
- void cost_gradient_regularized (Context &ctx, value_type γ, view<> ux, view<> ux0, view<> q, mut_view<> grad_f) const
Compute the regularized cost gradient, with regularization parameter γ⁻¹, with respect to the point
ux0.
- void cost_gradient_remove_regularization (Context &ctx, value_type γ, view<> x, view<> x0, mut_view<> grad_f) const
Subtract the regularization term from the cost gradient.
Factorization and solve routines
- void factor_solve (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)
Compute the Cyqlone factorization of the KKT matrix of the OCP and perform a forward solve (fused for improved locality).
See also
- Parameters:
ctx – Parallel context.
γ – Reciprocal primal regularization.
Σ – ALM penalty factors.
ux – [inout] Negative augmented Lagrangian gradient on entry; solution of forward solve on exit.
λ – [inout] Constant term of the dynamics constraints on entry; solution of forward solve on exit. To obtain the solution of the KKT system, a reverse solve with the same factorization must be performed afterwards.
- void factor (Context &ctx, value_type γ, view<> Σ)
Compute the Cyqlone factorization of the KKT matrix of the OCP.
See also
- void solve_forward (Context &ctx, mut_view<> ux, mut_view<> λ)
Perform a forward solve with the Cyqlone factorization.
See also
- void solve_reverse (Context &ctx, mut_view<> ux, mut_view<> λ)
Perform a reverse solve with the Cyqlone factorization.
[Cyqlone solve CR serial]
- Parameters:
ctx – Parallel context.
ux – On entry, the result of the forward solve; on exit the solution of the primal variables of the KKT system.
λ – On entry, the result of the forward solve; on exit the solution of the dual variables corresponding to the dynamics constraints of the KKT system.
- void solve_reverse_mul (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> Mᵀλ)
Fused variant of solve_reverse and transposed_dynamics_constr (for improved locality of the dynamics Jacobians).
- void update (Context &ctx, view<> ΔΣ)
Perform factorization updates of the Cyqlone factorization as described by Algorithm 4 in the paper.
[Cyqlone update]
- Parameters:
ctx – Parallel context.
ΔΣ – Changes to the ALM penalty factors Σ.
- void update_solve (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)
Fused variant of update and solve_forward.
Low-level factorization and forward solve routines
-
inline index_t riccati_thread_assignment(Context &ctx) const
- template<bool Factor = true, bool Solve = true> void factor_riccati_solve (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)
[Modified Riccati factorization and fused forward solve]
- template<bool Factor = true, bool Solve = true> void compute_schur (Context &ctx, mut_view<> ux, mut_view<> λ)
[Cyqlone compute Schur]
Low-level factorization and forward solve routines for parallel cyclic reduction
- template<bool Factor = true, bool Solve = true> void factor_solve_impl (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)
[Cyqlone factorization and fused forward solve]
Low-level reverse solve routines
- void solve_riccati_reverse (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> work, std::optional< mut_view<> > Mᵀλ) const
[Modified Riccati factorization and fused forward solve]
- void solve_reverse (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> work, std::optional< mut_view<> > Mᵀλ=std::nullopt) const
Low-level factorization update routines
- template<bool Solve = true> void update_riccati_solve (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)
Update the modified Riccati factorization of a single block column as described by Algorithm 3 in the paper.
[Cyqlone update CR]
[Cyqlone update Riccati]
- inline void update_riccati (Context &ctx, view<> ΔΣ)
- template<bool Solve = true> void update_solve_impl (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)
[PCR update]
[Cyqlone update]
Build sparse representations for debugging and testing
- SparseMatrix build_sparse (const CyqloneStorage< value_type > &ocp, std::span< const value_type > Σ) const
-
std::vector<value_type> build_rhs(view<> rq, view<> b, value_type scale_rq = -1, value_type scale_b = -1) const
- inline std::vector< value_type > build_sol (view<> ux, view<> λ) const
-
SparseMatrix build_sparse_factor() const
-
SparseMatrix build_sparse_diag() const
Public Types
-
using value_type = T