CyqloneSolver

CyqloneSolver#

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
struct CyqloneSolver

Linear solver for systems with optimal control structure.

Template Parameters:
  • VL – Vector length.

  • T – Scalar type.

  • DefaultOrder – Storage order for the matrix workspaces (row/column major).

  • Ctx – Parallel execution context type, see parallel::Context.

Parallelization and vectorization

using tricyqle_t = TricyqleSolver<VL, T, DefaultOrder, Ctx>

Tricyqle solver type for solving block-tridiagonal systems in parallel.

using Context = tricyqle_t::Context
using SharedContext = tricyqle_t::SharedContext
using simd = tricyqle_t::simd
const index_t p

Number of processors/threads.

const index_t n = (N_horiz + p * v - 1) / (p * v)

Number of stages per thread per vector lane (rounded up).

static index_t v = VL

Vector length.

inline index_t lv() const

log₂(v), logarithm of the vector length.

inline index_t lp() const

log₂(p), logarithm of the number of processors/threads, rounded up.

inline index_t ceil_p() const

The number of processors p rounded up to the next power of two.

inline index_t ceil_P() const

The number of parallel execution units P rounded up to the next power of two.

inline std::unique_ptr<SharedContext> create_parallel_context() const
inline void foreach_stage(Context &ctx, auto &&func, auto&&... xs) const

Call a function for each stage in the horizon, passing the stage index, the data batch index, and optionally the corresponding batches of the given arrays.

Iterates backwards in time (decreasing stage index j).

inline void foreach_stage_fwd(Context &ctx, auto &&func, auto&&... xs) const

Call a function for each stage in the horizon, passing the stage index, the data batch index, and optionally the corresponding batches of the given arrays.

Iterates forward in time (increasing stage index j).

Matrix data structures

template<StorageOrder O = column_major>
using matrix = typename tricyqle_t::template matrix<O>

Owning type for a batch of matrices (with batch size v).

template<StorageOrder O = column_major>
using view = typename tricyqle_t::template view<O>

Non-owning immutable view type for matrix.

template<StorageOrder O = column_major>
using mut_view = typename tricyqle_t::template mut_view<O>

Non-owning mutable view type for matrix.

using layer_stride = typename tricyqle_t::layer_stride
template<StorageOrder O = column_major>
using batch_view = typename tricyqle_t::template batch_view<O>

Non-owning immutable view type for a single batch of v matrices.

template<StorageOrder O = column_major>
using mut_batch_view = typename tricyqle_t::template mut_batch_view<O>

Non-owning mutable view type for a single batch of v matrices.

static auto default_order = tricyqle_t::default_order

Default storage order for most matrices.

static auto column_major = tricyqle_t::column_major

Column-major storage order for column vectors and update matrices.

Problem dimensions

const index_t N_horiz

Horizon length of the optimal control problem.

const index_t nx

Number of states of the OCP.

const index_t nu

Number of controls of the OCP.

const index_t ny

Number of general constraints of the OCP per stage.

const index_t ny_0

Number of general constraints at stage 0, D(0) u(0).

const index_t ny_N

Number of general constraints at the final stage, C(N) x(N).

inline index_t num_variables() const

Get the total number of primal variables in the OCP.

Note

The actual number of variable stored in Cyqlone’s internal data structures may be larger.

inline index_t num_dynamics_constraints() const

Get the total number of dynamics constraints in the OCP.

Note

The actual number of constraints stored in Cyqlone’s internal data structures may be larger.

inline index_t num_general_constraints() const

Get the total number of general constraints in the OCP.

Note

The actual number of constraints stored in Cyqlone’s internal data structures may be larger.

Solver parameters

CyqloneParams<value_type> params = {}

Solver parameters and settings.

inline CyqloneParams<value_type> get_params() const

Get the current Cyqlone solver parameters.

inline void update_params(const CyqloneParams<value_type> &new_params)

Update the Cyqlone solver parameters.

inline TricyqleParams<value_type> get_tricyqle_params() const

Get the current Tricyqle solver parameters.

inline void update_tricyqle_params(const TricyqleParams<value_type> &new_params)

Update the Tricyqle solver parameters.

inline std::string get_params_string() const

Get a string representation of the main solver parameters. Used mainly for file names.

Tricyqle solver for block-tridiagonal systems

tricyqle_t tricyqle  = {.block_size =nx,.max_rank   = std::max(ny, ny_0 + ny_N) * N_horiz,.p          = p,}

Block-tridiagonal solver (CR/PCR/PCG).

OCP data (reordered for use during the Cyqlone algorithm)

matrix<default_order> data_H = [this] {returnmatrix<default_order>{{.depth = ceil_N(), .rows = nu + nx, .cols = nu + nx}};}()

Stage-wise Hessian blocks H(j) = [ R(j) S(j); S(j)ᵀ Q(j) ] of the OCP cost function.

matrix<default_order> data_F = [this] {returnmatrix<default_order>{{.depth = ceil_N(), .rows = nx, .cols = nu + nx}};}()

Stage-wise dynamics matrices F(j) = [ B(j) A(j) ] of the OCP.

matrix< default_order > data_Gᵀ  = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<default_order>{{.depth = ceil_N(), .rows = nu + nx, .cols = nyM}};}()

Stage-wise constraint Jacobians G(j)ᵀ = [ D(j) C(j) ]ᵀ of the OCP.

Modified Riccati data structures

matrix<default_order> riccati_LH = [this] {returnmatrix<default_order>{{.depth = p * v, .rows = nu + nx, .cols = n * (nu + nx)}};}()

Cholesky factors of the Hessian blocks for the Riccati recursion.

LH(j) = [ LR(j) 0; LS(j) LQ(j) ]

matrix<default_order> riccati_LAB = [this] {returnmatrix<default_order>{{.depth = p * v, .rows = nx, .cols = n * nx + n * nu}};}()

Storage for the matrices LB(j), Acl(j) and LA(j₁) for the Riccati recursion.

Grouped per thread, with layout [ Acl(jₙ) … Acl(j₂) LA(j₁) | LB(jₙ) … LB(j₁) ], so that LA(j₁) and LB(j) are contiguous (useful when evaluating the Schur complement).

matrix<default_order> riccati_V = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<default_order>{{.depth = p * v, .rows = nu + nx, .cols = nx+ nyM}};}()

Temporary storage for the V(j) = [ B(j)ᵀ LQ(j); A(j)ᵀ LQ(j) ] matrices during the Riccati recursion.

The workspace is wider than just V to also accommodate the active constraint Jacobians, since both are used to update the Hessian blocks during the Riccati recursion.

matrix<column_major> riccati_work = [this] {returnmatrix<column_major>{{.depth = p * v, .rows = nx, .cols = 1}};}()

Temporary workspace for the Riccati solve phase.

Riccati factorization update data structures

matrix< column_major > work_Σ  = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = n* nyM, .cols = 1}};}()

Compressed representation of the nonzero diagonal elements of the matrix Σ, populated for each thread separately during the factorization update of the Riccati recursion.

matrix< column_major > riccati_Υ1  = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = nu + nx + nx, .cols = n* nyM}};}()

Workspace to store the update matrices Υu, Υx, Υλ, Φu, Φx and Φλ during the factorization update of the Riccati recursion.

Both riccati_Υ1 and riccati_Υ2 are used alternately.

matrix< column_major > riccati_Υ2  = [this] {const auto nyM = std::max(ny, ny_0 + ny_N);returnmatrix<column_major>{{.depth = p * v, .rows = nu + nx + nx, .cols = n* nyM}};}()

Alternate workspace to riccati_Υ1.

Indexing utilities

inline index_t ceil_N() const

Horizon length, rounded up to a multiple of the number of parallel execution units.

index_t ν2 (index_t i) const

2-adic valuation ν₂.

index_t ν2p (index_t i) const

2-adic valuation modulo p, i.e. ν2p(0) = ν2p(p) = lp().

index_t add_wrap_ceil_N(index_t a, index_t b) const

Add b to a modulo N_horiz.

index_t sub_wrap_ceil_N(index_t a, index_t b) const

Subtract b from a modulo N_horiz.

index_t add_wrap_p(index_t a, index_t b) const

Add b to a modulo p.

index_t sub_wrap_p(index_t a, index_t b) const

Subtract b from a modulo p.

index_t add_wrap_ceil_p(index_t a, index_t b) const

Add b to a modulo ceil_p().

index_t sub_wrap_ceil_p(index_t a, index_t b) const

Subtract b from a modulo ceil_p().

index_t sub_wrap_ceil_P(index_t a, index_t b) const
index_t add_wrap_ceil_P(index_t a, index_t b) const
index_t get_linear_batch_offset(index_t biA) const

Packing and unpacking of OCP data to Cyqlone storage format

void update_data(const CyqloneStorage<value_type> &ocp)

Update the internal data structures to reflect changes in the OCP data (without changing the problem size).

void initialize_rhs(const CyqloneStorage<value_type> &ocp, mut_view<> rhs) const

Initialize the right-hand side vector for the dynamics constraints of the OCP, using the custom Cyqlone storage format.

matrix initialize_rhs(const CyqloneStorage<value_type> &ocp) const

Initialize the right-hand side vector for the dynamics constraints of the OCP, using the custom Cyqlone storage format.

void initialize_gradient(const CyqloneStorage<value_type> &ocp, mut_view<> grad) const

Initialize the gradient vector for the OCP cost function, using the custom Cyqlone storage format.

matrix initialize_gradient(const CyqloneStorage<value_type> &ocp) const

Initialize the gradient vector for the OCP cost function, using the custom Cyqlone storage format.

void initialize_bounds(const CyqloneStorage<value_type> &ocp, mut_view<> b_min, mut_view<> b_max) const

Initialize the lower and upper bounds for the general constraints of the OCP, using the custom Cyqlone storage format.

std::pair<matrix<>, matrix<>> initialize_bounds(const CyqloneStorage<value_type> &ocp) const

Initialize the lower and upper bounds for the general constraints of the OCP, using the custom Cyqlone storage format.

void pack_variables(std::span<const value_type> ux_lin, mut_view<> ux) const
matrix pack_variables(std::span<const value_type> ux_lin) const
void unpack_variables(view<> ux, std::span<value_type> ux_lin) const
std::vector<value_type> unpack_variables(view<> ux) const
void pack_dynamics (std::span< const value_type > λ_lin, mut_view<> λ) const
matrix pack_dynamics (std::span< const value_type > λ_lin) const
void unpack_dynamics (view<> λ, std::span< value_type > λ_lin) const
std::vector< value_type > unpack_dynamics (view<> λ) const
void pack_constraints(std::span<const value_type> y_lin, mut_view<> y, value_type fill = 0) const
matrix pack_constraints(std::span<const value_type> y_lin, value_type fill = 0) const
void unpack_constraints(view<> y, std::span<value_type> y_lin) const
std::vector<value_type> unpack_constraints(view<> y) const
matrix initialize_variables() const

Get a zero-initialized matrix for the primal variables u and x.

matrix initialize_dynamics_constraints() const

Get a zero-initialized matrix for the dynamics constraints (or their multipliers).

matrix initialize_general_constraints() const

Get a zero-initialized matrix for the general constraints (or their multipliers).

static CyqloneSolver build(const CyqloneStorage<value_type> &ocp, index_t p)

Initialize a Cyqlone solver for the given OCP.

Note: constraints on u(0) and x(N) should be independent.

              nx  nu
ocp.CD(0) = [ 0 | D ] ny₀
            [ 0 | 0 ] ny - ny₀

Since ocp.D(0) and ocp.C(N) will be merged, the top ny₀ rows of ocp.C(N) should be zero.

OCP cost gradient and constraints evaluation

void residual_dynamics_constr(Context &ctx, view<> x, view<> b, mut_view<> Mxb) const

Compute Mx + b, where M is the dynamics constraint Jacobian matrix of the OCP.

In other words, evaluate the residuals for all stages, i.e., \( A_j x^j + B_j u^j - x^{j+1} + b^j \).

void transposed_dynamics_constr (Context &ctx, view<> λ, mut_view<> Mᵀλ, bool accum=false) const

Compute Mᵀλ, where M is the dynamics constraint Jacobian matrix of the OCP.

Optionally add the result to the existing contents of Mᵀλ by setting accum to true.

void general_constr(Context &ctx, view<> ux, mut_view<> DCux) const

Compute the general constraints Gx, where G is the general constraint Jacobian matrix of the OCP.

In other words, evaluate the constraints for all stages, i.e., \( C_j x^j + D_j u^j \).

void transposed_general_constr (Context &ctx, view<> y, mut_view<> DCᵀy) const

Compute Gᵀy, where G is the general constraint Jacobian matrix of the OCP.

void transposed_general_constr (view<> y, mut_view<> DCᵀy) const

Compute Gᵀy, where G is the general constraint Jacobian matrix of the OCP.

void cost_gradient (Context &ctx, view<> ux, value_type α, view<> q, value_type β, mut_view<> grad_f) const

Compute the cost gradient, with optional scaling factors.

grad_f ← Q ux + α q + β grad_f

void cost_gradient_regularized (Context &ctx, value_type γ, view<> ux, view<> ux0, view<> q, mut_view<> grad_f) const

Compute the regularized cost gradient, with regularization parameter γ⁻¹, with respect to the point ux0.

void cost_gradient_remove_regularization (Context &ctx, value_type γ, view<> x, view<> x0, mut_view<> grad_f) const

Subtract the regularization term from the cost gradient.

Factorization and solve routines

void factor_solve (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)

Compute the Cyqlone factorization of the KKT matrix of the OCP and perform a forward solve (fused for improved locality).

See also

solve_reverse

Parameters:
  • ctx – Parallel context.

  • γ – Reciprocal primal regularization.

  • Σ – ALM penalty factors.

  • ux[inout] Negative augmented Lagrangian gradient on entry; solution of forward solve on exit.

  • λ[inout] Constant term of the dynamics constraints on entry; solution of forward solve on exit. To obtain the solution of the KKT system, a reverse solve with the same factorization must be performed afterwards.

void factor (Context &ctx, value_type γ, view<> Σ)

Compute the Cyqlone factorization of the KKT matrix of the OCP.

See also

factor_solve

void solve_forward (Context &ctx, mut_view<> ux, mut_view<> λ)

Perform a forward solve with the Cyqlone factorization.

See also

factor_solve

void solve_reverse (Context &ctx, mut_view<> ux, mut_view<> λ)

Perform a reverse solve with the Cyqlone factorization.

[Cyqlone solve CR serial]

Parameters:
  • ctx – Parallel context.

  • ux – On entry, the result of the forward solve; on exit the solution of the primal variables of the KKT system.

  • λ – On entry, the result of the forward solve; on exit the solution of the dual variables corresponding to the dynamics constraints of the KKT system.

void solve_reverse_mul (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> Mᵀλ)

Fused variant of solve_reverse and transposed_dynamics_constr (for improved locality of the dynamics Jacobians).

void update (Context &ctx, view<> ΔΣ)

Perform factorization updates of the Cyqlone factorization as described by Algorithm 4 in the paper.

[Cyqlone update]

Parameters:
  • ctx – Parallel context.

  • ΔΣ – Changes to the ALM penalty factors Σ.

void update_solve (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)

Fused variant of update and solve_forward.

Low-level factorization and forward solve routines

inline index_t riccati_thread_assignment(Context &ctx) const
template<bool Factor = true, bool Solve = true> void factor_riccati_solve (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)

[Modified Riccati factorization and fused forward solve]

template<bool Factor = true, bool Solve = true> void compute_schur (Context &ctx, mut_view<> ux, mut_view<> λ)

[Cyqlone compute Schur]

Low-level factorization and forward solve routines for parallel cyclic reduction

template<bool Factor = true, bool Solve = true> void factor_solve_impl (Context &ctx, value_type γ, view<> Σ, mut_view<> ux, mut_view<> λ)

[Cyqlone factorization and fused forward solve]

Low-level reverse solve routines

void solve_riccati_reverse (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> work, std::optional< mut_view<> > Mᵀλ) const

[Modified Riccati factorization and fused forward solve]

void solve_reverse (Context &ctx, mut_view<> ux, mut_view<> λ, mut_view<> work, std::optional< mut_view<> > Mᵀλ=std::nullopt) const

Low-level factorization update routines

template<bool Solve = true> void update_riccati_solve (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)

Update the modified Riccati factorization of a single block column as described by Algorithm 3 in the paper.

[Cyqlone update CR]

[Cyqlone update Riccati]

inline void update_riccati (Context &ctx, view<> ΔΣ)
template<bool Solve = true> void update_solve_impl (Context &ctx, view<> ΔΣ, mut_view<> ux, mut_view<> λ)

[PCR update]

[Cyqlone update]

Build sparse representations for debugging and testing

SparseMatrix build_sparse (const CyqloneStorage< value_type > &ocp, std::span< const value_type > Σ) const
std::vector<value_type> build_rhs(view<> rq, view<> b, value_type scale_rq = -1, value_type scale_b = -1) const
inline std::vector< value_type > build_sol (view<> ux, view<> λ) const
SparseMatrix build_sparse_factor() const
SparseMatrix build_sparse_diag() const

Public Types

using value_type = T