cyqlone develop
Fast, parallel and vectorized solver for linear systems with optimal control structure.
Loading...
Searching...
No Matches
cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx > Struct Template Reference

#include <cyqlone/cyqlone.hpp>

Detailed Description

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
struct cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >

Solver for block-tridiagonal systems using cyclic reduction (CR), parallel cyclic reduction (PCR), and preconditioned conjugate gradient (PCG) methods.

Template Parameters
VLVector length.
TScalar type.
DefaultOrderStorage order for the matrix workspaces (row/column major).
Examples
solve-block-tridiagonal.cpp.

Definition at line 66 of file cyqlone.hpp.

Parallelization and vectorization

using simd = batmat::datapar::deduced_simd<value_type, v>
 Represents a SIMD vector of width v storing values of type value_type.
using vl_t = std::integral_constant<index_t, v>
 Integral constant type for the vector length.
using align_t = std::integral_constant<index_t, v * alignof(value_type)>
 Integral constant type for the alignment of the batched matrix data structures.
const index_t p = 8
 Number of processors/threads.
static constexpr index_t v = VL
 Vector length.
constexpr index_t lp () const
 log₂(p), logarithm of the number of processors/threads p, rounded up.
constexpr index_t ceil_p () const
 The number of processors p rounded up to the next power of two.
constexpr index_t ceil_P () const
 The number of parallel execution units P = p * v rounded up to the next power of two.
std::unique_ptr< SharedContextcreate_parallel_context () const
 Create a new parallel execution context, storing synchronization primitives and shared data for the parallel algorithms.
static constexpr index_t lv ()
 log₂(v), logarithm of the vector length v.

Matrix data structures

template<StorageOrder O = column_major>
using matrix = batmat::matrix::Matrix<value_type, index_t, vl_t, index_t, O, align_t>
 Owning type for a batch of matrices (with batch size v).
template<StorageOrder O = column_major>
using view = batmat::matrix::View<const value_type, index_t, vl_t, index_t, index_t, O>
 Non-owning immutable view type for matrix.
template<StorageOrder O = column_major>
using mut_view = batmat::matrix::View<value_type, index_t, vl_t, index_t, index_t, O>
 Non-owning mutable view type for matrix.
using layer_stride = batmat::matrix::DefaultStride
template<StorageOrder O = column_major>
using batch_view = batmat::matrix::View<const value_type, index_t, vl_t, vl_t, layer_stride, O>
 Non-owning immutable view type for a single batch of v matrices.
template<StorageOrder O = column_major>
using mut_batch_view = batmat::matrix::View<value_type, index_t, vl_t, vl_t, layer_stride, O>
 Non-owning mutable view type for a single batch of v matrices.
static constexpr auto default_order = DefaultOrder
 Default storage order for most matrices.
static constexpr auto column_major = StorageOrder::ColMajor
 Column-major storage order for column vectors and update matrices.

Problem dimensions

const index_t block_size
 Block size of the block-tridiagonal system.
const index_t max_rank = 0
 Maximum update rank.
bool circular = false
 Whether the block-tridiagonal system is circular (nonzero top-right & bottom-left corners).

Solver parameters

Params params {}
 Solver parameters for Tricyqle-specific settings.
Params get_params () const
 Get the current solver parameters.
void update_params (const Params &new_params)
 Update the solver parameters.

Cyclic reduction data structures

matrix< default_ordercr_L
 Diagonal blocks of the Cholesky factor of the Schur complement (used during CR).
matrix< default_ordercr_U
 Subdiagonal blocks U of the Cholesky factor of the Schur complement (used during CR).
matrix< default_ordercr_Y
 Subdiagonal blocks Y of the Cholesky factor of the Schur complement (used during CR).
matrix< column_majorwork_cr
 Temporary workspace for the CR solve phase.

Parallel cyclic reduction data structures

matrix< default_orderpcr_L
 Diagonal blocks of the PCR Cholesky factorizations.
matrix< default_orderpcr_Y
 Subdiagonal blocks Y of the PCR Cholesky factorizations.
matrix< default_orderpcr_U
 Subdiagonal blocks U of the PCR Cholesky factorizations.
matrix< default_orderpcr_M
 Workspace to store the diagonal blocks during the PCR factorization.
matrix< column_majorwork_pcg
 Temporary workspace for CG vectors.

Factorization update data structures

std::vector< index_t > m_update = std::vector<index_t>(p)
 Update rank (number of changing constraints) per thread.
index_t m_update_u0 = -1
 Update rank from D(0). Negative if D(0) is not handled separately.
matrix< column_majorwork_update_Σ
 Compressed reprentation of the nonzero diagonal elements of the matrix Σ.
matrix< column_majorwork_update
 Workspace to store the update matrices Ξ(Υ) for the factorization update.
matrix< column_majorwork_hyh
 Storage for the hyperbolic Householder transformations.
matrix< column_majorwork_update_pcr_Σ
 Two copies of work_update_Σ for PCR updates.
matrix< column_majorwork_update_pcr_L
 Update matrices to apply to the diagonal blocks L during PCR updates.
matrix< column_majorwork_update_pcr_UY
 Update matrices to apply to the subdiagonal blocks U and Y during PCR updates.

Low-level PCR factorization and solve routines

static constexpr bool merge_last_level_pcr = true
void factor_pcr ()
 Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v.
template<index_t Level>
void factor_pcr_level ()
 Perform a single level of the PCR factorization.
void factor_pcr_parallel (Context &ctx)
 Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v.
template<index_t Level>
void factor_pcr_level_parallel (Context &ctx)
 Perform a single level of the PCR factorization.
void solve_pcr (mut_batch_view<> λ, mut_batch_view<> work_pcr) const
 Solve a linear system with the final block tridiagonal system of size v using the PCR factorization.
void solve_pcr (mut_batch_view<> λ)
 Solve a linear system with the final block tridiagonal system of size v using the PCR factorization.
template<index_t Level>
void solve_pcr_level (mut_batch_view<> λ, mut_batch_view<> work_pcr) const
 Perform a single level of the PCR solve.

Indexing utilities

index_t ν2 (index_t i) const
 2-adic valuation ν₂.
index_t ν2p (index_t i) const
 2-adic valuation modulo p, i.e. ν2p(0) = ν2p(p) = lp().
index_t add_wrap_ceil_p (index_t a, index_t b) const
 Add b to a modulo ceil_p().
index_t sub_wrap_ceil_p (index_t a, index_t b) const
 Subtract b from a modulo ceil_p().

Factorization and solve routines

decltype(auto) init_diag (Context &ctx, auto &&func)
 Initialize the diagonal blocks M of the block tridiagonal system using a user-provided function.
decltype(auto) init_subdiag (Context &ctx, auto &&func)
 Initialize the subdiagonal blocks K of the block tridiagonal system using a user-provided function.
decltype(auto) init_rhs (Context &ctx, mut_view<> b, auto &&func) const
 Initialize the right-hand side of the linear system using a user-provided function.
decltype(auto) get_solution (Context &ctx, view<> λ, auto &&func) const
 Get access to the solution computed by this thread using a user-provided function.
decltype(auto) get_solution (Context &ctx, mut_view<> λ, auto &&func) const
 Get access to the solution computed by this thread using a user-provided function.
template<class Λ, class F>
decltype(auto) get_solution (Context &ctx, Λ &&λ, F &&func) const
 Get access to the solution computed by this thread using a user-provided function.
void factor_solve (Context &ctx, mut_view<> λ, index_t stride=1)
 Fused factorization and forward solve.
void factor (Context &ctx)
 Perform only the factorization as described by factor_solve.
void solve_forward (Context &ctx, mut_view<> λ, index_t stride=1)
 Perform only the forward solve as described by factor_solve.
void solve_reverse (Context &ctx, mut_view<> λ, mut_view<> work, index_t stride=1) const
 Perform the backward solve phase, after the forward solve phase has been performed by factor_solve.
void solve_reverse (Context &ctx, mut_view<> λ, index_t stride=1)

Low-level factorization and solve routines

template<bool Factor = true, bool Solve = true>
void factor_solve_skip_first (Context &ctx, mut_view<> λ, index_t stride=1)
 Fused factorization and forward solve.
void factor_skip_first (Context &ctx)
 Factorization-only variant of factor_solve_skip_first.
void solve_forward_skip_first (Context &ctx, mut_view<> λ, index_t stride=1)
 Solution-only variant of factor_solve_skip_first.
template<bool Factor = true, bool Solve = true>
void factor_solve_impl (Context &ctx, mut_view<> λ, index_t stride=1)
 Implementation of factor_solve.

Low-level CR factorization and solve routines

index_t cr_thread_assignment (index_t l, index_t c) const
 Adjust thread assignment for non-power-of-two p: The diagonal blocks M(⌊p/2⌋2) are usually mapped to increasing thread indices c as the CR level l increases, as can be seen in the functions above, where iY = c + 1 - 2^l, and from the way the path of M nodes curves to the right in the thread assignment diagram in the paper.
void factor_U (index_t l, index_t iU)
 Compute a block U in the Cholesky factor for the given CR level l and column index iU.
void factor_Y (index_t l, index_t iY)
 Compute a block Y in the Cholesky factor for the given CR level l and column index iY.
void factor_L (index_t l, index_t i)
 Update and factorize a block L in the Cholesky factor for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l.
void update_K (index_t l, index_t i)
 Compute a subdiagonal block K of the Schur complement for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l.
void solve_u_forward (index_t l, index_t iU, mut_view<> λ, index_t stride) const
 Update the right-hand side λ during the forward solve phase of CR after computing block iU of λ at level l, subtracting the product U(iU) λ(iU) from the block of λ in the same row as U(iU).
void solve_y_forward (index_t l, index_t iY, mut_view<> λ, mut_view<> w, index_t stride) const
 Update the right-hand side λ during the forward solve phase of CR after computing block iY of λ at level l, subtracting the product Y(iY) λ(iY) from the block of λ in the same row as Y(iY).
void solve_λ_forward (index_t l, index_t iL, mut_view<> λ, view<> w, index_t stride) const
 Apply the updates to block iL of the right-hand side from solve_u_forward and solve_y_forward, and then solve with the diagonal block L for the next level l+1.

Low-level PCG routines

value_type mul_Mv (batch_view<> p, mut_batch_view<> Mp, batch_view< default_order > L, batch_view< default_order > K) const
 Multiply a vector by the final block tridiagonal matrix of size v.
value_type mul_precond (batch_view<> r, mut_batch_view<> z, mut_batch_view<> w, batch_view< default_order > L, batch_view< default_order > K) const
 Multiply a vector by the preconditioner for the final block tridiagonal system of size v.
void solve_pcg (mut_batch_view<> λ, mut_batch_view<> work_pcg) const
 Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.
void solve_pcg (mut_batch_view<> λ)
 Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.

Low-level reverse solve routines

void solve_reverse_parallel (Context &ctx, mut_view<> λ, mut_view<> work, index_t stride) const
 [Cyqlone solve CR]
void solve_reverse_serial (mut_view<> λ, mut_view<> work, index_t stride) const
 [Cyqlone solve CR]
void solve_u_backward (index_t l, index_t iU, mut_view<> λ, mut_view<> w, index_t stride) const
void solve_y_backward (index_t l, index_t iY, mut_view<> λ, index_t stride) const
void solve_λ_backward (index_t biL, mut_view<> λ, view<> w, index_t stride) const

Low-level factorization update routines

void set_thread_update_rank (Context &ctx, index_t c, index_t m)
 [Cyqlone update Riccati]
void set_update_rank_extra (index_t m)
void clear_update_rank_extra ()
std::pair< index_t, index_t > cols_Ups_fwd (index_t l, index_t i) const
std::pair< index_t, index_t > cols_Ups_bwd (index_t l, index_t i) const
std::pair< index_t, index_t > cols_Q_cr (index_t l, index_t i) const
index_t work_Ups_fwd_w (index_t l, index_t i) const
index_t work_Ups_bwd_w (index_t l, index_t i) const
mut_batch_view< column_majorwork_Ups_fwd (index_t l, index_t i)
mut_batch_view< column_majorwork_Ups_bwd (index_t l, index_t i)
mut_batch_view< column_majorwork_Q_cr (index_t l, index_t i)
mut_batch_view< column_majorwork_Σ_fwd (index_t l, index_t i)
mut_batch_view< column_majorwork_Σ_bwd (index_t l, index_t i)
mut_batch_view< column_majorwork_Σ_Q (index_t l, index_t i)
mut_batch_view< column_majorwork_Ups_fwd_last ()
mut_batch_view< column_majorwork_Ups_bwd_last ()
mut_batch_view< column_majorwork_Σ_fwd_last ()
mut_batch_view< column_majorwork_Σ_bwd_last ()
mut_batch_view< column_majorwork_Ups_extra ()
mut_batch_view< column_majorwork_Σ_extra ()
template<bool Solve = true>
void update_solve_cr (Context &ctx, mut_view<> λ, index_t stride)
 [Cyqlone update CR]
void update_L (index_t l, index_t i)
 [Cyqlone update CR helper]
void update_U (index_t l, index_t i)
void update_Y (index_t l, index_t i)
template<index_t Level>
void update_pcr_level (index_t m, mut_batch_view<> WYU, mut_batch_view<> WΣ)
void update_pcr (batch_view<> fwd, batch_view<> bwd, batch_view<> Σ)
 [Cyqlone update CR helper]

Low-level prefetching

template<StorageOrder O>
void prefetch (batch_view< O > X) const
 [Cyqlone solve CR helper]
template<StorageOrder O>
void prefetch_L (batch_view< O > X) const
void prefetch_L (index_t bi) const
void prefetch_U (index_t l, index_t iU) const
void prefetch_Y (index_t l, index_t iY) const

Public Types

using value_type = T
using Params = TricyqleParams<value_type>
using Context = Ctx
using SharedContext = typename Context::shared_context_type

Member Typedef Documentation

◆ value_type

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::value_type = T

Definition at line 67 of file cyqlone.hpp.

◆ Params

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::Params = TricyqleParams<value_type>

Definition at line 68 of file cyqlone.hpp.

◆ Context

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::Context = Ctx

Definition at line 69 of file cyqlone.hpp.

◆ SharedContext

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::SharedContext = typename Context::shared_context_type

Definition at line 70 of file cyqlone.hpp.

◆ simd

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::simd = batmat::datapar::deduced_simd<value_type, v>

Represents a SIMD vector of width v storing values of type value_type.

Definition at line 114 of file cyqlone.hpp.

◆ vl_t

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::vl_t = std::integral_constant<index_t, v>

Integral constant type for the vector length.

Definition at line 116 of file cyqlone.hpp.

◆ align_t

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::align_t = std::integral_constant<index_t, v * alignof(value_type)>

Integral constant type for the alignment of the batched matrix data structures.

Definition at line 118 of file cyqlone.hpp.

◆ matrix

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<StorageOrder O = column_major>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::matrix = batmat::matrix::Matrix<value_type, index_t, vl_t, index_t, O, align_t>

Owning type for a batch of matrices (with batch size v).

Definition at line 152 of file cyqlone.hpp.

◆ view

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<StorageOrder O = column_major>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::view = batmat::matrix::View<const value_type, index_t, vl_t, index_t, index_t, O>

Non-owning immutable view type for matrix.

Definition at line 155 of file cyqlone.hpp.

◆ mut_view

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<StorageOrder O = column_major>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mut_view = batmat::matrix::View<value_type, index_t, vl_t, index_t, index_t, O>

Non-owning mutable view type for matrix.

Definition at line 158 of file cyqlone.hpp.

◆ layer_stride

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::layer_stride = batmat::matrix::DefaultStride

Definition at line 159 of file cyqlone.hpp.

◆ batch_view

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<StorageOrder O = column_major>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::batch_view = batmat::matrix::View<const value_type, index_t, vl_t, vl_t, layer_stride, O>

Non-owning immutable view type for a single batch of v matrices.

Definition at line 162 of file cyqlone.hpp.

◆ mut_batch_view

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<StorageOrder O = column_major>
using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mut_batch_view = batmat::matrix::View<value_type, index_t, vl_t, vl_t, layer_stride, O>

Non-owning mutable view type for a single batch of v matrices.

Definition at line 165 of file cyqlone.hpp.

Member Function Documentation

◆ get_params()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
Params cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::get_params ( ) const
inlinenodiscard

Get the current solver parameters.

Definition at line 90 of file cyqlone.hpp.

◆ update_params()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_params ( const Params & new_params)
inline

Update the solver parameters.

Definition at line 93 of file cyqlone.hpp.

◆ lp()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::lp ( ) const
inlinenodiscardconstexpr

log₂(p), logarithm of the number of processors/threads p, rounded up.

Definition at line 105 of file cyqlone.hpp.

◆ ceil_p()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::ceil_p ( ) const
inlinenodiscardconstexpr

The number of processors p rounded up to the next power of two.

Definition at line 107 of file cyqlone.hpp.

◆ ceil_P()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::ceil_P ( ) const
inlinenodiscardconstexpr

The number of parallel execution units P = p * v rounded up to the next power of two.

Definition at line 109 of file cyqlone.hpp.

◆ lv()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
constexpr index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::lv ( )
inlinestaticnodiscardconstexpr

log₂(v), logarithm of the vector length v.

Definition at line 111 of file cyqlone.hpp.

◆ create_parallel_context()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
std::unique_ptr< SharedContext > cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::create_parallel_context ( ) const
inline

Create a new parallel execution context, storing synchronization primitives and shared data for the parallel algorithms.

Definition at line 122 of file cyqlone.hpp.

◆ ν2()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::ν2 ( index_t i) const
nodiscard

2-adic valuation ν₂.

Definition at line 30 of file indexing.tpp.

◆ ν2p()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::ν2p ( index_t i) const
nodiscard

2-adic valuation modulo p, i.e. ν2p(0) = ν2p(p) = lp().

Definition at line 36 of file indexing.tpp.

◆ add_wrap_ceil_p()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::add_wrap_ceil_p ( index_t a,
index_t b ) const
nodiscard

Add b to a modulo ceil_p().

Definition at line 19 of file indexing.tpp.

◆ sub_wrap_ceil_p()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::sub_wrap_ceil_p ( index_t a,
index_t b ) const
nodiscard

Subtract b from a modulo ceil_p().

Definition at line 8 of file indexing.tpp.

◆ init_diag()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::init_diag ( Context & ctx,
auto && func )
inline

Initialize the diagonal blocks M of the block tridiagonal system using a user-provided function.

The function is called with an index i in [0, p) and a mutable view to the compact workspace of depth v where the diagonal blocks M(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this workspace can be achieved using the cyqlone::linalg::pack function.

Definition at line 177 of file cyqlone.hpp.

◆ init_subdiag()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::init_subdiag ( Context & ctx,
auto && func )
inline

Initialize the subdiagonal blocks K of the block tridiagonal system using a user-provided function.

The function is called with an index i in [0, p) and a mutable view to the compact workspace of depth v where the subdiagonal blocks K(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this workspace can be achieved using the cyqlone::linalg::pack function.

Definition at line 185 of file cyqlone.hpp.

◆ init_rhs()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::init_rhs ( Context & ctx,
mut_view<> b,
auto && func ) const
inline

Initialize the right-hand side of the linear system using a user-provided function.

The function is called with an index i in [0, p) and a mutable view of the batch of b of depth v where the right-hand side blocks b(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this batch can be achieved using the cyqlone::linalg::pack function.

Definition at line 194 of file cyqlone.hpp.

◆ get_solution() [1/3]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::get_solution ( Context & ctx,
view<> λ,
auto && func ) const
inline

Get access to the solution computed by this thread using a user-provided function.

The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.

Definition at line 202 of file cyqlone.hpp.

◆ get_solution() [2/3]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::get_solution ( Context & ctx,
mut_view<> λ,
auto && func ) const
inline

Get access to the solution computed by this thread using a user-provided function.

The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.

Definition at line 206 of file cyqlone.hpp.

◆ get_solution() [3/3]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
template<class Λ, class F>
decltype(auto) cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::get_solution ( Context & ctx,
Λ && λ,
F && func ) const
inline

Get access to the solution computed by this thread using a user-provided function.

The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.

Definition at line 211 of file cyqlone.hpp.

◆ factor_solve()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )

Fused factorization and forward solve.

[Cyqlone factor Schur]

Performs CR to reduce the system down to a single block tridiagonal system of size v which is then solved using PCR or PCG.

Precondition
The workspaces cr_L.batch(i) contain the diagonal blocks M(i:v:p) of the block tridiagonal system (see init_diag).
If p > 1, the workspaces cr_Y.batch(i) with odd i contain the subdiagonal blocks K(i:v:p) of the block tridiagonal system (see init_subdiag).
If p > 1, the workspaces cr_U.batch(i) with odd i contain the superdiagonal blocks K(i-1:v:p)ᵀ of the block tridiagonal system (see init_subdiag).
If p = 1, the workspace cr_Y.batch(0) contains the subdiagonal blocks K(0:v:1) of the block tridiagonal system, and cr_U.batch(0) is not used.
Postcondition
The workspaces cr_L.batch(i) with i > 0 contain the diagonal blocks of the CR Cholesky factor. cr_Y.batch(i) and cr_U.batch(i) contain the subdiagonal blocks. The final batch cr_L.batch(0) contains the diagonal blocks of the Schur complement of all other blocks, which is the input to the PCR or PCG solver. The batch pcr_L.batch(0) contains its Cholesky factorizations. The subdiagonal blocks of this Schur complement are stored in pcr_Y.batch(0).
If the PCR solver was selected, the full PCR factorization is stored in pcr_L, pcr_U and pcr_Y.
Parameters
ctxParallel execution context for communication/synchronization between threads.
λOn entry, the right-hand side of the linear system. On exit, the solution of the forward solve phase.
strideStride (in number of batches) between batches of λ. In total, λ contains stride * p batches, but only every stride-th batch is accessed.

Note that a backward solve of the final block λ.batch(0) is performed during the forward solve phase, so λ.batch(0) contains (part of) the final solution. Performing the forward and backward solves separately for this block is not possible, because it is solved using PCR or PCG. Both methods solve the full block tridiagonal system at once, rather than using a single explicit CR Cholesky factorization with distinct forward and backward solves.

Definition at line 117 of file factor.tpp.

◆ factor()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor ( Context & ctx)

Perform only the factorization as described by factor_solve.

Definition at line 122 of file factor.tpp.

◆ solve_forward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_forward ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )

Perform only the forward solve as described by factor_solve.

Definition at line 126 of file factor.tpp.

◆ solve_reverse() [1/2]

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse ( Context & ctx,
mut_view<> λ,
mut_view<> work,
index_t stride = 1 ) const

Perform the backward solve phase, after the forward solve phase has been performed by factor_solve.

Parameters
ctxParallel execution context for communication/synchronization between threads.
λOn entry, the solution of the forward solve phase. On exit, the solution of the full linear system.
workWorkspace of p * v column vectors of size block_size.
strideStride (in number of batches) between batches of λ. In total, λ contains stride * p batches, but only every stride-th batch is accessed.

Definition at line 164 of file factor.tpp.

◆ solve_reverse() [2/2]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )
inline

Definition at line 261 of file cyqlone.hpp.

◆ factor_solve_skip_first()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<bool Factor, bool Solve>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve_skip_first ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )

Fused factorization and forward solve.

[Cyqlone factor Schur]

Unlike factor_solve, this function assumes that the odd diagonal blocks in the first level of CR have already been factorized and solved. This allows the user to fuse the evaluation and factorization of these blocks, possibly enabling higher performance by avoiding an additional trip to memory.

Precondition
If p > 1, the workspaces cr_L.batch(i) contain the diagonal blocks M(i) for even i, and the Cholesky factors L(i) of M(i) for odd i.
If p = 1, the workspace cr_L.batch(0) contains the diagonal block M(0), and the workspace pcr_L.batch(0) contains its Cholesky factor L(0).
If p > 1, the workspaces cr_Y.batch(i) contain the subdiagonal blocks K(i) for odd i (uninitialized for even i).
If p = 1, the workspace cr_Y.batch(0) contains the subdiagonal block K(0).
If p > 1, the workspace cr_U.batch(i) contains the superdiagonal blocks K(i-1)ᵀ for odd i (uninitialized for even i).
If p = 1, the workspace cr_U.batch(0) is not used.
If p > 1, the right-hand sides λ.batch(stride * i) contain the right-hand sides of the system for even i, and the right-hand sides multiplied by L(i)⁻¹ for odd i.
If p = 1, the right-hand side λ.batch(0) contains the right-hand side of the system.

Definition at line 48 of file factor.tpp.

◆ factor_skip_first()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_skip_first ( Context & ctx)
inline

Factorization-only variant of factor_solve_skip_first.

Definition at line 380 of file cyqlone.hpp.

◆ solve_forward_skip_first()

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_forward_skip_first ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )
inline

Solution-only variant of factor_solve_skip_first.

Definition at line 382 of file cyqlone.hpp.

◆ factor_solve_impl()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<bool Factor, bool Solve>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve_impl ( Context & ctx,
mut_view<> λ,
index_t stride = 1 )

Implementation of factor_solve.

[Cyqlone factorization and fused forward solve]

Definition at line 29 of file factor.tpp.

◆ cr_thread_assignment()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_thread_assignment ( index_t l,
index_t c ) const
nodiscard

Adjust thread assignment for non-power-of-two p: The diagonal blocks M(⌊p/2⌋2) are usually mapped to increasing thread indices c as the CR level l increases, as can be seen in the functions above, where iY = c + 1 - 2^l, and from the way the path of M nodes curves to the right in the thread assignment diagram in the paper.

However, these large thread indices are not actually present if p is not a power of two, so we need to remap them, undoing the offset 1 - 2^l. We always assign the last M evaluation to the even thread ⌊p/2⌋2, since this thread is present even if p is odd. The odd thread ⌊p/2⌋2+1 is assigned an inactive index, since it never has any work during CR, as there is no coupling between the last and first stages (at least not in the scalar case).

Definition at line 277 of file factor.tpp.

◆ factor_U()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_U ( index_t l,
index_t iU )

Compute a block U in the Cholesky factor for the given CR level l and column index iU.

[Cyqlone factor CR helper]

Definition at line 23 of file cr.tpp.

◆ factor_Y()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_Y ( index_t l,
index_t iY )

Compute a block Y in the Cholesky factor for the given CR level l and column index iY.

Definition at line 37 of file cr.tpp.

◆ factor_L()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_L ( index_t l,
index_t i )

Update and factorize a block L in the Cholesky factor for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l.

Definition at line 71 of file cr.tpp.

◆ update_K()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_K ( index_t l,
index_t i )

Compute a subdiagonal block K of the Schur complement for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l.

Definition at line 50 of file cr.tpp.

◆ solve_u_forward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_u_forward ( index_t l,
index_t iU,
mut_view<> λ,
index_t stride ) const

Update the right-hand side λ during the forward solve phase of CR after computing block iU of λ at level l, subtracting the product U(iU) λ(iU) from the block of λ in the same row as U(iU).

[Cyqlone factor CR helper]

The stride parameter specifies the stride between consecutive blocks of λ.

[Cyqlone solve CR helper]

Definition at line 163 of file cr.tpp.

◆ solve_y_forward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_y_forward ( index_t l,
index_t iY,
mut_view<> λ,
mut_view<> w,
index_t stride ) const

Update the right-hand side λ during the forward solve phase of CR after computing block iY of λ at level l, subtracting the product Y(iY) λ(iY) from the block of λ in the same row as Y(iY).

The product Y(iY) λ(iY) is stored in the workspace w to allow it to be computed concurrently with solve_u_forward, which updates the same block of λ. The stride parameter specifies the stride between consecutive blocks of λ.

Definition at line 177 of file cr.tpp.

◆ solve_λ_forward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_λ_forward ( index_t l,
index_t iL,
mut_view<> λ,
view<> w,
index_t stride ) const

Apply the updates to block iL of the right-hand side from solve_u_forward and solve_y_forward, and then solve with the diagonal block L for the next level l+1.

The stride parameter specifies the stride between consecutive blocks of λ.

Definition at line 190 of file cr.tpp.

◆ factor_pcr()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr ( )

Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v.

[PCR factor serial]

The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0). This variant computes both subdiagonal blocks on a single thread.

Definition at line 28 of file pcr.tpp.

◆ factor_pcr_level()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<index_t Level>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_level ( )

Perform a single level of the PCR factorization.

This variant computes both subdiagonal blocks on a single thread.

Definition at line 38 of file pcr.tpp.

◆ factor_pcr_parallel()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_parallel ( Context & ctx)

Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v.

[PCR factor serial]

This variant computes the subdiagonal blocks in parallel on different threads.

[PCR factor]

Definition at line 101 of file pcr.tpp.

◆ factor_pcr_level_parallel()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<index_t Level>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_level_parallel ( Context & ctx)

Perform a single level of the PCR factorization.

This variant computes the subdiagonal blocks in parallel on different threads.

Definition at line 109 of file pcr.tpp.

◆ solve_pcr() [1/2]

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcr ( mut_batch_view<> λ,
mut_batch_view<> work_pcr ) const

Solve a linear system with the final block tridiagonal system of size v using the PCR factorization.

[PCR factor]

[Cyqlone solve PCR]

Definition at line 181 of file pcr.tpp.

◆ solve_pcr() [2/2]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcr ( mut_batch_view<> λ)
inline

Solve a linear system with the final block tridiagonal system of size v using the PCR factorization.

[PCR factor]

[Cyqlone solve PCR]

Definition at line 452 of file cyqlone.hpp.

◆ solve_pcr_level()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<index_t Level>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcr_level ( mut_batch_view<> λ,
mut_batch_view<> work_pcr ) const

Perform a single level of the PCR solve.

Definition at line 194 of file pcr.tpp.

◆ mul_Mv()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mul_Mv ( batch_view<> p,
mut_batch_view<> Mp,
batch_view< default_order > L,
batch_view< default_order > K ) const

Multiply a vector by the final block tridiagonal matrix of size v.

The matrix is represented by the Cholesky factors of the diagonal blocks L and the subdiagonal blocks K.

M = [ M(0) Kᵀ(0) ] where M(i) = L(i) L(i)ᵀ
[ K(0) M(1) Kᵀ(1) ]
[ K(1) M(2) Kᵀ(2) ]
[ K(2) M(3) ]

Definition at line 23 of file pcg.tpp.

◆ mul_precond()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mul_precond ( batch_view<> r,
mut_batch_view<> z,
mut_batch_view<> w,
batch_view< default_order > L,
batch_view< default_order > K ) const

Multiply a vector by the preconditioner for the final block tridiagonal system of size v.

See also
mul_Mv

Definition at line 35 of file pcg.tpp.

◆ solve_pcg() [1/2]

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcg ( mut_batch_view<> λ,
mut_batch_view<> work_pcg ) const

Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.

The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0).

Definition at line 54 of file pcg.tpp.

◆ solve_pcg() [2/2]

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcg ( mut_batch_view<> λ)
inline

Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.

The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0).

Definition at line 484 of file cyqlone.hpp.

◆ solve_reverse_parallel()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse_parallel ( Context & ctx,
mut_view<> λ,
mut_view<> work,
index_t stride ) const

[Cyqlone solve CR]

Definition at line 179 of file factor.tpp.

◆ solve_reverse_serial()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse_serial ( mut_view<> λ,
mut_view<> work,
index_t stride ) const

[Cyqlone solve CR]

[Cyqlone solve CR serial]

Definition at line 228 of file factor.tpp.

◆ solve_u_backward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_u_backward ( index_t l,
index_t iU,
mut_view<> λ,
mut_view<> w,
index_t stride ) const

Definition at line 210 of file cr.tpp.

◆ solve_y_backward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_y_backward ( index_t l,
index_t iY,
mut_view<> λ,
index_t stride ) const

Definition at line 225 of file cr.tpp.

◆ solve_λ_backward()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_λ_backward ( index_t biL,
mut_view<> λ,
view<> w,
index_t stride ) const

Definition at line 241 of file cr.tpp.

◆ set_thread_update_rank()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::set_thread_update_rank ( Context & ctx,
index_t c,
index_t m )

[Cyqlone update Riccati]

Definition at line 539 of file update.tpp.

◆ set_update_rank_extra()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::set_update_rank_extra ( index_t m)

Definition at line 547 of file update.tpp.

◆ clear_update_rank_extra()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::clear_update_rank_extra ( )

Definition at line 552 of file update.tpp.

◆ cols_Ups_fwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
std::pair< index_t, index_t > cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cols_Ups_fwd ( index_t l,
index_t i ) const
nodiscard

Definition at line 558 of file update.tpp.

◆ cols_Ups_bwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
std::pair< index_t, index_t > cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cols_Ups_bwd ( index_t l,
index_t i ) const
nodiscard

Definition at line 573 of file update.tpp.

◆ cols_Q_cr()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
std::pair< index_t, index_t > cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cols_Q_cr ( index_t l,
index_t i ) const
nodiscard

Definition at line 588 of file update.tpp.

◆ work_Ups_fwd_w()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_fwd_w ( index_t l,
index_t i ) const
nodiscard

Definition at line 593 of file update.tpp.

◆ work_Ups_bwd_w()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_bwd_w ( index_t l,
index_t i ) const
nodiscard

Definition at line 604 of file update.tpp.

◆ work_Ups_fwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_fwd ( index_t l,
index_t i )
nodiscard

Definition at line 612 of file update.tpp.

◆ work_Ups_bwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_bwd ( index_t l,
index_t i )
nodiscard

Definition at line 620 of file update.tpp.

◆ work_Q_cr()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Q_cr ( index_t l,
index_t i )
nodiscard

Definition at line 628 of file update.tpp.

◆ work_Σ_fwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_fwd ( index_t l,
index_t i )
nodiscard

Definition at line 636 of file update.tpp.

◆ work_Σ_bwd()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_bwd ( index_t l,
index_t i )
nodiscard

Definition at line 643 of file update.tpp.

◆ work_Σ_Q()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_Q ( index_t l,
index_t i )
nodiscard

Definition at line 650 of file update.tpp.

◆ work_Ups_fwd_last()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_fwd_last ( )
nodiscard

Definition at line 657 of file update.tpp.

◆ work_Ups_bwd_last()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_bwd_last ( )
nodiscard

Definition at line 668 of file update.tpp.

◆ work_Σ_fwd_last()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_fwd_last ( )
nodiscard

Definition at line 679 of file update.tpp.

◆ work_Σ_bwd_last()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_bwd_last ( )
nodiscard

Definition at line 689 of file update.tpp.

◆ work_Ups_extra()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Ups_extra ( )
nodiscard

Definition at line 699 of file update.tpp.

◆ work_Σ_extra()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_Σ_extra ( )
nodiscard

Definition at line 706 of file update.tpp.

◆ update_solve_cr()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<bool Solve>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_solve_cr ( Context & ctx,
mut_view<> λ,
index_t stride )

[Cyqlone update CR]

Definition at line 297 of file update.tpp.

◆ update_L()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_L ( index_t l,
index_t i )

[Cyqlone update CR helper]

Definition at line 23 of file update.tpp.

◆ update_U()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_U ( index_t l,
index_t i )

Definition at line 123 of file update.tpp.

◆ update_Y()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_Y ( index_t l,
index_t i )

Definition at line 157 of file update.tpp.

◆ update_pcr_level()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<index_t Level>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_pcr_level ( index_t m,
mut_batch_view<> WYU,
mut_batch_view<>  )

Definition at line 198 of file update.tpp.

◆ update_pcr()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_pcr ( batch_view<> fwd,
batch_view<> bwd,
batch_view<> Σbwd )

[Cyqlone update CR helper]

[PCR update]

Definition at line 180 of file update.tpp.

◆ prefetch()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<StorageOrder O>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch ( batch_view< O > X) const

[Cyqlone solve CR helper]

Definition at line 260 of file cr.tpp.

◆ prefetch_L() [1/2]

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
template<StorageOrder O>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_L ( batch_view< O > X) const

Definition at line 276 of file cr.tpp.

◆ prefetch_L() [2/2]

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_L ( index_t bi) const

Definition at line 291 of file cr.tpp.

◆ prefetch_U()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_U ( index_t l,
index_t iU ) const

Definition at line 297 of file cr.tpp.

◆ prefetch_Y()

template<index_t VL, class T, StorageOrder DefaultOrder, class Ctx>
void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_Y ( index_t l,
index_t iY ) const

Definition at line 306 of file cr.tpp.

Member Data Documentation

◆ block_size

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::block_size

Block size of the block-tridiagonal system.

Definition at line 75 of file cyqlone.hpp.

◆ max_rank

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::max_rank = 0

Maximum update rank.

Definition at line 76 of file cyqlone.hpp.

◆ circular

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
bool cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::circular = false

Whether the block-tridiagonal system is circular (nonzero top-right & bottom-left corners).

Definition at line 79 of file cyqlone.hpp.

◆ params

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
Params cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::params {}

Solver parameters for Tricyqle-specific settings.

Definition at line 87 of file cyqlone.hpp.

◆ p

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::p = 8

Number of processors/threads.

Definition at line 101 of file cyqlone.hpp.

◆ v

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::v = VL
staticconstexpr

Vector length.

Definition at line 103 of file cyqlone.hpp.

◆ default_order

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::default_order = DefaultOrder
staticconstexpr

Default storage order for most matrices.

Definition at line 146 of file cyqlone.hpp.

◆ column_major

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::column_major = StorageOrder::ColMajor
staticconstexpr

Column-major storage order for column vectors and update matrices.

Definition at line 148 of file cyqlone.hpp.

◆ cr_L

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_L
Initial value:
= [this] {
return matrix<default_order>{{.depth = p * v, .rows = block_size, .cols = block_size}};
}()
static constexpr index_t v
Vector length.
Definition cyqlone.hpp:103
const index_t block_size
Block size of the block-tridiagonal system.
Definition cyqlone.hpp:75
const index_t p
Number of processors/threads.
Definition cyqlone.hpp:101
batmat::matrix::Matrix< value_type, index_t, vl_t, index_t, O, align_t > matrix
Owning type for a batch of matrices (with batch size v).
Definition cyqlone.hpp:152

Diagonal blocks of the Cholesky factor of the Schur complement (used during CR).

Batch indices correspond to block column indices of the block tridiagonal system.

Definition at line 272 of file cyqlone.hpp.

◆ cr_U

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_U
Initial value:
= [this] {
return matrix<default_order>{{.depth = p * v, .rows = block_size, .cols = block_size}};
}()

Subdiagonal blocks U of the Cholesky factor of the Schur complement (used during CR).

Batch indices correspond to block column indices of the block tridiagonal system.

Definition at line 277 of file cyqlone.hpp.

◆ cr_Y

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_Y
Initial value:
= [this] {
return matrix<default_order>{{.depth = p * v, .rows = block_size, .cols = block_size}};
}()

Subdiagonal blocks Y of the Cholesky factor of the Schur complement (used during CR).

Batch indices correspond to block column indices of the block tridiagonal system.

Definition at line 282 of file cyqlone.hpp.

◆ work_cr

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_cr
Initial value:
= [this] {
return matrix<column_major>{{.depth = p * v, .rows = block_size, .cols = 1}};
}()

Temporary workspace for the CR solve phase.

Definition at line 286 of file cyqlone.hpp.

◆ pcr_L

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_L
Initial value:
= [this] {
{.depth = v * (lv() + 1), .rows = block_size, .cols = block_size}};
}()

Diagonal blocks of the PCR Cholesky factorizations.

Definition at line 296 of file cyqlone.hpp.

◆ pcr_Y

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_Y
Initial value:
= [this] {
return matrix<default_order>{{.depth = v * lv(), .rows = block_size, .cols = block_size}};
}()

Subdiagonal blocks Y of the PCR Cholesky factorizations.

Definition at line 301 of file cyqlone.hpp.

◆ pcr_U

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_U
Initial value:
= [this] {
return matrix<default_order>{{.depth = v * lv(), .rows = block_size, .cols = block_size}};
}()

Subdiagonal blocks U of the PCR Cholesky factorizations.

Definition at line 305 of file cyqlone.hpp.

◆ pcr_M

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_M
Initial value:
= [this] {
return matrix<default_order>{{.depth = v, .rows = block_size, .cols = block_size}};
}()

Workspace to store the diagonal blocks during the PCR factorization.

Definition at line 309 of file cyqlone.hpp.

◆ work_pcg

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_pcg
Initial value:
= [this] {
return matrix<column_major>{{.depth = v, .rows = block_size, .cols = 4}};
}()

Temporary workspace for CG vectors.

Definition at line 313 of file cyqlone.hpp.

◆ m_update

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
std::vector<index_t> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::m_update = std::vector<index_t>(p)

Update rank (number of changing constraints) per thread.

Definition at line 323 of file cyqlone.hpp.

◆ m_update_u0

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::m_update_u0 = -1

Update rank from D(0). Negative if D(0) is not handled separately.

Definition at line 325 of file cyqlone.hpp.

◆ work_update_Σ

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_Σ
Initial value:
= [this] {
return matrix<column_major>{{.depth = v, .rows = max_rank, .cols = 1}};
}()
const index_t max_rank
Maximum update rank.
Definition cyqlone.hpp:76

Compressed reprentation of the nonzero diagonal elements of the matrix Σ.

Definition at line 328 of file cyqlone.hpp.

◆ work_update

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update
Initial value:
= [this] {
return matrix<column_major>{{.depth = 4 * v, .rows = block_size, .cols = max_rank}};
}()

Workspace to store the update matrices Ξ(Υ) for the factorization update.

Definition at line 332 of file cyqlone.hpp.

◆ work_hyh

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_hyh
Initial value:
= [this] {
using namespace batmat::linalg;
const auto [r, c] = hyhound_size_W(tril(cr_L.batch(0)));
return matrix<column_major>{{.depth = p * v, .rows = r, .cols = c}};
}()
auto hyhound_size_W(Structured< VL, SL > L)
constexpr auto tril(M &&m)
matrix< default_order > cr_L
Diagonal blocks of the Cholesky factor of the Schur complement (used during CR).
Definition cyqlone.hpp:272

Storage for the hyperbolic Householder transformations.

Definition at line 336 of file cyqlone.hpp.

◆ work_update_pcr_Σ

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_Σ
Initial value:
= [this] {
return matrix<column_major>{{.depth = v, .rows = 2 * max_rank, .cols = 1}};
}()

Two copies of work_update_Σ for PCR updates.

Definition at line 343 of file cyqlone.hpp.

◆ work_update_pcr_L

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_L
Initial value:
= [this] {
return matrix<column_major>{{.depth = v, .rows = block_size, .cols = max_rank}};
}()

Update matrices to apply to the diagonal blocks L during PCR updates.

Definition at line 347 of file cyqlone.hpp.

◆ work_update_pcr_UY

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_UY
Initial value:
= [this] {
return matrix<column_major>{{.depth = v, .rows = block_size, .cols = 2 * max_rank}};
}()

Update matrices to apply to the subdiagonal blocks U and Y during PCR updates.

Definition at line 351 of file cyqlone.hpp.

◆ merge_last_level_pcr

template<index_t VL = 4, class T = real_t, StorageOrder DefaultOrder = StorageOrder::ColMajor, class Ctx = parallel::Context<>>
bool cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::merge_last_level_pcr = true
staticconstexpr

Definition at line 428 of file cyqlone.hpp.


The documentation for this struct was generated from the following files:
  • src/cyqlone/include/cyqlone/cyqlone.hpp
  • src/cyqlone/include/cyqlone/implementation/cr.tpp
  • src/cyqlone/include/cyqlone/implementation/factor.tpp
  • src/cyqlone/include/cyqlone/implementation/indexing.tpp
  • src/cyqlone/include/cyqlone/implementation/pcg.tpp
  • src/cyqlone/include/cyqlone/implementation/pcr.tpp
  • src/cyqlone/include/cyqlone/implementation/update.tpp