|
cyqlone
develop
Fast, parallel and vectorized solver for linear systems with optimal control structure.
|
#include <cyqlone/cyqlone.hpp>
Solver for block-tridiagonal systems using cyclic reduction (CR), parallel cyclic reduction (PCR), and preconditioned conjugate gradient (PCG) methods.
| VL | Vector length. |
| T | Scalar type. |
| DefaultOrder | Storage order for the matrix workspaces (row/column major). |
Definition at line 66 of file cyqlone.hpp.
Parallelization and vectorization | |
| using | simd = batmat::datapar::deduced_simd<value_type, v> |
| Represents a SIMD vector of width v storing values of type value_type. | |
| using | vl_t = std::integral_constant<index_t, v> |
| Integral constant type for the vector length. | |
| using | align_t = std::integral_constant<index_t, v * alignof(value_type)> |
| Integral constant type for the alignment of the batched matrix data structures. | |
| const index_t | p = 8 |
| Number of processors/threads. | |
| static constexpr index_t | v = VL |
| Vector length. | |
| constexpr index_t | lp () const |
| log₂(p), logarithm of the number of processors/threads p, rounded up. | |
| constexpr index_t | ceil_p () const |
| The number of processors p rounded up to the next power of two. | |
| constexpr index_t | ceil_P () const |
| The number of parallel execution units P = p * v rounded up to the next power of two. | |
| std::unique_ptr< SharedContext > | create_parallel_context () const |
| Create a new parallel execution context, storing synchronization primitives and shared data for the parallel algorithms. | |
| static constexpr index_t | lv () |
| log₂(v), logarithm of the vector length v. | |
Matrix data structures | |
| template<StorageOrder O = column_major> | |
| using | matrix = batmat::matrix::Matrix<value_type, index_t, vl_t, index_t, O, align_t> |
| Owning type for a batch of matrices (with batch size v). | |
| template<StorageOrder O = column_major> | |
| using | view = batmat::matrix::View<const value_type, index_t, vl_t, index_t, index_t, O> |
| Non-owning immutable view type for matrix. | |
| template<StorageOrder O = column_major> | |
| using | mut_view = batmat::matrix::View<value_type, index_t, vl_t, index_t, index_t, O> |
| Non-owning mutable view type for matrix. | |
| using | layer_stride = batmat::matrix::DefaultStride |
| template<StorageOrder O = column_major> | |
| using | batch_view = batmat::matrix::View<const value_type, index_t, vl_t, vl_t, layer_stride, O> |
| Non-owning immutable view type for a single batch of v matrices. | |
| template<StorageOrder O = column_major> | |
| using | mut_batch_view = batmat::matrix::View<value_type, index_t, vl_t, vl_t, layer_stride, O> |
| Non-owning mutable view type for a single batch of v matrices. | |
| static constexpr auto | default_order = DefaultOrder |
| Default storage order for most matrices. | |
| static constexpr auto | column_major = StorageOrder::ColMajor |
| Column-major storage order for column vectors and update matrices. | |
Problem dimensions | |
| const index_t | block_size |
| Block size of the block-tridiagonal system. | |
| const index_t | max_rank = 0 |
| Maximum update rank. | |
| bool | circular = false |
| Whether the block-tridiagonal system is circular (nonzero top-right & bottom-left corners). | |
Solver parameters | |
| Params | params {} |
| Solver parameters for Tricyqle-specific settings. | |
| Params | get_params () const |
| Get the current solver parameters. | |
| void | update_params (const Params &new_params) |
| Update the solver parameters. | |
Cyclic reduction data structures | |
| matrix< default_order > | cr_L |
| Diagonal blocks of the Cholesky factor of the Schur complement (used during CR). | |
| matrix< default_order > | cr_U |
| Subdiagonal blocks U of the Cholesky factor of the Schur complement (used during CR). | |
| matrix< default_order > | cr_Y |
| Subdiagonal blocks Y of the Cholesky factor of the Schur complement (used during CR). | |
| matrix< column_major > | work_cr |
| Temporary workspace for the CR solve phase. | |
Parallel cyclic reduction data structures | |
| matrix< default_order > | pcr_L |
| Diagonal blocks of the PCR Cholesky factorizations. | |
| matrix< default_order > | pcr_Y |
| Subdiagonal blocks Y of the PCR Cholesky factorizations. | |
| matrix< default_order > | pcr_U |
| Subdiagonal blocks U of the PCR Cholesky factorizations. | |
| matrix< default_order > | pcr_M |
| Workspace to store the diagonal blocks during the PCR factorization. | |
| matrix< column_major > | work_pcg |
| Temporary workspace for CG vectors. | |
Factorization update data structures | |
| std::vector< index_t > | m_update = std::vector<index_t>(p) |
| Update rank (number of changing constraints) per thread. | |
| index_t | m_update_u0 = -1 |
| Update rank from D(0). Negative if D(0) is not handled separately. | |
| matrix< column_major > | work_update_Σ |
| Compressed reprentation of the nonzero diagonal elements of the matrix Σ. | |
| matrix< column_major > | work_update |
| Workspace to store the update matrices Ξ(Υ) for the factorization update. | |
| matrix< column_major > | work_hyh |
| Storage for the hyperbolic Householder transformations. | |
| matrix< column_major > | work_update_pcr_Σ |
| Two copies of work_update_Σ for PCR updates. | |
| matrix< column_major > | work_update_pcr_L |
| Update matrices to apply to the diagonal blocks L during PCR updates. | |
| matrix< column_major > | work_update_pcr_UY |
| Update matrices to apply to the subdiagonal blocks U and Y during PCR updates. | |
Low-level PCR factorization and solve routines | |
| static constexpr bool | merge_last_level_pcr = true |
| void | factor_pcr () |
| Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v. | |
| template<index_t Level> | |
| void | factor_pcr_level () |
| Perform a single level of the PCR factorization. | |
| void | factor_pcr_parallel (Context &ctx) |
| Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v. | |
| template<index_t Level> | |
| void | factor_pcr_level_parallel (Context &ctx) |
| Perform a single level of the PCR factorization. | |
| void | solve_pcr (mut_batch_view<> λ, mut_batch_view<> work_pcr) const |
| Solve a linear system with the final block tridiagonal system of size v using the PCR factorization. | |
| void | solve_pcr (mut_batch_view<> λ) |
| Solve a linear system with the final block tridiagonal system of size v using the PCR factorization. | |
| template<index_t Level> | |
| void | solve_pcr_level (mut_batch_view<> λ, mut_batch_view<> work_pcr) const |
| Perform a single level of the PCR solve. | |
Indexing utilities | |
| index_t | ν2 (index_t i) const |
| 2-adic valuation ν₂. | |
| index_t | ν2p (index_t i) const |
| 2-adic valuation modulo p, i.e. ν2p(0) = ν2p(p) = lp(). | |
| index_t | add_wrap_ceil_p (index_t a, index_t b) const |
Add b to a modulo ceil_p(). | |
| index_t | sub_wrap_ceil_p (index_t a, index_t b) const |
Subtract b from a modulo ceil_p(). | |
Factorization and solve routines | |
| decltype(auto) | init_diag (Context &ctx, auto &&func) |
| Initialize the diagonal blocks M of the block tridiagonal system using a user-provided function. | |
| decltype(auto) | init_subdiag (Context &ctx, auto &&func) |
| Initialize the subdiagonal blocks K of the block tridiagonal system using a user-provided function. | |
| decltype(auto) | init_rhs (Context &ctx, mut_view<> b, auto &&func) const |
| Initialize the right-hand side of the linear system using a user-provided function. | |
| decltype(auto) | get_solution (Context &ctx, view<> λ, auto &&func) const |
| Get access to the solution computed by this thread using a user-provided function. | |
| decltype(auto) | get_solution (Context &ctx, mut_view<> λ, auto &&func) const |
| Get access to the solution computed by this thread using a user-provided function. | |
| template<class Λ, class F> | |
| decltype(auto) | get_solution (Context &ctx, Λ &&λ, F &&func) const |
| Get access to the solution computed by this thread using a user-provided function. | |
| void | factor_solve (Context &ctx, mut_view<> λ, index_t stride=1) |
| Fused factorization and forward solve. | |
| void | factor (Context &ctx) |
| Perform only the factorization as described by factor_solve. | |
| void | solve_forward (Context &ctx, mut_view<> λ, index_t stride=1) |
| Perform only the forward solve as described by factor_solve. | |
| void | solve_reverse (Context &ctx, mut_view<> λ, mut_view<> work, index_t stride=1) const |
| Perform the backward solve phase, after the forward solve phase has been performed by factor_solve. | |
| void | solve_reverse (Context &ctx, mut_view<> λ, index_t stride=1) |
Low-level factorization and solve routines | |
| template<bool Factor = true, bool Solve = true> | |
| void | factor_solve_skip_first (Context &ctx, mut_view<> λ, index_t stride=1) |
| Fused factorization and forward solve. | |
| void | factor_skip_first (Context &ctx) |
| Factorization-only variant of factor_solve_skip_first. | |
| void | solve_forward_skip_first (Context &ctx, mut_view<> λ, index_t stride=1) |
| Solution-only variant of factor_solve_skip_first. | |
| template<bool Factor = true, bool Solve = true> | |
| void | factor_solve_impl (Context &ctx, mut_view<> λ, index_t stride=1) |
| Implementation of factor_solve. | |
Low-level CR factorization and solve routines | |
| index_t | cr_thread_assignment (index_t l, index_t c) const |
| Adjust thread assignment for non-power-of-two p: The diagonal blocks M(⌊p/2⌋2) are usually mapped to increasing thread indices c as the CR level l increases, as can be seen in the functions above, where iY = c + 1 - 2^l, and from the way the path of M nodes curves to the right in the thread assignment diagram in the paper. | |
| void | factor_U (index_t l, index_t iU) |
Compute a block U in the Cholesky factor for the given CR level l and column index iU. | |
| void | factor_Y (index_t l, index_t iY) |
Compute a block Y in the Cholesky factor for the given CR level l and column index iY. | |
| void | factor_L (index_t l, index_t i) |
Update and factorize a block L in the Cholesky factor for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l. | |
| void | update_K (index_t l, index_t i) |
Compute a subdiagonal block K of the Schur complement for CR level l+1 and column index i, using the previously computed blocks U and Y in the same row at level l. | |
| void | solve_u_forward (index_t l, index_t iU, mut_view<> λ, index_t stride) const |
Update the right-hand side λ during the forward solve phase of CR after computing block iU of λ at level l, subtracting the product U(iU) λ(iU) from the block of λ in the same row as U(iU). | |
| void | solve_y_forward (index_t l, index_t iY, mut_view<> λ, mut_view<> w, index_t stride) const |
Update the right-hand side λ during the forward solve phase of CR after computing block iY of λ at level l, subtracting the product Y(iY) λ(iY) from the block of λ in the same row as Y(iY). | |
| void | solve_λ_forward (index_t l, index_t iL, mut_view<> λ, view<> w, index_t stride) const |
Apply the updates to block iL of the right-hand side from solve_u_forward and solve_y_forward, and then solve with the diagonal block L for the next level l+1. | |
Low-level PCG routines | |
| value_type | mul_Mv (batch_view<> p, mut_batch_view<> Mp, batch_view< default_order > L, batch_view< default_order > K) const |
| Multiply a vector by the final block tridiagonal matrix of size v. | |
| value_type | mul_precond (batch_view<> r, mut_batch_view<> z, mut_batch_view<> w, batch_view< default_order > L, batch_view< default_order > K) const |
| Multiply a vector by the preconditioner for the final block tridiagonal system of size v. | |
| void | solve_pcg (mut_batch_view<> λ, mut_batch_view<> work_pcg) const |
| Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method. | |
| void | solve_pcg (mut_batch_view<> λ) |
| Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method. | |
Low-level reverse solve routines | |
| void | solve_reverse_parallel (Context &ctx, mut_view<> λ, mut_view<> work, index_t stride) const |
| [Cyqlone solve CR] | |
| void | solve_reverse_serial (mut_view<> λ, mut_view<> work, index_t stride) const |
| [Cyqlone solve CR] | |
| void | solve_u_backward (index_t l, index_t iU, mut_view<> λ, mut_view<> w, index_t stride) const |
| void | solve_y_backward (index_t l, index_t iY, mut_view<> λ, index_t stride) const |
| void | solve_λ_backward (index_t biL, mut_view<> λ, view<> w, index_t stride) const |
Low-level factorization update routines | |
| void | set_thread_update_rank (Context &ctx, index_t c, index_t m) |
| [Cyqlone update Riccati] | |
| void | set_update_rank_extra (index_t m) |
| void | clear_update_rank_extra () |
| std::pair< index_t, index_t > | cols_Ups_fwd (index_t l, index_t i) const |
| std::pair< index_t, index_t > | cols_Ups_bwd (index_t l, index_t i) const |
| std::pair< index_t, index_t > | cols_Q_cr (index_t l, index_t i) const |
| index_t | work_Ups_fwd_w (index_t l, index_t i) const |
| index_t | work_Ups_bwd_w (index_t l, index_t i) const |
| mut_batch_view< column_major > | work_Ups_fwd (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Ups_bwd (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Q_cr (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Σ_fwd (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Σ_bwd (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Σ_Q (index_t l, index_t i) |
| mut_batch_view< column_major > | work_Ups_fwd_last () |
| mut_batch_view< column_major > | work_Ups_bwd_last () |
| mut_batch_view< column_major > | work_Σ_fwd_last () |
| mut_batch_view< column_major > | work_Σ_bwd_last () |
| mut_batch_view< column_major > | work_Ups_extra () |
| mut_batch_view< column_major > | work_Σ_extra () |
| template<bool Solve = true> | |
| void | update_solve_cr (Context &ctx, mut_view<> λ, index_t stride) |
| [Cyqlone update CR] | |
| void | update_L (index_t l, index_t i) |
| [Cyqlone update CR helper] | |
| void | update_U (index_t l, index_t i) |
| void | update_Y (index_t l, index_t i) |
| template<index_t Level> | |
| void | update_pcr_level (index_t m, mut_batch_view<> WYU, mut_batch_view<> WΣ) |
| void | update_pcr (batch_view<> fwd, batch_view<> bwd, batch_view<> Σ) |
| [Cyqlone update CR helper] | |
Low-level prefetching | |
| template<StorageOrder O> | |
| void | prefetch (batch_view< O > X) const |
| [Cyqlone solve CR helper] | |
| template<StorageOrder O> | |
| void | prefetch_L (batch_view< O > X) const |
| void | prefetch_L (index_t bi) const |
| void | prefetch_U (index_t l, index_t iU) const |
| void | prefetch_Y (index_t l, index_t iY) const |
Public Types | |
| using | value_type = T |
| using | Params = TricyqleParams<value_type> |
| using | Context = Ctx |
| using | SharedContext = typename Context::shared_context_type |
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::value_type = T |
Definition at line 67 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::Params = TricyqleParams<value_type> |
Definition at line 68 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::Context = Ctx |
Definition at line 69 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::SharedContext = typename Context::shared_context_type |
Definition at line 70 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::simd = batmat::datapar::deduced_simd<value_type, v> |
Represents a SIMD vector of width v storing values of type value_type.
Definition at line 114 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::vl_t = std::integral_constant<index_t, v> |
Integral constant type for the vector length.
Definition at line 116 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::align_t = std::integral_constant<index_t, v * alignof(value_type)> |
Integral constant type for the alignment of the batched matrix data structures.
Definition at line 118 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::matrix = batmat::matrix::Matrix<value_type, index_t, vl_t, index_t, O, align_t> |
Owning type for a batch of matrices (with batch size v).
Definition at line 152 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::view = batmat::matrix::View<const value_type, index_t, vl_t, index_t, index_t, O> |
Non-owning immutable view type for matrix.
Definition at line 155 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mut_view = batmat::matrix::View<value_type, index_t, vl_t, index_t, index_t, O> |
Non-owning mutable view type for matrix.
Definition at line 158 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::layer_stride = batmat::matrix::DefaultStride |
Definition at line 159 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::batch_view = batmat::matrix::View<const value_type, index_t, vl_t, vl_t, layer_stride, O> |
Non-owning immutable view type for a single batch of v matrices.
Definition at line 162 of file cyqlone.hpp.
| using cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mut_batch_view = batmat::matrix::View<value_type, index_t, vl_t, vl_t, layer_stride, O> |
Non-owning mutable view type for a single batch of v matrices.
Definition at line 165 of file cyqlone.hpp.
|
inlinenodiscard |
Get the current solver parameters.
Definition at line 90 of file cyqlone.hpp.
|
inline |
Update the solver parameters.
Definition at line 93 of file cyqlone.hpp.
|
inlinenodiscardconstexpr |
log₂(p), logarithm of the number of processors/threads p, rounded up.
Definition at line 105 of file cyqlone.hpp.
|
inlinenodiscardconstexpr |
The number of processors p rounded up to the next power of two.
Definition at line 107 of file cyqlone.hpp.
|
inlinenodiscardconstexpr |
The number of parallel execution units P = p * v rounded up to the next power of two.
Definition at line 109 of file cyqlone.hpp.
|
inlinestaticnodiscardconstexpr |
log₂(v), logarithm of the vector length v.
Definition at line 111 of file cyqlone.hpp.
|
inline |
Create a new parallel execution context, storing synchronization primitives and shared data for the parallel algorithms.
Definition at line 122 of file cyqlone.hpp.
|
nodiscard |
2-adic valuation ν₂.
Definition at line 30 of file indexing.tpp.
|
nodiscard |
2-adic valuation modulo p, i.e. ν2p(0) = ν2p(p) = lp().
Definition at line 36 of file indexing.tpp.
|
nodiscard |
Add b to a modulo ceil_p().
Definition at line 19 of file indexing.tpp.
|
nodiscard |
Subtract b from a modulo ceil_p().
Definition at line 8 of file indexing.tpp.
|
inline |
Initialize the diagonal blocks M of the block tridiagonal system using a user-provided function.
The function is called with an index i in [0, p) and a mutable view to the compact workspace of depth v where the diagonal blocks M(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this workspace can be achieved using the cyqlone::linalg::pack function.
Definition at line 177 of file cyqlone.hpp.
|
inline |
Initialize the subdiagonal blocks K of the block tridiagonal system using a user-provided function.
The function is called with an index i in [0, p) and a mutable view to the compact workspace of depth v where the subdiagonal blocks K(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this workspace can be achieved using the cyqlone::linalg::pack function.
Definition at line 185 of file cyqlone.hpp.
|
inline |
Initialize the right-hand side of the linear system using a user-provided function.
The function is called with an index i in [0, p) and a mutable view of the batch of b of depth v where the right-hand side blocks b(i+l*p) for l in [0, v) should be stored. Copying non-compact data to this batch can be achieved using the cyqlone::linalg::pack function.
Definition at line 194 of file cyqlone.hpp.
|
inline |
Get access to the solution computed by this thread using a user-provided function.
The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.
Definition at line 202 of file cyqlone.hpp.
|
inline |
Get access to the solution computed by this thread using a user-provided function.
The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.
Definition at line 206 of file cyqlone.hpp.
|
inline |
Get access to the solution computed by this thread using a user-provided function.
The function is called with an index i in [0, p) and a view to the compact batch of λ of depth v where the solution blocks λ(i+l*p) for l in [0, v) are stored. Copying this batch to a non-compact layout can be achieved using the cyqlone::linalg::unpack function.
Definition at line 211 of file cyqlone.hpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| index_t | stride = 1 ) |
Fused factorization and forward solve.
[Cyqlone factor Schur]
Performs CR to reduce the system down to a single block tridiagonal system of size v which is then solved using PCR or PCG.
| ctx | Parallel execution context for communication/synchronization between threads. |
| λ | On entry, the right-hand side of the linear system. On exit, the solution of the forward solve phase. |
| stride | Stride (in number of batches) between batches of λ. In total, λ contains stride * p batches, but only every stride-th batch is accessed. |
Note that a backward solve of the final block λ.batch(0) is performed during the forward solve phase, so λ.batch(0) contains (part of) the final solution. Performing the forward and backward solves separately for this block is not possible, because it is solved using PCR or PCG. Both methods solve the full block tridiagonal system at once, rather than using a single explicit CR Cholesky factorization with distinct forward and backward solves.
Definition at line 117 of file factor.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor | ( | Context & | ctx | ) |
Perform only the factorization as described by factor_solve.
Definition at line 122 of file factor.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_forward | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| index_t | stride = 1 ) |
Perform only the forward solve as described by factor_solve.
Definition at line 126 of file factor.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| mut_view<> | work, | ||
| index_t | stride = 1 ) const |
Perform the backward solve phase, after the forward solve phase has been performed by factor_solve.
| ctx | Parallel execution context for communication/synchronization between threads. |
| λ | On entry, the solution of the forward solve phase. On exit, the solution of the full linear system. |
| work | Workspace of p * v column vectors of size block_size. |
| stride | Stride (in number of batches) between batches of λ. In total, λ contains stride * p batches, but only every stride-th batch is accessed. |
Definition at line 164 of file factor.tpp.
|
inline |
Definition at line 261 of file cyqlone.hpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve_skip_first | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| index_t | stride = 1 ) |
Fused factorization and forward solve.
[Cyqlone factor Schur]
Unlike factor_solve, this function assumes that the odd diagonal blocks in the first level of CR have already been factorized and solved. This allows the user to fuse the evaluation and factorization of these blocks, possibly enabling higher performance by avoiding an additional trip to memory.
Definition at line 48 of file factor.tpp.
|
inline |
Factorization-only variant of factor_solve_skip_first.
Definition at line 380 of file cyqlone.hpp.
|
inline |
Solution-only variant of factor_solve_skip_first.
Definition at line 382 of file cyqlone.hpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_solve_impl | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| index_t | stride = 1 ) |
Implementation of factor_solve.
[Cyqlone factorization and fused forward solve]
Definition at line 29 of file factor.tpp.
|
nodiscard |
Adjust thread assignment for non-power-of-two p: The diagonal blocks M(⌊p/2⌋2) are usually mapped to increasing thread indices c as the CR level l increases, as can be seen in the functions above, where iY = c + 1 - 2^l, and from the way the path of M nodes curves to the right in the thread assignment diagram in the paper.
However, these large thread indices are not actually present if p is not a power of two, so we need to remap them, undoing the offset 1 - 2^l. We always assign the last M evaluation to the even thread ⌊p/2⌋2, since this thread is present even if p is odd. The odd thread ⌊p/2⌋2+1 is assigned an inactive index, since it never has any work during CR, as there is no coupling between the last and first stages (at least not in the scalar case).
Definition at line 277 of file factor.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_U | ( | index_t | l, |
| index_t | iU ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_Y | ( | index_t | l, |
| index_t | iY ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_L | ( | index_t | l, |
| index_t | i ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_K | ( | index_t | l, |
| index_t | i ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_u_forward | ( | index_t | l, |
| index_t | iU, | ||
| mut_view<> | λ, | ||
| index_t | stride ) const |
Update the right-hand side λ during the forward solve phase of CR after computing block iU of λ at level l, subtracting the product U(iU) λ(iU) from the block of λ in the same row as U(iU).
[Cyqlone factor CR helper]
The stride parameter specifies the stride between consecutive blocks of λ.
[Cyqlone solve CR helper]
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_y_forward | ( | index_t | l, |
| index_t | iY, | ||
| mut_view<> | λ, | ||
| mut_view<> | w, | ||
| index_t | stride ) const |
Update the right-hand side λ during the forward solve phase of CR after computing block iY of λ at level l, subtracting the product Y(iY) λ(iY) from the block of λ in the same row as Y(iY).
The product Y(iY) λ(iY) is stored in the workspace w to allow it to be computed concurrently with solve_u_forward, which updates the same block of λ. The stride parameter specifies the stride between consecutive blocks of λ.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_λ_forward | ( | index_t | l, |
| index_t | iL, | ||
| mut_view<> | λ, | ||
| view<> | w, | ||
| index_t | stride ) const |
Apply the updates to block iL of the right-hand side from solve_u_forward and solve_y_forward, and then solve with the diagonal block L for the next level l+1.
The stride parameter specifies the stride between consecutive blocks of λ.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr | ( | ) |
Compute the parallel cyclic reduction factorization of the final block tridiagonal system of size v.
[PCR factor serial]
The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0). This variant computes both subdiagonal blocks on a single thread.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_level | ( | ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_parallel | ( | Context & | ctx | ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::factor_pcr_level_parallel | ( | Context & | ctx | ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcr | ( | mut_batch_view<> | λ, |
| mut_batch_view<> | work_pcr ) const |
|
inline |
Solve a linear system with the final block tridiagonal system of size v using the PCR factorization.
[PCR factor]
[Cyqlone solve PCR]
Definition at line 452 of file cyqlone.hpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcr_level | ( | mut_batch_view<> | λ, |
| mut_batch_view<> | work_pcr ) const |
| auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mul_Mv | ( | batch_view<> | p, |
| mut_batch_view<> | Mp, | ||
| batch_view< default_order > | L, | ||
| batch_view< default_order > | K ) const |
Multiply a vector by the final block tridiagonal matrix of size v.
The matrix is represented by the Cholesky factors of the diagonal blocks L and the subdiagonal blocks K.
| auto cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::mul_precond | ( | batch_view<> | r, |
| mut_batch_view<> | z, | ||
| mut_batch_view<> | w, | ||
| batch_view< default_order > | L, | ||
| batch_view< default_order > | K ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_pcg | ( | mut_batch_view<> | λ, |
| mut_batch_view<> | work_pcg ) const |
Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.
The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0).
|
inline |
Solve a linear system with the final block tridiagonal system of size v using the preconditioned conjugate gradient method.
The function assumes that the Cholesky factors of the diagonal blocks are stored in cr_L.batch(0), and the subdiagonal blocks are stored in cr_Y.batch(0).
Definition at line 484 of file cyqlone.hpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse_parallel | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| mut_view<> | work, | ||
| index_t | stride ) const |
[Cyqlone solve CR]
Definition at line 179 of file factor.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_reverse_serial | ( | mut_view<> | λ, |
| mut_view<> | work, | ||
| index_t | stride ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_u_backward | ( | index_t | l, |
| index_t | iU, | ||
| mut_view<> | λ, | ||
| mut_view<> | w, | ||
| index_t | stride ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_y_backward | ( | index_t | l, |
| index_t | iY, | ||
| mut_view<> | λ, | ||
| index_t | stride ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::solve_λ_backward | ( | index_t | biL, |
| mut_view<> | λ, | ||
| view<> | w, | ||
| index_t | stride ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::set_thread_update_rank | ( | Context & | ctx, |
| index_t | c, | ||
| index_t | m ) |
[Cyqlone update Riccati]
Definition at line 539 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::set_update_rank_extra | ( | index_t | m | ) |
Definition at line 547 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::clear_update_rank_extra | ( | ) |
Definition at line 552 of file update.tpp.
|
nodiscard |
Definition at line 558 of file update.tpp.
|
nodiscard |
Definition at line 573 of file update.tpp.
|
nodiscard |
Definition at line 588 of file update.tpp.
|
nodiscard |
Definition at line 593 of file update.tpp.
|
nodiscard |
Definition at line 604 of file update.tpp.
|
nodiscard |
Definition at line 612 of file update.tpp.
|
nodiscard |
Definition at line 620 of file update.tpp.
|
nodiscard |
Definition at line 628 of file update.tpp.
|
nodiscard |
Definition at line 636 of file update.tpp.
|
nodiscard |
Definition at line 643 of file update.tpp.
|
nodiscard |
Definition at line 650 of file update.tpp.
|
nodiscard |
Definition at line 657 of file update.tpp.
|
nodiscard |
Definition at line 668 of file update.tpp.
|
nodiscard |
Definition at line 679 of file update.tpp.
|
nodiscard |
Definition at line 689 of file update.tpp.
|
nodiscard |
Definition at line 699 of file update.tpp.
|
nodiscard |
Definition at line 706 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_solve_cr | ( | Context & | ctx, |
| mut_view<> | λ, | ||
| index_t | stride ) |
[Cyqlone update CR]
Definition at line 297 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_L | ( | index_t | l, |
| index_t | i ) |
[Cyqlone update CR helper]
Definition at line 23 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_U | ( | index_t | l, |
| index_t | i ) |
Definition at line 123 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_Y | ( | index_t | l, |
| index_t | i ) |
Definition at line 157 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_pcr_level | ( | index_t | m, |
| mut_batch_view<> | WYU, | ||
| mut_batch_view<> | WΣ ) |
Definition at line 198 of file update.tpp.
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::update_pcr | ( | batch_view<> | fwd, |
| batch_view<> | bwd, | ||
| batch_view<> | Σbwd ) |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch | ( | batch_view< O > | X | ) | const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_L | ( | batch_view< O > | X | ) | const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_L | ( | index_t | bi | ) | const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_U | ( | index_t | l, |
| index_t | iU ) const |
| void cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::prefetch_Y | ( | index_t | l, |
| index_t | iY ) const |
| const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::block_size |
Block size of the block-tridiagonal system.
Definition at line 75 of file cyqlone.hpp.
| const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::max_rank = 0 |
Maximum update rank.
Definition at line 76 of file cyqlone.hpp.
| bool cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::circular = false |
Whether the block-tridiagonal system is circular (nonzero top-right & bottom-left corners).
Definition at line 79 of file cyqlone.hpp.
| Params cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::params {} |
Solver parameters for Tricyqle-specific settings.
Definition at line 87 of file cyqlone.hpp.
| const index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::p = 8 |
Number of processors/threads.
Definition at line 101 of file cyqlone.hpp.
|
staticconstexpr |
Vector length.
Definition at line 103 of file cyqlone.hpp.
|
staticconstexpr |
Default storage order for most matrices.
Definition at line 146 of file cyqlone.hpp.
|
staticconstexpr |
Column-major storage order for column vectors and update matrices.
Definition at line 148 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_L |
Diagonal blocks of the Cholesky factor of the Schur complement (used during CR).
Batch indices correspond to block column indices of the block tridiagonal system.
Definition at line 272 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_U |
Subdiagonal blocks U of the Cholesky factor of the Schur complement (used during CR).
Batch indices correspond to block column indices of the block tridiagonal system.
Definition at line 277 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::cr_Y |
Subdiagonal blocks Y of the Cholesky factor of the Schur complement (used during CR).
Batch indices correspond to block column indices of the block tridiagonal system.
Definition at line 282 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_cr |
Temporary workspace for the CR solve phase.
Definition at line 286 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_L |
Diagonal blocks of the PCR Cholesky factorizations.
Definition at line 296 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_Y |
Subdiagonal blocks Y of the PCR Cholesky factorizations.
Definition at line 301 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_U |
Subdiagonal blocks U of the PCR Cholesky factorizations.
Definition at line 305 of file cyqlone.hpp.
| matrix<default_order> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::pcr_M |
Workspace to store the diagonal blocks during the PCR factorization.
Definition at line 309 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_pcg |
Temporary workspace for CG vectors.
Definition at line 313 of file cyqlone.hpp.
| std::vector<index_t> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::m_update = std::vector<index_t>(p) |
Update rank (number of changing constraints) per thread.
Definition at line 323 of file cyqlone.hpp.
| index_t cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::m_update_u0 = -1 |
Update rank from D(0). Negative if D(0) is not handled separately.
Definition at line 325 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_Σ |
Compressed reprentation of the nonzero diagonal elements of the matrix Σ.
Definition at line 328 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update |
Workspace to store the update matrices Ξ(Υ) for the factorization update.
Definition at line 332 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_hyh |
Storage for the hyperbolic Householder transformations.
Definition at line 336 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_Σ |
Two copies of work_update_Σ for PCR updates.
Definition at line 343 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_L |
Update matrices to apply to the diagonal blocks L during PCR updates.
Definition at line 347 of file cyqlone.hpp.
| matrix<column_major> cyqlone::TricyqleSolver< VL, T, DefaultOrder, Ctx >::work_update_pcr_UY |
Update matrices to apply to the subdiagonal blocks U and Y during PCR updates.
Definition at line 351 of file cyqlone.hpp.
|
staticconstexpr |
Definition at line 428 of file cyqlone.hpp.