cyqlone
develop
Fast, parallel and vectorized solver for linear systems with optimal control structure.
cyqlone::parallel::Context< SC >
#include <cyqlone/parallel.hpp>
Thread context for parallel execution.
Each thread has a unique thread index, and can synchronize and communicate with other threads in the same shared context.
Definition at line 64 of file parallel.hpp.
Public Types | |
| using | shared_context_type = SC |
| using | arrival_token = typename shared_context_type::barrier_type::arrival_token |
Public Member Functions | |
| bool | is_master () const |
| Check if this thread is the master thread (thread index 0). | |
| arrival_token | arrive () |
| Arrive at the barrier and obtain a token that can be used to wait for completion of the current barrier phase. | |
| void | wait (arrival_token &&token) |
| Await a token returned by arrive(), waiting for the barrier phase to complete. | |
| void | arrive_and_wait () |
| Arrive at the barrier and wait for the barrier phase to complete. | |
| void | arrive_and_wait (int line) |
| Debug version of arrive_and_wait() that performs a sanity check to ensure that all threads are arriving at the same line of code. | |
| template<class T> | |
| T | broadcast (T x, index_t src=0) |
| Broadcast a value x from the thread with index src to all threads. | |
| template<class F, class... Args> | |
| auto | call_broadcast (F &&f, Args &&...args) -> std::invoke_result_t< F, Args... > |
| Call a function f with the given args on a single thread and broadcast the return value to all threads. | |
| template<class T, class F> | |
| auto | arrive_reduce (T x, F func) |
| Perform a reduction of x across all threads using the given binary function func. | |
| template<class T> | |
| T | wait_reduce (shared_context_type::barrier_type::template arrival_token_typed< T > &&token) |
| Wait for the reduction initiated by arrive_reduce() to complete and obtain the reduced value. | |
| template<class T, class F> | |
| T | reduce (T x, F func) |
| Perform a reduction of x across all threads using the given binary function func, and wait for the result. | |
| template<class T> | |
| T | reduce (T x) |
| Reduction with std::plus, i.e., summation across all threads. | |
| template<class F> | |
| void | run_single_sync (F &&f) |
| Wait for all threads to reach this point, then run the given function on a single thread before releasing all threads again. | |
Public Attributes | |
| shared_context_type & | shared |
| const index_t | index |
| const index_t | num_thr = shared.num_thr |
Friends | |
| constexpr bool | operator== (const Context &a, const Context &b) |
| using cyqlone::parallel::Context< SC >::shared_context_type = SC |
Definition at line 65 of file parallel.hpp.
| using cyqlone::parallel::Context< SC >::arrival_token = typename shared_context_type::barrier_type::arrival_token |
Definition at line 73 of file parallel.hpp.
| bool cyqlone::parallel::Context< SC >::is_master () const | inline, nodiscard |
Check if this thread is the master thread (thread index 0).
Useful for selecting the one thread that performs operations such as printing to the console, which should be done by a single thread and do not require synchronization.
Definition at line 86 of file parallel.hpp.
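The master-only pattern described above can be sketched without the library. The `ToyContext` struct and `count_master_actions` function below are hypothetical stand-ins written for this illustration; they are not part of cyqlone and only mimic the "thread index 0 is the master" rule stated in the documentation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread context; only the thread index matters here.
struct ToyContext {
    int index;
    bool is_master() const { return index == 0; }
};

// Launch num_thr threads and count how often the master-only branch runs.
int count_master_actions(int num_thr) {
    std::atomic<int> count{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < num_thr; ++i)
        threads.emplace_back([&count, i] {
            ToyContext ctx{i};
            if (ctx.is_master())
                ++count; // stands in for e.g. printing a status line once
        });
    for (auto &t : threads)
        t.join();
    return count.load();
}
```

Regardless of the number of threads, the guarded branch runs exactly once, which is the property that makes the check suitable for logging.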
| arrival_token cyqlone::parallel::Context< SC >::arrive () | inline, nodiscard |
Arrive at the barrier and obtain a token that can be used to wait for completion of the current barrier phase.
Definition at line 91 of file parallel.hpp.
| void cyqlone::parallel::Context< SC >::wait (arrival_token &&token) | inline |
Await a token returned by arrive(), waiting for the barrier phase to complete.
Definition at line 100 of file parallel.hpp.
| void cyqlone::parallel::Context< SC >::arrive_and_wait () | inline |
Arrive at the barrier and wait for the barrier phase to complete.
This is a convenience wrapper around arrive() and wait() for the common case where the thread does not have other work to do while waiting.
Definition at line 112 of file parallel.hpp.
| void cyqlone::parallel::Context< SC >::arrive_and_wait (int line) | inline |
Debug version of arrive_and_wait() that performs a sanity check to ensure that all threads are arriving at the same line of code.
The line parameter should be the same for all threads arriving at the same barrier. It is only verified in debug builds; in release builds, this function is equivalent to arrive_and_wait().
Definition at line 122 of file parallel.hpp.
| template<class T> T cyqlone::parallel::Context< SC >::broadcast (T x, index_t src=0) | inline |
Broadcast a value x from the thread with index src to all threads.
Definition at line 131 of file parallel.hpp.
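| template<class F, class... Args> auto cyqlone::parallel::Context< SC >::call_broadcast (F &&f, Args &&...args) -> std::invoke_result_t< F, Args... > | inline |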
Call a function f with the given args on a single thread and broadcast the return value to all threads.
Definition at line 139 of file parallel.hpp.
| template<class T, class F> auto cyqlone::parallel::Context< SC >::arrive_reduce (T x, F func) | inline, nodiscard |
Perform a reduction of x across all threads using the given binary function func.
Returns a token that can be used to wait for the reduction to complete and obtain the reduced value.
Definition at line 154 of file parallel.hpp.
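| template<class T> T cyqlone::parallel::Context< SC >::wait_reduce (shared_context_type::barrier_type::template arrival_token_typed< T > &&token) | inline |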
Wait for the reduction initiated by arrive_reduce() to complete and obtain the reduced value.
Definition at line 162 of file parallel.hpp.
| template<class T, class F> T cyqlone::parallel::Context< SC >::reduce (T x, F func) | inline |
Perform a reduction of x across all threads using the given binary function func, and wait for the result.
Definition at line 169 of file parallel.hpp.
| template<class T> T cyqlone::parallel::Context< SC >::reduce (T x) |
Reduction with std::plus, i.e., summation across all threads.
Definition at line 176 of file parallel.hpp.
| template<class F> void cyqlone::parallel::Context< SC >::run_single_sync (F &&f) | inline |
Wait for all threads to reach this point, then run the given function on a single thread before releasing all threads again.
Changes made by any thread before the call are visible to f, and changes made by f are visible to all threads after this function returns.
Definition at line 184 of file parallel.hpp.
| constexpr bool operator== (const Context &a, const Context &b) |
Definition at line 79 of file parallel.hpp.
| shared_context_type& cyqlone::parallel::Context< SC >::shared |
Definition at line 76 of file parallel.hpp.
| const index_t cyqlone::parallel::Context< SC >::index |
Definition at line 77 of file parallel.hpp.
| const index_t cyqlone::parallel::Context< SC >::num_thr = shared.num_thr |
Definition at line 77 of file parallel.hpp.