template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
class cyqlone::TreeBarrier< CompletionFn, PhaseType >
Fairly vanilla combining tree barrier.
It is inspired by GCC 15.2's __tree_barrier, with some important API differences:
- Every thread has a unique thread ID in [0, expected-1]. This eliminates the need for hashing the pthread thread IDs and for the inner search loop to find free slots in the tree.
- wait() spins for a given number of iterations before falling back to a futex-based atomic wait.
- The barrier phase is exposed to the user.
- Custom completion functions can be provided at arrival time.
- Reductions and broadcasts on small values are supported.
Definition at line 46 of file barrier.hpp.
- TreeBarrier(uint32_t expected, CompletionFn completion): Create a barrier with expected participating threads and a completion function that is called by the last thread that arrives at each phase.
- TreeBarrier(const TreeBarrier &) = delete
- TreeBarrier(TreeBarrier &&) = default
- TreeBarrier &operator=(const TreeBarrier &) = delete
- TreeBarrier &operator=(TreeBarrier &&) = default
- template<class C> arrival_token arrive_with_completion(uint32_t thread_id, C &&custom_completion): Arrive at the barrier with a custom completion function that is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads.
- arrival_token arrive(uint32_t thread_id): Arrive at the barrier.
- arrival_token arrive(uint32_t thread_id, int line): Arrive at the barrier, recording the given line number for sanity checking to make sure that all threads arrive from the same line or statement in the source code.
- BarrierPhase current_phase() const: Query the current barrier phase.
- bool wait_may_block(const arrival_token &token) const noexcept: Check whether wait() may block.
- void wait(arrival_token &&token) const: Wait for the barrier to complete after an arrival, using the given token.
- void arrive_and_wait(uint32_t thread_id): Convenience function to arrive and wait in a single call.
- void arrive_and_wait(uint32_t thread_id, int line): Convenience function to arrive and wait in a single call (with optional sanity check).
- template<class C> void arrive_and_wait_with_completion(uint32_t thread_id, C &&custom_completion): Convenience function to arrive and wait in a single call (with custom completion).
- template<class C> auto arrive_and_wait_with_completion(uint32_t thread_id, C &&custom_completion): Convenience function to arrive and wait in a single call (with custom completion), broadcasting the return value of the custom completion function to all threads.
- template<class T, class F> arrival_token_typed<T> arrive_reduce(uint32_t thread_id, T x, F reduce): Combining tree reduction across all threads.
- template<class T> T wait_reduce(arrival_token_typed<T> &&token): Wait for the result of an arrive_reduce call and obtain the reduced value.
- template<class T, class F> T reduce(uint32_t thread_id, T x, F reduce): Combining tree reduction across all threads.
- template<class T> T broadcast(uint32_t thread_id, T &&x, uint32_t src = 0): Broadcast a value from the source thread to all other threads.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Combining tree arrival.
The last thread to arrive at a given ticket (counter) moves on to the next level of the tree; the thread that reaches the root returns true. The number of tickets halves at each level, with at most two threads per ticket.
Definition at line 128 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
Fused implementation of the combining tree arrival and a reduction operation.
The last thread to arrive at a given ticket (counter) moves on to the next level of the tree. When it does so, it reads the value written by the other thread that arrived at the same ticket, applies the reduction function, and writes the result for use at the next level. The thread that reaches the root stores the final value and returns true. Note that the left and right arguments to the reduction function are determined by the thread IDs, regardless of the order in which threads arrive. In other words, for a given number of threads, the order of the reduction operations is fully deterministic.
Definition at line 159 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class A, class C>
Generic implementation of arrive with custom completion function.
The arrival function should return true when the thread is the last to arrive at the root of the tree. Returns a token that can be used to wait for the barrier to complete. The custom completion function is called by the last thread arriving at the root, before advancing the barrier phase and notifying all waiting threads.
Definition at line 202 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class C>
Arrive at the barrier with a custom completion function that is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads.
The completion function of the barrier is not called in this case. Each thread should use a unique thread ID in [0, expected-1].
Definition at line 249 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Arrive at the barrier.
The barrier's completion function is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads. Each thread should use a unique thread ID in [0, expected-1].
Definition at line 259 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Arrive at the barrier, recording the given line number for sanity checking to make sure that all threads arrive from the same line or statement in the source code.
This is useful for debugging purposes to detect mismatched barrier calls, but is not intended for other uses. If CYQLONE_SANITY_CHECKS_BARRIER is disabled, the line number is ignored and this function is equivalent to arrive(uint32_t). Each thread should use a unique thread ID in [0, expected-1].
Definition at line 269 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Check if wait() may block.
If it returns false, the caller can call wait() and it will return immediately without spinning or sleeping. This is useful if the caller has other non-critical work to do while waiting for other threads. Users should still call wait() before arriving again.
Note: This function does not impose any memory ordering, so even when it returns false, changes that other threads made before arriving may not be visible yet. In contrast, wait() does ensure proper synchronization.
Definition at line 300 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Wait for the barrier to complete after an arrival, using the given token.
Separating the arrival and wait phases allows for overlapping computation with waiting, hiding the synchronization latency. Waiting on the same token multiple times is not allowed.
Definition at line 308 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class C>
auto cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_and_wait_with_completion(uint32_t thread_id, C &&custom_completion)
inline, nodiscard
Convenience function to arrive and wait in a single call (with custom completion).
Broadcasts the return value of the custom completion function to all threads.
Definition at line 336 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
Combining tree reduction across all threads.
The order in which the reduction function is applied is deterministic for a given number of threads.
Definition at line 348 of file barrier.hpp.
template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
Combining tree reduction across all threads.
The order in which the reduction function is applied is deterministic for a given number of threads.
Definition at line 365 of file barrier.hpp.