cyqlone develop
Fast, parallel and vectorized solver for linear systems with optimal control structure.
Loading...
Searching...
No Matches
cyqlone::TreeBarrier< CompletionFn, PhaseType > Class Template Reference

#include <cyqlone/barrier.hpp>

Detailed Description

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
class cyqlone::TreeBarrier< CompletionFn, PhaseType >

Fairly vanilla combining tree barrier.

It is inspired by GCC 15.2's __tree_barrier, with some important API differences:

  • Every thread has a unique thread ID in [0, expected-1]. This eliminates the need for hashing the pthread thread IDs and for the inner search loop to find free slots in the tree.
  • Wait tries to spin for a given number of iterations before falling back to a futex-based atomic wait.
  • The barrier phase is exposed to the user.
  • Custom completion functions can be provided at arrival time.
  • Reductions and broadcasts on small values are supported.

Definition at line 46 of file barrier.hpp.

Classes

class  arrival_token
class  arrival_token_typed
struct  Storage
 Storage for small values used in reductions and broadcasts. More...
struct  State
 Atomic counters for each level of the combining tree. More...

Public Types

enum class  BarrierPhase : PhaseType

Public Member Functions

 TreeBarrier (uint32_t expected, CompletionFn completion)
 Create a barrier with expected participating threads and a completion function that is called by the last thread that arrives at each phase.
 TreeBarrier (const TreeBarrier &)=delete
 TreeBarrier (TreeBarrier &&)=default
TreeBarrieroperator= (const TreeBarrier &)=delete
TreeBarrieroperator= (TreeBarrier &&)=default
template<class C>
arrival_token arrive_with_completion (uint32_t thread_id, C &&custom_completion)
 Arrive at the barrier with a custom completion function that is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads.
arrival_token arrive (uint32_t thread_id)
 Arrive at the barrier.
arrival_token arrive (uint32_t thread_id, int line)
 Arrive at the barrier, recording the given line number for sanity checking to make sure that all threads arrive from the same line or statement in the source code.
BarrierPhase current_phase () const
 Query the current barrier phase.
bool wait_may_block (const arrival_token &token) const noexcept
 Check if wait() may block.
void wait (arrival_token &&token) const
 Wait for the barrier to complete after an arrival, using the given token.
void arrive_and_wait (uint32_t thread_id)
 Convenience function to arrive and wait in a single call.
void arrive_and_wait (uint32_t thread_id, int line)
 Convenience function to arrive and wait in a single call (with optional sanity check).
template<class C>
void arrive_and_wait_with_completion (uint32_t thread_id, C &&custom_completion)
 Convenience function to arrive and wait in a single call (with custom completion).
template<class C>
auto arrive_and_wait_with_completion (uint32_t thread_id, C &&custom_completion)
 Convenience function to arrive and wait in a single call (with custom completion).
template<class T, class F>
arrival_token_typed< T > arrive_reduce (uint32_t thread_id, T x, F reduce)
 Combining tree reduction across all threads.
template<class T>
wait_reduce (arrival_token_typed< T > &&token)
 Wait for the result of an arrive_reduce call and obtain the reduced value.
template<class T, class F>
reduce (uint32_t thread_id, T x, F reduce)
 Combining tree reduction across all threads.
template<class T>
broadcast (uint32_t thread_id, T &&x, uint32_t src=0)
 Broadcast a value from the source thread to all other threads.

Static Public Member Functions

static constexpr uint32_t max ()
 Maximum number of threads supported by this barrier implementation.

Public Attributes

uint32_t spin_count = 1000
 Number of spin iterations before falling back to futex-based wait.

Private Types

using ticket_value_type = typename State::ticket_t::value_type

Private Member Functions

State::ticket_tget_local_phase (uint32_t thread_id) noexcept
State::ticket_tget_local_line (uint32_t thread_id) noexcept
void sanity_check_arrival (uint32_t thread_id, BarrierPhase cur_phase) noexcept
bool arrive_impl (BarrierPhase old_phase, uint32_t thread_id)
 Combining tree arrival.
template<class T, class F>
bool arrive_impl (BarrierPhase old_phase, uint32_t thread_id, T value, F reduce)
 Fused implementation of the combining tree arrival and a reduction operation.
template<class A, class C>
arrival_token arrive_with_completion (uint32_t thread_id, A arrival, C &&custom_completion)
 Generic implementation of arrive with custom completion function.

Private Attributes

uint32_t expected
 Number of participating threads.
std::unique_ptr< State[]> state
 Combining tree state.
std::unique_ptr< Storage[]> storage
 Used for reductions.
Storage broadcast_storage
 Used for broadcasts (including after reductions).
CompletionFn completion
 Called when last thread arrives.
std::atomic< BarrierPhasephase {}

Static Private Attributes

static constexpr size_t cache_line_size = 64

Member Typedef Documentation

◆ ticket_value_type

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
using cyqlone::TreeBarrier< CompletionFn, PhaseType >::ticket_value_type = typename State::ticket_t::value_type
private

Definition at line 102 of file barrier.hpp.

Member Enumeration Documentation

◆ BarrierPhase

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
enum class cyqlone::TreeBarrier::BarrierPhase : PhaseType
strong

Definition at line 48 of file barrier.hpp.

Constructor & Destructor Documentation

◆ TreeBarrier() [1/3]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
cyqlone::TreeBarrier< CompletionFn, PhaseType >::TreeBarrier ( uint32_t expected,
CompletionFn completion )
inline

Create a barrier with expected participating threads and a completion function that is called by the last thread that arrives at each phase.

Definition at line 231 of file barrier.hpp.

◆ TreeBarrier() [2/3]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
cyqlone::TreeBarrier< CompletionFn, PhaseType >::TreeBarrier ( const TreeBarrier< CompletionFn, PhaseType > & )
delete

◆ TreeBarrier() [3/3]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
cyqlone::TreeBarrier< CompletionFn, PhaseType >::TreeBarrier ( TreeBarrier< CompletionFn, PhaseType > && )
default

Member Function Documentation

◆ get_local_phase()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
State::ticket_t & cyqlone::TreeBarrier< CompletionFn, PhaseType >::get_local_phase ( uint32_t thread_id)
inlineprivatenoexcept

Definition at line 112 of file barrier.hpp.

◆ get_local_line()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
State::ticket_t & cyqlone::TreeBarrier< CompletionFn, PhaseType >::get_local_line ( uint32_t thread_id)
inlineprivatenoexcept

Definition at line 115 of file barrier.hpp.

◆ sanity_check_arrival()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
void cyqlone::TreeBarrier< CompletionFn, PhaseType >::sanity_check_arrival ( uint32_t thread_id,
BarrierPhase cur_phase )
inlineprivatenoexcept

Definition at line 118 of file barrier.hpp.

◆ arrive_impl() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
bool cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_impl ( BarrierPhase old_phase,
uint32_t thread_id )
inlineprivate

Combining tree arrival.

The last thread arriving at a certain ticket (counter) moves on to the next level of the tree. When reaching the root, it returns true. The number of tickets halves at each level, with at most two threads per ticket.

Definition at line 128 of file barrier.hpp.

◆ arrive_impl() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
bool cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_impl ( BarrierPhase old_phase,
uint32_t thread_id,
T value,
F reduce )
inlineprivate

Fused implementation of the combining tree arrival and a reduction operation.

The last thread arriving at a certain ticket (counter) moves on to the next level of the tree. When it does so, it reads the value written by the other thread that arrived at the same ticket, applies the reduction function, and writes the result to be used in the next level. When reaching the root, it stores the final value and returns true. Note that the left and right arguments to the reduction function are determined by the thread IDs, regardless of the order in which threads arrive. In other words, for a given number of threads, the order of the reduction operations is fully deterministic.

Definition at line 159 of file barrier.hpp.

◆ arrive_with_completion() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class A, class C>
arrival_token cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_with_completion ( uint32_t thread_id,
A arrival,
C && custom_completion )
inlinenodiscardprivate

Generic implementation of arrive with custom completion function.

The arrival function should return true when the thread is the last to arrive at the root of the tree. Returns a token that can be used to wait for the barrier to complete. The custom completion function is called by the last thread arriving at the root, before advancing the barrier phase and notifying all waiting threads.

Definition at line 202 of file barrier.hpp.

◆ max()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
constexpr uint32_t cyqlone::TreeBarrier< CompletionFn, PhaseType >::max ( )
inlinestaticconstexpr

Maximum number of threads supported by this barrier implementation.

Definition at line 220 of file barrier.hpp.

◆ operator=() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
TreeBarrier & cyqlone::TreeBarrier< CompletionFn, PhaseType >::operator= ( const TreeBarrier< CompletionFn, PhaseType > & )
delete

◆ operator=() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
TreeBarrier & cyqlone::TreeBarrier< CompletionFn, PhaseType >::operator= ( TreeBarrier< CompletionFn, PhaseType > && )
default

◆ arrive_with_completion() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class C>
arrival_token cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_with_completion ( uint32_t thread_id,
C && custom_completion )
inlinenodiscard

Arrive at the barrier with a custom completion function that is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads.

The completion function of the barrier is not called in this case. Each thread should use a unique thread ID in [0, expected-1].

Definition at line 249 of file barrier.hpp.

◆ arrive() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
arrival_token cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive ( uint32_t thread_id)
inlinenodiscard

Arrive at the barrier.

The barrier's completion function is called by the last thread that arrives, before advancing the barrier phase and notifying all waiting threads. Each thread should use a unique thread ID in [0, expected-1].

Definition at line 259 of file barrier.hpp.

◆ arrive() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
arrival_token cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive ( uint32_t thread_id,
int line )
inlinenodiscard

Arrive at the barrier, recording the given line number for sanity checking to make sure that all threads arrive from the same line or statement in the source code.

This is useful for debugging purposes to detect mismatched barrier calls, but should not really be used otherwise. If CYQLONE_SANITY_CHECKS_BARRIER is disabled, the line number is ignored and this function is equivalent to arrive(uint32_t). Each thread should use a unique thread ID in [0, expected-1].

Definition at line 269 of file barrier.hpp.

◆ current_phase()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
BarrierPhase cyqlone::TreeBarrier< CompletionFn, PhaseType >::current_phase ( ) const
inlinenodiscard

Query the current barrier phase.

May wrap around on overflow, but all threads will see the same phase values in the same order.

Definition at line 286 of file barrier.hpp.

◆ wait_may_block()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
bool cyqlone::TreeBarrier< CompletionFn, PhaseType >::wait_may_block ( const arrival_token & token) const
inlinenoexcept

Check if wait() may block.

If it returns false, the caller can call wait() and it will return immediately without spinning or sleeping. This is useful if the caller has other non-critical work to do while waiting for other threads. Users should still call wait() before arriving again.

Note
This function does not impose any memory ordering, so even when it returns false, changes made before the arrival of other threads may not be visible yet. In contrast, wait() does ensure proper synchronization.

Definition at line 300 of file barrier.hpp.

◆ wait()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
void cyqlone::TreeBarrier< CompletionFn, PhaseType >::wait ( arrival_token && token) const
inline

Wait for the barrier to complete after an arrival, using the given token.

Separating the arrival and wait phases allows for overlapping computation with waiting, hiding the synchronization latency. Waiting on the same token multiple times is not allowed.

Definition at line 308 of file barrier.hpp.

◆ arrive_and_wait() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
void cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_and_wait ( uint32_t thread_id)
inline

Convenience function to arrive and wait in a single call.

Definition at line 321 of file barrier.hpp.

◆ arrive_and_wait() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
void cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_and_wait ( uint32_t thread_id,
int line )
inline

Convenience function to arrive and wait in a single call (with optional sanity check).

Definition at line 323 of file barrier.hpp.

◆ arrive_and_wait_with_completion() [1/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class C>
void cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_and_wait_with_completion ( uint32_t thread_id,
C && custom_completion )
inline

Convenience function to arrive and wait in a single call (with custom completion).

Definition at line 327 of file barrier.hpp.

◆ arrive_and_wait_with_completion() [2/2]

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class C>
auto cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_and_wait_with_completion ( uint32_t thread_id,
C && custom_completion )
inlinenodiscard

Convenience function to arrive and wait in a single call (with custom completion).

Broadcasts the return value of the custom completion function to all threads.

Definition at line 336 of file barrier.hpp.

◆ arrive_reduce()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
arrival_token_typed< T > cyqlone::TreeBarrier< CompletionFn, PhaseType >::arrive_reduce ( uint32_t thread_id,
T x,
F reduce )
inlinenodiscard

Combining tree reduction across all threads.

Deterministic application order for a given number of threads.

Definition at line 348 of file barrier.hpp.

◆ wait_reduce()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T>
T cyqlone::TreeBarrier< CompletionFn, PhaseType >::wait_reduce ( arrival_token_typed< T > && token)
inlinenodiscard

Wait for the result of an arrive_reduce call and obtain the reduced value.

Definition at line 357 of file barrier.hpp.

◆ reduce()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T, class F>
T cyqlone::TreeBarrier< CompletionFn, PhaseType >::reduce ( uint32_t thread_id,
T x,
F reduce )
inlinenodiscard

Combining tree reduction across all threads.

Deterministic application order for a given number of threads.

Definition at line 365 of file barrier.hpp.

◆ broadcast()

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
template<class T>
T cyqlone::TreeBarrier< CompletionFn, PhaseType >::broadcast ( uint32_t thread_id,
T && x,
uint32_t src = 0 )
inlinenodiscard

Broadcast a value from the source thread to all other threads.

All threads must call this function with the same source thread ID.

Definition at line 372 of file barrier.hpp.

Member Data Documentation

◆ cache_line_size

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
size_t cyqlone::TreeBarrier< CompletionFn, PhaseType >::cache_line_size = 64
staticconstexprprivate

Definition at line 65 of file barrier.hpp.

◆ expected

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
uint32_t cyqlone::TreeBarrier< CompletionFn, PhaseType >::expected
private

Number of participating threads.

Definition at line 104 of file barrier.hpp.

◆ state

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
std::unique_ptr<State[]> cyqlone::TreeBarrier< CompletionFn, PhaseType >::state
private

Combining tree state.

Definition at line 105 of file barrier.hpp.

◆ storage

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
std::unique_ptr<Storage[]> cyqlone::TreeBarrier< CompletionFn, PhaseType >::storage
private

Used for reductions.

Definition at line 106 of file barrier.hpp.

◆ broadcast_storage

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
Storage cyqlone::TreeBarrier< CompletionFn, PhaseType >::broadcast_storage
private

Used for broadcasts (including after reductions).

Definition at line 107 of file barrier.hpp.

◆ completion

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
CompletionFn cyqlone::TreeBarrier< CompletionFn, PhaseType >::completion
private

Called when last thread arrives.

Definition at line 108 of file barrier.hpp.

◆ phase

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
std::atomic<BarrierPhase> cyqlone::TreeBarrier< CompletionFn, PhaseType >::phase {}
private

Definition at line 109 of file barrier.hpp.

◆ spin_count

template<typename CompletionFn = EmptyCompletion, class PhaseType = uint32_t>
uint32_t cyqlone::TreeBarrier< CompletionFn, PhaseType >::spin_count = 1000

Number of spin iterations before falling back to futex-based wait.

Definition at line 291 of file barrier.hpp.


The documentation for this class was generated from the following file: