Adi Levin's Blog for programmers

January 23, 2010

Threads Barrier

Filed under: Multithreading — Adi Levin @ 9:23 am
Tags: , , ,

The purpose of a Threads Barrier is to synchronize a collection of threads. Many parallel computing programs are implemented by the SPMD model (Single Program Multiple Data). In this model, a number of threads (or computers) run the same function (or program), but each thread is responsible for handling a different part of the data.  

This is also the model that is used by OpenMP programs. OpenMP invokes a number of threads that run the same piece of code, where each thread processes a different part of the data. 

In such programs we often need to make sure that all threads arrive at a certain point of the code together. We call such a point a Barrier. 

A double-sided Barrier with callbacks 

In the following, I describe a specific implementation of Barrier, that has the following properties: 

  1. The barrier is double-sided. This means that it has an entrance and an exit. We first wait until all threads enter the barrier, and then wait until all threads exit the barrier.
  2. When all threads are inside the barrier, we can run two callback functions – one that is invoked by a single thread only, and one that is invoked by all threads.

The advantage of the two-sided barrier over a one-sided barrier is that 

  1. In a one-sided barrier, that only waits until all threads pass a certain point, it is more difficult to make sure that the barrier object is reusable. To reuse the barrier, we need to make sure that all threads have exited the barrier. In a two-sided barrier we handle that explicitly, so the barrier can be reused freely.
  2. Callbacks are more useful in the two-sided barrier – because there is a period in time when all the threads are guaranteed to be inside the boundaries of the barrier (between the entrance and the exit). Functions that are used as callbacks during that time can safely make assumptions about the position of other threads.

Initialization of the barrier

    volatile long num_of_threads_that_entered_barrier = 0;
    volatile long num_of_threads_that_exited_barrier = 0;
    HANDLE entrance_semaphore = CreateSemaphore(NULL,0,4096,NULL);
    HANDLE exit_semaphore = CreateSemaphore(NULL,0,4096,NULL);

The barrier main function

void threads_barrier(int num_of_threads

    LONG prev; 

    // Wait until all threads enter the barrier 
    if (InterlockedIncrement(&num_of_threads_that_entered_barrier)<num_of_threads)
        WaitForSingleObject(entrance_semaphore,INFINITE);
    else { 
        num_of_threads_that_exited_barrier = 0;
         single_thread_callback();
        ReleaseSemaphore(entrance_semaphore,num_of_threads-1,&prev);
    } 

    all_threads_callback();

    // Wait until all threads exit the barrier 
    if (InterlockedIncrement(&num_of_threads_that_exited_barrier)<num_of_threads)
        WaitForSingleObject(exit_semaphore,INFINITE);
    else {
        num_of_threads_that_entered_barrier = 0;
        ReleaseSemaphore(exit_semaphore,num_of_threads-1,&prev);
    }
}

Explanation

The function threads_barrier starts by waiting for all threads to enter the barrier. We use interlocked functions to increment a counter num_of_threads_that_entered_barrier. If it is below the number of threads, we wait. If we reach the given number of threads, we know that we have (num_of_threads-1) threads that are waiting, so we release them by a call to ReleaseSemaphore. A similar mechanism is used in the exit. For an explanation of interlocked functions and semaphore, please refer to my post on locking mechanisms.

The function single_thread_callback() is invoked in only one thread (the last one to enter the barrier), before the call ReleaseSemaphore.  Thus it can assume that the other threads are not doing anything during that time. This makes it suitable for running initialization code that accesses shared data.

The function all_threads_callback() is invoked in all threads, after all have entered and before any threads has left the barrier. This makes it suitable for running thread-specific initialization code that only accesses thread-local data.

To guarantee reusability of the barrier, we clear the value num_of_threads_that_exited_barrier to zero only after all threads have entered the barrier. This way we know that no thread is currently in the process of exiting the barrier. In the same way, we only clear the value of num_of_threads_that_entered_barrier after all threads have exited the barrier.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: